Search | VHL Regional Portal

1.

Target and tissue selectivity of PROTAC degraders.

Guenette, Robert G; Yang, Seung Wook; Min, Jaeki; Pei, Baikang; Potts, Patrick Ryan.

Chem Soc Rev ; 51(14): 5740-5756, 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35587208

ABSTRACT

Targeted protein degradation (TPD) strategies have revolutionized how scientists tackle challenging protein targets deemed undruggable with traditional small molecule inhibitors. Many promising campaigns to inhibit proteins have failed due to factors surrounding inhibition selectivity and targeting of compounds to specific tissues and cell types. One of the major improvements that PROTAC (proteolysis targeting chimera) and molecular glue technology can exert is highly selective control of target inhibition. Multiple studies have shown that PROTACs can gain selectivity for their protein targets beyond that of their parent ligands via optimization of linker length and stabilization of ternary complexes. Due to the bifunctional nature of PROTACs, the tissue selective nature of E3 ligases can be exploited to uncover novel targeting mechanisms. In this review, we provide critical analysis of the recent progress towards making selective PROTAC molecules and new PROTAC technologies that will continue to push the boundaries of achieving selectivity. These efforts have wide implications in the future of treating disease as they will broaden the possible targets that can be addressed by small molecules, like undruggable proteins or broadly active targets that would benefit from degradation in specific tissue types.

Subject(s)

Proteolysis , Ubiquitin-Protein Ligases , Ligands , Ubiquitin-Protein Ligases/metabolism

2.

GENCODE 2021.

Frankish, Adam; Diekhans, Mark; Jungreis, Irwin; Lagarde, Julien; Loveland, Jane E; Mudge, Jonathan M; Sisu, Cristina; Wright, James C; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Boix, Carles; Carbonell Sala, Silvia; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Howe, Kevin L; Hunt, Toby; Izuogu, Osagie G; Johnson, Rory; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Riera, Ferriol Calvet; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Wolf, Maxim Y; Xu, Jinuri; Yang, Yucheng T; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Choudhary, Jyoti S; Gerstein, Mark.

Nucleic Acids Res ; 49(D1): D916-D923, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Subject(s)

COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics

3.

IConMHC: a deep learning convolutional neural network model to predict peptide and MHC-I binding affinity.

Pei, Baikang; Hsu, Yi-Hsiang.

Immunogenetics ; 72(5): 295-304, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32577798

ABSTRACT

Tumor-specific neoantigens are mutated self-peptides presented by tumor cell major histocompatibility complex (MHC) molecules and are necessary to elicit the host's anti-cancer cytotoxic T cell responses. They can be specifically recognized by neoantigen-specific T cell receptors (TCRs). However, current wet-lab assays for identifying peptide-MHC binding are too expensive and time-consuming to meet clinical needs. In this study, we developed an in silico method with a deep convolutional neural network (CNN) model, iConMHC, to predict peptide-MHC binding affinity. Unlike other in silico methods that only learn from properties of amino acids in neoantigen peptides alone and/or MHCs alone, iConMHC learns from physical and chemical interaction properties between pairwise amino acids from the two molecules. These properties, such as contact potentials and distances in folded proteins, directly affect neoantigen-MHC binding affinity. In addition, iConMHC is a pan-allele model that is capable of making predictions for all the MHC alleles. Even for those rare MHC alleles without training data, iConMHC can make predictions with reasonable accuracy. We benchmarked iConMHC against other commonly used MHC-I binding predictors and found our model performs better than most of the pan-allele models.

Subject(s)

Deep Learning , Histocompatibility Antigens Class I/metabolism , Peptides/metabolism , Alleles , Amino Acid Sequence , Antigens, Neoplasm/chemistry , Antigens, Neoplasm/metabolism , Computer Simulation , Databases, Protein , Histocompatibility Antigens Class I/chemistry , Histocompatibility Antigens Class I/genetics , Humans , Neural Networks, Computer , Peptides/chemistry , Protein Binding , Reproducibility of Results

4.

GENCODE reference annotation for the human and mouse genomes.

Frankish, Adam; Diekhans, Mark; Ferreira, Anne-Maud; Johnson, Rory; Jungreis, Irwin; Loveland, Jane; Mudge, Jonathan M; Sisu, Cristina; Wright, James; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Carbonell Sala, Silvia; Chrast, Jacqueline; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Hunt, Toby; Izuogu, Osagie G; Lagarde, Julien; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Xu, Jinuri; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Aken, Bronwen; Choudhary, Jyoti S; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J P.

Nucleic Acids Res ; 47(D1): D766-D773, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

Subject(s)

Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software

5.

Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression.

Pervouchine, Dmitri D; Djebali, Sarah; Breschi, Alessandra; Davis, Carrie A; Barja, Pablo Prieto; Dobin, Alex; Tanzer, Andrea; Lagarde, Julien; Zaleski, Chris; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Wang, Huaien; Bussotti, Giovanni; Pei, Baikang; Balasubramanian, Suganthi; Monlong, Jean; Harmanci, Arif; Gerstein, Mark; Beer, Michael A; Notredame, Cedric; Guigó, Roderic; Gingeras, Thomas R.

Nat Commun ; 6: 5903, 2015 Jan 13.

Article in English | MEDLINE | ID: mdl-25582907

ABSTRACT

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.

Subject(s)

Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA

6.

Comparative analysis of pseudogenes across three phyla.

Sisu, Cristina; Pei, Baikang; Leng, Jing; Frankish, Adam; Zhang, Yan; Balasubramanian, Suganthi; Harte, Rachel; Wang, Daifeng; Rutenberg-Schoenberg, Michael; Clark, Wyatt; Diekhans, Mark; Rozowsky, Joel; Hubbard, Tim; Harrow, Jennifer; Gerstein, Mark B.

Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.

Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (~15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Subject(s)

Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Evolution, Molecular , Genetic Association Studies , Humans , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Sequence Homology, Nucleic Acid

7.

Comparative analysis of the transcriptome across distant species.

Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu; Wang, Daifeng; Cheng, Chao; Brown, James B; Davis, Carrie A; Hillier, LaDeana; Sisu, Cristina; Li, Jingyi Jessica; Pei, Baikang; Harmanci, Arif O; Duff, Michael O; Djebali, Sarah; Alexander, Roger P; Alver, Burak H; Auerbach, Raymond; Bell, Kimberly; Bickel, Peter J; Boeck, Max E; Boley, Nathan P; Booth, Benjamin W; Cherbas, Lucy; Cherbas, Peter; Di, Chao; Dobin, Alex; Drenkow, Jorg; Ewing, Brent; Fang, Gang; Fastuca, Megan; Feingold, Elise A; Frankish, Adam; Gao, Guanjun; Good, Peter J; Guigó, Roderic; Hammonds, Ann; Harrow, Jen; Hoskins, Roger A; Howald, Cédric; Hu, Long; Huang, Haiyan; Hubbard, Tim J P; Huynh, Chau; Jha, Sonali; Kasper, Dionna; Kato, Masaomi; Kaufman, Thomas C; Kitchen, Robert R; Ladewig, Erik; Lagarde, Julien.

Nature ; 512(7515): 445-8, 2014 Aug 28.

Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

Subject(s)

Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA

8.

Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.

Abyzov, Alexej; Iskow, Rebecca; Gokcumen, Omer; Radke, David W; Balasubramanian, Suganthi; Pei, Baikang; Habegger, Lukas; Lee, Charles; Gerstein, Mark.

Genome Res ; 23(12): 2042-52, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24026178

ABSTRACT

In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functioning proteins, or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.

Subject(s)

Cell Division/genetics , Gene Duplication , Retroelements/genetics , Computational Biology/methods , Evolution, Molecular , Genome, Human , Genotype , Humans , Phylogeny , Pseudogenes , Reproducibility of Results , Sequence Analysis, DNA

9.

Reconstruction of biological networks by incorporating prior knowledge into Bayesian network models.

Pei, Baikang; Shin, Dong-Guk.

J Comput Biol ; 19(12): 1324-34, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23210479

ABSTRACT

The Bayesian network model is widely used for reverse engineering of biological network structures. An advantage of this model is its capability to integrate prior knowledge into the model learning process, which can improve the quality of the network reconstruction outcome. Some previous works have explored this area with a focus on using prior knowledge of the direct molecular links, except for a few recent ones proposing to examine the effects of molecular orderings. In this study, we propose a Bayesian network model that can integrate both direct links and orderings into the model. Random weights are assigned to these two types of prior knowledge to alleviate bias toward certain types of information. We evaluate our model performance using both synthetic data and biological data for the RAF signaling network, and illustrate the significant improvement in network structure reconstruction of the proposed model over the existing methods. We also examine the correlation between the improvement and the abundance of ordering prior knowledge. To address the issue of generating prior knowledge, we propose an approach to automatically extract potential molecular orderings from knowledge resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and Gene Ontology (GO) annotation.

Subject(s)

Bayes Theorem , Computational Biology/methods , Gene Regulatory Networks , Genome , Models, Biological , Signal Transduction , Databases, Genetic , MAP Kinase Signaling System , raf Kinases/metabolism

10.

The GENCODE pseudogene resource.

Pei, Baikang; Sisu, Cristina; Frankish, Adam; Howald, Cédric; Habegger, Lukas; Mu, Xinmeng Jasmine; Harte, Rachel; Balasubramanian, Suganthi; Tanzer, Andrea; Diekhans, Mark; Reymond, Alexandre; Hubbard, Tim J; Harrow, Jennifer; Gerstein, Mark B.

Genome Biol ; 13(9): R51, 2012 Sep 26.

Article in English | MEDLINE | ID: mdl-22951037

ABSTRACT

BACKGROUND: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.

Subject(s)

Genome, Human , Pseudogenes , Transcription, Genetic , Animals , Binding Sites , Chromatin/chemistry , Chromatin/genetics , Humans , Models, Genetic , Models, Statistical , Molecular Sequence Annotation , Phylogeny , Primates , RNA Polymerase II/metabolism , Regulatory Sequences, Nucleic Acid , Selection, Genetic , Sequence Analysis, DNA , Transcription Factors/metabolism

11.

GENCODE: the reference human genome annotation for The ENCODE Project.

Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J.

Genome Res ; 22(9): 1760-74, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Subject(s)

Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions

12.

A Bayesian Approach to Pathway Analysis by Integrating Gene-Gene Functional Directions and Microarray Data.

Zhao, Yifang; Chen, Ming-Hui; Pei, Baikang; Rowe, David; Shin, Dong-Guk; Xie, Wangang; Yu, Fang; Kuo, Lynn.

Stat Biosci ; 4(1): 105-131, 2012 May 01.

Article in English | MEDLINE | ID: mdl-23482678

ABSTRACT

Many statistical methods have been developed to screen for differentially expressed genes associated with specific phenotypes in microarray data. However, it remains a major challenge to synthesize the observed expression patterns with abundant biological knowledge for more complete understanding of the biological functions among genes. Various methods including clustering analysis on genes, neural networks, Bayesian networks and pathway analysis have been developed toward this goal. In most of these procedures, the activation and inhibition relationships among genes have hardly been utilized in the modeling steps. We propose two novel Bayesian models to integrate the microarray data with the putative pathway structures obtained from the KEGG database and the directional gene-gene interactions in the medical literature. We define the symmetric Kullback-Leibler divergence of a pathway, and use it to identify the pathway(s) most supported by the microarray data. A Markov chain Monte Carlo sampling algorithm is given for posterior computation in the hierarchical model. The proposed method is shown to select the most supported pathway in an illustrative example. Finally, we apply the methodology to a real microarray data set to understand the gene expression profile of the osteoblast lineage at defined stages of differentiation. We observe that our method correctly identifies the pathways that are reported to play essential roles in modulating bone mass.

13.

Learning Bayesian networks with integration of indirect prior knowledge.

Pei, Baikang; Rowe, David W; Shin, Dong-Guk.

Int J Data Min Bioinform ; 4(5): 505-19, 2010.

Article in English | MEDLINE | ID: mdl-21133038

ABSTRACT

A Bayesian network model can be used to study the structures of gene regulatory networks. It has the ability to integrate information from both prior knowledge and experimental data. In this study, we propose an approach to efficiently integrate global ordering information into model learning, where the ordering information specifies the indirect relationships among genes. We demonstrate that, compared with a traditional Bayesian network model that uses only local prior knowledge, utilising additional global ordering knowledge can significantly improve the model's performance. The magnitude of this improvement depends on abundance of global ordering information and data quality.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks/genetics , Algorithms , Bayes Theorem , Databases, Factual

14.

Computing consistency between microarray data and known gene regulation relationships.

Shin, Dong-Guk; Kazmi, Saira A; Pei, Baikang; Kim, Yoo-Ah; Maddox, Jeffrey; Nori, Ravi; Wong, Alan; Krueger, Winfried; Rowe, David.

IEEE Trans Inf Technol Biomed ; 13(6): 1075-82, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19783507

ABSTRACT

Microarray experiments produce expression patterns for thousands of genes at once. On the other hand, biomedical literature contains large amounts of gene regulation relationship information accumulated over the years. One obvious requirement is an automated way of comparing microarray data with the collection of known gene regulation relationships. Such an automated comparison is imperative because it can help biologists rapidly understand the context of a given microarray experiment. In addition, the consistency measure can be used to either validate or refute the hypothesis being tested using the microarray experiment. In this paper we present a systematic way of examining the consistency between a given set of microarray data and known gene regulation relationships. We first introduce a simple gene regulation network model with two separate algorithms designed to isolate a maximally consistent network. Subsequently, we extend the model to take into account multiple regulating factors for a single gene while highlighting both consistencies and inconsistencies. We illustrate the effectiveness of our approach with two practical examples, one that picks the peroxisome proliferator-activated receptor (PPAR) pathway as highly consistent from multiple pathways of Kyoto encyclopedia of genes and genomes (KEGG), and another that isolates key regulatory relationships involving nfkb1 and others known for macrophage's counter response to inflammation.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Reproducibility of Results , Signal Transduction , User-Computer Interface
