1.
FEBS Lett ; 590(10): 1428-37, 2016 05.
Article in English | MEDLINE | ID: mdl-27129600

ABSTRACT

Gene conservation, duplication and constitutive expression are intricately linked and strong predictors of essentiality. Here, we introduce metrics based on diversity indices to measure gene conservation, duplication and constitutive expression and validate them by measuring their performance in prediction of essential genes. Conservation and duplication were measured using the diversity indices on the bit score profile of Escherichia coli K12 orthologues, across the genomes, and paralogues, within the genome respectively. Constitutive expression was measured using expression diversity of E. coli K12 genes across different conditions. In addition, we developed a systematic method for enrichment analysis of gene-sets in a given ranked list of genes. The method was used to identify genome-wide functions of essential, conserved, constitutively expressed and duplicated genes. Furthermore, we also ranked various operons, complexes and pathways according to their essentiality, conservation, constitutive expression and duplication.
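As an illustration of how a diversity index can summarize a bit score profile, the following is a minimal sketch. It assumes the Shannon index and a normalized evenness; the abstract does not say which indices the authors used, and the function names and example profiles are hypothetical.

```python
import math


def shannon_diversity(scores):
    """Shannon diversity index H of a non-negative score profile.

    Assumption: each score is converted to a proportion of the profile total.
    """
    total = sum(scores)
    if total <= 0:
        return 0.0
    h = 0.0
    for s in scores:
        if s > 0:
            p = s / total
            h -= p * math.log(p)
    return h


def evenness(scores):
    """H divided by its maximum log(n), giving a value between 0 and 1."""
    n = len(scores)
    if n < 2:
        return 0.0
    return shannon_diversity(scores) / math.log(n)


# Hypothetical bit score profiles of one gene across four genomes.
broadly_conserved = [100, 100, 100, 100]  # strong hit in every genome
narrowly_present = [400, 0, 0, 0]         # hit in a single genome only
```

Under this assumed scheme, a gene with comparable hits in every genome scores close to 1, while a gene found in a single genome scores 0.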


Subject(s)
Conserved Sequence , Escherichia coli Proteins/genetics , Escherichia coli/genetics , Gene Duplication , Genes, Essential , Computational Biology , Evolution, Molecular , Metabolic Networks and Pathways , Operon , ROC Curve
2.
Hum Genomics ; 9: 8, 2015 Jun 11.
Article in English | MEDLINE | ID: mdl-26063326

ABSTRACT

BACKGROUND: The current practice of using only a few strongly associated genetic markers in regression models results in generally low power in prediction or accounting for heritability of complex human traits. PURPOSE: We illustrate here a Bayesian joint estimation of single nucleotide polymorphism (SNP) effects principle to improve prediction of phenotype status from pathway-focused sets of SNPs. Chronic fatigue syndrome (CFS), a complex disease of unknown etiology with no laboratory methods for diagnosis, was chosen to demonstrate the power of this Bayesian method. For CFS, such a genetic predictive model in combination with clinical evidence might lead to an earlier diagnosis than one based solely on clinical findings. METHODS: One of our goals is to model disease status using Bayesian statistics which perform variable selection and parameter estimation simultaneously and which can induce the sparseness and smoothness of the SNP effects. Smoothness of the SNP effects is obtained by explicit modeling of the covariance structure of the SNP effects. RESULTS: The Bayesian model achieved perfect goodness of fit when tested within the sampled data. Tenfold cross-validation resulted in 80% accuracy, one of the best so far for CFS in comparison to previous prediction models. Model reduction aspects were investigated in a computationally feasible manner. Additionally, genetic variation estimates provided by the model identified specific genetic markers for their biological role in the disease pathophysiology. CONCLUSIONS: This proof-of-principle study provides a powerful approach combining Bayesian methods, SNPs representing multiple pathways and rigorous case ascertainment for accurate genetic risk prediction modeling of complex diseases like CFS and other chronic diseases.


Subject(s)
Biosynthetic Pathways/genetics , Fatigue Syndrome, Chronic/genetics , Genetic Markers , Models, Genetic , Adolescent , Adult , Aged , Bayes Theorem , Fatigue Syndrome, Chronic/pathology , Female , Genotype , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide
3.
PLoS One ; 6(11): e26959, 2011.
Article in English | MEDLINE | ID: mdl-22087238

ABSTRACT

Both molecular marker and gene expression data were considered alone as well as jointly to serve as additive predictors for two pathogen-activity phenotypes in real recombinant inbred lines of soybean. For unobserved phenotype prediction, we used Bayesian hierarchical regression modeling, where the number of possible predictors in the model was controlled by the different selection strategies tested. Our initial findings were submitted for DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged to be the best in sub-challenge B3, wherein both functional genomic and genetic data were used to predict the phenotypes. In this work we improve upon this previous work by considering various predictor selection strategies, and cross-validation was used to measure the accuracy of in-data and out-data predictions. The results from various model choices indicate that, for these data, using both data types (namely functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can be easily achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not small. We also further studied gene-set enrichment (for continuous phenotype) in the biological process in question and chromosomal enrichment of the gene set. The methodological contribution of this paper is in the exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of covariates were explored, and all methods were implemented under the Bayesian hierarchical modeling framework with indicator-based covariate selection. All the models based on a careful variable selection procedure were found to produce significant results based on a permutation test.
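The permutation test mentioned at the end of the abstract can be sketched generically as follows. This is not the authors' procedure: the test statistic (Pearson correlation between observed and predicted phenotypes), the function names, and the example values are all assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(observed, predicted, n_perm=999, seed=0):
    """One-sided p-value for the observed/predicted correlation.

    The observed phenotypes are shuffled to break any link with the
    predictions; the p-value is the share of shuffles whose correlation
    is at least as large as the unshuffled one (with a +1 correction).
    """
    rng = random.Random(seed)
    stat = pearson(observed, predicted)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(shuffled, predicted) >= stat:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical observed phenotypes and out-of-sample predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predicted = [1.2, 1.9, 3.3, 3.8, 5.1, 6.4, 6.9, 8.2]
p_value = permutation_p_value(observed, predicted)
```

The +1 correction keeps the estimated p-value strictly positive, so with 999 shuffles the smallest reportable value is 0.001.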


Subject(s)
Bayes Theorem , Gene Expression Profiling , Glycine max/microbiology , Quantitative Trait, Heritable , Biomarkers , Models, Biological , Phenotype , Plant Diseases
4.
J Comput Biol ; 17(6): 825-40, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20583928

ABSTRACT

In the absence of a comprehensive sequence-based map of a species' genome, genetic maps constitute the next best source of genetic information. Information derived from such maps can be used, for example, in identifying the genes that form quantitative trait loci (QTLs) and for performing comparative genomics between species. Integrating information from a collection of maps will provide more accurate inferences on, for example, marker locations. We describe a method for integrating (possibly conflicting) experimentally derived genetic maps. It assumes a fully probabilistic model that describes the relationship between experimentally derived genetic maps and the integrated map. The model views experimentally derived maps for a given species' chromosome as noisy realisations of a single "true" map, where the noise consists of possible linear distortions and measurement error on the marker locations. Bayesian statistical inference methodology is then used to infer the integrated map (the "true" map) and its attendant uncertainties in the marker locations by using data from a number of experimentally determined genetic maps. The method is shown to work well on simulated data and is used to integrate linkage maps of Pig chromosome 6 and also linkage and radiation hybrid maps of Cow chromosome 1.
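One way to write down the "noisy realisation" idea in the abstract is the following sketch. The abstract does not give the authors' exact parameterisation, so every symbol here is an assumption: $x_i$ is the unknown true location of marker $i$, $y_{ij}$ is its observed location in experimental map $j$, $\alpha_j$ and $\beta_j$ describe the linear distortion of map $j$, and $\sigma_j^2$ is the measurement-error variance of that map.

```latex
\begin{align}
y_{ij} &= \alpha_j + \beta_j x_i + \varepsilon_{ij}, \\
\varepsilon_{ij} &\sim \mathcal{N}\!\left(0, \sigma_j^2\right).
\end{align}
```

Under such a model, Bayesian inference would place priors on $x_i$, $\alpha_j$, $\beta_j$ and $\sigma_j^2$ and report the posterior distribution of the $x_i$ as the integrated map together with its uncertainty.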


Subject(s)
Bayes Theorem , Chromosome Mapping/methods , Animals , Cattle , Chromosomes, Mammalian/genetics , Computer Simulation , Genetic Markers , Markov Chains , Monte Carlo Method , Sus scrofa/genetics
5.
Cancer Res ; 69(5): 1739-47, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-19223557

ABSTRACT

Studies centered at the intersection of embryogenesis and carcinogenesis have identified striking parallels involving signaling pathways that modulate both developmental and neoplastic processes. In the prostate, reciprocal interactions between epithelium and stroma are known to influence neoplasia and also exert morphogenic effects via the urogenital sinus mesenchyme. In this study, we sought to determine molecular relationships between aspects of normal prostate development and prostate carcinogenesis. We first characterized the gene expression program associated with key points of murine prostate organogenesis spanning the initial in utero induction of prostate budding through maturity. We identified a highly reproducible temporal program of gene expression that partitioned according to the broad developmental stages of prostate induction, branching morphogenesis, and secretory differentiation. Comparisons of gene expression profiles of murine prostate cancers arising in the context of genetically engineered alterations in the Pten tumor suppressor and Myc oncogene identified significant associations between the profile of branching morphogenesis and both cancer models. Further, the expression of genes comprising the branching morphogenesis program, such as PRDX4, SLC43A1, and DNMT3A, was significantly altered in human neoplastic prostate epithelium. These results indicate that components of normal developmental processes are active in prostate neoplasia and provide further rationale for exploiting molecular features of organogenesis to understand cancer phenotypes.


Subject(s)
Gene Expression Profiling , Prostate/embryology , Prostatic Neoplasms/etiology , Amino Acid Transport System y+L/genetics , Animals , Cell Differentiation , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA Methyltransferase 3A , Genes, myc , Genetic Engineering , Humans , Male , Mice , Mice, Inbred C57BL , Morphogenesis , Neoplasm Proteins/genetics , Peroxiredoxins/genetics , Prostate/metabolism , Prostatic Neoplasms/genetics
6.
Pac Symp Biocomput ; : 178-89, 2008.
Article in English | MEDLINE | ID: mdl-18229685

ABSTRACT

In this paper we present a framework for integrating diverse data sets under a coherent probabilistic setup. The necessity of probabilistic modeling arises from the fact that data integration is not restricted to compiling information from databases whose data are typically thought to be non-random. A wide range of experimental data is currently also available; however, these data sets can rarely be summarized in simple output data, e.g. in categorical form. Moreover, it may not even be appropriate to do so. The proposed setup allows not only modeling of the observed data and parameters of interest but, most importantly, incorporation of prior knowledge. Additionally, the setup easily extends to facilitate more popular data-driven analysis.


Subject(s)
Models, Biological , Organogenesis , Androgens/metabolism , Animals , Bayes Theorem , Computational Biology , Data Interpretation, Statistical , Female , Gene Expression Profiling/statistics & numerical data , Male , Mice , Models, Statistical , Prostate/embryology , Prostate/growth & development , Prostate/metabolism
7.
Genetics ; 174(3): 1597-611, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17028339

ABSTRACT

A novel method for Bayesian analysis of genetic heterogeneity and multilocus association in random population samples is presented. The method is valid for quantitative and binary traits as well as for multiallelic markers. In the method, individuals are stochastically assigned into two etiological groups that can have both their own, and possibly different, subsets of trait-associated (disease-predisposing) loci or alleles. The method is favorable especially in situations when etiological models are stratified by factors that are unknown or went unmeasured, that is, if genetic heterogeneity is due to, for example, unknown genes x environment or genes x gene interactions. Additionally, a heterogeneity structure for the phenotype does not need to follow the structure of the general population; it can have a distinct selection history. The performance of the method is illustrated with a simulated example of genes x environment interaction (quantitative trait with loosely linked markers) and compared to the results of single-group analysis in the presence of missing data. Additionally, example analyses with previously analyzed cystic fibrosis and type 2 diabetes data sets (binary traits with closely linked markers) are presented. The implementation (written in WinBUGS) is freely available for research purposes from http://www.rni.helsinki.fi/~mjs/.


Subject(s)
Chromosome Mapping , Models, Genetic , Quantitative Trait Loci , Alleles , Bayes Theorem , Computer Simulation , Cystic Fibrosis/genetics , Diabetes Mellitus, Type 2/genetics , Epistasis, Genetic , Genetic Heterogeneity , Genetic Markers , Humans , Selection, Genetic , Stochastic Processes
8.
Genetics ; 169(1): 427-39, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15371355

ABSTRACT

A Bayesian method for fine mapping is presented, which deals with multiallelic markers (with two or more alleles), unknown phase, missing data, multiple causal variants, and both continuous and binary phenotypes. We consider small chromosomal segments spanned by a dense set of closely linked markers and putative genes only at marker points. In the phenotypic model, locus-specific indicator variables are used to control inclusion in or exclusion from marker contributions. To account for covariance between consecutive loci and to control fluctuations in association signals along a candidate region we introduce a joint prior for the indicators that depends on genetic or physical map distances. The potential of the method, including posterior estimation of trait-associated loci, their effects, linkage disequilibrium pattern due to close linkage of loci, and the age of a causal variant (time to most recent common ancestor), is illustrated with the well-known cystic fibrosis and Friedreich ataxia data sets by assuming that haplotypes were not available. In addition, simulation analysis with large genetic distances is shown. Estimation of model parameters is based on Markov chain Monte Carlo (MCMC) sampling and is implemented using WinBUGS. The model specification code is freely available for research purposes from http://www.rni.helsinki.fi/~mjs/.
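The role of the locus-specific indicator variables can be sketched for a continuous phenotype as below. This is a simplified illustration, not the published model: the symbols are assumptions, with $y_i$ the phenotype of individual $i$, $x_{ij}$ a numeric genotype coding at marker $j$, $\beta_j$ the marker effect, and $I_j$ the indicator that switches the marker in or out of the model.

```latex
\begin{equation}
y_i = \mu + \sum_{j=1}^{J} I_j \, \beta_j \, x_{ij} + e_i,
\qquad I_j \in \{0, 1\},
\qquad e_i \sim \mathcal{N}\!\left(0, \sigma^2\right).
\end{equation}
```

A marker contributes to the phenotype only when $I_j = 1$, so the posterior probability $P(I_j = 1 \mid \text{data})$ serves as the association signal at that position. The joint prior on $(I_1, \dots, I_J)$ that depends on map distances is described in the abstract but its form is not given there, so it is left unspecified here. A binary phenotype would additionally need a link function.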


Subject(s)
Bayes Theorem , Chromosome Mapping , Genetic Linkage , Models, Genetic , Mutation/genetics , Algorithms , Alleles , Cystic Fibrosis/genetics , Friedreich Ataxia/genetics , Genetic Markers , Genotype , Haplotypes , Humans , Phenotype , Quantitative Trait, Heritable
9.
Bioinformatics ; 20(17): 2943-53, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15180937

ABSTRACT

MOTIVATION: The statistical analysis of microarray data usually proceeds in a sequential manner, with the output of the previous step always serving as the input of the next one. However, the methods currently used in such analyses do not properly account for the fact that the intermediate results may not always be correct, thus leading to cumulative error in the inferences drawn from such steps. RESULTS: Here we show that, by an application of hierarchical Bayesian methodology, this sequential procedure can be replaced by a single joint analysis, while systematically accounting for the uncertainties in this process. Moreover, we can also integrate relevant functional information available from databases into such an analysis, thereby increasing the reliability of the biological conclusions that are drawn. We illustrate these points by analysing real data and by showing that the genes can be divided into categories of interest, with the defining characteristic depending on the biological question that is considered. We contend that the proposed method has advantages at two levels. First, there are gains in the statistical and biological results from the analysis of this particular dataset. Second, it opens up new possibilities in analysing microarray data in general.


Subject(s)
Algorithms , Bayes Theorem , Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Animals , Kidney/metabolism , Liver/metabolism , Male , Mice , Organ Specificity , Proteome/chemistry , Structure-Activity Relationship , Systems Integration , Testis/metabolism , Tissue Distribution