Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations.

Araujo, Daniel S; Nguyen, Chris; Hu, Xiaowei; Mikhaylova, Anna V; Gignoux, Chris; Ardlie, Kristin; Taylor, Kent D; Durda, Peter; Liu, Yongmei; Papanicolaou, George; Cho, Michael H; Rich, Stephen S; Rotter, Jerome I; Im, Hae Kyung; Manichaikul, Ani; Wheeler, Heather E.

HGG Adv ; 4(4): 100216, 2023 Oct 12.

Article in English | MEDLINE | ID: mdl-37869564

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.

Subject(s)

Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Quantitative Trait Loci/genetics , Gene Frequency , Linkage Disequilibrium

Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations.

bioRxiv ; 2023 May 20.

Article in English | MEDLINE | ID: mdl-36798214

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.

Phylogenomics and gene selection in Aspergillus welwitschiae: Possible implications in the pathogenicity in Agave sisalana.

Quintanilha-Peixoto, Gabriel; Marone, Marina Püpke; Raya, Fábio Trigo; José, Juliana; Oliveira, Adriele; Fonseca, Paula Luize Camargos; Tomé, Luiz Marcelo Ribeiro; Bortolini, Dener Eduardo; Kato, Rodrigo Bentes; Araújo, Daniel S; De-Paula, Ruth B; Cuesta-Astroz, Yesid; Duarte, Elizabeth A A; Badotti, Fernanda; de Carvalho Azevedo, Vasco Ariston; Brenig, Bertram; Soares, Ana Cristina Fermino; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães; Aguiar, Eric Roberto Guimarães Rocha; Góes-Neto, Aristóteles.

Genomics ; 114(6): 110517, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36306958

ABSTRACT

Aspergillus welwitschiae causes bole rot disease in sisal (Agave sisalana and related species), which affects the production of natural fibers in Brazil, the main worldwide producer of sisal fibers. This fungus is a saprotroph with a broad host range. Previous research established A. welwitschiae as the only causative agent of bole rot in the field, but little is known about the evolution of this species and its strains. In this work, we performed a comparative genomics analysis of 40 Aspergillus strains. We show the conflicting molecular identity of this species, with one sisal-infecting strain sharing its last common ancestor with Aspergillus niger, having diverged only 833 thousand years ago. Furthermore, our analysis of positive selection reveals sites under selection in genes coding for siderophore transporters, sodium-calcium exchangers, and phosphatidylethanolamine-binding proteins (PEBPs). Herein, we discuss the possible impacts of these gene functions on the pathogenicity in sisal.

Subject(s)

Agave , Agave/genetics , Brazil , Aspergillus/genetics

Genetic and environmental variation impact transferability of polygenic risk scores.

Araújo, Daniel S; Wheeler, Heather E.

Cell Rep Med ; 3(7): 100687, 2022 Jul 19.

Article in English | MEDLINE | ID: mdl-35858592

ABSTRACT

Even when polygenic risk scores (PRSs) are trained in African ancestral populations, Kamiza and colleagues showed that genetic and environmental variation within sub-Saharan African populations impacts prediction performance, highlighting the challenges of clinical implementation of PRSs for risk assessment.

Subject(s)

Genetic Predisposition to Disease , Multifactorial Inheritance , Black People , Genetic Predisposition to Disease/genetics , Humans , Multifactorial Inheritance/genetics , Risk Assessment , Risk Factors

Global Characterization of Fungal Mitogenomes: New Insights on Genomic Diversity and Dynamism of Coding Genes and Accessory Elements.

Fonseca, Paula L C; De-Paula, Ruth B; Araújo, Daniel S; Tomé, Luiz Marcelo Ribeiro; Mendes-Pereira, Thairine; Rodrigues, Wenderson Felipe Costa; Del-Bem, Luiz-Eduardo; Aguiar, Eric R G R; Góes-Neto, Aristóteles.

Front Microbiol ; 12: 787283, 2021.

Article in English | MEDLINE | ID: mdl-34925295

ABSTRACT

Fungi comprise a great diversity of species with distinct ecological functions and lifestyles. Similar to other eukaryotes, fungi rely on interactions with prokaryotes and one of the most important symbiotic events was the acquisition of mitochondria. Mitochondria are organelles found in eukaryotic cells whose main function is to generate energy through aerobic respiration. Mitogenomes (mtDNAs) are double-stranded circular or linear DNA molecules from mitochondria that may contain core genes and accessory elements that can be replicated, transcribed, and translated independently of the nuclear genome. Despite their importance, investigative studies on the diversity of fungal mitogenomes are scarce. Herein, we have evaluated 788 curated fungal mitogenomes available in the NCBI database to assess discrepancies and similarities among them and to better understand the mechanisms involved in fungal mtDNA variability. From a total of 12 fungal phyla, four do not have any representative with available mitogenomes, which highlights the underrepresentation of some groups in the currently available data. We selected representative and non-redundant mitogenomes based on the threshold of 90% similarity, eliminating 81 mtDNAs. Comparative analyses revealed considerable size variability of mtDNAs with a difference of up to 260 kb in length. Furthermore, variation in mitogenome length and genomic composition is generally related to the number and length of accessory elements (introns, HEGs, and uORFs). We identified an overall average of 8.0 (0-39) introns, 8.0 (0-100) HEGs, and 8.2 (0-102) uORFs per genome, with high variation among phyla. Even though the length of the core protein-coding genes is considerably conserved, approximately 36.3% of the mitogenomes evaluated have at least one of the 14 core coding genes absent. Also, our results revealed that there is not even a single gene shared among all mitogenomes. Other unusual genes, such as dpo and rpo, were also detected in many mitogenomes and displayed diverse evolutionary histories. Altogether, the results presented in this study suggest that fungal mitogenomes are diverse, contain accessory elements, and lack a conserved gene that can be used for the taxonomic classification of the Kingdom Fungi.

Comparative mitogenomics of Agaricomycetes: Diversity, abundance, impact and coding potential of putative open-reading frames.

Araújo, Daniel S; De-Paula, Ruth B; Tomé, Luiz M R; Quintanilha-Peixoto, Gabriel; Salvador-Montoya, Carlos A; Del-Bem, Luiz-Eduardo; Badotti, Fernanda; Azevedo, Vasco A C; Brenig, Bertram; Aguiar, Eric R G R; Drechsler-Santos, Elisandro R; Fonseca, Paula L C; Góes-Neto, Aristóteles.

Mitochondrion ; 58: 1-13, 2021 May.

Article in English | MEDLINE | ID: mdl-33582235

ABSTRACT

The mitochondrion is an organelle found in eukaryotic organisms, and it is vital for different cellular pathways. The mitochondrion has its own DNA molecule and, because its genetic content is relatively conserved, despite the variation of size and structure, mitogenome sequences have been widely used as a promising molecular biomarker for taxonomy and evolution in fungi. In this study, the mitogenomes of two fungal species of the class Agaricomycetes, Phellinotus piptadeniae and Trametes villosa, were assembled and annotated for the first time. We used these newly sequenced mitogenomes for comparative analyses with 55 other mitogenomes of Agaricomycetes available in public databases. Mitochondrial DNA (mtDNA) size and content are highly variable, and non-coding and intronic regions, homing endonuclease genes (HEGs), and unidentified ORFs (uORFs) significantly contribute to the total size of the mitogenome. Furthermore, accessory genes (most of them HEGs) are shared between distantly related species, most likely as a consequence of horizontal gene transfer events. Conversely, uORFs are only shared between taxonomically related species, most probably as a result of vertical evolutionary inheritance. Additionally, codon usage varies among mitogenomes and the GC content of mitochondrial features may be used to distinguish coding from non-coding sequences. Our results also indicated that transposition events of mitochondrial genes to the nuclear genome are not common. Despite the variation of size and content of the mitogenomes, mitochondrial genes seemed to be reliable molecular markers in our time-divergence analysis, even though the nucleotide substitution rates of mitochondrial and nuclear genomes of fungi are quite different. We also showed that many events of mitochondrial gene shuffling probably happened amongst the Agaricomycetes during evolution, which created differences in the gene order among species, even those of the same genus. Altogether, our study revealed new information regarding evolutionary dynamics in Agaricomycetes.

Subject(s)

Basidiomycota/genetics , Genes, Fungal , Genome, Mitochondrial , Polyporaceae/genetics , Codon , DNA, Mitochondrial/genetics , Introns , Open Reading Frames

Exploring the Relationship Among Divergence Time and Coding and Non-coding Elements in the Shaping of Fungal Mitochondrial Genomes.

Fonseca, Paula L C; Badotti, Fernanda; De-Paula, Ruth B; Araújo, Daniel S; Bortolini, Dener E; Del-Bem, Luiz-Eduardo; Azevedo, Vasco A; Brenig, Bertram; Aguiar, Eric R G R; Góes-Neto, Aristóteles.

Front Microbiol ; 11: 765, 2020.

Article in English | MEDLINE | ID: mdl-32411111

ABSTRACT

The order Hypocreales (Ascomycota) is composed of ubiquitous and ecologically diverse fungi such as saprobes, biotrophs, and pathogens. Despite their phylogenetic relationship, these species exhibit high variability in biomolecule production, lifestyle, and fitness. The mitochondria play an important role in fungal biology, providing energy to the cells and regulating diverse processes, such as immune response. In spite of their importance, the mechanisms that shape fungal mitogenomes are still poorly understood. Herein, we investigated the variability and evolution of mitogenomes and their relationship with divergence time using the order Hypocreales as a study model. We sequenced and annotated for the first time the Trichoderma harzianum mitochondrial genome (mtDNA), which was compared with 34 other publicly available mtDNAs. Comparative analysis revealed substantial structural and size variation in non-coding mtDNA regions, despite the conservation of copy number, length, and structure of protein-coding elements. Interestingly, we observed a highly significant correlation between mitogenome length and the number and size of non-coding sequences in the mitochondrial genome. Among the non-coding elements, group I and II introns and homing endonuclease genes (HEGs) were the main contributors to discrepancies in mitogenome structure and length. Several intronic sequences displayed sequence similarity among species, and some of them are conserved even at gene position and were present in the majority of mitogenomes, indicating their origin in a common ancestor. On the other hand, we also identified species-specific introns that suggest origins through different mechanisms. Investigation of mitochondrial gene transfer to the nuclear genome revealed that nuclear copies of nad5 are the most frequent, while atp8, atp9, and cox3 could not be identified in any of the nuclear genomes analyzed. Moreover, we also estimated the divergence time of each species and investigated its relationship with coding and non-coding elements as well as with the length of mitogenomes. Altogether, our results demonstrated that introns and HEGs are key elements in mitogenome shaping and that their presence in fast-evolving mtDNAs could be mostly explained by divergence time, although the intron sharing profile suggests the involvement of other mechanisms in mitochondrial genome evolution, such as horizontal transfer.

flowDiv: a new pipeline for analyzing flow cytometric diversity.

Wanderley, Bruno M S; Araújo, Daniel S A; Quiroga, María V; Amado, André M; Neto, Adrião D D; Sarmento, Hugo; Metz, Sebastián D; Unrein, Fernando.

BMC Bioinformatics ; 20(1): 274, 2019 May 28.

Article in English | MEDLINE | ID: mdl-31138128

ABSTRACT

BACKGROUND: Flow cytometry (FCM) is one of the most commonly used technologies for analysis of numerous biological systems at the cellular level, from cancer cells to microbial communities. Its high potential and wide applicability led to the development of various analytical protocols, which are often not interchangeable between fields of expertise. Environmental science in particular faces difficulty in adapting to non-specific protocols, mainly because of the highly heterogeneous nature of environmental samples. This variety, although it is intrinsic to environmental studies, makes it difficult to adjust analytical protocols to maintain both mathematical formalism and comprehensible biological interpretations, principally for questions that rely on the evaluation of differences between cytograms, an approach also termed cytometric diversity. Despite the availability of promising bioinformatic tools conceived for or adapted to cytometric diversity, most of them still cannot deal with common technical issues such as the integration of differently acquired datasets, the optimal number of bins, and the effective correlation of bins to previously known cytometric populations. RESULTS: To address these and other questions, we have developed flowDiv, an R language pipeline for analysis of environmental flow cytometry data. Here, we present the rationale for flowDiv and apply the method to a real dataset from 31 freshwater lakes in Patagonia, Argentina, to reveal significant aspects of their cytometric diversities. CONCLUSIONS: flowDiv provides a rather intuitive way of proceeding with FCM analysis, as it combines formal mathematical solutions and biological rationales in an intuitive framework specifically designed to explore cytometric diversity.

Subject(s)

Biodiversity , Flow Cytometry/methods , Software , Humans , Lakes , Microbiota , Principal Component Analysis , Statistics, Nonparametric

Clustering cancer gene expression data: a comparative study.

de Souto, Marcilio C P; Costa, Ivan G; de Araujo, Daniel S A; Ludermir, Teresa B; Schliep, Alexander.

BMC Bioinformatics ; 9: 497, 2008 Nov 27.

Article in English | MEDLINE | ID: mdl-19038021

ABSTRACT

BACKGROUND: The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. RESULTS/CONCLUSION: We present the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Our results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods. The data sets analyzed in this study are available at http://algorithmics.molgen.mpg.de/Supplements/CompCancer/.

Subject(s)

Computational Biology/methods , Gene Expression Profiling , Neoplasms/diagnosis , Algorithms , Cluster Analysis , DNA, Complementary/metabolism , Gene Expression Regulation, Neoplastic , Genes, Neoplasm , Humans , Models, Biological , Models, Statistical , Multigene Family , Neoplasms/genetics , Normal Distribution , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated/methods
