Search | BVS Regional Portal

A Deep-Learning Approach for Inference of Selective Sweeps from the Ancestral Recombination Graph.

Hejase, Hussein A; Mo, Ziyi; Campagna, Leonardo; Siepel, Adam.

Mol Biol Evol ; 39(1), 2022 01 07.

Article in English | MEDLINE | ID: mdl-34888675

ABSTRACT

Detecting signals of selection from genomic data is a central problem in population genetics. Coupling the rich information in the ancestral recombination graph (ARG) with a powerful and scalable deep-learning framework, we developed a novel method to detect and quantify positive selection: Selection Inference using the Ancestral recombination graph (SIA). Built on a Long Short-Term Memory (LSTM) architecture, a particular type of recurrent neural network (RNN), SIA can be trained to explicitly infer a full range of selection coefficients, as well as the allele frequency trajectory and time of selection onset. We benchmarked SIA extensively on simulations under a European human demographic model, and found that it performs as well as or better than some of the best available methods, including state-of-the-art machine-learning and ARG-based methods. In addition, we used SIA to estimate selection coefficients at several loci associated with human phenotypes of interest. SIA detected novel signals of selection particular to the European (CEU) population at the MC1R and ABCC11 loci. In addition, it recapitulated signals of selection at the LCT locus and several pigmentation-related genes. Finally, we reanalyzed polymorphism data of a collection of recently radiated southern capuchino seedeater taxa in the genus Sporophila to quantify the strength of selection and improved the power of our previous methods to detect partial soft sweeps. Overall, SIA uses deep learning to leverage the ARG and thereby provides new insight into how selective sweeps shape genomic diversity.
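To make the quantities named in this abstract concrete, the sketch below shows how a selection coefficient s shapes an allele frequency trajectory. It is not SIA and uses no ARG or neural network; it assumes a textbook deterministic model of haploid (genic) selection, and the function name and parameter values are hypothetical, chosen only for illustration.

```python
def allele_frequency_trajectory(p0, s, generations):
    """Deterministic frequency of an allele under haploid (genic) selection.

    Assumption: the favored allele has relative fitness 1 + s, so each
    generation p' = p * (1 + s) / (1 + p * s). Drift is ignored.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a frequency in [0, 1]")
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
        trajectory.append(p)
    return trajectory


# Hypothetical values: the same starting frequency under weak and strong selection.
weak = allele_frequency_trajectory(0.01, 0.005, 500)
strong = allele_frequency_trajectory(0.01, 0.05, 500)
neutral = allele_frequency_trajectory(0.3, 0.0, 10)
```

Under this model a tenfold larger s carries the allele close to fixation within the same 500 generations, whereas the weakly selected allele is still at low frequency. This is the kind of difference that a method estimating selection coefficients has to recover from sequence data.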

Subjects

Deep Learning , Genetic Selection , Population Genetics , Genetic Models , Genetic Recombination

Genomic islands of differentiation in a rapid avian radiation have been driven by recent selective sweeps.

Hejase, Hussein A; Salman-Minkov, Ayelet; Campagna, Leonardo; Hubisz, Melissa J; Lovette, Irby J; Gronau, Ilan; Siepel, Adam.

Proc Natl Acad Sci U S A ; 117(48): 30554-30565, 2020 12 01.

Article in English | MEDLINE | ID: mdl-33199636

ABSTRACT

Numerous studies of emerging species have identified genomic "islands" of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward "speciation genes" that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ~50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation.

Subjects

Genomic Islands , Genetic Models , Animals , Biodiversity , Genetic Variation , Machine Learning

Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences.

Wang, Wei; Smith, Jack; Hejase, Hussein A; Liu, Kevin J.

Algorithms Mol Biol ; 15: 7, 2020.

Article in English | MEDLINE | ID: mdl-32322294

ABSTRACT

Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors. To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes "Heads-or-Tails" mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or "SEquential RESampling") method. To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods.
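The contrast drawn in this abstract, between i.i.d. site resampling and resampling that preserves sequential order, can be sketched as follows. The first function is the standard bootstrap over alignment columns. The second is only a simplified illustration of a direction-reversing random walk along site indices; it is not the published SERES procedure, and its name, the `p_reverse` parameter, and the toy alignment are all assumptions made for this example.

```python
import random


def bootstrap_columns(alignment, rng):
    """Standard bootstrap: draw alignment columns i.i.d. with replacement."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(row[i] for i in picks) for row in alignment]


def reflecting_walk(n_sites, length, rng, p_reverse=0.1):
    """Illustrative walk over site indices that keeps neighbouring sites adjacent.

    The walk moves one site per step, reverses direction with probability
    p_reverse (a hypothetical parameter), and always reverses at either end.
    """
    if n_sites < 2:
        raise ValueError("need at least two sites to walk")
    pos = rng.randrange(n_sites)
    step = rng.choice((-1, 1))
    visited = []
    while len(visited) < length:
        visited.append(pos)
        if rng.random() < p_reverse:
            step = -step
        if not 0 <= pos + step < n_sites:
            step = -step  # reflect at the sequence boundary
        pos += step
    return visited


# Toy alignment (made up for illustration).
alignment = ["ACGTACGT", "ACGTTCGT", "AC-TACGA"]
rng = random.Random(0)
replicate = bootstrap_columns(alignment, rng)
walk = reflecting_walk(len(alignment[0]), 20, rng)
walk_replicate = ["".join(row[i] for i in walk) for row in alignment]
```

In the bootstrap replicate, neighbouring columns are unrelated draws, so any dependence between adjacent sites is destroyed. In the walk-based replicate, every pair of consecutive columns was adjacent in the original alignment, which is the property a sequential resampling scheme aims to keep.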

From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection.

Hejase, Hussein A; Dukler, Noah; Siepel, Adam.

Trends Genet ; 36(4): 243-258, 2020 04.

Article in English | MEDLINE | ID: mdl-31954511

ABSTRACT

Methods to detect signals of natural selection from genomic data have traditionally emphasized the use of simple summary statistics. Here, we review a new generation of methods that consider combinations of conventional summary statistics and/or richer features derived from inferred gene trees and ancestral recombination graphs (ARGs). We also review recent advances in methods for population genetic simulation and ARG reconstruction. Finally, we describe opportunities for future work on a variety of related topics, including the genetics of speciation, estimation of selection coefficients, and inference of selection on polygenic traits. Together, these emerging methods offer promising new directions in the study of natural selection.

Subjects

Molecular Evolution , Population Genetics/statistics & numerical data , Genetic Recombination/genetics , Genetic Selection/genetics , Algorithms , Computer Simulation , Genetic Models , Multifactorial Inheritance/genetics , Phylogeny

A scalability study of phylogenetic network inference methods using empirical datasets and simulations involving a single reticulation.

Hejase, Hussein A; Liu, Kevin J.

BMC Bioinformatics ; 17(1): 422, 2016 Oct 13.

Article in English | MEDLINE | ID: mdl-27737628

ABSTRACT

BACKGROUND: Branching events in phylogenetic trees reflect bifurcating and/or multifurcating speciation and splitting events. In the presence of gene flow, a phylogeny cannot be described by a tree but is instead a directed acyclic graph known as a phylogenetic network. Both phylogenetic trees and networks are typically reconstructed using computational analysis of multi-locus sequence data. The advent of high-throughput sequencing technologies has brought about two main scalability challenges: (1) dataset size in terms of the number of taxa and (2) the evolutionary divergence of the taxa in a study. The impact of both dimensions of scale on phylogenetic tree inference has been well characterized by recent studies; in contrast, the scalability limits of phylogenetic network inference methods are largely unknown. RESULTS: In this study, we quantify the performance of state-of-the-art phylogenetic network inference methods on large-scale datasets using empirical data sampled from natural mouse populations and a range of simulations using model phylogenies with a single reticulation. We find that, as in the case of phylogenetic tree inference, the performance of leading network inference methods is negatively impacted by both dimensions of dataset scale. In general, we found that topological accuracy degrades as the number of taxa increases; a similar effect was observed with increased sequence mutation rate. The most accurate methods were probabilistic inference methods which maximize either likelihood under coalescent-based models or pseudo-likelihood approximations to the model likelihood. The improved accuracy obtained with probabilistic inference methods comes at a computational cost in terms of runtime and main memory usage, which become prohibitive as dataset size grows past twenty-five taxa. None of the probabilistic methods completed analyses of datasets with 30 taxa or more after many weeks of CPU runtime. 
CONCLUSIONS: We conclude that the state of the art of phylogenetic network inference lags well behind the scope of current phylogenomic studies. New algorithmic development is critically needed to address this methodological gap.
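The description of a phylogenetic network as a directed acyclic graph can be made concrete with a small data-structure sketch. The code below assumes a network stored as a list of (parent, child) edges, identifies reticulation nodes as those with more than one parent, and checks acyclicity with Kahn's algorithm. The node names and the toy network are hypothetical, and none of this reflects the inference methods evaluated in the study.

```python
from collections import defaultdict


def reticulation_nodes(edges):
    """Return the nodes that have more than one distinct parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be peeled off."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = [n for n in nodes if indegree[n] == 0]
    seen = 0
    while queue:
        node = queue.pop()
        seen += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return seen == len(nodes)


# Toy network with three taxa (A, B, C) and a single reticulation at node "h".
network = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]
tree = [("root", "x"), ("root", "C"), ("x", "A"), ("x", "B")]
```

Here `reticulation_nodes(network)` picks out the one node inherited from two parents, while the plain tree has none. Searching over such structures is much harder than searching over trees, which is consistent with the scalability limits the abstract reports.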

Subjects

Biological Evolution , Computational Biology/methods , Genetic Speciation , Genetic Models , Phylogeny , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Probability

Erratum to: 'Mapping the genomic architecture of adaptive traits with interspecific introgressive origin: a coalescent-based approach'.

Hejase, Hussein A; Liu, Kevin J.

BMC Genomics ; 17: 292, 2016 Apr 18.

Article in English | MEDLINE | ID: mdl-27090376

Mapping the genomic architecture of adaptive traits with interspecific introgressive origin: a coalescent-based approach.

Hejase, Hussein A; Liu, Kevin J.

BMC Genomics ; 17 Suppl 1: 8, 2016 Jan 11.

Article in English | MEDLINE | ID: mdl-26819241

ABSTRACT

Recent studies of eukaryotes including human and Neandertal, mice, and butterflies have highlighted the major role that interspecific introgression has played in adaptive trait evolution. A common question arises in each case: what is the genomic architecture of the introgressed traits? One common approach that can be used to address this question is association mapping, which looks for genotypic markers that have significant statistical association with a trait. It is well understood that sample relatedness can be a confounding factor in association mapping studies if not properly accounted for. Introgression and other evolutionary processes (e.g., incomplete lineage sorting) typically introduce variation among local genealogies, which can also differ from global sample structure measured across all genomic loci. In contrast, state-of-the-art association mapping methods assume fixed sample relatedness across the genome, which can lead to spurious inference. We therefore propose a new association mapping method called Coal-Map, which uses coalescent-based models to capture local genealogical variation alongside global sample structure. Using simulated and empirical data reflecting a range of evolutionary scenarios, we compare the performance of Coal-Map against EIGENSTRAT, a leading association mapping method in terms of its popularity, power, and type I error control. Our empirical data make use of hundreds of mouse genomes for which adaptive interspecific introgression has recently been described. We found that Coal-Map's performance is comparable to or better than that of EIGENSTRAT in terms of statistical power and false positive rate. Coal-Map's performance advantage was greatest on model conditions that most closely resembled empirically observed scenarios of adaptive introgression. These conditions had: (1) causal SNPs contained in one or a few introgressed genomic loci and (2) varying rates of gene flow, from high rates to very low rates where incomplete lineage sorting dominated as a primary cause of local genealogical variation.
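For readers unfamiliar with association mapping, the sketch below shows the simplest form of a single-marker test: the squared correlation between genotype dosage (0, 1, or 2 copies of an allele) and a trait, scaled by sample size and compared against a chi-square distribution with one degree of freedom. This is a naive baseline with no correction for sample relatedness, which is exactly the confounder the abstract warns about. It is neither Coal-Map nor EIGENSTRAT, and the function names and toy data are hypothetical.

```python
import math


def trend_statistic(genotypes, phenotypes):
    """N * r^2 between genotype dosage and phenotype (naive, uncorrected)."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_p = sum((p - mean_p) ** 2 for p in phenotypes)
    if var_g == 0 or var_p == 0:
        return 0.0  # a monomorphic marker or constant trait carries no signal
    return n * cov * cov / (var_g * var_p)


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Toy data (made up): one marker that tracks the trait, one that does not.
dosage = [0, 1, 2, 0, 1, 2]
associated = trend_statistic(dosage, [0, 1, 2, 0, 1, 2])
unassociated = trend_statistic(dosage, [1, 0, 1, 1, 0, 1])
```

Because this statistic treats all samples as unrelated, population structure or locally varying genealogies can inflate it at markers with no causal effect. Methods such as those compared in the abstract adjust for that structure before testing.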

Subjects

Chromosome Mapping , Genome , Algorithms , Animals , Genetic Loci , Genome-Wide Association Study , Genotype , Humans , Mice , Genetic Models , Phenotype , Single Nucleotide Polymorphism
