1.
Am J Hum Genet ; 110(2): 314-325, 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36610401

ABSTRACT

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
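
To make the idea of an ancestry-informative marker concrete, the following minimal sketch ranks SNPs by the largest difference in allele frequency between labeled populations. Note that this is the supervised style of selection the abstract contrasts itself with, not the unsupervised method of the paper; the function names, the toy genotypes, and the use of a simple frequency difference as the score are all assumptions made for illustration.

```python
from itertools import combinations


def allele_frequencies(genotypes, labels):
    """Per-population frequency of the counted allele at each SNP.

    genotypes: list of per-individual lists of allele counts in {0, 1, 2}
    labels:    population label for each individual (hypothetical input;
               needing these labels is the limitation the abstract describes)
    """
    n_snps = len(genotypes[0])
    sums, counts = {}, {}
    for row, pop in zip(genotypes, labels):
        totals = sums.setdefault(pop, [0] * n_snps)
        for j, g in enumerate(row):
            totals[j] += g
        counts[pop] = counts.get(pop, 0) + 1
    # Each diploid individual contributes two alleles per SNP.
    return {pop: [s / (2 * counts[pop]) for s in sums[pop]] for pop in sums}


def rank_aims(genotypes, labels, top=2):
    """Rank SNPs by the largest pairwise frequency difference (delta)."""
    freqs = allele_frequencies(genotypes, labels)
    n_snps = len(genotypes[0])
    delta = [
        max(abs(freqs[a][j] - freqs[b][j]) for a, b in combinations(freqs, 2))
        for j in range(n_snps)
    ]
    order = sorted(range(n_snps), key=lambda j: delta[j], reverse=True)
    return order[:top], delta


genotypes = [
    [2, 1, 0],  # population A
    [2, 1, 0],  # population A
    [0, 1, 0],  # population B
    [0, 1, 1],  # population B
]
labels = ["A", "A", "B", "B"]
top_snps, delta = rank_aims(genotypes, labels)
```

In this toy input the first SNP is fixed for different alleles in the two groups, so it ranks first, while the second SNP has the same frequency in both groups and carries no ancestry information.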


Subjects
Biological Specimen Banks; Genome-Wide Association Study; Humans; Genome-Wide Association Study/methods; Likelihood Functions; Population Groups; Software; Genetics, Population
2.
BMC Bioinformatics ; 17: 218, 2016 May 23.
Article in English | MEDLINE | ID: mdl-27216439

ABSTRACT

BACKGROUND: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. RESULTS: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension. CONCLUSIONS: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
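
The first improvement, inferring ancestry for new individuals from reference cluster allele frequencies, can be illustrated with a small sketch. It assumes the usual binomial admixture model and uses a plain EM update for one individual's ancestry fractions while the frequencies stay fixed; this is not ADMIXTURE's own optimizer, and the function name, the `ploidy` parameter, and the toy frequencies are hypothetical. Setting `ploidy=1` is only an assumed way to treat haploid data, not the program's documented behavior.

```python
def project_ancestry(genotype, freqs, ploidy=2, iterations=200):
    """Estimate one individual's ancestry fractions with frequencies fixed.

    genotype: allele counts per SNP, each in 0..ploidy
    freqs:    freqs[k][j] = frequency of the counted allele at SNP j in
              reference cluster k, assumed strictly between 0 and 1
    ploidy:   2 for autosomes; 1 is an assumed way to treat haploid data
    """
    n_clusters = len(freqs)
    n_snps = len(genotype)
    q = [1.0 / n_clusters] * n_clusters  # start from equal fractions
    for _ in range(iterations):
        new_q = [0.0] * n_clusters
        for j, g in enumerate(genotype):
            # Mixture probability of the counted allele at this SNP.
            p = sum(q[k] * freqs[k][j] for k in range(n_clusters))
            for k in range(n_clusters):
                # Share of each allele copy attributed to cluster k.
                a = q[k] * freqs[k][j] / p
                b = q[k] * (1.0 - freqs[k][j]) / (1.0 - p)
                new_q[k] += g * a + (ploidy - g) * b
        q = [v / (ploidy * n_snps) for v in new_q]
    return q


# Two hypothetical reference clusters with opposite allele frequencies.
reference = [[0.9, 0.9, 0.9, 0.9], [0.1, 0.1, 0.1, 0.1]]
q_first = project_ancestry([2, 2, 2, 2], reference)
q_mixed = project_ancestry([1, 1, 1, 1], reference)

# Haploid example: one allele copy per SNP.
haploid_reference = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
q_haploid = project_ancestry([1, 1, 0, 0], haploid_reference, ploidy=1)
```

Because the reference frequencies are never updated, each new individual can be processed independently, which is one plausible reason such a mode scales well to large cohorts.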


Subjects
Genetics, Population; Genomics/methods; Software; Black or African American/genetics; Female; Gene Frequency; HapMap Project; Humans; Male; Southwestern United States
3.
Nat Methods ; 10(6): 563-9, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23644548

ABSTRACT

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
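
The step of using the longest reads as seeds can be sketched as a simple split of the read set. The abstract does not say how many reads serve as seeds, so this sketch assumes a coverage target: `genome_size` and `seed_coverage` are hypothetical tuning inputs, and the function is an illustration rather than HGAP's actual selection rule.

```python
def split_seed_reads(read_lengths, genome_size, seed_coverage):
    """Split reads into seeds (longest first) and the remaining reads.

    genome_size and seed_coverage are hypothetical tuning inputs: seeds are
    taken until their total length reaches seed_coverage * genome_size.
    """
    target_bases = seed_coverage * genome_size
    ordered = sorted(read_lengths, reverse=True)
    seeds, total = [], 0
    for length in ordered:
        if total >= target_bases:
            break
        seeds.append(length)
        total += length
    # Everything not chosen as a seed would be recruited to correct the seeds.
    return seeds, ordered[len(seeds):]


seeds, others = split_seed_reads([100, 900, 500, 300], genome_size=1000,
                                 seed_coverage=1.2)
```

Here the two longest reads are enough to exceed the assumed target, so the two shorter reads are left for the recruitment step.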


Subjects
Genome, Bacterial; Sequence Analysis, DNA/methods; Chromosomes, Artificial, Bacterial; Escherichia coli/genetics; Gene Library; Humans; Repetitive Sequences, Nucleic Acid
4.
Genet Epidemiol ; 35(7): 722-8, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22009793

ABSTRACT

This article applies the recently proposed "stability selection" procedure of Meinshausen and Bühlmann to the problem of variable selection in genome-wide association. In particular, it explores whether stability selection can identify new regions of interest originally missed or can call into legitimate question regions originally flagged. Our analysis of the seven data sets of the Wellcome Trust Case-Control Consortium suggests that stability selection effectively controls the family-wise error rate but suffers a loss of power. The extensive correlation structure among SNP markers induced by linkage disequilibrium renders the procedure too conservative, causing it to miss regions known to be highly significant from simple marginal analyses. As a remedy one can aggregate nearby SNPs into groups and select groups rather than individual SNPs. The modified procedure can accurately identify the most important regions of genome-wide association, but in a simulation study it still offers less power than simpler and less computationally intensive methods of marginal association testing.
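
The core loop of stability selection can be sketched in a few lines: repeat a variable-selection step on random half-samples and keep the variables chosen in a large fraction of them. In this sketch a simple marginal-correlation ranking stands in for whatever base selector is actually used, and the parameter names (`k`, `n_subsamples`, `threshold`) and the toy data are assumptions for illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_k_marginal(X, y, k):
    """Stand-in base selector: the k predictors most correlated with y."""
    n_vars = len(X[0])
    score = [abs(correlation([row[j] for row in X], y)) for j in range(n_vars)]
    return sorted(range(n_vars), key=lambda j: score[j], reverse=True)[:k]


def stability_selection(X, y, k=1, n_subsamples=100, threshold=0.9, seed=0):
    """Selection frequency of each predictor over random half-samples."""
    rng = random.Random(seed)
    n, n_vars = len(X), len(X[0])
    hits = [0] * n_vars
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        chosen = top_k_marginal([X[i] for i in idx], [y[i] for i in idx], k)
        for j in chosen:
            hits[j] += 1
    freq = [h / n_subsamples for h in hits]
    stable = [j for j in range(n_vars) if freq[j] >= threshold]
    return freq, stable


# Toy data: SNP 0 drives the trait exactly; SNPs 1-3 are random noise.
rng = random.Random(1)
X = [[i % 3] + [rng.choice([0, 1, 2]) for _ in range(3)] for i in range(40)]
y = [2.0 * row[0] + 1.0 for row in X]
freq, stable = stability_selection(X, y)
```

The conservativeness the abstract describes is easy to picture with this loop: when several correlated SNPs tag the same region, they split the selections between them across subsamples, so none may reach the threshold on its own. Counting hits per group of nearby SNPs, as the abstract suggests, would pool those selections.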


Subjects
Genome-Wide Association Study; Models, Genetic; Case-Control Studies; Computer Simulation; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide
5.
BMC Bioinformatics ; 12: 246, 2011 Jun 18.
Article in English | MEDLINE | ID: mdl-21682921

ABSTRACT

BACKGROUND: The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal. RESULTS: Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly. CONCLUSIONS: The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.


Subjects
Algorithms; Genetics, Population; Polymorphism, Single Nucleotide; Artificial Intelligence; Gene Frequency; Genome, Human; Humans; Likelihood Functions; Population Groups; Software
6.
Genome Res ; 19(9): 1655-64, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19648217

ABSTRACT

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
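
As a rough illustration of model-based ancestry estimation, the sketch below writes out a binomial admixture log-likelihood and one plain EM update of the ancestry fractions and allele frequencies. It assumes the usual form of the model, in which an individual's allele probability at a SNP is the ancestry-weighted average of the population frequencies. Plain EM is the kind of baseline the abstract attributes to FRAPPE; it is not ADMIXTURE's block relaxation scheme with sequential quadratic programming and quasi-Newton acceleration. All names, the clamping constant `eps`, and the toy data are hypothetical.

```python
import math


def mix(q_i, F, j):
    """Probability of the counted allele at SNP j for one individual."""
    return sum(q_i[k] * F[k][j] for k in range(len(F)))


def log_likelihood(G, Q, F):
    """Binomial log-likelihood of genotypes G (allele counts in {0, 1, 2})."""
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = mix(Q[i], F, j)
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_step(G, Q, F, eps=1e-6):
    """One EM update of ancestry fractions Q and allele frequencies F."""
    n, m, K = len(G), len(G[0]), len(F)
    q_acc = [[0.0] * K for _ in range(n)]
    f_num = [[0.0] * m for _ in range(K)]
    f_den = [[0.0] * m for _ in range(K)]
    for i in range(n):
        for j in range(m):
            g, p = G[i][j], mix(Q[i], F, j)
            for k in range(K):
                # Expected allele copies of each type drawn from population k.
                a = g * Q[i][k] * F[k][j] / p
                b = (2 - g) * Q[i][k] * (1.0 - F[k][j]) / (1.0 - p)
                q_acc[i][k] += a + b
                f_num[k][j] += a
                f_den[k][j] += a + b
    new_Q = [[v / (2 * m) for v in row] for row in q_acc]
    new_F = []
    for k in range(K):
        row = []
        for j in range(m):
            # Keep the old value if no allele copies were assigned to k.
            f = f_num[k][j] / f_den[k][j] if f_den[k][j] > 0 else F[k][j]
            # Clamp away from 0 and 1 so the logarithms stay finite.
            row.append(min(1.0 - eps, max(eps, f)))
        new_F.append(row)
    return new_Q, new_F


# Toy genotypes: two individuals per apparent source, plus one intermediate.
G = [[2, 2, 0, 0], [2, 1, 0, 0], [0, 0, 2, 2], [0, 1, 2, 2], [1, 1, 1, 1]]
Q = [[0.6, 0.4] for _ in G]
F = [[0.7, 0.6, 0.4, 0.3], [0.3, 0.4, 0.6, 0.7]]
history = [log_likelihood(G, Q, F)]
for _ in range(25):
    Q, F = em_step(G, Q, F)
    history.append(log_likelihood(G, Q, F))
```

Each EM step cannot decrease the log-likelihood, which makes the method stable but often slow to converge; that slowness is the motivation for the faster update scheme the abstract describes.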


Subjects
Algorithms; Genetics, Population; Software; Computational Biology; Europe/ethnology; Gene Frequency; Genetic Association Studies; Genotype; Humans; Inflammatory Bowel Diseases/ethnology; Inflammatory Bowel Diseases/genetics; Jews/ethnology; Likelihood Functions; Models, Genetic; Polymorphism, Single Nucleotide; Time Factors