Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
BMC Genomics ; 16: 143, 2015 Feb 28.
Artigo em Inglês | MEDLINE | ID: mdl-25765891

RESUMO

BACKGROUND: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. RESULTS: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. CONCLUSIONS: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.


Assuntos
Exoma/genética , Mutação INDEL/genética , Mutagênese , Biologia Computacional , Genoma Humano , Sequenciamento de Nucleotídeos em Larga Escala , Projeto Genoma Humano , Humanos , Aprendizado de Máquina
2.
BMC Genomics ; 13 Suppl 6: S19, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-23134663

RESUMO

BACKGROUND: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. RESULTS: We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. CONCLUSIONS: We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.


Assuntos
Genoma Humano , Software , Bases de Dados Genéticas , Humanos , Internet , Interface Usuário-Computador
3.
BMC Bioinformatics ; 13: 8, 2012 Jan 12.
Artigo em Inglês | MEDLINE | ID: mdl-22239737

RESUMO

BACKGROUND: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. RESULTS: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). CONCLUSION: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.


Assuntos
Exoma , Genoma Humano , Software , Genômica , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Mutação INDEL , Fases de Leitura Aberta , Polimorfismo de Nucleotídeo Único
4.
Genome Biol ; 12(9): R84, 2011 Sep 14.
Artigo em Inglês | MEDLINE | ID: mdl-21917140

RESUMO

BACKGROUND: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. RESULTS: The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. CONCLUSIONS: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.


Assuntos
Éxons , Genoma Humano , Polimorfismo de Nucleotídeo Único , Análise de Sequência de DNA/métodos , Algoritmos , Alelos , Sequência de Bases , Frequência do Gene , Genética Populacional , Genótipo , Humanos , Mutação INDEL , Análise de Sequência com Séries de Oligonucleotídeos , Sensibilidade e Especificidade , Alinhamento de Sequência/métodos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...