Results 1 - 16 of 16
1.
ACS Synth Biol ; 10(11): 3190-3199, 2021 11 19.
Article in English | MEDLINE | ID: mdl-34739228

ABSTRACT

Synthetic genetic polymers (xeno-nucleic acids, XNAs) have the potential to transition aptamers from laboratory tools to therapeutic agents, but additional functionality is needed to compete with antibodies. Here, we describe the evolution of a biologically stable artificial genetic system composed of α-l-threofuranosyl nucleic acid (TNA) that facilitates the production of backbone- and base-modified aptamers termed "threomers" that function as high quality protein capture reagents. Threomers were discovered against two prototypical protein targets implicated in human diseases through a combination of in vitro selection and next-generation sequencing using uracil nucleotides that are uniformly equipped with aromatic side chains commonly found in the paratope of antibody-antigen crystal structures. Kinetic measurements reveal that the side chain modifications are critical for generating threomers with slow off-rate binding kinetics. These findings expand the chemical space of evolvable non-natural genetic systems to include functional groups that enhance protein target binding by mimicking the structural properties of traditional antibodies.


Subjects
Aptamers, Nucleotide/chemistry; Nucleic Acids/chemistry; Polymers/chemistry; Tetroses/chemistry; Antibodies/chemistry; Kinetics; Proteins/chemistry
3.
Quant Biol ; 8(1): 64-77, 2020 Mar.
Article in English | MEDLINE | ID: mdl-34084563

ABSTRACT

BACKGROUND: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes including viral genomes without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data. METHODS: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning. RESULTS: Trained based on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences respectively. Enlarging the training data with additional millions of purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found associated with the cancer status, suggesting viruses may play important roles in CRC. CONCLUSIONS: Powered by deep learning and high throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
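
The AUROC figures quoted in this abstract can be reproduced for any scored set of contigs with a few lines of code. The following is a minimal sketch, not DeepVirFinder's own evaluation code: the function name `auroc` and the example scores are hypothetical, and the rank-sum (Mann-Whitney U) identity is simply one standard way to compute the area under the ROC curve.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: classifier outputs, higher meaning "more likely viral".
    labels: 1 for viral, 0 for non-viral (hypothetical encoding).
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Illustrative scores only; these are not data from the paper.
example_scores = [0.9, 0.4, 0.35, 0.1]
example_labels = [1, 0, 1, 0]
print(auroc(example_scores, example_labels))
```

Computing the statistic separately for contigs binned by length (300, 500, 1000, 3000 bp) would give a per-length comparison of the kind the abstract reports.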

4.
Pac Symp Biocomput ; 24: 224-235, 2019.
Article in English | MEDLINE | ID: mdl-30864325

ABSTRACT

Copy number variants (CNVs) are an important type of genetic variation that play a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (< 10kbp) CNVs. Manual curation by experts widely remains the gold standard but cannot scale with the pace of sequencing, particularly in fast-growing clinical applications. We present the first proof-of-principle study demonstrating high throughput manual curation of putative CNVs by non-experts. We developed a crowdsourcing framework, called CrowdVariant, that leverages Google's high-throughput crowdsourcing platform to create a high confidence set of deletions for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle (GIAB) Consortium. We show that non-experts tend to agree both with each other and with experts on putative CNVs. We show that crowdsourced non-expert classifications can be used to accurately assign copy number status to putative CNV calls and identify 1,781 high confidence deletions in a reference sample. Multiple lines of evidence suggest these calls are a substantial improvement over existing CNV callsets and can also be useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology takes the first step toward showing the clinical potential for manual curation of CNVs at scale and can further guide other crowdsourcing genomics applications.


Subjects
Crowdsourcing/methods; DNA Copy Number Variations; Algorithms; Computational Biology/methods; Data Curation; Genome, Human; Genomics/methods; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Sequence Analysis, DNA/statistics & numerical data
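
The abstract does not say how CrowdVariant combines individual non-expert classifications, so the following is only an illustrative sketch of one common approach: a majority vote with a minimum-agreement threshold. The function name, the label strings, and both thresholds are assumptions made for this example.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=3, min_agreement=0.7):
    """Return a consensus label for one putative CNV, or None if unresolved.

    votes: list of labels from different annotators (hypothetical labels
           such as "deletion" / "no_deletion").
    min_votes, min_agreement: illustrative thresholds, not from the paper.
    """
    if len(votes) < min_votes:
        return None  # too few classifications to trust
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # annotators disagree too much


print(aggregate_votes(["deletion", "deletion", "deletion", "no_deletion"]))
print(aggregate_votes(["deletion", "no_deletion", "deletion", "no_deletion"]))
```

Sites that return None under such a rule would be natural candidates for expert review rather than automatic inclusion in a high-confidence set.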
5.
Bioinformatics ; 35(21): 4389-4391, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30916319

ABSTRACT

SUMMARY: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. AVAILABILITY AND IMPLEMENTATION: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Software; Genome, Human; Humans
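
To make the idea of conservative coordinate transformation concrete, here is a minimal sketch of lifting a region between assemblies. It is not GenomeWarp's algorithm or API (GenomeWarp is written in Java and also handles variant representation): the `Block` type, `lift_region`, and the example blocks are hypothetical, and chromosome names and strand are ignored for brevity.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical type)."""
    src_start: int  # 0-based, inclusive, on the old assembly
    src_end: int    # 0-based, exclusive, on the old assembly
    dst_start: int  # start of the matching block on the new assembly


def lift_region(start: int, end: int,
                blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Map the half-open region [start, end) onto the new assembly.

    Conservative rule: a region is converted only if it lies entirely
    inside a single ungapped block; otherwise None is returned.
    """
    for block in blocks:
        if block.src_start <= start and end <= block.src_end:
            offset = block.dst_start - block.src_start
            return (start + offset, end + offset)
    return None


# Illustrative alignment: two blocks separated by an unaligned gap.
example_blocks = [Block(0, 100, 10), Block(150, 300, 110)]
print(lift_region(20, 30, example_blocks))   # inside the first block
print(lift_region(90, 160, example_blocks))  # spans the gap, so dropped
```

Refusing to convert regions that span a gap trades a small loss of coverage for fewer spurious calls in the target assembly, which matches the conservative stance the abstract describes.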
6.
Nat Biotechnol ; 36(10): 983-987, 2018 11.
Article in English | MEDLINE | ID: mdl-30247488

ABSTRACT

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.


Subjects
Genome, Human; Mammals/genetics; Neural Networks, Computer; Polymorphism, Single Nucleotide; Animals; DNA Mutational Analysis; Genomics; Genotype; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation; Sequence Analysis, DNA; Software
7.
Invest Ophthalmol Vis Sci ; 59(7): 2861-2868, 2018 06 01.
Article in English | MEDLINE | ID: mdl-30025129

ABSTRACT

Purpose: We evaluate how deep learning can be applied to extract novel information such as refractive error from retinal fundus imaging. Methods: Retinal fundus images used in this study were 45- and 30-degree field of view images from the UK Biobank and Age-Related Eye Disease Study (AREDS) clinical trials, respectively. Refractive error was measured by autorefraction in UK Biobank and subjective refraction in AREDS. We trained a deep learning algorithm to predict refractive error from a total of 226,870 images and validated it on 24,007 UK Biobank and 15,750 AREDS images. Our model used the "attention" method to identify features that are correlated with refractive error. Results: The resulting algorithm had a mean absolute error (MAE) of 0.56 diopters (95% confidence interval [CI]: 0.55-0.56) for estimating spherical equivalent on the UK Biobank data set and 0.91 diopters (95% CI: 0.89-0.93) for the AREDS data set. The baseline expected MAE (obtained by simply predicting the mean of this population) was 1.81 diopters (95% CI: 1.79-1.84) for UK Biobank and 1.63 (95% CI: 1.60-1.67) for AREDS. Attention maps suggested that the foveal region was one of the most important areas used by the algorithm to make this prediction, though other regions also contribute to the prediction. Conclusions: To our knowledge, the ability to estimate refractive error with high accuracy from retinal fundus photos has not been previously known and demonstrates that deep learning can be applied to make novel predictions from medical images.


Subjects
Deep Learning; Fundus Oculi; Refractive Errors/diagnosis; Retina/diagnostic imaging; Adult; Aged; Algorithms; Datasets as Topic; Female; Humans; Male; Middle Aged; Refraction, Ocular; Vision Tests; Visual Fields/physiology
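
The abstract compares the model's mean absolute error (MAE) with a baseline obtained by always predicting the population mean. A minimal sketch of that comparison follows; the function names and the example values (in diopters) are hypothetical and are not data from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def baseline_mae(y_true):
    """MAE of a trivial predictor that always outputs the population mean."""
    mean = sum(y_true) / len(y_true)
    return mean_absolute_error(y_true, [mean] * len(y_true))


# Illustrative spherical-equivalent values in diopters (made up).
measured = [-2.0, 0.0, 1.0, 3.0]
predicted = [-1.5, 0.5, 0.5, 2.5]
print(mean_absolute_error(measured, predicted))
print(baseline_mae(measured))
```

A model is only informative to the extent that its MAE falls below this baseline, which is why the abstract reports both numbers for each data set.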
8.
Cell ; 173(3): 792-803.e19, 2018 04 19.
Article in English | MEDLINE | ID: mdl-29656897

ABSTRACT

Microscopy is a central method in life sciences. Many popular methods, such as antibody labeling, are used to add physical fluorescent labels to specific cellular constituents. However, these approaches have significant drawbacks, including inconsistency; limitations in the number of simultaneous labels because of spectral overlap; and necessary perturbations of the experiment, such as fixing the cells, to generate the measurement. Here, we show that a computational machine-learning approach, which we call "in silico labeling" (ISL), reliably predicts some fluorescent labels from transmitted-light images of unlabeled fixed or live biological samples. ISL predicts a range of labels, such as those for nuclei, cell type (e.g., neural), and cell state (e.g., cell death). Because prediction happens in silico, the method is consistent, is not limited by spectral overlap, and does not disturb the experiment. ISL generates biological measurements that would otherwise be problematic or impossible to acquire.


Subjects
Fluorescent Dyes/chemistry; Image Processing, Computer-Assisted/methods; Microscopy, Fluorescence/methods; Motor Neurons/cytology; Algorithms; Animals; Cell Line, Tumor; Cell Survival; Cerebral Cortex/cytology; Humans; Induced Pluripotent Stem Cells/cytology; Machine Learning; Neural Networks, Computer; Neurosciences; Rats; Software; Stem Cells/cytology
9.
Nat Biomed Eng ; 2(3): 158-164, 2018 03.
Article in English | MEDLINE | ID: mdl-31015713

ABSTRACT

Traditionally, medical discoveries are made by observing associations, making hypotheses from them and then designing and running experiments to test the hypotheses. However, with medical images, observing and quantifying associations can often be difficult because of the wide variety of features, patterns, colours, values and shapes that are present in real data. Here, we show that deep learning can extract new knowledge from retinal fundus images. Using deep-learning models trained on data from 284,335 patients and validated on two independent datasets of 12,026 and 999 patients, we predicted cardiovascular risk factors not previously thought to be present or quantifiable in retinal images, such as age (mean absolute error within 3.26 years), gender (area under the receiver operating characteristic curve (AUC) = 0.97), smoking status (AUC = 0.71), systolic blood pressure (mean absolute error within 11.23 mmHg) and major adverse cardiac events (AUC = 0.70). We also show that the trained deep-learning models used anatomical features, such as the optic disc or blood vessels, to generate each prediction.


Subjects
Cardiovascular Diseases; Deep Learning; Image Interpretation, Computer-Assisted/methods; Retina/diagnostic imaging; Aged; Aged, 80 and over; Algorithms; Cardiovascular Diseases/diagnostic imaging; Cardiovascular Diseases/epidemiology; Female; Fundus Oculi; Humans; Male; Middle Aged; Risk Factors
10.
Nature ; 536(7616): 285-91, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subjects
Exome/genetics; Genetic Variation/genetics; DNA Mutational Analysis; Datasets as Topic; Humans; Phenotype; Proteome/genetics; Rare Diseases/genetics; Sample Size
11.
Science ; 348(6235): 666-9, 2015 May 08.
Article in English | MEDLINE | ID: mdl-25954003

ABSTRACT

Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.


Subjects
Gene Expression Regulation; Genetic Variation; Genome, Human/genetics; Proteins/genetics; Transcriptome; Alternative Splicing; Gene Expression Profiling; Gene Silencing; Heterozygote; Humans; Nonsense Mediated mRNA Decay; Phenotype
12.
BMC Genomics ; 16: 143, 2015 Feb 28.
Article in English | MEDLINE | ID: mdl-25765891

ABSTRACT

BACKGROUND: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. RESULTS: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. CONCLUSIONS: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.


Subjects
Exome/genetics; INDEL Mutation/genetics; Mutagenesis; Computational Biology; Genome, Human; High-Throughput Nucleotide Sequencing; Human Genome Project; Humans; Machine Learning
13.
PLoS Genet ; 9(4): e1003443, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23593035

ABSTRACT

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.


Subjects
Child Development Disorders, Pervasive/genetics; Exome; Genome-Wide Association Study; Case-Control Studies; Child; Child Development Disorders, Pervasive/physiopathology; Genetic Predisposition to Disease; Genetic Variation; Humans; Population Control; Sequence Analysis, DNA; Software
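
The abstract harmonizes call sets from two sequencing centers by filtering on fraction of missing data, read depth, and the balance of alternative to reference reads. The sketch below shows what such a filter can look like. The `VariantCall` type, the function name, and every threshold are assumptions for illustration; the abstract does not state the cut-offs the authors used.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Per-site summary for one heterozygous call (hypothetical structure)."""
    missing_fraction: float  # fraction of samples with no genotype at the site
    depth: int               # total read depth at the site
    alt_reads: int           # reads supporting the alternative allele
    ref_reads: int           # reads supporting the reference allele


def passes_filters(call, max_missing=0.1, min_depth=10,
                   balance_range=(0.3, 0.7)):
    """Apply the three filters named in the abstract.

    All thresholds are illustrative defaults, not values from the paper.
    The allele-balance window assumes a heterozygous genotype.
    """
    informative = call.alt_reads + call.ref_reads
    if informative == 0:
        return False  # no reads to compute allele balance from
    balance = call.alt_reads / informative
    low, high = balance_range
    return (call.missing_fraction <= max_missing
            and call.depth >= min_depth
            and low <= balance <= high)


print(passes_filters(VariantCall(0.02, 30, 14, 16)))  # balanced, well covered
print(passes_filters(VariantCall(0.02, 30, 2, 28)))   # skewed allele balance
```

Applying identical thresholds to both centers' calls is what makes the resulting rare-variant distributions comparable before any association test is run.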
14.
Curr Protoc Bioinformatics ; 43: 11.10.1-11.10.33, 2013.
Article in English | MEDLINE | ID: mdl-25431634

ABSTRACT

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.


Subjects
Genetic Variation; Genome, Human; Software; Calibration; Databases, Genetic; Haploidy; Haplotypes/genetics; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide/genetics; Sequence Alignment
15.
Nature ; 485(7397): 242-5, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Subjects
Autistic Disorder/genetics; DNA-Binding Proteins/genetics; Exons/genetics; Genetic Predisposition to Disease/genetics; Mutation/genetics; Transcription Factors/genetics; Case-Control Studies; Exome/genetics; Family Health; Humans; Models, Genetic; Multifactorial Inheritance/genetics; Phenotype; Poisson Distribution; Protein Interaction Maps
16.
Nat Genet ; 43(5): 491-8, 2011 May.
Article in English | MEDLINE | ID: mdl-21478889

ABSTRACT

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.


Subjects
Genetic Variation; Genotype; Sequence Analysis, DNA/methods; Data Interpretation, Statistical; Databases, Nucleic Acid; Exons; Genetics, Population/methods; Genetics, Population/statistics & numerical data; Genome, Human; Humans; Polymorphism, Single Nucleotide; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Sequence Analysis, DNA/statistics & numerical data; Software