Results 1 - 6 of 6
1.
medRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766206

ABSTRACT

Coding de novo mutations (DNMs) contribute to the risk for autism spectrum disorders (ASD), but the contribution of noncoding DNMs remains relatively unexplored. Here we use whole genome sequencing (WGS) data from 12,411 individuals (including 3,508 probands and 2,218 unaffected siblings) in 3,357 families from the Simons Foundation Powering Autism Research for Knowledge (SPARK) cohort to detect DNMs associated with ASD, and we examine the Simons Simplex Collection (SSC), with 6,383 individuals from 2,274 families, to replicate the results. For coding DNMs, SCN2A reached exome-wide significance (p = 2.06×10⁻¹¹) in SPARK. The 618 known dominant ASD genes as a group show stronger enrichment for coding DNMs in cases than in sibling controls (fold change = 1.51, p = 1.13×10⁻⁵ for SPARK; fold change = 1.86, p = 2.06×10⁻⁹ for SSC). For noncoding DNMs, we used two methods to assess statistical significance: a point-based test that analyzes sites with a Combined Annotation Dependent Depletion (CADD) score ≥15, and a segment-based test that analyzes 1 kb genomic segments with segment-specific background mutation rates (inferred from expected rare mutations in Gnocchi genome constraint scores). The point-based test identified SCN2A as marginally significant (p = 6.12×10⁻⁴) in SPARK, whereas the segment-based test identified CSMD1, RBFOX1 and CHD13 as exome-wide significant. We did not identify significant enrichment of noncoding DNMs (in all 1 kb segments or in those with Gnocchi > 4) in the 618 known ASD genes as a group in cases relative to sibling controls. When combining evidence from both coding and noncoding DNMs, we found that SCN2A, with 11 coding and 5 noncoding DNMs, exhibited the strongest significance (p = 4.15×10⁻¹³). In summary, we identified both coding and noncoding DNMs in SCN2A associated with ASD, while nominating additional candidates for further examination in future studies.
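
The segment-based test in the abstract above compares the DNMs observed in a 1 kb segment with a segment-specific background mutation rate. The following is only a minimal sketch of that idea, assuming a one-sided Poisson model; the abstract does not say which distribution the authors use, and the function names, rate, and sample size here are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    The tail is summed directly (rather than as 1 - CDF) so that very
    small p-values are not lost to floating-point cancellation.
    """
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    # Probability mass at k = observed, computed in log space.
    term = math.exp(observed * math.log(expected) - expected
                    - math.lgamma(observed + 1))
    total = 0.0
    k = observed
    while term > total * 1e-16:  # stop once further terms are negligible
        total += term
        k += 1
        term *= expected / k
    return min(1.0, total)


def segment_p_value(n_dnms, rate_per_sample, n_samples):
    """One-sided p-value for seeing n_dnms or more DNMs in one segment.

    rate_per_sample is an assumed per-individual background mutation
    rate for the segment (hypothetical input, e.g. derived from a
    constraint score); the expected count scales with cohort size.
    """
    expected = rate_per_sample * n_samples
    return poisson_upper_tail(n_dnms, expected)


# Illustrative numbers only, not taken from the study.
print(segment_p_value(n_dnms=5, rate_per_sample=1e-5, n_samples=3508))
```

A genome-wide scan of this kind would also need a multiple-testing threshold across all segments, which this sketch leaves out.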

2.
Nat Commun ; 15(1): 1448, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365920

ABSTRACT

Oxford Nanopore sequencing can detect DNA methylation from the ionic current signal of single molecules, offering a unique advantage over conventional methods. Additionally, adaptive sampling, a software-controlled enrichment method for targeted sequencing, allows reduced representation methylation sequencing that can be applied to CpG islands or imprinted regions. Here we present DeepMod2, a comprehensive deep-learning framework for methylation detection using the ionic current signal from Nanopore sequencing. DeepMod2 implements both a bidirectional long short-term memory (BiLSTM) model and a Transformer model and can analyze POD5 and FAST5 signal files generated on R9 and R10 flowcells. Additionally, DeepMod2 can run efficiently on central processing units (CPUs) through model pruning and can infer epihaplotypes or haplotype-specific methylation calls from phased reads. We use multiple publicly available and newly generated datasets to evaluate the performance of DeepMod2 under varying scenarios. DeepMod2 has comparable performance to Guppy and Dorado, the current state-of-the-art, closed-source methods from Oxford Nanopore Technologies. Moreover, we show a high correlation (r = 0.96) between reduced representation and whole-genome Nanopore sequencing. In summary, DeepMod2 is an open-source tool that enables fast and accurate DNA methylation detection from whole-genome or adaptive sequencing data on a diverse range of flowcell types.


Subjects
Deep Learning , Nanopore Sequencing , Nanopores , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , DNA Methylation
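
The abstract of this entry reports a correlation (r = 0.96) between reduced representation and whole-genome Nanopore sequencing. One plausible way to compute such a figure is to aggregate per-read methylation calls into per-site frequencies and correlate the two runs over shared sites. The sketch below assumes that reading; the input tuple layout and the function names are hypothetical and are not DeepMod2's API or output format.

```python
from collections import defaultdict
import math


def site_frequencies(read_calls, min_coverage=1):
    """Aggregate per-read calls into per-site methylation frequencies.

    read_calls: iterable of (chrom, pos, is_methylated) tuples, one per
    read covering a site (an assumed, simplified input layout).
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, total]
    for chrom, pos, is_methylated in read_calls:
        tally = counts[(chrom, pos)]
        tally[0] += int(bool(is_methylated))
        tally[1] += 1
    return {site: m / n for site, (m, n) in counts.items()
            if n >= min_coverage}


def pearson_r(freq_a, freq_b):
    """Pearson correlation of two frequency maps over their shared sites."""
    shared = sorted(set(freq_a) & set(freq_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared sites")
    xs = [freq_a[s] for s in shared]
    ys = [freq_b[s] for s in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In practice a minimum coverage filter (`min_coverage`) matters, because frequencies estimated from one or two reads are very noisy.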
3.
Patterns (N Y) ; 4(12): 100860, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106613

ABSTRACT

Judging whether an integer is divisible by prime numbers such as 2 or 3 may appear trivial to human beings, but it can be less straightforward for computers. Here, we tested multiple deep learning architectures and feature engineering approaches for classifying integers based on their residues when divided by small prime numbers. We found that classification performance depends critically on the feature space. We also evaluated automated machine learning (AutoML) platforms from Amazon, Google, and Microsoft and found that, without appropriately engineered features, they failed on this task. Furthermore, we introduced a method that utilizes linear regression on Fourier series basis vectors and demonstrated its effectiveness. Finally, we evaluated large language models (LLMs) such as GPT-4, GPT-J, LLaMA, and Falcon, and we demonstrated their failures. In conclusion, feature engineering remains an important task to improve performance and increase interpretability of machine learning models, even in the era of AutoML and LLMs.
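
To illustrate why a Fourier basis suits this task, here is a minimal sketch, assuming features of the form cos(2πkn/p) and sin(2πkn/p); the abstract does not give the authors' exact construction, and all names below are hypothetical. The indicator of "n is divisible by p" is an exact linear combination of these features, so a linear fit can represent it. Because the basis is orthogonal when the training integers cover whole periods of p, the least-squares weights reduce to simple projections, which keeps the example free of external libraries.

```python
import math


def fourier_features(n, p):
    """Constant term plus cosine/sine features of n at harmonics of period p."""
    feats = [1.0]
    for k in range(1, p // 2 + 1):
        angle = 2.0 * math.pi * k * n / p
        feats.append(math.cos(angle))
        if 2 * k != p:  # the sine term is identically zero when k = p / 2
            feats.append(math.sin(angle))
    return feats


def fit_by_projection(xs, ys, p):
    """Least-squares weights via per-feature projection.

    This equals ordinary linear regression only because the features are
    mutually orthogonal when xs spans whole periods of p (assumed here).
    """
    rows = [fourier_features(n, p) for n in xs]
    weights = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        weights.append(sum(c * y for c, y in zip(col, ys))
                       / sum(c * c for c in col))
    return weights


def predict_divisible(n, p, weights):
    """Threshold the linear score at 0.5 to classify divisibility by p."""
    score = sum(w * f for w, f in zip(weights, fourier_features(n, p)))
    return score > 0.5


# Train on 30 whole periods of p = 7, then classify unseen integers.
p = 7
train = list(range(30 * p))
labels = [1.0 if n % p == 0 else 0.0 for n in train]
weights = fit_by_projection(train, labels, p)
print([n for n in range(1000, 1030) if predict_divisible(n, p, weights)])
```

The contrast with raw decimal or binary digit features is the point of the sketch: the period p is built into the feature space, so the model needs only a handful of weights.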

4.
Genes (Basel) ; 14(10), 2023 Sep 29.
Article in English | MEDLINE | ID: mdl-37895242

ABSTRACT

Transposable elements, such as Long INterspersed Elements (LINEs), are DNA sequences that can replicate within genomes. LINEs replicate using an RNA intermediate followed by reverse transcription and are typically a few kilobases in length. LINE activity creates genomic structural variants in human populations and leads to somatic alterations in cancer genomes. Long-read RNA sequencing technologies, including Oxford Nanopore and PacBio, can directly sequence relatively long transcripts, thus providing the opportunity to examine full-length LINE transcripts. This study focuses on the development of a new bioinformatics pipeline for the identification and quantification of active, full-length LINE transcripts in diverse human tissues and cell lines. In our pipeline, we used RepeatMasker to identify LINE-1 (L1) transcripts from long-read transcriptome data and incorporated several criteria, such as transcript start position, divergence, and length, to remove likely false positives. Comparisons between cancerous and normal cell lines, as well as human tissue samples, revealed elevated expression levels of young LINEs in cancer, particularly at intact L1 loci. By applying bioinformatics methods to long-read transcriptome data, this study characterizes the landscape of L1 expression in tissues and cell lines.


Subjects
Long Interspersed Nucleotide Elements , Neoplasms , Humans , Long Interspersed Nucleotide Elements/genetics , Cell Line , Transcriptome/genetics , RNA , Neoplasms/genetics
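
The abstract of this entry names three filtering criteria for candidate L1 transcripts: transcript start position, divergence, and length. A minimal sketch of such a filter follows. The record fields, the interpretation of "start position" as the offset into the L1 consensus, and every threshold are placeholder assumptions for illustration, not the values or data structures used by the authors' pipeline or by RepeatMasker.

```python
from dataclasses import dataclass


@dataclass
class L1Hit:
    """One annotated L1 match on a long-read transcript (hypothetical record)."""
    transcript_id: str
    repeat_start: int      # assumed: offset into the L1 consensus where the match begins
    divergence_pct: float  # percent divergence from the consensus
    aligned_length: int    # matched length in bases


def is_candidate_full_length(hit, max_start=100, max_divergence=5.0,
                             min_length=5000):
    """Apply the three criteria; all thresholds are illustrative placeholders."""
    return (hit.repeat_start <= max_start
            and hit.divergence_pct <= max_divergence
            and hit.aligned_length >= min_length)


def filter_hits(hits, **thresholds):
    """Keep only hits that pass every criterion."""
    return [h for h in hits if is_candidate_full_length(h, **thresholds)]


hits = [
    L1Hit("read_a", 10, 1.2, 6000),   # starts near the 5' end, young, long
    L1Hit("read_b", 900, 1.0, 6000),  # starts far into the element
    L1Hit("read_c", 5, 12.0, 6000),   # too diverged
    L1Hit("read_d", 5, 1.0, 800),     # too short
]
print([h.transcript_id for h in filter_hits(hits)])
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the set of retained transcripts is to each criterion.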
5.
Nat Methods ; 20(8): 1143-1158, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37386186

ABSTRACT

As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.


Subjects
Algorithms , Genome , Humans , Sequence Analysis, DNA/methods , Genomic Structural Variation , High-Throughput Nucleotide Sequencing/methods , Genome, Human
6.
Genome Biol ; 22(1): 261, 2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34488830

ABSTRACT

Long-read sequencing enables variant detection in genomic regions that are considered difficult to map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method, NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with the called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.


Subjects
Haplotypes/genetics , High-Throughput Nucleotide Sequencing , INDEL Mutation/genetics , Nanoparticles/chemistry , Neural Networks, Computer , Polymorphism, Single Nucleotide/genetics , Alleles , Base Sequence , Benchmarking , Chromosome Mapping , Genome, Human , Humans , Major Histocompatibility Complex/genetics , Nanopore Sequencing