Results 1 - 20 of 134
1.
Nat Commun ; 15(1): 5267, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902246

ABSTRACT

During the early stages of the SARS-CoV-2 pandemic, before vaccines were available, nonpharmaceutical interventions (NPIs) such as reducing contacts or antigenic testing were used to control viral spread. Quantifying their success is therefore key for future pandemic preparedness. Using 1.8 million SARS-CoV-2 genomes from systematic surveillance, we study viral lineage importations into Germany for the third pandemic wave from late 2020 to early 2021, using large-scale Bayesian phylogenetic and phylogeographic analysis with a longitudinal assessment of lineage importation dynamics over multiple sampling strategies. All major nationwide NPIs were followed by fewer importations, with the strongest decreases seen for free rapid tests, the strengthening of regulations on mask-wearing in public transport and stores, as well as on internal movements and gatherings. Most SARS-CoV-2 lineages first appeared in the three most populous states with most cases, and spread from there within the country. Importations rose before and peaked shortly after the Christmas holidays. The substantial effects of free rapid tests and obligatory medical/surgical mask-wearing suggest these as key for pandemic preparedness, given their relatively few negative socioeconomic effects. The approach relates environmental factors at the host population level to viral lineage dissemination, facilitating similar analyses of rapidly evolving pathogens in the future.


Subjects
COVID-19, Phylogeny, Phylogeography, SARS-CoV-2, Humans, COVID-19/epidemiology, COVID-19/virology, COVID-19/prevention & control, COVID-19/transmission, SARS-CoV-2/genetics, SARS-CoV-2/classification, Germany/epidemiology, Bayes Theorem, Viral Genome/genetics, Pandemics/prevention & control
2.
Commun Biol ; 7(1): 516, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38693292

ABSTRACT

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.


Subjects
Deep Learning, Genomics, Genomics/methods, Computational Biology/methods, Humans, Computer Neural Networks
4.
Methods Mol Biol ; 2802: 587-609, 2024.
Article in English | MEDLINE | ID: mdl-38819573

ABSTRACT

Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.


Subjects
Genomics, Metagenome, Metagenomics, Metagenomics/methods, Metagenomics/standards, Genomics/methods, Genomics/standards, Metagenome/genetics, Genetic Databases, Soil Microbiology
5.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706320

ABSTRACT

The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.


Subjects
Anti-Bacterial Agents, Phenotype, Anti-Bacterial Agents/pharmacology, Machine Learning, Bacterial Drug Resistance/genetics, Computational Biology/methods, Bacterial Genome, Microbial Genome, Humans, Bacteria/genetics, Bacteria/drug effects
7.
Nat Microbiol ; 8(11): 1960-1970, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37783751

ABSTRACT

Microbiome data, metadata and analytical workflows have become 'big' in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires substantial effort, with sometimes little obvious reward. Gaps remain where microbiome-specific resources for data sharing or reproducibility do not yet exist. We outline available best practices, challenges to their adoption and opportunities in data sharing in microbiome research. We showcase examples of best practices and advocate for their enforcement and incentivization for data sharing. This includes recognition of data curation and sharing endeavours by individuals, institutions, journals and funders. Opportunities for progress include enabling microbiome-specific databases to incorporate future methods for data analysis, integration and reuse.


Subjects
Microbiota, Technology, Humans, Reproducibility of Results, Information Dissemination, Factual Databases
8.
Commun Biol ; 6(1): 928, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37696966

ABSTRACT

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
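
The reverse-complement idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not code from Self-GenomeNet; the function names `reverse_complement` and `pair_with_reverse_complement` are hypothetical and only show how a DNA sequence and its reverse complement might be paired as two views of the same genomic region.

```python
# Illustrative sketch only; names are hypothetical, not Self-GenomeNet's API.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_with_reverse_complement(seqs):
    """Pair every sequence with its reverse complement.

    Both strands describe the same region, so a self-supervised model
    could treat the two members of a pair as related views.
    """
    return [(s, reverse_complement(s)) for s in seqs]


if __name__ == "__main__":
    for forward, reverse in pair_with_reverse_complement(["ATGC", "AACG"]):
        print(forward, reverse)
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check for any strand-aware preprocessing step.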


Subjects
Deep Learning, Genomics, Computational Biology, Machine Learning
9.
Cell Rep Med ; 4(9): 101152, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37572667

ABSTRACT

Male sex represents one of the major risk factors for severe COVID-19 outcome. However, underlying mechanisms that mediate sex-dependent disease outcome are as yet unknown. Here, we identify the CYP19A1 gene encoding for the testosterone-to-estradiol metabolizing enzyme CYP19A1 (also known as aromatase) as a host factor that contributes to worsened disease outcome in SARS-CoV-2-infected males. We analyzed exome sequencing data obtained from a human COVID-19 cohort (n = 2,866) using a machine-learning approach and identify a CYP19A1-activity-increasing mutation to be associated with the development of severe disease in men but not women. We further analyzed human autopsy-derived lungs (n = 86) and detect increased pulmonary CYP19A1 expression at the time point of death in men compared with women. In the golden hamster model, we show that SARS-CoV-2 infection causes increased CYP19A1 expression in the lung that is associated with dysregulated plasma sex hormone levels and reduced long-term pulmonary function in males but not females. Treatment of SARS-CoV-2-infected hamsters with a clinically approved CYP19A1 inhibitor (letrozole) improves impaired lung function and supports recovery of imbalanced sex hormones specifically in males. Our study identifies CYP19A1 as a contributor to sex-specific SARS-CoV-2 disease outcome in males. Furthermore, inhibition of CYP19A1 by the clinically approved drug letrozole may furnish a new therapeutic strategy for individualized patient management and treatment.


Subjects
Aromatase, COVID-19, Female, Humans, Male, Aromatase/genetics, Letrozole, SARS-CoV-2, COVID-19/genetics, Estradiol, Testosterone
10.
Cell Host Microbe ; 31(6): 1007-1020.e4, 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37279755

ABSTRACT

Bacteria can evolve to withstand a wide range of antibiotics (ABs) by using various resistance mechanisms. How ABs affect the ecology of the gut microbiome is still poorly understood. We investigated strain-specific responses and evolution during repeated AB perturbations by three clinically relevant ABs, using gnotobiotic mice colonized with a synthetic bacterial community (oligo-mouse-microbiota). Over 80 days, we observed resilience effects at the strain and community levels, and we found that they were correlated with modulations of the estimated growth rate and levels of prophage induction as determined from metagenomics data. Moreover, we tracked mutational changes in the bacterial populations, and this uncovered clonal expansion and contraction of haplotypes and selection of putative AB resistance-conferring SNPs. We functionally verified these mutations via reisolation of clones with increased minimum inhibitory concentration (MIC) of ciprofloxacin and tetracycline from evolved communities. This demonstrates that host-associated microbial communities employ various mechanisms to respond to selective pressures that maintain community stability.


Subjects
Gastrointestinal Microbiome, Microbiota, Animals, Mice, Anti-Bacterial Agents/pharmacology, Bacteria/genetics, Germ-Free Life
11.
Bioinformatics ; 39(2), 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36786404

ABSTRACT

MOTIVATION: Gene annotation is the problem of mapping proteins to their functions represented as Gene Ontology (GO) terms, typically inferred based on the primary sequences. Gene annotation is a multi-label multi-class classification problem, which has generated growing interest for its uses in the characterization of millions of proteins with unknown functions. However, there is no standard GO dataset used for benchmarking newly developed machine learning models within the bioinformatics community. Thus, the significance of improvements for these models remains unclear. RESULTS: The Gene Benchmarking database is the first effort to provide an easy-to-use and configurable hub for the learning and evaluation of gene annotation models. It provides easy access to pre-specified datasets and takes the non-trivial steps of preprocessing and filtering all data according to custom presets using a web interface. The GO bench web application can also be used to evaluate and display any trained model on leaderboards for annotation tasks. AVAILABILITY AND IMPLEMENTATION: The GO Benchmarking dataset is freely available at www.gobench.org. Code is hosted at github.com/mofradlab, with repositories for website code, core utilities and examples of usage (Supplementary Section S.7). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Benchmarking, Software, Molecular Sequence Annotation, Gene Ontology, Machine Learning, Proteins/metabolism
12.
Nature ; 613(7945): 639-649, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36697862

ABSTRACT

Whether the human fetus and the prenatal intrauterine environment (amniotic fluid and placenta) are stably colonized by microbial communities in a healthy pregnancy remains a subject of debate. Here we evaluate recent studies that characterized microbial populations in human fetuses from the perspectives of reproductive biology, microbial ecology, bioinformatics, immunology, clinical microbiology and gnotobiology, and assess possible mechanisms by which the fetus might interact with microorganisms. Our analysis indicates that the detected microbial signals are likely the result of contamination during the clinical procedures to obtain fetal samples or during DNA extraction and DNA sequencing. Furthermore, the existence of live and replicating microbial populations in healthy fetal tissues is not compatible with fundamental concepts of immunology, clinical microbiology and the derivation of germ-free mammals. These conclusions are important to our understanding of human immune development and illustrate common pitfalls in the microbial analyses of many other low-biomass environments. The pursuit of a fetal microbiome serves as a cautionary example of the challenges of sequence-based microbiome studies when biomass is low or absent, and emphasizes the need for a trans-disciplinary approach that goes beyond contamination controls by also incorporating biological, ecological and mechanistic concepts.


Subjects
Biomass, DNA Contamination, Fetus, Microbiota, Animals, Female, Humans, Pregnancy, Amniotic Fluid/immunology, Amniotic Fluid/microbiology, Mammals, Microbiota/genetics, Placenta/immunology, Placenta/microbiology, Fetus/immunology, Fetus/microbiology, Reproducibility of Results
13.
BMC Genomics ; 23(1): 624, 2022 Aug 30.
Article in English | MEDLINE | ID: mdl-36042406

ABSTRACT

BACKGROUND: Selection of optimal computational strategies for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and this procedure is complex because of the numerous tools currently available. The aim of this research was to summarize the results of the crowdsourced sbv IMPROVER Microbiomics Challenge, designed to evaluate the performance of off-the-shelf metagenomics software, as well as to investigate the robustness of these results through an extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify the microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard. RESULTS: Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, helped increase the performance of most profilers by increasing precision but at the cost of recall. However, the use of adaptive filtering thresholds determined from the sample's Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall. CONCLUSIONS: kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing non-reliably predicted low-abundance species by using diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.
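
The diversity-dependent filtering described in this abstract can be sketched in Python as follows. The Shannon index formula is standard; the rule mapping the index to a cutoff (`adaptive_threshold`, with its `base` and `scale` parameters) is a hypothetical stand-in, since the abstract does not state how the thresholds were derived.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile must contain at least one positive count")
    h = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def adaptive_threshold(h, base=0.01, scale=1.0):
    """HYPOTHETICAL rule: the relative-abundance cutoff shrinks as diversity grows.

    This is an assumption for illustration, not the rule used in the study.
    """
    return base / (1.0 + scale * h)


def filter_profile(counts, base=0.01, scale=1.0):
    """Drop taxa below the adaptive cutoff and renormalize the remainder."""
    total = sum(counts.values())
    cutoff = adaptive_threshold(shannon_index(counts), base, scale)
    kept = {t: c / total for t, c in counts.items() if c / total >= cutoff}
    norm = sum(kept.values())
    if norm == 0:
        return {}
    return {t: v / norm for t, v in kept.items()}


if __name__ == "__main__":
    profile = {"taxon_a": 500, "taxon_b": 499, "taxon_c": 1}
    print(filter_profile(profile))
```

With this assumed rule, a more diverse sample gets a lower cutoff, so genuinely rare taxa are less likely to be discarded than under a fixed 1% threshold.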


Subjects
Crowdsourcing, Metagenome, Benchmarking, Metagenomics/methods, Software
14.
PLoS Comput Biol ; 18(8): e1009100, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35951662

ABSTRACT

Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.


Subjects
Algorithms, Neoplasms, Biological Evolution, Cell Lineage/genetics, Humans, Genetic Models, Phylogeny
15.
Environ Microbiome ; 17(1): 33, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35751093

ABSTRACT

BACKGROUND: Tremendous amounts of data generated from microbiome research studies during the last decades require not only standards for sampling and preparation of omics data but also clear concepts of how the metadata is prepared to ensure re-use for integrative and interdisciplinary microbiome analysis. RESULTS: In this Commentary, we present our views on the key issues related to the current system for metadata submission in omics research, and propose the development of a global metadata system. Such a system should be easy to use, clearly structured in a hierarchical way, and should be compatible with all existing microbiome data repositories, following common standards for minimal required information and common ontology. Although minimum metadata requirements are essential for microbiome datasets, the immense technological progress requires a flexible system, which will have to be constantly improved and re-thought. While FAIR principles (Findable, Accessible, Interoperable, and Reusable) are already considered, international legal issues on genetic resource and sequence sharing provided by the Convention on Biological Diversity need more awareness and engagement of the scientific community. CONCLUSIONS: The suggested approach for metadata entries would strongly improve retrieving and re-using data as demonstrated in several representative use cases. These integrative analyses, in turn, would further advance the potential of microbiome research for novel scientific discoveries and the development of microbiome-derived products.

16.
Emerg Microbes Infect ; 11(1): 1037-1048, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35320064

ABSTRACT

The coronavirus SARS-CoV-2 is the causative agent for the disease COVID-19. To capture the IgA, IgG, and IgM antibody response of patients infected with SARS-CoV-2 at individual epitope resolution, we constructed planar microarrays of 648 overlapping peptides that cover the four major structural proteins S(pike), N(ucleocapsid), M(embrane), and E(nvelope). The arrays were incubated with sera of 67 SARS-CoV-2 positive and 22 negative control samples. Specific responses to SARS-CoV-2 were detectable, and nine peptides were associated with a more severe course of the disease. A random forest model disclosed that antibody binding to 21 peptides, mostly localized in the S protein, was associated with higher neutralization values in cellular anti-SARS-CoV-2 assays. For antibodies addressing the N-terminus of M, or peptides close to the fusion region of S, protective effects were proven by antibody depletion and neutralization assays. The study pinpoints unusual viral binding epitopes that might be suited as vaccine candidates.


Subjects
COVID-19, SARS-CoV-2, Neutralizing Antibodies, Viral Antibodies, Antibody Formation, Epitopes, Humans, Machine Learning, Peptides, Coronavirus Spike Glycoprotein
17.
Nucleic Acids Res ; 50(10): e60, 2022 Jun 10.
Article in English | MEDLINE | ID: mdl-35188571

ABSTRACT

Advances in transcriptomic and translatomic techniques enable in-depth studies of RNA activity profiles and RNA-based regulatory mechanisms. Ribosomal RNA (rRNA) sequences are highly abundant among cellular RNA, but if the target sequences do not include polyadenylation, these cannot be easily removed in library preparation, requiring their post-hoc removal with computational techniques to accelerate and improve downstream analyses. Here, we describe RiboDetector, a novel software based on a Bi-directional Long Short-Term Memory (BiLSTM) neural network, which rapidly and accurately identifies rRNA reads from transcriptomic, metagenomic, metatranscriptomic, noncoding RNA, and ribosome profiling sequence data. Compared with state-of-the-art approaches, RiboDetector produced at least six times fewer misclassifications on the benchmark datasets. Importantly, the few false positives of RiboDetector were not enriched in certain Gene Ontology (GO) terms, suggesting a low bias for downstream functional profiling. RiboDetector also demonstrated a remarkable generalizability for detecting novel rRNA sequences that are divergent from the training data with sequence identities of <90%. On a personal computer, RiboDetector processed 40M reads in less than 6 min, which was ∼50 times faster in GPU mode and ∼15 times in CPU mode than other methods. RiboDetector is available under a GPL v3.0 license at https://github.com/hzi-bifo/RiboDetector.


Subjects
Deep Learning, Ribosomal RNA, Metagenomics/methods, RNA, Ribosomal RNA/genetics, Software
18.
Nature ; 601(7894): 508, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35042983
19.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3744-3753, 2022.
Article in English | MEDLINE | ID: mdl-34460382

ABSTRACT

Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational costs associated with training these models have motivated alternative approaches for representation learning. Herein we introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Representation learning of biological entities which capture essential features can alleviate many of the challenges associated with supervised learning in bioinformatics. The most important distinction of our proposed method is its reliance on the protein-protein interaction (PPI) network. The computational cost of the generated representations for any potential application is significantly lower than that of comparable methods, since the length of the representations is significantly smaller than that in other approaches. TripletProt offers great potential for protein informatics tasks and can be widely applied to similar tasks. We evaluate TripletProt comprehensively in protein functional annotation tasks including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), which are both challenging multi-class, multi-label classification machine learning problems. We compare the performance of TripletProt with state-of-the-art approaches including a recurrent language model-based approach (i.e., UniRep), as well as a protein-protein interaction (PPI) network and sequence-based method (i.e., DeepGO). Our TripletProt showed an overall improvement of F1 score in the above-mentioned comprehensive functional annotation tasks, solely relying on the PPI network. Availability: The source code and datasets are available at https://github.com/EsmaeilNourani/TripletProt.
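
Siamese networks of the kind named in this abstract are commonly trained with a triplet margin loss. The Python sketch below assumes that objective; the abstract does not spell out the loss, and the function names, the Euclidean distance, and the `margin` value are illustrative choices rather than TripletProt's actual implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: in a PPI setting the positive could be an
    # interacting protein and the negative a non-interacting one (assumption).
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Minimizing this quantity over many triplets pulls related proteins together in the embedding space and pushes unrelated ones apart, which is the general motivation for triplet-based representation learning.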


Subjects
Computer Neural Networks, Proteins, Proteins/metabolism, Software, Protein Interaction Maps, Language
20.
Nat Commun ; 12(1): 6744, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34795237

ABSTRACT

Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violate assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo.


Subjects
High-Throughput Nucleotide Sequencing/methods, Single-Cell Analysis/methods, Software, Genetic Techniques, Genomics/methods, Humans, Mutation, Single Nucleotide Polymorphism, DNA Sequence Analysis/methods