Search | VHL Regional Portal (test)

1.

Importations of SARS-CoV-2 lineages decline after nonpharmaceutical interventions in phylogeographic analyses.

Goliaei, Sama; Foroughmand-Araabi, Mohammad-Hadi; Roddy, Aideen; Weber, Ariane; Översti, Sanni; Kühnert, Denise; McHardy, Alice C.

Nat Commun ; 15(1): 5267, 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38902246

ABSTRACT

During the early stages of the SARS-CoV-2 pandemic, before vaccines were available, nonpharmaceutical interventions (NPIs) such as reducing contacts or antigenic testing were used to control viral spread. Quantifying their success is therefore key for future pandemic preparedness. Using 1.8 million SARS-CoV-2 genomes from systematic surveillance, we study viral lineage importations into Germany for the third pandemic wave from late 2020 to early 2021, using large-scale Bayesian phylogenetic and phylogeographic analysis with a longitudinal assessment of lineage importation dynamics over multiple sampling strategies. All major nationwide NPIs were followed by fewer importations, with the strongest decreases seen for free rapid tests, the strengthening of regulations on mask-wearing in public transport and stores, as well as on internal movements and gatherings. Most SARS-CoV-2 lineages first appeared in the three most populous states with most cases, and spread from there within the country. Importations rose before and peaked shortly after the Christmas holidays. The substantial effects of free rapid tests and obligatory medical/surgical mask-wearing suggest these as key for pandemic preparedness, given their relatively few negative socioeconomic effects. The approach relates environmental factors at the host population level to viral lineage dissemination, facilitating similar analyses of rapidly evolving pathogens in the future.

Subjects

COVID-19 , Phylogeny , Phylogeography , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/virology , COVID-19/prevention & control , COVID-19/transmission , SARS-CoV-2/genetics , SARS-CoV-2/classification , Germany/epidemiology , Bayes Theorem , Genome, Viral/genetics , Pandemics/prevention & control

2.

Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes.

Hu, Kaixin; Meyer, Fernando; Deng, Zhi-Luo; Asgari, Ehsaneddin; Kuo, Tzu-Hao; Münch, Philipp C; McHardy, Alice C.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706320

ABSTRACT

The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled for handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.

Subjects

Anti-Bacterial Agents , Phenotype , Anti-Bacterial Agents/pharmacology , Machine Learning , Drug Resistance, Bacterial/genetics , Computational Biology/methods , Genome, Bacterial , Genome, Microbial , Humans , Bacteria/genetics , Bacteria/drug effects

3.

Optimized model architectures for deep learning on genomic data.

Gündüz, Hüseyin Anil; Mreches, René; Moosbauer, Julia; Robertson, Gary; To, Xiao-Yin; Franzosa, Eric A; Huttenhower, Curtis; Rezaei, Mina; McHardy, Alice C; Bischl, Bernd; Münch, Philipp C; Binder, Martin.

Commun Biol ; 7(1): 516, 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38693292

ABSTRACT

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.

Subjects

Deep Learning , Genomics , Genomics/methods , Computational Biology/methods , Humans , Neural Networks, Computer

4.

Author Correction: Optimized model architectures for deep learning on genomic data.

Gündüz, Hüseyin Anil; Mreches, René; Moosbauer, Julia; Robertson, Gary; To, Xiao-Yin; Franzosa, Eric A; Huttenhower, Curtis; Rezaei, Mina; McHardy, Alice C; Bischl, Bernd; Münch, Philipp C; Binder, Martin.

Commun Biol ; 7(1): 625, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38783006

5.

A self-supervised deep learning method for data-efficient training in genomics.

Gündüz, Hüseyin Anil; Binder, Martin; To, Xiao-Yin; Mreches, René; Bischl, Bernd; McHardy, Alice C; Münch, Philipp C; Rezaei, Mina.

Commun Biol ; 6(1): 928, 2023 09 11.

Article in English | MEDLINE | ID: mdl-37696966

ABSTRACT

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.

Subjects

Deep Learning , Genomics , Computational Biology , Machine Learning
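
The abstract of entry 5 notes that Self-GenomeNet leverages reverse-complement sequences. As a minimal sketch of that single notion (not the authors' implementation; the helper names `reverse_complement` and `with_reverse_complements` are hypothetical and chosen only for illustration), a DNA read can be paired with its reverse complement as follows:

```python
# Illustrative sketch only; not taken from the Self-GenomeNet code base.
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(seqs):
    """Hypothetical helper: pair every sequence with its reverse complement."""
    return [(s, reverse_complement(s)) for s in seqs]


if __name__ == "__main__":
    for forward, reverse in with_reverse_complements(["ATGC", "AACN"]):
        print(forward, reverse)
```

Because both strands of a double-stranded genome carry the same information, a sequence and its reverse complement can plausibly be treated as two views of the same sample; how the published method actually uses them is described in the paper, not here.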

6.

CYP19A1 mediates severe SARS-CoV-2 disease outcome in males.

Stanelle-Bertram, Stephanie; Beck, Sebastian; Mounogou, Nancy Kouassi; Schaumburg, Berfin; Stoll, Fabian; Al Jawazneh, Amirah; Schmal, Zoé; Bai, Tian; Zickler, Martin; Beythien, Georg; Becker, Kathrin; de la Roi, Madeleine; Heinrich, Fabian; Schulz, Claudia; Sauter, Martina; Krasemann, Susanne; Lange, Philine; Heinemann, Axel; van Riel, Debby; Leijten, Lonneke; Bauer, Lisa; van den Bosch, Thierry P P; Lopuhaä, Boaz; Busche, Tobias; Wibberg, Daniel; Schaudien, Dirk; Goldmann, Torsten; Lüttjohann, Anna; Ruschinski, Jenny; Jania, Hanna; Müller, Zacharias; Pinho Dos Reis, Vinicius; Krupp-Buzimkic, Vanessa; Wolff, Martin; Fallerini, Chiara; Baldassarri, Margherita; Furini, Simone; Norwood, Katrina; Käufer, Christopher; Schützenmeister, Nina; von Köckritz-Blickwede, Maren; Schroeder, Maria; Jarczak, Dominik; Nierhaus, Axel; Welte, Tobias; Kluge, Stefan; McHardy, Alice C; Sommer, Frank; Kalinowski, Jörn; Krauss-Etschmann, Susanne.

Cell Rep Med ; 4(9): 101152, 2023 09 19.

Article in English | MEDLINE | ID: mdl-37572667

ABSTRACT

Male sex represents one of the major risk factors for severe COVID-19 outcome. However, underlying mechanisms that mediate sex-dependent disease outcome are as yet unknown. Here, we identify the CYP19A1 gene encoding for the testosterone-to-estradiol metabolizing enzyme CYP19A1 (also known as aromatase) as a host factor that contributes to worsened disease outcome in SARS-CoV-2-infected males. We analyzed exome sequencing data obtained from a human COVID-19 cohort (n = 2,866) using a machine-learning approach and identify a CYP19A1-activity-increasing mutation to be associated with the development of severe disease in men but not women. We further analyzed human autopsy-derived lungs (n = 86) and detect increased pulmonary CYP19A1 expression at the time point of death in men compared with women. In the golden hamster model, we show that SARS-CoV-2 infection causes increased CYP19A1 expression in the lung that is associated with dysregulated plasma sex hormone levels and reduced long-term pulmonary function in males but not females. Treatment of SARS-CoV-2-infected hamsters with a clinically approved CYP19A1 inhibitor (letrozole) improves impaired lung function and supports recovery of imbalanced sex hormones specifically in males. Our study identifies CYP19A1 as a contributor to sex-specific SARS-CoV-2 disease outcome in males. Furthermore, inhibition of CYP19A1 by the clinically approved drug letrozole may furnish a new therapeutic strategy for individualized patient management and treatment.

Subjects

Aromatase , COVID-19 , Female , Humans , Male , Aromatase/genetics , Letrozole , SARS-CoV-2 , COVID-19/genetics , Estradiol , Testosterone

7.

Pulsed antibiotic treatments of gnotobiotic mice manifest in complex bacterial community dynamics and resistance effects.

Münch, Philipp C; Eberl, Claudia; Woelfel, Simon; Ring, Diana; Fritz, Adrian; Herp, Simone; Lade, Iris; Geffers, Robert; Franzosa, Eric A; Huttenhower, Curtis; McHardy, Alice C; Stecher, Bärbel.

Cell Host Microbe ; 31(6): 1007-1020.e4, 2023 06 14.

Article in English | MEDLINE | ID: mdl-37279755

ABSTRACT

Bacteria can evolve to withstand a wide range of antibiotics (ABs) by using various resistance mechanisms. How ABs affect the ecology of the gut microbiome is still poorly understood. We investigated strain-specific responses and evolution during repeated AB perturbations by three clinically relevant ABs, using gnotobiotic mice colonized with a synthetic bacterial community (oligo-mouse-microbiota). Over 80 days, we observed resilience effects at the strain and community levels, and we found that they were correlated with modulations of the estimated growth rate and levels of prophage induction as determined from metagenomics data. Moreover, we tracked mutational changes in the bacterial populations, and this uncovered clonal expansion and contraction of haplotypes and selection of putative AB resistance-conferring SNPs. We functionally verified these mutations via reisolation of clones with increased minimum inhibitory concentration (MIC) of ciprofloxacin and tetracycline from evolved communities. This demonstrates that host-associated microbial communities employ various mechanisms to respond to selective pressures that maintain community stability.

Subjects

Gastrointestinal Microbiome , Microbiota , Animals , Mice , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Germ-Free Life

8.

GO Bench: shared hub for universal benchmarking of machine learning-based protein functional annotations.

Dickson, Andrew; Asgari, Ehsaneddin; McHardy, Alice C; Mofrad, Mohammad R K.

Bioinformatics ; 39(2)2023 02 03.

Article in English | MEDLINE | ID: mdl-36786404

ABSTRACT

MOTIVATION: Gene annotation is the problem of mapping proteins to their functions represented as Gene Ontology (GO) terms, typically inferred based on the primary sequences. Gene annotation is a multi-label multi-class classification problem, which has generated growing interest for its uses in the characterization of millions of proteins with unknown functions. However, there is no standard GO dataset used for benchmarking newly developed machine learning models within the bioinformatics community. Thus, the significance of improvements for these models remains unclear. RESULTS: The Gene Benchmarking database is the first effort to provide an easy-to-use and configurable hub for the learning and evaluation of gene annotation models. It provides easy access to pre-specified datasets and takes the non-trivial steps of preprocessing and filtering all data according to custom presets using a web interface. The GO Bench web application can also be used to evaluate and display any trained model on leaderboards for annotation tasks. AVAILABILITY AND IMPLEMENTATION: The GO Benchmarking dataset is freely available at www.gobench.org. Code is hosted at github.com/mofradlab, with repositories for website code, core utilities and examples of usage (Supplementary Section S.7). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Benchmarking , Software , Molecular Sequence Annotation , Gene Ontology , Machine Learning , Proteins/metabolism

9.

Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge.

Poussin, Carine; Khachatryan, Lusine; Sierro, Nicolas; Narsapuram, Vijay Kumar; Meyer, Fernando; Kaikala, Vinay; Chawla, Vandna; Muppirala, Usha; Kumar, Sunil; Belcastro, Vincenzo; Battey, James N D; Scotti, Elena; Boué, Stéphanie; McHardy, Alice C; Peitsch, Manuel C; Ivanov, Nikolai V; Hoeng, Julia.

BMC Genomics ; 23(1): 624, 2022 Aug 30.

Article in English | MEDLINE | ID: mdl-36042406

ABSTRACT

BACKGROUND: Selection of optimal computational strategies for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and this procedure is complex because of the numerous tools currently available. The aim of this research was to summarize the results of the crowdsourced sbv IMPROVER Microbiomics Challenge designed to evaluate the performance of off-the-shelf metagenomics software as well as to investigate the robustness of these results by the extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify the microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard. RESULTS: Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, helped increase the performance of most profilers by increasing precision but at the cost of recall. However, the use of adaptive filtering thresholds determined from the sample's Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall. CONCLUSIONS: kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing non-reliably predicted low-abundance species by using diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.

Subjects

Crowdsourcing , Metagenome , Benchmarking , Metagenomics/methods , Software
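
The abstract of entry 9 refers to a sample's Shannon index, to filtering out low-abundance species, and to the resulting trade-off between precision and recall against a gold standard. The sketch below illustrates these quantities with their standard textbook definitions. It is an assumption-laden illustration, not the challenge's scoring code: in particular, `adaptive_threshold` is a purely hypothetical placeholder, because the abstract does not state how the threshold is derived from the Shannon index.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances.values())
    return -sum((a / total) * math.log(a / total)
                for a in abundances.values() if a > 0)


def filter_low_abundance(abundances, min_fraction):
    """Keep only taxa whose relative abundance is at least min_fraction."""
    total = sum(abundances.values())
    return {t: a for t, a in abundances.items() if a / total >= min_fraction}


def precision_recall(predicted, gold):
    """Presence/absence precision and recall of predicted taxa vs. a gold standard."""
    pred, truth = set(predicted), set(gold)
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def adaptive_threshold(h, base=0.01):
    """HYPOTHETICAL rule (not from the paper): shrink the cutoff as diversity grows."""
    return base / (1.0 + h)


if __name__ == "__main__":
    profile = {"A": 98.0, "B": 1.5, "C": 0.5}   # made-up predicted abundances
    gold = {"A", "B"}                            # made-up gold-standard taxa
    h = shannon_index(profile)
    kept = filter_low_abundance(profile, adaptive_threshold(h))
    print(round(h, 3), sorted(kept), precision_recall(kept, gold))
```

Raising the cutoff removes more false positives (higher precision) but can also drop true low-abundance taxa (lower recall), which is the trade-off the abstract describes.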

10.

Scelestial: Fast and accurate single-cell lineage tree inference based on a Steiner tree approximation algorithm.

Foroughmand-Araabi, Mohammad-Hadi; Goliaei, Sama; McHardy, Alice C.

PLoS Comput Biol ; 18(8): e1009100, 2022 08.

Article in English | MEDLINE | ID: mdl-35951662

ABSTRACT

Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.

Subjects

Algorithms , Neoplasms , Biological Evolution , Cell Lineage/genetics , Humans , Models, Genetic , Phylogeny

11.

Peptide microarrays coupled to machine learning reveal individual epitopes from human antibody responses with neutralizing capabilities against SARS-CoV-2.

Hotop, Sven-Kevin; Reimering, Susanne; Shekhar, Aditya; Asgari, Ehsaneddin; Beutling, Ulrike; Dahlke, Christine; Fathi, Anahita; Khan, Fawad; Lütgehetmann, Marc; Ballmann, Rico; Gerstner, Andreas; Tegge, Werner; Cicin-Sain, Luka; Bilitewski, Ursula; McHardy, Alice C; Brönstrup, Mark.

Emerg Microbes Infect ; 11(1): 1037-1048, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35320064

ABSTRACT

The coronavirus SARS-CoV-2 is the causative agent for the disease COVID-19. To capture the IgA, IgG, and IgM antibody response of patients infected with SARS-CoV-2 at individual epitope resolution, we constructed planar microarrays of 648 overlapping peptides that cover the four major structural proteins S(pike), N(ucleocapsid), M(embrane), and E(nvelope). The arrays were incubated with sera of 67 SARS-CoV-2 positive and 22 negative control samples. Specific responses to SARS-CoV-2 were detectable, and nine peptides were associated with a more severe course of the disease. A random forest model disclosed that antibody binding to 21 peptides, mostly localized in the S protein, was associated with higher neutralization values in cellular anti-SARS-CoV-2 assays. For antibodies addressing the N-terminus of M, or peptides close to the fusion region of S, protective effects were proven by antibody depletion and neutralization assays. The study pinpoints unusual viral binding epitopes that might be suited as vaccine candidates.

Subjects

COVID-19 , SARS-CoV-2 , Antibodies, Neutralizing , Antibodies, Viral , Antibody Formation , Epitopes , Humans , Machine Learning , Peptides , Spike Glycoprotein, Coronavirus

12.

Rapid and accurate identification of ribosomal RNA sequences via deep learning.

Deng, Zhi-Luo; Münch, Philipp C; Mreches, René; McHardy, Alice C.

Nucleic Acids Res ; 50(10): e60, 2022 06 10.

Article in English | MEDLINE | ID: mdl-35188571

ABSTRACT

Advances in transcriptomic and translatomic techniques enable in-depth studies of RNA activity profiles and RNA-based regulatory mechanisms. Ribosomal RNA (rRNA) sequences are highly abundant among cellular RNA, but if the target sequences do not include polyadenylation, these cannot be easily removed in library preparation, requiring their post-hoc removal with computational techniques to accelerate and improve downstream analyses. Here, we describe RiboDetector, a novel software based on a Bi-directional Long Short-Term Memory (BiLSTM) neural network, which rapidly and accurately identifies rRNA reads from transcriptomic, metagenomic, metatranscriptomic, noncoding RNA, and ribosome profiling sequence data. Compared with state-of-the-art approaches, RiboDetector produced at least six times fewer misclassifications on the benchmark datasets. Importantly, the few false positives of RiboDetector were not enriched in certain Gene Ontology (GO) terms, suggesting a low bias for downstream functional profiling. RiboDetector also demonstrated a remarkable generalizability for detecting novel rRNA sequences that are divergent from the training data with sequence identities of <90%. On a personal computer, RiboDetector processed 40M reads in less than 6 min, which was ~50 times faster in GPU mode and ~15 times in CPU mode than other methods. RiboDetector is available under a GPL v3.0 license at https://github.com/hzi-bifo/RiboDetector.

Subjects

Deep Learning , RNA, Ribosomal , Metagenomics/methods , RNA , RNA, Ribosomal/genetics , Software

13.

Nobel nominators - which women will you suggest?

McHardy, Alice C.

Nature ; 601(7894): 508, 2022 01.

Article in English | MEDLINE | ID: mdl-35042983

Subjects

Nobel Prize , Female , Humans

14.

TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks.

Nourani, Esmaeil; Asgari, Ehsaneddin; McHardy, Alice C; Mofrad, Mohammad R K.

IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3744-3753, 2022.

Article in English | MEDLINE | ID: mdl-34460382

ABSTRACT

Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational costs associated with training these models have motivated alternative approaches for representation learning. Herein we introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Representation learning of biological entities which capture essential features can alleviate many of the challenges associated with supervised learning in bioinformatics. The most important distinction of our proposed method is relying on the protein-protein interaction (PPI) network. The computational cost of the generated representations for any potential application is significantly lower than comparable methods since the length of the representations is significantly smaller than that in other approaches. TripletProt offers great potential for protein informatics tasks and can be widely applied to similar tasks. We evaluate TripletProt comprehensively in protein functional annotation tasks including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), which are both challenging multi-class, multi-label classification machine learning problems. We compare the performance of TripletProt with the state-of-the-art approaches including a recurrent language model-based approach (i.e., UniRep), as well as a protein-protein interaction (PPI) network and sequence-based method (i.e., DeepGO). Our TripletProt showed an overall improvement of F1 score in the above-mentioned comprehensive functional annotation tasks, solely relying on the PPI network. Availability: The source code and datasets are available at https://github.com/EsmaeilNourani/TripletProt.

Subjects

Neural Networks, Computer , Proteins , Proteins/metabolism , Software , Protein Interaction Maps , Language
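
The abstract of entry 14 reports results as an F1 score on multi-class, multi-label tasks. For readers unfamiliar with the metric, the following is a minimal sketch of a micro-averaged F1 for multi-label predictions. The choice of micro-averaging is an assumption made here for illustration; the abstract does not say which averaging the authors used, and `micro_f1` is a hypothetical helper name.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over samples, each described by a collection of labels.

    Counts true positives, false positives and false negatives across all
    samples, then applies F1 = 2*TP / (2*TP + FP + FN).
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not true
        fn += len(truth - pred)   # true labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up sub-cellular localization labels for two proteins.
    truth = [{"nucleus", "cytoplasm"}, {"membrane"}]
    pred = [{"nucleus"}, {"membrane", "cytoplasm"}]
    print(micro_f1(truth, pred))
```

In this toy case there are two true positives, one false positive and one false negative, so the score is 4 / 6.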

15.

Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo.

Lähnemann, David; Köster, Johannes; Fischer, Ute; Borkhardt, Arndt; McHardy, Alice C; Schönhuth, Alexander.

Nat Commun ; 12(1): 6744, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34795237

ABSTRACT

Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violates assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo.

Subjects

High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Software , Genetic Techniques , Genomics/methods , Humans , Mutation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods

16.

EpitopeVec: linear epitope prediction using deep protein sequence embeddings.

Bahai, Akash; Asgari, Ehsaneddin; Mofrad, Mohammad R K; Kloetgen, Andreas; McHardy, Alice C.

Bioinformatics ; 37(23): 4517-4525, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34180989

ABSTRACT

MOTIVATION: B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefore, it is essential to develop computational methods for the rapid identification of BCEs. Although several computational methods have been developed for this task, generalizability is still a major concern, where cross-testing of the classifiers trained and tested on different datasets has revealed accuracies of 51-53%. RESULTS: We describe a new method called EpitopeVec, which uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE predictions. Extensive benchmarking of EpitopeVec and other state-of-the-art methods for linear BCE prediction on several large and small datasets, as well as cross-testing, demonstrated an improvement in the performance of EpitopeVec over other methods in terms of accuracy and area under the curve. As the predictive performance depended on the species origin of the respective antigens (viral, bacterial and eukaryotic), we also trained our method on a large viral dataset to create a dedicated linear viral BCE predictor with improved cross-testing performance. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/hzi-bifo/epitope-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Antigens , Peptides , Amino Acid Sequence , Peptides/chemistry , Antigens/chemistry , Software , Epitopes, B-Lymphocyte/chemistry

17.

Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses.

Deng, Zhi-Luo; Dhingra, Akshay; Fritz, Adrian; Götting, Jasper; Münch, Philipp C; Steinbrück, Lars; Schulz, Thomas F; Ganzenmüller, Tina; McHardy, Alice C.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-34020538

ABSTRACT

Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a 'G.G' context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.

Subjects

Cytomegalovirus/genetics , Genetic Variation , Genome, Viral , Software , Humans
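
The abstract of entry 17 reports that false-positive variant calls were enriched in T to G changes in a 'G.G' context. A rough sketch of how such a context could be checked is given below. It reads 'G.G' as a G immediately before and immediately after the changed base, which is an interpretation made here and not confirmed by the abstract; the function names are hypothetical and this is not part of the QuasiModo workflow.

```python
def is_t_to_g_in_gng_context(reference, pos, alt):
    """True if the call at 0-based `pos` is T->G with a G on both sides in the reference."""
    if pos <= 0 or pos >= len(reference) - 1:
        return False  # no complete flanking context at the sequence ends
    ref = reference.upper()
    return (ref[pos] == "T" and alt.upper() == "G"
            and ref[pos - 1] == "G" and ref[pos + 1] == "G")


def context_fraction(reference, variants):
    """Fraction of (pos, alt) calls that match the context; 0.0 for an empty list."""
    variants = list(variants)
    if not variants:
        return 0.0
    hits = sum(is_t_to_g_in_gng_context(reference, p, a) for p, a in variants)
    return hits / len(variants)


if __name__ == "__main__":
    ref = "AGTGCTA"                 # made-up reference; T at index 2 sits between two Gs
    calls = [(2, "G"), (5, "G")]    # made-up variant calls as (position, alternate base)
    print(context_fraction(ref, calls))
```

Comparing this fraction between false-positive and true-positive call sets would be one simple way to look for the kind of enrichment the abstract mentions.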

18.

Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit.

Meyer, Fernando; Lesker, Till-Robin; Koslicki, David; Fritz, Adrian; Gurevich, Alexey; Darling, Aaron E; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C.

Nat Protoc ; 16(4): 1785-1801, 2021 04.

Article in English | MEDLINE | ID: mdl-33649565

ABSTRACT

Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.

Subjects

Benchmarking , Metagenomics/methods , Software , Animals , Computer Simulation , Genetic Databases , Gastrointestinal Microbiome/genetics , Metagenome , Mice , Phylogeny , Reference Standards , Reproducibility of Results

19.

Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies.

Bankwitz, Dorothea; Bahai, Akash; Labuhn, Maurice; Doepke, Mandy; Ginkel, Corinne; Khera, Tanvi; Todt, Daniel; Ströh, Luisa J; Dold, Leona; Klein, Florian; Klawonn, Frank; Krey, Thomas; Behrendt, Patrick; Cornberg, Markus; McHardy, Alice C; Pietschmann, Thomas.

Gut ; 70(9): 1734-1745, 2021 Sep.

Article in English | MEDLINE | ID: mdl-33323394

ABSTRACT

OBJECTIVE: Neutralising antibodies are key effectors of infection-induced and vaccine-induced immunity. Quantification of antibodies' breadth and potency is critical for understanding the mechanisms of protection and for prioritisation of vaccines. Here, we used a unique collection of human specimens and HCV strains to develop HCV reference viruses for quantification of neutralising antibodies, and to investigate viral functional diversity. DESIGN: We profiled neutralisation potency of polyclonal immunoglobulins from 104 patients infected with HCV genotype (GT) 1-6 across 13 HCV strains representing five viral GTs. Using metric multidimensional scaling, we plotted HCV neutralisation onto neutralisation maps. We employed K-means clustering to guide virus clustering and the selection of representative strains. RESULTS: Viruses differed greatly in neutralisation sensitivity, with J6 (GT2a) being most resistant and SA13 (GT5a) being most sensitive. They mapped to six distinct neutralisation clusters, in part composed of viruses from different GTs. There was no correlation between viral neutralisation and genetic distance, indicating functional neutralisation clustering differs from sequence-based clustering. Calibrating reference viruses representing these clusters against purified antibodies from 496 patients infected by GT1 to GT6 viruses readily identified individuals with extraordinarily potent and broadly neutralising antibodies. It revealed comparable antibody cross-neutralisation and diversity between specimens from diverse viral GTs, confirming well-balanced reporting of HCV cross-neutralisation across highly diverse human samples. CONCLUSION: Representative isolates from six neutralisation clusters broadly reconstruct the functional HCV neutralisation space. They enable high-resolution profiling of HCV neutralisation and may reflect viral functional and antigenic properties important to consider in HCV vaccine design.

Subjects

Neutralizing Antibodies/blood , Hepacivirus/immunology , Hepatitis C Antibodies/blood , Hepatitis C/immunology , Amino Acid Sequence , Neutralizing Antibodies/immunology , Hepacivirus/genetics , Hepatitis C/virology , Humans , Immunoglobulin G/blood , Immunoglobulin G/immunology

20.

Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.

Hufsky, Franziska; Lamkiewicz, Kevin; Almeida, Alexandre; Aouacheria, Abdel; Arighi, Cecilia; Bateman, Alex; Baumbach, Jan; Beerenwinkel, Niko; Brandt, Christian; Cacciabue, Marco; Chuguransky, Sara; Drechsel, Oliver; Finn, Robert D; Fritz, Adrian; Fuchs, Stephan; Hattab, Georges; Hauschild, Anne-Christin; Heider, Dominik; Hoffmann, Marie; Hölzer, Martin; Hoops, Stefan; Kaderali, Lars; Kalvari, Ioanna; von Kleist, Max; Kmiecinski, Renó; Kühnert, Denise; Lasso, Gorka; Libin, Pieter; List, Markus; Löchel, Hannah F; Martin, Maria J; Martin, Roman; Matschinske, Julian; McHardy, Alice C; Mendes, Pedro; Mistry, Jaina; Navratil, Vincent; Nawrocki, Eric P; O'Toole, Áine Niamh; Ontiveros-Palacios, Nancy; Petrov, Anton I; Rangel-Pineros, Guillermo; Redaschi, Nicole; Reimering, Susanne; Reinert, Knut; Reyes, Alejandro; Richardson, Lorna; Robertson, David L; Sadegh, Sepideh; Singer, Joshua B.

Brief Bioinform ; 22(2): 642-663, 2021 Mar 22.

Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.

Subjects

COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Viral Genome , Humans , Pandemics , SARS-CoV-2/genetics
