Results 1 - 20 of 84
1.
Environ Microbiol ; 26(1): e16566, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38149467

ABSTRACT

Trimming of sequencing reads is a pre-processing step that aims to discard sequence segments such as primers, adapters and low-quality nucleotides that interfere with clustering and classification steps. We evaluated the impact of trimming length of paired-end 16S and 18S rRNA amplicon reads on the ability to reconstruct the taxonomic composition and relative abundances of communities with a known composition in both even and uneven proportions. We found that maximizing read retention maximizes recall but reduces precision by increasing false positives. The presence of expected taxa was accurately predicted across broad trim length ranges, but recovering the original relative proportions remains a difficult challenge. We show that parameters that maximize taxonomic recovery do not simultaneously maximize relative abundance accuracy. Trim length represents one of several experimental parameters that have non-uniform impact across microbial clades, making it a difficult parameter to optimize. This study offers insights and guidelines, and helps researchers assess the significance of their decisions when trimming raw reads in a microbiome analysis based on overlapping or non-overlapping paired-end amplicons.
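As a rough illustration of the recall and precision trade-off described in this abstract, the sketch below scores the taxa reported for one trim setting against a mock community of known composition. The taxon lists and the function name `score_taxa` are invented for illustration and are not taken from the study.

```python
def score_taxa(expected, observed):
    """Return (recall, precision) for observed taxa against a known mock community."""
    expected, observed = set(expected), set(observed)
    true_positives = len(expected & observed)
    recall = true_positives / len(expected) if expected else 0.0
    precision = true_positives / len(observed) if observed else 0.0
    return recall, precision


# Hypothetical mock community and classifier output (illustrative values only).
expected = {"Bacillus", "Escherichia", "Lactobacillus", "Pseudomonas"}
observed = {"Bacillus", "Escherichia", "Lactobacillus", "Pseudomonas", "Shigella"}

recall, precision = score_taxa(expected, observed)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case every expected taxon is found (high recall), while the one unexpected taxon lowers precision, which is the pattern the abstract reports when read retention is maximized.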


Subjects
Microbiota, 16S Ribosomal RNA/genetics, Microbiota/genetics, DNA Sequence Analysis, 18S Ribosomal RNA, DNA Primers/genetics, High-Throughput Nucleotide Sequencing
2.
bioRxiv ; 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37609252

ABSTRACT

Lateral gene transfer (LGT) is an important mechanism for genome diversification in microbial populations, including the human microbiome. While prior work has surveyed LGT events in human-associated microbial isolate genomes, the scope and dynamics of novel LGT events arising in personal microbiomes are not well understood, as there are no widely adopted computational methods to detect, quantify, and characterize LGT from complex microbial communities. We addressed this by developing, benchmarking, and experimentally validating a computational method (WAAFLE) to profile novel LGT events from assembled metagenomes. Applying WAAFLE to >2K human metagenomes from diverse body sites, we identified >100K putative high-confidence but previously uncharacterized LGT events (~2 per assembled microbial genome-equivalent). These events were enriched for mobile elements (as expected), as well as restriction-modification and transport functions typically associated with the destruction of foreign DNA. LGT frequency was quantifiably influenced by biogeography, the phylogenetic similarity of the involved taxa, and the ecological abundance of the donor taxon. These forces manifest as LGT networks in which hub species abundant in a community type donate unequally with their close phylogenetic neighbors. Our findings suggest that LGT may be a more ubiquitous process in the human microbiome than previously described. The open-source WAAFLE implementation, documentation, and data from this work are available at http://huttenhower.sph.harvard.edu/waafle.

3.
Sci Rep ; 13(1): 5210, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36997631

ABSTRACT

Using environmental DNA (eDNA) to monitor biodiversity in aquatic environments is becoming an efficient and cost-effective alternative to other methods such as visual and acoustic identification. Until recently, eDNA sampling was accomplished primarily through manual sampling methods; however, with technological advances, automated samplers are being developed to make sampling easier and more accessible. This paper describes a new eDNA sampler capable of self-cleaning and multi-sample capture and preservation, all within a single unit capable of being deployed by a single person. The first in-field test of this sampler took place in the Bedford Basin, Nova Scotia, Canada alongside parallel samples taken using the typical Niskin bottle collection and post-collection filtration method. Both methods were able to capture the same aquatic microbial community and counts of representative DNA sequences were well correlated between methods with R² values ranging from 0.71 to 0.93. The two collection methods returned the same top 10 families in near identical relative abundance, demonstrating that the sampler was able to capture the same community composition of common microbes as the Niskin. The presented eDNA sampler provides a robust alternative to manual sampling methods, is amenable to autonomous vehicle payload constraints, and will facilitate persistent monitoring of remote and inaccessible sites.
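The abstract does not say how the R-squared values were computed. Assuming they are squared Pearson correlations between sequence counts from the two collection methods, a minimal sketch looks like this. The count vectors and the function name `r_squared` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length, at least 2 values each")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical counts of the same four sequences from two sampling methods.
sampler_counts = [1, 2, 3, 4]
niskin_counts = [1, 3, 2, 4]
print(round(r_squared(sampler_counts, niskin_counts), 2))
```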


Subjects
Environmental DNA, Microbiota, Humans, Environmental DNA/genetics, Biodiversity, Filtration, Microbiota/genetics, Nova Scotia, Environmental Monitoring/methods, Taxonomic DNA Barcoding/methods
4.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.


Subjects
Data Curation, Factual Databases, Microbial Drug Resistance, Machine Learning, Anti-Bacterial Agents/pharmacology, Bacterial Genes, Likelihood Functions, Software, Molecular Sequence Annotation
5.
Syst Biol ; 72(3): 559-574, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-35904761

ABSTRACT

Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration. Here, we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. Surpassing other comparative methods for pairwise trait analysis, CCM has the additional advantage of being able to examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes. A simulation study demonstrates that CCM is more accurate than other methods, including the Jaccard Index and three tree-aware methods. The parameterization of CCM makes the interpretation of the relations between genes more direct, which leads to Darwin's scenario being identified easily based on the estimated parameters. We show that CCM is more efficient and fits real data better than other methods, resulting in higher likelihood scores with fewer parameters. An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a nonphylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex. [Coevolution; evolutionary rates; gene network; graphical models; phylogenetic profiles; phylogeny.].
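For context, the Jaccard Index named in this abstract as a comparison method can be computed directly on two presence/absence profiles. The sketch below uses hypothetical profiles over six genomes and is not the CCM itself. Note that it ignores the tree relating the genomes, which is the limitation the abstract raises for nonphylogenetic comparisons.

```python
def jaccard_index(profile_a, profile_b):
    """Jaccard index of two presence/absence profiles over the same ordered genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical phylogenetic profiles: 1 = gene present in that genome, 0 = absent.
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 0, 0, 1]
print(jaccard_index(gene_x, gene_y))
```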


Subjects
Biological Evolution, Proteins, Phylogeny, Phenotype, Bacterial Genome
6.
BMC Microbiol ; 22(1): 270, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36357861

ABSTRACT

BACKGROUND: Preterm birth is a global problem with about 12% of births in sub-Saharan Africa occurring before 37 weeks of gestation. Several studies have explored a potential association between vaginal microbiota and preterm birth, and some have found an association while others have not. We performed a study designed to determine whether there is an association between vaginal and/or placental microbiota and preterm birth in an African setting. METHODS: Women presenting to the study hospital in labor with a gestational age of 26 to 36 weeks plus six days were prospectively enrolled in a study of the microbiota in preterm labor along with controls matched for age and parity. A vaginal sample was collected at the time of presentation to the hospital in active labor. In addition, a placental sample was collected when available. Libraries were constructed using PCR primers to amplify the V6/V7/V8 variable regions of the 16S rRNA gene, followed by sequencing with an Illumina MiSeq machine and analysis using QIIME2 2022.2. RESULTS: Forty-nine women presenting with preterm labor and their controls were enrolled in the study, of which 23 matched case-control pairs had sufficient sequence data for comparison. Lactobacillus was identified in all subjects, ranging in abundance from < 1% to > 99%, with Lactobacillus iners and Lactobacillus crispatus the most common species. Over half of the vaginal samples contained Gardnerella and/or Prevotella; both genera have been associated with preterm birth in previous studies. However, we found no significant difference in composition between mothers with preterm and those with full-term deliveries, with both groups showing roughly equal representation of different Lactobacillus species and dysbiosis-associated genera. Placental samples generally had poor DNA recovery, with a mix of probable sequencing artifacts, contamination, and bacteria acquired during passage through the birth canal. However, several placental samples showed strong evidence for the presence of Streptococcus species, which are known to infect the placenta. CONCLUSIONS: The current study showed no association of preterm birth with composition of the vaginal community. It does provide important information on the range of sequence types in African women and supports other data suggesting that women of African ancestry have an increased frequency of non-Lactobacillus types, but without evidence of associated adverse outcomes.


Subjects
Microbiota, Premature Obstetric Labor, Premature Birth, Humans, Female, Newborn Infant, Pregnancy, Infant, 16S Ribosomal RNA/genetics, Premature Birth/microbiology, Case-Control Studies, Kenya, Placenta, Vagina/microbiology, Premature Obstetric Labor/microbiology, Microbiota/genetics
7.
Microb Genom ; 8(9)2022 09.
Article in English | MEDLINE | ID: mdl-36129737

ABSTRACT

Enterococcus faecium is a ubiquitous opportunistic pathogen that is exhibiting increasing levels of antimicrobial resistance (AMR). Many of the genes that confer resistance and pathogenic functions are localized on mobile genetic elements (MGEs), which facilitate their transfer between lineages. Here, features including resistance determinants, virulence factors and MGEs were profiled in a set of 1273 E. faecium genomes from two disparate geographic locations (in the UK and Canada) from a range of agricultural, clinical and associated habitats. Neither lineages of E. faecium, type A and B, nor MGEs are constrained by geographic proximity, but our results show evidence of a strong association of many profiled genes and MGEs with habitat. Many features were associated with a group of clinical and municipal wastewater genomes that are likely forming a new human-associated ecotype within type A. The evolutionary dynamics of E. faecium make it a highly versatile emerging pathogen, and its ability to acquire, transmit and lose features presents a high risk for the emergence of new pathogenic variants and novel resistance combinations. This study provides a workflow for MGE-centric surveillance of AMR in Enterococcus that can be adapted to other pathogens.


Subjects
Anti-Infective Agents, Enterococcus faecium, One Health, Enterococcus faecium/genetics, Humans, Virulence Factors/genetics, Wastewater
8.
Clin Microbiol Rev ; 35(3): e0017921, 2022 09 21.
Article in English | MEDLINE | ID: mdl-35612324

ABSTRACT

Antimicrobial resistance (AMR) is a global health crisis that poses a great threat to modern medicine. Effective prevention strategies are urgently required to slow the emergence and further dissemination of AMR. Given the availability of data sets encompassing hundreds or thousands of pathogen genomes, machine learning (ML) is increasingly being used to predict resistance to different antibiotics in pathogens based on gene content and genome composition. A key objective of this work is to advocate for the incorporation of ML into front-line settings but also to highlight the further refinements that are necessary to safely and confidently incorporate these methods. The question of what to predict is not trivial given the existence of different quantitative and qualitative laboratory measures of AMR. ML models typically treat genes as independent predictors, with no consideration of structural and functional linkages; they also may not be accurate when new mutational variants of known AMR genes emerge. Finally, to have the technology trusted by end users in public health settings, ML models need to be transparent and explainable to ensure that the basis for prediction is clear. We strongly advocate that the next set of AMR-ML studies should focus on addressing these limitations in order to bridge the gap to diagnostic implementation.


Subjects
Anti-Bacterial Agents, Bacterial Drug Resistance, Anti-Bacterial Agents/pharmacology, Anti-Bacterial Agents/therapeutic use, Bacterial Drug Resistance/genetics, Machine Learning
9.
Bioinformatics ; 38(11): 3051-3061, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35536192

ABSTRACT

MOTIVATION: There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein-protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations. RESULTS: We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. deepSimDEF's key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by >5-10% on yeast and human reference datasets on protein-protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/ahmadpgh/deepSimDEF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Proteins, Humans, Gene Ontology, Computational Biology/methods, Molecular Sequence Annotation, Software, Saccharomyces cerevisiae, RNA
10.
Microb Genom ; 7(1)2021 01.
Article in English | MEDLINE | ID: mdl-33416461

ABSTRACT

Diagnosing antimicrobial resistance (AMR) in the clinic is based on empirical evidence and current gold standard laboratory phenotypic methods. Genotypic methods have the potential advantages of being faster and cheaper, and having improved mechanistic resolution over phenotypic methods. We generated and applied rule-based and logistic regression models to predict the AMR phenotype from Escherichia coli and Pseudomonas aeruginosa multidrug-resistant clinical isolate genomes. By inspecting and evaluating these models, we identified previously unknown β-lactamase substrate activities. In total, 22 unknown β-lactamase substrate activities were experimentally validated using targeted gene expression studies. Our results demonstrate that generating and analysing predictive models can help guide researchers to the mechanisms driving resistance and improve annotation of AMR genes and phenotypic prediction, and suggest that we cannot solely rely on curated knowledge to predict resistance phenotypes.
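To make the idea of inspecting a logistic regression model concrete, here is a minimal sketch that fits one by plain gradient descent on gene presence/absence features and then reads off the learned weights. It is not the authors' model. The three genes, the eight isolates, the labels, and the function names are all hypothetical, and a real analysis would normally use an established library with regularization and held-out evaluation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


def predict(weights, bias, row):
    """Return 1 (resistant) or 0 (susceptible) for one isolate."""
    return 1 if sigmoid(bias + sum(w * v for w, v in zip(weights, row))) >= 0.5 else 0


# Hypothetical isolates: columns are presence (1) or absence (0) of gene_A, gene_B, gene_C.
# In this toy data set an isolate is resistant exactly when gene_A is present.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0],
     [0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

weights, bias = train_logistic(X, y)
predictions = [predict(weights, bias, row) for row in X]
print(predictions)
print([round(w, 2) for w in weights])
```

Inspecting the fitted weights shows a large positive weight on gene_A and weights near zero on the other two genes, which is the kind of model inspection the abstract describes as a guide to candidate resistance mechanisms.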


Subjects
Anti-Bacterial Agents/pharmacology, Computational Biology/methods, Bacterial Drug Resistance, Escherichia coli/enzymology, Pseudomonas aeruginosa/enzymology, beta-Lactamases/metabolism, Algorithms, Computer Simulation, Data Curation, Escherichia coli/drug effects, Escherichia coli/genetics, Gene Expression Profiling, Bacterial Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Logistic Models, Pseudomonas aeruginosa/drug effects, Pseudomonas aeruginosa/genetics, Whole Genome Sequencing
11.
Microb Genom ; 6(10)2020 10.
Article in English | MEDLINE | ID: mdl-33001022

ABSTRACT

Metagenomic methods enable the simultaneous characterization of microbial communities without time-consuming and bias-inducing culturing. Metagenome-assembled genome (MAG) binning methods aim to reassemble individual genomes from this data. However, the recovery of mobile genetic elements (MGEs), such as plasmids and genomic islands (GIs), by binning has not been well characterized. Given the association of antimicrobial resistance (AMR) genes and virulence factor (VF) genes with MGEs, studying their transmission is a public-health priority. The variable copy number and sequence composition of MGEs makes them potentially problematic for MAG binning methods. To systematically investigate this issue, we simulated a low-complexity metagenome comprising 30 GI-rich and plasmid-containing bacterial genomes. MAGs were then recovered using 12 current prediction pipelines and evaluated. While 82-94 % of chromosomes could be correctly recovered and binned, only 38-44 % of GIs and 1-29 % of plasmid sequences were found. Strikingly, no plasmid-borne VF nor AMR genes were recovered, and only 0-45 % of AMR or VF genes within GIs. We conclude that short-read MAG approaches, without further optimization, are largely ineffective for the analysis of mobile genes, including those of public-health importance, such as AMR and VF genes. We propose that researchers should explore developing methods that optimize for this issue and consider also using unassembled short reads and/or long-read approaches to more fully characterize metagenomic data.


Subjects
Bacteria/genetics, Genomic Islands/genetics, Metagenome/genetics, Metagenomics/methods, Plasmids/genetics, Algorithms, Computer Simulation, Bacterial Genome/genetics, Microbiota/genetics, DNA Sequence Analysis
12.
Bioinformatics ; 36(10): 3043-3048, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32108861

ABSTRACT

MOTIVATION: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. RESULTS: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb's high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm's read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. AVAILABILITY AND IMPLEMENTATION: Documentation, source code and docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Archaea, Metagenomics, Archaea/genetics, Bacteria/genetics, Humans, Metagenome, Software
13.
Nucleic Acids Res ; 48(D1): D517-D525, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665441

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Subjects
Genetic Databases, Bacterial Drug Resistance, Bacterial Genes, Software, Bacteria/drug effects, Bacteria/genetics, Bacterial Proteins/chemistry, Bacterial Proteins/genetics, Bacterial Proteins/metabolism
14.
mSystems ; 4(4)2019 Aug 06.
Article in English | MEDLINE | ID: mdl-31387929

ABSTRACT

Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken in farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly (0.92 to 0.99) precise logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained "reference-free" k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″)-Ib are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE: Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. This type of surveillance is very important to try to reduce the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading causes of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.
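The "reference-free" k-mer features mentioned in this abstract can be pictured as a presence/absence matrix built directly from sequence, with no gene annotation step. The sketch below builds such a matrix for two short made-up sequences. The function names, the sequences, and the choice of k = 3 are assumptions for illustration; the study's actual feature extraction and its set-covering machine are not reproduced here.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_matrix(genomes, k):
    """Build a k-mer presence/absence row per genome over a shared sorted vocabulary."""
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_genome.values()))
    rows = {
        name: [1 if kmer in found else 0 for kmer in vocabulary]
        for name, found in per_genome.items()
    }
    return vocabulary, rows


# Hypothetical toy sequences standing in for assembled isolate genomes.
genomes = {"isolate_1": "ACGTAC", "isolate_2": "ACGTTT"}
vocabulary, rows = presence_matrix(genomes, k=3)
print(vocabulary)
print(rows)
```

Each column of the resulting matrix is one candidate feature, so a model trained on it can point back to specific k-mers rather than to pre-annotated genes.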

15.
Mol Ecol Resour ; 19(1): 272-282, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30312001

ABSTRACT

Restriction site-associated DNA sequencing (RADseq) is a powerful tool for genotyping of individuals, but the identification of loci and assignment of sequence reads is a crucial and often challenging step. The optimal parameter settings for a given de novo RADseq assembly vary between data sets and can be difficult and computationally expensive to determine. Here, we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence-comparison results in a graph eliminates unnecessary and redundant sequence similarity calculations. De novo locus formation for a given parameter set can be performed on the precomputed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide-distance calculation. The performance of RADProc compares favourably with that of the widely used Stacks software. The run-time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green-crab (Carcinus maenas) samples showed that RADProc took as little as 2 hr 40 min compared to 78 hr by Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 and 263 hr, respectively. Comparisons of the de novo loci formed and the catalogs built using both methods demonstrate that the improvement in processing speed achieved by RADProc has little effect on the loci formed or on the results of downstream analyses based on those loci.
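The benefit of precomputing a similarity graph can be sketched in a few lines: pairwise distances are computed once, and each parameter setting then only filters the stored edges. This is a simplified illustration, not RADProc's actual algorithm. It assumes equal-length reads, Hamming distance, and loci defined as connected components; the reads and function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_distance_graph(reads, max_distance):
    """Compute each pairwise distance once, keeping edges up to the largest threshold."""
    edges = {}
    for i, j in combinations(range(len(reads)), 2):
        distance = hamming(reads[i], reads[j])
        if distance <= max_distance:
            edges[(i, j)] = distance
    return edges


def form_loci(n_reads, edges, threshold):
    """Group reads into candidate loci: connected components over edges <= threshold."""
    parent = list(range(n_reads))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), distance in edges.items():
        if distance <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n_reads):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical reads; the graph is built once and reused for every threshold.
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG"]
edges = build_distance_graph(reads, max_distance=3)
for threshold in (0, 1, 2):
    print(threshold, form_loci(len(reads), edges, threshold))
```

The sequence comparisons happen only inside `build_distance_graph`, so sweeping more thresholds adds only the cheap grouping step.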


Subjects
Computational Biology/methods, Genetic Loci, Genomics/methods, DNA Sequence Analysis/methods, Software, Animals, Brachyura/genetics
16.
Microb Ecol ; 77(3): 713-725, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30209585

ABSTRACT

Soil microorganisms are important mediators of carbon cycling in nature. Although cellulose- and hemicellulose-degrading bacteria have been isolated from Algerian ecosystems, information on the composition of soil bacterial communities and thus the potential of their members to decompose plant residues is still limited. The objective of the present study was to describe and compare the bacterial community composition in Algerian soils (crop, forest, garden, and desert) and the activity of cellulose- and hemicellulose-degrading enzymes. Bacterial communities were characterized by high-throughput 16S amplicon sequencing followed by the in silico prediction of their functional potential. The highest lignocellulolytic activity was recorded in forest and garden soils whereas activities in the agricultural and desert soils were typically low. The bacterial phyla Proteobacteria (in particular classes α-proteobacteria, δ-proteobacteria, and γ-proteobacteria), Firmicutes, and Actinobacteria dominated in all soils. Forest and garden soils exhibited higher diversity than agricultural and desert soils. Endocellulase activity was elevated in forest and garden soils. In silico analysis predicted a higher share of genes assigned to general metabolism in forest and garden soils compared with agricultural and arid soils, particularly in carbohydrate metabolism. The highest potential of lignocellulose decomposition was predicted for forest soils, which is in agreement with the highest activity of corresponding enzymes.


Subjects
Bacteria/enzymology, Bacterial Proteins/metabolism, Cellulase/metabolism, Glycoside Hydrolases/metabolism, Soil Microbiology, Soil/chemistry, Algeria, Bacteria/classification, Bacteria/genetics, Bacteria/isolation & purification, Bacterial Proteins/genetics, Cellulase/genetics, Ecosystem, Forests, Glycoside Hydrolases/genetics, Phylogeny
17.
Methods Mol Biol ; 1849: 113-129, 2018.
Article in English | MEDLINE | ID: mdl-30298251

ABSTRACT

Microbial marker-gene sequence data can be used to generate comprehensive taxonomic profiles of the microorganisms present in a given community and for other community diversity analyses. The process of going from raw gene sequences to taxonomic profiles or diversity measures involves a series of data transformations performed by numerous computational tools. This includes tools for sequence quality checking, denoising, taxonomic classification, alignment, and phylogenetic tree building. In this chapter, we demonstrate how the Quantitative Insights Into Microbial Ecology version 2 (QIIME2) software suite can simplify 16S rRNA marker-gene analysis. We walk through an example data set extracted from the guts of bumblebees in order to show how QIIME2 can transform raw sequences into taxonomic bar plots, phylogenetic trees, principal co-ordinates analyses, and other visualizations of microbial diversity.


Subjects
Bacteria/genetics , Genetic Markers , High-Throughput Nucleotide Sequencing/methods , RNA, Ribosomal, 16S/genetics , Software , Bacteria/classification , Bacteria/isolation & purification , Biodiversity , Computational Biology/methods , Phylogeny
18.
Methods Mol Biol ; 1849: 169-177, 2018.
Article in English | MEDLINE | ID: mdl-30298254

ABSTRACT

Marker-gene sequencing is a cost-effective method of taxonomically profiling microbial communities. Unlike metagenomic approaches, marker-gene sequencing does not provide direct information about the functional genes that are present in the genomes of community members. However, by capitalizing on the rapid growth in the number of sequenced genomes, it is possible to infer which functions are likely associated with a marker gene based on its sequence similarity with a reference genome. The PICRUSt tool is based on this idea and can predict functional category abundances based on an input marker gene. In brief, this method requires a reference phylogeny with tips corresponding to taxa with reference genomes as well as taxa lacking sequenced genomes. A modified ancestral state reconstruction (ASR) method is then used to infer counts of functional categories for taxa without reference genomes. The predictions are written to pre-calculated files, which can be cross-referenced with other datasets to quickly generate predictions of functional potential for a community. This chapter will give an in-depth description of these methods and describe how PICRUSt should be used.
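The cross-referencing step described above can be pictured with a minimal sketch. This is not PICRUSt's own code or file format: the table `PRECALCULATED`, the function `predict_functions`, the taxon and function labels, and all numbers are hypothetical and invented for illustration. The sketch assumes that each taxon already has a predicted count for every functional category, and it simply weights those counts by the taxon's abundance in a sample.

```python
from collections import defaultdict

# Hypothetical pre-calculated table: predicted copies of each functional
# category per taxon. All labels and values are invented for illustration.
PRECALCULATED = {
    "taxon_A": {"func_1": 2.0, "func_2": 0.0},
    "taxon_B": {"func_1": 1.0, "func_2": 3.0},
}


def predict_functions(sample_counts, precalculated):
    """Sum per-taxon predicted function counts, weighted by taxon abundance."""
    totals = defaultdict(float)
    for taxon, abundance in sample_counts.items():
        if taxon not in precalculated:
            continue  # taxa without a prediction are skipped in this sketch
        for function, copies in precalculated[taxon].items():
            totals[function] += abundance * copies
    return dict(totals)


# Hypothetical marker-gene counts for one sample.
sample = {"taxon_A": 10, "taxon_B": 5}
predicted = predict_functions(sample, PRECALCULATED)
print(predicted)
```

Because the per-taxon predictions are looked up rather than recomputed, the per-sample work reduces to a weighted sum, which is why pre-calculated files make community-level predictions quick.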


Subjects
Bacteria/genetics , Computational Biology/methods , Genetic Markers , High-Throughput Nucleotide Sequencing/methods , Metagenome , Microbiota , Software , Bacteria/classification , Bacteria/isolation & purification , Biodiversity , Phylogeny
19.
Ecol Evol ; 8(14): 7002-7013, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30073062

ABSTRACT

Restriction-site associated DNA sequencing (RAD-seq) can identify and score thousands of genetic markers from a group of samples for population-genetics studies. One challenge of de novo RAD-seq analysis is to distinguish paralogous sequence variants (PSVs) from true single-nucleotide polymorphisms (SNPs) associated with orthologous loci. In the absence of a reference genome, it is difficult to differentiate true SNPs from PSVs, and their impact on downstream analysis remains unclear. Here, we introduce a network-based approach, PMERGE, which connects fragments based on their DNA sequence similarity to identify probable PSVs. Applying our method to de novo RAD-seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations across the Southern Newfoundland coast identified 87% of the total PSVs found through alignment to the Atlantic salmon genome. Removal of these paralogs altered the inferred population structure, highlighting the potential impact of filtering in RAD-seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set consisting of 242 samples from 11 different locations, where it identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package.
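The general idea of a similarity network can be illustrated with a short sketch. It is not the PMERGE algorithm: the identity measure, the 0.9 threshold, the function names, and the toy sequences are all assumptions chosen for illustration. Loci are treated as nodes, an edge joins two loci whose sequences are sufficiently similar, and connected components containing more than one locus are flagged as candidate paralog clusters.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def similarity_clusters(loci, threshold=0.9):
    """Return connected components (size > 1) of a sequence-similarity graph.

    The threshold is a hypothetical value, not one taken from PMERGE.
    """
    parent = {name: name for name in loci}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in combinations(loci, 2):
        same_length = len(loci[a]) == len(loci[b])
        if same_length and identity(loci[a], loci[b]) >= threshold:
            parent[find(a)] = find(b)  # join the two components

    clusters = {}
    for name in loci:
        clusters.setdefault(find(name), set()).add(name)
    return [members for members in clusters.values() if len(members) > 1]


# Toy fragments invented for illustration.
loci = {
    "locus_1": "ACGTACGTAC",
    "locus_2": "ACGTACGTAT",  # one mismatch relative to locus_1
    "locus_3": "TTTTGGGGCC",
}
flagged = similarity_clusters(loci, threshold=0.9)
print(flagged)
```

In this toy input, `locus_1` and `locus_2` fall into one component and would be flagged together, while `locus_3` stays unconnected and is treated as a singleton.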

20.
Mol Ecol ; 27(20): 4026-4040, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30152128

ABSTRACT

Conservation of exploited species requires an understanding of both genetic diversity and the dominant structuring forces, particularly near range limits, where climatic variation can drive rapid expansions or contractions of geographic range. Here, we examine population structure and landscape associations in Atlantic salmon (Salmo salar) across a heterogeneous landscape near the northern range limit in Labrador, Canada. Analysis of two amplicon-based data sets containing 101 microsatellites and 376 single nucleotide polymorphisms (SNPs) from 35 locations revealed clear differentiation between populations spawning in rivers flowing into a large marine embayment (Lake Melville) compared to coastal populations. The mechanisms influencing the differentiation of embayment populations were investigated using both multivariate and machine-learning landscape genetic approaches. We identified temperature as the strongest correlate with genetic structure, particularly warm temperature extremes and wider annual temperature ranges. The genomic basis of this divergence was further explored using a subset of locations (n = 17) and a 220K SNP array. SNPs associated with spatial structuring and temperature mapped to a diverse set of genes and molecular pathways, including regulation of gene expression, immune response, and cell development and differentiation. The results spanning molecular marker types and both novel and established methods clearly show climate-associated, fine-scale population structure across an environmental gradient in Atlantic salmon near its range limit in North America, highlighting valuable approaches for predicting population responses to climate change and managing species sustainability.


Subjects
Genetics, Population/methods , Microsatellite Repeats/genetics , Salmo salar/genetics , Animals , North America , Polymorphism, Single Nucleotide/genetics
...