Results 1 - 20 of 56
1.
J Clin Immunol ; 44(6): 133, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38780872

ABSTRACT

PURPOSE: A large proportion of common variable immunodeficiency (CVID) patients have duodenal inflammation with increased intraepithelial lymphocytes (IEL) of unknown aetiology. The histologic similarities to celiac disease lead to confusion regarding treatment (gluten-free diet) of these patients. We aimed to elucidate the role of epigenetic DNA methylation in the aetiology of duodenal inflammation in CVID and differentiate it from true celiac disease. METHODS: DNA was isolated from snap-frozen pieces of duodenal biopsies and analysed for differences in genome-wide epigenetic DNA methylation between CVID patients with increased IEL (CVID_IEL; n = 5), CVID patients without increased IEL (CVID_N; n = 3), celiac disease patients (n = 3) and healthy controls (n = 3). RESULTS: The DNA methylation data of 5-methylcytosine in CpG sites separated CVID and celiac disease from healthy controls. Genes with differentially methylated promoters were identified as potential novel mediators in CVID and celiac disease. There was limited overlap of methylation-associated genes between CVID_IEL and celiac disease. A high frequency of differentially methylated CpG sites was detected in over 100 genes near the transcription start site (TSS) in both CVID_IEL and celiac disease, compared to healthy controls. Differential methylation of genes involved in regulation of TNF/cytokine production was enriched in CVID_IEL, compared to healthy controls. CONCLUSION: This is the first study to reveal a role of epigenetic DNA methylation in the aetiology of duodenal inflammation of CVID patients, distinguishing CVID_IEL from celiac disease. We identified potential biomarkers and therapeutic targets within gene promoters and in high-frequency differentially methylated CpG regions proximal to TSS in both CVID_IEL and celiac disease.


Subject(s)
Celiac Disease , Common Variable Immunodeficiency , CpG Islands , DNA Methylation , Duodenum , Epigenesis, Genetic , Humans , Common Variable Immunodeficiency/genetics , Duodenum/metabolism , Duodenum/pathology , Celiac Disease/genetics , Female , Male , Adult , Middle Aged , CpG Islands/genetics , Promoter Regions, Genetic/genetics , Intraepithelial Lymphocytes/immunology , Young Adult , Genome-Wide Association Study , 5-Methylcytosine/metabolism
2.
Sci Adv ; 10(16): eadk4825, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38630812

ABSTRACT

The ability of epithelial monolayers to self-organize into a dynamic polarized state, where cells migrate in a uniform direction, is essential for tissue regeneration, development, and tumor progression. However, the mechanisms governing long-range polar ordering of motility direction in biological tissues remain unclear. Here, we investigate the self-organizing behavior of quiescent epithelial monolayers that transit to a dynamic state with long-range polar order upon growth factor exposure. We demonstrate that the heightened self-propelled activity of monolayer cells leads to formation of vortex-antivortex pairs that undergo sequential annihilation, ultimately driving the spread of long-range polar order throughout the system. A computational model, which treats the monolayer as an active elastic solid, accurately replicates this behavior, and weakening of cell-to-cell interactions impedes vortex-antivortex annihilation and polar ordering. Our findings uncover a mechanism in epithelia, where elastic solid material characteristics, activated self-propulsion, and topology-mediated guidance converge to fuel a highly efficient polar self-ordering activity.


Subject(s)
Cell Communication , Cell Movement , Epithelium
3.
Nat Commun ; 15(1): 1791, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38424056

ABSTRACT

Stool samples for fecal immunochemical tests (FIT) are collected in large numbers worldwide as part of colorectal cancer screening programs. Employing FIT samples from 1034 CRCbiome participants, recruited from a Norwegian colorectal cancer screening study, we identify, annotate and characterize more than 18000 DNA viruses, using shotgun metagenome sequencing. Only six percent of them are assigned to a known taxonomic family, with Microviridae being the most prevalent viral family. Linking individual profiles to comprehensive lifestyle and demographic data shows 17 of 25 variables to be associated with the gut virome. Physical activity, smoking, and dietary fiber consumption exhibit strong and consistent associations with both diversity and relative abundance of individual viruses, as well as with enrichment for auxiliary metabolic genes. We demonstrate the suitability of FIT samples for virome analysis, opening an opportunity for large-scale studies of this enigmatic part of the gut microbiome. The diverse viral populations and their connections to the individual lifestyle uncovered herein pave the way for further exploration of the role of the gut virome in health and disease.


Subject(s)
Colorectal Neoplasms , Viruses , Humans , Virome , DNA Viruses/genetics , Viruses/genetics , DNA , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics
4.
BMC Bioinformatics ; 24(1): 371, 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37784008

ABSTRACT

BACKGROUND: Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools that we identified, as designed specifically to perform host contamination sequence removal, were either outdated, not maintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods. RESULTS: HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads. CONCLUSIONS: To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. 
In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).


Subject(s)
Microbiota , Software , Humans , Metagenome , Sequence Analysis, DNA/methods , Microbiota/genetics , High-Throughput Nucleotide Sequencing/methods
5.
PLoS One ; 18(7): e0286330, 2023.
Article in English | MEDLINE | ID: mdl-37467208

ABSTRACT

Many high-throughput sequencing datasets can be represented as objects with coordinates along a reference genome. Currently, biological investigations often involve a large number of such datasets, for example representing different cell types or epigenetic factors. Drawing overall conclusions from a large collection of results for individual datasets may be challenging and time-consuming. Meaningful interpretation often requires the results to be aggregated according to metadata that represents biological characteristics of interest. In this light, we here propose the hierarchical Genomic Suite HyperBrowser (hGSuite), an open-source extension to the GSuite HyperBrowser platform, which aims to provide a means for extracting key results from an aggregated collection of high-throughput DNA sequencing data. The hGSuite utilizes a metadata-informed data cube to calculate various statistics across the multiple dimensions of the datasets. With this work, we show that the hGSuite and its associated data cube methodology offers a quick and accessible way for exploratory analysis of large genomic datasets. The web-based toolkit named hGsuite Hyperbrowser is available at https://hyperbrowser.uio.no/hgsuite under a GPLv3 license.


Subject(s)
Metadata , Software , Genomics/methods , Genome , Internet
6.
Genome Biol ; 23(1): 247, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36451166

ABSTRACT

DNA loop extrusion emerges as a key process establishing genome structure and function. We introduce MoDLE, a computational tool for fast, stochastic modeling of molecular contacts from DNA loop extrusion capable of simulating realistic contact patterns genome wide in a few minutes. MoDLE accurately simulates contact maps in concordance with existing molecular dynamics approaches and with Micro-C data and does so orders of magnitude faster than existing approaches. MoDLE runs efficiently on machines ranging from laptops to high performance computing clusters and opens up for exploratory and predictive modeling of 3D genome structure in a wide range of settings.


Subject(s)
DNA
7.
Bioinformatics ; 38(17): 4230-4232, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35852318

ABSTRACT

MOTIVATION: Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching. RESULTS: CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications. AVAILABILITY AND IMPLEMENTATION: CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Ecosystem , Software , Humans , Machine Learning , Benchmarking
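
The overlap computation described in the CompAIRR abstract can be illustrated with a minimal Python sketch. This is not CompAIRR's implementation: the function names, the toy repertoires, and the choice of "at most one mismatch" as the approximate-matching rule are assumptions made only for illustration.

```python
from itertools import combinations


def exact_overlap(rep_a, rep_b):
    """Number of distinct sequences present in both repertoires."""
    return len(set(rep_a) & set(rep_b))


def one_mismatch_keys(seq):
    """All variants of seq with exactly one position masked by '*'."""
    return {seq[:i] + "*" + seq[i + 1:] for i in range(len(seq))}


def approx_overlap(rep_a, rep_b):
    """Distinct sequences in rep_a that have a partner in rep_b
    of equal length differing at no more than one position."""
    masked_b = set()
    for seq in set(rep_b):
        masked_b |= one_mismatch_keys(seq)
    return sum(1 for seq in set(rep_a) if one_mismatch_keys(seq) & masked_b)


def pairwise_overlap(repertoires):
    """Exact overlap for every unordered pair of named repertoires."""
    names = sorted(repertoires)
    return {
        (a, b): exact_overlap(repertoires[a], repertoires[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical toy repertoires (not real data).
toy = {
    "r1": ["CASSL", "CASSQ", "CATTT"],
    "r2": ["CASSL", "CASSP"],
    "r3": ["GGGGG"],
}
overlaps = pairwise_overlap(toy)
```

Hashing masked variants avoids comparing every sequence in one repertoire with every sequence in the other, which is the kind of scaling problem the abstract refers to.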
8.
Scand J Immunol ; 94(1): e13050, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34643957

ABSTRACT

C-type lectin-like domain family 16 member A (CLEC16A) is associated with autoimmune disorders, including multiple sclerosis (MS), but its functional relevance is not completely understood. CLEC16A is expressed in several immune cells, where it affects autophagic processes and receptor expression. Recently, we reported that the risk genotype of an MS-associated single nucleotide polymorphism in CLEC16A intron 19 is associated with higher expression of CLEC16A in CD4+ T cells. Here, we show that CLEC16A expression is induced in CD4+ T cells upon T cell activation. By the use of imaging flow cytometry and confocal microscopy, we demonstrate that CLEC16A is located in Rab4a-positive recycling endosomes in Jurkat TAg T cells. CLEC16A knock-down in Jurkat cells resulted in lower cell surface expression of the T cell receptor. However, this did not have a major impact on the T cell activation response in vitro, either in Jurkat cells or in primary human CD4+ T cells.


Subject(s)
CD4-Positive T-Lymphocytes/immunology , Genetic Predisposition to Disease/genetics , Lectins, C-Type/genetics , Monosaccharide Transport Proteins/genetics , Multiple Sclerosis/genetics , Receptors, Antigen, T-Cell/biosynthesis , rab4 GTP-Binding Proteins/metabolism , Cell Line, Tumor , Endosomes/metabolism , Flow Cytometry , Humans , Jurkat Cells , Lymphocyte Activation/immunology , Microscopy, Confocal , Multiple Sclerosis/immunology , Polymorphism, Single Nucleotide/genetics
9.
BMC Cancer ; 21(1): 930, 2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34407780

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) screening reduces CRC incidence and mortality. However, current screening methods are either hampered by invasiveness or suboptimal performance, limiting their effectiveness as primary screening methods. To aid in the development of a non-invasive screening test with improved sensitivity and specificity, we have initiated a prospective biomarker study (CRCbiome), nested within a large randomized CRC screening trial in Norway. We aim to develop a microbiome-based classification algorithm to identify advanced colorectal lesions in screening participants testing positive for an immunochemical fecal occult blood test (FIT). We will also examine interactions with host factors, diet, lifestyle and prescription drugs. The prospective nature of the study also enables the analysis of changes in the gut microbiome following the removal of precancerous lesions. METHODS: The CRCbiome study recruits participants enrolled in the Bowel Cancer Screening in Norway (BCSN) study, a randomized trial initiated in 2012 comparing once-only sigmoidoscopy to repeated biennial FIT, where women and men aged 50-74 years at study entry are invited to participate. Since 2017, participants randomized to FIT screening with a positive test result have been invited to join the CRCbiome study. Self-reported diet, lifestyle and demographic data are collected prior to colonoscopy after the positive FIT-test (baseline). Screening data, including colonoscopy findings are obtained from the BCSN database. Fecal samples for gut microbiome analyses are collected both before and 2 and 12 months after colonoscopy. Samples are analyzed using metagenome sequencing, with taxonomy profiles, and gene and pathway content as primary measures. CRCbiome data will also be linked to national registries to obtain information on prescription histories and cancer relevant outcomes occurring during the 10 year follow-up period. 
DISCUSSION: The CRCbiome study will increase our understanding of how the gut microbiome, in combination with lifestyle and environmental factors, influences the early stages of colorectal carcinogenesis. This knowledge will be crucial to develop microbiome-based screening tools for CRC. By evaluating biomarker performance in a screening setting, using samples from the target population, the generalizability of the findings to future screening cohorts is likely to be high. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT01538550 .


Subject(s)
Colorectal Neoplasms/diagnosis , Early Detection of Cancer/methods , Gastrointestinal Microbiome , Life Style , Aged , Case-Control Studies , Colonoscopy , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/microbiology , Female , Follow-Up Studies , Humans , Incidence , Male , Middle Aged , Norway/epidemiology , Occult Blood , Prognosis , Prospective Studies , ROC Curve
10.
Bioinformatics ; 38(1): 267-269, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34244702

ABSTRACT

MOTIVATION: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards terabyte sizes. RESULTS: When compared with previous swarm versions, swarm v3 has modernized C++ source code, a memory footprint reduced by up to 50%, and optimized CPU usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are available at https://github.com/torognes/swarm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Cluster Analysis
12.
Microbiome ; 9(1): 79, 2021 03 29.
Article in English | MEDLINE | ID: mdl-33781324

ABSTRACT

BACKGROUND: Studies of shifts in microbial community composition have many applications. For studies at species or subspecies levels, 16S amplicon sequencing lacks resolution and is often replaced by full shotgun sequencing. Due to higher costs, this restricts the number of samples sequenced. As an alternative to full shotgun sequencing, we have investigated the use of Reduced Metagenome Sequencing (RMS) to estimate the composition of a microbial community. This involves the use of double-digested restriction-associated DNA sequencing, which means only a smaller fraction of the genomes is sequenced. The read sets obtained by this approach have properties different from both amplicon and shotgun data, and analysis pipelines for either can either not be used at all or do not exploit the full potential of RMS data. RESULTS: We suggest a procedure for analyzing such data, based on fragment clustering and the use of a constrained ordinary least squares de-convolution for estimating the relative abundance of all community members. Mock community datasets show the potential to clearly separate strains even when the 16S is 100% identical and genome-wide differences are < 0.02, indicating RMS has a very high resolution. From a simulation study, we compare RMS to shotgun sequencing and show that we get improved abundance estimates when the community has many very closely related genomes. From a real dataset of infant guts, we show that RMS is capable of detecting a strain diversity gradient for Escherichia coli across time. CONCLUSION: We find that RMS is a good alternative to either metabarcoding or shotgun sequencing when it comes to resolving microbial communities at the strain level. Like shotgun metagenomics, it requires a good database of reference genomes and is well suited for studies of the human gut or other communities where many reference genomes exist. A data analysis pipeline is offered as an R package at https://github.com/larssnip/microRMS . Video abstract.


Subject(s)
Metagenome , Microbiota , Humans , Metagenomics , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
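
The constrained least-squares de-convolution mentioned in the RMS abstract can be sketched as follows. This is an illustrative Python sketch, not the microRMS package: it assumes the constraint is non-negativity of the abundances, solves the problem by projected gradient descent, and uses a hypothetical function name and toy matrix. `A[i][j]` is taken to be the copy number of fragment cluster `i` in genome `j`, and `y[i]` the read count observed for cluster `i`; `A` must contain at least one non-zero entry.

```python
def estimate_abundance(A, y, iterations=2000):
    """Minimise ||A x - y||^2 subject to x >= 0 by projected gradient
    descent, then normalise x so the abundances sum to one."""
    n_rows, n_cols = len(A), len(A[0])
    # trace(A^T A) bounds the largest eigenvalue of A^T A, so 1/trace
    # is a safe (if conservative) step size.
    trace = sum(A[i][j] ** 2 for i in range(n_rows) for j in range(n_cols))
    step = 1.0 / trace
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
            for i in range(n_rows)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical toy example: two genomes, three fragment clusters.
A = [[1, 0], [0, 1], [1, 1]]
y = [3, 1, 4]
abundance = estimate_abundance(A, y)
```

In this toy case the third fragment cluster is shared by both genomes, so its reads cannot be assigned directly; the de-convolution splits them according to the genome-specific clusters.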
13.
Comput Struct Biotechnol J ; 18: 2877-2889, 2020.
Article in English | MEDLINE | ID: mdl-33163148

ABSTRACT

DNA methylation (5mC) and hydroxymethylation (5hmC) are chemical modifications of cytosine bases which play a crucial role in epigenetic gene regulation. However, cost, data complexity and the unavailability of comprehensive analytical tools are among the major challenges in exploring these epigenetic marks. Hydroxymethylation- and Methylation-Sensitive Tag sequencing (HMST-seq) is one of the most cost-effective techniques that enable simultaneous detection of 5mC and 5hmC at single base pair resolution. We present HMST-Seq-Analyzer as a comprehensive and robust method for performing simultaneous differential methylation analysis on 5mC and 5hmC data sets. HMST-Seq-Analyzer can detect Differentially Methylated Regions (DMRs), annotate them, give a visual overview of methylation status and also perform a preliminary quality check on the data. In addition to HMST-Seq, our tool can be used on whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data sets as well. The tool is written in Python with capacity to process data in parallel and is available at https://hmst-seq.github.io/hmst/.

14.
BMC Bioinformatics ; 21(1): 66, 2020 Feb 21.
Article in English | MEDLINE | ID: mdl-32085722

ABSTRACT

BACKGROUND: Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. The analysis results are highly dependent on the quality of the genome assemblies used. Assessment of the assembly accuracy may significantly increase the reliability of the analysis results and is therefore of great importance. RESULTS: Here, we present a new tool called NucBreak aimed at localizing structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. The approach taken by existing alternative tools is based on analysing reads that do not map properly to the assembly, for instance discordantly mapped reads, soft-clipped reads and singletons. NucBreak uses an entirely different and unique method to localise the errors. It is based on analysing the alignments of reads that are properly mapped to an assembly and exploits information about the alternative read alignments. It does not annotate detected errors. We have compared NucBreak with other existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam, as well as with several structural variant detection tools, including BreakDancer, Lumpy, and Wham, by using both simulated and real datasets. CONCLUSIONS: The benchmarking results have shown that NucBreak in general predicts assembly errors of different types and sizes with relatively high sensitivity and with a lower false discovery rate than the other tools. Such a balance between sensitivity and false discovery rate makes NucBreak a good alternative to the existing assembly accuracy assessment tools and SV detection tools. NucBreak is freely available at https://github.com/uio-bmi/NucBreak under the MPL license.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genome , Reproducibility of Results , Software
15.
NAR Cancer ; 2(3): zcaa019, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33554121

ABSTRACT

In B lymphocytes, the uracil N-glycosylase (UNG) excises genomic uracils made by activation-induced deaminase (AID), thus underpinning antibody gene diversification and oncogenic chromosomal translocations, but also initiating faithful DNA repair. Ung-/- mice develop B-cell lymphoma (BCL). However, since UNG has anti- and pro-oncogenic activities, its tumor suppressor relevance is unclear. Moreover, how the constant DNA damage and repair caused by the AID and UNG interplay affects B-cell fitness and thereby the dynamics of cell populations in vivo is unknown. Here, we show that UNG specifically protects the fitness of germinal center B cells, which express AID, and not of any other B-cell subset, coincident with AID-induced telomere damage activating p53-dependent checkpoints. Consistent with AID expression being detrimental in UNG-deficient B cells, Ung-/- mice develop BCL originating from activated B cells but lose AID expression in the established tumor. Accordingly, we find that UNG is rarely lost in human BCL. The fitness preservation activity of UNG contingent to AID expression was confirmed in a B-cell leukemia model. Hence, UNG, typically considered a tumor suppressor, acquires tumor-enabling activity in cancer cell populations that express AID by protecting cell fitness.

16.
Sci Rep ; 7(1): 7199, 2017 08 03.
Article in English | MEDLINE | ID: mdl-28775312

ABSTRACT

Both a DNA lesion and an intermediate for antibody maturation, uracil is primarily processed by base excision repair (BER), either initiated by uracil-DNA glycosylase (UNG) or by single-strand selective monofunctional uracil DNA glycosylase (SMUG1). The relative in vivo contributions of each glycosylase remain elusive. To assess the impact of SMUG1 deficiency, we measured uracil and 5-hydroxymethyluracil, another SMUG1 substrate, in Smug1 -/- mice. We found that 5-hydroxymethyluracil accumulated in Smug1 -/- tissues and correlated with 5-hydroxymethylcytosine levels. The highest increase was found in brain, which contained about 26-fold higher genomic 5-hydroxymethyluracil levels than the wild type. Smug1 -/- mice did not accumulate uracil in their genome and Ung -/- mice showed slightly elevated uracil levels. In contrast, Ung -/- Smug1 -/- mice showed a synergistic increase in uracil levels, with up to 25-fold higher uracil levels than wild type. Whole genome sequencing of UNG/SMUG1-deficient tumours revealed that combined UNG and SMUG1 deficiency leads to the accumulation of mutations, primarily C to T transitions within CpG sequences. This unexpected sequence bias suggests that CpG dinucleotides are intrinsically more mutation prone. In conclusion, we showed that SMUG1 efficiently prevents genomic uracil accumulation, even in the presence of UNG, and identified mutational signatures associated with combined UNG and SMUG1 deficiency.


Subject(s)
Cytosine/metabolism , Dinucleoside Phosphates/metabolism , Uracil-DNA Glycosidase/deficiency , Uracil/metabolism , Animals , CpG Islands , Deamination , Genome , Genomics/methods , Mice , Mice, Knockout , Mutation
17.
BMC Bioinformatics ; 18(1): 338, 2017 Jul 12.
Article in English | MEDLINE | ID: mdl-28701187

ABSTRACT

BACKGROUND: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration. RESULTS: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff. CONCLUSIONS: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. 
NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.


Subject(s)
DNA/chemistry , User-Computer Interface , Base Sequence , Genomics , Internet , Sequence Alignment
18.
mSystems ; 1(1)2016.
Article in English | MEDLINE | ID: mdl-27822515

ABSTRACT

Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE: Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).

19.
PeerJ ; 4: e2584, 2016.
Article in English | MEDLINE | ID: mdl-27781170

ABSTRACT

BACKGROUND: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. METHODS: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. RESULTS: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. 
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. DISCUSSION: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
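
The two-stage search described in the METHODS part of the VSEARCH abstract (a shared-word filter followed by global alignment with full dynamic programming) can be sketched in Python as follows. This is an illustration only, not VSEARCH's code: the word length, the scoring values, the number of candidates kept, and all function names are assumptions, and the sketch uses neither vectorisation nor multiple threads.

```python
def kmers(seq, k):
    """Set of all words of length k occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_words(query, target, k=3):
    """Number of distinct k-length words common to query and target."""
    return len(kmers(query, k) & kmers(target, k))


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost),
    computed over the full dynamic programming matrix one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def search(query, targets, k=3, top=2):
    """Keep the `top` targets sharing most words with the query,
    then return the one with the best global alignment score."""
    candidates = sorted(
        targets, key=lambda t: shared_words(query, t, k), reverse=True
    )[:top]
    return max(candidates, key=lambda t: global_alignment_score(query, t))


# Hypothetical toy database.
best = search("ACGTAC", ["TTTTTT", "ACGTAA", "ACGTAC"])
```

The word filter is cheap and discards most of the database, so the quadratic-time alignment only runs on the few candidates that survive it.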

20.
BMC Genomics ; 17(1): 791, 2016 10 10.
Article in English | MEDLINE | ID: mdl-27724857

ABSTRACT

BACKGROUND: As an intracellular human pathogen, Mycobacterium tuberculosis (Mtb) is facing multiple stressful stimuli inside the macrophage and the granuloma. Understanding Mtb responses to stress is essential to identify new virulence factors and pathways that play a role in the survival of the tubercle bacillus. The main goal of this study was to map the regulatory networks of differentially expressed (DE) transcripts in Mtb upon various forms of genotoxic stress. We exposed Mtb cells to oxidative (H2O2 or paraquat), nitrosative (DETA/NO), or alkylation (MNNG) stress or mitomycin C, inducing double-strand breaks in the DNA. Total RNA was isolated from treated and untreated cells and subjected to high-throughput deep sequencing. The data generated was analysed to identify DE genes encoding mRNAs, non-coding RNAs (ncRNAs), and the genes potentially targeted by ncRNAs. RESULTS: The most significant transcriptomic alteration with more than 700 DE genes was seen under nitrosative stress. In addition to genes that belong to the replication, recombination and repair (3R) group, mainly found under mitomycin C stress, we identified DE genes important for bacterial virulence and survival, such as genes of the type VII secretion system (T7SS) and the proline-glutamic acid/proline-proline-glutamic acid (PE/PPE) family. By predicting the structures of hypothetical proteins (HPs) encoded by DE genes, we found that some of these HPs might be involved in mycobacterial genome maintenance. We also applied a state-of-the-art method to predict potential target genes of the identified ncRNAs and found that some of these could regulate several genes that might be directly involved in the response to genotoxic stress. CONCLUSIONS: Our study reflects the complexity of the response of Mtb in handling genotoxic stress. In addition to genes involved in genome maintenance, other potential key players, such as the members of the T7SS and PE/PPE gene family, were identified. 
This plethora of responses is detected not only at the level of DE genes encoding mRNAs but also at the level of ncRNAs and their potential targets.


Subject(s)
DNA Damage , Gene Expression Regulation, Bacterial/drug effects , Mycobacterium tuberculosis/genetics , Transcriptome , Cluster Analysis , DNA Damage/drug effects , Gene Expression Profiling , Humans , Hydrogen Peroxide/toxicity , Methylnitronitrosoguanidine/toxicity , Mycobacterium tuberculosis/drug effects , Type VII Secretion Systems/genetics