DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S.

Bioinformatics ; 28(19): 2527-9, 2012 Oct 01.

Article in English | MEDLINE | ID: mdl-22833526

ABSTRACT

SUMMARY: An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma.

AVAILABILITY AND IMPLEMENTATION: Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE.

CONTACT: sharmila@atc.tcs.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
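The abstract compares DELIMINATE against gzip, bzip2 and lzma. As a rough illustration of how such a baseline might be measured, the sketch below compresses a nucleotide sequence with Python's standard-library bindings for those three codecs and checks that each round trip is loss-less. It does not implement DELIMINATE, whose algorithm is not described in this record. The helper names `compression_report` and `random_sequence` are hypothetical, and the random test sequence is an assumption that will not reflect the repeat structure of real genomes.

```python
import bz2
import gzip
import lzma
import random
from typing import Dict


def compression_report(data: bytes) -> Dict[str, int]:
    """Return the compressed size in bytes for each general-purpose codec."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    report = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Loss-less check: decompression must reproduce the input exactly.
        if decompress(packed) != data:
            raise ValueError(f"{name} round trip altered the data")
        report[name] = len(packed)
    return report


def random_sequence(length: int, seed: int = 0) -> bytes:
    """Build a reproducible pseudo-random A/C/G/T sequence (illustrative only)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


if __name__ == "__main__":
    seq = random_sequence(100_000)
    for name, size in compression_report(seq).items():
        print(f"{name}: {size} bytes (ratio {len(seq) / size:.2f})")
```

On real data, the same function could be applied to the bytes of a FASTA file to obtain baseline figures against which a specialised compressor can be compared.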

Subject(s)

Algorithms , Computational Biology/methods , Data Compression/methods , Genomics/methods , Base Sequence , Sequence Analysis, DNA/methods

Eu-Detect: an algorithm for detecting eukaryotic sequences in metagenomic data sets.

Mohammed, Monzoorul Haque; Chadaram, Sudha; Komanduri, Dinakar; Ghosh, Tarini Shankar; Mande, Sharmila S.

J Biosci ; 36(4): 709-17, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21857117

ABSTRACT

Physical partitioning techniques are routinely employed (during sample preparation stage) for segregating the prokaryotic and eukaryotic fractions of metagenomic samples. In spite of these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported the presence of contaminating eukaryotic sequences in metagenomic data sets. Contaminating sequences originate not only from genomes of micro-eukaryotic species but also from genomes of (higher) eukaryotic host cells. The latter scenario usually occurs in the case of host-associated metagenomes. Identification and removal of contaminating sequences is important, since these sequences not only impact estimates of microbial diversity but also affect the accuracy of several downstream analyses. Currently, the computational techniques used for identifying contaminating eukaryotic sequences, being alignment based, are slow, inefficient, and require huge computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that on a desktop with modest hardware specifications, the Eu-Detect algorithm is able to rapidly segregate DNA sequence fragments of prokaryotic and eukaryotic origin, with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.

Subject(s)

Algorithms , Genome, Archaeal , Genome, Bacterial , Metagenome , Metagenomics/methods , Sequence Analysis, DNA/methods , Artifacts , Base Sequence , Eukaryota/genetics , Sensitivity and Specificity , Software

HabiSign: a novel approach for comparison of metagenomes and rapid identification of habitat-specific sequences.

Ghosh, Tarini Shankar; Mohammed, Monzoorul Haque; Rajasingh, Hannah; Chadaram, Sudha; Mande, Sharmila S.

BMC Bioinformatics ; 12 Suppl 13: S9, 2011.

Article in English | MEDLINE | ID: mdl-22373355

ABSTRACT

BACKGROUND: One of the primary goals of comparative metagenomic projects is to study the differences in the microbial communities residing in diverse environments. Besides providing valuable insights into the inherent structure of the microbial populations, these studies have potential applications in several important areas of medical research like disease diagnostics, detection of pathogenic contamination and identification of hitherto unknown pathogens. Here we present a novel and rapid, alignment-free method called HabiSign, which utilizes patterns of tetra-nucleotide usage in microbial genomes to bring out the differences in the composition of both diverse and related microbial communities.

RESULTS: Validation results show that the metagenomic signatures obtained using the HabiSign method are able to accurately cluster metagenomes at biome, phenotypic and species levels, as compared to an average tetranucleotide frequency based approach and the recently published dinucleotide relative abundance based approach. More importantly, the method is able to identify subsets of sequences that are specific to a particular habitat. Apart from this, being alignment-free, the method can rapidly compare and group multiple metagenomic data sets in a short span of time.

CONCLUSIONS: The proposed method is expected to have immense applicability in diverse areas of metagenomic research ranging from disease diagnostics and pathogen detection to bio-prospecting. A web-server for the HabiSign algorithm is available at http://metagenomics.atc.tcs.com/HabiSign/.
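The abstract refers to patterns of tetranucleotide usage. As a minimal sketch of that underlying idea only, the code below counts overlapping 4-mers in a sequence, normalises them into a 256-element frequency vector, and compares two vectors by Euclidean distance. This is not the HabiSign method, whose signature construction is not described in this record. The function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product
from typing import List

# All 256 possible tetranucleotides over A, C, G, T, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq: str) -> List[float]:
    """Return relative frequencies of overlapping 4-mers in `seq`.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    v1 = tetranucleotide_frequencies("ACGTACGTACGTTTGACCA")
    v2 = tetranucleotide_frequencies("GGGCGCGCATATATCGCGC")
    print(f"distance: {euclidean_distance(v1, v2):.4f}")
```

Because the vectors are computed directly from the sequences, no alignment against a reference database is needed, which is the general property that alignment-free approaches rely on.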

Subject(s)

Algorithms , Bacterial Typing Techniques , Metagenome , Animals , Metagenomics

i-rDNA: alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets.

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S.

BMC Genomics ; 12 Suppl 3: S12, 2011 Nov 30.

Article in English | MEDLINE | ID: mdl-22369265

ABSTRACT

BACKGROUND: Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower on experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer-based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity.

RESULTS: Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications.

CONCLUSIONS: In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human effort invested in all metagenomic projects.

AVAILABILITY: A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/

Subject(s)

Algorithms , Metagenomics , RNA, Ribosomal, 16S/genetics , Cloning, Molecular , Databases, Genetic , Internet , Search Engine , Sequence Analysis, RNA
