Results 1 - 18 of 18
1.
Front Microbiol ; 14: 1238829, 2023.
Article in English | MEDLINE | ID: mdl-37744900

ABSTRACT

Background: Multiple variants of the SARS-CoV-2 virus have plagued the world through successive waves of infection over the past three years. Independent research groups across geographies have shown that the microbiome composition in COVID-19 positive patients (CP) differs from that of COVID-19 negative individuals (CN). However, these observations were based on limited-sized sample-sets collected primarily from the early days of the pandemic. Here, we study the nasopharyngeal microbiota in COVID-19 patients, wherein the samples have been collected across the three COVID-19 waves witnessed in India, which were driven by different variants of concern. Methods: The nasopharyngeal swabs were collected from 589 subjects providing samples for diagnostic purposes at the Centre for Cellular and Molecular Biology (CSIR-CCMB), Hyderabad, India, and subjected to 16S rRNA gene amplicon-based sequencing. Findings: We found variations in the microbiota of symptomatic vs. asymptomatic COVID-19 patients. CP showed a marked shift in the microbial diversity and composition compared to CN, in a wave-dependent manner. Rickettsiaceae was the only family that was noted to be consistently depleted in CP samples across the waves. The genera Staphylococcus, Anhydrobacter, Thermus, and Aerococcus were observed to be highly abundant in the symptomatic CP patients when compared to the asymptomatic group. In general, we observed a decrease in the burden of opportunistic pathogens in the host microbiota during the later waves of infection. Interpretation: To our knowledge, this is the first analytical cross-sectional study of this scale, which was designed to understand the relation between the evolving nature of the virus and the changes in the human nasopharyngeal microbiota.
Although no clear signatures were observed, this study shall pave the way for a better understanding of the disease pathophysiology and help gather preliminary evidence on whether interventions to the host microbiota can help in better protection or faster recovery.

2.
Curr Res Microb Sci ; 3: 100127, 2022.
Article in English | MEDLINE | ID: mdl-35909605

ABSTRACT

Gut health is intimately linked to dietary habits and the microbial community (microbiota) that flourishes within. The delicate dependency of the latter on nutritional availability is also strongly influenced by interactions (such as parasitic or mutualistic) between the resident microbes, often affecting their growth rate and ability to produce key metabolites. Since cultivating the entire repertoire of gut microbes is a challenging task, metabolic models (genome-based metabolic reconstructions) could be employed to predict their growth patterns and interactions. Here, we have used 803 gut microbial metabolic models from the Virtual Metabolic Human repository, and subsequently optimized and simulated them to grow on 13 dietary compositions. The presented pairwise interaction data (https://osf.io/ay8bq/) and the associated bacterial growth rates are expected to be useful for (a) deducing microbial association patterns, (b) diet-based inference of personalised gut profiles, and (c) as a stepping stone for studying multi-species metabolic interactions.
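
As an illustration of how a pairwise interaction type might be read off simulated growth rates, the sketch below compares each organism's growth in co-culture with its growth alone. The function name, the 10% change threshold, and the example numbers are assumptions made for illustration; they are not taken from the study or its dataset.

```python
def classify_interaction(mono_a, mono_b, co_a, co_b, threshold=0.10):
    """Label a pairwise interaction from growth rates alone vs. in co-culture.

    threshold is a hypothetical relative-change cutoff (10% by default).
    """
    def effect(mono, co):
        # Organism that cannot grow alone: any growth in co-culture is a gain.
        if mono <= 0:
            return "+" if co > 0 else "0"
        change = (co - mono) / mono
        if change > threshold:
            return "+"
        if change < -threshold:
            return "-"
        return "0"

    labels = {
        ("+", "+"): "mutualism",
        ("-", "-"): "competition",
        ("+", "-"): "parasitism",
        ("-", "+"): "parasitism",
        ("+", "0"): "commensalism",
        ("0", "+"): "commensalism",
        ("-", "0"): "amensalism",
        ("0", "-"): "amensalism",
        ("0", "0"): "neutralism",
    }
    return labels[(effect(mono_a, co_a), effect(mono_b, co_b))]


# Hypothetical growth rates (1/h): A benefits while B is harmed.
print(classify_interaction(0.5, 0.4, 0.7, 0.2))
```

The cutoff matters in practice: a looser threshold labels more pairs as neutral, so any comparison across diets would need the same cutoff throughout.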

3.
Appl Environ Microbiol ; 88(15): e0059622, 2022 08 09.
Article in English | MEDLINE | ID: mdl-35862686

ABSTRACT

The human microbiota, which comprises an ensemble of taxonomically and functionally diverse but often mutually cooperating microorganisms, benefits its host by shaping the host immunity, energy harvesting, and digestion of complex carbohydrates as well as production of essential nutrients. Dysbiosis in the human microbiota, especially the gut microbiota, has been reported to be linked to several diseases and metabolic disorders. Recent studies have further indicated that tracking these dysbiotic variations could potentially be exploited as biomarkers of disease states. However, the human microbiota is not geography agnostic, and hence a taxonomy-based (microbiome) biomarker for disease diagnostics has certain limitations. In comparison, (microbiome) function-based biomarkers are expected to have a wider applicability. Given that (i) the host physiology undergoes certain changes in the course of a disease and (ii) host-associated microbial communities need to adapt to this changing microenvironment of their host, we hypothesized that signatures emanating from the abundance of bacterial proteins associated with the signal transduction system (herein referred to as sensory proteins [SPs]) might be able to distinguish between healthy and diseased states. To test this hypothesis, publicly available metagenomic data sets corresponding to three diverse health conditions, namely, colorectal cancer, type 2 diabetes mellitus, and schizophrenia, were analyzed. Results demonstrated that SP signatures (derived from host-associated metagenomic samples) indeed differentiated among healthy individuals and patients suffering from diseases of various severities. Our findings suggest the prospect of using SP signatures as early biomarkers for diagnosing the onset and progression of multiple diseases and metabolic disorders.
IMPORTANCE The composition of the human microbiota, a collection of host-associated microbes, has been shown to differ among healthy and diseased individuals. Recent studies have investigated whether tracking these variations could be exploited for disease diagnostics. It has been noted that compared to microbial taxonomies, the ensemble of functional proteins encoded by microbial genes is less likely to be affected by changes in ethnicity and dietary preferences. These functions are expected to help the microbe adapt to changing environmental conditions. Thus, healthy individuals might harbor a different set of genes than diseased individuals. To test this hypothesis, we analyzed metagenomes from healthy and diseased individuals for signatures of a particular group of proteins called sensory proteins (SP), which enable the bacteria to sense and react to changes in their microenvironment. Results demonstrated that SP signatures indeed differentiate among healthy individuals and those suffering from diseases of various severities.


Subject(s)
Diabetes Mellitus, Type 2 , Microbiota , Biomarkers , Dysbiosis , Humans , Metagenome
4.
Eur J Nutr ; 61(2): 615-624, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34613432

ABSTRACT

PURPOSE: Rice is a staple food for over 3.5 billion people worldwide. The nutritional content of rice varies with different post-harvest processing techniques. Major varieties include brown rice (BR), white rice (WR) and parboiled rice (PBR). While consumption of BR is advocated due to its higher nutritional content compared to other varieties, some studies have indicated lower post-prandial blood glucose (PPBG) levels when PBR is consumed. This apparent benefit of PBR consumption is not well publicised and no commentaries on underlying mechanisms are available in literature. METHODS: In this review, we looked into differential nutrient content of PBR, as compared to BR and WR, and tried to understand how their consumption could be associated with glycaemic control. Various roles played by these nutrients in mechanisms of insulin secretion, insulin resistance, nutrient absorption and T2DM-associated inflammation were reviewed from literature-based evidence. RESULTS: We report differential nutritional factors in PBR, with respect to BR (and WR), such as higher calcium and selenium content, lower phytic acids, and enriched vitamin B6 which might aid PBR's ability to provide better glycaemic control than BR. CONCLUSION: Our interpretation of reviewed literature leads us to suggest the possible benefits of PBR consumption in glycaemic control and its inclusion as the preferred rice variant in diets of T2DM patients and at-risk individuals.


Subject(s)
Oryza , Blood Glucose , Diet , Glycemic Control , Humans
5.
Virus Res ; 305: 198579, 2021 11.
Article in English | MEDLINE | ID: mdl-34560183

ABSTRACT

The SARS-CoV2 mediated Covid-19 pandemic has impacted humankind at an unprecedented scale. While substantial research efforts have focused on understanding the mechanisms of viral infection and developing vaccines/therapeutics, factors affecting the susceptibility to SARS-CoV2 infection and manifestation of Covid-19 remain less explored. Given that the Human Leukocyte Antigen (HLA) system is known to vary among ethnic populations, it is likely to affect the recognition of the virus, and in turn, the susceptibility to Covid-19. To understand this, we used bioinformatic tools to probe all SARS-CoV2 peptides which could elicit T-cell response in humans. We also tried to answer the intriguing question of whether these potential epitopes were equally immunogenic across ethnicities, by studying the distribution of HLA alleles among different populations and their share of cognate epitopes. Results indicate that the immune recognition potential of SARS-CoV2 epitopes tends to vary between different ethnic groups. While the South Asians are likely to recognize a higher number of CD8-specific epitopes, Europeans are likely to identify a higher number of CD4-specific epitopes. We also hypothesize and provide clues that the newer mutations in SARS-CoV2 are unlikely to alter the T-cell mediated immunogenic responses among the studied ethnic populations. The work presented herein is expected to bolster our understanding of the pandemic, by providing insights into differential immunological response of ethnic populations to the virus as well as by gauging the possible effects of mutations in SARS-CoV2 on efficacy of potential epitope-based vaccines through evaluating ∼40,000 viral genomes.


Subject(s)
COVID-19/immunology , Epitopes, B-Lymphocyte/immunology , Epitopes, T-Lymphocyte/immunology , Ethnicity , Genome, Viral , HLA Antigens/immunology , SARS-CoV-2/immunology , Africa/epidemiology , Alleles , Amino Acid Sequence , Asia/epidemiology , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/virology , COVID-19/epidemiology , COVID-19/genetics , COVID-19/pathology , Computational Biology/methods , Disease Susceptibility , Epitopes, B-Lymphocyte/classification , Epitopes, B-Lymphocyte/genetics , Epitopes, T-Lymphocyte/classification , Epitopes, T-Lymphocyte/genetics , Europe/epidemiology , HLA Antigens/classification , HLA Antigens/genetics , Humans , Middle East/epidemiology , Oceania/epidemiology , Principal Component Analysis , RNA, Viral/genetics , RNA, Viral/immunology , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity
6.
Front Mol Biosci ; 8: 669996, 2021.
Article in English | MEDLINE | ID: mdl-34381811

ABSTRACT

The ability of chaperonins to buffer mutations that affect protein folding pathways suggests that their abundance should be evolutionarily advantageous. Here, we investigate the effect of chaperonin overproduction on cellular fitness in Escherichia coli. We demonstrate that chaperonin abundance confers 1) an ability to tolerate higher temperatures, 2) improved cellular fitness, and 3) enhanced folding of metabolic enzymes, which is expected to lead to enhanced energy harvesting potential.

7.
Appl Environ Microbiol ; 86(14)2020 07 02.
Article in English | MEDLINE | ID: mdl-32385079

ABSTRACT

Signal transduction systems are essential for microorganisms to respond to their ever-changing environment. They can be distinguished into one-component systems, two-component systems, and extracytoplasmic-function σ factors. Abundances of a few signal-transducing proteins, termed herein as sensory proteins (SPs), have previously been reported to be correlated with the genome size and ecological niche of certain Gram-positive bacteria. No such reports are available for Gram-negative bacteria. The current study attempts to investigate the relationship of the abundances of SPs to genome size in Escherichia coli, and the bacterial pathotypes or phylotypes. While the relationship between SP abundance and genome size could not be established, the sensory protein index (SPI), a new metric defined herein, was found to be correlated with E. coli virulence. In addition, significant association was observed among the distribution of SPs and E. coli pathotypes. Results indicate that such associations might be due to genomic rearrangements to best utilize the resources available in a given ecological niche. Overall, the study provides an in-depth analysis of the occurrence of different SPs among pathogenic and nonpathogenic E. coli strains. Possibilities of using the SPI as a marker for identifying pathogenic strains from among an organism complex are also discussed. IMPORTANCE Sensory proteins (SPs) act as sensors and actuators for a cell and participate in important mechanisms pertaining to bacterial survival, adaptation, and virulence. Therefore, bacterial species residing in similar ecological niches or those sharing common pathotypes are expected to exhibit similar SP signatures. We have investigated profiles of SPs in different species of Escherichia coli and present in this article the sensory protein index (SPI), a metric for quantifying the abundance and/or distribution of SPs across bacterial genomes, which could indicate the virulence potency of a bacterium.
The SPI could find use in characterizing uncultured strains and bacterial complexes, as a biomarker for disease diagnostics, evaluating the effect of therapeutic interventions, assessing effects of ecological alterations, etc. Grouping the studied strains of E. coli on the basis of the frequency of occurrence of SPs in their genomes could potentially replicate the stratification of these strains on the basis of their phylotypes. In addition, E. coli strains belonging to the same pathotypes were also seen to share similar SP signatures. Furthermore, the SPI was seen to be an indicator of pathogenic potency of E. coli strains. The SPI metric is expected to be useful in the (pathogenic) characterization of hitherto uncultured strains which are routinely sequenced in host microbiome analysis projects, or from among an ensemble of microbial organisms constituting a biospecimen. Thus, the possibilities of using the SPI as a biomarker for diagnosis of a disease or the outcome of a therapeutic intervention cannot be ruled out. Further, SPIs obtained from longitudinal ecological samples have the potential to serve as key indicators of environmental changes. Such changes in the environment are often detrimental to the resident biome and methods for timely detection of environmental changes hold huge socioeconomic benefits.


Subject(s)
Escherichia coli Proteins/genetics , Escherichia coli/genetics , Escherichia coli/pathogenicity , Escherichia coli Proteins/metabolism , Genome, Bacterial , Virulence/genetics
8.
BMC Genomics ; 20(1): 1022, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881845

ABSTRACT

BACKGROUND: In 2017, the World Health Organization (WHO) published a catalogue of 12 families of antibiotic-resistant "priority pathogens" that are posing the greatest threats to human health. Six of these dreaded pathogens are known to infect the human gastrointestinal system. In addition to causing gastrointestinal and systemic infections, these pathogens can also affect the composition of other microbes constituting the healthy gut microbiome. Such aberrations in gut microbiome can significantly affect human physiology and immunity. Identifying the virulence mechanisms of these enteric pathogens is likely to help in developing newer therapeutic strategies to counter them. RESULTS: Using our previously published in silico approach, we have evaluated (and compared) Host-Pathogen Protein-Protein Interaction (HPI) profiles of four groups of enteric pathogens, namely, different species of Escherichia, Shigella, Salmonella and Vibrio. Results indicate that in spite of genus/species-specific variations, most enteric pathogens possess a common repertoire of HPIs. This core set of HPIs is probably responsible for the survival of these pathogens in the harsh nutrient-limiting environment within the gut. Certain genus/species-specific HPIs were also observed. CONCLUSIONS: The identified bacterial proteins involved in the core set of HPIs are expected to be helpful in understanding the pathogenesis of these dreaded gut pathogens in greater detail. The possible role of genus/species-specific variations in the HPI profiles in the virulence of these pathogens is also discussed. The obtained results are likely to provide an opportunity for development of novel therapeutic strategies against the most dreaded gut pathogens.


Subject(s)
Bacterial Physiological Phenomena , Gastrointestinal Microbiome , Host-Pathogen Interactions , Bacterial Infections/metabolism , Bacterial Infections/microbiology , Bacterial Proteins , Computational Biology/methods , Humans , Microbial Interactions , Models, Biological , Protein Binding , Protein Interaction Mapping/methods , Protein Interaction Maps
9.
Front Microbiol ; 10: 2417, 2019.
Article in English | MEDLINE | ID: mdl-31736886

ABSTRACT

Metabolic adaptation of Mycobacterium tuberculosis (M. tuberculosis) to the microbicidal intracellular environment of host macrophages is fundamental to its pathogenicity. However, an in-depth understanding of metabolic adjustments through key reaction pathways and networks is limited. To understand how such changes occur, we measured the cellular metabolome of M. tuberculosis subjected to four microbicidal stresses using liquid chromatography-mass spectrometric multiple reactions monitoring (LC-MRM/MS). Overall, 87 metabolites were identified. The metabolites best describing the separation between stresses were identified through multivariate analysis. Coupling the metabolite measurements with an existing genome-scale metabolic model and constraint-based simulation led to several new concepts and unreported observations in M. tuberculosis, such as (i) the high levels of released ammonia as an adaptive response to acidic stress was due to increased flux through L-asparaginase rather than urease activity; (ii) nutrient starvation-induced anaplerotic pathway for generation of TCA intermediates from phosphoenolpyruvate using phosphoenolpyruvate kinase; (iii) quenching of protons through GABA shunt pathway or sugar alcohols as possible mechanisms of early adaptation to acidic and oxidative stresses; and (iv) usage of alternate cofactors by the same enzyme as a possible mechanism of rewiring metabolic pathways to overcome stresses. Besides providing new leads and important nodes that can be used for designing intervention strategies, the study advocates the strength of applying flux balance analyses coupled with metabolomics to get a global picture of complex metabolic adjustments.

10.
BMC Genomics ; 19(1): 555, 2018 Jul 27.
Article in English | MEDLINE | ID: mdl-30053801

ABSTRACT

BACKGROUND: Mycobacterium tuberculosis infection in humans is often associated with extended period of latency. To adapt to the hostile hypoxic environment inside a macrophage, M. tuberculosis cells undergo several physiological and metabolic changes. Previous studies have mostly focused on inspecting individual facets of this complex process. In order to gain deeper insights into the infection process and to understand the coordination among different regulatory/ metabolic pathways in the pathogen, the current in silico study investigates three aspects, namely, (i) host-pathogen interactions (HPIs) between human and M. tuberculosis proteins, (ii) gene regulatory network pertaining to adaptation of M. tuberculosis to hypoxia and (iii) alterations in M. tuberculosis metabolism under hypoxic condition. Subsequently, cross-talks between these components have been probed to evaluate possible gene-regulatory events as well as HPIs which are likely to drive metabolic changes during pathogen's adaptation to the intra-host hypoxic environment. RESULTS: The newly identified HPIs suggest the pathogen's ability to subvert host mediated reactive oxygen intermediates/ reactive nitrogen intermediates (ROI/ RNI) stress as well as their potential role in modulating host cell cycle and cytoskeleton structure. The results also indicate a significantly pronounced effect of HPIs on hypoxic metabolism of M. tuberculosis. Findings from the current study underscore the necessity of investigating the infection process from a systems-level perspective incorporating different facets of intra-cellular survival of the pathogen. CONCLUSIONS: The comprehensive host-pathogen interaction network, a Boolean model of M. tuberculosis H37Rv (Mtb) hypoxic gene-regulation, as well as a genome scale metabolic model of Mtb, built for this study are expected to be useful resources for future studies on tuberculosis infection.


Subject(s)
Host-Pathogen Interactions , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/metabolism , Bacterial Proteins/metabolism , Cell Hypoxia , Computer Simulation , Gene Regulatory Networks , Humans , Macrophages/metabolism , Macrophages/microbiology , Protein Interaction Mapping
11.
Article in English | MEDLINE | ID: mdl-28469995

ABSTRACT

Serotype O157:H7, an enterohemorrhagic Escherichia coli (EHEC), is known to cause gastrointestinal and systemic illnesses ranging from diarrhea and hemorrhagic colitis to potentially fatal hemolytic uremic syndrome. Specific genetic factors like ompA, nsrR, and LEE genes are known to play roles in EHEC pathogenesis. However, these factors are not specific to EHEC and their presence in several non-pathogenic strains indicates that additional factors are involved in pathogenicity. We propose a comprehensive effort to screen for such potential genetic elements, through investigation of biomolecular interactions between E. coli and their host. In this work, an in silico investigation of the protein-protein interactions (PPIs) between human cells and four EHEC strains (viz., EDL933, Sakai, EC4115, and TW14359) was performed in order to understand the virulence and host-colonization strategies of these strains. Potential host-pathogen interactions (HPIs) between human cells and the "non-pathogenic" E. coli strain MG1655 were also probed to evaluate whether and how the variations in the genomes could translate into altered virulence and host-colonization capabilities of the studied bacterial strains. Results indicate that a small subset of HPIs are unique to the studied pathogens and can be implicated in virulence. This subset of interactions involved E. coli proteins like YhdW, ChuT, EivG, and HlyA. These proteins have previously been reported to be involved in bacterial virulence. In addition, clear differences in lineage and clade-specific HPI profiles could be identified. Furthermore, available gene expression profiles of the HPI-proteins were utilized to estimate the proportion of proteins which may be involved in interactions. We hypothesized that a cumulative score of the ratios of bound:unbound proteins (involved in HPIs) would indicate the extent of colonization. 
Thus, we designed the Host Colonization Index (HCI) measure to determine the host colonization potential of the E. coli strains. Pathogenic strains of E. coli were observed to have higher HCIs as compared to a non-pathogenic laboratory strain. However, no significant differences among the HCIs of the two pathogenic groups were observed. Overall, our findings are expected to provide additional insights into EHEC pathogenesis and are likely to aid in designing alternate preventive and therapeutic strategies.
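
The abstract describes the HCI only as a cumulative score of bound:unbound ratios, so the exact formula is not recoverable from it. The sketch below is one minimal reading under stated assumptions: each HPI protein comes with hypothetical bound and unbound abundance estimates, and proteins with no unbound fraction are skipped to avoid division by zero. The function name and the input layout are illustrative, not the authors' implementation.

```python
def host_colonization_index(proteins):
    """Sum bound:unbound ratios over proteins involved in HPIs.

    proteins: iterable of (bound, unbound) abundance estimates
    (hypothetical inputs, e.g. derived from expression profiles).
    """
    total = 0.0
    for bound, unbound in proteins:
        if unbound <= 0:
            # Assumption: skip proteins without an unbound fraction.
            continue
        total += bound / unbound
    return total


# Hypothetical values for two strains; a higher score suggests
# a greater estimated colonization potential.
pathogenic = [(2.0, 1.0), (3.0, 2.0)]
laboratory = [(1.0, 4.0), (1.0, 2.0)]
print(host_colonization_index(pathogenic), host_colonization_index(laboratory))
```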


Subject(s)
Computer Simulation , Enterohemorrhagic Escherichia coli/metabolism , Escherichia coli Infections/microbiology , Host-Pathogen Interactions , Protein Interaction Maps/physiology , Animals , Cattle , Enterohemorrhagic Escherichia coli/classification , Enterohemorrhagic Escherichia coli/genetics , Enterohemorrhagic Escherichia coli/pathogenicity , Epithelial Cells , Escherichia coli/genetics , Escherichia coli O157/genetics , Escherichia coli O157/metabolism , Escherichia coli Proteins/genetics , Gene Expression Regulation, Bacterial , Genes, Bacterial , Humans , Virulence/genetics
12.
PLoS One ; 10(11): e0142102, 2015.
Article in English | MEDLINE | ID: mdl-26561344

ABSTRACT

BACKGROUND: Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools necessitate end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. RESULTS: Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. CONCLUSION: The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms.
Multiple search options in COGNIZER provide end-users the flexibility of choosing a homology search protocol based on available compute resources. The cross-mapping database in COGNIZER is of high utility since it enables end-users to directly infer/derive KEGG, Pfam, GO, and SEED subsystem annotations from COG categorizations. Furthermore, availability of COGNIZER as a stand-alone scalable implementation is expected to make it a valuable annotation tool in the field of metagenomic research. AVAILABILITY AND IMPLEMENTATION: A Linux implementation of COGNIZER is freely available for download from the following links: http://metagenomics.atc.tcs.com/cognizer, https://metagenomics.atc.tcs.com/function/cognizer.


Subject(s)
Databases, Genetic , Metagenome , Metagenomics/methods , Algorithms , Humans , Reproducibility of Results , Sequence Analysis, DNA/methods , Software , Workflow
13.
J Biosci ; 40(3): 571-7, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26333403

ABSTRACT

Given the importance of RNA secondary structures in defining their biological role, it would be convenient for researchers seeking RNA data if both sequence and structural information pertaining to RNA molecules are made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats which can frugally represent RNA sequence as well as structure data in a single file are currently unavailable. This article proposes a novel storage format, 'FASTR', for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the sizes of FASTR-formatted files (containing both RNA sequence as well as structure information) are equivalent to those of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented using a combination of a string of nucleotide characters along with the corresponding dot-bracket notation indicating structural attributes. 'FASTR', the novel storage format proposed in the present study, enables a frugal representation of both RNA sequence and structural information in the form of a single string. In spite of having a relatively smaller storage footprint, the resultant 'fastr' string(s) retain all sequence as well as secondary structural information that could be stored using a dot-bracket notation. An implementation of the 'FASTR' methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr.
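
The abstract does not spell out the FASTR alphabet, so the following is only a sketch of the general idea of folding a sequence and its dot-bracket string into a single string of the same length. The 12-symbol alphabet below (one character per nucleotide and structural state) is a hypothetical choice made for this illustration and should not be read as the actual FASTR encoding.

```python
NUCLEOTIDES = "ACGU"
STATES = ".()"  # unpaired, opening pair, closing pair
# Hypothetical alphabet: one symbol per (nucleotide, state) combination.
SYMBOLS = "ACGU" + "acgu" + "BDHV"

ENCODE = {
    (nuc, state): SYMBOLS[i * 4 + j]
    for i, state in enumerate(STATES)
    for j, nuc in enumerate(NUCLEOTIDES)
}
DECODE = {symbol: pair for pair, symbol in ENCODE.items()}


def fastr_like_encode(sequence, structure):
    """Merge an RNA sequence and its dot-bracket string into one string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return "".join(ENCODE[(n, s)] for n, s in zip(sequence, structure))


def fastr_like_decode(encoded):
    """Recover (sequence, dot-bracket structure) from the merged string."""
    pairs = [DECODE[symbol] for symbol in encoded]
    return "".join(n for n, _ in pairs), "".join(s for _, s in pairs)


merged = fastr_like_encode("GGGAAACCC", "(((...)))")
print(merged)                      # gggAAADDD
print(fastr_like_decode(merged))   # ('GGGAAACCC', '(((...)))')
```

Because every position maps to exactly one character, the merged string is as long as the sequence alone, which is consistent with the file-size observation reported in the abstract.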


Subject(s)
Databases, Nucleic Acid , Nucleic Acid Conformation , RNA/genetics , Algorithms , Base Sequence , Humans , Molecular Sequence Data , Sequence Analysis, RNA , Software
14.
Genomics ; 106(2): 116-21, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25944184

ABSTRACT

UNLABELLED: Metagenomic sequencing data, obtained from host-associated microbial communities, are usually contaminated with host genome sequence fragments. Prior to performing any downstream analyses, it is necessary to identify and remove such contaminating sequence fragments. The time and memory requirements of available host-contamination detection techniques are enormous. Thus, processing of large metagenomic datasets is a challenging task. This study presents CS-SCORE, a novel algorithm that can rapidly identify host sequences contaminating metagenomic datasets. Validation results indicate that CS-SCORE is 2-6 times faster than the current state-of-the-art methods. Furthermore, the memory footprint of CS-SCORE is in the range of 2-2.5 GB, which is significantly lower than that of other available tools. CS-SCORE achieves this efficiency by incorporating (1) a heuristic pre-filtering mechanism and (2) a directed-mapping approach that utilizes a novel sequence composition metric (cs-score). CS-SCORE is expected to be a handy 'pre-processing' utility for researchers analyzing metagenomic datasets. AVAILABILITY: For academic users, an implementation of CS-SCORE is freely available at: http://metagenomics.atc.tcs.com/cs-score (or) https://metagenomics.atc.tcs.com/preprocessing/cs-score.


Subject(s)
Algorithms , Genome, Human , Metagenomics/methods , Humans
15.
J Bioinform Comput Biol ; 13(3): 1541003, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25790783

ABSTRACT

Sequence data repositories archive and disseminate fastq data in compressed format. In spite of its relatively lower compression efficiency, data repositories continue to prefer GZIP over available specialized fastq compression algorithms. Ease of deployment, high processing speed and portability are the reasons for this preference. This study presents FQC, a fastq compression method that, in addition to providing significantly higher compression gains over GZIP, incorporates features necessary for universal adoption by data repositories/end-users. This study also proposes a novel archival strategy which allows sequence repositories to simultaneously store and disseminate lossless as well as (multiple) lossy variants of fastq files, without necessitating any additional storage requirements. For academic users, Linux, Windows, and Mac implementations (both 32- and 64-bit) of FQC are freely available for download at: https://metagenomics.atc.tcs.com/compression/FQC.


Subject(s)
Data Compression/methods , Data Curation/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Molecular Sequence Data
16.
PLoS One ; 9(12): e114814, 2014.
Article in English | MEDLINE | ID: mdl-25551450

ABSTRACT

MOTIVATION: Paired-end sequencing protocols, offered by next generation sequencing (NGS) platforms like Illumina, generate a pair of reads for every DNA fragment in a sample. Although this protocol has been utilized for several metagenomics studies, most taxonomic binning approaches classify each of the reads (forming a pair) independently. The present work explores some simple but effective strategies of utilizing pairing-information of Illumina short reads for improving the accuracy of taxonomic binning of metagenomic datasets. The strategies proposed can be used in conjunction with all genres of existing binning methods. RESULTS: Validation results suggest that employment of these "Binpairs" strategies can provide significant improvements in the binning outcome. The quality of the taxonomic assignments thus obtained is often comparable to that which can only be achieved with relatively longer reads obtained using other NGS platforms (such as Roche). AVAILABILITY: An implementation of the proposed strategies of utilizing pairing information is freely available for academic users at https://metagenomics.atc.tcs.com/binning/binpairs.
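
NOTE: The abstract does not spell out the individual Binpairs strategies. As a rough illustration only, the Python sketch below shows one plausible way pairing information could be used on top of any per-read binning method: an unassigned read inherits its mate's assignment, and a conflicting pair is resolved to the lowest common ancestor. The toy taxonomy and the names PARENT, lineage, lca and bin_pair are hypothetical and are not taken from the published tool.

```python
# Hypothetical toy taxonomy: child -> parent. "root" has no parent.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def bin_pair(read1_taxon, read2_taxon):
    """Combine the independent assignments of the two reads of a pair.

    None means the underlying binning method left the read unassigned.
    """
    if read1_taxon is None:
        return read2_taxon  # inherit the mate's assignment (may also be None)
    if read2_taxon is None:
        return read1_taxon
    return lca(read1_taxon, read2_taxon)  # resolve agreement or conflict
```

For example, bin_pair("Escherichia coli", "Salmonella enterica") resolves the conflicting pair to "Enterobacteriaceae" in this toy taxonomy, while bin_pair(None, "Escherichia") assigns the pair to "Escherichia".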


Subject(s)
Classification/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics , Sequence Analysis, DNA/methods , Statistics as Topic/methods , Computer Simulation , Reproducibility of Results , Sequence Alignment
17.
J Biosci ; 37(4): 785-9, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22922203

ABSTRACT

Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, both from the hardware as well as software perspective. The present paper describes BIND, an algorithm specialized for compressing nucleotide sequence data. By adopting a unique 'block-length' encoding for representing binary data (as a key step), BIND achieves significant compression gains as compared to the widely used general purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND is able to handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate reasonable speeds of compression and decompression that can be achieved with minimal processor/memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
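
NOTE: The abstract names a 'block-length' encoding of binary data but gives no details. The Python sketch below is an illustration under two stated assumptions: that each base is first mapped to a 2-bit code whose bits are split into two binary streams, and that 'block-length' means storing the lengths of alternating runs of 0s and 1s. The 2-bit mapping and the names CODE, block_lengths, expand, encode and decode are hypothetical. Handling of non-ATGC and lowercase characters, which BIND supports, is omitted here.

```python
from itertools import groupby

# Hypothetical 2-bit code; the abstract does not specify the mapping.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
DECODE = {bits: base for base, bits in CODE.items()}


def block_lengths(bits):
    """Represent a bit list as (first bit, lengths of alternating runs)."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return (bits[0] if bits else 0), runs


def expand(first_bit, runs):
    """Inverse of block_lengths: rebuild the bit list from run lengths."""
    bits, bit = [], first_bit
    for length in runs:
        bits.extend([bit] * length)
        bit ^= 1  # runs alternate between 0 and 1
    return bits


def encode(seq):
    """Split an uppercase ACGT string into two bit streams and run-length them."""
    high = [CODE[base][0] for base in seq]
    low = [CODE[base][1] for base in seq]
    return block_lengths(high), block_lengths(low)


def decode(encoded):
    """Rebuild the ACGT string from the two run-length encoded streams."""
    high = expand(*encoded[0])
    low = expand(*encoded[1])
    return "".join(DECODE[(h, l)] for h, l in zip(high, low))
```

With this mapping, encode("AAACCGGT") yields ((0, [5, 3]), (0, [3, 2, 2, 1])), and decode returns the original string. In a real compressor the run lengths would then be passed to an entropy coder or a general-purpose compressor; that stage is not shown.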


Subject(s)
Algorithms , Data Compression/methods , Information Storage and Retrieval , Sequence Analysis, DNA , Base Sequence , Computing Methodologies , Software
18.
Bioinformatics ; 28(19): 2527-9, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22833526

ABSTRACT

SUMMARY: An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. AVAILABILITY AND IMPLEMENTATION: Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. CONTACT: sharmila@atc.tcs.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology/methods , Data Compression/methods , Genomics/methods , Base Sequence , Sequence Analysis, DNA/methods