1.
J Biol Chem ; 293(30): 11687-11708, 2018 07 27.
Article in English | MEDLINE | ID: mdl-29773649

ABSTRACT

HIV-1 subtype C (HIV-1C) may duplicate longer amino acid stretches in the p6 Gag protein, leading to the creation of an additional Pro-Thr/Ser-Ala-Pro (PTAP) motif necessary for viral packaging. However, the biological significance of a duplication of the PTAP motif for HIV-1 replication and pathogenesis has not been experimentally validated. In a longitudinal study of two different clinical cohorts of select HIV-1 seropositive, drug-naive individuals from India, we found that 8 of 50 of these individuals harbored a mixed infection of viral strains discordant for the PTAP duplication. Conventional and next-generation sequencing of six primary viral quasispecies at multiple time points disclosed that in a mixed infection, the viral strains containing the PTAP duplication dominated the infection. The dominance of the double-PTAP viral strains over a genetically similar single-PTAP viral clone was confirmed in viral proliferation and pairwise competition assays. Of note, in the proximity ligation assay, double-PTAP Gag proteins exhibited a significantly enhanced interaction with the host protein tumor susceptibility gene 101 (Tsg101). Moreover, Tsg101 overexpression resulted in a biphasic effect on HIV-1C proliferation, an enhanced effect at low concentration and an inhibitory effect only at higher concentrations, unlike a uniformly inhibitory effect on subtype B strains. In summary, our results indicate that the duplication of the PTAP motif in the p6 Gag protein enhances the replication fitness of HIV-1C by engaging the Tsg101 host protein with a higher affinity. Our results have implications for HIV-1 pathogenesis, especially of HIV-1C.


Subject(s)
DNA-Binding Proteins/metabolism , Endosomal Sorting Complexes Required for Transport/metabolism , HIV Infections/metabolism , HIV Infections/virology , HIV-1/physiology , Transcription Factors/metabolism , Virus Replication , gag Gene Products, Human Immunodeficiency Virus/metabolism , Adult , Amino Acid Motifs , Cells, Cultured , DNA-Binding Proteins/genetics , Endosomal Sorting Complexes Required for Transport/genetics , Female , HIV Infections/genetics , HIV-1/chemistry , HIV-1/genetics , Host-Pathogen Interactions , Humans , Longitudinal Studies , Male , Middle Aged , Protein Interaction Maps , Transcription Factors/genetics , gag Gene Products, Human Immunodeficiency Virus/chemistry , gag Gene Products, Human Immunodeficiency Virus/genetics
2.
Clin Cancer Res ; 23(18): 5648-5656, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28536309

ABSTRACT

Purpose: Tumor-derived cell-free DNA (cfDNA) in plasma can be used for molecular testing and provides an attractive alternative to tumor tissue. Commonly used PCR-based technologies can test for only a limited number of alterations at a time. Therefore, novel ultrasensitive technologies capable of testing for a broad spectrum of molecular alterations are needed to further personalized cancer therapy. Experimental Design: We developed a highly sensitive ultradeep next-generation sequencing (NGS) assay using reagents from TruSeqNano library preparation and NexteraRapid Capture target enrichment kits to generate plasma cfDNA sequencing libraries for mutational analysis in 61 cancer-related genes using common bioinformatics tools. The results were retrospectively compared with molecular testing of archival primary or metastatic tumor tissue obtained at different points of clinical care. Results: In a study of 55 patients with advanced cancer, the ultradeep NGS assay detected 82% (complete detection) to 87% (complete and partial detection) of the aberrations identified in discordantly collected corresponding archival tumor tissue. Patients with a low variant allele frequency (VAF) of mutant cfDNA survived longer than those with a high VAF did (P = 0.018). In patients undergoing systemic therapy, radiological response was positively associated with changes in cfDNA VAF (P = 0.02), and compared with unchanged/increased mutant cfDNA VAF, decreased cfDNA VAF was associated with longer time to treatment failure (TTF; P = 0.03). Conclusions: The ultradeep NGS assay has good sensitivity compared with conventional clinical mutation testing of archival specimens. A high VAF in mutant cfDNA corresponded with shorter survival. Changes in VAF of mutated cfDNA were associated with TTF. Clin Cancer Res; 23(18); 5648-56. ©2017 AACR.
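The variant allele frequency used in this abstract is, by its usual definition, the fraction of reads at a locus that carry the variant allele. The sketch below illustrates that arithmetic and the decreased versus unchanged/increased grouping; the function names and the simple two-way classification are assumptions made for illustration and are not taken from the study.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a locus that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def vaf_change(baseline: float, follow_up: float) -> str:
    """Group a pair of VAF measurements the way the abstract words it.

    Hypothetical helper: a real analysis would likely account for
    measurement noise before calling a change a decrease.
    """
    if follow_up < baseline:
        return "decreased"
    return "unchanged/increased"


# Example: 50 variant reads out of 10,000 reads at ultradeep coverage.
vaf = variant_allele_frequency(50, 10_000)
```

At ultradeep coverage the denominator is large, which is what makes very small fractions distinguishable from zero in principle.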


Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , High-Throughput Nucleotide Sequencing , Neoplasms/diagnosis , Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Female , Genetic Testing , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Middle Aged , Mutation , Neoplasms/mortality , Prognosis , Reproducibility of Results , Sensitivity and Specificity
3.
BMC Bioinformatics ; 16: 17, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25626454

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest. Several algorithms exist to detect CNVs by analyzing whole genome sequencing data or data from samples enriched by hybridization-capture. PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far. RESULTS: We describe a new algorithm named quandico to detect copy number differences based on NGS data generated following PCR-enrichment. A weighted t-test statistic was applied to calculate probabilities (p-values) of copy number changes. We assessed the performance of the method using sequencing reads generated from reference DNA with known CNVs, and we were able to detect these variants with 98.6% sensitivity and 98.5% specificity, which is significantly better than another recently described method for amplicon sequencing. The source code (R-package) of quandico is licensed under the GPLv3 and is available at https://github.com/reineckef/quandico. CONCLUSION: We demonstrated that our new algorithm is suitable to call copy number changes using data from PCR-enriched samples with high sensitivity and specificity even for single copy differences.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Case-Control Studies , DNA Copy Number Variations , Humans , Sensitivity and Specificity
4.
BMC Genomics ; 15: 1073, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25480444

ABSTRACT

BACKGROUND: Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls. RESULTS: We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced. CONCLUSIONS: Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.
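The recommendation to keep primer bases through alignment and remove them only immediately before variant calling can be pictured as a late clipping step. The sketch below is a minimal illustration under strong assumptions: the alignment is gapless, coordinates are 0-based and half-open, and the function name and arguments are hypothetical rather than part of any published pipeline.

```python
def clip_to_insert(read_start: int, read_seq: str,
                   insert_start: int, insert_end: int) -> tuple[int, str]:
    """Restrict an already-aligned read to the amplicon insert region.

    read_start   reference position of the first read base (0-based)
    insert_start first reference position after the forward primer
    insert_end   first reference position of the reverse primer (exclusive end)

    Assumes a gapless alignment, so read offsets map 1:1 to the reference.
    """
    read_end = read_start + len(read_seq)
    lo = max(read_start, insert_start)
    hi = min(read_end, insert_end)
    if lo >= hi:
        # The read lies entirely within primer sequence or outside the insert.
        return lo, ""
    return lo, read_seq[lo - read_start:hi - read_start]


# A 10-base read starting at position 100; primers occupy 100-102 and 108-109.
new_start, clipped = clip_to_insert(100, "ACGTACGTAC", 103, 108)
```

Because the primer bases were present during alignment, a variant at the first insert position is anchored by matching bases on both sides before this step discards them.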


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Computer Simulation , DNA Primers , Polymerase Chain Reaction/methods , Reproducibility of Results
5.
Microbiome ; 2: 31, 2014.
Article in English | MEDLINE | ID: mdl-25228989

ABSTRACT

BACKGROUND: Sample storage conditions, extraction methods, PCR primers, and parameters are major factors that affect metagenomics analysis based on microbial 16S rRNA gene sequencing. Most published studies were limited to the comparison of only one or two types of these factors. Systematic multi-factor explorations are needed to evaluate the conditions that may impact the validity of a microbiome analysis. This study aimed to improve methodological options to facilitate the best technical approaches in the design of a microbiome study. Three readily available mock bacterial community materials and two commercial extraction techniques, Qiagen DNeasy and MO BIO PowerSoil DNA purification methods, were used to assess procedures for 16S ribosomal DNA amplification and pyrosequencing-based analysis. Primers were chosen for 16S rDNA quantitative PCR and amplification of region V3 to V1. Swabs spiked with mock bacterial community cells and clinical oropharyngeal swabs were incubated at respective temperatures of -80°C, -20°C, 4°C, and 37°C for 4 weeks, then extracted with the two methods, and subjected to pyrosequencing and taxonomic and statistical analyses to investigate microbiome profile stability. RESULTS: The bacterial compositions for the mock community DNA samples determined in this study were consistent with the projected levels and agreed with the literature. The quantitation accuracy of abundances for several genera was improved with changes made to the standard Human Microbiome Project (HMP) procedure. The data for the samples purified with DNeasy and PowerSoil methods were statistically distinct; however, both results were reproducible and in good agreement with each other. The temperature effect on storage stability was investigated by using mock community cells and showed that the microbial community profiles were altered with the increase in incubation temperature. However, this phenomenon was not detected when clinical oropharyngeal swabs were used in the experiment. CONCLUSIONS: Mock community materials originating from the HMP study are valuable controls in developing 16S metagenomics analysis procedures. Long-term exposure to a high temperature may introduce variation into analysis for oropharyngeal swabs, suggesting storage at 4°C or lower. The observed variations due to sample storage temperature are in a similar range as the intrapersonal variability among different clinical oropharyngeal swab samples.

6.
BMC Genomics ; 15: 244, 2014 Mar 28.
Article in English | MEDLINE | ID: mdl-24678773

ABSTRACT

BACKGROUND: High-throughput sequencing is rapidly becoming common practice in clinical diagnosis and cancer research. Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing. Although numerous studies have compared the performance of various algorithms on exome data, there has not yet been a systematic evaluation using PCR-enriched amplicon data with a range of variant allele fractions. The recently developed gold standard variant set for the reference individual NA12878 by the NIST-led "Genome in a Bottle" Consortium (NIST-GIAB) provides a good resource to evaluate admixtures with various SNV fractions. RESULTS: Using the NIST-GIAB gold standard, we compared the performance of five popular somatic SNV calling algorithms (GATK UnifiedGenotyper followed by simple subtraction, MuTect, Strelka, SomaticSniper and VarScan2) for matched tumor-normal amplicon and exome sequencing data. CONCLUSIONS: We demonstrated that the five commonly used somatic SNV calling methods are applicable to both targeted amplicon and exome sequencing data. However, the sensitivities of these methods vary based on the allelic fraction of the mutation in the tumor sample. Our analysis can assist researchers in choosing a somatic SNV calling method suitable for their specific needs.


Subject(s)
Computational Biology/methods , Exome , High-Throughput Nucleotide Sequencing , Mutation , Software , Algorithms , Databases, Nucleic Acid , Genomics/methods , Humans , Point Mutation , ROC Curve , Sensitivity and Specificity
7.
Nucleic Acids Res ; 40(16): e127, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22584625

ABSTRACT

Accurate estimation of expression levels from RNA-Seq data entails precise mapping of the sequence reads to a reference genome. Because the standard reference genome contains only one allele at any given locus, reads overlapping polymorphic loci that carry a non-reference allele are at least one mismatch away from the reference and, hence, are less likely to be mapped. This bias in read mapping leads to inaccurate estimates of allele-specific expression (ASE). To address this read-mapping bias, we propose the construction of an enhanced reference genome that includes the alternative alleles at known polymorphic loci. We show that mapping to this enhanced reference reduced the read-mapping biases, leading to more reliable estimates of ASE. Experiments on simulated data show that the proposed strategy reduced the number of loci with mapping bias by ≥ 63% when compared with a previous approach that relies on masking the polymorphic loci and by ≥ 18% when compared with the standard approach that uses an unaltered reference. When we applied our strategy to actual RNA-Seq data, we found that it mapped up to 15% more reads than the previous approaches and identified many seemingly incorrect inferences made by them.
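One way to picture an enhanced reference is as the standard reference plus short extra sequences that carry the alternative allele at each known polymorphic locus, so that a read with the non-reference allele has a mismatch-free target. The sketch below is a minimal illustration under assumptions: the flank size, the naming scheme, and the function itself are hypothetical, and the abstract does not specify how the authors built their reference.

```python
def alt_allele_windows(reference: str,
                       snps: list[tuple[int, str, str]],
                       flank: int) -> dict[str, str]:
    """Build extra reference sequences carrying the alternative alleles.

    snps holds (0-based position, reference base, alternative base).
    Each window spans up to `flank` bases on either side of the SNP.
    """
    windows = {}
    for pos, ref_base, alt_base in snps:
        if reference[pos] != ref_base:
            raise ValueError(f"reference base mismatch at position {pos}")
        start = max(0, pos - flank)
        end = min(len(reference), pos + flank + 1)
        # Same flanking sequence as the reference, with the SNP base swapped.
        windows[f"alt_{pos}_{alt_base}"] = (
            reference[start:pos] + alt_base + reference[pos + 1:end]
        )
    return windows


extra = alt_allele_windows("ACGTACGTAC", [(4, "A", "G")], flank=2)
```

A natural choice in practice would be a flank of one less than the read length, so that every read overlapping the SNP fits inside the window; that choice is an assumption here, not a detail stated in the abstract.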


Subject(s)
Alleles , Chromosome Mapping/methods , Gene Expression Profiling , Sequence Analysis, RNA/methods , Chromosome Mapping/standards , Genetic Loci , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Reference Standards
8.
PLoS One ; 6(3): e17469, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21408217

ABSTRACT

BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.


Subject(s)
Genome, Bacterial/genetics , Molecular Sequence Annotation/methods , Software , Base Sequence , Genes, Bacterial/genetics , Reproducibility of Results
9.
Article in English | MEDLINE | ID: mdl-18989047

ABSTRACT

The incomplete perfect phylogeny (IPP) problem and the incomplete perfect phylogeny haplotyping (IPPH) problem deal with constructing a phylogeny for a given set of haplotypes or genotypes with missing entries. The earlier approaches for both of these problems dealt with restricted versions of the problems, where the root is either available or can be trivially re-constructed from the data, or certain assumptions were made about the data. In this paper, we deal with the unrestricted versions of the problems, where the root of the phylogeny is neither available nor trivially recoverable from the data. Both IPP and IPPH problems have previously been proven to be NP-complete. Here, we present efficient enumerative algorithms that can handle practical instances of the problem. Empirical analysis on simulated data shows that the algorithms perform very well both in terms of speed and in terms of the accuracy of the recovered data.


Subject(s)
Algorithms , Biological Evolution , Chromosome Mapping/methods , Evolution, Molecular , Haplotypes/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods
10.
Bioinformatics ; 22(14): e514-22, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873515

ABSTRACT

MOTIVATION: We explore the problem of constructing near-perfect phylogenies on bi-allelic haplotypes, where the deviation from perfect phylogeny is entirely due to homoplasy events. We present polynomial-time algorithms for restricted versions of the problem. We show that these algorithms can be extended to genotype data, in which case the problem is called the near-perfect phylogeny haplotyping (NPPH) problem. We present a near-optimal algorithm for the H1-NPPH problem, which is to determine if a given set of genotypes admits a phylogeny with a single homoplasy event. The time-complexity of our algorithm for the H1-NPPH problem is O(m^2(n + m)), where n is the number of genotypes and m is the number of SNP sites. This is a significant improvement over the earlier O(n^4) algorithm. We also introduce generalized versions of the problem. The H(1,q)-NPPH problem is to determine if a given set of genotypes admits a phylogeny with q homoplasy events, so that all the homoplasy events occur in a single site. We present an O(m^(q+1)(n + m)) algorithm for the H(1,q)-NPPH problem. RESULTS: We present results on simulated data, which demonstrate that the accuracy of our algorithm for the H1-NPPH problem is comparable to that of the existing methods, while being orders of magnitude faster. AVAILABILITY: The implementation of our algorithm for the H1-NPPH problem is available upon request.
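For bi-allelic haplotypes, deviation from a perfect phylogeny can be detected with the classical four-gamete test: a set of binary haplotypes fits a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10, and 11. The sketch below implements only that textbook check, not the near-perfect phylogeny algorithms described in the abstract; the function name is an assumption.

```python
from itertools import combinations


def four_gamete_conflicts(haplotypes: list[tuple[int, ...]]) -> list[tuple[int, int]]:
    """Return every pair of sites (i, j) that exhibits all four gametes.

    haplotypes is a list of equal-length tuples of 0/1 alleles.
    An empty result means the data admit a perfect phylogeny.
    """
    if not haplotypes:
        return []
    num_sites = len(haplotypes[0])
    conflicts = []
    for i, j in combinations(range(num_sites), 2):
        # With 0/1 alleles, four distinct pairs means 00, 01, 10 and 11 all occur.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            conflicts.append((i, j))
    return conflicts


conflicts = four_gamete_conflicts([(0, 0), (0, 1), (1, 0), (1, 1)])
```

This naive version inspects every pair of sites across all haplotypes. A conflicting pair signals that at least one homoplasy (or recombination) event is needed to explain the data, which is the situation the near-perfect phylogeny formulation addresses.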


Subject(s)
Biological Evolution , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Genome, Human/genetics , Haplotypes/genetics , Humans , Molecular Sequence Data , Phylogeny
11.
Article in English | MEDLINE | ID: mdl-16452805

ABSTRACT

Codon optimization enhances the efficiency of DNA expression vectors used in DNA vaccination and gene therapy by increasing protein expression. Additionally, certain nucleotide motifs have experimentally been shown to be immuno-stimulatory while certain others are immuno-suppressive. In this paper, we present algorithms to locate a given set of immuno-modulatory motifs in the DNA expression vectors corresponding to a given amino acid sequence and maximize or minimize the number and the context of the immuno-modulatory motifs in the DNA expression vectors. The main contribution is to use multiple pattern matching algorithms to synthesize a DNA sequence for a given amino acid sequence and a graph theoretic approach for finding the longest weighted path in a directed graph that will maximize or minimize certain motifs. This is achieved using O(n^2) time, where n is the length of the amino acid sequence. Based on this, we develop a software tool.
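The longest-weighted-path idea can be illustrated on a much simpler case: choosing synonymous codons so that a single dinucleotide motif (here CG) occurs as often as possible. Each amino acid position is a layer of candidate codons, edges connect consecutive layers, and an edge weight counts motif occurrences inside the new codon plus one if the motif spans the codon junction. This is a simplified sketch, not the authors' algorithm: the codon table is deliberately partial, and the restriction to one two-base motif is an assumption that keeps every occurrence within two adjacent codons.

```python
# Partial codon table for illustration only (four amino acids).
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "M": ["ATG"],
}


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def maximize_dinucleotide(protein: str, motif: str = "CG") -> tuple[str, int]:
    """Pick codons maximizing motif count via a longest path in a layered DAG."""
    if len(motif) != 2:
        raise ValueError("this sketch only handles two-base motifs")
    if not protein:
        return "", 0
    # best[codon] = (score, dna) of the best path ending with that codon.
    best = {c: (count_motif(c, motif), c) for c in CODONS[protein[0]]}
    for amino_acid in protein[1:]:
        layer = {}
        for codon in CODONS[amino_acid]:
            inside = count_motif(codon, motif)
            score, dna = max(
                ((prev_score + inside + (1 if prev[-1] + codon[0] == motif else 0),
                  prev_dna)
                 for prev, (prev_score, prev_dna) in best.items()),
                key=lambda item: item[0],
            )
            layer[codon] = (score, dna + codon)
        best = layer
    score, dna = max(best.values(), key=lambda item: item[0])
    return dna, score


dna, score = maximize_dinucleotide("MAR")
```

Minimizing the motif instead would only require replacing the two `max` calls with `min`. Handling longer motifs, multiple motifs, or motif context, as the abstract describes, would need a richer state than the last codon alone.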


Subject(s)
Algorithms , Codon/genetics , CpG Islands/genetics , Genetic Engineering/methods , Genetic Vectors/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Amino Acid Motifs , Artificial Intelligence , Gene Expression/genetics , Software , Vaccines, DNA/genetics