Results 1-13 of 13
1.
Nat Methods ; 17(11): 1103-1110, 2020 11.
Article in English | MEDLINE | ID: mdl-33020656

ABSTRACT

Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.


Subject(s)
Genome, Bacterial/genetics , Genome, Human/genetics , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Algorithms , Animals , Benchmarking , Gastrointestinal Microbiome/genetics , Humans , Sequence Analysis, DNA/methods , Sheep , Software , Species Specificity
2.
Nat Biotechnol ; 37(5): 540-546, 2019 05.
Article in English | MEDLINE | ID: mdl-30936562

ABSTRACT

Accurate genome assembly is hampered by repetitive regions. Although long single molecule sequencing reads are better able to resolve genomic repeats than short-read data, most long-read assembly algorithms do not provide the repeat characterization necessary for producing optimal assemblies. Here, we present Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs. We benchmark Flye against five state-of-the-art assemblers and show that it generates better or comparable assemblies, while being an order of magnitude faster. Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers.
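NGA50, the contiguity metric cited in this abstract, is a reference-based variant of the simpler N50 statistic. As a rough illustration only, the sketch below computes plain N50 from a list of contig lengths. NGA50 additionally requires aligning contigs to a reference, splitting them at misassemblies, and measuring against the reference genome length; none of that is shown here. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

With these made-up lengths, the two longest contigs (6 and 5) already cover more than half of the 20 total bases, so the statistic is the length of the second one.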


Subject(s)
Genome, Bacterial/genetics , Genome, Human/genetics , Genomics/methods , Repetitive Sequences, Nucleic Acid/genetics , Algorithms , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA , Software
3.
Proc Natl Acad Sci U S A ; 113(52): E8396-E8405, 2016 12 27.
Article in English | MEDLINE | ID: mdl-27956617

ABSTRACT

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
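For readers unfamiliar with the de Bruijn graph approach named in this abstract, the following minimal sketch builds the classical, exact k-mer version of the graph. It is not the generalized construction used by ABruijn for error-prone reads, which the abstract does not detail. The function name `de_bruijn_graph` and the toy read are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a classical de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer found in a read contributes
    one directed edge from its prefix to its suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy read, for illustration only.
toy_graph = de_bruijn_graph(["ACGT"], k=3)
```

In this exact-match form a single sequencing error creates up to k spurious k-mers, which is why the classical construction is usually associated with short, accurate reads.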


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Benchmarking , Escherichia coli/genetics , Genomics , Reproducibility of Results , Software , Xanthomonas/genetics
4.
Am J Hum Genet ; 98(4): 667-79, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27018473

ABSTRACT

Genetic studies of autism spectrum disorder (ASD) have established that de novo duplications and deletions contribute to risk. However, ascertainment of structural variants (SVs) has been restricted by the coarse resolution of current approaches. By applying a custom pipeline for SV discovery, genotyping, and de novo assembly to genome sequencing of 235 subjects (71 affected individuals, 26 healthy siblings, and their parents), we compiled an atlas of 29,719 SV loci (5,213/genome), comprising 11 different classes. We found a high diversity of de novo mutations, the majority of which were undetectable by previous methods. In addition, we observed complex mutation clusters where combinations of de novo SVs, nucleotide substitutions, and indels occurred as a single event. We estimate a high rate of structural mutation in humans (20%) and propose that genetic risk for ASD is attributable to an elevated frequency of gene-disrupting de novo SVs, but not an elevated rate of genome rearrangement.


Subject(s)
Autism Spectrum Disorder/genetics , Gene Deletion , Gene Duplication , Alleles , Amino Acid Sequence , Base Sequence , Case-Control Studies , Child , DNA Copy Number Variations , Female , Gene Frequency , Gene Rearrangement , Genetic Loci , Genome, Human , Genotyping Techniques , Humans , INDEL Mutation , Male , Microarray Analysis , Molecular Sequence Data , Pedigree , Reproducibility of Results , Sensitivity and Specificity
5.
G3 (Bethesda) ; 6(4): 939-55, 2016 04 07.
Article in English | MEDLINE | ID: mdl-26921293

ABSTRACT

Researchers in evolutionary genetics recently have recognized an exciting opportunity in decomposing beneficial mutations into their proximal, mechanistic determinants. The application of methods and concepts from molecular biology and life history theory to studies of lytic bacteriophages (phages) has allowed them to understand how natural selection sees mutations influencing life history. This work motivated the research presented here, in which we explored whether, under consistent experimental conditions, small differences in the genome of bacteriophage φX174 could lead to altered life history phenotypes among a panel of eight genetically distinct clones. We assessed the clones' phenotypes by applying a novel statistical framework to the results of a serially sampled parallel infection assay, in which we simultaneously inoculated each of a large number of replicate host volumes with ∼1 phage particle. We sequentially plated the volumes over the course of infection and counted the plaques that formed after incubation. These counts served as a proxy for the number of phage particles in a single volume as a function of time. From repeated assays, we inferred significant, genetically determined heterogeneity in lysis time and burst size, including lysis time variance. These findings are interesting in light of the genetic and phenotypic constraints on the single-protein lysis mechanism of φX174. We speculate briefly on the mechanisms underlying our results, and we discuss the potential importance of lysis time variance in viral evolution.


Subject(s)
Bacteriolysis/genetics , Bacteriophage phi X 174/physiology , Genetic Variation , Selection, Genetic , Algorithms , Gene Order , Genome, Viral , Models, Biological , Mutation
6.
Physiol Genomics ; 45(1): 47-57, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23170035

ABSTRACT

11β-Hydroxysteroid dehydrogenase type 1 (11β-HSD1) is implicated in the etiology of metabolic syndrome. We previously showed that pharmacological inhibition of 11β-HSD1 ameliorated multiple facets of metabolic syndrome and attenuated atherosclerosis in ApoE-/- mice. However, the molecular mechanism underlying the atheroprotective effect was not clear. In this study, we tested whether and how 11β-HSD1 inhibition affects vascular inflammation, a major culprit for atherosclerosis and its associated complications. ApoE-/- mice were treated with an 11β-HSD1 inhibitor for various periods of time. Plasma lipids and aortic cholesterol accumulation were quantified. Several microarray studies were carried out to examine the effect of 11β-HSD1 inhibition on gene expression in atherosclerotic tissues. Our data suggest 11β-HSD1 inhibition can directly modulate atherosclerotic plaques and attenuate atherosclerosis independently of lipid lowering effects. We identified immune response genes as the category of mRNA most significantly suppressed by 11β-HSD1 inhibition. This anti-inflammatory effect was further confirmed in plaque macrophages and smooth muscle cells procured by laser capture microdissection. These findings in the vascular wall were corroborated by reduction in circulating MCP1 levels after 11β-HSD1 inhibition. Taken together, our data suggest 11β-HSD1 inhibition regulates proinflammatory gene expression in atherosclerotic tissues of ApoE-/- mice, and this effect may contribute to the attenuation of atherosclerosis in these animals.


Subject(s)
11-beta-Hydroxysteroid Dehydrogenase Type 1/antagonists & inhibitors , Atherosclerosis/drug therapy , Enzyme Inhibitors/pharmacology , Gene Expression Regulation/drug effects , Vasculitis/drug therapy , 11-beta-Hydroxysteroid Dehydrogenase Type 1/metabolism , Animals , Apolipoproteins E/genetics , Atherosclerosis/etiology , Cholesterol/metabolism , Gene Expression Profiling , Genes, MHC Class II/genetics , Glucocorticoids/metabolism , Laser Capture Microdissection , Lipids/blood , Mice , Mice, Knockout , Microarray Analysis , Vasculitis/complications
7.
Mol Syst Biol ; 8: 594, 2012 Jul 17.
Article in English | MEDLINE | ID: mdl-22806142

ABSTRACT

Common inflammatome gene signatures as well as disease-specific signatures were identified by analyzing 12 expression profiling data sets derived from 9 different tissues isolated from 11 rodent inflammatory disease models. The inflammatome signature significantly overlaps with known drug targets and co-expressed gene modules linked to metabolic disorders and cancer. A large proportion of genes in this signature are tightly connected in tissue-specific Bayesian networks (BNs) built from multiple independent mouse and human cohorts. Both the inflammatome signature and the corresponding consensus BNs are highly enriched for immune response-related genes supported as causal for adiposity, adipokine, diabetes, aortic lesion, bone, muscle, and cholesterol traits, suggesting the causal nature of the inflammatome for a variety of diseases. Integration of this inflammatome signature with the BNs uncovered 151 key drivers that appeared to be more biologically important than the non-drivers in terms of their impact on disease phenotypes. The identification of this inflammatome signature, its network architecture, and key drivers not only highlights the shared etiology but also pinpoints potential targets for intervention of various common diseases.


Subject(s)
Gene Expression Profiling , Inflammasomes/genetics , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/immunology , Age Factors , Analysis of Variance , Animals , Bayes Theorem , Caspases/genetics , Caspases/immunology , Chemokines/genetics , Chemokines/immunology , Cohort Studies , Computational Biology/methods , Disease Models, Animal , Female , Gene Regulatory Networks/immunology , Humans , Interleukins/genetics , Interleukins/metabolism , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Knockout , Rats , Rats, Sprague-Dawley , Sex Factors
8.
PLoS One ; 7(6): e26284, 2012.
Article in English | MEDLINE | ID: mdl-22719818

ABSTRACT

We tested the hypothesis that Crohn's disease (CD)-related genetic polymorphisms involved in host innate immunity are associated with shifts in human ileum-associated microbial composition in a cross-sectional analysis of human ileal samples. Sanger sequencing of the bacterial 16S ribosomal RNA (rRNA) gene and 454 sequencing of 16S rRNA gene hypervariable regions (V1-V3 and V3-V5) were conducted on macroscopically disease-unaffected ileal biopsies collected from 52 ileal CD, 58 ulcerative colitis and 60 control patients without inflammatory bowel diseases (IBD) undergoing initial surgical resection. These subjects also were genotyped for the three major NOD2 risk alleles (Leu1007fs, R708W, G908R) and the ATG16L1 risk allele (T300A). The samples were linked to clinical metadata, including body mass index, smoking status and Clostridium difficile infection. The sequences were classified into seven phyla/subphyla categories using the Naïve Bayesian Classifier of the Ribosome Database Project. Centered log ratio transformation of six predominant categories was included as the dependent variable in the permutation-based MANCOVA for the overall composition with stepwise variable selection. Polymerase chain reaction (PCR) assays were conducted to measure the relative frequencies of the Clostridium coccoides-Eubacterium rectale group and the Faecalibacterium prausnitzii spp. Empiric logit transformations of the relative frequencies of these two microbial groups were included in permutation-based ANCOVA. Regardless of sequencing method, IBD phenotype, Clostridium difficile and NOD2 genotype were selected as associated (FDR ≤ 0.05) with shifts in overall microbial composition. IBD phenotype and NOD2 genotype were also selected as associated with shifts in the relative frequency of the C. coccoides-E. rectale group. IBD phenotype, smoking and IBD medications were selected as associated with shifts in the relative frequency of F. prausnitzii spp. These results indicate that the effects of genetic and environmental factors on IBD are mediated at least in part by the enteric microbiota.
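The centered log ratio (CLR) transformation mentioned in this abstract maps a composition x = (x_1, ..., x_D) to log(x_i) minus the mean of the logs, which equals the log of each part divided by the geometric mean. The sketch below is a minimal standard-library version under stated assumptions: the function name `clr` and the example proportions are hypothetical, and the abstract does not say how zero counts were handled, so this version simply rejects them.

```python
import math


def clr(proportions):
    """Centered log-ratio transform of a strictly positive composition.

    Each part is replaced by its log minus the mean of all logs,
    i.e. the log of the part divided by the geometric mean.
    """
    if not proportions:
        raise ValueError("composition must be non-empty")
    if any(p <= 0 for p in proportions):
        # Assumption: zeros are rejected rather than replaced.
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical relative abundances of three taxa, for illustration only.
transformed = clr([0.5, 0.3, 0.2])
```

The transformed values sum to zero up to floating-point error, which removes the unit-sum constraint of raw proportions before they enter a linear model.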


Subject(s)
Clostridioides difficile/isolation & purification , Ileum/microbiology , Inflammatory Bowel Diseases/microbiology , Nod2 Signaling Adaptor Protein/genetics , Genotype , Humans , Phenotype , Polymerase Chain Reaction , Polymorphism, Single Nucleotide , RNA, Ribosomal, 16S/genetics
9.
Circ Cardiovasc Genet ; 4(6): 595-604, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22010137

ABSTRACT

BACKGROUND: Atherosclerosis is a complex disease requiring improvements in diagnostic techniques and therapeutic treatments. Both improvements will be facilitated by greater exploration of the biology of atherosclerotic plaque. To this end, we carried out large-scale gene expression analysis of human atherosclerotic lesions. METHODS AND RESULTS: Whole genome expression analysis of 101 plaques from patients with peripheral artery disease identified a robust gene signature (1514 genes) that is dominated by processes related to Toll-like receptor signaling, T-cell activation, cholesterol efflux, oxidative stress response, inflammatory cytokine production, vasoconstriction, and lysosomal activity. Further analysis of gene expression in microdissected carotid plaque samples revealed that this signature is differentially expressed in macrophage-rich and smooth muscle cell-containing regions. A quantitative PCR gene expression panel and inflammatory composite score were developed on the basis of the atherosclerotic plaque gene signature. When applied to serial sections of carotid plaque, the inflammatory composite score was observed to correlate with histological and morphological features related to plaque vulnerability. CONCLUSIONS: The robust mRNA expression signature identified in the present report is associated with pathological features of vulnerable atherosclerotic plaque and may be useful as a source of biomarkers and targets of novel antiatherosclerotic therapies.


Subject(s)
Gene Expression Profiling , Plaque, Atherosclerotic/genetics , Plaque, Atherosclerotic/immunology , Biomarkers , Female , Humans , Macrophages/immunology , Male , Molecular Sequence Data , Proteins/genetics , Proteins/immunology
10.
Clin Breast Cancer ; 10(6): 440-4, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21147686

ABSTRACT

PURPOSE: Cyclophosphamide/methotrexate/fluorouracil (CMF) is a proven adjuvant option for patients with early-stage breast cancer. Randomized trials with other regimens demonstrate that dose-dense (DD) scheduling can offer greater efficacy. We investigated the feasibility of administering CMF using a DD schedule. PATIENTS AND METHODS: Thirty-eight patients with early-stage breast cancer were accrued from March 2008 through June 2008. They were treated every 14 days with C 600, M 40, F 600 (all mg/m²) with PEG-filgrastim (Neulasta®) support on day 2 of each cycle. The primary endpoint was tolerability using Simon's 2-stage optimal design. The design would effectively discriminate between true tolerability (as protocol-defined) rates of ≤ 60% and ≥ 80%. RESULTS: The median age was 52 years (range, 38-78 years). Twenty-nine of the 38 patients completed 8 cycles of CMF at 14-day intervals. CONCLUSION: Dose-dense adjuvant CMF is tolerable and feasible at 14-day intervals with PEG-filgrastim support.


Subject(s)
Adenocarcinoma/drug therapy , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Breast Neoplasms/drug therapy , Adenocarcinoma/pathology , Adenocarcinoma/surgery , Adult , Aged , Antineoplastic Agents/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Breast Neoplasms/pathology , Breast Neoplasms/surgery , Chemotherapy, Adjuvant , Cyclophosphamide/administration & dosage , Dose-Response Relationship, Drug , Drug Administration Schedule , Feasibility Studies , Female , Fluorouracil/administration & dosage , Humans , Methotrexate/administration & dosage , Middle Aged , Neoplasm Staging , Pilot Projects , Treatment Outcome
11.
Bioinformatics ; 20(9): 1416-27, 2004 Jun 12.
Article in English | MEDLINE | ID: mdl-14976033

ABSTRACT

MOTIVATION: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Traditionally, homology search-based methods are often the first approach employed in determining whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes or motifs often are difficult to find using single query-based homology search algorithms against large sequence datasets such as the human genome. Therefore, the motivation behind this work was to develop an approach to enhance the sensitivity of traditional single query-based homology algorithms against genomic data without losing search selectivity. RESULTS: We demonstrate that by searching against a genome fragmented into all possible reading frames, the sensitivity of homology-based searches is enhanced without degrading its selectivity. Using the ETS-domain, bromodomain and acetyl-CoA acetyltransferase gene as queries, we were able to demonstrate that direct protein-protein searches using BLAST2P or FASTA3 against a human genome segmented among all possible reading frames and translated was substantially more sensitive than traditional protein-DNA searches against a raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic analysis was employed to demonstrate that the algorithms remained selective, while comparisons of the algorithms showed that the protein-protein searches were more sensitive in identifying hits. Therefore, through the overprediction of reading frames by this method and the increased sensitivity of protein-protein based homology search algorithms, a genome can be deeply mined, potentially finding hits overlooked by protein-DNA searches against raw genomic data.
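The core idea in this abstract, translating a genomic sequence in every reading frame so that protein-protein search tools can be run against it, can be sketched as a six-frame translation. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify how the genome was segmented, the function names are hypothetical, and the standard genetic code is assumed, with any codon containing a non-ACGT base rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands at offsets 0, 1 and 2."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy sequence, for illustration only.
frames = six_frame_translations("ATGGCCTAA")
```

Each of the six resulting protein strings could then be written to a FASTA file and indexed as a protein database; stop codons appear as `*` and would typically be used to split a frame into candidate open reading frames.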


Subject(s)
Algorithms , Chromosome Mapping/methods , Open Reading Frames , Proteins/analysis , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Homology, Amino Acid
12.
J Biol Chem ; 277(18): 15913-22, 2002 May 03.
Article in English | MEDLINE | ID: mdl-11834729

ABSTRACT

The trisubstituted pyrrole 4-[2-(4-fluorophenyl)-5-(1-methylpiperidine-4-yl)-1H-pyrrol-3-yl]pyridine (Compound 1) inhibits the growth of Eimeria spp. both in vitro and in vivo. The molecular target of Compound 1 was identified as cGMP-dependent protein kinase (PKG) using a tritiated analogue to purify an approximately 120-kDa protein from lysates of Eimeria tenella. This represents the first example of a protozoal PKG. Cloning of PKG from several Apicomplexan parasites has identified a parasite signature sequence of nearly 300 amino acids that is not found in mammalian or Drosophila PKG and which contains an additional, third cGMP-binding site. Nucleotide cofactor regulation of parasite PKG is remarkably different from mammalian enzymes. The activity of both native and recombinant E. tenella PKG is stimulated 1000-fold by cGMP, with significant cooperativity. Two isoforms of the parasite enzyme are expressed from a single copy gene. The NH2-terminal sequence of the soluble isoform of PKG is consistent with alternative translation initiation within the open reading frame of the enzyme. A larger, membrane-associated isoform corresponds to the deduced full-length protein sequence. Compound 1 is a potent inhibitor of both soluble and membrane-associated isoforms of native PKG, as well as recombinant enzyme, with an IC50 of <1 nM.


Subject(s)
Apicomplexa/metabolism , Cyclic GMP-Dependent Protein Kinases/metabolism , Eimeria tenella/enzymology , Amino Acid Sequence , Animals , Apicomplexa/classification , Apicomplexa/genetics , Binding Sites , Chickens/parasitology , Cloning, Molecular , Cyclic GMP-Dependent Protein Kinase Type I , Cyclic GMP-Dependent Protein Kinases/genetics , Cyclic GMP-Dependent Protein Kinases/isolation & purification , DNA, Complementary/genetics , DNA, Protozoan/genetics , Humans , Ligands , Mammals , Molecular Sequence Data , Peptide Chain Initiation, Translational , Protozoan Proteins/genetics , Protozoan Proteins/isolation & purification , Protozoan Proteins/metabolism , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Species Specificity
13.
J Biol Chem ; 277(3): 2000-5, 2002 Jan 18.
Article in English | MEDLINE | ID: mdl-11714703

ABSTRACT

Histamine has been shown to play a role in arthropod vision; it is the major neurotransmitter of arthropod photoreceptors. Histamine-gated chloride channels have been identified in insect optic lobes. We report the first isolation of cDNA clones encoding histamine-gated chloride channel subunits from the fruit fly Drosophila melanogaster. The encoded proteins, HisCl1 and HisCl2, share 60% amino acid identity with each other. The closest structural homologue is the human glycine alpha3 receptor, which shares 45 and 43% amino acid identity with HisCl1 and HisCl2, respectively. Northern hybridization analysis suggested that hisCl1 and hisCl2 mRNAs are predominantly expressed in the insect eye. Oocytes injected with in vitro transcribed RNA, encoding either HisCl1 or HisCl2, produced substantial chloride currents in response to histamine but not in response to GABA, glycine, or glutamate. The histamine sensitivity was similar to that observed in insect laminar neurons. Histamine-activated currents were not blocked by picrotoxinin, fipronil, strychnine, or the H2 antagonist cimetidine. Co-injection of both hisCl1 and hisCl2 RNAs resulted in expression of a histamine-gated chloride channel with increased sensitivity to histamine, demonstrating coassembly of the subunits. The insecticide ivermectin reversibly activated homomeric HisCl1 channels and, more potently, HisCl1 and HisCl2 heteromeric channels.


Subject(s)
Chloride Channels/physiology , Eye/metabolism , Histamine/physiology , Ion Channel Gating/physiology , Amino Acid Sequence , Animals , Base Sequence , Chloride Channels/chemistry , Chloride Channels/genetics , DNA Primers , Drosophila melanogaster , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid