Search | VHL Regional Portal

1.

The landscape of extreme genomic variation in the highly adaptable Atlantic killifish.

Reid, Noah M; Jackson, Craig E; Gilbert, Don; Minx, Patrick; Montague, Michael J; Hampton, Thomas H; Helfrich, Lily W; King, Benjamin L; Nacci, Diane E; Aluru, Neel; Karchner, Sibel I; Colbourne, John K; Hahn, Mark E; Shaw, Joseph R; Oleksiak, Marjorie F; Crawford, Douglas L; Warren, Wesley C; Whitehead, Andrew.

Genome Biol Evol ; 9(3): 659-676, 2017 03.

Article in English | MEDLINE | ID: mdl-28201664

2.

Genomic insights into the Ixodes scapularis tick vector of Lyme disease.

Gulia-Nuss, Monika; Nuss, Andrew B; Meyer, Jason M; Sonenshine, Daniel E; Roe, R Michael; Waterhouse, Robert M; Sattelle, David B; de la Fuente, José; Ribeiro, Jose M; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R; Walenz, Brian P; Koren, Sergey; Hostetler, Jessica B; Thiagarajan, Mathangi; Joardar, Vinita S; Hannick, Linda I; Bidwell, Shelby; Hammond, Martin P; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L; Almeida, Francisca C; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W; Bonzon-Kulichenko, Elena; Buckingham, Steven D; Caffrey, Daniel R; Caimano, Melissa J; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J; Giraldo-Calderón, Gloria I; Grabowski, Jeffrey M; Jiang, David; Khalil, Sayed M S; Kim, Donghun; Kocan, Katherine M; Koci, Juraj; Kuhn, Richard J; Kurtti, Timothy J; Lees, Kristin; Lang, Emma G; Kennedy, Ryan C; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin.

Nat Commun ; 7: 10507, 2016 Feb 09.

Article in English | MEDLINE | ID: mdl-26856261

ABSTRACT

Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ~57% of the genome, reveals 20,486 protein-coding genes and expansions of gene families associated with tick-host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host 'questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent.

Subject(s)

Anaplasma phagocytophilum , Arachnid Vectors/genetics , Genome/genetics , Ixodes/genetics , Ligand-Gated Ion Channels/genetics , Animals , Gene Expression Profiling , Genomics , Lyme Disease/transmission , Oocytes , Xenopus laevis

3.

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies.

Neale, David B; Wegrzyn, Jill L; Stevens, Kristian A; Zimin, Aleksey V; Puiu, Daniela; Crepeau, Marc W; Cardeno, Charis; Koriabine, Maxim; Holtz-Morris, Ann E; Liechty, John D; Martínez-García, Pedro J; Vasquez-Gross, Hans A; Lin, Brian Y; Zieve, Jacob J; Dougherty, William M; Fuentes-Soriano, Sara; Wu, Le-Shin; Gilbert, Don; Marçais, Guillaume; Roberts, Michael; Holt, Carson; Yandell, Mark; Davis, John M; Smith, Katherine E; Dean, Jeffrey F D; Lorenz, W Walter; Whetten, Ross W; Sederoff, Ronald; Wheeler, Nicholas; McGuire, Patrick E; Main, Doreen; Loopstra, Carol A; Mockaitis, Keithanne; deJong, Pieter J; Yorke, James A; Salzberg, Steven L; Langley, Charles H.

Genome Biol ; 15(3): R59, 2014 Mar 04.

Article in English | MEDLINE | ID: mdl-24647006

ABSTRACT

BACKGROUND: The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. RESULTS: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. CONCLUSIONS: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

Subject(s)

Contig Mapping/methods , Genome, Plant , Pinus taeda/genetics , Sequence Analysis, DNA/methods , DNA, Plant/genetics , Haploidy
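The abstract above reports an N50 scaffold size as a measure of assembly contiguity. As a minimal sketch of how that statistic is commonly computed (not the authors' own pipeline), the function below takes a list of sequence lengths and returns the length at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the example lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in kbp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # prints 70
```

In this example the total is 300 kbp, and the two longest scaffolds (80 + 70) already reach the halfway mark of 150 kbp, so the N50 is 70 kbp.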

4.

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N.

Genome Biol ; 14(6): r53, 2013 Jun 03.

Article in English | MEDLINE | ID: mdl-23731509

ABSTRACT

BACKGROUND: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. RESULTS: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. CONCLUSIONS: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.

Subject(s)

Fruit/genetics , Gene Expression Regulation, Plant , Genes, Plant , Genome, Plant , Quantitative Trait, Heritable , Cacao/genetics , Cacao/metabolism , Chromosome Mapping , Chromosomes, Plant , Color , Fruit/metabolism , Genome Size , High-Throughput Nucleotide Sequencing , Quantitative Trait Loci , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic

5.

SMT vs. TOFT.

Gilbert, Don A.

Bioessays ; 33(7): 555, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21633965

Subject(s)

Neoplasms/etiology , Neoplasms/metabolism , Animals , Humans , Models, Biological , Neoplasms/genetics

6.

Predator-induced defences in Daphnia pulex: selection and evaluation of internal reference genes for gene expression studies with real-time PCR.

Spanier, Katina I; Leese, Florian; Mayer, Christoph; Colbourne, John K; Gilbert, Don; Pfrender, Michael E; Tollrian, Ralph.

BMC Mol Biol ; 11: 50, 2010 Jun 29.

Article in English | MEDLINE | ID: mdl-20587017

ABSTRACT

BACKGROUND: The planktonic microcrustacean Daphnia pulex is among the best-studied animals in ecological, toxicological and evolutionary research. One aspect that has sustained interest in the study system is the ability of D. pulex to develop inducible defence structures when exposed to predators, such as the phantom midge larvae Chaoborus. The available draft genome sequence for D. pulex is accelerating research to identify genes that confer plastic phenotypes that are regularly cued by environmental stimuli. Yet for quantifying gene expression levels, no experimentally validated set of internal control genes exists for the accurate normalization of qRT-PCR data. RESULTS: In this study, we tested six candidate reference genes for normalizing transcription levels of D. pulex genes; alpha tubulin (aTub), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), TATA box binding protein (Tbp), syntaxin 16 (Stx16), X-box binding protein 1 (Xbp1) and CAPON, a protein associated with the neuronal nitric oxide synthase, were selected on the basis of an earlier study and from microarray studies. One additional gene, a matrix metalloproteinase (MMP), was tested to validate its transcriptional response to Chaoborus, which was earlier observed in a microarray study. The transcription profiles of these seven genes were assessed by qRT-PCR from RNA of juvenile D. pulex that showed induced defences in comparison to untreated control animals. We tested the individual suitability of genes for expression normalization using the programs geNorm, NormFinder and BestKeeper. Intriguingly, Xbp1, Tbp, CAPON and Stx16 were selected as ideal reference genes. Analyses on the relative expression level using the software REST showed that both classical housekeeping candidate genes (aTub and GAPDH) were significantly downregulated, whereas the MMP gene was shown to be significantly upregulated, as predicted. aTub is a particularly ill-suited reference gene because five copies are found in the D. pulex genome sequence. When applying aTub for expression normalization, Xbp1 and Tbp are falsely reported as significantly upregulated. CONCLUSIONS: Our results suggest that the genes Xbp1, Tbp, CAPON and Stx16 are suitable reference genes for accurate normalization in qRT-PCR studies using Chaoborus-induced D. pulex specimens. Furthermore, our study underscores the importance of verifying the expression stability of putative reference genes for normalization of expression levels.

Subject(s)

Daphnia , Gene Expression Profiling/standards , Gene Expression Regulation , Gene Expression , Polymerase Chain Reaction , Predatory Behavior/physiology , Animals , Daphnia/genetics , Daphnia/physiology , Diptera/physiology , Female , Gene Expression Profiling/methods , Genes, Insect , Insect Proteins/genetics , Matrix Metalloproteinases/genetics , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , RNA/genetics , Reference Standards , Tubulin/genetics

7.

Modeling TNT ignition.

Hobbs, Michael L; Kaneshige, Michael J; Gilbert, Don W; Marley, Stephen K; Todd, Steven N.

J Phys Chem A ; 113(39): 10474-87, 2009 Oct 01.

Article in English | MEDLINE | ID: mdl-19736950

ABSTRACT

A 2,4,6-trinitrotoluene (TNT) ignition model was developed using data from multiple sources. The one-step, first-order, pressure-dependent mechanism was used to predict ignition behavior from small- and large-scale experiments involving significant fluid motion. Bubbles created from decomposition gases were shown to cause vigorous boiling. The forced mixing caused by these bubbles was not modeled adequately using only free liquid convection. Thorough mixing and ample contact of the reactive species indicated that the TNT decomposition products were in equilibrium. The effect of impurities on the reaction rate was the primary uncertainty in the decomposition model.

8.

Comments on sequence normalization of tiling array expression.

Gilbert, Don; Rechtsteiner, Andreas.

Bioinformatics ; 25(17): 2171-3, 2009 Sep 01.

Article in English | MEDLINE | ID: mdl-19578171

ABSTRACT

MOTIVATION: Methods to improve tiling array expression signals are needed to accurately detect genome features. Royce et al. provide statistical normalizations of tile signal based on probe sequence content that promises improved accuracy, and should be independently verified. RESULTS: Assessment of the sequence content normalization methods identified a problem: confounding of probe sequence content with gene structure (intron/exon) sequence content. Normalization obscured tile signal changes at gene structure boundaries. This and other evidence suggests that simple sequence normalization does not improve detection of genes from tile expression data.

Subject(s)

Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Animals , Base Composition/genetics , Base Sequence , Daphnia/genetics , Drosophila melanogaster/genetics , Exons/genetics , Introns/genetics

9.

A machine-learning approach to combined evidence validation of genome assemblies.

Choi, Jeong-Hyeon; Kim, Sun; Tang, Haixu; Andrews, Justen; Gilbert, Don G; Colbourne, John K.

Bioinformatics ; 24(6): 744-50, 2008 Mar 15.

Article in English | MEDLINE | ID: mdl-18204064

ABSTRACT

MOTIVATION: While it is common to refer to 'the genome sequence' as if it were a single, complete and contiguous DNA string, it is in fact an assembly of millions of small, partially overlapping DNA fragments. Sophisticated computer algorithms (assemblers and scaffolders) merge these DNA fragments into contigs, and place these contigs into sequence scaffolds using the paired-end sequences derived from large-insert DNA libraries. Each step in this automated process is susceptible to producing errors; hence, the resulting draft assembly represents (in practice) only a likely assembly that requires further validation. Knowing which parts of the draft assembly are likely free of errors is critical if researchers are to draw reliable conclusions from the assembled sequence data. RESULTS: We develop a machine-learning method to detect assembly errors in sequence assemblies. Several in silico measures for assembly validation have been proposed by various researchers. Using three benchmarking Drosophila draft genomes, we evaluate these techniques along with some new measures that we propose, including the good-minus-bad coverage (GMB), the good-to-bad-ratio (RGB), the average Z-score (AZ) and the average absolute Z-score (ASZ). Our results show that the GMB measure performs better than the others in both its sensitivity and its specificity for assembly error detection. Nevertheless, no single method performs sufficiently well to reliably detect genomic regions requiring attention for further experimental verification. To utilize the advantages of all these measures, we develop a novel machine learning approach that combines these individual measures to achieve a higher prediction accuracy (i.e. greater than 90%). Our combined evidence approach avoids the difficult and often ad hoc selection of many parameters the individual measures require, and significantly improves the overall precisions on the benchmarking data sets.

Subject(s)

Algorithms , Artificial Intelligence , Contig Mapping/methods , Drosophila/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Animals , Base Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity

10.

Biomolecular interaction network database.

Gilbert, Don.

Brief Bioinform ; 6(2): 194-8, 2005 Jun.

Article in English | MEDLINE | ID: mdl-15975228

ABSTRACT

This software review looks at the utility of the Biomolecular Interaction Network Database (BIND) as a web database. BIND offers methods common to related biology databases and specialisations for its protein interaction data. Searching and browsing this database is easy and well integrated with the underlying data and the needs of scientists. Interaction networks are visualised with software that offers many useful options. The innovative ontoglyphs are used throughout to provide visual cues to protein functions, localisation and other aspects one needs to know for this data set. One can expect to get useful results that may be well integrated with one's research needs.

Subject(s)

Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Software , User-Computer Interface , Binding Sites , Protein Binding , Proteins/classification , Proteins/genetics

11.

wFleaBase: the Daphnia genome database.

Colbourne, John K; Singan, Vasanth R; Gilbert, Don G.

BMC Bioinformatics ; 6: 45, 2005 Mar 07.

Article in English | MEDLINE | ID: mdl-15752432

ABSTRACT

BACKGROUND: wFleaBase is a database with the necessary infrastructure to curate, archive and share genetic, molecular and functional genomic data and protocols for an emerging model organism, the microcrustacean Daphnia. Commonly known as the water-flea, Daphnia's ecological merit is unequaled among metazoans, largely because of its sentinel role within freshwater ecosystems and over 200 years of biological investigations. By consequence, the Daphnia Genomics Consortium (DGC) has launched an interdisciplinary research program to create the resources needed to study genes that affect ecological and evolutionary success in natural environments. DISCUSSION: These tools include the genome database wFleaBase, which currently contains functions to search and extract information from expressed sequenced tags, genome survey sequences and full genome sequencing projects. This new database is built primarily from core components of the Generic Model Organism Database project, and related bioinformatics tools. SUMMARY: Over the coming year, preliminary genetic maps and the nearly complete genomic sequence of Daphnia pulex will be integrated into wFleaBase, including gene predictions and ortholog assignments based on sequence similarities with eukaryote genes of known function. wFleaBase aims to serve a large ecological and evolutionary research community. Our challenge is to rapidly expand its content and to ultimately integrate genetic and functional genomic information with population-level responses to environmental challenges. URL: http://wfleabase.org/.

Subject(s)

Computational Biology/methods , Daphnia/genetics , Databases, Genetic , Genome , Animals , Chromosome Mapping , Computer Graphics , Database Management Systems , Databases, Factual , Ecology , Ecosystem , Evolution, Molecular , Genetics, Population , Genomics , Information Services , Information Storage and Retrieval , Internet , Software

12.

Bioinformatics software resources.

Gilbert, Don.

Brief Bioinform ; 5(3): 300-4, 2004 Sep.

Article in English | MEDLINE | ID: mdl-15383216

ABSTRACT

This review looks at internet archives, repositories and lists for obtaining popular and useful biology and bioinformatics software. Resources include collections of free software, services for the collaborative development of new programs, software news media and catalogues of links to bioinformatics software and web tools. Problems with such resources arise from needs for continued curator effort to collect and update these, combined with less than optimal community support, funding and collaboration. Despite some problems, the available software repositories provide needed public access to many tools that are a foundation for analyses in bioscience research efforts.

Subject(s)

Algorithms , Archives , Computational Biology/methods , Databases, Factual , Internet , Software

13.

Bio-Mirror project for public bio-data distribution.

Gilbert, Don; Ugawa, Yoshihiro; Buchhorn, Markus; Wee, Tan Tin; Mizushima, Akira; Kim, Hyunchul; Chon, Kilnam; Weon, Seyeon; Ma, Juncai; Ichiyanagi, Yoshihiro; Liou, Der-Ming; Keretho, Somnuk; Napis, Suhaimi.

Bioinformatics ; 20(17): 3238-40, 2004 Nov 22.

Article in English | MEDLINE | ID: mdl-15059839

ABSTRACT

Timely worldwide distribution of biosequence and bioinformatics data depends on high performance networking and advances in Internet transport methods. The Bio-Mirror project focuses on providing up-to-date distribution of this rapidly growing and changing data. It offers FTP, Web and Rsync access to many high-volume databanks from several sites around the world. Experiments with data grids and other methods offer future improvements in biology data distribution.

Subject(s)

Computational Biology/methods , Database Management Systems , Databases, Genetic , Documentation/methods , Information Dissemination/methods , Information Storage and Retrieval/methods , Internet , Biology/methods , Computer Communication Networks , Systems Integration

14.

Shopping in the genome market with EnsMart.

Gilbert, Don.

Brief Bioinform ; 4(3): 292-6, 2003 Sep.

Article in English | MEDLINE | ID: mdl-14582522

ABSTRACT

Life scientists who work with the supermarket of genome data will find the EnsMart database and software package offers a valuable door to a wealth of genes and genome features. Not only available to lab biologists on the web, this popular multi-organism genome database can be installed and used on your own Unix computer with relative ease. It offers a flexible, fast and practical data-mining framework for computer-savvy biologists and bioinformaticians.

Subject(s)

Databases, Nucleic Acid , Genome , Software , Animals , Genomics , Humans , Information Storage and Retrieval/methods

15.

Protein family alignment annotation.

Gilbert, Don.

Brief Bioinform ; 4(2): 192-6, 2003 Jun.

Article in English | MEDLINE | ID: mdl-12846399

ABSTRACT

For bioscientists studying protein structure and function, the Protein Family Alignment Annotation Tool (Pfaat) is a useful and simple program for annotating collections of proteins. This open-source software includes methods for viewing and aligning protein families, and for annotating sequence structure and residues with known functions. It offers new options to aid the study of proteins, and an extensible annotation tool for bioinformatics developers.

Subject(s)

Proteins/classification , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data , Phylogeny , Protein Structure, Tertiary , Proteins/genetics , Sequence Alignment

16.

Sequence file format conversion with command-line readseq.

Gilbert, Don.

Curr Protoc Bioinformatics ; Appendix 1: Appendix 1E, 2003 Feb.

Article in English | MEDLINE | ID: mdl-18428689

ABSTRACT

One of the major challenges in using bioinformatics software is that there are a wide variety of sequence formats, e.g., GenBank, EMBL, and FASTA. It is often the case that a sequence or a set of sequences is in one format but is needed in another. This unit offers a solution to this problem--Readseq. Readseq is a program that can read and write 18 different formats.

Subject(s)

Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Programming Languages , Sequence Analysis/methods , User-Computer Interface , Internet

17.

Pise: software for building bioinformatics webs.

Gilbert, Don.

Brief Bioinform ; 3(4): 405-9, 2002 Dec.

Article in English | MEDLINE | ID: mdl-12511068

ABSTRACT

Pise is interface construction software for bioinformatics applications that run by command-line operations. It creates common, easy-to-use interfaces to these applications for the Web, or other uses. It is adaptable to new bioinformatics tools, and offers program chaining, Unix system batch and other controls, making it an attractive method for building and using your own bioinformatics web services.

Subject(s)

Computational Biology , Computer Communication Networks , Software , Humans , Programming Languages , Software Design , Systems Integration , User-Computer Interface
