Results 1 - 13 of 13
1.
Curr Opin Chem Biol ; 14(3): 325-30, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20457001

ABSTRACT

As high-throughput screening matures as a discipline, cheminformatics is playing an increasingly important role in selecting new compounds for diverse screening libraries. New visualization techniques such as multi-fusion similarity maps, scaffold trees, and principal moments of inertia plots provide complementary information on compound libraries and enable identification of unexplored regions of chemical space with potential biological relevance. Quantitative metrics have been developed to analyze libraries for properties such as natural product-likeness and shape complexity. Analysis of high-throughput screening results and drug discovery programs identifies compounds problematic for screening. Taken together, these approaches allow us to increase the diversity of biological outcomes available in compound screening libraries and improve the success rates of high-throughput screening against new targets without making significant increases in the size of compound libraries.


Subject(s)
Computational Biology/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries
2.
Genetics ; 181(2): 767-81, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19015548

ABSTRACT

We report the discovery and validation of a set of single nucleotide polymorphisms (SNPs) between the reference Neurospora crassa strain Oak Ridge and the Mauriceville strain (FGSC 2555), of sufficient density to allow fine mapping of most loci. Sequencing of Mauriceville cDNAs and alignment to the completed genomic sequence of the Oak Ridge strain identified 19,087 putative SNPs. Of these, a subset was validated by cleaved amplified polymorphic sequence (CAPS), a simple and robust PCR-based assay that reliably distinguishes between SNP alleles. Experimental confirmation resulted in the development of 250 CAPS markers distributed evenly over the genome. To demonstrate the applicability of this map, we used bulked segregant analysis followed by interval mapping to locate the csp-1 mutation to a narrow region on LGI. Subsequently, we refined mapping resolution to 74 kbp by developing additional markers, resequenced the candidate gene, NCU02713.3, in the mutant background, and phenocopied the mutation by gene replacement in the WT strain. Together, these techniques demonstrate a generally applicable and straightforward approach for the isolation of novel genes from existing mutants. Data on both putative and validated SNPs are deposited in a customized public database at the Broad Institute, which encourages augmentation by community users.
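
As a rough illustration of how a CAPS assay separates two SNP alleles, the sketch below digests two toy amplicons in silico. The sequences are invented for illustration and are not taken from the study; the enzyme shown is EcoRI (recognition site GAATTC, cut after the first G), chosen only as a familiar example. The function name `fragment_lengths` is hypothetical.

```python
def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy amplicons (assumed, illustrative only): they differ at one base,
# and that SNP destroys the GAATTC site in the second allele.
allele_a = "ACGTGAATTCTTGA"
allele_b = "ACGTGAGTTCTTGA"

cut_a = fragment_lengths(allele_a, "GAATTC", 1)  # site present: two fragments
cut_b = fragment_lengths(allele_b, "GAATTC", 1)  # site absent: one fragment
```

The two alleles give different fragment patterns, which is what would be read off a gel after PCR and digestion.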


Subject(s)
Neurospora crassa/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , DNA, Fungal/genetics , Databases, Nucleic Acid , Expressed Sequence Tags , Genes, Fungal , Genetic Markers , Mutation , Neurospora crassa/classification , Polymerase Chain Reaction , Recombination, Genetic , Species Specificity
3.
Science ; 317(5843): 1400-2, 2007 Sep 07.
Article in English | MEDLINE | ID: mdl-17823352

ABSTRACT

We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.


Subject(s)
Fusarium/genetics , Genome, Fungal , Polymorphism, Genetic , DNA, Fungal , Evolution, Molecular , Fusarium/physiology , Hordeum/microbiology , Molecular Sequence Data , Plant Diseases/microbiology , Point Mutation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
4.
Genome Res ; 17(9): 1389-98, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17690204

ABSTRACT

We present Conrad, the first comparative gene predictor based on semi-Markov conditional random fields (SMCRFs). Unlike the best standalone gene predictors, which are based on generalized hidden Markov models (GHMMs) and trained by maximum likelihood, Conrad is discriminatively trained to maximize annotation accuracy. In addition, unlike the best annotation pipelines, which rely on heuristic and ad hoc decision rules to combine standalone gene predictors with additional information such as ESTs and protein homology, Conrad encodes all sources of information as features and treats all features equally in the training and inference algorithms. Conrad outperforms the best standalone gene predictors in cross-validation and whole chromosome testing on two fungi with vastly different gene structures. The performance improvement arises from the SMCRF's discriminative training methods and their ability to easily incorporate diverse types of information by encoding them as feature functions. On Cryptococcus neoformans, configuring Conrad to reproduce the predictions of a two-species phylo-GHMM closely matches the performance of Twinscan. Enabling discriminative training increases performance, and adding new feature functions further increases performance, achieving a level of accuracy that is unprecedented for this organism. Similar results are obtained on Aspergillus nidulans comparing Conrad versus Fgenesh. SMCRFs are a promising framework for gene prediction because of their highly modular nature, simplifying the process of designing and testing potential indicators of gene structure. Conrad's implementation of SMCRFs advances the state of the art in gene prediction in fungi and provides a robust platform for both current application and future research.


Subject(s)
Algorithms , Aspergillus nidulans/genetics , Cryptococcus neoformans/genetics , Genes, Fungal , Software , Artificial Intelligence , Chromosomes, Fungal , Discriminant Analysis , Likelihood Functions , Markov Chains , Reference Standards
5.
Science ; 316(5832): 1718-23, 2007 Jun 22.
Article in English | MEDLINE | ID: mdl-17510324

ABSTRACT

We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of approximately 4 to 6 increase in average gene length and in sizes of intergenic regions relative to An. gambiae and Drosophila melanogaster. Nonetheless, chromosomal synteny is generally maintained among all three insects, although conservation of orthologous gene order is higher (by a factor of approximately 2) between the mosquito species than between either of them and the fruit fly. An increase in genes encoding odorant binding, cytochrome P450, and cuticle domains relative to An. gambiae suggests that members of these protein families underpin some of the biological differences between the two mosquito species.


Subject(s)
Aedes/genetics , Genome, Insect , Insect Vectors/genetics , Aedes/metabolism , Animals , Anopheles/genetics , Anopheles/metabolism , Arboviruses , Base Sequence , DNA Transposable Elements , Dengue/prevention & control , Dengue/transmission , Drosophila melanogaster/genetics , Female , Genes, Insect , Humans , Insect Proteins/genetics , Insect Vectors/metabolism , Male , Membrane Transport Proteins/genetics , Molecular Sequence Data , Multigene Family , Protein Structure, Tertiary/genetics , Sequence Analysis, DNA , Sex Characteristics , Sex Determination Processes , Species Specificity , Synteny , Transcription, Genetic , Yellow Fever/prevention & control , Yellow Fever/transmission
6.
Nat Genet ; 39(1): 113-9, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17159979

ABSTRACT

Genetic variation allows the malaria parasite Plasmodium falciparum to overcome chemotherapeutic agents, vaccines and vector control strategies and remain a leading cause of global morbidity and mortality. Here we describe an initial survey of genetic variation across the P. falciparum genome. We performed extensive sequencing of 16 geographically diverse parasites and identified 46,937 SNPs, demonstrating rich diversity among P. falciparum parasites (π = 1.16 × 10⁻³) and strong correlation with gene function. We identified multiple regions with signatures of selective sweeps in drug-resistant parasites, including a previously unidentified 160-kb region with extremely low polymorphism in pyrimethamine-resistant parasites. We further characterized 54 worldwide isolates by genotyping SNPs across 20 genomic regions. These data begin to define population structure among African, Asian and American groups and illustrate the degree of linkage disequilibrium, which extends over relatively short distances in African parasites but over longer distances in Asian parasites. We provide an initial map of genetic diversity in P. falciparum and demonstrate its potential utility in identifying genes subject to recent natural selection and in understanding the population genetics of this parasite.
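
The diversity statistic quoted in the abstract (π, nucleotide diversity) is conventionally the average number of differences per site between pairs of sequences. A minimal sketch of that textbook definition follows, assuming equal-length, already-aligned sequences with no gaps; the abstract does not describe how the authors computed their value, and the toy sequences and the function name `nucleotide_diversity` are illustrative assumptions.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (assumed): three 10-base sequences, one carrying one SNP.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
pi_toy = nucleotide_diversity(toy)  # 2 differences over 3 pairs x 10 sites
```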


Subject(s)
Chromosome Mapping/methods , Genetic Variation , Genome, Protozoan , Plasmodium falciparum/genetics , Africa , Animals , Asia , Central America , Genotype , Humans , Phylogeny , Polymorphism, Single Nucleotide , South America
7.
Bioinformatics ; 22(14): 1782-3, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16709588

ABSTRACT

SUMMARY: Combo is a comparative genome browser that provides a dynamic view of whole genome alignments along with their associated annotations. Combo provides two different visualization perspectives. The perpendicular (dot plot) view provides a dot plot of genome alignments synchronized with a display of genome annotations along each axis. The parallel view displays two genome annotations horizontally, synchronized through a panel displaying local alignments as trapezoids. Users can zoom to any resolution, from whole chromosomes to individual bases. They can select, highlight and view detailed information from specific alignments and annotations. Combo is organism-agnostic and can import data from a variety of file formats. AVAILABILITY: Combo is integrated as part of the Argo Genome Browser, which also provides single-genome browsing and editing capabilities. Argo is written in Java, runs on multiple platforms and is freely available for download at http://www.broad.mit.edu/annotation/argo/.


Subject(s)
Algorithms , Chromosome Mapping/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Base Sequence , Computer Graphics , Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Molecular Sequence Data
8.
Nature ; 440(7087): 1045-9, 2006 Apr 20.
Article in English | MEDLINE | ID: mdl-16625196

ABSTRACT

Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome. It is also enriched in segmental duplications, ranking third in density among the autosomes. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution, the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.


Subject(s)
Chromosomes, Human, Pair 17/genetics , Evolution, Molecular , Animals , Base Composition , Gene Duplication , Humans , Long Interspersed Nucleotide Elements/genetics , Mice , Sequence Analysis, DNA , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
9.
Nature ; 440(7084): 671-5, 2006 Mar 30.
Article in English | MEDLINE | ID: mdl-16572171

ABSTRACT

Here we present a finished sequence of human chromosome 15, together with a high-quality gene catalogue. As chromosome 15 is one of seven human chromosomes with a high rate of segmental duplication, we have carried out a detailed analysis of the duplication structure of the chromosome. Segmental duplications in chromosome 15 are largely clustered in two regions, on proximal and distal 15q; the proximal region is notable because recombination among the segmental duplications can result in deletions causing Prader-Willi and Angelman syndromes. Sequence analysis shows that the proximal and distal regions of 15q share extensive ancient similarity. Using a simple approach, we have been able to reconstruct many of the events by which the current duplication structure arose. We find that most of the intrachromosomal duplications seem to share a common ancestry. Finally, we demonstrate that some remaining gaps in the genome sequence are probably due to structural polymorphisms between haplotypes; this may explain a significant fraction of the gaps remaining in the human genome.


Subject(s)
Chromosomes, Human, Pair 15/genetics , Evolution, Molecular , Gene Duplication , Animals , Conserved Sequence/genetics , Genes , Genome, Human , Haplotypes/genetics , Humans , Macaca mulatta/genetics , Molecular Sequence Data , Multigene Family/genetics , Phylogeny , Polymorphism, Genetic/genetics , Sequence Analysis, DNA , Synteny/genetics
10.
Nature ; 439(7074): 331-5, 2006 Jan 19.
Article in English | MEDLINE | ID: mdl-16421571

ABSTRACT

The International Human Genome Sequencing Consortium (IHGSC) recently completed a sequence of the human genome. As part of this project, we have focused on chromosome 8. Although some chromosomes exhibit extreme characteristics in terms of length, gene content, repeat content and fraction segmentally duplicated, chromosome 8 is distinctly typical in character, being very close to the genome median in each of these aspects. This work describes a finished sequence and gene catalogue for the chromosome, which represents just over 5% of the euchromatic human genome. A unique feature of the chromosome is a vast region of approximately 15 megabases on distal 8p that appears to have a strikingly high mutation rate, which has accelerated in the hominids relative to other sequenced mammals. This fast-evolving region contains a number of genes related to innate immunity and the nervous system, including loci that appear to be under positive selection; these include the major defensin (DEF) gene cluster and MCPH1, a gene that may have contributed to the evolution of expanded brain size in the great apes. The data from chromosome 8 should allow a better understanding of both normal and disease biology and genome evolution.


Subject(s)
Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Animals , Contig Mapping , DNA, Satellite/genetics , Defensins/genetics , Euchromatin/genetics , Female , Humans , Immunity, Innate/genetics , Male , Molecular Sequence Data , Multigene Family/genetics , Sequence Analysis, DNA
11.
Nature ; 438(7069): 803-19, 2005 Dec 08.
Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.


Subject(s)
Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
12.
Nature ; 437(7058): 551-5, 2005 Sep 22.
Article in English | MEDLINE | ID: mdl-16177791

ABSTRACT

Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.


Subject(s)
Chromosomes, Human, Pair 18/genetics , DNA/genetics , Aneuploidy , Animals , Conserved Sequence/genetics , CpG Islands/genetics , Exons/genetics , Expressed Sequence Tags , Genes/genetics , Genome, Human , Humans , Introns/genetics , Molecular Sequence Data , Sequence Analysis, DNA , Synteny
13.
Genome Res ; 14(8): 1447-61, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15289470

ABSTRACT

Although often considered "minimal" organisms, mycoplasmas show a wide range of diversity with respect to host environment, phenotypic traits, and pathogenicity. Here we report the complete genomic sequence and proteogenomic map for the piscine mycoplasma Mycoplasma mobile, noted for its robust gliding motility. For the first time, proteomic data are used in the primary annotation of a new genome, providing validation of expression for many of the predicted proteins. Several novel features were discovered including a long repeating unit of DNA of approximately 2435 bp present in five complete copies that are shown to code for nearly identical yet uniquely expressed proteins. M. mobile has among the lowest DNA GC contents (24.9%) and most reduced set of tRNAs of any organism yet reported (28). Numerous instances of tandem duplication as well as lateral gene transfer are evident in the genome. The multiple available complete genome sequences for other motile and immotile mycoplasmas enabled us to use comparative genomic and phylogenetic methods to suggest several candidate genes that might be involved in motility. The results of these analyses leave open the possibility that gliding motility might have arisen independently more than once in the mycoplasma lineage.
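
For readers unfamiliar with the GC-content figure cited in the abstract, it is the fraction of bases in a sequence that are G or C. The sketch below shows the calculation on a toy string; the sequence and the function name `gc_content` are illustrative assumptions and have no connection to the M. mobile genome data.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy AT-rich sequence (assumed): 2 of its 8 bases are G or C.
toy_gc = gc_content("ATGCATTA")
```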


Subject(s)
Genome, Bacterial , Mycoplasma/genetics , Proteome/genetics , Amino Acid Sequence , Computational Biology , Molecular Sequence Data , Phylogeny , Physical Chromosome Mapping