1.
Nucleic Acids Res ; 51(D1): D445-D451, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350662

ABSTRACT

OrthoDB provides evolutionary and functional annotations of genes in a diverse sampling of eukaryotes, prokaryotes, and viruses. Genomics continues to accelerate our exploration of gene diversity and orthology is the most precise way of bridging gene functional knowledge with the rapidly expanding universe of genomic sequences. OrthoDB samples the most diverse organisms with the best quality genomics data to provide the leading coverage of species diversity. This update of the underlying data to over 18 000 prokaryotes and almost 2000 eukaryotes with over 100 million genes propels the coverage to another level. This achievement also demonstrates the scalability of the underlying OrthoLoger software for delineation of orthologs, freely available from https://orthologer.ezlab.org. In addition to the ab-initio computations of gene orthology used for the OrthoDB release, the OrthoLoger software allows mapping of novel gene sets to precomputed orthologs and thereby links to their annotations. The LEMMI-style benchmarking of OrthoLoger ensures its state-of-the-art performance and is available from https://lemortho.ezlab.org. The OrthoDB web interface has been further developed to include a pairwise orthology view from any gene to any other sampled species. OrthoDB-computed evolutionary annotations as well as extensively collated functional annotations can be accessed via REST API or SPARQL/RDF, downloaded or browsed online from https://www.orthodb.org.


Subject(s)
Databases, Genetic , Evolution, Molecular , Eukaryota/genetics , Genomics , Biological Evolution , Software , Molecular Sequence Annotation
2.
Nucleic Acids Res ; 49(D1): D389-D393, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33196836

ABSTRACT

OrthoDB provides evolutionary and functional annotations of orthologs, inferred for a vast number of available organisms. OrthoDB is leading in the coverage and genomic diversity sampling of Eukaryotes, Prokaryotes and Viruses, and the sampling of Bacteria is further set to increase three-fold. The user interface has been enhanced in response to the massive growth in data. OrthoDB provides three views on the data: (i) a list of orthologous groups related to a user query, which are now arranged to visualize their hierarchical relations, (ii) a detailed view of an orthologous group, now featuring a Sankey diagram to facilitate navigation between the levels of orthology, from more finely-resolved to more general groups of orthologs, as well as an arrangement of orthologs into an interactive organism taxonomy structure, and (iii) a newly added gene-centric view, showing the gene functional annotations and the pair-wise orthologs in example species. The OrthoDB standalone software for delineation of orthologs, Orthologer, is freely available. Online BUSCO assessments and mapping to OrthoDB of user-uploaded data enable interactive exploration of related annotations and generation of comparative charts. OrthoDB strives to predict orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations, and to compute evolutionary annotations such as evolutionary rate and phyletic profile. OrthoDB data can be accessed via SPARQL RDF, REST API, downloaded or browsed online from https://orthodb.org.


Subject(s)
Databases, Genetic , Evolution, Molecular , Molecular Sequence Annotation , Sequence Homology, Nucleic Acid , Animals , Software , User-Computer Interface
3.
Nucleic Acids Res ; 47(D1): D807-D811, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395283

ABSTRACT

OrthoDB (https://www.orthodb.org) provides evolutionary and functional annotations of orthologs. This update features a major scaling up of the resource coverage, sampling the genomic diversity of 1271 eukaryotes, 6013 prokaryotes and 6488 viruses. These include putative orthologs among 448 metazoan, 117 plant, 549 fungal, 148 protist, 5609 bacterial, and 404 archaeal genomes, picking up the best sequenced and annotated representatives for each species or operational taxonomic unit. OrthoDB relies on a concept of hierarchy of levels-of-orthology to enable more finely resolved gene orthologies for more closely related species. Since orthologs are the most likely candidates to retain functions of their ancestor gene, OrthoDB is aimed at narrowing down hypotheses about gene functions and enabling comparative evolutionary studies. Optional registered-user sessions allow on-line BUSCO assessments of gene set completeness and mapping of the uploaded data to OrthoDB to enable further interactive exploration of related annotations and generation of comparative charts. The accelerating expansion of genomics data continues to add valuable information, and OrthoDB strives to provide orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations and to compute evolutionary annotations. The data can be browsed online, downloaded or accessed via REST API or SPARQL RDF compatible with both UniProt and Ensembl.


Subject(s)
Databases, Genetic , Evolution, Molecular , Genomics/trends , Molecular Sequence Annotation , Animals , Eukaryota/genetics , Genetic Variation , Genome, Bacterial/genetics , Genome, Fungal/genetics , Genome, Plant/genetics , Genome, Viral/genetics , Phylogeny , Software
4.
Environ Microbiol ; 20(6): 2288-2300, 2018 06.
Article in English | MEDLINE | ID: mdl-30014616

ABSTRACT

Antibiotic resistance is increasing among pathogens, and the human microbiome contains a reservoir of antibiotic resistance genes. Acidaminococcus intestini is the first Negativicute bacterium (Gram-negative Firmicute) shown to be resistant to beta-lactam antibiotics. Resistance is conferred by the aci1 gene, but its evolutionary history and prevalence remain obscure. We discovered that ACI-1 proteins are phylogenetically distinct from beta-lactamases of Gram-positive Firmicutes and that aci1 occurs in bacteria scattered across the Negativicute clade, suggesting lateral gene transfer. In the reference A. intestini RyC-MR95 genome, we found that transposons residing within a tailed prophage context are likely vehicles for aci1's mobility. We found aci1 in 56 (4.4%) of 1,267 human gut metagenomes, mostly hosted within A. intestini, and, where it could be determined, mostly within a consistent mobile element constellation. These samples are from Europe, China and the USA, showing that aci1 is distributed globally. We found that for most Negativicute assemblies with aci1, the prophage observed in A. intestini is absent, but in all cases aci1 is flanked by varying transposons. The chimeric mobile elements we identify here likely have a complex evolutionary history and potentially provide multiple complementary mechanisms for antibiotic resistance gene transfer both within and between cells.


Subject(s)
Bacteria/metabolism , Drug Resistance, Bacterial/genetics , Gastrointestinal Microbiome , Prophages/genetics , beta-Lactamases/metabolism , Anti-Bacterial Agents/pharmacology , Bacteria/classification , Bacteria/drug effects , Bacteria/genetics , China , Europe , Firmicutes/genetics , Gene Transfer, Horizontal , Humans , Metagenome , Phylogeny , United States , beta-Lactamases/genetics
5.
Mol Biol Evol ; 35(3): 543-548, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29220515

ABSTRACT

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

6.
Nucleic Acids Res ; 45(D1): D744-D749, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899580

ABSTRACT

OrthoDB is a comprehensive catalog of orthologs, genes inherited by extant species from a single gene in their last common ancestor. In 2016 OrthoDB reached its 9th release, growing to over 22 million genes from over 5000 species, now adding plants, archaea and viruses. In this update we focused on usability of this fast-growing wealth of data: updating the user and programmatic interfaces to browse and query the data, and further enhancing the already extensive integration of available gene functional annotations. Collating functional annotations from over 100 resources enabled us to propose descriptive titles for 87% of ortholog groups. Additionally, OrthoDB continues to provide computed evolutionary annotations and to allow user queries by sequence homology. The OrthoDB resource now enables users to generate publication-quality comparative genomics charts, as well as to upload, analyze and interactively explore their own private data. OrthoDB is available from http://orthodb.org.


Subject(s)
Computational Biology/methods , Databases, Genetic , Evolution, Molecular , Genomics/methods , Algorithms , Animals , Archaea/genetics , Bacteria/genetics , Fungi/genetics , Molecular Sequence Annotation , Plants/genetics , Software , User-Computer Interface , Viruses/genetics , Web Browser
7.
Bioinformatics ; 31(19): 3210-2, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26059717

ABSTRACT

MOTIVATION: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. RESULTS: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. AVAILABILITY AND IMPLEMENTATION: Software implemented in Python and datasets available for download from http://busco.ezlab.org. CONTACT: evgeny.zdobnov@unige.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Gene Dosage/genetics , Genome , Genomics/methods , Molecular Sequence Annotation/methods , Software , Animals , Humans
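The contrast drawn in the abstract above, between a technical measure such as N50 and a completeness measure based on expected gene content, can be illustrated with a minimal sketch. This is not the BUSCO implementation: the function names and the status labels ("complete", "missing", and so on) are hypothetical and chosen only for illustration. N50 is computed in the usual way, as the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size.

```python
from collections import Counter


def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def completeness_summary(statuses):
    """Return the percentage of expected orthologs in each status category.

    `statuses` is a list with one hypothetical label per expected ortholog,
    for example "complete" or "missing".
    """
    if not statuses:
        raise ValueError("need at least one expected ortholog")
    counts = Counter(statuses)
    return {label: 100.0 * n / len(statuses) for label, n in counts.items()}
```

The two numbers answer different questions: N50 describes contiguity only, whereas the summary reports how much of the expected gene content was actually found.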
8.
Nucleic Acids Res ; 43(Database issue): D250-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428351

ABSTRACT

Orthology, refining the concept of homology, is the cornerstone of evolutionary comparative studies. With the ever-increasing availability of genomic data, inference of orthology has become instrumental for generating hypotheses about gene functions crucial to many studies. This update of the OrthoDB hierarchical catalog of orthologs (http://www.orthodb.org) covers 3027 complete genomes, including the most comprehensive set of 87 arthropods, 61 vertebrates, 227 fungi and 2627 bacteria (sampling the most complete and representative genomes from over 11,000 available). In addition to the most extensive integration of functional annotations from UniProt, InterPro, GO, OMIM, model organism phenotypes and COG functional categories, OrthoDB uniquely provides evolutionary annotations including rates of ortholog sequence divergence, copy-number profiles, sibling groups and gene architectures. We re-designed the entirety of the OrthoDB website from the underlying technology to the user interface, enabling the user to specify species of interest and to select the relevant orthology level by the NCBI taxonomy. The text searches allow use of complex logic with various identifiers of genes, proteins, domains, ontologies or annotation keywords and phrases. Gene copy-number profiles can also be queried. This release comes with the freely available underlying ortholog clustering pipeline (http://www.orthodb.org/software).


Subject(s)
Databases, Genetic , Sequence Homology , Algorithms , Animals , Data Curation , Eukaryota/genetics , Evolution, Molecular , Genome, Microbial , Humans , Software
9.
Genome Res ; 23(8): 1235-47, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23636946

ABSTRACT

Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor-binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.


Subject(s)
Ants/genetics , Genome, Insect , Animals , Behavior, Animal , Binding Sites , Conserved Sequence , DNA Methylation , Evolution, Molecular , Gene Expression Regulation , Hymenoptera/genetics , Insect Proteins/genetics , MicroRNAs/genetics , Models, Genetic , Phylogeny , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA , Social Behavior , Species Specificity , Synteny , Transcription Factors/genetics
10.
Nucleic Acids Res ; 41(Database issue): D358-65, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23180791

ABSTRACT

The concept of orthology provides a foundation for formulating hypotheses on gene and genome evolution, and thus forms the cornerstone of comparative genomics, phylogenomics and metagenomics. We present the update of OrthoDB, the hierarchical catalog of orthologs (http://www.orthodb.org). From its conception, OrthoDB promoted delineation of orthologs at varying resolution by explicitly referring to the hierarchy of species radiations, now also adopted by other resources. The current release provides comprehensive coverage of animals and fungi representing 252 eukaryotic species, and is now extended to prokaryotes with the inclusion of 1115 bacteria. Functional annotations of orthologous groups are provided through mapping to InterPro, GO, OMIM and model organism phenotypes, with cross-references to major resources including UniProt, NCBI and FlyBase. Uniquely, OrthoDB provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates, sibling groups, and now extended with exon-intron architectures, syntenic orthologs and parent-child trees. The interactive web interface allows navigation along the species phylogenies, complex queries with various identifiers, annotation keywords and phrases, as well as with gene copy-number profiles and sequence homology searches. With the explosive growth of available data, OrthoDB also provides mapping of newly sequenced genomes and transcriptomes to the current orthologous groups.


Subject(s)
Databases, Genetic , Genes, Bacterial , Genes, Fungal , Genes , Animals , Cluster Analysis , Evolution, Molecular , Humans , Internet , Mice , Molecular Sequence Annotation , Phenotype , Phylogeny , Synteny
11.
Science ; 331(6017): 555-61, 2011 Feb 04.
Article in English | MEDLINE | ID: mdl-21292972

ABSTRACT

We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.


Subject(s)
Daphnia/genetics , Ecosystem , Genome , Adaptation, Physiological , Amino Acid Sequence , Animals , Base Sequence , Chromosome Mapping , Daphnia/physiology , Environment , Evolution, Molecular , Gene Conversion , Gene Duplication , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Genes , Genes, Duplicate , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Molecular Sequence Data , Multigene Family , Phylogeny , Sequence Analysis, DNA
12.
Nucleic Acids Res ; 39(Database issue): D283-8, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20972218

ABSTRACT

The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs delineated at each radiation of the species phylogeny in an explicitly hierarchical manner of over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.


Subject(s)
Databases, Genetic , Phylogeny , Sequence Homology, Amino Acid , Animals , Arthropods/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Fungi/genetics , Genes , Genomics , Mice , Molecular Sequence Annotation , Protein Structure, Tertiary , Proteins/genetics , Saccharomyces cerevisiae/genetics , Vertebrates/genetics
13.
Genome Biol Evol ; 3: 75-86, 2011.
Article in English | MEDLINE | ID: mdl-21148284

ABSTRACT

Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multicopy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the "knockout-rate prediction" of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multicopy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes. The previously underappreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of "single-copy control" versus "multicopy license" may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space.


Subject(s)
Arthropods/genetics , Evolution, Molecular , Fungi/genetics , Gene Duplication , Genes, Essential , Vertebrates/genetics , Animals , Arthropods/classification , Computational Biology , Fungi/classification , Genome , Phylogeny , Proteome , Quantitative Trait Loci , Vertebrates/classification
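The notions of phyletic retention and copy number used in the abstract above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' procedure: the function name, the input layout (a mapping from species to gene copy count for one orthologous group), and the strict category definitions are all hypothetical, and the study itself speaks of "almost all" single-copy groups, which a real analysis would capture with a tolerance threshold.

```python
def classify_group(copies, species):
    """Classify one orthologous group by presence and copy number.

    `copies` maps species name -> number of genes from that species in the
    group; species missing from the mapping count as zero copies.
    Returns (is_universal, copy_class).
    """
    counts = [copies.get(name, 0) for name in species]
    present = [c for c in counts if c > 0]
    # Universal here means at least one copy in every species considered.
    is_universal = len(present) == len(species) and len(species) > 0
    if not present:
        copy_class = "absent"
    elif all(c == 1 for c in present):
        copy_class = "single-copy"
    elif all(c > 1 for c in present):
        copy_class = "multicopy"
    else:
        copy_class = "mixed"
    return is_universal, copy_class
```

For example, a group with exactly one gene in each of three species is reported as universal and single-copy, while a group with two and three copies in only two of the three species is reported as non-universal and multicopy.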
14.
Proc Natl Acad Sci U S A ; 107(27): 12168-73, 2010 Jul 06.
Article in English | MEDLINE | ID: mdl-20566863

ABSTRACT

As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.


Subject(s)
Genome, Bacterial/genetics , Genome, Insect/genetics , Pediculus/genetics , Pediculus/microbiology , Animals , Enterobacteriaceae/genetics , Genes, Bacterial/genetics , Genes, Insect/genetics , Genomics/methods , Humans , Lice Infestations/parasitology , Molecular Sequence Data , Sequence Analysis, DNA , Symbiosis
15.
Genome Biol ; 10(4): R43, 2009.
Article in English | MEDLINE | ID: mdl-19393040

ABSTRACT

BACKGROUND: The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. RESULTS: Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome. Intersection of these genes with 238 milk production quantitative trait loci curated from the literature decreased the search space for milk trait effectors by more than an order of magnitude. Genome location analysis revealed a tendency for milk protein genes to be clustered with other mammary genes. Using the genomes of a monotreme (platypus), a marsupial (opossum), and five placental mammals (bovine, human, dog, mice, rat), gene loss and duplication, phylogeny, sequence conservation, and evolution were examined. Compared with other genes in the bovine genome, milk and mammary genes are: more likely to be present in all mammals; more likely to be duplicated in therians; more highly conserved across Mammalia; and evolving more slowly along the bovine lineage. The most divergent proteins in milk were associated with nutritional and immunological components of milk, whereas highly conserved proteins were associated with secretory processes. CONCLUSIONS: Although both copy number and sequence variation contribute to the diversity of milk protein composition across species, our results suggest that this diversity is primarily due to other mechanisms. Our findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago.


Subject(s)
Cattle/genetics , Genome/genetics , Lactation/genetics , Milk Proteins/genetics , Animals , Chromosome Mapping , Chromosomes, Mammalian/genetics , Computational Biology/methods , Databases, Genetic , Evolution, Molecular , Female , Humans , Mammals/classification , Mammals/genetics , Mammary Glands, Animal/metabolism , Milk/chemistry , Milk Proteins/classification , Phylogeny , Quantitative Trait Loci/genetics
16.
Nucleic Acids Res ; 37(Database issue): D111-7, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18927110

ABSTRACT

MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature, approximately 22-nt-long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.


Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , MicroRNAs/genetics , Genomics , Internet , Nucleic Acid Conformation , Sequence Alignment , User-Computer Interface
17.
Am J Hum Genet ; 82(4): 971-81, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18394580

ABSTRACT

The elucidation of the largely unknown transcriptome of small RNAs is crucial for the understanding of genome and cellular function. We report here the results of the analysis of small RNAs (< 50 nt) in the ENCODE regions of the human genome. Size-fractionated RNAs from four different cell lines (HepG2, HelaS3, GM06990, SK-N-SH) were mapped with the forward and reverse ENCODE high-density resolution tiling arrays. The top 1% of hybridization signals are termed SmRfrags (Small RNA fragments). Eight percent of SmRfrags overlap the GENCODE genes (CDS), whereas the majority map to intergenic regions (34%), intronic regions (53%), and untranslated regions (UTRs) (5%). In addition, 9.6% and 16.8% of SmRfrags in the 5' UTR regions overlap significantly with His/Pol II/TAF250 binding sites and DNase I Hypersensitive sites, respectively (compared to the 5.3% and 9% expected). Interestingly, 17%-24% (depending on the cell line) of SmRfrags are sense-antisense strand pairs that show evidence of overlapping transcription. Only 3.4% and 7.2% of SmRfrags in intergenic regions overlap transcribed fragments (Txfrags) in HeLa and GM06990 cell lines, respectively. We hypothesized that a fraction of the identified SmRfrags corresponded to microRNAs. We tested by Northern blot a set of 15 high-likelihood predictions of microRNA candidates that overlap with SmRfrags and validated three potential microRNAs (approximately 20 nt in length). Notably, most of the remaining candidates showed a larger hybridizing band (approximately 100 nt) that could be a microRNA precursor. The small RNA transcriptome is emerging as an important and abundant component of the genome function.


Subject(s)
Chromosome Mapping , Genome, Human/genetics , MicroRNAs/genetics , Transcription, Genetic , 5' Untranslated Regions/genetics , Base Sequence , Cell Line, Tumor , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis
18.
Nucleic Acids Res ; 36(Database issue): D271-5, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17947323

ABSTRACT

The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at http://cegg.unige.ch/orthodb.


Subject(s)
Databases, Genetic , Genomics , Phylogeny , Animals , Fungi/genetics , Insecta/genetics , Internet , Proteomics , User-Computer Interface , Vertebrates/genetics
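The abstract above describes ortholog delineation built on all-against-all sequence comparisons. One ingredient commonly used in such COG-like and Inparanoid-like procedures is the reciprocal best hit between two genomes, sketched below. This is a deliberately simplified illustration and not the OrthoDB pipeline: the function names and the input layout (a mapping from gene pairs to alignment scores, assumed to come from an upstream comparison step) are hypothetical, and real procedures add clustering across many species, handling of in-paralogs, and score cutoffs.

```python
def best_hits(scores):
    """Return the highest-scoring target for each query gene.

    `scores` maps (query_gene, target_gene) -> alignment score.
    """
    best = {}
    for (query, target), score in scores.items():
        # Keep the target with the strictly highest score seen so far.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted gene pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in forward.items() if reverse.get(b) == a
    )
```

A gene whose best hit points to a partner that itself prefers a different gene is left out of the result, which is what makes the criterion conservative for pairwise comparisons.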
19.
Genome Biol ; 8(11): R242, 2007.
Article in English | MEDLINE | ID: mdl-18021399

ABSTRACT

BACKGROUND: The increasing number of sequenced insect and vertebrate genomes of variable divergence enables refined comparative analyses to quantify the major modes of animal genome evolution and allows tracing of gene genealogy (orthology) and pinpointing of gene extinctions (losses), which can reveal lineage-specific traits. RESULTS: To consistently quantify losses of orthologous groups of genes, we compared the gene repertoires of five vertebrates and five insects, including honeybee and Tribolium beetle, that represent insect orders outside the previously sequenced Diptera. We found hundreds of lost Urbilateria genes in each of the lineages and assessed their phylogenetic origin. The rate of losses correlates well with the species' rates of molecular evolution and radiation times, without distinction between insects and vertebrates, indicating their stochastic nature. Remarkably, this extends to the universal single-copy orthologs, losses of dozens of which have been tolerated in each species. Nevertheless, the propensity for loss differs substantially among genes, where roughly 20% of the orthologs have an 8-fold higher chance of becoming extinct. Extrapolation of our data also suggests that the Urbilateria genome contained more than 7,000 genes. CONCLUSION: Our results indicate that the seemingly higher number of observed gene losses in insects can be explained by their two- to three-fold higher evolutionary rate. Despite the profound effect of many losses on cellular machinery, overall, they seem to be guided by neutral evolution.


Subject(s)
Insecta/genetics , Vertebrates/genetics , Animals , Evolution, Molecular , Genetic Variation , Humans , Likelihood Functions , Models, Genetic , Phylogeny
20.
Science ; 316(5832): 1738-43, 2007 Jun 22.
Article in English | MEDLINE | ID: mdl-17588928

ABSTRACT

Mosquitoes are vectors of parasitic and viral diseases of immense importance for public health. The acquisition of the genome sequence of the yellow fever and Dengue vector, Aedes aegypti (Aa), has enabled a comparative phylogenomic analysis of the insect immune repertoire: in Aa, the malaria vector Anopheles gambiae (Ag), and the fruit fly Drosophila melanogaster (Dm). Analysis of immune signaling pathways and response modules reveals both conservative and rapidly evolving features associated with different functional gene categories and particular aspects of immune reactions. These dynamics reflect in part continuous readjustment between accommodation and rejection of pathogens and suggest how innate immunity may have evolved.


Subject(s)
Aedes/genetics , Anopheles/genetics , Evolution, Molecular , Immunity, Innate/genetics , Insect Vectors/genetics , Aedes/immunology , Animals , Anopheles/immunology , Antimicrobial Cationic Peptides/physiology , Carrier Proteins/genetics , Carrier Proteins/physiology , Drosophila melanogaster/genetics , Drosophila melanogaster/immunology , Genes, Insect , Insect Proteins/genetics , Insect Proteins/physiology , Insect Vectors/immunology , Malaria/transmission , Melanins/metabolism , Multigene Family , Signal Transduction , Species Specificity