1.
Nature ; 515(7527): 371-375, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409826

ABSTRACT

To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.


Subject(s)
Conserved Sequence/genetics , Genome/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Animals , Cell Line , Chromatin/genetics , Chromatin/metabolism , Enhancer Elements, Genetic/genetics , Humans , Mice , Polymorphism, Single Nucleotide/genetics
2.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
3.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
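The abstract above mentions feed-forward loops as one of the enriched network motifs. As a rough illustration only, the sketch below counts feed-forward loops (A regulates B, B regulates C, and A also regulates C) in a toy directed network. The function name, the edge-list representation and the toy factor names are assumptions made for this example, not the authors' method, and the sketch counts motifs without testing them for enrichment.

```python
def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c.

    `edges` is an iterable of (regulator, target) pairs; self-loops are ignored.
    """
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)

    count = 0
    for a, a_targets in targets.items():
        for b in a_targets:
            for c in targets.get(b, ()):
                # c must be a third node that a also regulates directly
                if c != a and c in a_targets:
                    count += 1
    return count


# Hypothetical toy network: TF_A -> TF_B -> gene_X, plus the shortcut TF_A -> gene_X
toy_edges = [
    ("TF_A", "TF_B"),
    ("TF_B", "gene_X"),
    ("TF_A", "gene_X"),
    ("TF_B", "gene_Y"),
]
print(count_feed_forward_loops(toy_edges))
```

Judging enrichment would additionally require comparing this count against counts from randomized networks, which is left out here.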
4.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subject(s)
Chromatin Immunoprecipitation/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism
6.
Nucleic Acids Res ; 38(20): 6997-7007, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20615899

ABSTRACT

Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥90% and length ≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents') characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a 'parent pseudogene', followed by further duplication creating duplicated-duplicated or duplicated-processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.


Subject(s)
Genome, Human , Pseudogenes , Segmental Duplications, Genomic , Evolution, Molecular , Gene Duplication , Genetic Loci , Humans
7.
Nat Biotechnol ; 28(1): 47-55, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20037582

ABSTRACT

Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.


Subject(s)
Chromosome Breakpoints , Gene Library , Genetic Variation , Nucleotides/genetics , Sequence Analysis, DNA/methods , Animals , Bias , Chromosome Mapping , Genetic Loci/genetics , Humans , Phylogeny , Primates/genetics
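The abstract above describes scanning short reads against a library of breakpoint sequences. The sketch below illustrates the general idea under strong simplifying assumptions: a junction sequence is built by joining the flanks on either side of a breakpoint, and a read counts as support only if it matches the junction exactly and covers at least `min_overlap` bases on each side. The function name, parameters and exact-match rule are hypothetical and are not taken from BreakSeq itself.

```python
def supports_breakpoint(read, left_flank, right_flank, min_overlap=5):
    """Return True if `read` matches the junction and spans the breakpoint.

    The junction is left_flank + right_flank; the breakpoint sits between them.
    Only exact matches are considered (an assumption made for brevity).
    """
    junction = left_flank + right_flank
    breakpoint_pos = len(left_flank)

    start = junction.find(read)
    while start != -1:
        end = start + len(read)
        covers_left = start <= breakpoint_pos - min_overlap
        covers_right = end >= breakpoint_pos + min_overlap
        if covers_left and covers_right:
            return True
        # Try the next occurrence of the read within the junction, if any
        start = junction.find(read, start + 1)
    return False


# Hypothetical flanking sequences around one breakpoint
left = "ACGTACGTAAAC"
right = "GGGTTTCCCAGT"
print(supports_breakpoint("GTAAACGGGTTT", left, right))  # read crossing the junction
print(supports_breakpoint("ACGTACGTAA", left, right))    # read confined to the left flank
```

A practical implementation would use an aligner that tolerates mismatches rather than exact string search.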
8.
Genome Biol ; 10(2): R23, 2009 Feb 23.
Article in English | MEDLINE | ID: mdl-19236709

ABSTRACT

Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.


Subject(s)
Computational Biology/methods , Genomic Structural Variation , Models, Genetic , Base Sequence , Computer Simulation , Genome , Genomics/methods , Internet , Software
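The abstract above refers to detecting structural variants from the mapped ends of sequenced fragments. As a simplified illustration of the general paired-end mapping principle, the sketch below flags a read pair as discordant when the distance between its mapped ends deviates from the expected insert size by more than a cutoff. It uses a single cutoff expressed in standard deviations; this is an assumption for the example and is not PEMer's coverage-adjusted multi-cutoff scoring strategy. All names and default values are hypothetical.

```python
def classify_span(span, expected_mean, expected_sd, cutoff_sd=3.0):
    """Classify one read pair by the distance between its mapped ends.

    A span much larger than expected suggests a deletion in the sample
    relative to the reference; a much smaller span suggests an insertion.
    """
    z = (span - expected_mean) / expected_sd
    if z > cutoff_sd:
        return "deletion_candidate"
    if z < -cutoff_sd:
        return "insertion_candidate"
    return "concordant"


# Hypothetical library with a 3,000 bp mean insert size and 300 bp spread
for span in (3100, 6000, 1000):
    print(span, classify_span(span, expected_mean=3000, expected_sd=300))
```

In practice, several discordant pairs clustering at the same locus would be required before calling a variant.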
9.
Genome Biol ; 10(1): R2, 2009.
Article in English | MEDLINE | ID: mdl-19123937

ABSTRACT

BACKGROUND: The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances. RESULTS: We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800). The majority of them are processed (generated by retrotransposition). However, we do not see a correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant as previous reports indicated there are exactly 80 ribosomal protein genes encoded by the human genome. CONCLUSIONS: Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals and a relatively larger amount of it in the rodent lineage.


Subject(s)
Genome/genetics , Pseudogenes , Ribosomal Proteins/genetics , Animals , Expressed Sequence Tags , Humans , Internet , Mammals/genetics , Mice , Pan troglodytes , Phylogeny , RNA, Messenger/analysis , Rats , Retroelements/genetics , Synteny
10.
Nucleic Acids Res ; 37(Database issue): D738-43, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18957444

ABSTRACT

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.


Subject(s)
Databases, Genetic , Pseudogenes , Animals , Data Interpretation, Statistical , Genomics , Humans , Internet , Proteins/classification , Proteins/genetics , Sequence Alignment
11.
Genome Biol ; 9(1): 401, 2008 Jan 31.
Article in English | MEDLINE | ID: mdl-18254929

ABSTRACT

We take stock of current genetic nomenclature and attempt to organize strange and notable gene names. We categorize, for instance, those that involve a naming system transferred from another context (for example, Pavlov's dogs). We hope this analysis provides clues to better steer gene naming in the future.


Subject(s)
Genes , Terminology as Topic
12.
Mol Biol Evol ; 25(1): 131-43, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18065488

ABSTRACT

Transcription factor pseudogenes have not been systematically studied before. Nuclear receptors (NRs) constitute one of the largest groups of transcription factors in animals (e.g., 48 NRs in human). The availability of whole-genome sequences enables a global inventory of the NR pseudogenes in a number of vertebrate model organisms. Here we identify the NR pseudogenes in 8 vertebrate organisms and make our results available online at http://www.pseudogene.org/nr. The assignments reveal that NR pseudogenes as a group have characteristics related to generation and distribution contrary to expectations derived from previous large-scale pseudogene studies. In particular, 1) despite its large size, the NR gene family has only a very small number of pseudogenes in each of the vertebrate genomes examined; 2) despite the low transcription levels of NR genes, except for one, all other NR pseudogenes identified in this study are retropseudogenes; and 3) no duplicated NR pseudogenes are found, contrary to the fact that the NR gene family was expanded through several waves of gene duplication events. Our analyses further reveal a number of interesting aspects of NR pseudogenes. Specifically, through careful sequence analysis, we identify remnant introns in 2 mouse retropseudogenes, psiRev-erbbeta and psiLRH1. Generated from partially processed pre-mRNAs, they appear to be rare examples of highly unusual "semiprocessed" pseudogenes. Second, by comparing the genomic sequences, we uncover a pseudogene that is unique to the human lineage relative to chimpanzee. Generated by a recent duplication of a segment in the human genome, this pseudogene is a "duplicated-processed" pseudogene, belonging to a new pseudogene species. Finally, FXRbeta was nonfunctionalized in the human lineage and thus appears to be an example of a rare unitary pseudogene. By comparing orthologous sequences, we dated the FXR-FXRbeta duplication and the nonfunctionalization of FXRbeta in primates.


Subject(s)
Evolution, Molecular , Gene Duplication , Multigene Family/genetics , Pseudogenes/genetics , Receptors, Cytoplasmic and Nuclear/genetics , Vertebrates/genetics , Animals , Genome/physiology , Humans , Mice
13.
Nucleic Acids Res ; 35(Database issue): D55-60, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17099229

ABSTRACT

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current 'consensus' set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with a minimal knowledge of programming. At the present time, the database contains more than 100,000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.


Subject(s)
Databases, Genetic , Pseudogenes , Humans , Internet , Software , User-Computer Interface