2.
Genome Biol ; 17(1): 148, 2016 Jul 05.
Article in English | MEDLINE | ID: mdl-27380939

ABSTRACT

BACKGROUND: The success of the CRISPR/Cas9 genome editing technique depends on the choice of the guide RNA sequence, which is facilitated by various websites. Despite the importance and popularity of these algorithms, it is unclear to what extent their predictions agree with actual measurements. RESULTS: We conduct the first independent evaluation of CRISPR/Cas9 predictions. To this end, we collect data from eight SpCas9 off-target studies and compare them with the sites predicted by popular algorithms. We identify problems in one implementation but find that sequence-based off-target predictions are very reliable, identifying most off-targets with mutation rates above 0.1%, while the number of false positives can be largely reduced with a cutoff on the off-target score. We also evaluate on-target efficiency prediction algorithms against available datasets. The correlation between the predictions and the guide activity varied considerably, especially for zebrafish. Together with novel data from our labs, we find that the optimal on-target efficiency prediction model strongly depends on whether the guide RNA is expressed from a U6 promoter or transcribed in vitro. We further demonstrate that the best predictions can significantly reduce the time spent on guide screening. CONCLUSIONS: To make these guidelines easily accessible to anyone planning a CRISPR genome editing experiment, we built a new website (http://crispor.org) that predicts off-targets and helps select and clone efficient guide sequences for more than 120 genomes using different Cas9 proteins and the eight efficiency scoring systems evaluated here.


Subject(s)
CRISPR-Cas Systems/genetics , Gene Editing , RNA, Guide, Kinetoplastida/genetics , Software , Algorithms , Genome , Internet , Promoter Regions, Genetic , RNA, Small Nuclear/genetics
4.
Proc Natl Acad Sci U S A ; 111(17): 6131-8, 2014 Apr 29.
Article in English | MEDLINE | ID: mdl-24753594

ABSTRACT

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.


Subject(s)
DNA/genetics , Genome, Human/genetics , Biological Evolution , Disease/genetics , Humans , Regulatory Sequences, Nucleic Acid/genetics , Software
5.
Genome Res ; 19(12): 2324-33, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19767417

ABSTRACT

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.


Subject(s)
Cloning, Molecular/methods , Computational Biology/methods , DNA, Complementary/genetics , Gene Library , Genes/genetics , Mammals/genetics , Animals , DNA/biosynthesis , Humans , Mice , National Institutes of Health (U.S.) , Rats , Reverse Transcriptase Polymerase Chain Reaction , United States
6.
Methods Mol Biol ; 422: 133-44, 2008.
Article in English | MEDLINE | ID: mdl-18629665

ABSTRACT

The UC Santa Cruz Genome Browser provides a number of resources that can be used for phylogenomic studies, including (1) whole-genome sequence data from a number of vertebrate species, (2) pairwise alignments of the human genome sequence to a number of other vertebrate genomes, (3) a simultaneous alignment of 17 vertebrate genomes (most of them incompletely sequenced) that covers all of the human sequence, (4) several independent sets of multiple alignments covering 1% of the human genome (ENCODE regions), (5) extensive sequence annotation for interpreting those sequences and alignments, and (6) sequence, alignments, and annotations from certain other species, including an alignment of nine insect genomes. We illustrate the use of these resources in the context of assigning rare genomic changes to the branch of the phylogenetic tree where they appear to have occurred, or of looking for evidence supporting a particular possible tree topology. Sample source code for performing such studies is available.


Subject(s)
Genome/genetics , Genomics/methods , Internet , Phylogeny , Animals , Chromosome Breakage , Chromosomes , Drosophila/genetics , Humans , Interspersed Repetitive Sequences/genetics , Sequence Alignment
7.
Nat Genet ; 40(5): 523-7, 2008 May.
Article in English | MEDLINE | ID: mdl-18443589

ABSTRACT

It has been four years since the original publication of the draft sequence of the rat genome. Five groups are now working together to assemble, annotate and release an updated version of the rat genome. Because the rat is the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for its genomic resources to keep pace with its prominence in the laboratory. In this commentary, we describe the current status of the rat genome sequence and the plans for its impending 'upgrade'. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, the RefSeq and Genes databases at the US National Center for Biotechnology Information, Genome Browser at the University of California Santa Cruz and the disease portals for cardiovascular disease and obesity at the Rat Genome Database.


Subject(s)
Databases, Genetic , Genome , Rats/genetics , Animals , Computational Biology , Disease Models, Animal , Genetic Diseases, Inborn/genetics , Genetic Variation , Genomics , Haplotypes , Humans , Internet , Polymorphism, Single Nucleotide , Rats, Mutant Strains , Sequence Analysis, DNA
8.
Hum Mutat ; 28(6): 554-62, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17326095

ABSTRACT

PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.


Subject(s)
Databases, Genetic , Mutation , Phenotype , Agammaglobulinaemia Tyrosine Kinase , Blood Group Antigens/genetics , Cooperative Behavior , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Databases, Genetic/standards , Genotype , Globins/genetics , Humans , Internet , Phenylalanine Hydroxylase/genetics , Protein-Tyrosine Kinases/genetics , Receptors, Androgen/genetics , Software Design , Systems Integration
9.
PLoS Genet ; 2(10): e168, 2006 Oct 13.
Article in English | MEDLINE | ID: mdl-17040131

ABSTRACT

Comparative genomics allows us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Selection, Genetic , Animals , Base Pairing , Base Sequence , Conserved Sequence , Humans , Molecular Sequence Data , Recombination, Genetic , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA , Species Specificity
10.
PLoS Comput Biol ; 2(4): e33, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16628248

ABSTRACT

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3'UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.


Subject(s)
Genome, Human , MicroRNAs/chemistry , Nucleic Acid Conformation , Sequence Analysis, RNA/methods , 3' Untranslated Regions , Animals , Chickens , Computational Biology/methods , Conserved Sequence , Dogs , Genome , Humans , Mice , Rats , Tetraodontiformes , Zebrafish
11.
Genome Res ; 14(10B): 2121-7, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15489334

ABSTRACT

The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.


Subject(s)
Cloning, Molecular/methods , DNA, Complementary , Gene Library , Open Reading Frames/physiology , Animals , Computational Biology , DNA Primers , DNA, Complementary/genetics , DNA, Complementary/metabolism , Humans , Mice , National Institutes of Health (U.S.) , Rats , United States , Xenopus laevis/genetics , Zebrafish/genetics