1.
Genome Res ; 11(7): 1175-86, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11435399

ABSTRACT

Comparative sequence analysis has facilitated the discovery of protein coding genes and important functional sequences within proteins, but has been less useful for identifying functional sequence elements in nonprotein-coding DNA because the relatively rapid rate of change of nonprotein-coding sequences and the relative simplicity of non-coding regulatory sequence elements necessitates the comparison of sequences of relatively closely related species. We tested the use of comparative DNA sequence analysis to aid identification of promoter regulatory elements, nonprotein-coding RNA genes, and small protein-coding genes by surveying random DNA sequences of several Saccharomyces yeast species, with the goal of learning which species are best suited for comparisons with S. cerevisiae. We also determined the DNA sequence of a few specific promoters and RNA genes of several Saccharomyces species to determine the degree of conservation of known functional elements within the genome. Our results lead us to conclude that comparative DNA sequence analysis will enable identification of functionally conserved elements within the yeast genome, and suggest a path for obtaining this information.


Subject(s)
Genes, Fungal/physiology , Genome, Fungal , Regulatory Sequences, Nucleic Acid/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Base Sequence , DNA, Fungal/genetics , Fungal Proteins/genetics , Gene Expression Regulation, Fungal/genetics , Genes, Regulator , Molecular Sequence Data , RNA, Fungal/analysis , Saccharomyces/genetics , Saccharomyces cerevisiae/physiology
2.
Genome Res ; 11(5): 889-900, 2001 May.
Article in English | MEDLINE | ID: mdl-11337482

ABSTRACT

With the availability of a nearly complete sequence of the human genome, aligning expressed sequence tags (EST) to the genomic sequence has become a practical and powerful strategy for gene prediction. Elucidating gene structure is a complex problem requiring the identification of splice junctions, gene boundaries, and alternative splicing variants. We have developed a software tool, Transcript Assembly Program (TAP), to delineate gene structures using genomically aligned EST sequences. TAP assembles the joint gene structure of the entire genomic region from individual splice junction pairs, using a novel algorithm that uses the EST-encoded connectivity and redundancy information to sort out the complex alternative splicing patterns. A method called polyadenylation site scan (PASS) has been developed to detect poly-A sites in the genome. TAP uses these predictions to identify gene boundaries by segmenting the joint gene structure at polyadenylated terminal exons. Reconstructing 1007 known transcripts, TAP scored a sensitivity (Sn) of 60% and a specificity (Sp) of 92% at the exon level. The gene boundary identification process was found to be accurate 78% of the time. TAP also reports alternative splicing patterns in EST alignments. An analysis of alternative splicing in 1124 genic regions suggested that more than half of human genes undergo alternative splicing. Surprisingly, we saw that an absolute majority of the detected alternative splicing events affect the coding region. Furthermore, the evolutionary conservation of alternative splicing between human and mouse was analyzed using an EST-based approach. (See http://stl.wustl.edu/~zkan/TAP/)


Subject(s)
Alternative Splicing/genetics , Computational Biology/methods , Expressed Sequence Tags , Genes/genetics , Sequence Alignment/methods , Computational Biology/instrumentation , Genome, Human , Humans , RNA, Messenger/metabolism , Sequence Alignment/instrumentation , Software , Software Validation , Transcription, Genetic
3.
Nat Genet ; 28(2): 160-4, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11381264

ABSTRACT

Single nucleotide polymorphisms (SNPs) are valuable genetic markers of human disease. They also comprise the highest potential density marker set available for mapping experimentally derived mutations in model organisms such as Caenorhabditis elegans. To facilitate the positional cloning of mutations we have identified polymorphisms in CB4856, an isolate from a Hawaiian island that shows a uniformly high density of polymorphisms compared with the reference Bristol N2 strain. Based on 5.4 Mbp of aligned sequences, we predicted 6,222 polymorphisms. Furthermore, 3,457 of these markers modify restriction enzyme recognition sites ('snip-SNPs') and are therefore easily detected as RFLPs. Of these, 493 were experimentally confirmed by restriction digest to produce a snip-SNP map of the worm genome. A mapping strategy using snip-SNPs and bulked segregant analysis (BSA) is outlined. CB4856 is crossed into a mutant strain, and exclusion of CB4856 alleles of a subset of snip-SNPs in mutant progeny is assessed with BSA. The proximity of a linked marker to the mutation is estimated by the relative proportion of each form of the biallelic marker in populations of wildtype and mutant genomes. The usefulness of this approach is illustrated by the rapid mapping of the dyf-5 gene.


Subject(s)
Caenorhabditis elegans/genetics , Chromosome Mapping/methods , Helminth Proteins/genetics , Polymorphism, Single Nucleotide , Animals , Genetic Linkage , Polymorphism, Genetic , Polymorphism, Restriction Fragment Length
4.
Article in English | MEDLINE | ID: mdl-10977083

ABSTRACT

Untranslated regions (UTR) play important roles in the posttranscriptional regulation of mRNA processing. There is a wealth of UTR-related information to be mined from the rapidly accumulating EST collections. A computational tool, UTR-extender, has been developed to infer UTR sequences from genomically aligned ESTs. It can completely and accurately reconstruct 72% of the 3' UTRs and 15% of the 5' UTRs when tested using 908 functionally cloned transcripts. In addition, it predicts extensions for 11% of the 5' UTRs and 28% of the 3' UTRs. These extension regions are validated by examining splicing frequencies and conservation levels. We also developed a method called polyadenylation site scan (PASS) to precisely map polyadenylation sites in human genomic sequences. A PASS analysis of 908 genic regions estimates that 40-50% of human genes undergo alternative polyadenylation. Using EST redundancy to assess expression levels, we also find that genes with short 3' UTRs tend to be highly expressed.


Subject(s)
Algorithms , Genome, Human , Sequence Analysis/methods , Untranslated Regions , Computer Simulation , Humans , Predictive Value of Tests
5.
Bioinformatics ; 16(11): 1040-1, 2000 Nov.
Article in English | MEDLINE | ID: mdl-11159316

ABSTRACT

UNLABELLED: Identifying and masking repetitive elements is usually the first step when analyzing vertebrate genomic sequence. Current repeat identification software is sensitive but slow, creating a costly bottleneck in large-scale analyses. We have developed MaskerAid, a software enhancement to RepeatMasker that increases the speed of masking more than 30-fold at the most sensitive setting. AVAILABILITY: On request from the authors (see http://sapiens.wustl.edu/MaskerAid). CONTACT: maskeraid@watson.wustl.edu


Subject(s)
Repetitive Sequences, Nucleic Acid , Sequence Alignment/statistics & numerical data , Software , Animals , Computational Biology , Databases, Factual , Humans , Long Interspersed Nucleotide Elements , Short Interspersed Nucleotide Elements
6.
Bioinformatics ; 16(11): 1052-3, 2000 Nov.
Article in English | MEDLINE | ID: mdl-11159321

ABSTRACT

UNLABELLED: We have developed a program, MPBLAST, that increases the throughput of batch BLASTN searches by multiplexing (concatenating) query sequences and thereby reducing the number of actual database searches performed. Throughput was observed to increase in reciprocal proportion to the component sequence length. For sequencing read-sized queries of 500 bp, an order of magnitude speed-up was seen. AVAILABILITY: Free (see http://blast.wustl.edu) CONTACT: [ikorf, gish]@watson.wustl.edu


Subject(s)
Sequence Analysis/statistics & numerical data , Software , Computational Biology , Databases, Factual , Expressed Sequence Tags , Internet
7.
Nat Genet ; 23(4): 452-6, 1999 Dec.
Article in English | MEDLINE | ID: mdl-10581034

ABSTRACT

Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.


Subject(s)
Genetic Techniques , Polymorphism, Single Nucleotide , Algorithms , Alleles , Bayes Theorem , Data Interpretation, Statistical , Expressed Sequence Tags , Genetic Variation , Genome, Human , Humans , Sequence Alignment , Software
8.
Genome Res ; 6(9): 807-28, 1996 Sep.
Article in English | MEDLINE | ID: mdl-8889549

ABSTRACT

We report the generation of 319,311 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' and 3' ends of 194,031 human cDNA clones. Our goal has been to obtain tag sequences from many different genes and to deposit these in the publicly accessible Data Base for Expressed Sequence Tags. Highly efficient automatic screening of the data allows deposition of the annotated sequences without delay. Sequences have been generated from 26 oligo(dT) primed directionally cloned libraries, of which 18 were normalized. The libraries were constructed using mRNA isolated from 17 different tissues representing three developmental states. Comparisons of a subset of our data with nonredundant human mRNA and protein data bases show that the ESTs represent many known sequences and contain many that are novel. Analysis of protein families using Hidden Markov Models confirms this observation and supports the contention that although normalization reduces significantly the relative abundance of redundant cDNA clones, it does not result in the complete removal of members of gene families.


Subject(s)
Gene Library , Genome, Human , Sequence Tagged Sites , Adult , Cloning, Molecular , DNA, Complementary , Databases, Factual , Female , Humans , Infant , Introns , Markov Chains , Molecular Sequence Data , Pregnancy , Proteins/genetics , RNA, Messenger/genetics
10.
Nat Genet ; 6(2): 119-29, 1994 Feb.
Article in English | MEDLINE | ID: mdl-8162065

ABSTRACT

Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention has been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.


Subject(s)
Databases, Factual , Information Storage and Retrieval , Sequence Alignment , Sequence Homology , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Humans , Molecular Sequence Data , Software
11.
J Comput Biol ; 1(1): 39-50, 1994.
Article in English | MEDLINE | ID: mdl-8790452

ABSTRACT

A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.


Subject(s)
Codon , Sequence Analysis, DNA/methods , Software , Algorithms , Amino Acid Sequence , Animals , Bacillus subtilis , Base Sequence , Databases, Factual , Drosophila melanogaster , Escherichia coli , Humans , Molecular Sequence Data , Saccharomyces cerevisiae , Schizosaccharomyces , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid
12.
Nat Genet ; 3(3): 266-72, 1993 Mar.
Article in English | MEDLINE | ID: mdl-8485583

ABSTRACT

Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performs conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.


Subject(s)
Databases, Factual , Proteins/genetics , Algorithms , Amino Acid Sequence , Animals , Molecular Sequence Data , Mutation , Probability , Rats , Ribosomal Proteins/genetics , Sequence Homology, Amino Acid , Software
13.
J Mol Biol ; 215(3): 403-10, 1990 Oct 05.
Article in English | MEDLINE | ID: mdl-2231712

ABSTRACT

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.


Subject(s)
Base Sequence , Mutation , Software , Algorithms , Amino Acid Sequence , Databases, Factual , Sensitivity and Specificity , Sequence Homology, Nucleic Acid
14.
J Virol ; 61(9): 2864-76, 1987 Sep.
Article in English | MEDLINE | ID: mdl-3039174

ABSTRACT

Many types of human cells cultured in vitro are generally semipermissive for simian virus 40 (SV40) replication. Consequently, subpopulations of stably transformed human cells often carry free viral DNA, which is presumed to arise via spontaneous excision from an integrated DNA template. Stably transformed human cell lines that do not have detectable free DNA are therefore likely to harbor mutant viral genomes incapable of excision and replication, or these cells may synthesize variant cellular proteins necessary for viral replication. We examined four such cell lines and conclude that for the three lines SV80, GM638, and GM639, the cells did indeed harbor spontaneous T-antigen mutants. For the SV80 line, marker rescue (determined by a plaque assay) and DNA sequence analysis of cloned DNA showed that a single point mutation converting serine 147 to asparagine was the cause of the mutation. Similarly, a point mutation converting leucine 457 to methionine for the GM638 mutant T allele was found. Moreover, the SV80 line maintained its permissivity for SV40 DNA replication but did not complement the SV40 tsA209 mutant at its nonpermissive temperature. The cloned SV80 T-antigen allele, though replication incompetent, maintained its ability to transform rodent cells at wild-type efficiencies. A compilation of spontaneously occurring SV40 mutations which cannot replicate but can transform shows that these mutations tend to cluster in two regions of the T-antigen gene, one ascribed to the site-specific DNA-binding ability of the protein, and the other to the ATPase activity which is linked to its helicase activity.


Subject(s)
Antigens, Viral, Tumor/genetics , Cell Transformation, Viral , DNA Replication , Oncogene Proteins, Viral/genetics , Simian virus 40/genetics , Virus Replication , Animals , Antigens, Polyomavirus Transforming , Base Sequence , Cell Fusion , Cell Line , Chromosome Mapping , Cloning, Molecular , DNA Transposable Elements , DNA, Viral/analysis , Humans , Mutation , Rats , Recombination, Genetic
15.
Curr Genet ; 7(2): 85-92, 1983 Apr.
Article in English | MEDLINE | ID: mdl-24173148

ABSTRACT

The RAD52 gene of Saccharomyces cerevisiae has previously been shown to be involved in both recombination and DNA repair. Here we report on the cloning of this gene. A plasmid containing a 5.9 kb yeast DNA fragment inserted into the BamH1 site of the YEp13 vector has been isolated and shown to complement the X-ray sensitive phenotype of the rad52-1 mutation. The rad52-1 cells containing the plasmid form larger colonies than similar cells having lost the plasmid. This plasmid has been shown not to complement either the U.V. sensitivity or the recombination defect of the E. coli recA mutation. From the insert various fragments have been subcloned into the YRp7 and YIp5 vectors. Integration events of two of the subclones have been genetically mapped to the chromosomal location of RAD52, indicating that the structural gene has been cloned. A 1.97 kb BamH1 fragment subcloned into YRp7 in one orientation complements the rad52-1 mutation, while the same fragment in the opposite orientation fails to complement. Various other subclones indicate that a BglII site, within the BamH1 fragment, is in the RAD52 gene. This BglII site has been deleted by S1-nuclease digestion and the resulting deletion inactivates the RAD52 gene. BAL31 deletions from one end of a 1.9 kb Sal1-BamH1 fragment have been isolated; up to 0.9 kb can be deleted without loss of RAD52 activity, indicating that the RAD52 gene is approximately 1 kb or less in length.
