1.
Conf Proc IEEE Eng Med Biol Soc ; 2004: 3064-7, 2004.
Article in English | MEDLINE | ID: mdl-17270925

ABSTRACT

We present and evaluate a publicly available Web server which classifies protein sequences into SCOP 1.63 PDB95 structural superfamilies. The Website returns ranked lists of likely superfamilies and hence implicit structural predictions according to three computational techniques: BLAST, HMMER and a discriminative classifier, SVM-BLOCKS. It is the first Website to provide predictions using SVM-BLOCKS. In addition to the ranked lists, the Website displays alignment information, and a Web services interface is also available for computationally intensive use. We conduct a large-scale evaluation which mimics the predictions returned by the Website. The study indicates that the site provides valid predictions and that the SVM-BLOCKS approach can outperform BLAST and HMMER when sufficient examples are available to learn the SVM classifiers.

2.
Bioinformatics ; 16(2): 152-8, 2000 Feb.
Article in English | MEDLINE | ID: mdl-10842737

ABSTRACT

MOTIVATION: The main goal in this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of transcription in human DNA). These methods can subsequently be utilized to improve the performance of gene-finding systems. The models built here attempt to model long-distance dependencies between non-adjacent bases. RESULTS: An efficient modeling method is described which models biological data more accurately than a first-order Markov model without increasing the number of parameters. Intuitively, a small number of parameters helps a learning system to avoid overfitting. Several experiments with the model are presented, which show a small improvement in the average accuracy as compared with a simple Markov model. These experiments suggest that single long-distance dependencies do not help the recognition problem, thus confirming several previous studies which have used more heuristic modeling techniques. AVAILABILITY: This software is available for download and as a web resource at http://www.ai.uic.edu/software CONTACT: kasif@eecs.uic.edu


Subject(s)
Computer Simulation , DNA/analysis , Models, Statistical , Neural Networks, Computer , RNA Splicing , Bayes Theorem , Humans , Software
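The abstract above measures its method against a first-order Markov model. A baseline of that kind might be sketched as follows. The position-specific formulation, the pseudocount, the function names and the toy windows are all assumptions made for illustration; the paper's own model is not reproduced here.

```python
import math

ALPHABET = "ACGT"


def train_first_order(windows, pseudocount=1.0):
    """Fit a position-specific first-order Markov model to equal-length windows."""
    length = len(windows[0])
    # Counts for the first base, and one transition table per adjacent pair of positions.
    first = {b: pseudocount for b in ALPHABET}
    trans = [
        {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for _ in range(length - 1)
    ]
    for w in windows:
        first[w[0]] += 1
        for pos in range(1, length):
            trans[pos - 1][w[pos - 1]][w[pos]] += 1
    # Normalise counts into probabilities.
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    for table in trans:
        for a in ALPHABET:
            row_total = sum(table[a].values())
            table[a] = {b: c / row_total for b, c in table[a].items()}
    return first, trans


def log_likelihood(window, model):
    """Log-probability of a window: P(x1) times the product of P(x_i | x_{i-1})."""
    first, trans = model
    score = math.log(first[window[0]])
    for pos in range(1, len(window)):
        score += math.log(trans[pos - 1][window[pos - 1]][window[pos]])
    return score


# Hypothetical training windows around a signal site (not real data).
model = train_first_order(["AGGTA", "AGGTG", "CGGTA"])
signal_score = log_likelihood("AGGTA", model)
background_score = log_likelihood("TCCAT", model)
```

A window resembling the training examples scores higher than an unrelated one; in a gene finder such scores would typically be compared against a background model.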
3.
Nucleic Acids Res ; 27(23): 4636-41, 1999 Dec 01.
Article in English | MEDLINE | ID: mdl-10556321

ABSTRACT

The GLIMMER system for microbial gene identification finds approximately 97-98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical improvements to GLIMMER that improve its accuracy still further, and (ii) a comprehensive evaluation that demonstrates that the accuracy of the system is likely to be higher than previously recognized. A significant proportion of the genes missed by the system appear to be hypothetical proteins whose existence is only supported by the predictions of other programs. When the analysis is restricted to genes that have significant homology to genes in other organisms, GLIMMER misses <1% of known genes.


Subject(s)
Genes, Bacterial , Genetic Techniques/standards , Algorithms , Markov Chains , Models, Genetic
4.
Nucleic Acids Res ; 27(11): 2369-76, 1999 Jun 01.
Article in English | MEDLINE | ID: mdl-10325427

ABSTRACT

A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.


Subject(s)
Algorithms , Genome, Bacterial , Mycoplasma/genetics , Sequence Alignment/methods , Animals , Base Sequence , DNA , Humans , Mice , Molecular Sequence Data
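To illustrate the kind of exact matches a whole-genome aligner can anchor on, the sketch below finds matches that occur once in each sequence and cannot be extended in either direction. It deliberately swaps the suffix tree named in the abstract for a plain k-mer dictionary with seed-and-extend, which is simpler but slower and misses matches that contain no unique k-mer. The function names, the seed length and the toy sequences are assumptions.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        seq[i:i + k]: i
        for i in range(len(seq) - k + 1)
        if counts[seq[i:i + k]] == 1
    }


def unique_matches(a, b, k=4):
    """Return sorted (start_in_a, start_in_b, length) for extended unique seeds."""
    seeds_a = unique_kmers(a, k)
    seeds_b = unique_kmers(b, k)
    found = set()
    for kmer, i in seeds_a.items():
        j = seeds_b.get(kmer)
        if j is None:
            continue
        # Extend to the left while the preceding characters agree.
        while i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            i -= 1
            j -= 1
        # Extend to the right from the new start; this covers the seed itself.
        length = 0
        while (i + length < len(a) and j + length < len(b)
               and a[i + length] == b[j + length]):
            length += 1
        found.add((i, j, length))
    return sorted(found)


# Toy sequences sharing the substring "ACGTGCA".
matches = unique_matches("AAACGTGCATTT", "CCACGTGCAGG", k=4)
```

Because every extended match contains a k-mer that is unique in both sequences, the extended match is itself unique in both. A suffix tree finds such matches in linear time without fixing a seed length in advance.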
5.
Genomics ; 62(3): 500-7, 1999 Dec 15.
Article in English | MEDLINE | ID: mdl-10644449

ABSTRACT

A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented, one that minimizes pipetting and another that minimizes the number of reactions. The pipette-optimized multiplex PCR method has been employed in the final phases of closing the Streptococcus pneumoniae genome sequence, with excellent results.


Subject(s)
Combinatorial Chemistry Techniques/methods , Genome, Bacterial , Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Algorithms , Evaluation Studies as Topic , Streptococcus pneumoniae/genetics
6.
Nucleic Acids Res ; 26(2): 544-8, 1998 Jan 15.
Article in English | MEDLINE | ID: mdl-9421513

ABSTRACT

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds >97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context; i.e., a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.


Subject(s)
DNA, Bacterial/analysis , Markov Chains , Algorithms , Base Sequence , DNA, Bacterial/chemistry , Haemophilus influenzae/genetics , Helicobacter pylori/genetics , Open Reading Frames , Sensitivity and Specificity , Sequence Alignment , Software
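The variable-context idea described in the abstract above can be illustrated with a small interpolated Markov model. This is a minimal sketch, not GLIMMER's implementation: the abstract does not state how the interpolation weights are chosen, so the rule used here (trust a context in proportion to how often it was seen, capped at 1) is an assumption, as are the class name, the parameter names and the toy training sequence.

```python
from collections import defaultdict


class InterpolatedMarkovModel:
    """Blend predictions from context lengths 0..max_order."""

    def __init__(self, max_order=3, threshold=5, alphabet="ACGT"):
        self.max_order = max_order
        self.threshold = threshold  # assumed: context count at which a context is fully trusted
        self.alphabet = alphabet
        # counts[context][base] = times base followed context in training data
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                for order in range(self.max_order + 1):
                    if i - order < 0:
                        break
                    self.counts[seq[i - order:i]][base] += 1

    def prob(self, context, base):
        """P(base | context), interpolating from short to long contexts."""
        # Keep at most max_order trailing characters of the context.
        context = context[len(context) - min(len(context), self.max_order):]
        p = 1.0 / len(self.alphabet)  # start from a uniform estimate
        for order in range(len(context) + 1):
            ctx = context[len(context) - order:]
            table = self.counts.get(ctx)
            total = sum(table.values()) if table else 0
            if total == 0:
                break  # longer contexts were not observed either
            weight = min(1.0, total / self.threshold)
            p = weight * table.get(base, 0) / total + (1.0 - weight) * p
        return p


# Toy training data (not real genomic sequence).
imm = InterpolatedMarkovModel(max_order=3, threshold=5)
imm.train(["ACGTACGTACGT"])
p_t = imm.prob("ACG", "T")
p_a = imm.prob("ACG", "A")
```

Rarely seen long contexts contribute little and the estimate falls back toward shorter ones, while frequently seen long contexts dominate. That is the sense in which the context length adapts to the local composition of the sequence.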
7.
Article in English | MEDLINE | ID: mdl-7584325

ABSTRACT

In this paper we study the performance of probabilistic networks in the context of protein sequence analysis in molecular biology. Specifically, we report the results of our initial experiments applying this framework to the problem of protein secondary structure prediction. One of the main advantages of the probabilistic approach we describe here is our ability to perform detailed experiments where we can experiment with different models. We can easily perform local substitutions (mutations) and measure (probabilistically) their effect on the global structure. Window-based methods do not support such experimentation as readily. Our method is efficient both during training and during prediction, which is important in order to be able to perform many experiments with different networks. We believe that probabilistic methods are comparable to other methods in prediction quality. In addition, the predictions generated by our methods have precise quantitative semantics which is not shared by other classification methods. Specifically, all the causal and statistical independence assumptions are made explicit in our networks thereby allowing biologists to study and experiment with different causal models in a convenient manner.


Subject(s)
Models, Molecular , Protein Structure, Secondary , Algorithms , Bayes Theorem , Decision Trees , Markov Chains , Models, Genetic , Mutation , Neural Networks, Computer , Reproducibility of Results