Results 1 - 20 of 23
1.
Bioinformatics ; 16(9): 760-6, 2000 Sep.
Article in English | MEDLINE | ID: mdl-11108698

ABSTRACT

MOTIVATION: Database searching algorithms for proteins use scoring matrices based on average protein properties, and thus are dominated by globular proteins. However, since transmembrane regions of a protein are in a distinctly different environment than globular proteins, one would expect generalized substitution matrices to be inappropriate for transmembrane regions. RESULTS: We present the PHAT (predicted hydrophobic and transmembrane) matrix, which significantly outperforms generalized matrices and a previously published transmembrane matrix in searches with transmembrane queries. We conclude that a better matrix can be constructed by using background frequencies characteristic of the twilight zone, where low-scoring true positives have scores indistinguishable from high-scoring false positives, rather than the amino acid frequencies of the database. The PHAT matrix may help improve the accuracy of sequence alignments and evolutionary trees of membrane proteins.


Subject(s)
Algorithms , Computational Biology/methods , Membrane Proteins/genetics , Models, Theoretical , Sequence Alignment/methods , Amino Acid Sequence/genetics , Consensus Sequence/genetics , Databases, Factual , Predictive Value of Tests , Proteins/chemistry , Proteins/genetics , Reproducibility of Results , Sequence Homology, Amino Acid
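The abstract's central point is that a substitution score depends on the background frequencies it is normalized against. A minimal Python sketch of a generic log-odds score makes this concrete. The pair and background frequencies below are hypothetical numbers, not values from the PHAT matrix, and the half-bit scaling is an assumption borrowed from BLOSUM-style matrices.

```python
import math

def log_odds_score(q_ij: float, p_i: float, p_j: float, scale: float = 2.0) -> int:
    """Scaled, rounded log-odds score for an aligned residue pair.

    q_ij: observed (target) frequency of the ordered pair in trusted alignments.
    p_i, p_j: background frequencies of the two residues.
    scale=2.0 gives half-bit units (an assumption, as in BLOSUM-style matrices).
    """
    expected = p_i * p_j  # frequency expected if the residues paired at random
    return round(scale * math.log2(q_ij / expected))

# Hypothetical Leu/Ile numbers: same target frequency, two different backgrounds.
generic = log_odds_score(0.01, p_i=0.10, p_j=0.06)      # database-like composition
hydrophobic = log_odds_score(0.01, p_i=0.16, p_j=0.10)  # hydrophobic-rich composition
print(generic, hydrophobic)  # → 1 -1
```

With an identical target frequency, the choice of background alone flips the sign of the score, illustrating why a matrix normalized to one composition can mis-score alignments drawn from another.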
2.
Electrophoresis ; 21(9): 1700-6, 2000 May.
Article in English | MEDLINE | ID: mdl-10870957

ABSTRACT

The most highly conserved regions of proteins can be represented as blocks of aligned sequence segments, typically with multiple blocks for a given protein family. The Blocks Database World Wide Web (http://blocks.fhcrc.org) and e-mail (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments. We describe features for detection of distant relationships using blocks. Blocks+ includes protein families from the PROSITE, Prints, Pfam-A, ProDom and Domo databases. Other features include searching Blocks+ with the BLIMPS and NCBI's IMPALA programs, sequence logos, phylogenetic trees, three-dimensional display of blocks on PDB structures, and a polymerase chain reaction (PCR) primer design strategy based on blocks.


Subject(s)
Databases, Factual , Proteins/analysis , Sequence Homology, Amino Acid , Amino Acid Sequence , Animals , DNA Primers , Humans , Molecular Sequence Data , Polymerase Chain Reaction/methods , Sequence Analysis, Protein
4.
Genome Res ; 10(4): 543-6, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10779495

ABSTRACT

A simple and general homology-based method for gene finding was applied to the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the Genome Annotation Assessment Project (GASP). Each strand of the entire sequence was used as query of the BLOCKS+ database of conserved regions of proteins. This led to functional assignments for more than one-third of the genes and two-thirds of the transposons. Considering the enormous size of the query, the fact that only two false-positive matches were reported emphasizes the high selectivity of protein family-based methods for gene finding. We used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our results confirm that protein family databases can be used effectively in automated sequence annotation efforts.


Subject(s)
Databases, Factual , Drosophila melanogaster/genetics , Genome , Software , Alcohol Dehydrogenase/genetics , Animals , Computational Biology , Drosophila melanogaster/enzymology , Genes, Insect/genetics , Sequence Homology, Nucleic Acid
5.
Nucleic Acids Res ; 28(1): 228-30, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592233

ABSTRACT

The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI's IMPALA program and 3D display of blocks on PDB structures.


Subject(s)
Databases, Factual , Proteins/chemistry , Amino Acid Sequence , Information Storage and Retrieval , Internet , Molecular Sequence Data , Sequence Homology, Amino Acid
6.
Bioinformatics ; 15(6): 471-9, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10383472

ABSTRACT

MOTIVATION: As databanks grow, sequence classification and prediction of function by searching protein family databases become increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in Prosite, can be searched to classify new sequences. However, Prosite is incomplete, and families from other databases are now available to expand coverage of the Blocks Database. RESULTS: To take advantage of protein family information present in several existing compilations, we have used five databases to construct Blocks+, a unified database that is built on the PROTOMAT/BLOSUM scoring model and that can be searched using a single algorithm for consistent sequence classification. The LAMA blocks-versus-blocks searching program identifies overlapping protein families, making possible a non-redundant hierarchical compilation. Blocks+ consists of all blocks derived from PROSITE, blocks from Prints not present in PROSITE, blocks from Pfam-A not present in PROSITE or Prints, and so on for ProDom and Domo, for a total of 1995 protein families represented by 8909 blocks, doubling the coverage of the original Blocks Database. A challenge for any procedure aimed at non-redundancy is to retain related but distinct families while discarding those that are duplicates. We illustrate how using multiple compilations can minimize this potential problem by examining the SNF2 family of ATPases, which is detectably similar to distinct families of helicases and ATPases. AVAILABILITY: http://blocks.fhcrc.org/


Subject(s)
Databases, Factual , Nuclear Proteins , Proteins/chemistry , Sequence Alignment/methods , Adenosine Triphosphatases/chemistry , Adenosine Triphosphatases/genetics , Algorithms , Amino Acid Sequence , Animals , Computational Biology , DNA Helicases , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Humans , Molecular Sequence Data , Sequence Alignment/statistics & numerical data , Sequence Homology, Amino Acid , Software , Software Design , Transcription Factors/chemistry , Transcription Factors/genetics
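Blocks-versus-blocks comparison, as done by LAMA, aligns the columns of two blocks rather than a sequence against a block. The Python sketch below assumes that column similarity is measured by the Pearson correlation between PSSM columns and scores every ungapped overlap between two blocks. It omits any significance estimate, and the function names and toy three-residue columns are hypothetical.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length score vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def best_block_overlap(pssm_a: list[list[float]], pssm_b: list[list[float]]) -> tuple[float, int]:
    """Best ungapped overlap of two PSSMs; column a[i] is paired with b[i - shift]."""
    best_score, best_shift = -math.inf, 0
    for shift in range(-(len(pssm_b) - 1), len(pssm_a)):
        score = sum(pearson(pssm_a[i], pssm_b[i - shift])
                    for i in range(len(pssm_a)) if 0 <= i - shift < len(pssm_b))
        if score > best_score:  # strict comparison keeps the first best shift
            best_score, best_shift = score, shift
    return best_score, best_shift

a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical 3-column PSSM over a 3-letter alphabet
b = [[0, 1, 0], [0, 0, 1]]             # resembles columns 2-3 of a
score, shift = best_block_overlap(a, b)
print(round(score, 3), shift)  # → 2.0 1
```

Summing raw correlations favors longer overlaps, which is one reason a practical tool needs a significance estimate on top of the raw score.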
7.
Nucleic Acids Res ; 27(1): 226-8, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847186

ABSTRACT

Blocks are ungapped multiple sequence alignments representing conserved protein regions, and the Blocks Database consists of blocks from documented protein families. World Wide Web (http://www.blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools for homology searching and for analyzing protein family relationships. New enhancements include a multiple alignment processor that extends the use of these tools to imported multiple alignments of families not present in the database and a PCR primer designer that implements a new strategy for gene isolation.


Subject(s)
Databases, Factual , Proteins/classification , Sequence Alignment , Software , DNA Primers/genetics , Information Storage and Retrieval , Internet , Polymerase Chain Reaction/methods , Proteins/chemistry , Proteins/genetics , Sequence Homology, Amino Acid
8.
Nucleic Acids Res ; 26(7): 1628-35, 1998 Apr 01.
Article in English | MEDLINE | ID: mdl-9512532

ABSTRACT

We describe a new primer design strategy for PCR amplification of unknown targets that are related to multiply-aligned protein sequences. Each primer consists of a short 3' degenerate core region and a longer 5' consensus clamp region. Only 3-4 highly conserved amino acid residues are necessary for design of the core, which is stabilized by the clamp during annealing to template molecules. During later rounds of amplification, the non-degenerate clamp permits stable annealing to product molecules. We demonstrate the practical utility of this hybrid primer method by detection of diverse reverse transcriptase-like genes in a human genome, and by detection of C5 DNA methyltransferase homologs in various plant DNAs. In each case, amplified products were sufficiently pure to be cloned without gel fractionation. This COnsensus-DEgenerate Hybrid Oligonucleotide Primer (CODEHOP) strategy has been implemented as a computer program that is accessible over the World Wide Web (http://blocks.fhcrc.org/codehop.html) and is directly linked from the BlockMaker multiple sequence alignment site for hybrid primer prediction beginning with a set of related protein sequences.


Subject(s)
DNA Modification Methylases/chemistry , DNA Primers , Evolution, Molecular , Phylogeny , RNA-Directed DNA Polymerase/chemistry , Amino Acid Sequence , Animals , Arthritis, Rheumatoid/genetics , Base Sequence , Codon , Computer Communication Networks , Consensus Sequence , Conserved Sequence , DNA Modification Methylases/genetics , Humans , Molecular Sequence Data , Nucleic Acid Hybridization , Polymerase Chain Reaction/methods , RNA-Directed DNA Polymerase/genetics , Sarcoma, Kaposi/genetics , Sequence Alignment , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid , Software
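The requirement that the core cover only 3-4 conserved residues follows from how quickly degeneracy grows under back-translation. The Python sketch below is not the CODEHOP program; it counts both the distinct coding sequences for a peptide under the standard genetic code and the degeneracy of a single primer that allows, at each codon position, every base used by any synonymous codon. The example peptides and function names are arbitrary illustrations.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS: dict[str, list[str]] = {}
for bases, aa in zip(product(BASES, repeat=3), AA_TABLE):
    CODONS.setdefault(aa, []).append("".join(bases))

def coding_sequences(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)

def iupac_degeneracy(peptide: str) -> int:
    """Degeneracy of one primer using, per position, the bases seen in any synonymous codon."""
    total = 1
    for aa in peptide:
        for pos in range(3):
            total *= len({codon[pos] for codon in CODONS[aa]})
    return total

for peptide in ("MWC", "LSR"):
    print(peptide, coding_sequences(peptide), iupac_degeneracy(peptide))
# → MWC 2 2
# → LSR 216 1024
```

The per-position form can exceed the true codon count because mixing bases across Leu's two codon families (TTR and CTN) also admits non-Leu codons; anchoring a short core on low-degeneracy residues keeps both numbers small.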
10.
Nucleic Acids Res ; 26(1): 309-12, 1998 Jan 01.
Article in English | MEDLINE | ID: mdl-9399861

ABSTRACT

The Blocks Database World Wide Web (http://www.blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools for the detection and analysis of protein homology based on alignment blocks representing conserved regions of proteins. During the past year, searching has been augmented by supplementation of the Blocks Database with blocks from the Prints Database, for a total of 4754 blocks from 1163 families. Blocks from both the Blocks and Prints Databases and blocks that are constructed from sequences submitted to Block Maker can be used for blocks-versus-blocks searching of these databases with LAMA, and for viewing logos and bootstrap trees. Sensitive searches of up-to-date protein sequence databanks are carried out via direct links to the MAST server using position-specific scoring matrices and to the BLAST and PSI-BLAST servers using consensus-embedded sequence queries. Utilizing the trypsin family to evaluate performance, we illustrate the superiority of blocks-based tools over expert pairwise searching or Hidden Markov Models.


Subject(s)
Computer Communication Networks , Databases, Factual , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Animals , Conserved Sequence , Humans
11.
Protein Sci ; 6(3): 698-705, 1997 Mar.
Article in English | MEDLINE | ID: mdl-9070452

ABSTRACT

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.


Subject(s)
Proteins/chemistry , Sequence Alignment , Algorithms , Amino Acid Sequence , Consensus Sequence , Evaluation Studies as Topic , Molecular Sequence Data
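The consensus-embedding idea can be sketched in a few lines of Python: compute the most frequent residue in each block column and splice that consensus into the representative sequence over the conserved region. The function names, the tie-breaking rule, and the assumption that the block's offset in the query is already known are illustrative choices; the PSSM-embedded variant needs a modified Smith-Waterman search and is not shown.

```python
from collections import Counter

def block_consensus(block: list[str]) -> str:
    """Most frequent residue in each column of an ungapped block (ties go to the first seen)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*block))

def embed_consensus(query: str, block: list[str], start: int) -> str:
    """Replace query[start : start + block width] with the block consensus (hypothetical helper)."""
    consensus = block_consensus(block)
    end = start + len(consensus)
    if start < 0 or end > len(query):
        raise ValueError("block does not fit inside the query at this offset")
    return query[:start] + consensus + query[end:]

block = ["GDSGG", "GDSGA", "GESGG"]               # toy block, 3 segments x 5 columns
print(embed_consensus("MKTAGDAGGPLV", block, 4))  # → MKTAGDSGGPLV
```

Only the conserved segment changes; the flanking residues keep their single-sequence information, which is the trade-off the abstract describes.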
12.
Nucleic Acids Res ; 25(1): 222-5, 1997 Jan 01.
Article in English | MEDLINE | ID: mdl-9016540

ABSTRACT

The Blocks Database contains multiple alignments of conserved regions in protein families which can be searched by e-mail (blocks@blocks.fhcrc.org) and World Wide Web (http://blocks.fhcrc.org/) servers to classify protein and nucleotide sequences. Recent enhancements to the servers include: (i) improved calculation of position-specific scoring matrices from blocks; (ii) availability of the Prints protein fingerprint database for searching in Blocks format; (iii) a representative sequence biased towards the Blocks of a protein family; (iv) a tree constructed from the Blocks of a protein family; (v) links to related World Wide Web pages for a family; and (vi) the new Local Alignment of Multiple Alignments (LAMA) method to search a block against a database of blocks.


Subject(s)
Databases, Factual , Proteins/genetics , Sequence Alignment/methods , Amino Acid Sequence , Animals , Base Sequence , Computer Communication Networks , Humans , Molecular Sequence Data
13.
Ann Intern Med ; 124(11): 970-9, 1996 Jun 01.
Article in English | MEDLINE | ID: mdl-8624064

ABSTRACT

OBJECTIVE: To determine whether increasing age is associated with an increased risk for bleeding during warfarin treatment. DESIGN: Combined retrospective and prospective cohort studies. SETTING: 6 anticoagulation clinics. PATIENTS: 2376 patients receiving warfarin for various indications. MEASUREMENTS: Bleeding events categorized as minor (resulting in no costs or consequences), serious (requiring testing or treatment), life-threatening, or fatal. RESULTS: 812 first bleeding events (4 fatal, 33 life-threatening, 222 serious, and 553 minor) occurred during 3702 patient-years. Age was inversely related to the mean warfarin dose and dose-adjusted prothrombin time ratio. The unadjusted incidence of minor bleeding complications did not vary according to age group: 18.0 per 100 patient-years for patients younger than 50 years of age, 21.5 for patients 50 to 59 years of age, 24.0 for patients 60 to 69 years of age, 23.5 for patients 70 to 79 years of age, and 16.3 for patients 80 years of age and older. The unadjusted incidence of serious bleeding complications also did not vary according to age group: 9.3 per 100 patient-years for patients younger than 50 years of age, 7.1 for patients 50 to 59 years of age, 6.6 for patients 60 to 69 years of age, 5.1 for patients 70 to 79 years of age, and 4.4 for patients 80 years of age and older. The unadjusted incidence of life-threatening or fatal complications combined was significantly higher among the oldest patients: 0.75 per 100 patient-years for patients younger than 50 years of age, 0.97 for patients 50 to 59 years of age, 1.10 for patients 60 to 69 years of age, 0.68 for patients 70 to 79 years of age, and 3.38 for patients 80 years of age and older. Patients 80 years of age and older had a relative risk of 4.5 (95% CI, 1.3 to 15.6) compared with patients younger than 50 years of age.
After adjustment for the intensity of anticoagulation therapy and the deviation in the prothrombin time ratio using Cox and Poisson regression, age was not generally associated with the occurrence of bleeding; relative risk estimates ranged from 0.99 to 1.03 per year of age (lower-bound 95% CI, 0.97 to 1.01; upper-bound 95% CI, 1.00 to 1.09). The single exception was life-threatening and fatal complications in patients 80 years of age or older (relative risk, 4.6 [CI, 1.2 to 18.1]). CONCLUSIONS: Age did not appear to be an important determinant of risk for bleeding in patients receiving warfarin, with the possible exception of age 80 years or older. The intensity of anticoagulation therapy and the deviation in the prothrombin time ratio were much stronger predictors of risk for bleeding.


Subject(s)
Anticoagulants/adverse effects , Hemorrhage/chemically induced , Warfarin/adverse effects , Age Factors , Aged , Aged, 80 and over , Anticoagulants/administration & dosage , Female , Humans , Male , Middle Aged , Poisson Distribution , Prospective Studies , Prothrombin Time , Regression Analysis , Retrospective Studies , Risk Factors , Warfarin/administration & dosage
14.
Comput Appl Biosci ; 12(2): 135-43, 1996 Apr.
Article in English | MEDLINE | ID: mdl-8744776

ABSTRACT

Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method out-performed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.


Subject(s)
Sequence Alignment/methods , Amino Acid Sequence , Computers , Databases, Factual , Evaluation Studies as Topic , Odds Ratio , Probability , Proteins/chemistry , Proteins/genetics , Sequence Alignment/statistics & numerical data
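A sketch of position-based pseudo-counts in Python, under stated assumptions: the total pseudo-count for a column is taken as m × R, where R is the number of distinct residue types observed and m is a tunable constant, and that total is shared out through conditional substitution probabilities P(i | j). The three-letter alphabet and probability table are hypothetical, standing in for the 20 amino acids and probabilities derived from a matrix such as BLOSUM62.

```python
def column_probabilities(counts: dict[str, int],
                         cond: dict[str, dict[str, float]],
                         m: float = 5.0) -> dict[str, float]:
    """Residue probabilities for one alignment column with position-based pseudo-counts.

    counts: observed residue counts in the column.
    cond[j][i]: P(i | j), probability that residue j is replaced by i (each row sums to 1).
    m: scaling constant for the total pseudo-count (assumed value).
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("column has no observed residues")
    r = sum(1 for c in counts.values() if c > 0)  # distinct residue types observed
    b_total = m * r                               # more diverse columns get more pseudo-counts
    probs = {}
    for i in cond:
        # Pseudo-count for i: each observed residue j "votes" for i via P(i | j).
        b_i = b_total * sum(counts.get(j, 0) / n * cond[j][i] for j in cond)
        probs[i] = (counts.get(i, 0) + b_i) / (n + b_total)
    return probs

# Hypothetical alphabet: I and L substitute for each other more often than for K.
cond = {
    "I": {"I": 0.6, "L": 0.3, "K": 0.1},
    "L": {"I": 0.3, "L": 0.6, "K": 0.1},
    "K": {"I": 0.1, "L": 0.1, "K": 0.8},
}
probs = column_probabilities({"I": 4}, cond)
print({aa: round(p, 3) for aa, p in probs.items()})  # → {'I': 0.778, 'L': 0.167, 'K': 0.056}
```

The unobserved residue L receives more probability than K because the observed residue, I, is assumed to substitute for L more readily, which is the behavior a plain count vector cannot express.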
15.
Methods Enzymol ; 266: 88-105, 1996.
Article in English | MEDLINE | ID: mdl-8743679

ABSTRACT

Protein blocks consist of multiply aligned sequence segments without gaps that represent the most highly conserved regions of protein families. A database of blocks has been constructed by successive application of the fully automated PROTOMAT system to lists of protein family members obtained from Prosite documentation. Currently, Blocks 8.0, based on protein families documented in Prosite 12, consists of 2884 blocks representing 770 families. Searches of the Blocks Database are carried out using protein or DNA sequence queries, and results are returned with measures of significance for both single and multiple block hits. The database has also proved useful for derivation of amino acid substitution matrices (the Blosum series) and other sets of parameters. WWW and E-mail servers provide access to the database and associated functions, including a block maker for sequences provided by the user.


Subject(s)
Amino Acid Sequence , Base Sequence , DNA/chemistry , Databases, Factual , Oxidoreductases , Proteins/chemistry , Sequence Homology, Amino Acid , Computer Communication Networks , Conserved Sequence , Glutaredoxins , Molecular Sequence Data , Proteins/genetics , Saccharomyces cerevisiae , Software
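Deriving BLOSUM-style matrices from blocks rests on counting aligned residue pairs within block columns. This Python sketch computes observed pair frequencies, expected frequencies from residue marginals, and rounded log-odds scores in half-bit units for a single toy block. It deliberately omits the clustering of sequences above an identity threshold that distinguishes, for example, BLOSUM62 from BLOSUM80, and it only scores pairs that actually occur.

```python
import math
from collections import Counter
from itertools import combinations

def blosum_like_scores(block: list[str]) -> dict[tuple[str, str], int]:
    """Half-bit log-odds scores from residue pairs in the columns of one ungapped block."""
    pair_counts: Counter = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):  # every pair of sequences in the column
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Marginals p_i = q_ii + sum over j != i of q_ij / 2 (identical pairs get both halves).
    p: Counter = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # unordered pair
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

print(blosum_like_scores(["AAC", "AAC", "ASC"]))
# → {('A', 'A'): 1, ('A', 'S'): 2, ('C', 'C'): 3}
```

A single three-sequence block gives extreme, unstable scores; the actual derivation pools pair counts over all blocks in the database before computing frequencies.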
16.
Nucleic Acids Res ; 24(1): 197-200, 1996 Jan 01.
Article in English | MEDLINE | ID: mdl-8594578

ABSTRACT

The Blocks Database contains multiple alignments of conserved regions in protein families. The database can be searched by e-mail and World Wide Web (WWW) servers (http://blocks.fhcrc.org/help) to classify protein and nucleotide sequences.


Subject(s)
Databases, Factual , Proteins/chemistry , Amino Acid Sequence , Computer Communication Networks , Information Storage and Retrieval , Molecular Sequence Data , Proteins/genetics , Sequence Homology, Amino Acid
17.
Gene ; 163(2): GC17-26, 1995 Oct 03.
Article in English | MEDLINE | ID: mdl-7590261

ABSTRACT

Protein blocks consist of multiply aligned sequence segments that correspond to the most highly conserved regions of protein families. Typically, a set of related proteins has more than one region in common and their relationship can be represented as a series of ungapped blocks separated by unaligned regions. Blockmaker is an automated system available by electronic mail (blockmaker@howard.fhcrc.org) and the World Wide Web (http://www.blocks.fhcrc.org) that finds blocks in a group of related protein sequences submitted by the user. It adapts and extends existing algorithms to make them useful to biologists looking for conserved regions in a group of related protein sequences. Two sets of blocks are returned, one in which candidate blocks are detected using the MOTIF algorithm and the other using a Gibbs sampler algorithm that has been adapted for full automation. This use of two block-finding methods based on completely different principles provides a 'reality check,' whereby a block detected by both methods is considered to be correct. Resulting blocks can be displayed using the information-based 'sequence logo' method, adapted to incorporate sequence weights, which provides an intuitive visual description of both the residue and the conservation information at each position. Blocks generated by this system are useful in diverse applications, such as searching databases and designing degenerate PCR primers. As an example, blocks made from amino acid sequences related to Caenorhabditis elegans Tc1 transposase were used to search GenBank, revealing that several fish and amphibian genomic sequences harbor previously unreported Tc1 homologs.


Subject(s)
Amino Acid Sequence , Computer Graphics , Databases, Factual , Software Design , Transposases , Algorithms , Animals , Caenorhabditis elegans/enzymology , DNA-Binding Proteins/chemistry , Information Storage and Retrieval , Molecular Sequence Data , Nucleotidyltransferases/chemistry , Proteins/chemistry , Sequence Alignment
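A weighted sequence logo scales each column by its information content. The Python sketch below assumes that sequence weights simply enter the residue frequencies, and it omits the small-sample correction used in standard logo software; the function name and example weights are illustrative.

```python
import math

def column_information(column: str, weights: list[float], alphabet_size: int = 20) -> float:
    """Information content (bits) of one column using sequence-weighted residue frequencies."""
    total = sum(weights)
    freqs: dict[str, float] = {}
    for residue, w in zip(column, weights):
        freqs[residue] = freqs.get(residue, 0.0) + w / total
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return math.log2(alphabet_size) - entropy

print(round(column_information("AAGG", [1, 1, 1, 1]), 3))          # → 3.322
print(round(column_information("AAGG", [0.1, 0.1, 0.4, 0.4]), 3))  # → 3.6
```

Shifting weight toward the G-bearing sequences makes the same column appear more conserved, so the choice of weights directly changes the height of each stack in the logo.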
18.
J Mol Biol ; 243(4): 574-8, 1994 Nov 04.
Article in English | MEDLINE | ID: mdl-7966282

ABSTRACT

Sequence weighting methods have been used to reduce redundancy and emphasize diversity in multiple sequence alignment and searching applications. Each of these methods is based on a notion of distance between a sequence and an ancestral or generalized sequence. We describe a different approach, which bases weights on the diversity observed at each position in the alignment, rather than on a sequence distance measure. These position-based weights make minimal assumptions, are simple to compute, and perform well in comprehensive evaluations.


Subject(s)
Sequence Alignment , Amino Acid Sequence , Base Sequence , Computer Simulation , Databases, Factual , Genetic Variation , Molecular Sequence Data
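The position-based weighting rule can be sketched in Python: at each column, a sequence receives 1/(r·s), where r is the number of distinct residues in the column and s is the number of sequences sharing that sequence's residue; contributions are summed over columns and normalized. This sketch assumes an ungapped alignment and does not cover gap handling; the toy alignment is illustrative.

```python
from collections import Counter

def position_based_weights(alignment: list[str]) -> list[float]:
    """Position-based sequence weights for an ungapped alignment, normalized to sum to 1."""
    raw = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        r = len(counts)  # distinct residues in this column
        for k, residue in enumerate(column):
            raw[k] += 1.0 / (r * counts[residue])  # counts[residue] is s
    total = sum(raw)
    return [w / total for w in raw]

weights = position_based_weights(["GYVGS", "GFDGF", "GYDGF", "GYQGG"])
print([round(w, 3) for w in weights])  # → [0.267, 0.267, 0.2, 0.267]
```

The third sequence shares most of its residues with the others and is down-weighted. Every column contributes a total of exactly 1 before normalization, so no distance measure or tree is needed.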
19.
J Gen Intern Med ; 9(3): 131-9, 1994 Mar.
Article in English | MEDLINE | ID: mdl-8195911

ABSTRACT

OBJECTIVE: To evaluate a computerized scheduling model that employs nonlinear optimization to recommend optimal follow-up intervals for patients taking warfarin. DESIGN: Randomized trial. SETTING: 5 anticoagulation clinics. PATIENTS/PARTICIPANTS: 620 patients expected to receive warfarin for ≥6 weeks. INTERVENTIONS: Computer-generated recommendations for scheduling the next visit were presented to or withheld from practitioners. MEASUREMENTS AND MAIN RESULTS: The main outcome measures were the follow-up interval scheduled by the provider, the interval at which the patient actually returned to clinic, and the quality of anticoagulation control (computed as the absolute difference between the measured and target prothrombin times [PTRs] or international normalized ratios [INRs]). Follow-up intervals scheduled for the patients whose practitioners received computer-generated recommendations were significantly longer than those for control patients (mean, 4.4 vs 3.5 weeks, p < 0.001), despite the fact that the practitioners modified the suggested return interval by > 1 week on 40% of the visits. The interval at which the intervention group actually returned to clinic was also longer (mean, 4.4 vs 4.1 weeks, p < 0.05), even though the control patients tended to return at longer intervals than were scheduled by their practitioners. Control of anticoagulation was nearly the same among experimental and control patients. Life-threatening complications occurred in the care of three experimental patients and one control patient, while other serious complications occurred in the care of 16 experimental patients and 17 control patients. CONCLUSIONS: Recommendations based on nonlinear optimization prompted clinicians to schedule less frequent follow-up for patients taking warfarin, with no deterioration in anticoagulation control. This approach to scheduling can potentially reduce utilization while maintaining quality of care for patients who require long-term monitoring.


Subject(s)
Appointments and Schedules , Drug Therapy, Computer-Assisted , Monitoring, Physiologic/methods , Warfarin/therapeutic use , Continuity of Patient Care/standards , Female , Follow-Up Studies , Humans , Male , Middle Aged , Prothrombin Time
20.
Genomics ; 19(1): 97-107, 1994 Jan 01.
Article in English | MEDLINE | ID: mdl-8188249

ABSTRACT

The most highly conserved regions of proteins can be represented as "blocks" of locally aligned sequence segments. Previously, an automated system was introduced to generate a database of blocks that is searched for local similarities using a sequence query. Here, we describe a method for searching this database that can also reveal significant global similarities. Local and global alignments are scored independently, so they can be used in concert to infer homology. A set of 7082 diverse sequences not represented in the database provided queries for testing this approach. The resulting distributions of scores led to guidelines for interpretation of search data and to the classification of 289 uncatalogued sequences into known groups. Thirty-eight of these relationships appear to be new discoveries. We also show how searching a database of blocks can be used to detect repeated domains and to find distinct cross-family relationships that were missed in searches of sequence databases.


Subject(s)
Databases, Factual , Proteins/classification , Sequence Alignment , Sequence Homology, Amino Acid , Animals , DNA Helicases/chemistry , Mammals/genetics , Proteins/chemistry , Repetitive Sequences, Nucleic Acid , Saccharomyces cerevisiae/genetics , Software
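Searching a blocks database with a query sequence amounts, for each block, to sliding the block's PSSM along the query and keeping the best ungapped score. The Python sketch below shows that single-block step only. The PSSM construction (add-one pseudo-counts against a uniform background) is a simplifying assumption rather than the Blocks scoring scheme, the function names are hypothetical, and combining multiple block hits into evidence of global similarity is not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_to_pssm(block: list[str]) -> list[dict[str, float]]:
    """Log2-odds PSSM with add-one pseudo-counts and a uniform 1/20 background (simplification)."""
    n = len(block)
    pssm = []
    for column in zip(*block):
        counts = Counter(column)
        pssm.append({aa: math.log2((counts[aa] + 1) / (n + 20) * 20) for aa in AMINO_ACIDS})
    return pssm

def best_hit(query: str, pssm: list[dict[str, float]]) -> tuple[float, int]:
    """Slide the PSSM along the query; return (best score, offset) over ungapped placements."""
    best_score, best_offset = -math.inf, -1
    for start in range(len(query) - len(pssm) + 1):
        window = query[start:start + len(pssm)]
        # Nonstandard letters (e.g. X) get the column's lowest score.
        score = sum(col.get(aa, min(col.values())) for col, aa in zip(pssm, window))
        if score > best_score:
            best_score, best_offset = score, start
    return best_score, best_offset

pssm = block_to_pssm(["GDSGG", "GDSGA", "GESGG"])
score, offset = best_hit("MKTAGDAGGPLV", pssm)
print(offset, round(score, 2))  # → 4 6.16
```

The best placement is found even though the query differs from the block consensus at one position, which is the tolerance that makes block-based searching useful for distant relationships.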