Identification of consensus patterns in unaligned DNA sequences known to be functionally related.

Hertz, G Z; Hartzell, G W; Stormo, G D.

Comput Appl Biosci; 6(2): 81-92, 1990 Apr.

Article in English | MEDLINE | ID: mdl-2193692

ABSTRACT

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site and each element is determined by the frequency the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix--i.e. the one with the lowest probability of occurring by chance--out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
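The matrix representation described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' program: it takes sites that are already aligned and of equal length, whereas the published method searches for the alignment itself. The function name `frequency_matrix` and the `toy_sites` data are hypothetical and are not LexA binding sites.

```python
from collections import Counter

BASES = "ACGT"  # one matrix row per base


def frequency_matrix(sites):
    """Return {base: [frequency at each position]} for aligned, equal-length sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    matrix = {base: [0.0] * width for base in BASES}
    for pos in range(width):
        # Count how often each base occurs in this column of the alignment.
        counts = Counter(site[pos].upper() for site in sites)
        for base in BASES:
            matrix[base][pos] = counts[base] / len(sites)
    return matrix


# Hypothetical toy alignment, for illustration only.
toy_sites = ["ACGT", "ACGA", "ACTT", "TCGT"]
m = frequency_matrix(toy_sites)
for base in BASES:
    print(base, m[base])
```

Each column of the resulting matrix sums to 1, and a strongly conserved position (the second column in the toy data) shows a single base with frequency 1.0.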

Subject(s)

Base Sequence; DNA; Pattern Recognition, Automated; Serine Endopeptidases; Software; Algorithms; Bacterial Proteins/genetics; Binding Sites; DNA, Bacterial/genetics; Escherichia coli/genetics; Genes, Bacterial; Molecular Sequence Data

The structure and function of the homeodomain.

Scott, M P; Tamkun, J W; Hartzell, G W.

Biochim Biophys Acta; 989(1): 25-48, 1989 Jul 28.

Article in English | MEDLINE | ID: mdl-2568852

Subject(s)

Genes, Homeobox; Amino Acid Sequence; Animals; Base Sequence; Binding Sites; Biological Evolution; DNA/metabolism; Humans; Molecular Sequence Data; Repressor Proteins; Transcription, Genetic

Identifying protein-binding sites from unaligned DNA fragments.

Stormo, G D; Hartzell, G W.

Proc Natl Acad Sci U S A; 86(4): 1183-7, 1989 Feb.

Article in English | MEDLINE | ID: mdl-2919167

ABSTRACT

The ability to determine important features within DNA sequences from the sequences alone is becoming essential as large-scale sequencing projects are being undertaken. We present a method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments, each known to contain somewhere within it a binding site for that protein. Information about the position or orientation of the binding sites within those fragments is not needed. The method compares the "information content" of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern. The specificity of the protein is represented as a matrix, rather than a consensus sequence, allowing patterns that are typical of regulatory protein-binding sites to be identified. The reliability of the method improves as the number of sequences increases, but the time required increases only linearly with the number of sequences. An example, using known cAMP receptor protein-binding sites, illustrates the method.
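To make the idea of comparing the "information content" of candidate alignments concrete, here is a minimal sketch. It assumes one common definition, the relative entropy of each column against a background base distribution, summed over positions, and a uniform background of 0.25 per base; neither detail is stated in the abstract. The function name and the two toy alignments are hypothetical, and the search over possible alignments that the method performs is not shown.

```python
import math
from collections import Counter


def information_content(sites, background=None):
    """Sum over columns of sum_b f_b * log2(f_b / p_b), in bits.

    `background` maps each base to its assumed background probability p_b
    (uniform 0.25 by default, which is an assumption of this sketch).
    """
    if not sites:
        raise ValueError("need at least one site")
    if background is None:
        background = {base: 0.25 for base in "ACGT"}

    n = len(sites)
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    total = 0.0
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        for base, count in counts.items():
            f = count / n  # observed frequency; bases with zero count add nothing
            total += f * math.log2(f / background[base])
    return total


# Two hypothetical candidate alignments of the same width.
conserved = ["ACGT", "ACGT", "ACGT", "ACGT"]
mixed = ["ACGT", "CGTA", "GTAC", "TACG"]

print(information_content(conserved))
print(information_content(mixed))
```

Under these assumptions a perfectly conserved column contributes 2 bits and a column matching the background contributes 0, so the conserved alignment scores higher and would be preferred.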

Subject(s)

Cyclic AMP Receptor Protein; DNA/metabolism; Models, Theoretical; Proteins/metabolism; Algorithms; Base Sequence; Binding Sites; Carrier Proteins/metabolism; Information Systems; Molecular Sequence Data; Neoplasm Proteins/metabolism
