1.
Comput Appl Biosci ; 6(2): 81-92, 1990 Apr.
Article in English | MEDLINE | ID: mdl-2193692

ABSTRACT

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix, i.e. the one with the lowest probability of occurring by chance, out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
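A minimal Python sketch of the matrix representation described in this abstract, assuming the sites have already been aligned to a common length. Rows are the bases A, C, G, T and columns are positions. The abstract does not say exactly how frequencies become matrix elements, so the pseudocount of 1, the uniform 0.25 background and the log2-odds weights are assumptions, and all function names are hypothetical; this is not the authors' program. `best_window` shows how such a matrix can be slid along a sequence to pick out its best-matching site.

```python
import math

BASES = "ACGT"  # row order of the matrix


def count_matrix(sites):
    """Return a 4 x L matrix of base counts: row = base (A, C, G, T), column = position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [[0] * length for _ in BASES]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            # BASES.index raises ValueError for non-ACGT symbols such as N
            counts[BASES.index(base)][pos] += 1
    return counts


def frequency_matrix(counts, pseudocount=1.0):
    """Convert counts to per-column base frequencies; the pseudocount avoids zero frequencies."""
    length = len(counts[0])
    freqs = [[0.0] * length for _ in BASES]
    for pos in range(length):
        total = sum(counts[b][pos] for b in range(4)) + 4 * pseudocount
        for b in range(4):
            freqs[b][pos] = (counts[b][pos] + pseudocount) / total
    return freqs


def log_odds_score(freqs, window, background=0.25):
    """Sum of log2(frequency / background) over a window as long as the matrix."""
    return sum(math.log2(freqs[BASES.index(base)][pos] / background)
               for pos, base in enumerate(window.upper()))


def best_window(freqs, seq):
    """Slide the matrix along seq; return (score, offset) of the highest-scoring window."""
    width = len(freqs[0])
    return max((log_odds_score(freqs, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))
```

For example, with the made-up sites ACGT, ACGA and ACGT (toy data, not LexA operators), `best_window(frequency_matrix(count_matrix(sites)), "TTACGTTT")` reports offset 2, the window ACGT. Positive scores mark windows that resemble the sites more than a uniform background does.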


Subject(s)
Base Sequence , DNA , Pattern Recognition, Automated , Serine Endopeptidases , Software , Algorithms , Bacterial Proteins/genetics , Binding Sites , DNA, Bacterial/genetics , Escherichia coli/genetics , Genes, Bacterial , Molecular Sequence Data
3.
Proc Natl Acad Sci U S A ; 86(4): 1183-7, 1989 Feb.
Article in English | MEDLINE | ID: mdl-2919167

ABSTRACT

The ability to determine important features within DNA sequences from the sequences alone is becoming essential as large-scale sequencing projects are being undertaken. We present a method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments, each known to contain somewhere within it a binding site for that protein. Information about the position or orientation of the binding sites within those fragments is not needed. The method compares the "information content" of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern. The specificity of the protein is represented as a matrix, rather than a consensus sequence, allowing patterns that are typical of regulatory protein-binding sites to be identified. The reliability of the method improves as the number of sequences increases, but the time required increases only linearly with the number of sequences. An example, using known cAMP receptor protein-binding sites, illustrates the method.
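The search over possible alignments can be sketched as a greedy procedure scored by information content. This is a simplified illustration, not the authors' published algorithm: it assumes a uniform base background of 0.25 and a fixed site width, scans both strands because the orientation of each site is unknown, and keeps only the `keep` best partial alignments after each sequence is added. `keep`, `greedy_alignment` and the other names are hypothetical.

```python
import math
from heapq import nlargest

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def information_content(sites, background=0.25):
    """Information content (bits) of an alignment of equal-length sites.

    Sums f * log2(f / background) over every base and position using the
    observed frequencies f; terms with f == 0 contribute nothing.
    """
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for base in BASES:
            f = column.count(base) / n
            if f > 0:
                total += f * math.log2(f / background)
    return total


def windows(seq, width):
    """All width-length windows of seq on both strands (orientation is unknown)."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [s[i:i + width]
            for s in (seq, reverse_complement)
            for i in range(len(s) - width + 1)]


def greedy_alignment(seqs, width, keep=50):
    """Build an alignment by adding one sequence at a time.

    Every kept partial alignment is extended with every window of the next
    sequence, and only the `keep` highest-information candidates survive.
    Scores are recomputed from scratch here for clarity.
    """
    alignments = [[w] for w in windows(seqs[0], width)]
    for seq in seqs[1:]:
        candidates = [a + [w] for a in alignments for w in windows(seq, width)]
        alignments = nlargest(keep, candidates, key=information_content)
    return max(alignments, key=information_content)
```

On three made-up sequences that each contain AAGG, `greedy_alignment(["CAAGGC", "TAAGGT", "AAAGGA"], width=4)` returns three identical sites (AAGG, or its reverse complement CCTT), the alignment with the maximum 8 bits. Capping the beam at `keep` fixes the number of candidates scored per added sequence, which mirrors the linear scaling the abstract describes; a faster version would update base counts incrementally instead of rescoring each alignment from scratch.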


Subject(s)
Cyclic AMP Receptor Protein , DNA/metabolism , Models, Theoretical , Proteins/metabolism , Algorithms , Base Sequence , Binding Sites , Carrier Proteins/metabolism , Information Systems , Molecular Sequence Data , Neoplasm Proteins/metabolism