Search | VHL Regional Portal

Transcription factor specificity limits the number of DNA-binding motifs.

Aptekmann, Ariel A; Bulavka, Denys; Nadra, Alejandro D; Sánchez, Ignacio E.

PLoS One; 17(1): e0263307, 2022.

Article in English | MEDLINE | ID: mdl-35089985

ABSTRACT

We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity, which is on the order of ten thousand and would fully utilize the space of DNA subsequences. Taking into account an expanded DNA alphabet with modified bases can further raise this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scale.
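As an illustration of the motif-discriminating positions mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that both motifs are written as equal-length strings of standard IUPAC nucleotide codes, that they are compared in the same register on the same strand, and that a position counts as discriminating when the two allowed base sets share no base. The function name `discriminating_positions` and the example motifs are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def discriminating_positions(motif_a, motif_b):
    """Count positions where the two motifs allow no base in common.

    Assumption (not from the source): motifs have equal length and are
    aligned position by position, with no offsets or reverse complements.
    """
    if len(motif_a) != len(motif_b):
        raise ValueError("this sketch only compares motifs of equal length")
    count = 0
    for code_a, code_b in zip(motif_a, motif_b):
        shared = set(IUPAC[code_a]) & set(IUPAC[code_b])
        if not shared:  # no single base can satisfy both motifs here
            count += 1
    return count


# Hypothetical motifs: only position 5 (G versus T) has disjoint base sets.
print(discriminating_positions("TGACGY", "TGASTC"))
```

Under these assumptions, a DNA subsequence can match both motifs at once only when the count is zero, which is why a count of at least one is enough to keep two motifs distinct.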

Subject(s)

DNA/metabolism, Nucleotide Motifs/genetics, Transcription Factors/metabolism, Algorithms, Base Sequence, Binding Sites, Humans, Protein Binding

Thousands of protein linear motif classes may still be undiscovered.

Bulavka, Denys; Aptekmann, Ariel A; Méndez, Nicolás A; Krick, Teresa; Sánchez, Ignacio E.

PLoS One; 16(5): e0248841, 2021.

Article in English | MEDLINE | ID: mdl-33939703

ABSTRACT

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes most often present a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications, and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
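To make the regular-expression description of a motif class concrete, here is a minimal sketch, not the authors' method, of how much of sequence space a single class covers. It assumes a class is given as a list with one string of allowed amino acids per position, and that all 20 standard residues are equally weighted, so coverage is the product of the per-position allowed fractions. The function name `class_coverage` and the example class are hypothetical. The universe-level fraction described in the abstract involves all coexisting classes and is not computed here.

```python
from fractions import Fraction

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def class_coverage(motif_class, alphabet=AMINO_ACIDS):
    """Fraction of all sequences of the motif's length that match the class.

    Assumption (not from the source): every residue in the alphabet is
    equally likely, so each position contributes |allowed| / |alphabet|.
    """
    alphabet_set = set(alphabet)
    coverage = Fraction(1)
    for allowed in motif_class:
        allowed_set = set(allowed)
        if not allowed_set <= alphabet_set:
            raise ValueError("allowed residues must come from the alphabet")
        coverage *= Fraction(len(allowed_set), len(alphabet_set))
    return coverage


# Hypothetical three-position class, equivalent to the regex [RK].[ST]
example_class = ["RK", AMINO_ACIDS, "ST"]
print(class_coverage(example_class))  # 2/20 * 20/20 * 2/20
```

Passing a larger `alphabet` string is one simple way to explore how modified residues change the coverage of the same class under these assumptions.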

Subject(s)

Amino Acid Motifs , Sequence Analysis, Protein/methods , Humans , Sensitivity and Specificity , Sequence Analysis, Protein/standards
