ABSTRACT
We developed an approach for effectively estimating correlations in the noise component of gene expression data and propose an efficient noise reduction technique. The technique was applied to E. coli microarray data and tested on genes modulated by the SOS response.
Subjects
Escherichia coli/physiology , Gene Expression Regulation, Bacterial/physiology , Genes, Bacterial/physiology , Models, Biological , Operon/physiology
ABSTRACT
DNA nucleotide sequences within clusters of transcription start sites, identified by cap analysis of gene expression (CAGE), have several distinguishing features. The sequences of these clusters are rich in C and G, and their nucleotide content is asymmetric in a way that correlates with the strand from which transcription in the cluster proceeds: on the coding strand, guanine is more abundant than cytosine. In the cluster sequence on the coding strand, the frequency of polynucleotide tracts of the avoided nucleotide (cytosine), normalized to the frequency expected from the content of that nucleotide in the cluster, is significantly higher than the normalized frequency of tracts of the other nucleotide (guanine). Similarly, the C-rich variant of the specific binding site of the transcription initiation factor Sp1 is statistically more significant on the coding strand than the G-rich variant. However, the assumption that the choice of Sp1 binding-site variant correlates with the choice of transcribed strand can hardly be confirmed. It is more likely that both variants are roughly equally probable and that the Sp1 binding region acts here as a selection factor that counteracts mutations shifting the nucleotide content.
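
The normalization described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that a "tract" is a run of k identical nucleotides, that the expected frequency is the single-nucleotide fraction raised to the power k (an independence assumption), and that G/C asymmetry is measured as (G - C)/(G + C). The function names and the toy sequence are hypothetical.

```python
def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq occupied by the given base."""
    return seq.count(base) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); positive when guanine exceeds cytosine."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c)


def tract_ratio(seq: str, base: str, k: int = 3) -> float:
    """Observed / expected frequency of length-k runs of a single base.

    Observed: share of overlapping length-k windows equal to base * k.
    Expected: base_fraction ** k (assumes independent positions).
    """
    windows = len(seq) - k + 1
    observed = sum(seq[i:i + k] == base * k for i in range(windows)) / windows
    expected = base_fraction(seq, base) ** k
    if expected == 0:
        return float("nan")  # base absent: the ratio is undefined
    return observed / expected


# Made-up toy sequence standing in for a coding-strand cluster; not real data.
toy = "GGACCCGGAG"
print("GC skew:", gc_skew(toy))
print("C-tract ratio:", tract_ratio(toy, "C"))
print("G-tract ratio:", tract_ratio(toy, "G"))
```

A ratio above 1 means tracts of that nucleotide occur more often than its overall content would predict. Comparing the C and G ratios on the coding strand mirrors the comparison made in the abstract, although a real analysis would also need a significance test.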