1.
PLoS One ; 15(12): e0243332, 2020.
Article in English | MEDLINE | ID: mdl-33347457

ABSTRACT

Creating a complete picture of transcriptional regulation is an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology for studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcriptional start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we first generated a large number of features, each defined as the presence or absence of a TF in a particular promoter region around a TSS. Using feature selection and regression analysis, we identified sets of the most important TFs that affect the expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as either repressors or activators depending on their location relative to the TSS.
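
The presence/absence features described in this abstract can be illustrated with a minimal Python sketch. The window names and boundaries, the function name, and the toy peaks below are assumptions made for illustration; they are not taken from the paper, GTRD, or FANTOM5.

```python
from typing import Dict, List, Tuple

# Hypothetical promoter windows, as offsets in bp relative to the TSS.
# These boundaries are illustrative assumptions, not the paper's regions.
WINDOWS: Dict[str, Tuple[int, int]] = {
    "upstream_distal": (-1000, -201),
    "upstream_proximal": (-200, -1),
    "downstream": (0, 200),
}


def tf_window_features(
    tss: int, strand: str, peaks: List[Tuple[str, int, int]]
) -> Dict[str, int]:
    """Return 0/1 features keyed by 'TF@window'.

    peaks holds (tf_name, start, end) tuples on the same chromosome as the
    TSS, with inclusive coordinates.
    """
    features: Dict[str, int] = {}
    for tf, start, end in peaks:
        # Convert genomic coordinates to strand-aware offsets from the TSS.
        if strand == "+":
            lo, hi = start - tss, end - tss
        else:
            lo, hi = tss - end, tss - start
        for name, (w_lo, w_hi) in WINDOWS.items():
            key = f"{tf}@{name}"
            overlaps = lo <= w_hi and hi >= w_lo
            # A feature stays 1 once any peak of that TF hits the window.
            features[key] = max(features.get(key, 0), int(overlaps))
    return features


# Toy peaks, invented for the example.
example = tf_window_features(1000, "+", [("CTCF", 850, 900), ("MYC", 1050, 1100)])
```

Each TSS then yields one row of a binary feature matrix, which could be passed to any feature selection and regression routine together with the measured TSS activity.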


Subject(s)
Databases, Nucleic Acid , Gene Expression Profiling , Transcription Factors , Transcriptome , HEK293 Cells , Hep G2 Cells , Humans , K562 Cells , Transcription Factors/classification , Transcription Factors/metabolism
2.
PLoS One ; 14(8): e0221760, 2019.
Article in English | MEDLINE | ID: mdl-31465497

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a widely used experimental technology for the identification of functional protein-DNA interactions. Databases such as ENCODE, GTRD, ChIP-Atlas and ReMap now systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is indispensable for selecting the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that make it possible to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we adapted a well-known population size estimate to determine the unknown number of genuine transcription factor binding regions. The proposed metrics are determined from the overlap of distinct binding-site sets obtained by processing one ChIP-Seq experiment with different peak callers. The metrics can also be useful for assessing the quality of datasets obtained by processing distinct ChIP-Seq experiments with a given peak caller. We have also shown that these metrics appear to be useful not only for dataset selection but also for the comparison of peak callers and the identification of site motifs based on ChIP-Seq datasets. The algorithm for determining the false positive control metric and the false negative control metric for ChIP-Seq datasets was implemented as a plugin for the BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
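
The abstract does not name the population size estimate it adapts. As a hedged illustration only, the sketch below uses the classic two-sample Lincoln-Petersen capture-recapture estimator, treating the peak sets of two callers as two "captures" of the same unknown set of binding regions. The function names and toy intervals are assumptions, and this is not the paper's definition of its two metrics.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # inclusive (start, end) on one chromosome


def count_overlapping(a: List[Region], b: List[Region]) -> int:
    """Count regions in a that overlap at least one region in b."""
    return sum(any(s1 <= e2 and s2 <= e1 for s2, e2 in b) for s1, e1 in a)


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Estimate total population size from two samples.

    n1, n2: sizes of the two samples (peaks from caller A and caller B).
    m: number of items seen in both samples (overlapping peaks).
    """
    if m == 0:
        raise ValueError("no overlap between samples: estimate is undefined")
    return n1 * n2 / m


# Toy peak sets, invented for the example.
caller_a = [(100, 200), (300, 400), (500, 600), (700, 800)]
caller_b = [(150, 250), (350, 450), (900, 1000)]
shared = count_overlapping(caller_a, caller_b)
estimated_total = lincoln_petersen(len(caller_a), len(caller_b), shared)
```

With more than two peak callers, a multi-sample capture-recapture model would be needed; the two-sample form is shown only because it is the simplest instance of the idea.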


Subject(s)
Chromatin Immunoprecipitation Sequencing , Databases, Nucleic Acid , Sequence Analysis, DNA , Algorithms , Area Under Curve , Binding Sites , Quality Control , ROC Curve , Transcription Factors/metabolism
3.
J Bioinform Comput Biol ; 16(2): 1840013, 2018 04.
Article in English | MEDLINE | ID: mdl-29739305

ABSTRACT

RNA plays an important role in the life of the cell and of the organism in general. Besides the well-established protein-coding RNAs (messenger RNAs, mRNAs), long non-coding RNAs (lncRNAs) have attracted the attention of researchers in recent years. Although lncRNAs have been classified as non-coding, some authors have reported the presence of corresponding sequences in ribosome profiling (Ribo-seq) data. Ribo-seq is a powerful experimental tool used to characterize RNA translation in the cell, with a focus on initiation (harringtonine, lactimidomycin) and elongation (cycloheximide). By exploiting translation starts obtained from a Ribo-seq experiment, we developed a novel position weight matrix model for the prediction of translation starts. This model allowed us to achieve 96% accuracy in discriminating between human mRNAs and lncRNAs. When the same model was used for the prediction of putative ORFs in RNAs, we discovered that the majority of lncRNAs contained only small ORFs ([Formula: see text][Formula: see text]nt) in contrast to mRNAs.
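
A position weight matrix of the general kind mentioned in this abstract can be sketched as follows. This is a generic log-odds PWM with a uniform background and a pseudocount, not the paper's novel model; the aligned toy sites, the function names, and the use of the DNA alphabet are assumptions.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(
    sites: List[str], pseudocount: float = 1.0, background: float = 0.25
) -> List[Dict[str, float]]:
    """Build a log2-odds PWM from equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position log-odds for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_start(pwm: List[Dict[str, float]], seq: str) -> int:
    """Return the offset of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        range(len(seq) - width + 1), key=lambda i: score(pwm, seq[i : i + width])
    )


# Toy aligned windows around a start codon, invented for the example.
toy_sites = ["ACCATGG", "GCCATGG", "ACCATGA", "GCCATGG"]
toy_pwm = build_pwm(toy_sites)
offset = best_start(toy_pwm, "TTTTGCCATGGTTTT")
```

In practice the training windows would come from translation starts observed in Ribo-seq data, and a score threshold would separate predicted starts from background positions.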


Subject(s)
Computational Biology/methods , Proteins/genetics , RNA, Long Noncoding , 3' Untranslated Regions , 5' Untranslated Regions , Algorithms , Open Reading Frames , Protein Biosynthesis , RNA, Messenger/genetics , Ribosomes/genetics , Sequence Analysis, RNA
4.
J Bioinform Comput Biol ; 14(2): 1641006, 2016 04.
Article in English | MEDLINE | ID: mdl-27122318

ABSTRACT

Ribosome profiling technology (Ribo-Seq) has made it possible to reveal more details of mRNA translation in the cell and to obtain additional information on the importance of mRNA sequence features for this process. The application of translation inhibitors such as harringtonine and cycloheximide, together with the mRNA-Seq technique, helped to assess an important characteristic: translation efficiency. We assessed the translational importance of mRNA sequence features through statistical analysis of Ribo-Seq and mRNA-Seq data. Translationally important features known from the literature, as well as features proposed by the authors, were used in the analysis. Comparisons such as protein-coding versus non-coding RNAs and high- versus low-translated mRNAs were performed. We identified a set of features that discriminate between the compared categories of RNA. Significant relationships between mRNA features and translation efficiency were also established.
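
Translation efficiency is commonly computed as the ratio of ribosome footprint density to mRNA abundance for the same transcript. The sketch below assumes that definition, which the abstract does not spell out; the function names, the threshold, and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def translation_efficiency(ribo_density: float, mrna_abundance: float) -> float:
    """Ratio of normalized Ribo-Seq signal to normalized mRNA-Seq signal."""
    if mrna_abundance <= 0:
        raise ValueError("mRNA abundance must be positive")
    return ribo_density / mrna_abundance


def split_by_efficiency(
    te_by_gene: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split genes into high- and low-translated groups at a threshold."""
    high = sorted(g for g, te in te_by_gene.items() if te >= threshold)
    low = sorted(g for g, te in te_by_gene.items() if te < threshold)
    return high, low


# Toy normalized values, invented for the example.
te_values = {
    "geneA": translation_efficiency(200.0, 100.0),
    "geneB": translation_efficiency(10.0, 40.0),
}
high_genes, low_genes = split_by_efficiency(te_values, threshold=1.0)
```

The two groups could then be compared feature by feature, in the spirit of the high- versus low-translated mRNA comparison the abstract describes.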


Subject(s)
Mammals/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Codon, Initiator , Humans , Mice , Protein Biosynthesis , Proteins/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Ribosomes/genetics
5.
In Silico Biol ; 8(5-6): 383-411, 2008.
Article in English | MEDLINE | ID: mdl-19374127

ABSTRACT

Despite the large amount of microarray data available on breast cancer, reliable identification of genes associated with breast cancer development remains a challenge. The aim of this work was to develop a novel meta-analysis method that identifies differentially expressed genes by integrating the results of several independent microarray experiments. We developed a statistical method for the identification of up- and down-regulated genes to perform the meta-analysis. The method takes advantage of the hypergeometric and binomial distributions. Using our method, we performed a meta-analysis of five data sets from independent cDNA-microarray experiments on breast cancer. The meta-analysis revealed that 3.2% and 2.8% of the 24,726 analyzed genes are significantly (P-value < 0.01) down- and up-regulated, respectively. We also show that properly applied meta-analysis is a good tool for the comparison of different breast cancer subtypes. Our meta-analysis showed that the expression of the majority of genes does not differ significantly between subtypes of breast cancer. Here, we report the rationale, development and application of a meta-analysis that enables us to identify biologically meaningful features of breast cancer. The algorithm we propose for the meta-analysis can reveal the features specific to breast cancer subtypes and those common to breast cancer. The results allow us to revise the previously generated lists of genes associated with breast cancer and also to identify the most promising anticancer drug-target genes.
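
The abstract states that the method uses the hypergeometric and binomial distributions but does not give the test statistics. The sketch below shows only the two upper-tail probabilities such a method could build on; how the paper combines them is not known from this record, and the function names and example numbers are assumptions.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Example use: the chance that a gene is called up-regulated in at least
    k of n independent experiments when each call has null probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeometric_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Example use: the chance that two gene lists drawn from the same
    population share at least k genes.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


# Toy inputs, invented for the example.
p_all_five = binomial_tail(5, 5, 0.5)
p_full_overlap = hypergeometric_tail(5, 10, 5, 5)
```

A gene would be reported as consistently regulated when the relevant tail probability falls below the chosen significance level, such as the P-value < 0.01 cut-off quoted in the abstract.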


Subject(s)
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic/genetics , Oligonucleotide Array Sequence Analysis , Algorithms , Breast Neoplasms/classification , Genetic Heterogeneity , Humans
6.
BMC Bioinformatics ; 8: 56, 2007 Feb 19.
Article in English | MEDLINE | ID: mdl-17309789

ABSTRACT

BACKGROUND: Computational analysis of gene regulatory regions is important for predicting the functions of many uncharacterized genes. With this in mind, the search for target genes of interferon (IFN) induction is of interest. IFNs are multifunctional cytokines with immunomodulatory, antiviral, antibacterial, and antitumor effects. The interaction of IFNs with their cell surface receptors activates several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-kappaB, are essential for the function of the IFN system. The aim of this work is to develop computational approaches for the recognition of DNA binding sites for these factors and computer programs for the prediction of IFN-inducible regions. RESULTS: We developed computational approaches to the recognition of the binding sites for ISGF3, STAT1, IRF1, and NF-kappaB. Analysis of the distribution of these binding sites demonstrated that the regions -500 upstream of the transcription start site in IFN-inducible genes are enriched in putative binding sites for these transcription factors. Based on selected combinations of sites whose frequencies were significantly higher than in other functional gene groups, we developed methods for the prediction of IFN-inducible promoters and enhancers. We analyzed 1,004 sequences of IFN-inducible genes compiled using microarray data analyses and also about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of 1,664 human genes annotated in EPD were significantly IFN-inducible. CONCLUSION: Analyses of several control datasets demonstrated that the developed methods predict IFN-inducible genes with high accuracy. Application of these methods to several datasets suggested that the human genome contains approximately 1500-2000 IFN-inducible genes.


Subject(s)
Chromosome Mapping/methods , Enhancer Elements, Genetic/genetics , Interferons/genetics , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Transcriptional Activation/genetics , Base Sequence , Binding Sites , Molecular Sequence Data , Promoter Regions, Genetic , Protein Binding , Sequence Alignment