1.
BMC Genomics ; 25(Suppl 3): 834, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237856

ABSTRACT

BACKGROUND: Novel protein-coding genes were considered to be born by re-organization of pre-existing genes, such as gene duplication and gene fusion. However, recent progress in genome research has revealed that more protein-coding genes than expected were born de novo, that is, by accumulation of mutations in non-genic DNA sequences. Nonetheless, the detailed process (scenario) of de novo origination is not well understood. RESULTS: We have devised a bioinformatic analysis for sketching a scenario for de novo origination of protein-coding genes. For each de novo protein-coding gene, we first identified, based on parsimony, the edge of a given phylogenetic tree on which the gene was born. Then, from a multiple sequence alignment of the de novo gene and its orthologous regions, we constructed ancestral DNA sequences of the gene corresponding to both end nodes of the edge. We finally revealed statistical features observed in evolution between the two ancestral sequences. In the analysis of the Saccharomyces cerevisiae lineage, we have successfully sketched a putative scenario for de novo origination of protein-coding genes. (1) In the beginning were GC-rich genome regions. (2) Neutral mutations accumulated in these regions. (3) ORFs were extended/combined, and then (4) a translation signature (Kozak consensus sequence) was recruited. Interestingly, as the scenario progresses from (2) to (4), the specificity of mutations increases. CONCLUSION: To the best of our knowledge, this is the first report outlining a scenario of de novo origination of protein-coding genes. Our bioinformatic analysis can capture events that occur during a short evolutionary time by directly observing the evolution of the ancestral sequences from non-genic to genic. This property is suitable for the analysis of fast-evolving de novo genes.


Subject(s)
Evolution, Molecular , Open Reading Frames , Phylogeny , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Computational Biology/methods , Mutation , Genome, Fungal
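
The scenario in the abstract above refers to two sequence properties that are easy to measure: GC content and ORF length. The following is a minimal illustrative sketch, not the authors' pipeline. The function names `gc_content` and `longest_orf` are hypothetical, and the ORF scan assumes the standard stop codons, an ATG start, and the forward strand only.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not taken from the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G or C nucleotides in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf(seq):
    """Longest forward-strand ORF (ATG ... stop codon, inclusive).

    Returns an empty string when no complete ORF is present.
    """
    seq = seq.upper()
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orf = seq[start:pos + 3]
                if len(orf) > len(best):
                    best = orf
                break
    return best


if __name__ == "__main__":
    toy = "CCATGGCCTGAGG"  # made-up sequence for demonstration
    print(gc_content(toy), longest_orf(toy))
```

Applying both functions to the two reconstructed ancestral sequences of an edge would show whether the ORF grew along that edge, under the simplifications stated above.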
2.
J Theor Biol ; 526: 110808, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34118264

ABSTRACT

We discuss the dynamical robustness of biological networks represented by directed graphs, such as neural networks and gene regulatory networks. The theoretical results indicate that networks with low indegree variance and high outdegree variance are dynamically robust. We propose a machine learning method that gives equilibrium states to input-output networks with a recurrent hidden layer. We verify the theory by using the learned networks having various indegree and outdegree distributions. We also show that the basin of attraction of an equilibrium state is narrow when networks are dynamically robust.


Subject(s)
Machine Learning , Neural Networks, Computer , Gene Regulatory Networks
3.
J Biochem ; 169(4): 421-434, 2021 Apr 29.
Article in English | MEDLINE | ID: mdl-33386847

ABSTRACT

Whole transcriptome analyses have revealed that mammalian genomes are massively transcribed, resulting in the production of huge numbers of transcripts with unknown functions (TUFs). Previous research has categorized most TUFs as noncoding RNAs (ncRNAs) because most previously studied TUFs do not encode open reading frames (ORFs) with biologically significant lengths [>100 amino acids (AAs)]. Recent studies, however, have reported that several transcripts harbouring small ORFs that encode peptides shorter than 100 AAs are translated and perform important biological functions. Here, we examined the translational capacity of transcripts annotated as ncRNAs in human cells, and identified several hundred ribosome-associated transcripts previously annotated as ncRNAs. Ribosome footprinting and polysome profiling analyses revealed that 61 of them are potentially translatable. Among them, 45 were not targets of nonsense-mediated mRNA decay, suggesting that they are productive mRNAs. We confirmed the translation of one ncRNA, LINC00493, by a luciferase reporter assay and western blotting of a FLAG-tagged LINC00493 peptide. Proteomic analysis revealed that the LINC00493 peptide interacts with many mitochondrial proteins, and immunofluorescence assays showed that the peptide is mitochondrially localized. Our findings indicate that some transcripts annotated as ncRNAs encode peptides and that unannotated peptides may perform important roles in cells.


Subject(s)
Open Reading Frames , Peptides , RNA, Long Noncoding/genetics , RNA, Messenger , HeLa Cells , Humans , Peptides/genetics , Peptides/metabolism , RNA, Long Noncoding/biosynthesis , RNA, Messenger/biosynthesis , RNA, Messenger/genetics
4.
Phys Rev E ; 97(6-1): 062315, 2018 Jun.
Article in English | MEDLINE | ID: mdl-30011527

ABSTRACT

Although outdegree distributions of gene regulatory networks have scale-free characteristics similar to other biological networks, indegree distributions have single-scale characteristics with significantly lower variance than that of outdegree distributions. In this study, we mathematically explain that such asymmetric characteristics arise from dynamical robustness, which is the property of maintaining an equilibrium state of gene expressions against inevitable perturbations to the networks, such as gene dysfunction and mutation of promoters. We reveal that the expression of a single gene is robust to a perturbation for a large number of inputs and a small number of outputs. Applying these results to the networks, we also show that an equilibrium state of the networks is robust if the variance of the indegree distribution is low (i.e., single-scale characteristics) and that of the outdegree distribution is high (i.e., scale-free characteristics). These asymmetric characteristics are conserved across a wide range of species, from bacteria to humans.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Animals , Drosophila/genetics , Escherichia coli/genetics , Gene Expression , Humans
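
The abstract above contrasts the variance of the indegree distribution with that of the outdegree distribution. As a minimal sketch of how these two quantities can be computed for a directed graph given as an edge list, consider the following. The function name `degree_variances` and the toy network are hypothetical and only illustrate the definition; they are not data or code from the paper.

```python
from statistics import pvariance


def degree_variances(nodes, edges):
    """Return (indegree variance, outdegree variance) of a directed graph.

    `edges` is a list of (regulator, target) pairs.
    """
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return pvariance(list(indeg.values())), pvariance(list(outdeg.values()))


if __name__ == "__main__":
    # Toy hub network: one regulator "A" drives four targets.
    nodes = ["A", "B", "C", "D", "E"]
    edges = [("A", target) for target in "BCDE"]
    in_var, out_var = degree_variances(nodes, edges)
    print(in_var, out_var)
```

In this toy hub network the indegree variance is small (about 0.16) and the outdegree variance is large (about 2.56), which is the asymmetric pattern the abstract describes.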
5.
Front Genet ; 9: 144, 2018.
Article in English | MEDLINE | ID: mdl-29922328

ABSTRACT

Integrative analysis using omics-based technologies results in the identification of a large number of putative short open reading frames (sORFs) with protein-coding capacity within transcripts previously identified as long noncoding RNAs (lncRNAs) or transcripts of unknown function (TUFs). sORFs were previously overlooked because of their diminutive size and the difficulty of identification by bioinformatics analyses. There is now growing evidence of the existence of potentially functional micropeptides produced from sORFs within cells of diverse species. Recent characterization of a few of these revealed their significant divergent roles in many fundamental biological processes, where some also show important relationships with pathogenesis. Recent works therefore provide new insights for exploring the wealth of information that may lie within sORF-encoded short proteins. Here, we summarize the current progress and view of micropeptides encoded in sORFs of protein-coding genes.

6.
Nucleic Acids Res ; 45(13): e124, 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28531296

ABSTRACT

In recent years, the dramatic increase in the number of applications for massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can be applied to decipher regulatory codes for diverse MPRAs does not yet exist. Here, we propose a new computational method to predict the transcriptional activity of MPRAs, as well as luciferase reporter assays, based on the TRANScription FACtor database. We employed regression trees and multivariate adaptive regression splines to obtain these predictions and considered a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite the use of different types of transfected cells, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated by independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggested that common active transcription factor binding sites across different cell types make greater contributions to transcriptional activity and that known promoter activity could predict the transcriptional activity of unknown promoters in some instances, regardless of cell type.


Subject(s)
DNA/genetics , DNA/metabolism , Genes, Reporter , Transcription, Genetic , Binding Sites/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , HEK293 Cells , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Promoter Regions, Genetic , Regression Analysis , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism
7.
PLoS One ; 12(4): e0176492, 2017.
Article in English | MEDLINE | ID: mdl-28430819

ABSTRACT

To estimate gene regulatory networks, it is important that we know the number of connections, or sparseness of the networks. It can be expected that the robustness to perturbations is one of the factors determining the sparseness. We reconstruct a semi-quantitative model of gene networks from gene expression data in embryonic development and detect the optimal sparseness against perturbations. The dense networks are robust to connection-removal perturbation, whereas the sparse networks are robust to misexpression perturbation. We show that there is an optimal sparseness that serves as a trade-off between these perturbations, in agreement with the optimal result of validation for testing data. These results suggest that the robustness to the two types of perturbations determines the sparseness of gene networks.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Computer Simulation
8.
Front Genet ; 8: 208, 2017.
Article in English | MEDLINE | ID: mdl-29632545

ABSTRACT

The MALAT1 long noncoding RNA is strongly linked to cancer progression. Here we report a MALAT1 function in repressing the promoter of the p53 (TP53) tumor suppressor gene. p21 and FAS, well-known p53 targets, were upregulated by MALAT1 knockdown in A549 human lung adenocarcinoma cells. We found that this upregulation was mediated by transcriptional activation of p53 through MALAT1 depletion. In addition, we identified a minimal MALAT1-responsive region in the P1 promoter of the p53 gene. Flow cytometry analysis revealed that MALAT1-depleted cells exhibited G1 cell cycle arrest. These results suggest that MALAT1 affects the expression of p53 target genes by repressing p53 promoter activity, thereby influencing cell cycle progression.

9.
BMC Genomics ; 16: 154, 2015 Mar 06.
Article in English | MEDLINE | ID: mdl-25879614

ABSTRACT

BACKGROUND: Histone epigenome data determined by chromatin immunoprecipitation sequencing (ChIP-seq) are used in identifying transcript regions and estimating expression levels. However, this estimation does not always correlate with eventual RNA expression levels measured by RNA sequencing (RNA-seq). Part of the inconsistency may arise from the variance in RNA stability, where the transcripts that are more or less abundant than the RNA expression predicted from histone epigenome data are inferred to be more or less stable. However, there is little systematic analysis to validate this assumption. Here, we used stability data of the whole transcriptome measured by 5'-bromouridine immunoprecipitation chase sequencing (BRIC-seq), which enabled us to determine the half-lives of whole transcripts including lincRNAs, and we integrated BRIC-seq with ChIP-seq to achieve better estimation of the eventual transcript levels and to understand the importance of the post-transcriptional regulation that determines the eventual transcript levels. RESULTS: We identified discrepancies between the RNA abundance estimated by ChIP-seq and the RNA expression measured by RNA-seq for a number of genes, and estimated that the expression level of 865 genes was controlled at the level of RNA stability in HeLa cells. ENCODE data analysis supported the idea that RNA stability control helps to determine transcript levels in multiple cell types. We identified UPF1, EXOSC5 and STAU1, well-studied RNA degradation factors, as controlling factors for 8% of cases. Computational simulations reasonably explained the changes in eventual mRNA levels attributable to the changes in mRNA half-lives. In addition, we propose a feedback circuit that includes the regulated degradation of mRNAs encoding transcription factors to maintain the steady-state level of RNA abundance. Intriguingly, these regulatory mechanisms were distinct between mRNAs and lincRNAs.
CONCLUSIONS: Integrative analysis of ChIP-seq, RNA-seq and our BRIC-seq showed that transcriptional regulation and RNA degradation are independently regulated. In addition, RNA stability is an important determinant of eventual transcript levels. RNA binding proteins, such as UPF1, STAU1 and EXOSC5, may play active roles in such controls.


Subject(s)
RNA Stability , RNA/metabolism , Antigens, Neoplasm/metabolism , Chromatin Immunoprecipitation , Cytoskeletal Proteins/metabolism , Exosome Multienzyme Ribonuclease Complex/metabolism , Gene Expression Regulation , Half-Life , HeLa Cells , High-Throughput Nucleotide Sequencing , Histones/metabolism , Humans , RNA/chemistry , RNA Helicases , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA , Trans-Activators/metabolism
10.
Mol Cell ; 53(3): 393-406, 2014 Feb 06.
Article in English | MEDLINE | ID: mdl-24507715

ABSTRACT

Although thousands of long noncoding RNAs (lncRNAs) are localized in the nucleus, only a few dozen have been functionally characterized. Here we show that nuclear enriched abundant transcript 1 (NEAT1), an essential lncRNA for the formation of nuclear body paraspeckles, is induced by influenza virus and herpes simplex virus infection as well as by Toll-like receptor 3-p38 pathway-triggered poly I:C stimulation, resulting in excess formation of paraspeckles. We found that NEAT1 facilitates the expression of antiviral genes including cytokines such as interleukin-8 (IL8). We found that splicing factor proline/glutamine-rich (SFPQ), a NEAT1-binding paraspeckle protein, is a repressor of IL8 transcription, and that NEAT1 induction relocates SFPQ from the IL8 promoter to the paraspeckles, leading to transcriptional activation of IL8. Together, our data show that NEAT1 plays an important role in the innate immune response through the transcriptional regulation of antiviral genes by the stimulus-responsive cooperative action of NEAT1 and SFPQ.


Subject(s)
Immunity, Innate/genetics , Interleukin-8/genetics , RNA, Long Noncoding/physiology , RNA-Binding Proteins/metabolism , Gene Expression Regulation , HeLa Cells , Herpesvirus 1, Human/immunology , Humans , Measles virus/immunology , Orthomyxoviridae/immunology , PTB-Associated Splicing Factor , Promoter Regions, Genetic , Protein Transport , RNA, Long Noncoding/genetics , Transcription, Genetic
11.
PLoS One ; 9(1): e86133, 2014.
Article in English | MEDLINE | ID: mdl-24475080

ABSTRACT

We propose a tetrahedral Gray code that facilitates visualization of genome information on the surfaces of a tetrahedron, where the relative abundance of each k-mer in the genomic sequence is represented by a color of the corresponding cell of a triangular lattice. For biological significance, the code is designed such that the k-mers corresponding to any adjacent pair of cells differ from each other by only one nucleotide. We present a simple procedure to draw such a pattern on the development surfaces of a tetrahedron. The thus constructed tetrahedral Gray code can demonstrate evolutionary conservation and variation of the genome information of many organisms at a glance. We also apply the tetrahedral Gray code to the honey bee (Apis mellifera) genome to analyze its methylation structure. The results indicate that the honey bee genome exhibits CpG overrepresentation in spite of its methylation ability and that two conserved motifs, CTCGAG and CGCGCG, in the unmethylated regions are responsible for the overrepresentation of CpG.


Subject(s)
Computational Biology/methods , Genome/genetics , Genomics/methods , Animals , Bees/genetics , Color , Computational Biology/instrumentation
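
The defining property in the abstract above is that neighbouring words differ by exactly one nucleotide. The sketch below shows that property in the simplest setting, a one-dimensional reflected Gray code over the alphabet A, C, G, T. It is an assumption-laden illustration and not the tetrahedral (triangular-lattice) construction of the paper. The function name `dna_gray_code` is hypothetical.

```python
def dna_gray_code(k):
    """List all 4**k DNA words of length k so that consecutive words
    differ at exactly one position (reflected base-4 Gray code)."""
    alphabet = "ACGT"
    words = [""]
    for _ in range(k):
        extended = []
        for i, letter in enumerate(alphabet):
            # Reverse every second block so the suffix is unchanged
            # across block boundaries; only the new letter changes.
            block = words if i % 2 == 0 else words[::-1]
            extended.extend(letter + w for w in block)
        words = extended
    return words


if __name__ == "__main__":
    print(dna_gray_code(2))
```

For k = 2 the ordering begins AA, AC, AG, AT, CT, CG, so each step changes a single nucleotide.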
13.
BMC Genomics ; 14 Suppl 2: S3, 2013.
Article in English | MEDLINE | ID: mdl-23445489

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are tiny endogenous RNAs that have been discovered in animals and plants, and direct the post-transcriptional regulation of target mRNAs for degradation or translational repression via binding to the 3'UTRs and the coding exons. To gain insight into the biological role of miRNAs, it is essential to identify the full repertoire of mRNA targets (target genes). A number of computer programs have been developed for miRNA-target prediction. These programs essentially focus on potential binding sites in 3'UTRs, which are recognized by miRNAs according to specific base-pairing rules. RESULTS: Here, we introduce a novel method for miRNA-target prediction that is entirely independent of existing approaches. The method is based on the hypothesis that transcription of a miRNA and its target genes tends to be co-regulated by common transcription factors. This hypothesis predicts the frequent occurrence of common cis-elements between promoters of a miRNA and its target genes. That is, our proposed method first identifies putative cis-elements in a promoter of a given miRNA, and then identifies genes that contain common putative cis-elements in their promoters. In this paper, we show that a significant number of common cis-elements occur in ~28% of experimentally supported human miRNA-target data. Moreover, we show that the prediction of human miRNA-targets based on our method is statistically significant. Further, we discuss the random incidence of common cis-elements, their consensus sequences, and the advantages and disadvantages of our method. CONCLUSIONS: This is the first report indicating the prevalence of transcriptional regulation of a miRNA and its target genes by common transcription factors and the predictive ability for miRNA-targets based on this property.


Subject(s)
Computational Biology/methods , Gene Expression Regulation , MicroRNAs/genetics , Promoter Regions, Genetic/genetics , Animals , Consensus Sequence , Databases, Nucleic Acid , Humans , MicroRNAs/classification , Transcription Factors/metabolism
14.
RNA Biol ; 9(11): 1370-9, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23064114

ABSTRACT

UPF1 eliminates aberrant mRNAs harboring premature termination codons, and regulates the steady-state levels of normal physiological mRNAs. Although genome-wide studies of UPF1 targets have been performed, previous studies did not distinguish indirect UPF1 targets because they could not determine UPF1-dependent changes in RNA stability. Here, we measured the decay rates of the whole transcriptome in UPF1-depleted HeLa cells using BRIC-seq, an inhibitor-free method for directly measuring RNA stability. We determined the half-lives and expression levels of 9,229 transcripts. A total of 785 transcripts were stabilized in UPF1-depleted cells. Among these, the expression levels of 76 transcripts were increased, but those of the other 709 transcripts were not altered. RNA immunoprecipitation showed that UPF1 bound to the stabilized transcripts, suggesting that UPF1 directly degrades the 709 transcripts. Many UPF1 targets in this study were newly identified. This study clearly demonstrates that direct determination of RNA stability is a powerful approach for identifying targets of RNA degradation factors.


Subject(s)
Codon, Nonsense , RNA Stability , RNA, Messenger/genetics , Trans-Activators/genetics , Trans-Activators/metabolism , Transcriptome , Cell Line, Tumor , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , RNA Helicases , RNA Interference , RNA, Small Interfering , Sequence Analysis, RNA
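
The abstract above reports transcript half-lives derived from measured decay. As a minimal sketch of how a half-life can be estimated from a chase time course, assume first-order (exponential) decay, so that ln(abundance) falls linearly with time at rate k and the half-life is ln(2)/k. The function name `half_life` and the toy data are hypothetical. The abstract does not describe the paper's actual fitting procedure.

```python
import math


def half_life(times, abundances):
    """Estimate a half-life by least-squares fit of ln(abundance) vs time.

    Assumes first-order decay: abundance(t) = A0 * exp(-k * t).
    Returns ln(2) / k in the same units as `times`.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    slope = (
        sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
        / sum((t - mean_t) ** 2 for t in times)
    )
    return math.log(2) / -slope


if __name__ == "__main__":
    # Toy time course (hours) for a transcript that halves every 4 h.
    times = [0, 2, 4, 8]
    abundances = [100 * 0.5 ** (t / 4) for t in times]
    print(half_life(times, abundances))
```

Under this model, a transcript stabilized by depletion of a decay factor would show a shallower slope and therefore a longer fitted half-life.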
15.
Bioinformatics ; 28(1): 25-31, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22057160

ABSTRACT

MOTIVATION: Finding motifs in genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect over-represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. RESULTS: We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences, each ~1 kbp long, in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. AVAILABILITY: The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ CONTACT: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp


Subject(s)
Algorithms , Genome , Nucleotide Motifs , Sequence Analysis, DNA/methods , Cluster Analysis , Humans , Promoter Regions, Genetic , RNA, Untranslated/genetics
16.
Nucleic Acids Res ; 39(11): e75, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21486745

ABSTRACT

We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to distinguish active promoters in a given cell from the other, silent promoters. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.


Subject(s)
Promoter Regions, Genetic , Software , Transcriptional Activation , Base Sequence , Binding Sites , DNA/chemistry , Genes, Reporter , Genome, Human , HEK293 Cells , Humans , Linear Models , Luciferases/analysis , Luciferases/genetics , Transcription Factors/metabolism , Transcription Initiation Site
17.
Genome Inform ; 25(1): 53-60, 2011.
Article in English | MEDLINE | ID: mdl-22230939

ABSTRACT

We developed linear regression models which predict strength of transcriptional activity of promoters from their sequences. Intrinsic transcriptional strength data of 451 human promoter sequences in three cell lines (HEK293, MCF7 and 3T3), which were measured by systematic luciferase reporter gene assays, were used to build the models. The models sum up contributions of CG dinucleotide content and transcription factor binding sites (TFBSs) to transcriptional strength. We evaluated prediction accuracies of the models by cross validation tests and found that they have adequate ability for predicting transcriptional strength of promoters in spite of their simple formalization. We also evaluated statistical significance of the contributions and proposed a picture of regulatory code hidden in promoter sequences. That is, CG dinucleotide content and TFBSs mainly determine strength of transcriptional activity under ubiquitous and specific environments, respectively.


Subject(s)
Models, Genetic , Promoter Regions, Genetic , Transcription, Genetic , 3T3 Cells , Animals , Base Composition , Binding Sites , HEK293 Cells , Humans , Linear Models , MCF-7 Cells , Mice , Transcription Factors/metabolism
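
The abstract above describes linear models that sum the contributions of CG dinucleotide content and transcription factor binding sites (TFBSs). The following sketch shows the general shape of such an additive model. All coefficients, the factor name used as a dictionary key, and the function names `cg_content` and `predict_strength` are hypothetical placeholders for illustration. None of them are values or identifiers from the paper.

```python
def cg_content(seq):
    """Fraction of adjacent nucleotide pairs in `seq` that are 'CG'."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    pairs = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return pairs / (len(seq) - 1)


def predict_strength(seq, tfbs_counts, intercept, cg_weight, tfbs_weights):
    """Additive linear model: intercept + CG term + sum of TFBS terms.

    `tfbs_counts` maps a factor name to its number of predicted sites;
    `tfbs_weights` maps a factor name to a (hypothetical) fitted coefficient.
    """
    score = intercept + cg_weight * cg_content(seq)
    for name, count in tfbs_counts.items():
        score += tfbs_weights.get(name, 0.0) * count
    return score


if __name__ == "__main__":
    # Made-up promoter fragment and made-up coefficients.
    print(predict_strength("ACGCGT", {"SP1": 2}, 1.0, 5.0, {"SP1": 0.5}))
```

In practice the coefficients would be fitted to measured reporter activities, for example by least squares with cross-validation as the abstract mentions. The numbers here only demonstrate how the terms combine.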
18.
J Reprod Dev ; 52(2): 315-20, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16462094

ABSTRACT

Although whey acidic protein (WAP) has been identified in the milk of a range of species, it has been predicted that WAP is not secreted into human milk as a result of critical point mutations within the coding region. In the present study, we first computationally investigated the promoter region of the mutated human WAP gene by comparing it with those of other known WAP genes. Computational database analyses showed that the human WAP promoter region was highly conserved, as in other species with milk WAP. Next, we evaluated the activity of the human WAP promoter (2.6 kb) using a reporter gene assay. MCF-7 cells were stably transfected with the hWAP/hGH (human growth hormone) fusion gene, cultured on Matrigel, and treated with lactogenic hormones. Radioimmunoassay detected hGH in the culture medium, indicating that the human WAP promoter was responsive to the lactogenic hormones. The human WAP promoter was significantly more active in MCF-7 cells than the mouse WAP promoter (2.4 kb). The present results provide us with important information on the molecular evolution of milk protein genes.


Subject(s)
Promoter Regions, Genetic , Cell Line, Tumor , Collagen/metabolism , Collagen/pharmacology , Computational Biology , Conserved Sequence , Databases, Genetic , Drug Combinations , Evolution, Molecular , Humans , Laminin/metabolism , Laminin/pharmacology , Milk Proteins/genetics , Milk, Human , Models, Genetic , Models, Statistical , Point Mutation , Proteoglycans/metabolism , Proteoglycans/pharmacology , Radioimmunoassay , Sequence Analysis, DNA , Transfection
20.
Tanpakushitsu Kakusan Koso ; 47(3): 276-80, 2002 Mar.
Article in Japanese | MEDLINE | ID: mdl-11889803