cirCodAn: A GHMM-based tool for accurate prediction of coding regions in circRNA.

Barbosa, Denilson Fagundes; Oliveira, Liliane Santana; Nachtigall, Pedro Gabriel; Valentini Junior, Rodolpho; de Souza, Nayane; Paschoal, Alexandre Rossi; Kashiwabara, André Yoshiaki.

Adv Protein Chem Struct Biol ; 139: 289-334, 2024.

Article in English | MEDLINE | ID: mdl-38448139

ABSTRACT

Studies characterizing circRNAs with the potential to be translated into peptides are advancing quickly and are helping to elucidate the roles played by circRNAs in several biological processes, especially in the emergence and development of diseases. While various tools are available for predicting coding regions within linear sequences, none have demonstrated accurate open reading frame detection in circular sequences, such as circRNAs. Here, we present cirCodAn, a novel tool designed to predict coding regions in circRNAs. We evaluated the performance of cirCodAn using datasets of circRNAs with strong translation evidence and showed that cirCodAn outperformed the other tools available to perform a similar task. Our findings demonstrate the applicability of cirCodAn to identify coding regions in circRNAs, which reveals its potential for use in future research focusing on elucidating the biological roles of circRNAs and their encoded proteins. cirCodAn is freely available at https://github.com/denilsonfbar/cirCodAn.

Subject(s)

RNA, Circular , Open Reading Frames/genetics
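
To illustrate why open reading frame detection differs between linear and circular sequences, the following is a minimal sketch of a naive ORF scan that reads across the back-splice junction. It is not cirCodAn's GHMM-based method; the function name `longest_circular_orf`, the ATG-to-stop ORF definition, and the one-lap limit are assumptions made for illustration only.

```python
# Naive ORF scan on a circular sequence (illustrative only, NOT cirCodAn's GHMM).
# Assumptions: an ORF runs from ATG to the first in-frame stop codon, and the
# reading frame is followed for at most one lap around the circle.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_circular_orf(seq):
    """Return (start, orf) for the longest ORF; start is 0-based in seq.

    Returns (-1, "") when no ORF is found. The ORF includes its stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    n = len(seq)
    doubled = seq + seq  # lets a read continue past the junction
    best_start, best_orf = -1, ""
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, never reading more than n bases in total.
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                orf = doubled[start:pos + 3]
                if len(orf) > len(best_orf):
                    best_start, best_orf = start, orf
                break
    return best_start, best_orf


if __name__ == "__main__":
    # The start codon sits near the 3' end; the stop codon lies after the
    # junction, so a purely linear scan would miss this ORF.
    print(longest_circular_orf("GCTAACCCATGA"))
```

In this toy sequence the start codon is at index 8 and the in-frame stop codon is only reached after wrapping around to the beginning, which is the case a linear ORF finder cannot handle.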

DNA methylation landscape of hepatoblastomas reveals arrest at early stages of liver differentiation and cancer-related alterations.

Maschietto, Mariana; Rodrigues, Tatiane Cristina; Kashiwabara, André Yoshiaki; de Araujo, Érica Sara Souza; Marques Aguiar, Talita Ferreira; da Costa, Cecilia Maria Lima; da Cunha, Isabela Werneck; Dos Reis Vasques, Luciana; Cypriano, Monica; Brentani, Helena; de Toledo, Silvia Regina Caminada; Pearson, Peter Lees; Carraro, Dirce Maria; Rosenberg, Carla; Krepischi, Ana C V.

Oncotarget ; 8(58): 97871-97889, 2017 Nov 17.

Article in English | MEDLINE | ID: mdl-29228658

ABSTRACT

Hepatoblastomas are uncommon embryonal liver tumors accounting for approximately 80% of childhood hepatic cancer. We hypothesized that epigenetic changes, including DNA methylation, could be relevant to hepatoblastoma onset. The methylomes of eight matched hepatoblastomas and non-tumoral liver tissues were characterized, and data were validated in an independent group (11 hepatoblastomas). In comparison to differentiated livers, hepatoblastomas exhibited a widespread and non-stochastic pattern of global low-level hypomethylation. The analysis revealed 1,359 differentially methylated CpG sites (DMSs) between hepatoblastomas and control livers, which are associated with 765 genes. Hypomethylation was detected in hepatoblastomas for ~58% of the DMSs with enrichment at intergenic sites, and most of the hypermethylated CpGs were located in CpG islands. Functional analyses revealed enrichment in signaling pathways involved in metabolism, negative regulation of cell differentiation, liver development, cancer, and the Wnt signaling pathway. Strikingly, an important overlap was observed between the 1,359 DMSs and the CpG sites reported to exhibit methylation changes through liver development (p<0.0001), with similar patterns of methylation in both hepatoblastomas and fetal livers compared to adult livers. Overall, our results suggest an arrest at early stages of liver cell differentiation, in line with the hypothesis that hepatoblastoma ontogeny involves the disruption of liver development. This genome-wide methylation dysfunction, taken together with a relatively small number of driver genetic mutations reported for both adult and pediatric liver cancers, sheds light on the relevance of epigenetic mechanisms for hepatic tumorigenesis.

ToPS: a framework to manipulate probabilistic models of sequence data.

Kashiwabara, André Yoshiaki; Bonadio, Igor; Onuchic, Vitor; Amado, Felipe; Mathias, Rafael; Durham, Alan Mitchell.

PLoS Comput Biol ; 9(10): e1003234, 2013.

Article in English | MEDLINE | ID: mdl-24098098

ABSTRACT

Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model; and (viii) similarity-based sequence weighting. The framework includes functionality for training, simulation, and decoding of the models. Additionally, it provides two methods to help with parameter setting: the Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model probabilistic architectures using GHMMs. In particular, the framework provides a novel, flexible implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.

Subject(s)

Computational Biology/methods , Markov Chains , Sequence Analysis/methods , Bayes Theorem , CpG Islands/genetics
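
As a rough illustration of how AIC and BIC can guide parameter setting, the sketch below selects the order of a fixed-order Markov chain over DNA. It does not use the ToPS API; all function names (`markov_log_likelihood`, `select_order`, and so on) are hypothetical, and the model is fit and scored on the same sequence purely to keep the example short. The criteria used are the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L.

```python
# Order selection for a fixed-order Markov chain using AIC and BIC.
# Illustrative sketch only; this is NOT the ToPS interface.
import math
from collections import Counter


def markov_log_likelihood(seq, order, first):
    """Maximum-likelihood log-likelihood of seq[first:] under an order-k chain.

    `first` must be >= order so every scored position has a full context;
    using the same `first` for all orders keeps the scores comparable.
    """
    ctx_counts, trans_counts = Counter(), Counter()
    for i in range(first, len(seq)):
        ctx = seq[i - order:i]
        ctx_counts[ctx] += 1
        trans_counts[(ctx, seq[i])] += 1
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in trans_counts.items())


def n_params(order, alphabet_size=4):
    """Free parameters: one distribution per context, each with |A| - 1 of them."""
    return alphabet_size ** order * (alphabet_size - 1)


def aic(log_l, k):
    return 2 * k - 2 * log_l


def bic(log_l, k, n):
    return k * math.log(n) - 2 * log_l


def select_order(seq, max_order):
    """Return (best order by AIC, best order by BIC) among 0..max_order."""
    n = len(seq) - max_order  # number of scored positions
    scores = {}
    for order in range(max_order + 1):
        ll = markov_log_likelihood(seq, order, first=max_order)
        k = n_params(order)
        scores[order] = (aic(ll, k), bic(ll, k, n))
    best_aic = min(scores, key=lambda o: scores[o][0])
    best_bic = min(scores, key=lambda o: scores[o][1])
    return best_aic, best_bic


if __name__ == "__main__":
    # A strictly periodic sequence is fully explained by an order-1 chain.
    print(select_order("ACGT" * 50, max_order=2))
```

Both criteria penalize the extra parameters of higher orders, so once an order explains the data no further increase is rewarded; BIC applies the heavier penalty as the number of scored positions grows.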

Decreasing the number of false positives in sequence classification.

Machado-Lima, Ariane; Kashiwabara, André Yoshiaki; Durham, Alan Mitchell.

BMC Genomics ; 11 Suppl 5: S10, 2010 Dec 22.

Article in English | MEDLINE | ID: mdl-21210966

ABSTRACT

BACKGROUND: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can serve as this alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions that include: the uniform distribution, genomic distribution, family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of a null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue for reducing the costs of biological validation. RESULTS: For all the tests, the target null model presented the lowest number of false positives when using random sequences as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. CONCLUSIONS: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.

Subject(s)

Base Sequence/genetics , Classification/methods , Models, Statistical , Sequence Analysis, DNA/methods , Base Composition , Bayes Theorem , Likelihood Functions , Plasmodium falciparum/genetics , ROC Curve , Research Design , Sensitivity and Specificity
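
The effect of the null model on a log-odds score can be sketched with a toy example. The code below is an assumption-laden illustration, not the study's protocol: the "family model" is a simple position-independent GC-rich distribution standing in for a real HMM, and the function names and the pseudocount are hypothetical choices. The score is log P(x | model) - log P(x | null), with a positive value read as a positive classification.

```python
# Log-odds scoring against two position-independent null models.
# Illustrative sketch only; the i.i.d. "family model" stands in for a real HMM.
import math
from collections import Counter

BASES = "ACGT"


def iid_log_prob(seq, dist):
    """Log-probability of seq under a position-independent distribution."""
    return sum(math.log(dist[b]) for b in seq)


def uniform_null():
    """Uniform null model: every base has probability 1/4."""
    return {b: 0.25 for b in BASES}


def target_null(seq, pseudocount=1.0):
    """Target null model: base composition of the scored sequence itself."""
    counts = Counter(seq)
    total = len(seq) + len(BASES) * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_odds(seq, model_dist, null_dist):
    """Positive score means the family model explains seq better than the null."""
    return iid_log_prob(seq, model_dist) - iid_log_prob(seq, null_dist)


if __name__ == "__main__":
    family = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}  # hypothetical GC-rich family
    candidate = "GC" * 20  # compositionally biased sequence, not a true member
    print("uniform null:", log_odds(candidate, family, uniform_null()))
    print("target null: ", log_odds(candidate, family, target_null(candidate)))
```

With the uniform null, the biased candidate scores above zero on composition alone; with the target null, the same composition is absorbed by the null model and the score drops below zero, which mirrors the GC bias described in the conclusions.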
