Search | VHL Regional Portal

Secondary structure prediction of long noncoding RNA: review and experimental comparison of existing approaches.

Bugnon, L A; Edera, A A; Prochetto, S; Gerard, M; Raad, J; Fenoy, E; Rubiolo, M; Chorostecki, U; Gabaldón, T; Ariel, F; Di Persia, L E; Milone, D H; Stegmayer, G.

Brief Bioinform ; 23(4), 2022 07 18.

Article in English | MEDLINE | ID: mdl-35692094

ABSTRACT

MOTIVATION: In contrast to messenger RNAs, the function of the wide range of existing long noncoding RNAs (lncRNAs) largely depends on their structure, which determines interactions with partner molecules. Thus, the determination or prediction of the secondary structure of lncRNAs is critical to uncover their function. Classical approaches for predicting RNA secondary structure have been based on dynamic programming and thermodynamic calculations. In the last 4 years, a growing number of machine learning (ML)-based models, including deep learning (DL), have achieved breakthrough performance in structure prediction of biomolecules such as proteins and have outperformed classical methods in folding short transcripts. Nevertheless, accurate prediction for lncRNAs remains far from solved. Notably, the myriad of new proposals has not been systematically and experimentally evaluated. RESULTS: In this work, we compare the performance of the classical methods as well as the most recently proposed approaches for secondary structure prediction of RNA sequences using a unified and consistent experimental setup. We use the publicly available structural profiles for 3023 yeast RNA sequences, and a novel benchmark of well-characterized lncRNA structures from different species. Moreover, we propose a novel metric to assess the predictive performance of methods, exclusively based on the chemical probing data commonly used for profiling RNA structures, avoiding any potential bias introduced by computational predictions when using dot-bracket references. Our results provide a comprehensive comparative assessment of existing methodologies, and a novel and public benchmark resource to aid in the development and comparison of future approaches. AVAILABILITY: Full source code and benchmark datasets are available at: https://github.com/sinc-lab/lncRNA-folding. CONTACT: lbugnon@sinc.unl.edu.ar.

Subject(s)

RNA, Long Noncoding; Computational Biology/methods; Protein Structure, Secondary; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; RNA, Messenger; Software
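
The abstract above notes that classical RNA secondary structure predictors rely on dynamic programming and that reference structures are often written in dot-bracket notation. As an illustration only, the following is a minimal sketch of Nussinov-style base-pair maximization, a simplified textbook stand-in rather than any of the thermodynamic or ML methods compared in the paper. The function name `nussinov`, the `min_loop` parameter, and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions made for this example.

```python
def nussinov(seq, min_loop=3):
    """Return a dot-bracket string that maximizes the number of nested base pairs.

    Simplified illustration: it counts pairs instead of minimizing free energy.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCC"))  # prints "(((...)))"
```

The cubic running time of this kind of recursion is one reason long transcripts such as lncRNAs are harder to handle than short ones.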

DL4papers: a deep learning approach for the automatic interpretation of scientific articles.

Bugnon, L A; Yones, C; Raad, J; Gerard, M; Rubiolo, M; Merino, G; Pividori, M; Di Persia, L; Milone, D H; Stegmayer, G.

Bioinformatics ; 36(11): 3499-3506, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32091584

ABSTRACT

MOTIVATION: In precision medicine, next-generation sequencing and novel preclinical reports have led to an increasingly large number of results published in the scientific literature. However, identifying novel treatments or predicting a drug response in, for example, cancer patients, from the huge number of available papers remains laborious and challenging work. This task can be considered a text mining problem that requires reading many academic documents to identify a small set of papers describing specific relations between key terms. Because manual curation of these relations is infeasible, computational methods that can automatically identify them from the available literature are urgently needed. RESULTS: We present DL4papers, a new method based on deep learning that is capable of analyzing and interpreting papers in order to automatically extract relevant relations between specific keywords. DL4papers receives as input a query with the desired keywords, and it returns a ranked list of papers that contain meaningful associations between the keywords. The comparison against related methods showed that our proposal outperformed them on a cancer corpus. The reliability of the DL4papers output list was also measured, revealing that 100% of the first two documents retrieved for a particular search have relevant relations, on average. This shows that our model can guarantee that the relation can be effectively found in the top-2 papers of the ranked list. Furthermore, the model is capable of highlighting, within each document, the specific fragments that contain the associations between the input keywords. This lets readers focus on the highlighted text instead of reading the full paper. We believe that our proposal could be used as an accurate tool for rapidly identifying relationships between genes and their mutations, drug responses and treatments in the context of a certain disease. This new approach can be a valuable resource for the advancement of the precision medicine field. AVAILABILITY AND IMPLEMENTATION: A web-demo is available at: http://sinc.unl.edu.ar/web-demo/dl4papers/. Full source code and data are available at: https://sourceforge.net/projects/sourcesinc/files/dl4papers/. CONTACT: lbugnon@sinc.unl.edu.ar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Deep Learning; Software; Data Mining; Humans; Precision Medicine; Reproducibility of Results

Extreme learning machines for reverse engineering of gene regulatory networks from expression time series.

Rubiolo, M; Milone, D H; Stegmayer, G.

Bioinformatics ; 34(7): 1253-1260, 2018 04 01.

Article in English | MEDLINE | ID: mdl-29182723

ABSTRACT

Motivation: The reconstruction of gene regulatory networks (GRNs) from gene profiles is of growing interest in bioinformatics for understanding the complex regulatory mechanisms in cellular systems. GRNs explicitly represent the cause-effect relationships of regulation among a group of genes, and their reconstruction remains a challenging computational problem. Several methods have been proposed, but most of them require different input sources to provide an acceptable prediction. Thus, it is a great challenge to reconstruct a GRN only from temporal gene expression data. Results: Extreme Learning Machine (ELM) is a new supervised neural model that has gained interest in recent years because of its higher learning rate and better performance than existing supervised models in terms of predictive power. This work proposes a novel approach for GRN reconstruction in which ELMs are used for modeling the relationships between gene expression time series. Artificial datasets generated with the well-known benchmark tool used in DREAM competitions were used. Real datasets were used for validation of this novel proposal with well-known GRNs underlying the time series. The impact of increasing the size of GRNs was analyzed in detail for the compared methods. The results obtained confirm the superiority of the ELM approach over very recent state-of-the-art methods in the same experimental conditions. Availability and implementation: The web demo can be found at http://sinc.unl.edu.ar/web-demo/elm-grnnminer/. The source code is available at https://sourceforge.net/projects/sourcesinc/files/elm-grnnminer. Contact: mrubiolo@santafe-conicet.gov.ar. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods; Gene Expression Regulation; Gene Regulatory Networks; Software; Supervised Machine Learning; Escherichia coli/genetics; Models, Genetic; Saccharomyces cerevisiae/genetics
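
The abstract above describes using ELMs to model relationships between gene expression time series. As a rough illustration of the general ELM idea (a hidden layer with fixed random weights, and output weights obtained by least squares), the sketch below fits one target gene at time t+1 from all genes at time t. It is not the authors' implementation: the function names `fit_elm` and `predict_elm`, the tanh activation, the hidden-layer size, and the synthetic three-gene series are all assumptions made for this example, and the abstract does not say how regulatory links are scored from the fitted models.

```python
import numpy as np


def fit_elm(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM regressor: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # input weights, never trained
    b = rng.normal(size=n_hidden)                # hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # only the output weights are solved for
    return W, b, beta


def predict_elm(X, model):
    """Apply a fitted ELM to new inputs."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Synthetic toy data (hypothetical, not from the paper): 3 "genes" over 40 time points.
t = np.linspace(0, 4 * np.pi, 40)
expr = np.column_stack([np.sin(t), np.cos(t), np.sin(t + 1.0)])

# Predict gene 0 at time t+1 from the expression of all genes at time t.
X, y = expr[:-1], expr[1:, 0]
model = fit_elm(X, y)
pred = predict_elm(X, model)
print("training MSE:", float(np.mean((pred - y) ** 2)))
```

Because only the output weights are solved for, training reduces to a single linear least-squares problem, which is what makes ELMs fast to fit.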
