1.
Curr Gene Ther ; 22(3): 228-244, 2022.
Article in English | MEDLINE | ID: mdl-34254917

ABSTRACT

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability. They are more than 200 nucleotides long. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further divided into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focused on supervised learning methods. We summarized the state-of-the-art methods for predicting lncRNA-protein interactions. Furthermore, the performance and characteristics of the different methods were compared. Considering the limits of the existing models, we analyzed the open problems and discussed directions for future research.


Subjects
RNA, Long Noncoding; Computational Biology/methods; Gene Expression Regulation; Machine Learning; Proteins/genetics; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism
2.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33822882

ABSTRACT

Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are costly and time-consuming, and many computational approaches have been developed as alternatives. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performance of existing machine-learning-based methods. The graph neural network (GNN) is a recently developed deep learning algorithm for link prediction on complex networks that had not previously been applied to predicting NPIs. We constructed a GNN-based method, called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. The NPI-GNN method achieved performance comparable to state-of-the-art methods in a 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information does not affect the NPI-GNN prediction performance much, which makes NPI-GNN more robust than other methods. As far as we can tell, NPI-GNN is the first end-to-end GNN predictor for NPIs. All benchmarking datasets in this work and all source code of the NPI-GNN method have been deposited, with documentation, in a GitHub repository (https://github.com/AshuiRUA/NPI-GNN).
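The 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not taken from the NPI-GNN repository: the function name k_fold_splits, the seed, and the toy pair names are assumptions made for illustration, and steps such as negative sampling or graph construction are not shown.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper: indices are shuffled once, dealt into k folds,
    and each fold serves as the held-out test set exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for fold_id, fold in enumerate(folds) if fold_id != held_out for j in fold]
        yield train, test


# Toy interaction pairs (made-up names, not real data).
pairs = [("rna%d" % i, "protein%d" % i) for i in range(10)]
for train_idx, test_idx in k_fold_splits(len(pairs), k=5):
    held_out_pairs = [pairs[i] for i in test_idx]
    # A model would be fitted on the training pairs and scored on held_out_pairs.
```

Each known interaction is held out exactly once, so every pair contributes to the reported performance without being seen during the corresponding training run.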


Subjects
Deep Learning; Proteins/metabolism; RNA, Untranslated/metabolism; Software; Benchmarking; Datasets as Topic; Humans; Internet; Protein Binding; Proteins/genetics; RNA, Untranslated/genetics; Sensitivity and Specificity
3.
Brief Bioinform ; 22(4), 2021 07 20.
Article in English | MEDLINE | ID: mdl-33147622

ABSTRACT

With the development of high-throughput sequencing technology, the volume of genomic sequence data has increased exponentially over the last decade. To decode these new genomic data, machine learning methods were introduced for genome annotation and analysis. Because most machine learning methods require it, biological sequences must be represented as fixed-length numerical vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of these physicochemical properties are scattered across different resources. To facilitate studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties, including one for mononucleotides (DNA), 169 for dinucleotides (147 for DNA and 22 for RNA) and 12 for trinucleotides (DNA). The KNIndex database also provides a user-friendly web-based interface for browsing, querying, visualizing and downloading the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users can display DNA/RNA sequences as curves of multiple physicochemical properties. We hope that KNIndex will facilitate related studies in computational biology.
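The idea of turning a sequence into a property curve and then into a fixed-length vector can be sketched as follows. This is only an illustration under stated assumptions: the property table is filled with placeholder numbers, not values from KNIndex or any published scale, and the names sequence_to_curve and encode are hypothetical.

```python
from typing import Dict, List

# Placeholder dinucleotide property table (hypothetical values for
# illustration only; NOT taken from KNIndex).
TOY_PROPERTY: Dict[str, float] = {
    a + b: (4 * i + j) / 15.0
    for i, a in enumerate("ACGT")
    for j, b in enumerate("ACGT")
}


def sequence_to_curve(seq: str, table: Dict[str, float], k: int = 2) -> List[float]:
    """Slide a window of width k along seq and look up each k-tuple's value.

    Raises KeyError for k-tuples missing from the table (e.g. containing 'N').
    """
    seq = seq.upper()
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def encode(seq: str, tables: Dict[str, Dict[str, float]], k: int = 2) -> List[float]:
    """Return one mean value per property, so every sequence maps to a
    vector whose length equals the number of properties."""
    vector = []
    for name in sorted(tables):
        curve = sequence_to_curve(seq, tables[name], k)
        vector.append(sum(curve) / len(curve) if curve else 0.0)
    return vector


curve = sequence_to_curve("ACGT", TOY_PROPERTY)   # one value per dinucleotide
vector = encode("ACGTTGCA", {"toy": TOY_PROPERTY})  # length 1: one property
```

The curve keeps positional information and has a length that depends on the sequence, whereas the averaged vector has the same length for every sequence, which is the form most machine learning methods expect.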


Subjects
DNA/genetics; Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing; Nucleotides/genetics; RNA/genetics; Software; Genomics
4.
Front Pharmacol ; 12: 784171, 2021.
Article in English | MEDLINE | ID: mdl-35095495

ABSTRACT

Drug repositioning provides a promising and efficient strategy to discover potential associations between drugs and diseases. Many systematic computational drug-repositioning methods have been introduced, which are based on various similarities of drugs and diseases. In this work, we proposed a new computational model, DDA-SKF (drug-disease associations prediction using similarity kernels fusion), which can predict novel drug indications by utilizing similarity kernel fusion (SKF) and Laplacian regularized least squares (LapRLS) algorithms. DDA-SKF integrates multiple similarities of drugs and diseases. The prediction performance of DDA-SKF is better than, or at least comparable to, that of state-of-the-art methods. DDA-SKF can work without sufficient similarity information between drug indications. This allows us to predict new uses for orphan drugs. The source code and benchmarking datasets are deposited in a GitHub repository (https://github.com/GCQ2119216031/DDA-SKF).

5.
Front Genet ; 11: 615144, 2020.
Article in English | MEDLINE | ID: mdl-33362868

ABSTRACT

Long non-coding RNAs (lncRNAs) play an important role in several biological activities, including transcription, splicing, translation, and other cellular regulation processes. lncRNAs perform their biological functions by interacting with various proteins. Studies on lncRNA-protein interactions are of great value for understanding lncRNA functional mechanisms. In this paper, we proposed a novel model to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. We named this method LPI-SKF. Various similarities of both lncRNAs and proteins were integrated into LPI-SKF. LPI-SKF can be applied to predicting potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under the receiver operating characteristic curve) of 0.909 in a 5-fold cross-validation, which outperforms other state-of-the-art methods. A total of 19 of the top 20 ranked interaction predictions were verified by existing data, which implies that LPI-SKF has great potential for accurately discovering unknown lncRNA-protein interactions. All data and code of this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
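For readers unfamiliar with the AUROC figure quoted above, a minimal sketch of how such a value can be computed is given here. It uses the standard pairwise interpretation (the fraction of positive-negative pairs in which the positive receives the higher score, with ties counted as one half). The function name auroc and the toy labels and scores are assumptions for illustration and are not taken from the LPI-SKF code or data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a known interaction, 0 otherwise.
    scores: predicted interaction scores, higher meaning more likely.
    Quadratic in the number of samples, so suitable only for small examples.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up numbers): 3 of the 4 positive-negative pairs are
# ranked correctly.
print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of interacting above non-interacting pairs.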

6.
Bioinformatics ; 36(4): 1277-1278, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31504195

ABSTRACT

SUMMARY: Many efforts have been made to develop bioinformatics algorithms that predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, a software tool that allows users to intuitively visualize and analyze statistical features of all types of biological sequences, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignment and statistical feature generation functions. AVAILABILITY AND IMPLEMENTATION: VisFeature is a desktop application implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins; Software; Algorithms; Sequence Alignment; Sequence Analysis, DNA
7.
J Theor Biol ; 473: 38-43, 2019 07 21.
Article in English | MEDLINE | ID: mdl-31051179

ABSTRACT

The Golgi apparatus is an important subcellular organelle that participates in the secretory pathway. The role of the Golgi apparatus in cellular processes is related to Golgi-resident proteins. Knowing the sub-Golgi locations of Golgi-resident proteins is helpful in understanding their molecular functions. In this work, we proposed a computational method to predict the sub-Golgi locations of Golgi-resident proteins. We take three sub-Golgi locations into consideration: the cis-Golgi network (CGN), the Golgi stack and the trans-Golgi network (TGN). By combining Pseudo-Amino Acid Compositions (Type-II PseAAC) and the Functional Domain Enrichment Score (FunDES), our method not only achieved better performance than existing methods, but is also capable of recognizing proteins located in the Golgi stack, a location that is not considered in other state-of-the-art works.


Subjects
Amino Acids/metabolism; Golgi Apparatus/metabolism; Proteins/chemistry; Proteins/metabolism; Algorithms; Calibration; Databases, Protein; Protein Domains
8.
Front Genet ; 10: 1341, 2019.
Article in English | MEDLINE | ID: mdl-32038709

ABSTRACT

Long non-coding RNAs (lncRNAs) play important roles in various biological processes, in which lncRNA-protein interactions are usually involved. Therefore, identifying lncRNA-protein interactions is of great significance for understanding the molecular functions of lncRNAs. Since experiments to identify lncRNA-protein interactions are costly and time-consuming, computational methods have been developed as alternative approaches. However, existing lncRNA-protein interaction predictors usually require prior knowledge of lncRNA-protein interactions with experimental evidence. Their performance is limited by the number of known lncRNA-protein interactions. In this paper, we explored a novel way to predict lncRNA-protein interactions without direct prior knowledge. MiRNAs were used as mediators to estimate potential interactions between lncRNAs and proteins. When validated against known lncRNA-protein interactions, our method achieved an AUROC (area under the receiver operating characteristic curve) of 0.821, which is comparable to state-of-the-art methods. Moreover, our method achieved an improved AUROC of 0.852 by further expanding the training dataset. We believe that our method can be a useful supplement to the existing methods, as it provides an alternative way to estimate lncRNA-protein interactions in a heterogeneous network without direct prior knowledge. All data and code of this work can be downloaded from GitHub (https://github.com/zyk2118216069/LncRNA-protein-interactions-prediction).
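The general idea of using miRNAs as mediators can be illustrated with a deliberately simple sketch. It is not the authors' method: it merely scores each lncRNA-protein pair by the Jaccard overlap of their miRNA partners, and all identifiers (mediated_scores, lncA, P1, miR-1 and so on) are hypothetical.

```python
from typing import Dict, Set, Tuple


def mediated_scores(
    lncrna_to_mirnas: Dict[str, Set[str]],
    protein_to_mirnas: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Score every lncRNA-protein pair by the Jaccard overlap of the miRNAs
    each is associated with. No known lncRNA-protein interaction is used.
    """
    scores = {}
    for lncrna, lnc_partners in lncrna_to_mirnas.items():
        for protein, prot_partners in protein_to_mirnas.items():
            union = lnc_partners | prot_partners
            shared = lnc_partners & prot_partners
            scores[(lncrna, protein)] = len(shared) / len(union) if union else 0.0
    return scores


# Toy associations (made-up identifiers, not real data).
lncrna_to_mirnas = {"lncA": {"miR-1", "miR-2"}, "lncB": {"miR-3"}}
protein_to_mirnas = {"P1": {"miR-2", "miR-3"}, "P2": {"miR-4"}}

ranked = sorted(mediated_scores(lncrna_to_mirnas, protein_to_mirnas).items(),
                key=lambda item: item[1], reverse=True)
# The highest-scoring pairs are the candidates to check against known interactions.
```

Because the scores are derived only from lncRNA-miRNA and miRNA-protein associations, known lncRNA-protein interactions remain available purely for validation.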
