1.
IEEE/ACM Trans Comput Biol Bioinform ; 16(6): 2066-2077, 2019.
Article in English | MEDLINE | ID: mdl-29994224

ABSTRACT

Over the past few years, it has been established that a number of long intergenic non-coding RNAs (lincRNAs) are linked to a wide variety of human diseases. The relationship of many other lincRNAs to disease remains a puzzle. Validating such links between the two entities through biological experiments is expensive. However, large amounts of information about both are becoming available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., which opens an opportunity for cutting-edge machine learning and data mining approaches. Nevertheless, only a few in silico lincRNA-disease association inference tools are available to date, and none of them utilizes side information about both entities. The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that considers their respective side information. However, the IMC formulation cannot handle noise and outliers that may be present in the dataset, and data sparsity is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve these two issues. As a remedy, in this paper, we propose Robust Inductive Matrix Completion (RIMC), which uses an l2,1 norm loss function as well as l2,1 norm based regularization. We applied RIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. Our method performs better than the state-of-the-art methods in terms of precision@k and recall@k at the top-k disease prioritization for the subject lincRNAs. We also demonstrate that RIMC is equally effective for querying novel lincRNAs, as well as for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. Availability: All the supporting datasets are available at http://biomecis.uta.edu/~ashis/res/RIMC/.
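
The l2,1 norm named in the abstract can be illustrated with a short sketch. It assumes the common row-wise convention (the sum of the Euclidean norms of the rows); the abstract does not state which convention RIMC uses, and the function names and toy residual matrices below are hypothetical.

```python
import math

def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of a matrix (list of lists)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

def squared_frobenius(matrix):
    """Sum of squared entries, the loss of an ordinary least-squares fit."""
    return sum(v * v for row in matrix for v in row)

# Toy residual matrices (hypothetical): the second has one outlier row.
clean = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[1.0, 0.0], [0.0, 10.0]]

# The l2,1 norm grows linearly with the size of the outlier row,
# while the squared Frobenius norm grows quadratically.
print(l21_norm(clean), l21_norm(noisy))
print(squared_frobenius(clean), squared_frobenius(noisy))
```

Because a single large residual contributes linearly rather than quadratically, a loss of this form is generally less dominated by outliers, which is consistent with the robustness motivation given in the abstract.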


Subjects
Computational Biology/methods , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Single Nucleotide Polymorphism , Long Noncoding RNA , Algorithms , Area Under Curve , Data Mining , Factual Databases , Humans , Machine Learning , Statistical Models , Phenotype
2.
BMC Med Genomics ; 10(Suppl 5): 77, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29297358

ABSTRACT

BACKGROUND: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However, a plethora of lincRNA data is now available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., which opens an opportunity for cutting-edge machine learning and data mining approaches to extract meaningful relationships between lincRNAs and diseases. Nevertheless, only a few in silico lincRNA-disease association inference tools are available to date, and none of them utilizes side information about both entities simultaneously in a single framework. METHODS: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that considers their respective side information. However, the IMC formulation cannot handle noise and outliers that may be present in the datasets, and data sparsity is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve these two issues. As a remedy, in this paper, we propose Stable Robust Inductive Matrix Completion (SRIMC), which uses l2,1 norm based regularization to optimize the objective function with a unique 2-step stable solution approach. RESULTS: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. The method performs better than the state-of-the-art methods in terms of precision@k and recall@k at the top-k disease prioritization for the subject lincRNAs. We also demonstrate that SRIMC is equally effective for querying novel lincRNAs, as well as for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. CONCLUSIONS: With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers as well as in dealing with novel lincRNAs and disease phenotypes.
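
The two ranking metrics mentioned in the results can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the disease identifiers and the ranked list are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear among the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ranking of diseases for one lincRNA, best score first.
# The ranked list is assumed to contain no duplicates.
ranked_diseases = ["d3", "d1", "d7", "d2"]
known_associations = {"d1", "d2"}

for k in (1, 2, 4):
    p = precision_at_k(ranked_diseases, known_associations, k)
    r = recall_at_k(ranked_diseases, known_associations, k)
    print(f"k={k}: precision@k={p:.2f}, recall@k={r:.2f}")
```

In a study like this one, such values would typically be averaged over all query lincRNAs, although the abstract does not describe the averaging scheme.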


Subjects
Computational Biology/methods , Disease/genetics , Long Noncoding RNA/genetics , Algorithms , Genome-Wide Association Study , Humans
3.
J Bioinform Comput Biol ; 14(5): 1650027, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27455882

ABSTRACT

RNA-seq, the next-generation sequencing platform, enables researchers to explore the transcriptome of organisms in depth, for example by identifying functional non-coding RNAs (ncRNAs) and quantifying their expression in tissues. The functions of ncRNAs are mostly related to their secondary structures. Exploring how the corresponding read segments cluster in terms of their structural profiles is therefore essential, and this motivates our research. In this manuscript we propose PR2S2Clust (Patched RNA-seq Read Segments' Structure-oriented Clustering), an analysis platform that extracts features to prepare the secondary structure profiles of RNA-seq read segments. It provides a strategy for using the profiles to annotate the segments into ncRNA classes with several clustering strategies. While clustering the segments, the system considers seven pairwise structural distance metrics by considering short-read mappings onto each structure, which we term the "patched structure". In this regard, we show applications of both classical and ensemble clusterings of the partitional and hierarchical variations. Extensive real-world experiments over three publicly available RNA-seq datasets and a comparative analysis against four competitive systems confirm the effectiveness and superiority of the proposed system. The source code and dataset of PR2S2Clust are available at http://biomecis.uta.edu/~ashis/res/PR2S2Clust-suppl/ .
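
To give a sense of what a pairwise structural distance looks like, here is a minimal sketch of the base-pair distance between two secondary structures in dot-bracket notation. The abstract does not name its seven metrics, so this widely used metric is only an assumed example and may not be among them; the function names and structures are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(a) ^ base_pairs(b))

# Hypothetical structures for three read segments of equal length.
structures = ["((..))", "(....)", "......"]

# Symmetric pairwise distance matrix, the usual input to a
# partitional or hierarchical clustering routine.
matrix = [[base_pair_distance(a, b) for b in structures] for a in structures]
for row in matrix:
    print(row)
```

A matrix of this kind is what distance-based clustering methods consume, whichever structural metric is chosen.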


Subjects
Algorithms , Untranslated RNA/chemistry , RNA Sequence Analysis/methods , Cluster Analysis , Genetic Databases , Humans , Nucleic Acid Conformation , Untranslated RNA/genetics
4.
J Bioinform Comput Biol ; 11(5): 1342002, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24131051

ABSTRACT

Statistics about open reading frames, base compositions, and the properties of predicted secondary structures have the potential to address the problem of discriminating coding from noncoding transcripts. In addition, the next-generation sequencing platform RNA-seq provides a bounty of data from which expression profiles of the transcripts can be extracted, which prompted us to add a new dimension to this classification task. In this paper, we propose CNCTDiscriminator, a coding and noncoding transcript discriminating system that integrates these four categories of features about the transcripts. The feature integration was done using both hypothesis learning and feature-specific ensemble learning approaches. The CNCTDiscriminator model trained with composition and ORF features outperforms (precision 83.86%, recall 82.01%) three other popular methods, CPC (precision 98.31%, recall 25.95%), CPAT (precision 97.74%, recall 52.50%) and PORTRAIT (precision 84.37%, recall 73.2%), when applied to an independent benchmark dataset. The CNCTDiscriminator model trained using the ensemble approach shows comparable performance (precision 89.85%, recall 71.08%).
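
One way to read the precision and recall pairs above is through their harmonic mean (the F1 score). The abstract does not report F1, so the sketch below is only an assumed way of comparing the quoted figures; the dictionary keys are labels chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall percentages as quoted in the abstract.
reported = {
    "CNCTDiscriminator (composition + ORF)": (83.86, 82.01),
    "CNCTDiscriminator (ensemble)": (89.85, 71.08),
    "CPC": (98.31, 25.95),
    "CPAT": (97.74, 52.50),
    "PORTRAIT": (84.37, 73.2),
}

# Rank the methods by F1, highest first.
ranking = sorted(reported, key=lambda name: f1_score(*reported[name]), reverse=True)
for name in ranking:
    print(f"{name}: F1 = {f1_score(*reported[name]):.2f}")
```

Under this reading, a method with very high precision but low recall scores lower than one with balanced values, which may be what the comparison in the abstract is meant to convey.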


Subjects
High-Throughput Nucleotide Sequencing/statistics & numerical data , Messenger RNA/genetics , Untranslated RNA/genetics , RNA Sequence Analysis/statistics & numerical data , Algorithms , Artificial Intelligence , Base Composition , Computational Biology , Nucleic Acid Databases/statistics & numerical data , Humans , Statistical Models , Nucleic Acid Conformation , Open Reading Frames , Messenger RNA/chemistry , Untranslated RNA/chemistry , Software
5.
BMC Bioinformatics ; 11: 273, 2010 May 21.
Article in English | MEDLINE | ID: mdl-20492656

ABSTRACT

BACKGROUND: Most existing in silico phosphorylation site prediction systems use a machine learning approach that requires preparing a good set of classification data in order to build the classification knowledge. Furthermore, phosphorylation is catalyzed by kinase enzymes, and hence the kinase information of the phosphorylated sites has been used as major classification data in most of the existing systems. Since the number of kinase annotations in protein sequences is far smaller than the number of proteins sequenced to date, prediction systems that use information from the small clique of kinase-annotated proteins cannot be considered reliable for predicting outside that clique. Hence these systems are not generalized. In this paper, a novel generalized prediction system, PPRED (Phosphorylation PREDictor), is proposed that ignores the kinase information and uses only the evolutionary information of proteins for classifying phosphorylation sites. RESULTS: Experimental results based on cross-validation and an independent benchmark reveal the significance of using the evolutionary information alone to classify phosphorylation sites from protein sequences. The prediction performance of the proposed system is better than that of existing prediction systems that also do not incorporate kinase information. The system is also comparable to systems that incorporate kinase information in predicting such sites. CONCLUSIONS: The approach presented in this paper provides an efficient way to identify phosphorylation sites in a given protein primary sequence, which would be valuable information for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing generalized prediction systems for post-translational modifications such as phosphorylation or glycosylation. PPRED is publicly available at http://www.cse.univdhaka.edu/~ashis/ppred/index.php.


Subjects
Artificial Intelligence , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Binding Sites , Protein Databases , Phosphorylation , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Sequence Analysis , Software