Pesquisa | Portal Regional da BVS

Du, Shengwang; Li, Junyi; Bian, Naizheng.

PLoS One ; 15(11): e0238220, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-33237908

RESUMO

The development of high-throughput sequencing technology has generated huge amounts DNA data. Many general compression algorithms are not ideal for compressing DNA data, such as the LZ77 algorithm. On the basis of Nour and Sharawi's method,we propose a new, lossless and reference-free method to increase the compression performance. The original sequences are converted into eight intermediate files and six final files. Then, the LZ77 algorithm is used to compress the six final files. The results show that the compression time is decreased by 83% and the decompression time is decreased by 54% on average.The compression rate is almost the same as Nour and Sharawi's method which is the fastest method so far. What's more, our method has a wider range of application than Nour and Sharawi's method. Compared to some very advanced compression tools at present, such as XM and FCM-Mx, the time for compression in our method is much smaller, on average decreasing the time by more than 90%.

Assuntos

DNA/genética , Compressão de Dados/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Genômica/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Software

DNILMF-LDA: Prediction of lncRNA-Disease Associations by Dual-Network Integrated Logistic Matrix Factorization and Bayesian Optimization.

Li, Yan; Li, Junyi; Bian, Naizheng.

Genes (Basel) ; 10(8)2019 08 12.

Artigo em Inglês | MEDLINE | ID: mdl-31409034

RESUMO

Identifying associations between lncRNAs and diseases can help understand disease-related lncRNAs and facilitate disease diagnosis and treatment. The dual-network integrated logistic matrix factorization (DNILMF) model has been used for drug-target interaction prediction, and good results have been achieved. We firstly applied DNILMF to lncRNA-disease association prediction (DNILMF-LDA). We combined different similarity kernel matrices of lncRNAs and diseases by using nonlinear fusion to extract the most important information in fused matrices. Then, lncRNA-disease association networks and similarity networks were built simultaneously. Finally, the Gaussian process mutual information (GP-MI) algorithm of Bayesian optimization was adopted to optimize the model parameters. The 10-fold cross-validation result showed that the area under receiving operating characteristic (ROC) curve (AUC) value of DNILMF-LDA was 0.9202, and the area under precision-recall (PR) curve (AUPR) was 0.5610. Compared with LRLSLDA, SIMCLDA, BiwalkLDA, and TPGLDA, the AUC value of our method increased by 38.81%, 13.07%, 8.35%, and 6.75%, respectively. The AUPR value of our method increased by 52.66%, 40.05%, 37.01%, and 44.25%. These results indicate that DNILMF-LDA is an effective method for predicting the associations between lncRNAs and diseases.

Assuntos

Predisposição Genética para Doença , Estudo de Associação Genômica Ampla/métodos , RNA Longo não Codificante/genética , Software , Teorema de Bayes , Redes Reguladoras de Genes , Humanos , RNA Longo não Codificante/metabolismo

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA