Results 1 - 8 of 8
1.
Interdiscip Sci ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39230798

ABSTRACT

Genes that have been experimentally validated for diseases (or functions) can be used to develop machine learning methods that predict new disease or function genes. However, the prediction of function genes and of disease genes faces the same problem: only positive examples are known with certainty, and there are no negative examples. To solve this problem, we proposed VGAEMCD, a function/disease-gene prediction algorithm based on network embedding (Variational Graph Auto-Encoders, VGAE) and one-class classification (Fast Minimum Covariance Determinant, Fast-MCD). First, we constructed a protein-protein interaction (PPI) network centered on experimentally validated genes; then VGAE was used to obtain the embeddings of the nodes (genes) in the network; finally, the embeddings were fed into an improved deep learning one-class classifier based on Fast-MCD to predict function/disease genes. VGAEMCD predicts function genes and disease genes in a unified way and requires only the experimentally verified genes as input (no expression profile is needed). VGAEMCD outperforms classical one-class classification algorithms in Recall, Precision, F-measure, Specificity, and Accuracy. Further experiments show that VGAEMCD scores higher on seven metrics than state-of-the-art function/disease-gene prediction algorithms. These results indicate that VGAEMCD learns the distribution of the positive examples well and accurately identifies function/disease genes.
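The one-class idea in this abstract (learn the distribution of the positive examples only, then flag anything far from it) can be illustrated with a minimal sketch. It is not the VGAEMCD classifier: it assumes toy 2-D embeddings invented for illustration, uses the plain sample covariance instead of the robust Fast-MCD estimate, and the `slack` factor on the threshold is a hypothetical parameter.

```python
from statistics import mean

def fit_gaussian_2d(points):
    """Return (center, inverse covariance) of 2-D points (plain, non-robust estimate)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def mahalanobis_sq(p, center, inv):
    """Squared Mahalanobis distance of point p from the fitted center."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

def fit_one_class(points, slack=1.5):
    """Fit on positives only; the threshold is the largest training distance times slack."""
    center, inv = fit_gaussian_2d(points)
    threshold = slack * max(mahalanobis_sq(p, center, inv) for p in points)
    return center, inv, threshold

def predict(p, model):
    """True if p lies inside the region learned from the positive examples."""
    center, inv, threshold = model
    return mahalanobis_sq(p, center, inv) <= threshold

# Hypothetical embeddings of known (positive) genes; not real VGAE output.
positives = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.3), (0.8, 0.9)]
model = fit_one_class(positives)
```

A candidate close to the positives, such as `(1.0, 1.0)`, is accepted by `predict`, while a distant one, such as `(5.0, -3.0)`, is rejected. A robust estimator such as MCD would additionally limit the influence of outlying training points on the center and covariance.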

2.
Front Genet ; 14: 1226905, 2023.
Article in English | MEDLINE | ID: mdl-37576553

ABSTRACT

Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics give neuropeptides a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography, still depend on a complete neuropeptide precursor database and on knowledge of the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases lead to false positives or reduce the sensitivity of recognition. In recent years, studies have shown that machine learning methods can predict neuropeptides rapidly and effectively. In this work, we made a systematic attempt to create an ensemble tool based on four convolutional neural network models. These baseline models were trained separately on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec, and were integrated using Gaussian Naive Bayes (NB) to construct our predictor, designated NeuroCNN_GNB. Both 5-fold cross-validation tests on benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, this framework provides essential interpretations that aid the understanding of model success by leveraging the Shapley Additive exPlanation (SHAP) algorithm, thereby highlighting the most important features for predicting neuropeptides.
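Of the four encodings named in this abstract, G-gap dipeptide encoding is easy to sketch. The version below is one common formulation, assumed here rather than taken from the paper: it counts residue pairs separated by `g` positions and normalizes by the number of counted pairs, giving a 400-dimensional vector over the 20 standard amino acids. The function name and the handling of non-standard residues are illustrative choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def g_gap_dipeptide(seq, g=1):
    """Frequency of each residue pair (seq[i], seq[i + g + 1]) in the sequence."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]

# Toy peptide: with g=1 the counted pairs are (A, D) and (C, A).
vector = g_gap_dipeptide("ACDA", g=1)
```

Setting `g=0` reduces this to ordinary dipeptide composition; larger gaps capture longer-range residue correlations at the same vector size.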

3.
Math Biosci Eng ; 19(1): 775-791, 2022 01.
Article in English | MEDLINE | ID: mdl-34903012

ABSTRACT

As one of the most significant protein post-translational modifications (PTMs) in eukaryotes, ubiquitylation plays an essential role in regulating diverse cellular functions, such as apoptosis, cell division, DNA repair and replication, intracellular transport and immune reactions. Traditional experimental methods are time-consuming, costly and labor-intensive. It is therefore highly desirable to develop automated computational methods that can recognize potential ubiquitylation sites rapidly and accurately. In this study, we propose a novel predictor, named UPFPSR, for predicting lysine ubiquitylation sites in plants. UPFPSR is built on multiple physicochemical properties of amino acids and sequence-based statistical information. To find a suitable classification algorithm, four traditional algorithms and two deep learning networks were compared, and random forest, which performed best, was ultimately selected. Extensive benchmarking shows that UPFPSR outperforms the most advanced ubiquitylation prediction tool on every metric, with an accuracy of 77.3%, precision of 75%, recall of 81.7%, F1-score of 0.7824, and AUC of 0.84 on the independent test dataset. The results indicate that UPFPSR can provide new guidance for further experimental study of ubiquitylation. The data sets and source code used in this study are freely available at https://github.com/ysw-sunshine/UPFPSR.
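As a quick check of how the reported metrics relate to each other, the F1-score is the harmonic mean of precision and recall. The sketch below simply applies that standard definition to the rounded figures quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded values as quoted in the abstract (75% precision, 81.7% recall).
f1 = f1_score(0.75, 0.817)
```

With these rounded inputs the result is about 0.782, which agrees with the reported 0.7824 to within the rounding of the published precision and recall.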


Subjects
Lysine; Software; Algorithms; Computational Biology/methods; Lysine/chemistry; Lysine/metabolism; Protein Processing, Post-Translational; Ubiquitination
4.
Sci Rep ; 11(1): 5517, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33750838

ABSTRACT

To further improve gene module identification, a new method, the GCNA-Kpca algorithm, was proposed by combining the Newman community detection algorithm with the K-means algorithm framework. The core idea of the algorithm is as follows. First, a gene co-expression network (GCN) is built from gene expression data. Then the Newman algorithm is used to identify initial gene modules from the topology of the GCN, which determines the number of clusters and the clustering centers. Finally, the number of clusters and the clustering centers are passed to the K-means framework, and a secondary clustering based on the gene expression profiles yields the final gene modules. The algorithm takes into account the role of modularity in the clustering process and can find the optimal module membership for each gene through multiple iterations. Experimental results showed that the proposed algorithm had the best performance in error rate, biological significance and CNN classification indicators (Precision, Recall and F-score). The gene modules obtained by GCNA-Kpca were used for key gene identification, and the resulting key genes had the highest prognostic significance. Moreover, the GCNA-Kpca algorithm was used to identify 10 key genes in hepatocellular carcinoma (HCC): CDC20, CCNB1, EIF4A3, H2AFX, NOP56, RFC4, NOP58, AURKA, PCNA, and FEN1. Based on the validation, it is reasonable to speculate that these 10 key genes could be biomarkers for HCC. NOP56 and NOP58 are key genes for HCC that we report here for the first time.
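The two-stage idea described above (an initial partition supplies the number of clusters and the starting centers, and K-means then refines the membership using expression profiles) can be sketched as follows. This is an illustrative sketch, not the GCNA-Kpca implementation: the initial modules are given by hand rather than computed with the Newman algorithm, and the gene names and expression values are hypothetical.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(profiles, initial_modules, max_iter=100):
    """K-means whose k and starting centers come from an initial module partition."""
    genes = sorted(profiles)
    centers = [centroid([profiles[g] for g in module]) for module in initial_modules]
    assignment = {}
    for _ in range(max_iter):
        # Assign every gene to its nearest current center.
        new_assignment = {
            g: min(range(len(centers)), key=lambda k: sq_dist(profiles[g], centers[k]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed module
        assignment = new_assignment
        # Recompute each center from its current members.
        for k in range(len(centers)):
            members = [profiles[g] for g in genes if assignment[g] == k]
            if members:
                centers[k] = centroid(members)
    return assignment

# Hypothetical expression profiles; g5 starts in the wrong module on purpose.
profiles = {
    "g1": [1.0, 1.1],
    "g2": [0.9, 1.0],
    "g3": [5.0, 5.2],
    "g4": [5.1, 4.9],
    "g5": [1.2, 0.9],
}
initial_modules = [["g1", "g2"], ["g3", "g4", "g5"]]
modules = seeded_kmeans(profiles, initial_modules)
```

In this toy run, `g5` is moved into the same module as `g1` and `g2` because its profile is much closer to their center, which is the kind of membership correction the secondary clustering step is meant to provide.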


Subjects
Algorithms; Carcinoma, Hepatocellular; Databases, Nucleic Acid; Gene Expression Profiling; Gene Regulatory Networks; Liver Neoplasms; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/metabolism; Female; Humans; Liver Neoplasms/genetics; Liver Neoplasms/metabolism; Male
5.
PeerJ ; 9: e10594, 2021.
Article in English | MEDLINE | ID: mdl-33552715

ABSTRACT

BACKGROUND: Hepatocellular carcinoma (HCC), the main type of liver cancer in humans, is one of the most prevalent and deadly malignancies in the world. The present study aimed to identify hub genes and key biological pathways by integrated bioinformatics analysis. METHODS: A bioinformatics pipeline based on gene co-expression network (GCN) analysis was built to analyze the gene expression profile of HCC. First, differentially expressed genes (DEGs) were identified and a GCN was constructed with Pearson correlation analysis. Then, gene modules were identified with 3 different community detection algorithms, and the correlation between gene modules and clinical indicators was analyzed. Moreover, we used the Search Tool for the Retrieval of Interacting Genes (STRING) database to construct a protein-protein interaction (PPI) network of the key gene module, and we identified the hub genes using nine topology analysis algorithms based on this PPI network. Further, we used Oncomine analysis, survival analysis, a GEO data set and the random forest algorithm to verify the important roles of the hub genes in HCC. Lastly, we explored the methylation changes of the hub genes using another GEO data set (GSE73003). RESULTS: First, among the expression profiles, 4,130 up-regulated genes and 471 down-regulated genes were identified. Next, the multi-level algorithm, which had the highest modularity, divided the GCN into nine gene modules, and a key gene module (m1) was identified. The GO enrichment terms of m1 mainly included the processes of mitosis and meiosis and the functions of catalytic and exodeoxyribonuclease activity. These genes were also enriched in the cell cycle and mitotic pathway. Furthermore, we identified 11 hub genes that played key roles in HCC: MCM3, TRMT6, AURKA, CDC20, TOP2A, ECT2, TK1, MCM2, FEN1, NCAPD2 and KPNA2. The results of multiple verification methods indicated that the 11 hub genes had high diagnostic efficiency in distinguishing tumors from normal tissues. Lastly, the methylation changes of the genes CDC20, TOP2A, TK1 and FEN1 in HCC samples were statistically significant (P-value < 0.05). CONCLUSION: MCM3, TRMT6, AURKA, CDC20, TOP2A, ECT2, TK1, MCM2, FEN1, NCAPD2 and KPNA2 could be potential biomarkers or therapeutic targets for HCC. Meanwhile, the metabolic pathway, the cell cycle and the mitotic pathway might play vital roles in the progression of HCC.
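The first step of such a pipeline, building a gene co-expression network from Pearson correlations, can be sketched in a few lines. This is a minimal illustration under assumptions: the gene names and expression values are made up, and the cutoff of 0.8 on the absolute correlation is a hypothetical choice, since the abstract does not state the threshold used.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def coexpression_edges(expression, threshold=0.8):
    """Connect two genes when the absolute correlation of their profiles reaches the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Hypothetical expression values across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(expression)
```

Here only `geneA` and `geneB` are connected, because their profiles rise together across samples; the resulting edge list is what a community detection algorithm would then partition into modules.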

6.
Medicine (Baltimore) ; 99(49): e22655, 2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33285674

ABSTRACT

To explore the gene modules and key genes of head and neck squamous cell carcinoma (HNSCC), a bioinformatics algorithm based on gene co-expression network analysis was proposed in this study. First, differentially expressed genes (DEGs) were identified and a gene co-expression network (i-GCN) was constructed with Pearson correlation analysis. Then, gene modules were identified with 5 different community detection algorithms, and the correlation between gene modules and clinical indicators was analyzed. Gene Ontology (GO) analysis was used to annotate the biological pathways of the gene modules. The key genes were then identified with 2 methods, gene significance (GS) and the PageRank algorithm. Moreover, we used the DisGeNET database to search for diseases related to the key genes. Lastly, the online tool OncoLnc was used to perform survival analysis on the key genes and to draw survival curves. In total, 2600 up-regulated and 1547 down-regulated genes were identified in HNSCC, and the i-GCN was divided into 9 gene modules. The association analysis showed that sex was mainly related to mitosis and meiosis processes; event was mainly related to responses to interferons and viruses and to T cell differentiation; T stage was mainly related to muscle development and contraction and to the regulation of protein transport activity; N stage was mainly related to mitosis and meiosis processes; and M stage was mainly related to responses to interferons and to immune response processes. Lastly, 34 key genes were identified, such as CDKN2A, HOXA1, CDC7, PPL, EVPL, PXN, PDGFRB, CALD1, and NUSAP1. Among them, HOXA1, PXN, and NUSAP1 were negatively correlated with survival prognosis. HOXA1, PXN, and NUSAP1 might play important roles in the progression of HNSCC and serve as potential biomarkers for future diagnosis.
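PageRank, one of the two key-gene scoring methods named in this abstract, ranks nodes by how much "importance" flows to them from their neighbours. The sketch below is a plain power-iteration version on an undirected toy network; the damping factor of 0.85, the iteration count, and the node names are illustrative assumptions, and how the study applied PageRank to its co-expression network is not specified in the abstract.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank on a graph given as {node: [neighbours]}."""
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share of rank.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                # Split this node's damped rank evenly among its neighbours.
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new_rank[u] += share
            else:
                # A node without neighbours spreads its rank over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical star-shaped network: one central node linked to three others.
network = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = pagerank(network)
```

The central node receives the highest score, which matches the intuition that highly connected genes in a module are candidates for key genes.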


Subjects
Gene Regulatory Networks/physiology; Head and Neck Neoplasms/genetics; Squamous Cell Carcinoma of Head and Neck/genetics; Cell Nucleus Division/physiology; Computational Biology/methods; Down-Regulation; Gene Ontology; Head and Neck Neoplasms/immunology; Humans; Sex Factors; Squamous Cell Carcinoma of Head and Neck/immunology; T-Lymphocytes/metabolism; Up-Regulation
7.
Sensors (Basel) ; 20(7), 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32235653

ABSTRACT

Daily activity forecasts play an important role in the daily lives of residents in smart homes. Category forecasts and occurrence time forecasts of daily activity are two key tasks. Category forecasts are correlated with occurrence time forecasts; however, existing research has focused on only one of the two tasks. Moreover, the performance of daily activity forecasts is low when the two tasks are performed in series. In this paper, a forecast model based on multi-task learning is proposed to forecast the category and occurrence time of daily activity mutually and iteratively. First, raw sensor events are pre-processed to form a feature space of daily activity. Second, a parallel multi-task learning model that combines a convolutional neural network (CNN) with bidirectional long short-term memory (Bi-LSTM) units is developed as the forecast model. Finally, five distinct datasets are used to evaluate the proposed model. The experimental results show that, compared with state-of-the-art single-task learning models, this model improves accuracy by at least 2.22%, and the NMAE, NRMSE and R² metrics are improved by at least 1.542%, 7.79% and 1.69%, respectively.


Subjects
Activities of Daily Living; Deep Learning; Neural Networks, Computer; Forecasting; Humans
8.
Sensors (Basel) ; 15(11): 29129-48, 2015 Nov 17.
Article in English | MEDLINE | ID: mdl-26593923

ABSTRACT

Wireless sensor networks are widely used to monitor valuable objects such as rare animals or armies. Once an object is detected, the source, i.e., the sensor nearest to the object, generates and periodically sends a packet about the object to the base station. Since attackers can capture the object by localizing the source, many protocols have been proposed to protect the source location. Instead of transmitting the packet to the base station directly, typical source location protection protocols first transmit packets randomly for a few hops to a phantom location, and then forward the packets to the base station. The problem with these protocols is that the generated phantom locations are usually not only near the true source but also close to each other. As a result, attackers can easily trace a route back to the source from the phantom locations. To address this problem, we propose a new protocol for source location protection based on limited flooding, named SLP. Compared with existing protocols, SLP can generate phantom locations that are not only far away from the source, but also widely distributed. It improves source location security significantly at low communication cost. We further propose a protocol, namely SLP-E, to protect the source location against more powerful attackers with wider fields of vision. The performance of SLP and SLP-E is validated by both theoretical analysis and simulation results.
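The weakness of the random-walk phantom selection described above can be seen in a small simulation. The sketch below models only the baseline behaviour (a packet wanders for a fixed number of hops before heading to the base station), not SLP or SLP-E, whose details are not given in the abstract. The grid layout, hop count, seed, and function names are all hypothetical.

```python
import random

def random_walk_phantom(source, hops, width, height, rng):
    """Pick a phantom location by walking `hops` random steps on a grid of sensors."""
    x, y = source
    for _ in range(hops):
        moves = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < width and 0 <= y + dy < height
        ]
        x, y = rng.choice(moves)
    return (x, y)

def manhattan(a, b):
    """Hop distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

rng = random.Random(0)  # fixed seed so the simulation is repeatable
source = (10, 10)
phantoms = [
    random_walk_phantom(source, hops=6, width=21, height=21, rng=rng)
    for _ in range(200)
]
mean_distance = sum(manhattan(source, p) for p in phantoms) / len(phantoms)
```

Because a random walk frequently doubles back, most phantoms end up fewer than `hops` steps from the source, so `mean_distance` stays below the hop budget. This clustering near the source is the behaviour that makes back-tracing easier for an attacker.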
