Results 1 - 17 of 17
1.
Methods ; 226: 71-77, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38641084

ABSTRACT

Biomedical Named Entity Recognition (BioNER) is one of the most basic tasks in biomedical text mining; it aims to automatically identify and classify biomedical entities in text. Recently, deep learning-based methods have been applied to BioNER and have shown encouraging results. However, many biological entities are polysemous and ambiguous, which is one of the main obstacles to the task. Deep learning methods also require large amounts of training data, so a lack of data degrades recognition performance. To address polysemous words and insufficient data in BioNER, we propose a multi-task learning framework fused with a language model and based on the BiLSTM-CRF architecture. Our model uses a language model to design a differential encoding of the context, which yields dynamic word vectors that distinguish words in different datasets. Moreover, we use a multi-task learning method to share the dynamic word vectors across different entity types, improving the recognition performance for each type. Experimental results show that our model reduces the false positives caused by polysemous words through differentiated encoding and improves the performance of each subtask by sharing information between different entity data. Compared with other state-of-the-art methods, our model achieved superior results on four typical training sets and the best F1 values.


Subject(s)
Data Mining , Deep Learning , Data Mining/methods , Humans , Natural Language Processing , Neural Networks, Computer , Language
2.
Front Physiol ; 14: 1241370, 2023.
Article in English | MEDLINE | ID: mdl-38028809

ABSTRACT

Recent studies on medical image fusion based on deep learning have made remarkable progress, but they ignore the common and exclusive features of different modalities, and especially the subsequent enhancement of those features. Since medical images of different modalities carry unique information, dedicated learning of exclusive features should be designed to express that information and so obtain a fused medical image with more information and detail. Therefore, we propose an attention mechanism-based disentangled representation network for medical image fusion, which designs coordinate attention and multimodal attention to extract and strengthen common and exclusive features. First, the common and exclusive features of each modality are obtained by the cross mutual information and adversarial objective methods, respectively. Then, coordinate attention enhances the common and exclusive features of different modalities, and the exclusive features are weighted by multimodal attention. Finally, these two kinds of features are fused. The effectiveness of the three innovation modules is verified by ablation experiments. Furthermore, eight comparison methods are selected for qualitative analysis, and four metrics are used for quantitative comparison. The values of the four metrics demonstrate the effectiveness of the proposed network (DRCM). In particular, the DRCM achieved better results on the SCD, Nabf, and MS-SSIM metrics, which indicates that it further improves the visual quality of the fused image, with more information from the source images and less noise. A comprehensive comparison and analysis of the experimental results shows that the DRCM outperforms the comparison methods.

3.
Comput Biol Med ; 166: 107531, 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37806056

ABSTRACT

Medical images of different modalities have different semantic characteristics. Medical image fusion, which aims to improve visual quality and practical value, has become important in medical diagnostics. However, previous methods do not fully represent semantic and visual features, and their generalization ability needs to be improved. Furthermore, brightness stacking tends to occur during the fusion process. In this paper, we propose an asymmetric dual deep network with sharing mechanism (ADDNS) for medical image fusion. In our asymmetric model-level dual framework, the primal Unet part learns to fuse medical images of different modalities into a fusion image, while the dual Unet part learns to invert the fusion task for multi-modal image reconstruction. This asymmetry of network settings not only enables the ADDNS to fully extract semantic and visual features, but also reduces the model complexity and accelerates convergence. Furthermore, the sharing mechanism, designed according to task relevance, also reduces the model complexity and improves the generalization ability of our model. Finally, we use intermediate supervision to minimize the difference between the fusion image and the source images so as to prevent the brightness-stacking problem. Experimental results show that our algorithm achieves better results in both quantitative and qualitative experiments than several state-of-the-art methods.

4.
Int J Mol Sci ; 23(9), 2022 Apr 24.
Article in English | MEDLINE | ID: mdl-35563090

ABSTRACT

Motif occupancy identification is a binary classification task that predicts the binding of DNA motif instances to transcription factors, for which several sequence-based methods have been proposed. However, because they are trained directly, these end-to-end methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning with the edit operations defined in edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman-Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict the results of motif occupancy identification. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples. In particular, the self-supervised model proves practicable in transfer learning.


Subject(s)
Algorithms
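The abstract of entry 4 above mentions a sequence similarity criterion based on the Needleman-Wunsch algorithm for separating positive from negative sample pairs. A minimal sketch of global alignment scoring follows. The scoring values (match +1, mismatch -1, gap -1), the length normalisation, the 0.5 threshold, and the function names are illustrative assumptions, not the authors' settings.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def is_positive_pair(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical criterion: length-normalised score at or above a threshold."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty sequences are trivially identical
    return needleman_wunsch_score(a, b) / longest >= threshold
```

Under these assumed settings, a sequence and a copy with one substitution would count as a positive pair, while two sequences with no shared bases would not.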
5.
Front Genet ; 13: 885627, 2022.
Article in English | MEDLINE | ID: mdl-35432476

ABSTRACT

BACKGROUND: Classification and annotation of enzyme proteins are fundamental for enzyme research on biological metabolism. Enzyme Commission (EC) numbers provide a standard for hierarchical enzyme class prediction, for which several computational methods have been proposed. However, most of these methods depend on prior distribution information, and none explicitly quantifies amino-acid-level relations or the possible contribution of sub-sequences. METHODS: In this study, we propose a double-scale attention enzyme class prediction model named DAttProt with high reusability and interpretability. DAttProt encodes sequences with self-supervised Transformer encoders in pre-training and gathers local features with multi-scale convolutions in fine-tuning. Specifically, a probabilistic double-scale attention weight matrix is designed to aggregate multi-scale features and positional prediction scores. Finally, a fully connected linear classifier performs the final inference from the aggregated features and prediction scores. RESULTS: On the DEEPre and ECPred datasets, DAttProt performs competitively with the compared methods on level 0 and outperforms them on deeper task levels, reaching 0.788 accuracy on level 2 of DEEPre and 0.967 macro-F1 on level 1 of ECPred. Moreover, through a case study, we demonstrate that the double-scale attention matrix learns to discover and focus on the positions and scales of bio-functional sub-sequences in the protein. CONCLUSION: DAttProt provides an effective and interpretable method for enzyme class prediction. It can predict enzyme protein classes accurately and can also discover enzymatic functional sub-sequences, such as protein motifs, at both positional and spatial scales.

6.
Comput Methods Programs Biomed ; 219: 106739, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35344766

ABSTRACT

BACKGROUND AND OBJECTIVE: Early fundus screening and timely treatment of ophthalmic diseases can effectively prevent blindness. Previous studies focus on fundus images of a single eye without using the relevant information shared by the left and right eyes, whereas clinical ophthalmologists usually use binocular fundus images to help diagnose ocular disease. In addition, previous works usually target only one ocular disease at a time. Considering the importance of patient-level bilateral eye diagnosis and multi-label ophthalmic disease classification, we propose a bilateral feature enhancement network (BFENet) to address these two problems. METHODS: We propose a two-stream interactive CNN architecture for multi-label ophthalmic disease classification with bilateral fundus images. First, we design a feature enhancement module, which uses the interaction between bilateral fundus images to strengthen the extracted feature information. Specifically, an attention mechanism learns the interdependence between local and global information in the interactive two-stream architecture, which reweights these features and recovers more details. To capture more disease characteristics, we further design a novel multiscale module, which enriches the feature maps by superimposing feature information at different resolutions extracted through dilated convolution. RESULTS: On the off-site set, the Kappa, F1, AUC and Final score are 0.535, 0.892, 0.912 and 0.780, respectively. On the on-site set, the Kappa, F1, AUC and Final score are 0.513, 0.886, 0.903 and 0.767, respectively. Compared with existing methods, BFENet achieves the best classification performance. CONCLUSIONS: Comprehensive experiments demonstrate the effectiveness of the proposed model. In addition, our method can be extended to similar tasks where the correlation between different images is important.


Subject(s)
Eye Diseases , Neural Networks, Computer , Diagnostic Techniques, Ophthalmological , Fundus Oculi , Humans
7.
Interdiscip Sci ; 14(1): 182-195, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34536209

ABSTRACT

The severity of fundus arteriosclerosis can be determined and divided into four grades according to fundus images. Automatic grading of fundus arteriosclerosis is helpful in clinical practice, so this paper proposes a convolutional neural network (CNN) method based on hierarchical attention maps to solve the automatic grading problem. First, we use a retinal vessel segmentation model to separate the important vascular region and the non-vascular background region in the fundus image and obtain two attention maps. The two maps are used as inputs to construct a two-stream CNN (TSNet) that focuses on feature information through mutual reference between the two regions. In addition, we use convex hull attention maps in a one-stream CNN (OSNet) to learn the valuable areas where the retinal vessels are concentrated. Then, we design an integrated model, OTNet, composed of TSNet, which learns image feature information, and OSNet, which learns discriminative areas. After obtaining the representation learning parts of the two networks, we train the classification layer to achieve better results. Our proposed TSNet reaches an AUC of 0.796 and an ACC of 0.592 on the testing set, and the integrated model OTNet reaches an AUC of 0.806 and an ACC of 0.606, which are better than the results of other comparable models. As far as we know, this is the first attempt to use deep learning to classify the severity of atherosclerosis in fundus images. The prediction results of our proposed method are acceptable to doctors, which shows that our method has some application value.


Subject(s)
Algorithms , Arteriosclerosis , Arteriosclerosis/diagnostic imaging , Attention , Fundus Oculi , Humans , Image Processing, Computer-Assisted , Neural Networks, Computer
8.
Comput Methods Programs Biomed ; 208: 106274, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34325376

ABSTRACT

BACKGROUND AND OBJECTIVE: Arteriosclerosis can reflect the severity of hypertension, which is one of the main diseases threatening human life. However, arteriosclerosis retinopathy detection involves costly and time-consuming manual assessment. To meet the urgent need for automation, this paper develops a novel arteriosclerosis retinopathy grading method based on a convolutional neural network. METHODS: First, we propose a scheme for extracting features against the fundus blood vessel background, using image merging for contour enhancement. In this step, adaptive threshold processing is applied to the original image to generate a new contour channel, which is merged with the original three-channel image. Then, we employ a pre-trained convolutional neural network with transfer learning to speed up training, and initialize the contour image channel parameters with Kaiming initialization. Moreover, ArcLoss is applied to increase inter-class differences and intra-class similarity, addressing the high similarity between images of different classes in the dataset. RESULTS: The accuracy of arteriosclerosis retinopathy grading achieved by the proposed method is up to 65.354%, which is nearly 4% higher than that of existing methods. The Kappa of our method is 0.508 in arteriosclerosis retinopathy grading. CONCLUSIONS: An experimental study on multiple metrics demonstrates the superiority of our method, which will be a useful addition to the toolbox for arteriosclerosis retinopathy grading.


Subject(s)
Arteriosclerosis , Retinal Diseases , Automation , Fundus Oculi , Humans , Image Processing, Computer-Assisted , Neural Networks, Computer , Retinal Diseases/diagnostic imaging
9.
BMC Bioinformatics ; 22(1): 136, 2021 Mar 21.
Article in English | MEDLINE | ID: mdl-33745450

ABSTRACT

BACKGROUND: Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are related to many human diseases. Therefore, it is crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it is still challenging to learn efficient low-dimensional representations from high-dimensional features of lncRNAs and diseases in order to predict unknown lncRNA-disease associations accurately. RESULTS: We proposed an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately using the variational expectation maximization algorithm. The combination of VGAE for graph representation learning and alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNA and disease designed for VGAELDA solves a geometric matrix completion problem, capturing efficient low-dimensional representations via a deep learning approach. CONCLUSION: Cross validations and numerical experiments illustrate that VGAELDA outperforms the current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.


Subject(s)
Computational Biology , RNA, Long Noncoding , Algorithms , Humans , Machine Learning , RNA, Long Noncoding/genetics , Software
10.
BMC Bioinformatics ; 20(1): 730, 2019 Dec 23.
Article in English | MEDLINE | ID: mdl-31870282

ABSTRACT

BACKGROUND: Antibiotic resistance has become an increasingly serious problem in the past decades. As an alternative, antimicrobial peptides (AMPs) have attracted a lot of attention. To identify new AMPs, machine learning methods have been commonly used. More recently, some deep learning methods have also been applied to this problem. RESULTS: In this paper, we designed a deep learning model to identify AMP sequences. We employed an embedding layer and a multi-scale convolutional network in our model. The multi-scale convolutional network, which contains multiple convolutional layers of varying filter lengths, can use all the latent features captured by those layers. To further improve performance, we also incorporated additional information into the designed model and proposed a fusion model. Results showed that our model outperforms the state-of-the-art models on two AMP datasets and the Antimicrobial Peptide Database (APD)3 benchmark dataset. The fusion model also outperforms the state-of-the-art model on an anti-inflammatory peptides (AIPs) dataset in terms of accuracy. CONCLUSIONS: The multi-scale convolutional network is a novel addition to existing deep neural network (DNN) models. The proposed DNN model and the modified fusion model outperform the state-of-the-art models for new AMP discovery. The source code and data are available at https://github.com/zhanglabNKU/APIN.


Subject(s)
Neural Networks, Computer , Peptides/chemistry , Humans
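The abstract of entry 10 above describes a multi-scale convolutional network built from convolutional layers of varying filter lengths. The sketch below illustrates that general idea on a toy embedded sequence. The ReLU activation, the max-pooling step, the concatenation of one pooled value per filter, and all names and values are assumptions made for illustration; the abstract does not specify these details, and this is not the authors' APIN implementation.

```python
def conv1d_valid(sequence: list[list[float]],
                 kernel: list[list[float]]) -> list[float]:
    """Slide a kernel (length x channels) over an embedded sequence."""
    width = len(kernel)
    outputs = []
    # range() is empty when the kernel is longer than the sequence.
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(
            x * w
            for row, k_row in zip(window, kernel)
            for x, w in zip(row, k_row)
        )
        outputs.append(max(0.0, total))  # ReLU activation (assumed)
    return outputs


def multi_scale_features(sequence: list[list[float]],
                         kernels: list[list[list[float]]]) -> list[float]:
    """Max-pool each kernel's feature map and concatenate the results."""
    features = []
    for kernel in kernels:
        feature_map = conv1d_valid(sequence, kernel)
        # A kernel longer than the sequence yields no positions; use 0.0.
        features.append(max(feature_map) if feature_map else 0.0)
    return features
```

Each kernel here stands in for one filter of a given length, so passing kernels of lengths 1, 2, 3 and so on yields one pooled feature per scale that a downstream classifier could consume.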
11.
BMC Bioinformatics ; 20(Suppl 16): 589, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787083

ABSTRACT

BACKGROUND: Predicting disease-related genes is helpful for understanding disease pathology and the molecular mechanisms at work during disease progression. However, traditional methods are not suitable for screening genes related to disease development, because some samples in the disease dataset have only weak label information and only a small number of genes are known to be disease-related. RESULTS: In this paper, we designed a disease-related gene mining method based on a weakly supervised learning model. The method has two steps. First, the differentially expressed genes are screened using the weakly supervised learning model, which makes full use of the strong and weak label information at different stages of disease progression. The differentially expressed gene set obtained after the algorithm converges is stable and complete. Then, we screen disease-related genes in this set using a transductive support vector machine based on the difference kernel function. The difference kernel function maps the input space of the original Huntington's disease gene expression dataset to the difference space. In the difference space, the relation between two genes can be evaluated more accurately and the known disease-related gene information can be used effectively. CONCLUSIONS: The experimental results show that the disease-related gene mining method based on the weakly supervised learning model effectively improves the precision of disease-related gene prediction compared with other methods.


Subject(s)
Algorithms , Data Mining , Disease/genetics , Disease Progression , Gene Expression Regulation , Humans , ROC Curve , Support Vector Machine
12.
Front Genet ; 10: 1077, 2019.
Article in English | MEDLINE | ID: mdl-31781160

ABSTRACT

Gene expression profiling has been widely used to characterize cell status to reflect the health of the body, to diagnose genetic diseases, etc. In recent years, although the cost of genome-wide expression profiling is gradually decreasing, the cost of collecting expression profiles for thousands of genes is still very high. Considering gene expressions are usually highly correlated in humans, the expression values of the remaining target genes can be predicted by analyzing the values of 943 landmark genes. Hence, we designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and has stronger interpretability. We tested the performance of XGBoost model on the GEO dataset and RNA-seq dataset and compared the result with other existing models. Experiments showed that the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms existing models and will be a significant contribution to the toolbox for gene expression value prediction.

13.
IEEE/ACM Trans Comput Biol Bioinform ; 16(6): 1948-1957, 2019.
Article in English | MEDLINE | ID: mdl-29993985

ABSTRACT

Recently, non-negative matrix factorization (NMF) has been shown to perform well in the analysis of omics data. NMF assumes that the expression level of one gene is a linear additive composition of metagenes. The elements of the metagene matrix represent the regulation effects and are restricted to be non-negative. However, in biological terms there are two kinds of regulation effects: up-regulation and down-regulation. Few NMF-based methods have taken this into account. Therefore, we designed a flexible non-negative matrix factorization (FNMF) algorithm that further considers the biological meaning of gene expression data. It allows negative numbers in the metagene matrix, where they represent down-regulation effects. We separated gene expression data into disease-driven gene expression and background gene expression. Subsequently, we computed the relative disease-driven expression of each gene and obtained a ranked list of genes. The top-ranked genes are considered to be involved in some disease-related biological processes. Experimental results on two real-world gene expression datasets demonstrate the feasibility and effectiveness of FNMF. Compared with conventional disease-related gene identification algorithms, FNMF has superior performance in analyzing gene expression data of diseases with complex pathology.


Subject(s)
Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Gene Expression Regulation , Huntington Disease/genetics , Algorithms , Animals , Area Under Curve , Diabetes Mellitus, Type 2/metabolism , Disease Progression , Gene Expression Profiling , Genomics , Humans , Huntington Disease/metabolism , Linear Models , Mice , Phenotype , ROC Curve , Reproducibility of Results , Signal Transduction
14.
Genes (Basel) ; 9(7), 2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30002337

ABSTRACT

Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. Since the random initialization of the feature selection matrix in CGUFS results in instability of the final disease-associated gene set, we proposed an ensemble method based on CGUFS, namely ensemble consensus-guided unsupervised feature selection (ECGUFS), to further improve the accuracy of disease-associated gene prediction and the stability of feature gene sets. We also proposed a bagging integration strategy to integrate the results of CGUFS. Lastly, we conducted experiments with Huntington's disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, in which we detected 287 disease-associated genes. Enrichment analysis of these genes showed that the postsynaptic density, the postsynaptic membrane, the synapse, and the cell junction are all affected during the disease's progression. Moreover, ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We classified labeled samples using a linear support vector machine with 10-fold cross-validation. The average accuracy is 0.9, which suggests the effectiveness of the feature gene set.

15.
BMC Bioinformatics ; 18(1): 447, 2017 Oct 11.
Article in English | MEDLINE | ID: mdl-29020921

ABSTRACT

BACKGROUND: Predicting disease-associated genes is helpful for understanding the molecular mechanisms at work during disease progression. Since the pathological mechanisms of neurodegenerative diseases are very complex, traditional statistics-based methods are not suitable for identifying key genes related to disease development. Recent studies have shown that computational models with deep structure can automatically learn the features of biological data, which is useful for exploring the characteristics of gene expression during disease progression. RESULTS: In this paper, we propose a deep learning approach based on the restricted Boltzmann machine, namely the stacked restricted Boltzmann machine (SRBM), to analyze the RNA-seq data of Huntington's disease. Based on the SRBM, we also design a novel framework to screen the key genes during the development of Huntington's disease. In this work, we assume that the effects of regulatory factors can be captured by the hierarchical structure and narrow hidden layers of the SRBM. First, we select disease-associated factors from datasets of different time periods according to the differentially activated neurons in hidden layers. Then, we select disease-associated genes according to the changes in gene energy in the SRBM at different time periods. CONCLUSIONS: The experimental results demonstrate that SRBM can detect the important information for differential analysis of time series gene expression datasets. The identification accuracy of the disease-associated genes is improved to some extent using the novel framework. Moreover, the prediction precision for top-ranking disease-associated genes using SRBM is effectively improved compared with that of state-of-the-art methods.


Subject(s)
Algorithms , Genetic Association Studies , Genetic Predisposition to Disease , Huntington Disease/genetics , Sequence Analysis, RNA , Cluster Analysis , Computer Simulation , Disease Progression , Gene Expression Regulation , Humans , Molecular Sequence Annotation , Neurons/pathology , ROC Curve
16.
PLoS One ; 12(5): e0178006, 2017.
Article in English | MEDLINE | ID: mdl-28542379

ABSTRACT

Detecting disease-related gene modules by analyzing gene expression data is of great significance. It is helpful for exploratory analysis of the interaction mechanisms of genes under complex disease phenotypes. The multi-label propagation algorithm (MLPA) has been widely used in module detection for its fast and easy implementation. The accuracy of MLPA greatly depends on the connections between nodes, and most existing research focuses on measuring the similarity between nodes. However, MLPA does not perform well with loose connections between disease-related genes. Moreover, the biological significance of modules obtained by MLPA has not been demonstrated. To solve these problems, we designed a double label propagation clustering algorithm (DLPCA) based on MLPA to study Huntington's disease. In DLPCA, in addition to category labels, we introduced pathogenic labels to supervise the process of multi-label propagation clustering. The pathogenic labels contain pathogenic information about disease genes and the hierarchical structure of gene expression data. Experimental results demonstrated the superior performance of DLPCA compared with other conventional gene-clustering algorithms.


Subject(s)
Algorithms , Computational Biology/methods , Disease Models, Animal , Gene Regulatory Networks , Huntington Disease/genetics , Animals , Cluster Analysis , Gene Expression Profiling , Genetic Markers , Humans , Mice , Models, Theoretical
17.
Biomed Res Int ; 2016: 3962761, 2016.
Article in English | MEDLINE | ID: mdl-28042568

ABSTRACT

Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods focus on identifying differentially expressed genes between case samples and a control group. These methods may not fully consider the changes in interactions between genes at different cell states or the dynamic processes of gene expression levels during disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes in interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression networks (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the degree of gene differential coexpression according to the change in local topological structures between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression datasets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.


Subject(s)
Disease/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Models, Theoretical , Algorithms , Databases, Genetic , Humans , Oligonucleotide Array Sequence Analysis
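The abstract of entry 17 above ends with a meta-analysis based on the rank-product method. As a rough illustration of that general technique, the sketch below ranks genes within each score table and takes the geometric mean of each gene's ranks. The gene names, the score values, the "higher score ranks first" convention, and the function names are all hypothetical; the abstract does not give the DCGN metrics or how their results were combined.

```python
import math


def rank_genes(scores: dict[str, float]) -> dict[str, int]:
    """Rank genes by descending score; rank 1 is the highest score.

    Ties are broken by insertion order because sorted() is stable.
    """
    ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def rank_product(score_tables: list[dict[str, float]]) -> dict[str, float]:
    """Geometric mean of each gene's rank across all score tables."""
    if not score_tables:
        return {}
    rankings = [rank_genes(table) for table in score_tables]
    # Only genes present in every table can be combined.
    genes = set.intersection(*(set(r) for r in rankings))
    k = len(rankings)
    return {
        gene: math.exp(sum(math.log(r[gene]) for r in rankings) / k)
        for gene in genes
    }


# Hypothetical scores from two differential coexpression metrics.
metric_a = {"g1": 0.9, "g2": 0.5, "g3": 0.1}
metric_b = {"g1": 0.7, "g2": 0.8, "g3": 0.2}
combined = rank_product([metric_a, metric_b])
```

A smaller rank product means a gene ranks consistently near the top across tables, so sorting `combined` in ascending order gives the consolidated list.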