Results 1 - 20 of 271
1.
Interdiscip Sci ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38573456

ABSTRACT

Autism Spectrum Disorder (ASD) is defined as a neurodevelopmental condition distinguished by unconventional neural activities. Early intervention is key to managing the progress of ASD, and current research primarily focuses on the use of structural magnetic resonance imaging (sMRI) or resting-state functional magnetic resonance imaging (rs-fMRI) for diagnosis. Moreover, the use of autoencoders for disease classification has not been sufficiently explored. In this study, we introduce a new autoencoder-based framework, the Deep Canonical Correlation Fusion algorithm based on Denoising Autoencoder (DCCF-DAE), which proves to be effective in handling high-dimensional data. This framework involves efficient feature extraction from different types of data with an advanced autoencoder, followed by the fusion of these features through the DCCF model. We then use the fused features for disease classification. DCCF integrates functional and structural data to help accurately diagnose ASD and identify critical Regions of Interest (ROIs) in disease mechanisms. We compare the proposed framework with other methods on the Autism Brain Imaging Data Exchange (ABIDE) database, and the results demonstrate its outstanding performance in ASD diagnosis. The superiority of DCCF-DAE highlights its potential as a crucial tool for early ASD diagnosis and monitoring.

2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517696

ABSTRACT

With the rapid development of single-molecule sequencing (SMS) technologies, the output read length is continuously increasing. Mapping such reads onto a reference genome is one of the most fundamental tasks in sequence analysis. Mapping sensitivity is becoming a major concern since high sensitivity can detect more aligned regions on the reference and obtain more aligned bases, which are useful for downstream analysis. In this study, we present pathMap, a novel k-mer graph-based mapper that is specifically designed for mapping SMS reads with high sensitivity. By viewing the alignment chain as a path containing as many anchors as possible in the matched k-mer graph, pathMap treats chaining as a path selection problem in the directed graph. pathMap iteratively searches for the longest path among the remaining nodes, so that more high-quality candidate chains can be effectively detected and aligned. Compared to other state-of-the-art mapping methods such as minimap2 and Winnowmap2, experimental results on simulated and real-life datasets demonstrate that pathMap obtains at least 11.50% more mapped chains than its closest competitor and increases the mapping sensitivity by 17.28% and 13.84% of bases over the next-best mapper for Pacific Biosciences and Oxford Nanopore sequencing data, respectively. In addition, pathMap is more robust to sequence errors and more sensitive to species- and strain-specific identification of pathogens using MinION reads.


Subject(s)
High-Throughput Nucleotide Sequencing , Nanopore Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Genome , Software , Algorithms
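
The chaining idea in the pathMap abstract above (treat anchors as nodes of a directed graph, take the longest path, then repeat on the remaining nodes) can be illustrated with a minimal sketch. This is not the pathMap implementation: the anchor representation as `(read_pos, ref_pos)` pairs, the plain co-linearity rule used for edges, and the names `longest_chain`, `iterative_chains`, and `min_len` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical anchor type: (position in read, position in reference) of a matched k-mer.
Anchor = Tuple[int, int]


def longest_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Longest co-linear chain of anchors, i.e. the longest path in the DAG
    whose edges go from anchor a to anchor b when b lies strictly after a
    in both the read and the reference (an assumed, simplified edge rule)."""
    if not anchors:
        return []
    nodes = sorted(set(anchors))      # sorting by read position gives a topological order
    best = [1] * len(nodes)           # best[j]: length of the longest chain ending at node j
    prev = [-1] * len(nodes)          # back-pointers for path recovery
    for j, (rj, gj) in enumerate(nodes):
        for i in range(j):
            ri, gi = nodes[i]
            if ri < rj and gi < gj and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(len(nodes)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(nodes[end])
        end = prev[end]
    return chain[::-1]


def iterative_chains(anchors: List[Anchor], min_len: int = 2) -> List[List[Anchor]]:
    """Repeatedly extract the longest chain from the anchors not yet used."""
    remaining = set(anchors)
    chains = []
    while remaining:
        chain = longest_chain(list(remaining))
        if len(chain) < min_len:
            break
        chains.append(chain)
        remaining -= set(chain)
    return chains
```

The quadratic dynamic program is chosen for readability; a real mapper would need a scoring function with gap penalties and a faster search, neither of which is described in the abstract.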
3.
Comput Biol Med ; 171: 108153, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38364660

ABSTRACT

Cervical cytology image classification is of great significance to cervical cancer diagnosis and prognosis. Recently, convolutional neural networks (CNNs) and visual transformers have been adopted as two branches to learn features for image classification by simply adding local and global features. However, such simple addition may not integrate these features effectively. In this study, we explore the synergy of local and global features of cytology images in classification tasks. Specifically, we design a Deep Integrated Feature Fusion (DIFF) block to synergize local and global features of cytology images from a CNN branch and a transformer branch. Our proposed method is evaluated on three cervical cell image datasets (SIPaKMeD, CRIC, Herlev) and another large blood cell dataset, BCCD, for several multi-class and binary classification tasks. Experimental results demonstrate the effectiveness of the proposed method in cervical cell classification, which could assist medical specialists in better diagnosing cervical cancer.


Subject(s)
Uterine Cervical Neoplasms , Female , Humans , Learning , Neural Networks, Computer , Image Processing, Computer-Assisted
4.
IEEE Trans Med Imaging ; 43(4): 1554-1567, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38096101

ABSTRACT

The short frames of low-count positron emission tomography (PET) images generally cause high levels of statistical noise. Thus, improving the quality of low-count images by using image postprocessing algorithms to achieve better clinical diagnoses has attracted widespread attention in the medical imaging community. Most existing deep learning-based low-count PET image enhancement methods have achieved satisfying results; however, few of them focus on denoising low-count PET images with the magnetic resonance (MR) image modality as guidance. The prior context features contained in MR images can provide abundant and complementary information for single low-count PET image denoising, especially in ultralow-count (2.5%) cases. To this end, we propose a novel two-stream dual PET/MR cross-modal interactive fusion network with an optical flow pre-alignment module, namely, OIF-Net. Specifically, the learnable optical flow registration module enables the spatial manipulation of MR imaging inputs within the network without any extra training supervision. Registered MR images fundamentally solve the problem of feature misalignment in the multimodal fusion stage, which greatly benefits the subsequent denoising process. In addition, we design a spatial-channel feature enhancement module (SC-FEM) that considers the interactive impacts of multiple modalities and provides additional information flexibility in both the spatial and channel dimensions. Furthermore, instead of simply concatenating two extracted features from these two modalities as an intermediate fusion method, the proposed cross-modal feature fusion module (CM-FFM) adopts cross-attention at multiple feature levels and greatly improves the two modalities' feature fusion procedure. Extensive experimental assessments conducted on real clinical datasets, as well as an independent clinical testing dataset, demonstrate that the proposed OIF-Net outperforms the state-of-the-art methods.


Subject(s)
Image Processing, Computer-Assisted , Optic Flow , Image Processing, Computer-Assisted/methods , Positron-Emission Tomography/methods , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging
5.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38058196

ABSTRACT

MOTIVATION: Longer reads produced by PacBio or Oxford Nanopore sequencers could more frequently span the breakpoints of structural variations (SVs) than shorter reads. Therefore, existing long-read mapping methods often generate wrong alignments and variant calls. Compared to deletions and insertions, inversion events are more difficult to detect since the anchors in inversion regions are nonlinear to those in SV-free regions. To address this issue, this study presents a novel long-read mapping algorithm (named invMap). RESULTS: For each long noisy read, invMap first locates the aligned region with a specifically designed scoring method for chaining, then checks the remaining anchors in the aligned region to discover potential inversions. We benchmark invMap on simulated datasets across different genomes and sequencing coverages. Experimental results demonstrate that invMap locates aligned regions and calls SVs for inversions more accurately than the competing methods. The real human genome sequencing dataset of NA12878 illustrates that invMap can effectively find more candidate variant calls for inversions than the competing methods. AVAILABILITY AND IMPLEMENTATION: The invMap software is available at https://github.com/zhang134/invMap.git.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Humans , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Genome, Human , Chromosome Inversion , Sequence Analysis, DNA/methods
6.
PLoS One ; 18(11): e0294772, 2023.
Article in English | MEDLINE | ID: mdl-38019798

ABSTRACT

Alzheimer's disease (AD) is a common neurodegenerative disease with complex pathogenesis, and approved drugs can only alleviate symptoms of AD for a period of time. Traditional Chinese medicine (TCM) contains multiple active ingredients that can act on multiple targets simultaneously. In this paper, a novel algorithm based on entropy and random walk with restart on a heterogeneous network (RWRHE) is proposed for predicting active ingredients for AD and screening out the effective TCMs for AD. First, six TCM compounds containing 20 herbs are collected from the AD drug reviews in the CNKI (China National Knowledge Infrastructure), and their active ingredients and targets are retrieved from different databases. Then, comprehensive similarity networks of active ingredients and targets are constructed based on different aspects and entropy weight, respectively. A comprehensive heterogeneous network is constructed by integrating the known active ingredient-target association information and the two comprehensive similarity networks. Subsequently, bi-random walks are applied on the heterogeneous network to predict active ingredient-target associations. AD-related targets are selected as the seed nodes, a random walk is carried out on the target similarity network to predict the AD-target associations, and the associations of AD-active ingredients are inferred and scored. The effective herbs and compounds for AD are screened out based on their active ingredients' scores. The results measured by machine learning and bioinformatics show that the RWRHE algorithm achieves better prediction accuracy and that the top 15 active ingredients may act as multi-target agents in the prevention and treatment of AD. Danshen, Gouteng and Chaihu are recommended as effective TCMs for AD, and Yiqitongyutang is recommended as an effective compound for AD.


Subject(s)
Alzheimer Disease , Drugs, Chinese Herbal , Neurodegenerative Diseases , Humans , Medicine, Chinese Traditional , Alzheimer Disease/drug therapy , Entropy , Network Pharmacology , Neurodegenerative Diseases/drug therapy , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use , Molecular Docking Simulation
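
The RWRHE abstract above builds on random walk with restart from seed nodes. A minimal sketch of the standard single-network version follows, using the usual update p ← (1 − r)·W·p + r·p0 with a column-normalized adjacency matrix W. It is not the RWRHE algorithm itself: the heterogeneous network, the entropy weighting, and the bi-random walk are not reproduced, and the function name, the restart value 0.3, and the toy graph are assumptions.

```python
from typing import List, Sequence


def random_walk_with_restart(
    adj: Sequence[Sequence[float]],
    seeds: Sequence[int],
    restart: float = 0.3,      # assumed restart probability, not taken from the paper
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Steady-state visiting probabilities of a walker that, at every step,
    jumps back to the seed nodes with probability `restart`."""
    n = len(adj)
    # Column-normalize so that each column of W sums to 1 (or 0 for isolated nodes).
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Initial distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy example: a path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds=[0])
```

In this toy graph the scores decrease with distance from the seed, which is the property that lets the steady-state vector rank candidate nodes by proximity to the seed set.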
7.
Comput Biol Med ; 165: 107473, 2023 10.
Article in English | MEDLINE | ID: mdl-37690288

ABSTRACT

BACKGROUND: Synchrotron radiation computed tomography (SR-CT) holds promise for high-resolution in vivo imaging. Notably, the reconstruction of SR-CT images requires a large set of data to be captured with sufficient photons from multiple angles, resulting in a high radiation dose received by the object. Reducing the number of projections and/or the photon flux is a straightforward means to lessen the radiation dose; however, it compromises data completeness, thus introducing noise and artifacts. Deep learning (DL)-based supervised methods effectively denoise and remove artifacts, but they heavily depend on high-quality paired data acquired at high doses. Although algorithms exist for training without high-quality references, they struggle to effectively eliminate persistent artifacts present in real-world data. METHODS: This work presents a novel low-dose imaging strategy, namely Sparse2Noise, which combines the reconstruction data from a paired sparse-view CT scan (normal-flux) and full-view CT scan (low-flux) using a convolutional neural network (CNN). Sparse2Noise does not require high-quality reconstructed data as references and allows for fresh training on very small datasets. Sparse2Noise was evaluated on both simulated and experimental data. RESULTS: Sparse2Noise effectively reduces noise and ring artifacts while maintaining high image quality, outperforming state-of-the-art image denoising methods at the same dose levels. Furthermore, Sparse2Noise produces impressively high image quality for ex vivo rat hindlimb imaging with an acceptably low radiation dose (i.e., 0.5 Gy with an isotropic voxel size of 26 µm). CONCLUSIONS: This work represents a significant advance towards in vivo SR-CT imaging. It is noteworthy that Sparse2Noise can also be used for denoising in conventional CT and/or phase-contrast CT.


Subject(s)
Synchrotrons , Tomography, X-Ray Computed , Animals , Rats , Photons , Algorithms , Artifacts
8.
Methods ; 216: 21-38, 2023 08.
Article in English | MEDLINE | ID: mdl-37315825

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) data contain a large number of zeros. Such dropout events impede downstream data analyses. We propose BayesImpute to infer and impute dropouts from scRNA-seq data. Using the expression rate and coefficient of variation of the genes within the cell subpopulation, BayesImpute first determines likely dropouts, and then constructs the posterior distribution for each gene and uses the posterior mean to impute dropout values. Experiments on simulated and real data show that BayesImpute can effectively identify dropout events and reduce the introduction of false positive signals. Additionally, BayesImpute successfully recovers the true expression levels of missing values, restores the gene-to-gene and cell-to-cell correlation coefficients, and maintains the biological information in bulk RNA-seq data. Furthermore, BayesImpute boosts the clustering and visualization of cell subpopulations and improves the identification of differentially expressed genes. We further demonstrate that, in comparison to other statistical-based imputation methods, BayesImpute is scalable and fast with minimal memory usage.


Subject(s)
Single-Cell Gene Expression Analysis , Software , Sequence Analysis, RNA/methods , Bayes Theorem , Single-Cell Analysis/methods , Probability , Gene Expression Profiling
9.
Micromachines (Basel) ; 14(6)2023 May 31.
Article in English | MEDLINE | ID: mdl-37374757

ABSTRACT

Developing standard optical biosensors requires a time-consuming simulation procedure. Machine learning might be a better solution for reducing that time and effort. Effective indices, core power, total power, and effective area are the most crucial parameters for evaluating optical sensors. In this study, several machine learning (ML) approaches have been applied to predict those parameters while considering the core radius, cladding radius, pitch, analyte, and wavelength as the input vectors. We have utilized least squares (LS), LASSO, Elastic-Net (ENet), and Bayesian ridge regression (BRR) to make a comparative discussion using a balanced dataset obtained with the COMSOL Multiphysics simulation tool. Furthermore, a more extensive analysis of sensitivity, power fraction, and confinement loss is also demonstrated using the predicted and simulated data. The suggested models were also examined in terms of R² score, mean absolute error (MAE), and mean squared error (MSE), with all of the models having an R² score of more than 0.99, and it was also shown that optical biosensors had a design error rate of less than 3%. This research might pave the way for machine learning-based optimization approaches to be used to improve optical biosensors.
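
The three evaluation metrics named in the abstract above have standard definitions, sketched here in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions; no data from the paper are used.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Coefficient of determination (R^2), mean absolute error and mean squared error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "mse": ss_res / n,
    }


# Made-up values standing in for simulated (true) and predicted sensor parameters.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

An R² close to 1 means the residual error is small relative to the variance of the target, which is how a threshold such as 0.99 should be read.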

10.
Neural Netw ; 165: 553-561, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37354807

ABSTRACT

Liver disease is a potentially asymptomatic clinical entity that may progress to patient death. This study proposes a multi-modal deep neural network for multi-class malignant liver diagnosis. In parallel with the portal venous computed tomography (CT) scans, pathology data is utilized to prognosticate primary liver cancer variants and metastasis. The processed CT scans are fed to the deep dilated convolution neural network to explore salient features. The residual connections are further added to address vanishing gradient problems. Correspondingly, five pathological features are learned using a wide and deep network that gives a benefit of memorization with generalization. The down-scaled hierarchical features from CT scan and pathology data are concatenated to pass through fully connected layers for classification between liver cancer variants. In addition, the transfer learning of pre-trained deep dilated convolution layers assists in handling insufficient and imbalanced dataset issues. The fine-tuned network can predict three-class liver cancer variants with an average accuracy of 96.06% and an Area Under Curve (AUC) of 0.832. To the best of our knowledge, this is the first study to classify liver cancer variants by integrating pathology and image data, hence following the medical perspective of malignant liver diagnosis. The comparative analysis on the benchmark dataset shows that the proposed multi-modal neural network outperformed most of the liver diagnostic studies and is comparable to others.


Subject(s)
Deep Learning , Liver Neoplasms , Humans , Neural Networks, Computer , Liver Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted/methods
11.
Front Genet ; 14: 1133775, 2023.
Article in English | MEDLINE | ID: mdl-37144127

ABSTRACT

Introduction: The physical interactions between enhancers and promoters are often involved in gene transcriptional regulation. High tissue-specific enhancer-promoter interactions (EPIs) are responsible for the differential expression of genes. Experimental methods are time-consuming and labor-intensive in measuring EPIs. An alternative approach, machine learning, has been widely used to predict EPIs. However, most existing machine learning methods require a large number of functional genomic and epigenomic features as input, which limits the application to different cell lines. Methods: In this paper, we developed a random forest model, HARD (H3K27ac, ATAC-seq, RAD21, and Distance), to predict EPI using only four types of features. Results: Independent tests on a benchmark dataset showed that HARD outperforms other models with the fewest features. Discussion: Our results revealed that chromatin accessibility and the binding of cohesin are important for cell-line-specific EPIs. Furthermore, we trained the HARD model in the GM12878 cell line and performed testing in the HeLa cell line. The cross-cell-lines prediction also performs well, suggesting it has the potential to be applied to other cell lines.

12.
IEEE J Biomed Health Inform ; 27(6): 2864-2875, 2023 06.
Article in English | MEDLINE | ID: mdl-37030746

ABSTRACT

The axial field of view (FOV) is a key factor that affects the quality of PET images. Due to hardware FOV restrictions, conventional short-axis PET scanners with FOVs of 20 to 35 cm can acquire only low-quality PET (LQ-PET) images in fast scanning times (2-3 minutes). To overcome hardware restrictions and improve PET image quality for better clinical diagnoses, several deep learning-based algorithms have been proposed. However, these approaches use simple convolution layers with residual learning and local attention, which insufficiently extract and fuse long-range contextual information. To this end, we propose a novel two-branch network architecture with swin transformer units and graph convolution operation, namely SW-GCN. The proposed SW-GCN provides additional spatial- and channel-wise flexibility to handle different types of input information flow. Specifically, considering the high computational cost of calculating self-attention weights in full-size PET images, in our designed spatial adaptive branch, we apply the self-attention mechanism within each local partition window and introduce global information interactions between nonoverlapping windows by shifting operations to avoid this cost. In addition, the convolutional network structure considers the information in each channel equally during the feature extraction process. In our designed channel adaptive branch, we use a Watts-Strogatz topology structure to connect each feature map to only its most relevant features in each graph convolutional layer, substantially reducing information redundancy. Moreover, ensemble learning is adopted in our SW-GCN for mapping distinct features from the two well-designed branches to the enhanced PET images. We carried out extensive experiments on three single-bed position scans for 386 patients. The test results demonstrate that our proposed SW-GCN approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations.


Subject(s)
Algorithms , Neural Networks, Computer , Humans , Electric Power Supplies , Positron-Emission Tomography
13.
Brief Funct Genomics ; 22(5): 411-419, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37118891

ABSTRACT

Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding the function of cyclins. However, due to the low similarity between cyclin protein sequences, a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. Then the features were ranked and selected with maximum relevance-maximum distance (MRMD) versions MRMD1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset in terms of both dimensionality and performance. Therefore, our work provides a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.


Subject(s)
Cyclins , Proteins , Cyclins/genetics , Cyclins/metabolism , Amino Acid Sequence , Cyclin-Dependent Kinases/metabolism , Cell Cycle
14.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2136-2146, 2023.
Article in English | MEDLINE | ID: mdl-37018561

ABSTRACT

Biomolecules, microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), play critical roles in diverse fundamental and vital biological processes. They can serve as disease biomarkers as their dysregulations could cause complex human diseases. Identifying those biomarkers is helpful with the diagnosis, treatment, prognosis, and prevention of diseases. In this study, we propose a factorization machine-based deep neural network with binary pairwise encoding, DFMbpe, to identify the disease-related biomarkers. First, to comprehensively consider the interdependence of features, a binary pairwise encoding method is designed to obtain the raw feature representations for each biomarker-disease pair. Second, the raw features are mapped into their corresponding embedding vectors. Then, the factorization machine is conducted to get the wide low-order feature interdependence, while the deep neural network is applied to obtain the deep high-order feature interdependence. Finally, two kinds of features are combined to get the final prediction results. Unlike other biomarker identification models, the binary pairwise encoding considers the interdependence of features even though they never appear in the same sample, and the DFMbpe architecture emphasizes both low-order and high-order feature interactions simultaneously. The experimental results show that DFMbpe greatly outperforms the state-of-the-art identification models on both cross-validation and independent dataset evaluation. Besides, three types of case studies further demonstrate the effectiveness of this model.


Subject(s)
MicroRNAs , RNA, Long Noncoding , Humans , Neural Networks, Computer , Computational Biology/methods
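
The low-order interaction term that the DFMbpe abstract above attributes to the factorization machine can be illustrated with the standard second-order model, ŷ = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below shows that general model only, with the usual linear-time rewrite of the pairwise sum. It says nothing about the DFMbpe architecture, its binary pairwise encoding, or its deep component, and all names and numbers are illustrative assumptions.

```python
from typing import Sequence


def fm_pairwise_naive(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j], computed pair by pair."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Same quantity via 0.5 * sum_f [(sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2]."""
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine output for one feature vector x."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, v)


# Toy input: three features, two latent factors per feature.
x_demo = [1.0, 0.0, 1.0]
v_demo = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
```

Because each feature has its own latent vector, the model can score an interaction between two features that never co-occur in a training sample, which matches the motivation stated in the abstract.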
15.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36821425

ABSTRACT

MOTIVATION: Integration of growing single-cell RNA sequencing datasets helps better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneities. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeable ad hoc trade-off between batch mixing and preservation of cellular heterogeneities (mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration. RESULTS: We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contributions of CLAIRE are two complementary strategies, a construction strategy and a refinement strategy, which improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs for the whole distribution of shared cell types between batches. The refinement strategy aims to automatically reduce the potential false positive pairs from the construction strategy by relying on the memory effect of deep neural networks. We demonstrate that CLAIRE possesses a superior mix-heterogeneity trade-off over existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance against eight state-of-the-art methods. Finally, comprehensive experiments are conducted to validate the effectiveness of CLAIRE. AVAILABILITY AND IMPLEMENTATION: The source code and data used in this study can be found at https://github.com/CSUBioGroup/CLAIRE-release. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Single-Cell Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Neural Networks, Computer , Cluster Analysis
16.
Article in English | MEDLINE | ID: mdl-34914594

ABSTRACT

Prediction of drug-target interactions (DTIs) plays a significant role in drug development and drug discovery. Although this task requires a large investment in terms of time and cost, especially when it is performed experimentally, the results are not necessarily significant. Computational DTI prediction is a shortcut to reduce the risks of experimental methods. In this study, we propose an effective approach of nonnegative matrix tri-factorization, referred to as NMTF-DTI, to predict the interaction scores between drugs and targets. NMTF-DTI utilizes multiple kernels (similarity measures) for drugs and targets and Laplacian regularization to boost the prediction performance. The performance of NMTF-DTI is evaluated via cross-validation and is compared with existing DTI prediction methods in terms of the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision and recall curve (AUPR). We evaluate our method on four gold standard datasets, comparing to other state-of-the-art methods. Cross-validation and a separate, manually created dataset are used to set parameters. The results show that NMTF-DTI outperforms other competing methods. Moreover, the results of a case study also confirm the superiority of NMTF-DTI.


Subject(s)
Algorithms , Drug Development , Drug Discovery/methods , Drug Interactions , ROC Curve
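
The two evaluation measures used in the NMTF-DTI abstract above, AUC and AUPR, can be computed from predicted scores and binary labels. The sketch below uses the pairwise-comparison form of AUC and takes average precision as a common estimator of the area under the precision-recall curve. Whether the paper computes AUPR this way is an assumption, and the function names and toy values are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive.
    Tied scores are ranked in input order, which is a simplification."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy predictions for five drug-target pairs (1 = known interaction).
labels_demo = [1, 0, 1, 0, 0]
scores_demo = [0.9, 0.8, 0.7, 0.3, 0.1]
```

AUPR is usually reported alongside AUC in this setting because known interactions are rare, and precision-recall measures are more sensitive to how the few positives are ranked.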
17.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36458923

ABSTRACT

MOTIVATION: Protein essentiality is usually accepted to be a conditional trait and strongly affected by cellular environments. However, existing computational methods often do not take such characteristics into account, preferring to incorporate all available data and train a general model for all cell lines. In addition, the lack of model interpretability limits further exploration and analysis of essential protein predictions. RESULTS: In this study, we proposed DeepCellEss, a sequence-based interpretable deep learning framework for cell line-specific essential protein predictions. DeepCellEss utilizes a convolutional neural network and bidirectional long short-term memory to learn short- and long-range latent information from protein sequences. Further, a multi-head self-attention mechanism is used to provide residue-level model interpretability. For model construction, we collected extremely large-scale benchmark datasets across 323 cell lines. Extensive computational experiments demonstrate that DeepCellEss yields effective prediction performance for different cell lines and outperforms existing sequence-based methods as well as network-based centrality measures. Finally, we conducted some case studies to illustrate the necessity of considering specific cell lines and the superiority of DeepCellEss. We believe that DeepCellEss can serve as a useful tool for predicting essential proteins across different cell lines. AVAILABILITY AND IMPLEMENTATION: The DeepCellEss web server is available at http://csuligroup.com:8000/DeepCellEss. The source code and data underlying this study can be obtained from https://github.com/CSUBioGroup/DeepCellEss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Proteins/metabolism , Amino Acid Sequence , Software , Cell Line , Computational Biology/methods
18.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36326080

ABSTRACT

Anticancer peptides (ACPs) are bioactive peptides with antitumor activity and have become the most promising drugs in the treatment of cancer. Therefore, the accurate prediction of ACPs is of great significance to cancer research. In this paper, we developed a more efficient prediction model called ACP_MS. First, the monoMonoKGap method is used to extract the characteristics of anticancer peptide sequences and form the digital features. Then, the AdaBoost model is used to select the most discriminating features from the digital features. Finally, a stochastic gradient descent algorithm is introduced to identify anticancer peptide sequences. We adopt 7-fold cross-validation and independent test set validation, and the final accuracy on the main dataset reached 92.653% and 91.597%, respectively. The accuracy on the alternate dataset reached 98.678% and 98.317%, respectively. Compared with other advanced prediction models, the ACP_MS model improves the identification of anticancer peptide sequences. The data of this model can be downloaded for free from https://github.com/Zhoucaimao1998/Zc.


Subject(s)
Neoplasms , Peptides , Humans , Amino Acid Sequence , Neoplasms/drug therapy , Algorithms
19.
Front Pharmacol ; 13: 1031759, 2022.
Article in English | MEDLINE | ID: mdl-36299898

ABSTRACT

DNA-binding proteins (DBPs) play an essential role in the genetics and evolution of organisms. A particular DNA sequence could provide underlying therapeutic benefits for hereditary diseases and cancers. Studying these proteins in a timely and effective manner helps clarify their mechanisms and can contribute to disease prevention and treatment. Identifying DNA-binding proteins from sequence databases is time-consuming, costly, and ineffective. Therefore, efficient methods for improving DBP classification are crucial to disease research. In this paper, we developed a novel predictor, Hybrid_DBP, which identifies potential DBPs by using hybrid features and convolutional neural networks. The method combines two feature selection methods, MonoDiKGap and Kmer, and then uses MRMD2.0 to remove redundant features. According to the results, 94% of DBPs were correctly recognized, and the accuracy on the independent test set reached 91.2%. This suggests that Hybrid_DBP can become a useful tool for predicting DBPs.

20.
PLoS One ; 17(7): e0270852, 2022.
Article in English | MEDLINE | ID: mdl-35862409

ABSTRACT

Computational drug repositioning aims to identify potential applications of existing drugs for the treatment of diseases for which they were not designed. This approach can considerably accelerate the traditional drug discovery process by decreasing the required time and costs of drug development. Tensor decomposition enables us to integrate multiple drug- and disease-related data to boost the performance of prediction. In this study, a nonnegative tensor decomposition for drug repositioning, NTD-DR, is proposed. In order to capture the hidden information in drug-target, drug-disease, and target-disease networks, NTD-DR uses these pairwise associations to construct a three-dimensional tensor representing drug-target-disease triplet associations and integrates them with similarity information of drugs, targets, and disease to make a prediction. We compare NTD-DR with recent state-of-the-art methods in terms of the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision and recall curve (AUPR) and find that our method outperforms competing methods. Moreover, case studies with five diseases also confirm the reliability of predictions made by NTD-DR. Our proposed method identifies more known associations among the top 50 predictions than other methods. In addition, novel associations identified by NTD-DR are validated by literature analyses.


Subject(s)
Computational Biology , Drug Repositioning , Algorithms , Computational Biology/methods , Drug Discovery/methods , Drug Repositioning/methods , ROC Curve , Reproducibility of Results