1.
BMC Genomics ; 25(1): 411, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724911

ABSTRACT

BACKGROUND: In recent years, there has been a growing interest in utilizing computational approaches to predict drug-target binding affinity, aiming to expedite the early drug discovery process. To address the limitations of experimental methods, such as cost and time, several machine learning-based techniques have been developed. However, these methods encounter certain challenges, including the limited availability of training data, reliance on human intervention for feature selection and engineering, and a lack of validation approaches for robust evaluation in real-life applications. RESULTS: To mitigate these limitations, in this study, we propose a method for drug-target binding affinity prediction based on deep convolutional generative adversarial networks. Additionally, we conducted a series of validation experiments and implemented adversarial control experiments using straw models. These experiments serve to demonstrate the robustness and efficacy of our predictive models. We conducted a comprehensive evaluation of our method by comparing it to baselines and state-of-the-art methods. Two recently updated datasets, namely the BindingDB and PDBBind, were used for this purpose. Our findings indicate that our method outperforms the alternative methods in terms of three performance measures when using warm-start data splitting settings. Moreover, when considering physicochemical-based cold-start data splitting settings, our method demonstrates superior predictive performance, particularly in terms of the concordance index. CONCLUSION: The results of our study affirm the practical value of our method and its superiority over alternative approaches in predicting drug-target binding affinity across multiple validation sets. This highlights the potential of our approach in accelerating drug repurposing efforts, facilitating novel drug discovery, and ultimately enhancing disease treatment. 
The data and source code for this study were deposited in the GitHub repository, https://github.com/mojtabaze7/DCGAN-DTA . Furthermore, the web server for our method is accessible at https://dcgan.shinyapps.io/bindingaffinity/ .
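
The abstract above reports performance in terms of the concordance index. As a rough illustration of what that measure computes, the following is a minimal sketch of the standard pairwise definition; the function name and the toy affinity values are assumptions made for illustration and are not taken from the DCGAN-DTA code or data.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # tied true affinities are not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy values (not real binding affinities).
affinities = [5.0, 6.2, 7.1, 8.4]
predictions = [5.3, 6.0, 7.5, 7.2]
ci = concordance_index(affinities, predictions)
print(round(ci, 4))
```

In this toy case five of the six comparable pairs are ordered correctly, so the index is 5/6. A value of 1.0 means every pair is ranked correctly, and 0.5 corresponds to random ordering.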


Subject(s)
Drug Discovery , Drug Discovery/methods , Computational Biology/methods , Humans , Neural Networks, Computer , Protein Binding , Machine Learning
2.
J Transl Med ; 21(1): 727, 2023 10 16.
Article in English | MEDLINE | ID: mdl-37845681

ABSTRACT

This paper addresses the crucial task of identifying DNA/RNA binding sites, which has implications in drug/vaccine design, protein engineering, and cancer research. Existing methods utilize complex neural network structures, diverse input types, and machine learning techniques for feature extraction. However, the growing volume of sequences poses processing challenges. This study introduces KDeep, employing a CNN-LSTM architecture with a novel encoding method called 2Lk. 2Lk enhances prediction accuracy, reduces memory consumption by up to 84%, reduces trainable parameters, and improves interpretability by approximately 79% compared to state-of-the-art approaches. KDeep offers a promising solution for accurate and efficient binding site prediction.


Subject(s)
DNA , Neural Networks, Computer , Protein Binding , Binding Sites , Transcription Factors
3.
Commun Biol ; 6(1): 492, 2023 05 05.
Article in English | MEDLINE | ID: mdl-37147498

ABSTRACT

The Major Histocompatibility Complex (MHC) binds to peptides derived from pathogens to present them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on separate feature extraction from the peptide and MHC sequences and ignore their pairwise binding information. This paper develops a capsule neural network-based method to efficiently capture the peptide-MHC complex features to predict the peptide-MHC class I binding. Various evaluations confirmed that our method outperforms the alternative methods, while it can provide accurate prediction with less available data. Moreover, to provide precise insights into the results, we explored the essential features that contributed to the prediction. Since the simulation results demonstrated consistency with the experimental studies, we concluded that our method can be utilized for the accurate, rapid, and interpretable peptide-MHC binding prediction to assist biological therapies.


Subject(s)
Algorithms , Peptides , Peptides/metabolism , Proteins , Neural Networks, Computer , Major Histocompatibility Complex , Histocompatibility Antigens
4.
BMC Genomics ; 24(1): 266, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37202721

ABSTRACT

BACKGROUND: The prevalence of the COVID-19 disease in recent years and its widespread impact on mortality, as well as various aspects of life around the world, has made it important to study this disease and its viral cause. However, very long sequences of this virus increase the processing time, complexity of calculation, and memory consumption required by the available tools to compare and analyze the sequences. RESULTS: We present a new encoding method, named PC-mer, based on the k-mer and physicochemical properties of nucleotides. This method minimizes the size of encoded data by around 2^k times compared to the classical k-mer based profiling method. Moreover, using PC-mer, we designed two tools: 1) a machine-learning-based classification tool for coronavirus family members with the ability to receive input sequences from the NCBI database, and 2) an alignment-free computational comparison tool for calculating dissimilarity scores between coronaviruses at the genus and species levels. CONCLUSIONS: PC-mer achieves 100% accuracy despite the use of very simple classification algorithms based on Machine Learning. Assuming dynamic programming-based pairwise alignment as the ground truth approach, we achieved a degree of convergence of more than 98% for coronavirus genus-level sequences and 93% for SARS-CoV-2 sequences using PC-mer in the alignment-free classification method. This outperformance of PC-mer suggests that it can serve as a replacement for alignment-based approaches in certain sequence analysis applications that rely on similarity/dissimilarity scores, such as searching sequences, comparing sequences, and certain types of phylogenetic analysis methods that are based on sequence comparison.
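
For context on the baseline the abstract above compares against, the following is a minimal sketch of a classical k-mer frequency profile and an alignment-free dissimilarity score. It is not the PC-mer encoding, whose details are not given in the abstract; the function names, the Euclidean distance, and the toy sequences are assumptions chosen for illustration.

```python
from itertools import product
from math import sqrt


def kmer_profile(sequence, k):
    """Return the normalized frequency vector over all 4**k DNA k-mers."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip windows with ambiguous bases such as N
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def euclidean(u, v):
    """Euclidean distance between two equally long profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy sequences differing by a single base.
profile_a = kmer_profile("ACGTACGTAC", 2)
profile_b = kmer_profile("ACGTTCGTAC", 2)
dissimilarity = euclidean(profile_a, profile_b)
print(len(profile_a), dissimilarity > 0)
```

The profile length is 4^k regardless of sequence length, which is why reducing the feature count by a factor that grows with k matters for long genomes.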


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Phylogeny , Sequence Analysis, DNA , Nucleotides/genetics , Base Sequence , Algorithms
5.
BMC Genomics ; 24(1): 227, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37127578

ABSTRACT

BACKGROUND: It is now possible to analyze cellular heterogeneity at the single-cell level thanks to the rapid developments in single-cell sequencing technologies. The clustering of cells is a fundamental and common step in heterogeneity analysis. Even so, accurate cell clustering remains a challenge due to the high levels of noise, the high dimensions, and the high sparsity of data. RESULTS: Here, we present SCEA, a clustering approach for scRNA-seq data. Using two consecutive units, an encoder based on MLP and a graph attention auto-encoder, to obtain cell embedding and gene embedding, SCEA can simultaneously achieve cell low-dimensional representation and clustering. Having performed various examinations to obtain the optimal value for each parameter, we present the result in its most optimal form. To evaluate the performance of SCEA, we applied it to several real scRNA-seq datasets for clustering and visualization analysis. CONCLUSIONS: The experimental results show that SCEA generally outperforms several popular single-cell analysis methods. Across all available datasets, SCEA improves clustering accuracy in terms of ARI by 4.4% on average over the well-known method scGAC. An accuracy improvement of 11.65% is also achieved by SCEA compared to the Seurat model.
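
The abstract above reports clustering accuracy in terms of ARI (adjusted Rand index). The following is a minimal sketch of the standard contingency-table formula for that index; the function name and the toy labels are illustrative assumptions and are unrelated to the SCEA implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pair counts within each contingency cell, each true class, each cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical cell-type labels and cluster assignments.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, crossed_partition)
```

The index ignores how clusters are named: relabelled but identical partitions score 1.0, while agreement at chance level scores about 0 and can be negative.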


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms
6.
PLoS Comput Biol ; 19(3): e1011036, 2023 03.
Article in English | MEDLINE | ID: mdl-37000857

ABSTRACT

Drug-target binding affinity prediction plays a key role in the early stage of drug discovery. Numerous experimental and data-driven approaches have been developed for predicting drug-target binding affinity. However, experimental methods rely heavily on the limited structural-related information from drug-target pairs, domain knowledge, and time-consuming assays. On the other hand, learning-based methods have shown an acceptable prediction performance. However, most of them utilize several simple and complex types of protein and drug compound data, ranging from the protein sequences to the topology of a graph representation of drug compounds, and employ multiple deep neural networks for encoding and feature extraction, which leads to computational overhead. In this study, we propose a unified measure for protein sequence encoding, named BiComp, which provides compression-based and evolutionary-related features from the protein sequences. Specifically, we employ Normalized Compression Distance and Smith-Waterman measures for capturing complementary information from the algorithmic information theory and biological domains, respectively. We utilize the proposed measure to encode the input proteins feeding a new deep neural network-based method for drug-target binding affinity prediction, named BiComp-DTA. BiComp-DTA is evaluated utilizing four benchmark datasets for drug-target binding affinity prediction. Compared to the state-of-the-art methods, which employ complex models for protein encoding and feature extraction, BiComp-DTA provides superior efficiency in terms of accuracy, runtime, and the number of trainable parameters. The latter achievement facilitates execution of BiComp-DTA on a normal desktop computer in a fast fashion. As a comparative study, we evaluate BiComp's efficiency against its components for drug-target binding affinity prediction. 
The results have shown superior accuracy of BiComp due to the orthogonality and complementary nature of Smith-Waterman and Normalized Compression Distance measures for protein sequences. Such a protein sequence encoding provides efficient representation with no need for multiple sources of information, deep domain knowledge, and complex neural networks.
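
The Normalized Compression Distance named in the abstract above has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The following is a minimal sketch of that formula; the choice of zlib as the compressor and the two toy amino-acid strings are assumptions for illustration, and the abstract does not state which compressor BiComp uses.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two sequences."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Arbitrary toy strings over the amino-acid alphabet, not real proteins.
seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
seq_b = "GSHMLEDPVRWQCNTYFAIKGSDELMPRVHWQNTCYFKIAGLDSEPMRVWHQNCTYKFIALGSDEMPVRW"
self_distance = ncd(seq_a, seq_a)
cross_distance = ncd(seq_a, seq_b)
print(self_distance < cross_distance)
```

A sequence concatenated with itself compresses almost as well as one copy, so its distance to itself is small, whereas unrelated sequences share little that the compressor can reuse.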


Subject(s)
Drug Development , Neural Networks, Computer , Proteins/chemistry , Amino Acid Sequence , Drug Discovery/methods
7.
J Biomed Inform ; 139: 104316, 2023 03.
Article in English | MEDLINE | ID: mdl-36781036

ABSTRACT

The classification of different organisms into subtypes is one of the most important tools of organism studies, and among them, the classification of viruses itself has been the focus of many studies due to their use in virology and epidemiology. Many methods have been proposed to classify viruses, some of which are designed for a specific family of organisms and some of which are more general. But still, especially for certain categories such as Influenza and HIV, classification is facing performance challenges as well as processing and memory bottlenecks. To address this, we designed an automated classifier, called PC-mer, that is based on k-mer and physicochemical characteristics of nucleotides, which reduces the number of features by about 2^k times compared to the alternative methods based on k-mer. Compared to integer and one-hot encoding methods, it keeps the number of features constant despite the growth of the sequence length. In this way, it also increases the training speed by an average of 17.93 times. This improvement in processing complexity is provided while PC-mer can also improve the classifying performance for a variety of virus families.


Subject(s)
Genome, Viral , Viruses , Viruses/genetics , Algorithms , Software
8.
PLoS Comput Biol ; 18(11): e1010665, 2022 11.
Article in English | MEDLINE | ID: mdl-36409684

ABSTRACT

In response to the imperfections of current sequence alignment methods, originating from the inherent serialism within their corresponding electrical systems, a few optical approaches for biological data comparison have been proposed recently. However, due to their low performance, arising from their inefficient coding scheme, this paper presents a novel all-optical high-throughput method for aligning DNA, RNA, and protein sequences, named HELIOS. The HELIOS method employs highly sophisticated operations to locate character matches, single or multiple mutations, and single or multiple indels within various biological sequences. On the other hand, the HELIOS optical architecture exploits high-speed processing and operational parallelism in optics, by adopting wavelength and polarization of optical beams. For evaluation, the functionality and accuracy of the HELIOS method are verified through behavioral and optical simulation studies, while its complexity and performance are estimated through analytical computation. The accuracy evaluations indicate that the HELIOS method achieves a precise pairwise alignment of two sequences, highly similar to those of Smith-Waterman, Needleman-Wunsch, BLAST, MUSCLE, ClustalW, ClustalΩ, T-Coffee, Kalign, and MAFFT. According to our performance evaluations, the HELIOS optical architecture outperforms all alternative electrical and optical algorithms in terms of processing time and memory requirement, relying on its highly sophisticated method and optical architecture. Moreover, the employed compact coding scheme highly escalates the number of input characters, and hence, it offers reduced time and space complexities, compared to the electrical and optical alternatives. This makes the HELIOS method and optical architecture highly applicable for biomedical applications.


Subject(s)
Algorithms , INDEL Mutation , Sequence Alignment , Amino Acid Sequence , Computer Simulation
9.
Sci Rep ; 12(1): 17232, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36241863

ABSTRACT

The classification performance of all-optical Convolutional Neural Networks (CNNs) is greatly influenced by components' misalignment and translation of input images in practical applications. In this paper, we propose a free-space all-optical CNN (named Trans-ONN) which accurately classifies translated images in the horizontal, vertical, or diagonal directions. Trans-ONN takes advantage of an optical motion pooling layer which provides the translation invariance property by implementing different optical masks in the Fourier plane for classifying translated test images. Moreover, to enhance the translation invariance property, global average pooling (GAP) is utilized in the Trans-ONN structure, rather than fully connected layers. The comparative studies confirm that taking advantage of vertical and horizontal masks along with the GAP operation provides the best translation invariance property, compared to the alternative network models, for classifying horizontally and vertically shifted test images up to 50 pixel shifts of Kaggle Cats and Dogs, CIFAR-10, and MNIST datasets, respectively. Also, adopting the diagonal mask along with the GAP operation achieves the best classification accuracy for classifying translated test images in the diagonal direction for a large number of pixel shifts (i.e. more than 30 pixel shifts). It is worth mentioning that the proposed translation invariant networks are capable of classifying the translated test images not included in the training procedure.


Subject(s)
Neural Networks, Computer , Deep Learning
10.
Sci Rep ; 12(1): 11158, 2022 07 01.
Article in English | MEDLINE | ID: mdl-35778592

ABSTRACT

Bio-sequence comparators are one of the most basic and significant methods for assessing biological data, and so, due to the importance of proteins, protein sequence comparators are particularly crucial. On the other hand, the complexity of the problem, the growing number of extracted protein sequences, and the growth of studies and data analysis applications addressing protein sequences have necessitated the development of a rapid and accurate approach to account for the complexities in this field. As a result, we propose a protein sequence comparison approach, called PCV, which improves comparison accuracy by producing vectors that encode sequence data as well as physicochemical properties of the amino acids. At the same time, by partitioning the long protein sequences into fixed-length blocks and providing an encoding vector for each block, this method allows for parallel and fast implementation. To evaluate the performance of PCV, like other alignment-free methods, we used 12 benchmark datasets including classes with homologous sequences which may require a simple preprocessing search tool to select the homologous data. We then compared the protein sequence comparison outcomes to those of alternative alignment-based and alignment-free methods, using various evaluation criteria. These results indicate that our method provides significant improvement in sequence classification accuracy, compared to the alternative alignment-free methods, and has an average correlation of about 94% with the ClustalW method as our reference method, while considerably reducing the processing time.


Subject(s)
Algorithms , Amino Acids , Amino Acid Sequence , Amino Acids/chemistry , Proteins/chemistry , Sequence Alignment
11.
PLoS One ; 17(4): e0267106, 2022.
Article in English | MEDLINE | ID: mdl-35427371

ABSTRACT

The classification of biological sequences is an open issue for a variety of data sets, such as viral and metagenomics sequences. Therefore, many studies utilize neural network tools, as the well-known methods in this field, and focus on designing customized network structures. However, few works focus on more effective factors, such as input encoding method or implementation technology, to address accuracy and efficiency issues in this area. Therefore, in this work, we propose an image-based encoding method, called WalkIm, whose adoption, even in a simple neural network, provides competitive accuracy and superior efficiency, compared to the existing classification methods (e.g. VGDC, CASTOR, and DLM-CNN) for a variety of biological sequences. Using WalkIm for classifying various data sets (i.e. viral whole-genome data, metagenomics read data, and metabarcoding data), it achieves the same performance as the existing methods, with no enforcement of parameter initialization or network architecture adjustment for each data set. It is worth noting that even in the case of classifying high-mutant data sets, such as Coronaviruses, it achieves almost 100% accuracy for classifying its various types. In addition, WalkIm achieves high-speed convergence during network training, as well as reduction of network complexity. Therefore, the WalkIm method enables us to execute the classifying neural networks on a normal desktop system in a short time interval. Moreover, we addressed the compatibility of the WalkIm encoding method with free-space optical processing technology. Taking advantage of optical implementation of convolutional layers, we illustrated that the training time can be reduced by up to 500 times. In addition to all aforementioned advantages, this encoding method preserves the structure of generated images in various modes of sequence transformation, such as reverse complement, complement, and reverse modes.
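
The three sequence transformation modes mentioned at the end of the abstract above (reverse complement, complement, and reverse) can be sketched as follows. This only illustrates the transformations themselves, not the WalkIm image encoding; the function names are assumptions.

```python
# Translation table for DNA bases; other characters (e.g. N) pass through unchanged.
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Swap each base for its Watson-Crick partner."""
    return seq.translate(COMPLEMENT_TABLE)


def reverse(seq: str) -> str:
    """Read the sequence back to front."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Complement the sequence, then reverse it (the opposite strand, 5' to 3')."""
    return reverse(complement(seq))


read = "ATGC"
print(complement(read), reverse(read), reverse_complement(read))
```

Applying the reverse complement twice returns the original sequence, which makes it a convenient sanity check for any encoding that claims to be stable under these modes.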


Subject(s)
Metagenomics , Neural Networks, Computer , Data Collection , Research Design
12.
PLoS One ; 16(1): e0245095, 2021.
Article in English | MEDLINE | ID: mdl-33449928

ABSTRACT

In this study, optical technology is considered as a solution to sequence alignment (SA) issues, with the potential ability to increase the speed, overcome memory limitations, reduce power consumption, and increase output accuracy. We therefore examine the effect of bio-data encoding and the creation of input images on the pattern-recognition error rate at the output of an optical Vander Lugt correlator. Moreover, we present a genetic algorithm-based coding approach, named GAC, to minimize output noises of cross-correlating data. As a case study, we adopt the proposed coding approach within a correlation-based optical architecture for counting k-mers in a DNA string. As verified by the simulations on the Salmonella whole genome, we can improve sensitivity and speed by more than 86% and 81%, respectively, compared to BLAST by using the coding set generated by the GAC method fed to the proposed optical correlator system. Moreover, we present a comprehensive report on the impact of 1D and 2D cross-correlation approaches, as well as various coding parameters, on the output noise, which motivates the system designers to customize the coding sets within the optical setup.
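
The case study in the abstract above counts k-mers in a DNA string by correlation. As a software analogue of that idea, the following is a minimal sketch using a plain one-hot code and a sliding dot product; it does not reproduce the GAC coding set or the optical correlator, and all names and the toy sequence are assumptions.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element 0/1 vector (a simple stand-in coding)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def correlate(text, pattern):
    """Sliding dot product; each score is the number of matching positions."""
    t, p = one_hot(text), one_hot(pattern)
    k = len(p)
    scores = []
    for offset in range(len(t) - k + 1):
        score = sum(t[offset + i][c] * p[i][c] for i in range(k) for c in range(4))
        scores.append(score)
    return scores


def count_kmer(text, kmer):
    """Count offsets where the correlation peak reaches the full k-mer length."""
    return sum(1 for s in correlate(text, kmer) if s == len(kmer))


scores = correlate("ACGTACGA", "ACG")
occurrences = count_kmer("ACGTACGA", "ACG")
print(scores, occurrences)
```

With this one-hot code, partial matches produce lower, non-zero scores at other offsets, which is the kind of output noise a better coding set would aim to suppress.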


Subject(s)
Algorithms , Optics and Photonics , Sequence Alignment , Sequence Analysis, DNA
13.
J Biophotonics ; 13(1): e201900227, 2020 01.
Article in English | MEDLINE | ID: mdl-31397961

ABSTRACT

Nowadays, the accelerated expansion of genetic data challenges the speed of current DNA sequence alignment algorithms due to their electrical implementations. The essential need for an efficient and accurate method for DNA variant discovery demands new approaches for parallel processing in real time. Fortunately, photonics, as an emerging technology in data computing, proposes optical correlation as a fast similarity measurement algorithm, while the complexity of existing local alignment algorithms severely limits their applicability. Hence, in this paper, employing optical correlation for global alignment, we present an optical processing approach for local DNA sequence alignment to benefit from both the high-speed processing and the operational parallelism that inherently exist in optics. The proposed method, named OptCAM, utilizes the amplitude and wavelength of the optical signals to accurately locate mutations through three main procedures. Furthermore, an all-optical implementation of the OptCAM method is proposed consisting of three units, corresponding to the three OptCAM procedures. Performing considerably fast processes by passing optical signals through high-throughput photonic devices, OptCAM avoids various limitations of electrical implementations. The accuracy and efficiency of the OptCAM method and its optical implementation are validated through numerical simulation by a gold standard simulation benchmark. The results indicate the proposed method is significantly faster than its electrical counterparts, in both single node and grid computation.


Subject(s)
Algorithms , DNA , Computer Simulation , DNA/genetics , Optics and Photonics , Sequence Alignment , Software
14.
J Opt Soc Am A Opt Image Sci Vis ; 35(11): 1929-1940, 2018 Nov 01.
Article in English | MEDLINE | ID: mdl-30461853

ABSTRACT

This paper presents a novel optical processing approach for exploring genome sequences built upon an optical correlator for global alignment and the extended dual-vector-curve (DV-curve) method for local alignment. To overcome the problem of the traditional DV-curve method in presenting an accurate and simplified output, we propose the hybrid amplitude wavelength polarization optical DV-curve (HAWPOD) method, built upon the DV-curve method, to analyze genome sequences in three steps: DNA coding, alignment, and post-analysis. For this purpose, a tunable graphene-based color filter is designed for wavelength modulation of optical signals. Moreover, an all-optical implementation of the HAWPOD method is developed, while its accuracy is validated through numerical simulations in LUMERICAL FDTD. The results indicate that the proposed method is much faster than its electrical counterparts.


Subject(s)
DNA/genetics , Genetic Variation , Optical Phenomena , Base Sequence , Sequence Alignment
15.
J Opt Soc Am A Opt Image Sci Vis ; 34(7): 1173-1186, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-29036127

ABSTRACT

This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.


Subject(s)
Computer Simulation , DNA/genetics , Neural Networks, Computer , Sequence Alignment , Sequence Analysis, DNA/methods , Algorithms , Animals , Humans , Imaging, Three-Dimensional