Search | VHL Regional Portal (test)

Optimization of drug-target affinity prediction methods through feature processing schemes.

Ru, Xiaoqing; Zou, Quan; Lin, Chen.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37812388

ABSTRACT

MOTIVATION: Numerous high-accuracy drug-target affinity (DTA) prediction models, whose performance relies heavily on drug and target feature information, are developed at the cost of greater complexity and reduced interpretability. Feature extraction and optimization constitute a critical step that strongly influences model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) feature sets that often suffer from high dimensionality. RESULTS: In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into learning to rank to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by utilizing Shapley Additive Explanations values and the incremental feature selection approach, we find that the high-quality feature subset consists of the top 150D features and that the top 20D features have a breakthrough impact on DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance, highly interpretable models. AVAILABILITY AND IMPLEMENTATION: https://github.com/RUXIAOQING964914140/FS_DTA.
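The incremental feature selection idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked by importance (for example by mean absolute SHAP value) and that some scoring routine is available. The names `incremental_feature_selection`, `ranked_features`, `evaluate`, and `toy_evaluate` are hypothetical, and the toy scorer merely stands in for training and validating a real DTA model; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[float]]:
    """Grow the subset one ranked feature at a time and keep the
    prefix with the best score (higher is assumed to be better)."""
    best_subset: List[str] = []
    best_score = float("-inf")
    scores: List[float] = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        scores.append(score)
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score, scores


def toy_evaluate(subset: List[str]) -> float:
    """Hypothetical stand-in for 'train on this subset, return a validation score'.
    Assumption: f1 and f2 are informative; every feature costs a small penalty."""
    useful = {"f1": 0.5, "f2": 0.3}
    gain = sum(useful.get(name, 0.0) for name in subset)
    return gain - 0.05 * len(subset)


subset, score, curve = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], toy_evaluate
)
print(subset, round(score, 2))
```

In practice `evaluate` would wrap cross-validated training of the chosen model, and the score curve would show where adding further features stops helping.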

Subjects

Chemical Models, Pharmaceutical Preparations, Regression Analysis, Pharmaceutical Preparations/chemistry

NerLTR-DTA: drug-target binding affinity prediction based on neighbor relationship and learning to rank.

Ru, Xiaoqing; Ye, Xiucai; Sakurai, Tetsuya; Zou, Quan.

Bioinformatics ; 38(7): 1964-1971, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35134828

ABSTRACT

MOTIVATION: Drug-target interaction prediction plays an important role in new drug discovery and drug repurposing. Binding affinity indicates the strength of drug-target interactions. Predicting drug-target binding affinity is expected to provide promising candidates for biologists, which can effectively reduce the workload of wet laboratory experiments and speed up the entire process of drug research. Given that numerous new proteins are sequenced and compounds are synthesized, several improved computational methods have been proposed for such predictions, but some challenges remain. (i) Many methods discuss and implement only one application scenario; they focus on drug repurposing and ignore the discovery of new drugs and targets. (ii) Many methods do not consider the priority order of proteins (or drugs) related to each target drug (or protein). Therefore, it is necessary to develop a comprehensive method that can be used in multiple scenarios and focuses on candidate order. RESULTS: In this study, we propose a method called NerLTR-DTA that uses the neighbor relationship of similarity and sharing to extract features, and applies a ranking framework with regression attributes to predict affinity values and the priority order of a query drug (or query target) and its related proteins (or compounds). It is worth noting that setting different queries, a characteristic of learning to rank, allows the method to be applied in multiple scenarios, including the discovery of new drugs and new targets. Experimental results on two commonly used datasets show that NerLTR-DTA outperforms some state-of-the-art competing methods. NerLTR-DTA achieves excellent performance in all application scenarios mentioned in this study, and the r_m^2 (test) values indicate that this performance is not obtained by chance. Moreover, statistics on the association relationships of each query drug (or query protein) show that NerLTR-DTA can provide accurate ranking lists for the relevant results of most queries. In general, NerLTR-DTA is a powerful tool for predicting drug-target associations and can contribute to new drug discovery and drug repurposing. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in Python and Java. Source codes and datasets are available at https://github.com/RUXIAOQING964914140/NerLTR-DTA.
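The idea of choosing the query side (drug or target) and producing a ranking list per query can be sketched as follows. This is only an illustration under stated assumptions: the predicted affinities are taken as given, a higher value is assumed to mean stronger binding, and `ranking_lists` and the identifiers `d1`, `t1`, and so on are hypothetical. It does not reproduce the NerLTR-DTA feature extraction or its ranking model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, float]  # (drug_id, target_id, predicted_affinity)


def ranking_lists(
    predictions: List[Prediction], query: str = "drug"
) -> Dict[str, List[Tuple[str, float]]]:
    """Group predictions by the chosen query side and sort each group's
    candidates by descending predicted affinity."""
    if query not in ("drug", "target"):
        raise ValueError("query must be 'drug' or 'target'")
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for drug, target, affinity in predictions:
        if query == "drug":
            grouped[drug].append((target, affinity))   # rank targets per drug
        else:
            grouped[target].append((drug, affinity))   # rank drugs per target
    return {
        key: sorted(candidates, key=lambda item: item[1], reverse=True)
        for key, candidates in grouped.items()
    }


# Toy predictions (made-up values, for illustration only).
preds = [("d1", "t1", 5.0), ("d1", "t2", 7.2), ("d2", "t1", 6.1), ("d2", "t2", 4.3)]
by_drug = ranking_lists(preds, query="drug")
by_target = ranking_lists(preds, query="target")
print(by_drug["d1"], by_target["t1"])
```

Switching `query` changes which entity the ranking is built for, which loosely mirrors how one set of predictions can serve both the new-drug and new-target scenarios.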

Subjects

Algorithms, Software, Drug Development/methods, Drug Discovery/methods, Drug Repositioning, Proteins/chemistry

Application of Machine Learning for Drug-Target Interaction Prediction.

Xu, Lei; Ru, Xiaoqing; Song, Rong.

Front Genet ; 12: 680117, 2021.

Article in English | MEDLINE | ID: mdl-34234813

ABSTRACT

Exploring drug-target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug-target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug-target interaction prediction research. In this review, details of the specific applications of machine learning in drug-target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored for future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.

Current status and future prospects of drug-target interaction prediction.

Ru, Xiaoqing; Ye, Xiucai; Sakurai, Tetsuya; Zou, Quan; Xu, Lei; Lin, Chen.

Brief Funct Genomics ; 20(5): 312-322, 2021 09 11.

Article in English | MEDLINE | ID: mdl-34189559

ABSTRACT

Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction because of their potential to reduce time and cost. In this review, we introduce molecular docking and machine learning-based methods, which have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into different types according to the data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Knowledge of heterogeneous networks and learning to rank is also summarized in this review. As far as we know, this is the first comprehensive review that summarizes knowledge of heterogeneous networks and learning to rank in drug-target interaction prediction. Moreover, we propose three aspects that can be explored in depth in future research.

Subjects

Drug Discovery, Pharmaceutical Preparations, Drug Development, Machine Learning, Molecular Docking Simulation

Application of learning to rank in bioinformatics tasks.

Ru, Xiaoqing; Ye, Xiucai; Sakurai, Tetsuya; Zou, Quan.

Brief Bioinform ; 22(5), 2021 09 02.

Article in English | MEDLINE | ID: mdl-33454758

ABSTRACT

Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that they can be used conveniently and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed based on their application to bioinformatics from multiple perspectives. Finally, the paper discusses the shortcomings of LTR algorithms, ways to make better use of them, and some open problems that currently exist.

Subjects

Algorithms, Computational Biology/methods, DNA/chemistry, Investigational Drugs/pharmacology, Proteins/chemistry, Software, Amino Acid Sequence, DNA/genetics, DNA/metabolism, Drug Discovery, Investigational Drugs/chemical synthesis, Humans, Protein Domains, Secondary Protein Structure, Proteins/genetics, Proteins/metabolism, Amino Acid Sequence Homology

Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm.

Ru, Xiaoqing; Wang, Lida; Li, Lihong; Ding, Hui; Ye, Xiucai; Zou, Quan.

Comput Biol Med ; 119: 103660, 2020 04.

Article in English | MEDLINE | ID: mdl-32090901

ABSTRACT

Exploring the protein-drug correlation can not only solve the problem of selecting candidate compounds but also solve related problems such as drug redirection and finding potential drug targets. Therefore, many researchers have proposed different machine learning methods for the prediction of protein-drug correlations. However, many existing models simply divide the protein-drug relationship into related or unrelated categories and do not deeply explore the most relevant target (or drug) for a given drug (or target). To solve this problem, this paper applies the ranking concept to the prediction of the GPCR (G protein-coupled receptor)-drug correlation. This study uses two different types of data sets to explore the candidate compound and potential target problems, and both sets achieved good results. In addition, this study found that the family to which a protein belongs is not an inherent factor that affects the ranking of GPCR-drug correlations; however, if the drug affects other family members of the protein, then the protein is likely to be a potential target of the drug. This study showed that the learning to rank algorithm is a good tool for exploring protein-drug correlations.

Subjects

Algorithms, Pharmaceutical Preparations, Machine Learning, G-Protein-Coupled Receptors

Selecting Essential MicroRNAs Using a Novel Voting Method.

Ru, Xiaoqing; Cao, Peigang; Li, Lihong; Zou, Quan.

Mol Ther Nucleic Acids ; 18: 16-23, 2019 Dec 06.

Article in English | MEDLINE | ID: mdl-31479921

ABSTRACT

Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
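Combining the predictions of several classifiers by voting can be sketched as below. The abstract does not spell out the details of its voting method, so this example uses plain majority voting as a stand-in; `majority_vote` and the toy label lists are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Return, for each sample, the label predicted by most models.
    Each inner sequence holds one model's labels for the same samples."""
    if not model_predictions:
        raise ValueError("at least one model is required")
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in model_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Five hypothetical binary classifiers (1 = essential, 0 = non-essential).
five_models = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
]
print(majority_vote(five_models))
```

Using an odd number of binary classifiers, as with the five models here, avoids tied votes.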

Incorporating Distance-Based Top-n-gram and Random Forest To Identify Electron Transport Proteins.

Ru, Xiaoqing; Li, Lihong; Zou, Quan.

J Proteome Res ; 18(7): 2931-2939, 2019 07 05.

Article in English | MEDLINE | ID: mdl-31136183

ABSTRACT

Cellular respiration provides direct energy substances for living organisms. Electron storage and transportation should be completed through electron transport chains during the cellular respiration process. Thus, identifying electron transport proteins is an important research task. In protein identification, selection of the feature extraction method and classification algorithm has a direct bearing on classification. The distance-based Top-n-gram method, which was proposed based on the frequency profile and considered evolutionary information, was used in this study for feature extraction. The Max-Relevance-Max-Distance algorithm was adopted for feature selection. The first 4D features that greatly influenced the classification result were selected to form the feature data set. Finally, the random forest algorithm was used to identify electron transport proteins. Under the 10-fold cross-validation of the model constructed in this study, sensitivity, specificity, and accuracy rates surpassed 85%, 80%, and 82%, respectively. In the testing set, F-measure, AUC value, and accuracy exceeded 74%, 95%, and 86%, respectively. These experimental results indicated that the classification model built in this study is an effective tool in identifying electron transport proteins.
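For readers less familiar with the metrics quoted in this abstract, the following sketch computes sensitivity, specificity, accuracy, and F-measure from binary labels using their standard definitions. The function name `classification_metrics` and the toy labels are hypothetical, and the numbers are unrelated to the paper's results. The sketch assumes both classes are present, so no denominator is zero.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute standard binary metrics, treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_measure": 2 * tp / (2 * tp + fp + fn), # harmonic mean of precision and recall
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```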

Subjects

Algorithms, Carrier Proteins/analysis, Electron Transport Chain Complex Proteins/analysis, Electron Transport, Classification, Chemical Models, Sensitivity and Specificity

Identification of Phage Viral Proteins With Hybrid Sequence Features.

Ru, Xiaoqing; Li, Lihong; Wang, Chunyu.

Front Microbiol ; 10: 507, 2019.

Article in English | MEDLINE | ID: mdl-30972038

ABSTRACT

The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, which has the strongest correlation with class labels and low redundancy between features. Given that the random forest classification algorithm randomly selects both the samples and the candidate features for each node, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
