Search | BVS Regional Portal

Discovering SNP-disease relationships in genome-wide SNP data using an improved harmony search based on SNP locus and genetic inheritance patterns.

Esmaeili, Fariba; Narimani, Zahra; Vasighi, Mahdi.

PLoS One ; 18(10): e0292266, 2023.

Article in English | MEDLINE | ID: mdl-37831690

ABSTRACT

Advances in high-throughput sequencing technologies have made it possible to access millions of measurements from thousands of people. Single nucleotide polymorphisms (SNPs), the most common type of mutation in the human genome, have been shown to play a significant role in the development of complex and multifactorial diseases. However, studying the synergistic interactions between different SNPs in explaining multifactorial diseases is challenging due to the high dimensionality of the data and methodological complexities. Existing solutions often use a multi-objective approach based on metaheuristic optimization algorithms such as harmony search. However, previous studies have shown that using a multi-objective approach is not sufficient to address complex disease models with no or low marginal effect. In this research, we introduce a locus-driven harmony search (LDHS), an improved harmony search algorithm that focuses on using SNP locus information and genetic inheritance patterns to initialize harmony memories. The proposed method integrates biological knowledge to improve harmony memory initialization by adding SNP combinations that are likely candidates for interaction and disease causation. Using a SNP grouping process, LDHS generates harmonies that include SNPs with a higher potential for interaction, resulting in greater power in detecting disease-causing SNP combinations. The performance of the proposed algorithm was evaluated on 200 synthesized datasets for disease models with and without marginal effect. The results show significant improvement in the power of the algorithm to find disease-related SNP sets while decreasing computational cost compared to state-of-the-art algorithms. The proposed algorithm also demonstrated notable performance on real breast cancer data, showing that integrating prior knowledge can significantly improve the process of detecting disease-related SNPs in both real and synthesized data.

Subjects

Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Algorithms , Genome, Human , Databases, Genetic , Epistasis, Genetic
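
The abstract above describes seeding the harmony memory with SNP combinations chosen using locus information. The following is only a minimal sketch of that idea, not the authors' LDHS implementation: the fixed-width position window, the function names `group_snps_by_locus` and `init_harmony_memory`, and the `guided_fraction` parameter are all assumptions made for illustration, and genetic inheritance patterns are not modelled at all.

```python
import random


def group_snps_by_locus(positions, window):
    """Group SNP indices whose positions fall in the same fixed-width window.

    `positions` maps SNP index -> (chromosome, base-pair position).
    The window rule is a hypothetical stand-in for the paper's grouping step.
    """
    groups = {}
    for snp, (chrom, bp) in positions.items():
        groups.setdefault((chrom, bp // window), []).append(snp)
    return [sorted(g) for g in groups.values()]


def init_harmony_memory(n_snps, groups, memory_size, order, guided_fraction, rng):
    """Build a harmony memory of SNP combinations, each of size `order`.

    The first `guided_fraction` of harmonies is sampled inside one locus
    group; the remaining harmonies are sampled uniformly from all SNPs.
    """
    eligible = [g for g in groups if len(g) >= order]
    n_guided = int(memory_size * guided_fraction) if eligible else 0
    memory = []
    for i in range(memory_size):
        pool = rng.choice(eligible) if i < n_guided else range(n_snps)
        memory.append(tuple(sorted(rng.sample(pool, order))))
    return memory


# Toy data (invented): six SNPs on two chromosomes.
positions = {
    0: ("1", 100), 1: ("1", 150), 2: ("1", 5000),
    3: ("2", 120), 4: ("2", 180), 5: ("2", 9000),
}
groups = group_snps_by_locus(positions, window=1000)
memory = init_harmony_memory(
    n_snps=6, groups=groups, memory_size=10, order=2,
    guided_fraction=0.5, rng=random.Random(0),
)
```

In this toy setup only the groups `[0, 1]` and `[3, 4]` are large enough for a two-SNP harmony, so the guided half of the memory consists of those pairs, while the other half explores the full SNP set.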

Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach.

Nedyalkova, Miroslava; Vasighi, Mahdi; Azmoon, Amirreza; Naneva, Ludmila; Simeonov, Vasil.

ACS Omega ; 8(4): 3698-3704, 2023 Jan 31.

Article in English | MEDLINE | ID: mdl-36743013

ABSTRACT

This article proposes a novel chemometric approach to understanding and exploring the allergenic nature of food proteins. Using machine learning methods (supervised and unsupervised), this work aims to predict the allergenicity of plant proteins. The strategy is based on scoring descriptors and testing their classification performance. Partitioning was based on support vector machines (SVM), and a k-nearest neighbor (KNN) classifier was applied. A fivefold cross-validation approach was used to validate the KNN classifier in the variable selection step as well as the final classifier. To overcome the problem of food allergies, a robust and efficient method for protein classification is needed.
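
The fivefold cross-validation of a KNN classifier mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' pipeline: the two-dimensional toy descriptors, the class labels, and the function names `knn_predict` and `cross_validated_accuracy` are assumptions chosen only to show the validation scheme.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k nearest training points."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, k, n_folds, rng):
    """Estimate KNN accuracy by n-fold cross-validation."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [order[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in fold
        )
    return correct / len(xs)


# Invented, well-separated toy descriptors for two classes.
xs = [(0.1 * i, 0.1 * i) for i in range(10)]
xs += [(10 + 0.1 * i, 10 + 0.1 * i) for i in range(10)]
ys = ["allergen"] * 10 + ["non-allergen"] * 10
accuracy = cross_validated_accuracy(xs, ys, k=3, n_folds=5, rng=random.Random(0))
```

Each sample is predicted exactly once, by a model that never saw it during training, which is what makes the accuracy estimate usable for comparing descriptor subsets.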

A multilevel approach for screening natural compounds as an antiviral agent for COVID-19.

Vasighi, Mahdi; Romanova, Julia; Nedyalkova, Miroslava.

Comput Biol Chem ; 98: 107694, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35576744

ABSTRACT

COVID-19 has spread worldwide, which has prompted concerted efforts to find successful drug treatments. Drug design focused on finding antiviral therapeutic agents among plant-derived compounds that may disrupt the attachment of SARS-CoV-2 to host cells has played a pivotal role over the last year. Herein, we provide an approach based on drug design methods combined with machine learning to classify and discover inhibitors for COVID-19 from natural products. The spike receptor-binding domain (RBD) was docked with a database of 125 ligands. A multi-step docking protocol was performed in AutoDock Vina to identify the high-affinity binding mode and to gain more insight into the interaction between the phytochemicals and the RBD. A protein-ligand interaction analyzer has been developed. The drug-likeness properties of the explored inhibitors are analyzed within the framework of exploratory data analysis. The developed computational protocol yielded a comprehensive pipeline for predicting inhibitors that prevent viral entry via the RBD region.

Subjects

Antiviral Agents , COVID-19 Drug Treatment , Antiviral Agents/chemistry , Antiviral Agents/pharmacology , Biological Products/chemistry , Biological Products/pharmacology , Drug Evaluation, Preclinical , Humans , Ligands , Molecular Docking Simulation , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/metabolism

Persistent organic pollutants (POPs) - QSPR classification models by means of Machine learning strategies.

Vakarelska, Ekaterina; Nedyalkova, Miroslava; Vasighi, Mahdi; Simeonov, Vasil.

Chemosphere ; 287(Pt 2): 132189, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34826905

ABSTRACT

Persistent organic pollutants (POPs) are toxic chemicals with a slow degradation rate and a global negative impact. Their physicochemical properties combine with the complex effects of long-term POP accumulation in the environment and transport through the food chain. That is why POPs have been linked to adverse effects on human and animal health. They circulate globally via different environmental pathways and can be detected in regions far from their source of origin. The primary goal of the present study is to classify various representatives of POPs using different theoretical descriptors (molecular, structural) in order to develop quantitative structure-property relationship (QSPR) models for predicting important properties of POPs. Multivariate statistical methods such as hierarchical cluster analysis, principal component analysis and self-organizing maps were applied to reach an excellent partitioning of 149 representatives of POPs into 4 classes using the ten most appropriate descriptors (out of 63) defined by a variable reduction procedure. The predictive capabilities of the defined classes could be applied to pattern recognition for new and unidentified POPs, based only on structural properties that similar molecules may have. The additional self-organizing maps technique made it possible to visualize the feature space and investigate possible patterns and similarities between POP molecules. It helps confirm the proper classification into four classes. Based on the SOM results, the effect of each variable and the pattern formation are presented.

Subjects

Environmental Pollutants , Persistent Organic Pollutants , Animals , Environmental Pollutants/analysis , Food Chain , Humans , Machine Learning , Quantitative Structure-Activity Relationship

Inhibition Ability of Natural Compounds on Receptor-Binding Domain of SARS-CoV2: An In Silico Approach.

Nedyalkova, Miroslava; Vasighi, Mahdi; Sappati, Subrahmanyam; Kumar, Anmol; Madurga, Sergio; Simeonov, Vasil.

Pharmaceuticals (Basel) ; 14(12), 2021 Dec 18.

Article in English | MEDLINE | ID: mdl-34959727

ABSTRACT

The lack of medication to treat COVID-19 is still an obstacle that needs to be addressed by all possible scientific approaches. It is essential to design newer drugs with varied approaches. The receptor-binding domain (RBD) is a key part of the SARS-CoV-2 virus, located on its surface, that allows it to dock to ACE2 receptors present on human cells; docking is followed by entry of the virus into cells, which triggers infection. Specific receptor-binding domains on the spike protein play a pivotal role in binding to the receptor. In this regard, the in silico method plays an important role, as it is more rapid and cost-effective than trial-and-error methods based on experimental studies. A combination of virtual screening, molecular docking, molecular simulations and machine learning techniques is applied to a library of natural compounds to identify ligands that show significant binding affinity at the hydrophobic pocket of the RBD. A list of ligands with high binding affinity was obtained using molecular docking and molecular dynamics (MD) simulations for protein-ligand complexes. Machine learning (ML) classification schemes have been applied to obtain features of ligands and important descriptors, which help in the identification of better-binding ligands. A plethora of descriptors was used for training the self-organizing map algorithm. The model brings out descriptors important for protein-ligand interactions.

Structural classification of proteins using texture descriptors extracted from the cellular automata image.

Kavianpour, Hamidreza; Vasighi, Mahdi.

Amino Acids ; 49(2): 261-271, 2017 Feb.

Article in English | MEDLINE | ID: mdl-27778167

ABSTRACT

Nowadays, knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features that are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.

Subjects

Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Proteins/classification , Databases, Protein , Hydrophobic and Hydrophilic Interactions , Proteins/metabolism
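
As a rough illustration of how a binary-coded protein sequence can be unfolded into a cellular automata image, the sketch below evolves a one-dimensional elementary automaton and stacks the successive generations as image rows. The one-bit hydrophobic/non-hydrophobic coding, the residue set in `HYDROPHOBIC`, the periodic boundary and the rule number are all assumptions for this example. The abstract does not specify them, and the texture-descriptor extraction and SVM steps are not shown.

```python
# Hypothetical two-group reduction of the amino acid alphabet;
# the paper's grouping by surrounding hydrophobicity index may differ.
HYDROPHOBIC = set("AVLIMFWC")


def encode_sequence(sequence):
    """Map each residue to 1 (hydrophobic) or 0 (other)."""
    return [1 if residue in HYDROPHOBIC else 0 for residue in sequence.upper()]


def step(cells, rule):
    """Apply one update of an elementary cellular automaton (periodic edges)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right
        # Bit `neighbourhood` of the rule number gives the next cell state.
        new_cells.append((rule >> neighbourhood) & 1)
    return new_cells


def automaton_image(sequence, rule, n_steps):
    """Return the image rows: the encoded sequence plus n_steps generations."""
    rows = [encode_sequence(sequence)]
    for _ in range(n_steps):
        rows.append(step(rows[-1], rule))
    return rows


# Rule 90 is an arbitrary choice for the demonstration.
image = automaton_image("MKVLAAGIW", rule=90, n_steps=8)
```

The resulting list of rows can be treated as a binary image from which texture features would then be computed.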

Oblique rotation of factors: a novel pattern recognition strategy to classify fluorescence excitation-emission matrices of human blood plasma for early diagnosis of colorectal cancer.

Shahbazy, Mohammad; Vasighi, Mahdi; Kompany-Zareh, Mohsen; Ballabio, Davide.

Mol Biosyst ; 12(6): 1963-75, 2016 May 24.

Article in English | MEDLINE | ID: mdl-27076033

ABSTRACT

Colorectal cancer (CRC) ranks high in both men and women, accounting for about 13% of all cancers. In this study, a novel pattern recognition strategy is proposed to improve early diagnosis of CRC by visualizing the relationship between different spectral patterns in a case-control study. Partial least squares-discriminant analysis (PLS-DA) and supervised Kohonen network (SKN) were used to classify the fluorescence excitation-emission matrices (EEMs) of 289 human blood plasma samples from CRC patients, patients with adenomas or other non-malignant findings, and healthy individuals. To obtain optimal factors, oblique rotation (OR) and a genetic algorithm (GA) were used to rotate the factors by optimizing the transformation matrix elements. Transformed factors were introduced to SKN to build a classification model, and the model performance was examined by comparison with a common classifier, PLS-DA. Classification models were built for CRC-healthy and adenomas-healthy samples, and the best results were obtained by applying GA-OR to PLS factors and introducing them to the classifiers. Non-error rates for SKN and PLS-DA models assisted with GA (for selecting more informative PLS factors) and OR were equal to 0.97 and 0.95 in cross-validation and 0.93 and 0.90 for prediction of the external test set, respectively. Moreover, according to the acceptable results for adenomas-healthy cases using optimal factors, CRC can be diagnosed in early stages. Combining classifiers and optimal factors proved to be efficient for distinguishing healthy and malignant samples, and OR can significantly improve the performance of the classification model.

Subjects

Biomarkers, Tumor , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Metabolomics/methods , Pattern Recognition, Automated/methods , Spectrometry, Fluorescence/methods , Algorithms , Discriminant Analysis , Early Detection of Cancer , Humans , Models, Statistical , Neural Networks, Computer , Reproducibility of Results , Software
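
The abstract reports model quality as a non-error rate. Assuming the definition commonly used in chemometrics (the arithmetic mean of the class sensitivities, which the abstract itself does not spell out), it can be computed as in the sketch below. The function name `non_error_rate` and the toy labels are invented for illustration and are not the study's data.

```python
def non_error_rate(true_labels, predicted_labels):
    """Average of per-class sensitivities (assumed definition of NER)."""
    classes = sorted(set(true_labels))
    sensitivities = []
    for label in classes:
        # Predictions made for the samples that truly belong to this class.
        predictions = [
            p for t, p in zip(true_labels, predicted_labels) if t == label
        ]
        sensitivities.append(
            sum(p == label for p in predictions) / len(predictions)
        )
    return sum(sensitivities) / len(sensitivities)


# Invented example: 4 CRC samples (3 recognised) and 6 healthy (all recognised).
true_labels = ["CRC"] * 4 + ["healthy"] * 6
predicted = ["CRC", "CRC", "CRC", "healthy"] + ["healthy"] * 6
ner = non_error_rate(true_labels, predicted)  # (0.75 + 1.0) / 2
```

Unlike plain accuracy, this measure weights each class equally, so a large healthy group cannot mask poor recognition of a smaller patient group.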

Effects of supervised Self Organising Maps parameters on classification performance.

Ballabio, Davide; Vasighi, Mahdi; Filzmoser, Peter.

Anal Chim Acta ; 765: 45-53, 2013 Feb 26.

Article in English | MEDLINE | ID: mdl-23410625

ABSTRACT

Self Organising Maps (SOMs) are one of the most powerful learning strategies among neural network algorithms. SOMs have several adaptable parameters, and the selection of appropriate network architectures is required in order to make accurate predictions. The major disadvantage of SOMs is probably network optimisation, since this procedure can often be time-consuming. The effects of network size, training epochs and learning rate on the classification performance of SOMs are known, whereas the effects of other parameters (type of SOMs, weights initialisation, training algorithm, topology and boundary conditions) are not so obvious. This study aimed to analyse the effect of SOM parameters on the network classification performance, as well as on computational time, taking into consideration a significant number of real datasets in order to achieve a comprehensive statistical comparison. Parameters were evaluated simultaneously by means of an approach based on the design of experiments, which enabled the investigation of their interaction effects. Results highlighted the most important parameters influencing the classification performance and enabled the identification of the optimal settings, as well as the optimal architectures to reduce the computational time of SOMs.

Subjects

Algorithms , Analysis of Variance , Neural Networks, Computer , Research Design
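
To make the parameters named in the abstract (network size, training epochs, learning rate) concrete, here is a minimal unsupervised SOM training loop. It is only a sketch under stated assumptions: a square grid, random weight initialisation, sequential training, a Gaussian neighbourhood, linear decay of learning rate and radius, and the function names `train_som` and `best_matching_unit`. The study compares several such design choices, and none of them is taken from it.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the neuron closest to sample x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, grid_size, n_epochs, learning_rate, rng):
    """Train a square SOM with sequential updates and a Gaussian neighbourhood."""
    dim = len(data[0])
    weights = {
        (row, col): [rng.random() for _ in range(dim)]
        for row in range(grid_size)
        for col in range(grid_size)
    }
    for epoch in range(n_epochs):
        decay = 1.0 - epoch / n_epochs  # linear decay, an assumed schedule
        rate = learning_rate * decay
        radius = max(grid_size / 2 * decay, 0.5)
        for x in data:
            winner = best_matching_unit(weights, x)
            for node, w in weights.items():
                influence = math.exp(
                    -math.dist(node, winner) ** 2 / (2 * radius ** 2)
                )
                for j in range(dim):
                    # Pull each neuron towards x, scaled by grid proximity.
                    w[j] += rate * influence * (x[j] - w[j])
    return weights


# Invented toy data scaled to [0, 1].
data = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
som = train_som(data, grid_size=3, n_epochs=10, learning_rate=0.5,
                rng=random.Random(1))
```

A design-of-experiments comparison like the one described would repeat such training over combinations of settings (for example with `itertools.product`) and record classification performance and run time for each combination.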

Nuclear magnetic resonance-based screening of thalassemia and quantification of some hematological parameters using chemometric methods.

Arjmand, Mohammad; Kompany-Zareh, Mohsen; Vasighi, Mahdi; Parvizzadeh, Nastran; Zamani, Zahra; Nazgooei, Fereshteh.

Talanta ; 81(4-5): 1229-36, 2010 Jun 15.

Article in English | MEDLINE | ID: mdl-20441889

ABSTRACT

High-resolution ¹H NMR spectroscopy of biofluids gives a good representation of the metabolic pattern and offers a high-potential noninvasive technique for pathological diagnosis. Diagnosis of thalassemia and quantification of some blood parameters can be performed by using ¹H NMR spectra of human blood serum in combination with chemometric techniques. Spectra of 28 samples were collected from 15 adult male and female thalassemia patients as the experimental set and 13 healthy volunteers as the control set. Principal component analysis (PCA) was used as a dimension reduction tool for transforming spectra into abstract factors. The abstract factors were introduced to linear discriminant analysis (LDA), which is a common technique for classification, in order to establish an adequate model for discriminating healthy and unhealthy samples. In addition, these abstract factors were used for calibration of some blood parameters using a radial basis function neural network (RBFNN) as an artificial intelligence modeling method. Different test sets (samples left out of the training algorithm) were used for evaluating the quality and robustness of the built models. PCA abstract factors were employed as input for the LDA model, which successfully classified all members of the test sets except one member of the third test set. RBFNN also showed a good capability for modeling most of the blood parameters according to the proposed network parameter optimization procedure. We conclude that ¹H NMR spectroscopy, LDA and RBFNN assisted by PCA provide a powerful method for thalassemia diagnosis and prediction of some blood variants.

Subjects

Chemistry Techniques, Analytical , Magnetic Resonance Spectroscopy/methods , Thalassemia/blood , Thalassemia/diagnosis , Algorithms , Case-Control Studies , Female , Humans , Least-Squares Analysis , Linear Models , Male , Metabolomics , Neural Networks, Computer , Principal Component Analysis , Regression Analysis , Reproducibility of Results
