Search | BVS Regional Portal

An ensemble-based machine learning model for predicting type 2 diabetes and its effect on bone health.

Alsadi, Belqes; Musleh, Saleh; Al-Absi, Hamada R H; Refaee, Mahmoud; Qureshi, Rizwan; El Hajj, Nady; Alam, Tanvir.

BMC Med Inform Decis Mak ; 24(1): 144, 2024 May 29.

Article in English | MEDLINE | ID: mdl-38811939

ABSTRACT

BACKGROUND: Diabetes is a chronic condition that can result in many long-term physiological, metabolic, and neurological complications. Therefore, early detection of diabetes would help to determine a proper diagnosis and treatment plan. METHODS: In this study, we conducted a machine learning (ML) based case-control study on a diabetic cohort of 1000 participants from Qatar Biobank to predict diabetes using clinical and bone health indicators from Dual Energy X-ray Absorptiometry (DXA) machines. ML models were used to distinguish diabetes groups from non-diabetes controls. Recursive feature elimination (RFE) was leveraged to identify a subset of features that improves the performance of the model. SHAP-based analysis was used to assess the importance of features and support the explainability of the proposed model. RESULTS: The ensemble-based models XGBoost and RF achieved over 84% accuracy for detecting diabetes. After applying RFE, we selected only 20 features, which improved the model accuracy to 87.2%. From a clinical standpoint, higher HDL-Cholesterol and Neutrophil levels were observed in the diabetic group, along with lower vitamin B12 and testosterone levels. Lower sodium levels were found in diabetics, potentially stemming from clinical factors including specific medications, hormonal imbalances, and unmanaged diabetes. We believe Dapagliflozin prescriptions in Qatar were associated with decreased Gamma Glutamyltransferase and Aspartate Aminotransferase enzyme levels, confirming prior research. We observed that bone area, bone mineral content, and bone mineral density were slightly lower in the Diabetes group across almost all body parts, but the difference against the control group was not statistically significant except in T12, troch and trunk area. No significant negative impact of diabetes progression on bone health was observed over a period of 5-15 years in the cohort.
CONCLUSION: This study recommends the inclusion of an ML model that combines both DXA and clinical data for the early diagnosis of diabetes.

Subjects

Absorptiometry, Photon; Diabetes Mellitus, Type 2; Machine Learning; Humans; Middle Aged; Male; Case-Control Studies; Female; Qatar; Adult; Aged; Bone Density

DPI_CDF: druggable protein identifier using cascade deep forest.

Arif, Muhammad; Fang, Ge; Ghulam, Ali; Musleh, Saleh; Alam, Tanvir.

BMC Bioinformatics ; 25(1): 145, 2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38580921

ABSTRACT

BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of protein sequence to generate features. Then a hierarchical deep forest model fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS: The empirical outcomes on 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. When compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process. AVAILABILITY: The benchmark datasets and source codes are available in GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
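
The abstract above reports both accuracy and MCC. As a reading aid, the following minimal sketch shows how the two metrics are computed from binary confusion-matrix counts. The function names and the example counts are illustrative assumptions; they are not taken from the DPI_CDF repository or its datasets.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when one
    row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    tp, tn, fp, fn = 45, 40, 10, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the classes may be imbalanced.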

Subjects

Proteins; Software; Amino Acid Sequence; Position-Specific Scoring Matrices; Biological Evolution; Computational Biology/methods

Unified mRNA Subcellular Localization Predictor based on machine learning techniques.

Musleh, Saleh; Arif, Muhammad; Alajez, Nehad M; Alam, Tanvir.

BMC Genomics ; 25(1): 151, 2024 Feb 07.

Article in English | MEDLINE | ID: mdl-38326777

ABSTRACT

BACKGROUND: The subcellular localization of mRNA has a substantial impact on the regulation of gene expression, cellular migration, and adaptation. However, the methods employed for experimental determination of this localization are arduous, time-intensive, and costly. METHODS: In this research article, we tackle the essential challenge of predicting the subcellular location of messenger RNAs (mRNAs) through the Unified mRNA Subcellular Localization Predictor (UMSLP), a machine learning (ML) based approach. We adopt an in silico strategy that incorporates four distinct feature sets: k-mer, pseudo k-tuple nucleotide composition, nucleotide physicochemical attributes, and the 3D sequence depiction achieved via Z-curve transformation, for predicting subcellular localization in a benchmark dataset across five distinct subcellular locales: nucleus, cytoplasm, extracellular region (ExR), mitochondria, and endoplasmic reticulum (ER). RESULTS: The proposed ML model UMSLP attains state-of-the-art results in predicting mRNA subcellular localization. On an independent testing dataset, UMSLP achieved over 87% precision, 94% specificity, and 94% accuracy. Compared to other existing tools, UMSLP outperformed mRNALocator, mRNALoc, and SubLocEP by 11%, 21%, and 32%, respectively, on average prediction accuracy for all five locales. SHapley Additive exPlanations analysis highlights the dominance of k-mer features in predicting cytoplasm, nucleus, ER, and ExR localizations, while Z-curve based features play pivotal roles in mitochondria subcellular localization detection. AVAILABILITY: We have shared datasets, code, and a Docker API for users in GitHub at: https://github.com/smusleh/UMSLP .
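
To make the k-mer feature set mentioned above concrete, here is a minimal sketch of k-mer frequency encoding for an RNA sequence. It is an illustration only: the function name, the default k, the handling of T as U, and the normalization are assumptions, and the UMSLP repository may encode k-mers differently.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies as a dict with 4**k entries.

    DNA-style input is accepted: T is mapped to U. Windows containing
    characters outside the alphabet (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    features = kmer_frequencies("AUGAUG", k=2)
    # Only print the k-mers that actually occur in this toy sequence.
    print({m: round(f, 2) for m, f in features.items() if f > 0})
```

The fixed ordering from `itertools.product` gives every sequence a feature vector of the same length, which is what a downstream classifier needs.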

Subjects

Endoplasmic Reticulum; Mitochondria; RNA, Messenger/genetics; Mitochondria/genetics; Computational Biology/methods; Machine Learning; Nucleotides

iMRSAPred: Improved Prediction of Anti-MRSA Peptides Using Physicochemical and Pairwise Contact-Energy Properties of Amino Acids.

Arif, Muhammad; Fang, Ge; Fida, Huma; Musleh, Saleh; Yu, Dong-Jun; Alam, Tanvir.

ACS Omega ; 9(2): 2874-2883, 2024 Jan 16.

Article in English | MEDLINE | ID: mdl-38250405

ABSTRACT

Methicillin-resistant Staphylococcus aureus (MRSA) is a growing concern for human lives worldwide. Anti-MRSA peptides act as potential antibiotic agents and play a significant role in combating MRSA infection. Traditional laboratory-based methods for annotating Anti-MRSA peptides, although precise, are quite challenging, costly, and time-consuming. Therefore, computational methods capable of identifying Anti-MRSA peptides accelerate the drug design process for treating bacterial infections. In this study, we developed a novel sequence-based predictor, "iMRSAPred", for screening Anti-MRSA peptides by incorporating energy estimation and physicochemical and sequential information. We resolved the class imbalance by using the synthetic minority oversampling technique plus Tomek link (SMOTETomek) algorithm. Furthermore, the Shapley additive explanation method was leveraged to analyze the impact of top-ranked features in the prediction task. We evaluated multiple machine learning algorithms, i.e., CatBoost, Cascade Deep Forest, Kernel and Tree Boosting, support vector machine, and HistGBoost classifiers, by 10-fold cross-validation and independent testing. The proposed iMRSAPred method significantly improved the overall performance in terms of accuracy and Matthews correlation coefficient (MCC) by 5.45% and 0.083, respectively, on the training data set. On the independent data set, iMRSAPred improved accuracy and MCC by 3.98% and 0.055, respectively. We believe that the proposed method would be useful in large-scale Anti-MRSA peptide prediction and provide insights into other bioactive peptides.

Predicting Overall Survival in METABRIC Cohort Using Machine Learning.

Banu, Afroz; Ahmed, Rayyan; Musleh, Saleh; Shah, Zubair; Househ, Mowafa; Alam, Tanvir.

Stud Health Technol Inform ; 305: 632-635, 2023 Jun 29.

Article in English | MEDLINE | ID: mdl-37387111

ABSTRACT

Triple-negative breast cancer (TNBC) is an aggressive form of breast cancer with very high relapse and mortality rates. However, due to differences in the genetic architecture associated with TNBC, patients have different outcomes and respond differently to available treatments. In this study, we predicted the overall survival of TNBC patients in the METABRIC cohort, employing supervised machine learning to identify important clinical and genetic features that are associated with better survival. We achieved a slightly higher Concordance index than the state of the art and identified biological pathways related to the top genes considered important by our model.
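
The Concordance index reported above measures how well predicted risk scores rank patients by survival time. The sketch below implements one common variant, Harrell's C-index, as a plain quadratic loop. The function name and the toy data are assumptions for illustration; the abstract does not state which C-index variant or software the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: model output where a HIGHER score means HIGHER risk

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up times (months), event flags, and risk scores.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.5, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so small differences between models near the top of that range can still be meaningful.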

Subjects

Triple Negative Breast Neoplasms; Humans; Machine Learning; Supervised Machine Learning; Aggression

Correction: MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M; Alam, Tanvir.

BMC Bioinformatics ; 24(1): 156, 2023 Apr 18.

Article in English | MEDLINE | ID: mdl-37072697

MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M; Alam, Tanvir.

BMC Bioinformatics ; 24(1): 109, 2023 Mar 22.

Article in English | MEDLINE | ID: mdl-36949389

ABSTRACT

BACKGROUND: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, cell migration, and cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting great attention in the RNA community. METHODS: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNA. We propose a novel combination of four types of features, representing k-mer, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and 3D representation of sequences based on Z-curve transformation, to feed into machine learning algorithms to predict the subcellular localization of mRNAs. RESULTS: Considering the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks for multiple benchmark datasets. We evaluated the performance of our method in ten subcellular locations, covering cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and the ribosome. An ablation study highlighted k-mer and PseKNC as more dominant than other features for predicting cytoplasm, nucleus, and ER localizations. On the other hand, physicochemical properties and Z-curve based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of features to provide better insights into the proposed approach. AVAILABILITY: We have implemented a Docker container and API for end users to run their sequences on our model. Datasets, the API code, and the Docker container are shared with the community in GitHub at: https://github.com/smusleh/MSLP .
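
The 3D Z-curve representation mentioned above maps a nucleotide sequence to a path in three-dimensional space. The sketch below follows the standard cumulative Z-curve definition (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). It is an illustration under that assumption; the function name is hypothetical, and MSLP may derive its Z-curve features in a different form, for example from frequency-based parameters.

```python
def z_curve(seq):
    """Return the cumulative Z-curve as a list of (x, y, z) points.

    For the first n bases, with running counts A, C, G, U:
        x = (A + G) - (C + U)   purine vs. pyrimidine
        y = (A + C) - (G + U)   amino vs. keto
        z = (A + U) - (G + C)   weak vs. strong hydrogen bonding
    DNA-style input is accepted (T is mapped to U); ambiguous bases are skipped.
    """
    counts = {"A": 0, "C": 0, "G": 0, "U": 0}
    points = []
    for base in seq.upper().replace("T", "U"):
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, u = counts["A"], counts["C"], counts["G"], counts["U"]
        points.append(((a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)))
    return points


if __name__ == "__main__":
    for point in z_curve("AUGC"):
        print(point)
```

Because each point depends only on running base counts, the curve can be computed in a single pass and then summarized (for instance by its endpoint or per-axis statistics) into fixed-length features.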

Subjects

Algorithms; Cell Nucleus; RNA, Messenger/genetics; Ribosomes; Machine Learning; Computational Biology/methods

COVID-19Base v3: Update of the knowledgebase for drugs and biomedical entities linked to COVID-19.

Basit, Syed Abdullah; Qureshi, Rizwan; Musleh, Saleh; Guler, Reto; Rahman, M Sohel; Biswas, Kabir H; Alam, Tanvir.

Front Public Health ; 11: 1125917, 2023.

Article in English | MEDLINE | ID: mdl-36950105

ABSTRACT

COVID-19 has taken a huge toll on our lives over the last 3 years. Global initiatives put forward by all stakeholders are still in place to combat this pandemic and help us learn lessons for future ones. While the vaccine rollout was not able to curb the spread of the disease for all strains, the research community is still trying to develop effective therapeutics for COVID-19. Although Paxlovid and remdesivir have been approved by the FDA against COVID-19, they are not free of side effects. Therefore, the search for a therapeutic solution with high efficacy continues in the research community. To support this effort, in this latest version (v3) of COVID-19Base, we have summarized the biomedical entities linked to COVID-19 that have been highlighted in the scientific literature after the vaccine rollout. Eight different topic-specific dictionaries, i.e., gene, miRNA, lncRNA, PDB entries, disease, alternative medicines registered under clinical trials, drugs, and the side effects of drugs, were used to build this knowledgebase. We have introduced a BLSTM-based deep-learning model to predict the drug-disease associations that outperforms the existing model for the same purpose proposed in the earlier version of COVID-19Base. For the very first time, we have incorporated disease-gene, disease-miRNA, disease-lncRNA, and drug-PDB associations covering the largest number of biomedical entities related to COVID-19. We have provided examples of and insights into different biomedical entities covered in COVID-19Base to support the research community by incorporating all of these entities under a single platform to provide evidence-based support from the literature. COVID-19Base v3 can be accessed from: https://covidbase-v3.vercel.app/. The GitHub repository for the source code and data dictionaries is available to the community from: https://github.com/91Abdullah/covidbasev3.0.

Subjects

COVID-19; MicroRNAs; RNA, Long Noncoding; Humans; SARS-CoV-2; Knowledge Bases

ALLD: Acute Lymphoblastic Leukemia Detector.

Musleh, Saleh; Islam, Mohammad Tariqul; Alam, Mohammad Towfik; Househ, Mowafa; Shah, Zubair; Alam, Tanvir.

Stud Health Technol Inform ; 289: 77-80, 2022 Jan 14.

Article in English | MEDLINE | ID: mdl-35062096

ABSTRACT

Acute Lymphoblastic Leukemia (ALL) is a life-threatening type of cancer with a high mortality rate. Early detection of ALL can both reduce the fatality rate and improve the diagnosis plan for patients. In this study, we developed the ALL Detector (ALLD), a deep learning-based network to distinguish ALL patients from healthy individuals based on blast cell microscopic images. We evaluated multiple DL-based models, and the ResNet-based model performed the best, with 98% accuracy in the classification task. We also compared the performance of ALLD against state-of-the-art tools used for the same purpose, and ALLD outperformed them all. We believe that ALLD will help pathologists to explicitly diagnose ALL in the early stages and reduce the burden on clinical practice overall.

Subjects

Deep Learning; Precursor Cell Lymphoblastic Leukemia-Lymphoma; Humans; Neural Networks, Computer
