Results 1 - 12 of 12
1.
Comput Struct Biotechnol J ; 23: 1864-1876, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38707536

ABSTRACT

In current genomic research, the widely used methods for predicting antimicrobial resistance (AMR) often rely on prior knowledge of known AMR genes or reference genomes. However, these methods have limitations, potentially resulting in imprecise predictions owing to incomplete coverage of AMR mechanisms and genetic variations. To overcome these limitations, we propose a pan-genome-based machine learning approach to advance our understanding of AMR gene repertoires and uncover possible feature sets for precise AMR classification. By building compacted de Bruijn graphs (cDBGs) from thousands of genomes and collecting the presence/absence patterns of unique sequences (unitigs) for Pseudomonas aeruginosa, we determined that using machine learning models on unitig-centered pan-genomes showed significant promise for accurately predicting the antibiotic resistance or susceptibility of microbial strains. Applying a feature-selection-based machine learning algorithm led to satisfactory predictive performance for the training dataset (area under the receiver operating characteristic curve (AUC) > 0.929) and an independent validation dataset (AUC, approximately 0.77). Furthermore, the selected unitigs revealed previously unidentified resistance genes, allowing for the expansion of the resistance gene repertoire to those that have not previously been described in the literature on antibiotic resistance. These results demonstrate that our proposed unitig-based pan-genome feature set was effective in constructing machine learning predictors that could accurately identify AMR pathogens. Gene sets extracted using this approach may offer valuable insights into expanding known AMR genes and forming new hypotheses to uncover the underlying mechanisms of bacterial AMR.
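
The presence/absence representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the unitigs below are supplied by hand rather than derived from a compacted de Bruijn graph, and the function names (`reverse_complement`, `presence_absence`) and toy sequences are hypothetical.

```python
# Toy illustration only: unitigs are given by hand, not extracted from a cDBG.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def presence_absence(genomes: dict[str, str], unitigs: list[str]) -> dict[str, list[int]]:
    """Map each genome name to a 0/1 vector: 1 if the unitig occurs on either strand."""
    matrix = {}
    for name, genome in genomes.items():
        row = []
        for unitig in unitigs:
            found = unitig in genome or reverse_complement(unitig) in genome
            row.append(1 if found else 0)
        matrix[name] = row
    return matrix


# Hypothetical toy data (not from the paper).
genomes = {
    "strain_A": "ATGCGTACGTTAGC",
    "strain_B": "ATGCGTTTTTTAGC",
}
unitigs = ["CGTACG", "AAAAA", "GGGG"]

features = presence_absence(genomes, unitigs)
```

Each row of `features` could then serve as the binary input vector for a classifier, with resistant or susceptible as the label. Substring search is used here only for readability; it would not scale to thousands of genomes.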

2.
Cancers (Basel) ; 15(12)2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37370717

ABSTRACT

Valvular heart disease (VHD) is a known late complication of radiotherapy for childhood cancer (CC), and correctly identifying high-risk survivors remains a challenge. This paper focuses on the distribution of the radiation dose absorbed by heart tissues. We propose that a dosiomics signature could provide insight into the spatial characteristics of the heart dose associated with VHD, beyond the already-established risk induced by high doses. We analyzed data from the 7670 survivors of the French Childhood Cancer Survivors' Study (FCCSS), 3902 of whom were treated with radiotherapy. In all, 63 (1.6%) survivors who had been treated with radiotherapy experienced VHD, and 57 of them had heterogeneous heart doses. From the heart-dose distribution of each survivor, we extracted 93 first-order and spatial dosiomics features. We trained random forest algorithms adapted for imbalanced classification and compared their predictive performance with that of standard mean heart dose (MHD)-based models. Sensitivity analyses were also conducted for sub-populations of survivors with spatially heterogeneous heart doses. Our results suggest that MHD and dosiomics-based models performed equally well globally in our cohort and that, in the sub-population that received a spatially heterogeneous dose distribution, the predictive capability of the models is significantly improved by the use of the dosiomics features. If these findings are further validated, the dosiomics signature may be incorporated into machine learning algorithms for radiation-induced VHD risk assessment and, in turn, into the personalized refinement of follow-up guidelines.

3.
Sci Rep ; 12(1): 13412, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927323

ABSTRACT

O6-Methylguanine-DNA-methyltransferase (MGMT) promoter methylation has been shown in many studies to be an important predictive biomarker for temozolomide (TMZ) resistance and poor progression-free survival in glioblastoma multiforme (GBM) patients. However, identifying the MGMT methylation status using molecular techniques remains challenging due to technical limitations, such as the inability to obtain tumor specimens, high detection costs, and the complexity of intralesional heterogeneity. To overcome these difficulties, we aimed to test the feasibility of using a novel radiomics-based machine learning (ML) model to preoperatively and noninvasively predict the MGMT methylation status. In this study, radiomics features extracted from multimodal images of GBM patients with annotated MGMT methylation status were downloaded from The Cancer Imaging Archive (TCIA) public database for retrospective analysis. The radiomics features extracted from multimodal magnetic resonance imaging (MRI) underwent a two-stage feature selection method, consisting of an eXtreme Gradient Boosting (XGBoost) feature selection model followed by a genetic algorithm (GA)-based wrapper model, to extract the most meaningful radiomics features for prediction. The cross-validation results suggested that the GA-based wrapper model achieved high performance, with a sensitivity of 0.894, specificity of 0.966, and accuracy of 0.925 for predicting the MGMT methylation status in GBM. Application of the extracted GBM radiomics features to a low-grade glioma (LGG) dataset also achieved a sensitivity of 0.780, specificity of 0.620, and accuracy of 0.750, indicating the potential of the selected radiomics features to be applied more widely to both low- and high-grade gliomas. The performance indicated that our model may potentially confer significant improvements in prognosis and treatment responses in GBM patients.


Subject(s)
Brain Neoplasms , Glioblastoma , Glioma , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , DNA Methylation , DNA Modification Methylases/genetics , DNA Modification Methylases/metabolism , DNA Repair Enzymes/genetics , DNA Repair Enzymes/metabolism , Glioblastoma/diagnostic imaging , Glioblastoma/genetics , Glioma/genetics , Humans , Machine Learning , O(6)-Methylguanine-DNA Methyltransferase/genetics , Retrospective Studies , Tumor Suppressor Proteins/genetics
4.
Cancers (Basel) ; 14(14)2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35884551

ABSTRACT

Glioma is a central nervous system (CNS) neoplasm that arises from glial cells. In the 2016 World Health Organization classification scheme, lower-grade gliomas (LGGs) are grade II and III gliomas. Following the discovery of suppression of negative immune regulation, immunotherapy has become a promising treatment for lower-grade glioma patients. However, the therapy is not effective for all types of LGGs, and tumor mutational burden (TMB) has been shown to be a potential biomarker for the susceptibility and prognosis of immunotherapy in lower-grade glioma patients. Hence, predicting TMB benefits brain cancer patients. In this study, we investigated the correlation between magnetic resonance imaging (MRI)-based radiomic features and TMB in LGG by applying machine learning methods. Six machine learning classifiers were examined on the features selected by the genetic algorithm. Subsequently, a light gradient boosting machine (LightGBM) succeeded in selecting 11 radiomics signatures for TMB classification. Our LightGBM model achieved an accuracy of 0.7936, with balanced sensitivity and specificity of 0.76 and 0.8107, respectively. To our knowledge, our study represents the best model for classification of TMB in LGG patients at present.

5.
NMR Biomed ; 35(11): e4792, 2022 11.
Article in English | MEDLINE | ID: mdl-35767281

ABSTRACT

In 2016, the World Health Organization (WHO) updated the glioma classification, including that of low-grade glioma (LGG), by incorporating molecular biology parameters. In the new scheme, LGGs have three molecular subtypes: isocitrate dehydrogenase (IDH)-mutated 1p/19q-codeleted, IDH-mutated 1p/19q-noncodeleted, and IDH-wild type 1p/19q-noncodeleted entities. This work proposes a model for predicting LGG molecular subtypes using magnetic resonance imaging (MRI). MR images were segmented and converted into radiomics features, thereby providing predictive information about the brain tumor classification. With 726 raw features obtained from the feature extraction procedure, we developed a hybrid machine learning-based radiomics approach, incorporating a genetic algorithm and an eXtreme Gradient Boosting (XGBoost) classifier, to ascertain 12 optimal features for tumor classification. To address class imbalance, the synthetic minority oversampling technique (SMOTE) was applied in our study. The XGBoost algorithm outperformed the other algorithms on the training dataset, with an accuracy of 0.885. We then evaluated the XGBoost model on an external validation dataset and achieved an overall accuracy of 0.6905 for the three-subtype classification of LGGs. Our model is among just a few to have resolved the three-subtype LGG classification challenge with high accuracy compared with previous studies performing similar work.


Subject(s)
Brain Neoplasms , Glioma , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Glioma/pathology , Humans , Isocitrate Dehydrogenase/genetics , Machine Learning , Magnetic Resonance Imaging/methods , Mutation/genetics , Retrospective Studies
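
The synthetic minority oversampling technique (SMOTE) mentioned in this abstract creates new minority-class samples by interpolating between existing ones. The following is a minimal sketch of that idea only; it is not the authors' code, and the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority-class samples (not from the study).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = smote(minority, n_new=6)
```

Every synthetic point lies on the line segment between two real minority samples, so no value falls outside the range already observed. Oversampling of this kind is normally applied to the training split only, so that validation data remain untouched.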
6.
Food Chem ; 373(Pt B): 131469, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-34731813

ABSTRACT

An ethanol extract of avocado seed (TN-1) and six smaller fractions (PD-1 to PD-6) were prepared. Most of the extracts scavenged DPPH radicals, reduced Fe3+ to Fe2+, and inhibited polyphenoloxidase, consistent with their high polyphenolic content (p < 0.05). Most of the 47 compounds identified from TN-1 were classified as phenolic acids, condensed tannins, flavonoids, fatty acids, and alkaloids. Two extracts, TN-1 and PD-2 (0.025%, w/v), were used to treat white-leg shrimp, and the quality changes were compared with those of shrimp treated with sodium metabisulfite (1.25%, w/v) and untreated controls during 8-day storage at 2 °C. Changes in melanosis scores, lipid peroxidation, pH, microorganisms, and nutrients in shrimp treated with the extracts were comparable to or even much better than those in the other groups. These results point to a potential use of avocado seed extract as a cost-effective, eco-friendly, and effective alternative to commercial additives in shrimp storage.


Subject(s)
Persea , Antioxidants , Flavonoids , Lipid Peroxidation , Plant Extracts
7.
Gene ; 787: 145643, 2021 Jun 30.
Article in English | MEDLINE | ID: mdl-33848577

ABSTRACT

Krüppel-like factors (KLFs) are a group of conserved zinc finger-containing transcription factors that are involved in various physiological and biological processes, including cell proliferation, differentiation, development, and apoptosis. Bioinformatics methods such as sequence similarity searches, multiple sequence alignment, phylogenetic reconstruction, and gene synteny analysis have been proposed to broaden our knowledge of KLF proteins. In this study, we propose a novel computational approach that applies machine learning to features calculated from primary sequences. Specifically, our XGBoost-based model identifies KLF proteins efficiently, with an accuracy of 96.4% and an MCC of 0.704. It also shows promising performance on an independent dataset. Therefore, our model could serve as a useful tool to identify new KLF proteins and provide necessary information for biologists and researchers working on KLF proteins. Our machine learning source code and datasets are freely available at https://github.com/khanhlee/KLF-XGB.


Subject(s)
Computational Biology , Kruppel-Like Transcription Factors/chemistry , Algorithms , Amino Acid Sequence , Animals , Computational Biology/methods , Databases, Protein , Humans , Kruppel-Like Transcription Factors/analysis , Kruppel-Like Transcription Factors/genetics , Machine Learning , Models, Biological
8.
Comput Biol Med ; 132: 104320, 2021 05.
Article in English | MEDLINE | ID: mdl-33735760

ABSTRACT

BACKGROUND: In the field of glioma, transcriptome subtypes have been considered important diagnostic and prognostic biomarkers that may help improve treatment efficacy. However, existing methods for identifying transcriptome subtypes are limited by the relatively long detection period, the unattainability of tumor specimens via biopsy or surgery, and the fleeting nature of intralesional heterogeneity. In search of a model superior to previous ones, this study evaluated the efficiency of an eXtreme Gradient Boosting (XGBoost)-based radiomics model for classifying transcriptome subtypes in glioblastoma patients. METHODS: This retrospective study retrieved patients with pathologically diagnosed glioblastoma from the TCGA-GBM and IvyGAP cohorts and separated them into transcriptome subtype groups. MRI scans of GBM patients were then segmented into three regions: enhancement of the tumor core (ET), non-enhancing portion of the tumor core (NET), and peritumoral edema (ED). We subsequently applied two-level feature selection techniques (Spearman correlation and F-score tests) to handcrafted radiomics features (n = 704) from multimodality MRI in order to find the features that could be relevant. RESULTS: After feature selection, we identified the 13 most meaningful radiomics features for reaching optimal results. With these features, our XGBoost model reached predictive accuracies of 70.9%, 73.3%, 88.4%, and 88.4% for classical, mesenchymal, neural, and proneural subtypes, respectively. Our model performed better than the other models as well as previous works on the same dataset. CONCLUSION: The use of XGBoost with two-level feature selection analysis (Spearman correlation and F-score) is a promising combination for classifying transcriptome subtypes with high performance and might encourage further research on radiomics-based GBM models.


Subject(s)
Brain Neoplasms , Glioblastoma , Humans , Machine Learning , Magnetic Resonance Imaging , Retrospective Studies , Transcriptome
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32613242

ABSTRACT

Protein S-sulfenylation is a crucial post-translational modification (PTM) in which a hydroxyl group covalently binds to the thiol of cysteine. Recent studies have shown that this modification plays an important role in signal transduction, transcriptional regulation and apoptosis. To date, the dynamics of sulfenic acids in proteins remain unclear because of their fleeting nature. Identifying S-sulfenylation sites, therefore, could be the key to deciphering their structures and functions, which are important in cell biology and disease. However, owing to the lack of effective methods, scientists in this field are limited to a handful of wet lab techniques that are time-consuming and not cost-effective. This motivated us to develop an in silico model for detecting S-sulfenylation sites from protein sequence information alone. In this study, protein sequences were treated as natural language sentences comprising biological subwords. A deep neural network was then employed to perform classification. On the independent dataset, sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the curve reached 85.71%, 69.47%, 77.09%, 0.5554 and 0.833, respectively. Our results suggest that the proposed method (fastSulf-DNN) achieved excellent performance in predicting S-sulfenylation sites compared to other well-known tools on a benchmark dataset.


Subject(s)
Databases, Protein , Neural Networks, Computer , Protein Processing, Post-Translational , Sequence Analysis, Protein , Sulfenic Acids , Sulfenic Acids/chemistry , Sulfenic Acids/metabolism
10.
Int J Mol Sci ; 21(23)2020 Nov 28.
Article in English | MEDLINE | ID: mdl-33260643

ABSTRACT

Essential genes carry genomic information that could be key to a comprehensive understanding of life and evolution. Because of their importance, the study of essential genes is considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular as a way to reduce the cost and time of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high-dimensional features and the use of traditional machine learning algorithms. Thus, there is a need for a novel model that improves predictive performance using DNA sequence features. This study took advantage of a natural language processing (NLP) model to learn from biological sequences by treating them as natural language words. To learn from the NLP features, an ensemble deep neural network was employed as the supervised learning model. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The method outperformed the single models without ensembling, as well as the state-of-the-art predictors on the same benchmark dataset. This indicates the effectiveness of the proposed method in determining essential genes in particular and in other sequence-based problems in general.


Subject(s)
Algorithms , Deep Learning , Genes, Essential , Neural Networks, Computer , Area Under Curve , Reproducibility of Results , Sequence Analysis, DNA , Species Specificity
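
Treating a biological sequence as natural language requires splitting it into word-like tokens first. The abstract does not say how this was done; the sketch below assumes overlapping k-mers with k = 3, and the function name `kmer_tokens` and the example fragment are hypothetical.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers ("words")."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical DNA fragment (not from the benchmark dataset).
sentence = " ".join(kmer_tokens("ATGGCGTA", k=3))
```

The resulting space-separated string has the shape that word-embedding tools expect as a "sentence", which is one plausible way such features could be passed on to a neural network.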
11.
J Pers Med ; 10(3)2020 Sep 15.
Article in English | MEDLINE | ID: mdl-32942564

ABSTRACT

Approximately 96% of patients with glioblastomas (GBM) have IDH1 wildtype GBMs, characterized by extremely poor prognosis, partly due to resistance to standard temozolomide treatment. O6-Methylguanine-DNA methyltransferase (MGMT) promoter methylation status is a crucial prognostic biomarker for alkylating chemotherapy resistance in patients with GBM. However, MGMT methylation status identification methods, where the tumor tissue is often undersampled, are time consuming and expensive. Currently, presurgical noninvasive imaging methods are used to identify biomarkers to predict MGMT methylation status. We evaluated a novel radiomics-based eXtreme Gradient Boosting (XGBoost) model to identify MGMT promoter methylation status in patients with IDH1 wildtype GBM. This retrospective study enrolled 53 patients with pathologically proven GBM and tested MGMT methylation and IDH1 status. Radiomics features were extracted from multimodality MRI and tested by F-score analysis to identify important features to improve our model. We identified nine radiomics features that reached an area under the curve of 0.896, which outperformed other classifiers reported previously. These features could be important biomarkers for identifying MGMT methylation status in IDH1 wildtype GBM. The combination of radiomics feature extraction and F-score feature selection significantly improved the performance of the XGBoost model, which may have implications for patient stratification and therapeutic strategy in GBM.
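
The F-score analysis used here for feature ranking can be sketched as follows. The abstract does not state which definition was used; this example assumes the common two-class F-score (between-class separation of the means divided by the sum of within-class sample variances). The function name `f_score`, the feature names, and all values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos: list[float], neg: list[float]) -> float:
    """Two-class F-score of one feature: between-class separation divided by
    the sum of within-class sample variances."""
    overall = mean(pos + neg)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)
    return numerator / denominator


# Hypothetical feature values (not from the study): (positive class, negative class).
features = {
    "texture_a": ([5.1, 4.9, 5.3, 5.0], [2.0, 2.2, 1.9, 2.1]),  # well separated
    "texture_b": ([3.0, 1.0, 4.0, 2.0], [2.5, 3.5, 1.5, 2.5]),  # overlapping
}
ranking = sorted(features, key=lambda name: f_score(*features[name]), reverse=True)
```

Features at the top of `ranking` separate the two classes best on their own. The score treats each feature independently, so it does not capture information shared between features.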

12.
Genomics ; 112(3): 2445-2451, 2020 05.
Article in English | MEDLINE | ID: mdl-31987913

ABSTRACT

DNA replication is a fundamental process that plays a crucial role in the propagation of all living things on earth. Hence, the accurate identification of its origin could be the key to an insightful understanding of the regulatory mechanism of gene expression. Indeed, with the robust development of computational techniques and the abundance of biological sequencing data, it has become possible for scientists to identify the origin of replication accurately and promptly. This problem has drawn a lot of attention among experts in this field. However, to gain better outcomes, more work is required. Therefore, this study is designed to explore the combination of state-of-the-art features and an extreme gradient boosting learning system in classifying DNA sequences. Our hybrid approach is able to identify the origin of DNA replication with a sensitivity of 85.19%, specificity of 93.83%, accuracy of 89.51%, and MCC of 0.7931. Evidence is presented to show that our proposed method is superior to the state-of-the-art methods on the same benchmark dataset. Moreover, the research results represent a further step towards developing prediction models for DNA replication in particular and DNA sequences in general.


Subject(s)
Replication Origin , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Genome, Fungal , Machine Learning
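
The four metrics reported in this abstract all derive from one confusion matrix, which the following sketch makes explicit. It assumes equal numbers of positive and negative sequences and expresses the counts as per-class proportions; the class sizes are not given in the abstract, and the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp: float, fn: float, tn: float, fp: float) -> dict[str, float]:
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Assumption: balanced classes, with counts written as proportions of each class.
m = binary_metrics(tp=0.8519, fn=0.1481, tn=0.9383, fp=0.0617)
```

Under the balanced-class assumption, the reported sensitivity and specificity yield the reported accuracy and an MCC of approximately 0.793, so the four figures are mutually consistent.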