Results 1 - 11 of 11
1.
Comput Biol Med ; 150: 106162, 2022 11.
Article in English | MEDLINE | ID: mdl-36252365

ABSTRACT

With the rapid development of science and technology, the trend toward myopia at younger ages is becoming increasingly pronounced. The latest national survey done by the Chinese government found that more than 80% of Chinese teenagers suffer from myopia. Adolescent myopia is closely related to living environment, heredity, and living habits. Quantifying the relationship between myopia and living environment, heredity, and living habits is conducive to the prevention and intervention of adolescent myopia. In this study, we investigated the relationships between four main factors (environment, habits, parental vision, and demographics) and myopia status by analyzing questionnaire data. Data were collected from Chengdu, China in 2021, including 2808 myopia samples and 5693 non-myopia samples, with a total of 22 features. These 22 features were then fed into three machine learning algorithms to discriminate the two classes of samples. Results show that the computational model could produce an AUC of 0.768. To identify the features that contribute most to classification, we used an incremental feature selection strategy to screen the 22 features. As a result, we found that the 4 most influential features with XGBoost could achieve a competitive AUC of 0.764. To further investigate the risk and protective factors affecting adolescent myopia, we used OR values derived from MLE-LR to analyze the relationship between the 22 features and adolescent myopia. Results showed that the age variable was the most significant risk factor for myopia, followed by the myopia status of parents. The most protective factor for eyesight is the measure taken by the children, followed by the distance between books and eyes when reading. These discoveries can guide the prevention and control of myopia in children and adolescents.


Subject(s)
Myopia , Child , Adolescent , Humans , Myopia/epidemiology , Myopia/genetics , Surveys and Questionnaires , Eye , China/epidemiology , Machine Learning , Risk Factors
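
The odds-ratio (OR) analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual reading of a logistic-regression coefficient as a log odds ratio, so that OR = exp(coefficient), with OR > 1 suggesting a risk factor and OR < 1 a protective one. The function names and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """Convert a logistic-regression coefficient (a log odds ratio) to an OR."""
    return math.exp(beta)


def odds_ratio_2x2(exposed_cases: int, exposed_controls: int,
                   unexposed_cases: int, unexposed_controls: int) -> float:
    """Odds ratio from a 2x2 table of counts (all counts must be non-zero)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def interpret(odds_ratio: float) -> str:
    """Label a factor by the direction of its odds ratio."""
    if odds_ratio > 1:
        return "risk"
    if odds_ratio < 1:
        return "protective"
    return "neutral"


# Hypothetical counts, for illustration only (not data from the study).
example_or = odds_ratio_2x2(20, 10, 10, 20)
print(example_or, interpret(example_or))
```

For a single binary predictor, the maximum-likelihood logistic-regression coefficient equals the logarithm of the 2x2 odds ratio, which is why both functions appear side by side here.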
2.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35817303

ABSTRACT

Many studies have shown that small nucleolar RNAs (snoRNAs) play critical roles in the development of various complex human diseases. Discovering the associations between snoRNAs and diseases is an important step toward understanding the pathogenesis and characteristics of diseases. However, uncovering associations via traditional experimental approaches is costly and time-consuming. This study proposed a bounded nuclear norm regularization-based method, called PSnoD, to predict snoRNA-disease associations. Benchmark experiments showed that, compared with state-of-the-art methods, PSnoD achieved superior performance in the 5-fold stratified shuffle split. PSnoD produced a robust performance with an area under the receiver-operating characteristic curve of 0.90 and an area under the precision-recall curve of 0.55, highlighting the effectiveness of our proposed method. In addition, the computational efficiency of PSnoD was also demonstrated by comparison with other matrix completion techniques. More importantly, the case study further demonstrated the ability of PSnoD to screen potential snoRNA-disease associations. The code of PSnoD has been uploaded to https://github.com/linDing-groups/PSnoD. Based on PSnoD, we established a web server that is freely accessible at http://psnod.lin-group.cn/.


Subject(s)
Cell Nucleus , RNA, Small Nucleolar , Humans , RNA, Small Nucleolar/genetics
3.
Math Biosci Eng ; 19(4): 3597-3608, 2022 02 07.
Article in English | MEDLINE | ID: mdl-35341266

ABSTRACT

Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. From health to diabetes, there are generally three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected the physical examination data from Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG ≥ 7.0 mmol/L). Finally, we obtained 1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression (LR), and fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with AUC (macro) of 0.7874 and AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, and IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687, and 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.


Subject(s)
Diabetes Mellitus, Type 2 , Prediabetic State , Blood Glucose/metabolism , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Fasting , Humans , Physical Examination , Prediabetic State/diagnosis , Prediabetic State/epidemiology
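
The three-group labelling by fasting plasma glucose (FPG) described in the abstract above can be written as a short function. This is only a sketch of the stated thresholds: the function name is hypothetical, and assigning a value of exactly 7.0 mmol/L to T2DM is an assumption that follows the WHO convention of FPG ≥ 7.0 mmol/L.

```python
def classify_fpg(fpg_mmol_per_l: float) -> str:
    """Assign a fasting plasma glucose value (mmol/L) to NFG, IFG, or T2DM."""
    if fpg_mmol_per_l < 6.1:
        return "NFG"   # normal fasting plasma glucose
    if fpg_mmol_per_l < 7.0:
        return "IFG"   # mildly impaired fasting plasma glucose
    return "T2DM"      # assumption: exactly 7.0 mmol/L counts as T2DM


for value in (5.2, 6.1, 6.9, 7.0, 9.4):
    print(value, classify_fpg(value))
```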
4.
Int J Mol Sci ; 23(3)2022 Jan 23.
Article in English | MEDLINE | ID: mdl-35163174

ABSTRACT

4mC is a type of DNA modification that can regulate multiple biological processes, for example, DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their genetic functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized by using a correlation and gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. Then, these optimized features were fed into a 1D convolutional neural network (CNN) to classify 4mC sites from non-4mC sites in Geobacter pickeringii. The performance of the proposed model on independent data exhibited an accuracy of 0.868, which was 4.2% higher than the existing model.


Subject(s)
Computational Biology/methods , Epigenesis, Genetic/genetics , Geobacter/genetics , Algorithms , Cytosine/metabolism , DNA/genetics , DNA Methylation/genetics , Deep Learning , Machine Learning , Mutation/genetics , Neural Networks, Computer , Software
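
The two feature descriptors named in the abstract above, binary encoding and k-mer composition, can be sketched as follows. This is a generic illustration under assumptions: the abstract does not give the exact encoding details, so the A/C/G/T ordering, the normalisation by the number of k-mer windows, and the function names are all hypothetical choices.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed column order for the binary encoding


def binary_encode(seq: str) -> list:
    """One-hot encode each base as 4 bits; unknown bases become all zeros."""
    vector = []
    for base in seq.upper():
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def kmer_composition(seq: str, k: int) -> dict:
    """Frequency of every possible k-mer among the sequence's k-length windows."""
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(windows))
    all_kmers = ("".join(p) for p in product(NUCLEOTIDES, repeat=k))
    return {kmer: counts[kmer] / windows for kmer in all_kmers}


print(binary_encode("AC"))
print(kmer_composition("ACGT", 2)["AC"])
```

Concatenating the two outputs for each sequence would give one fused feature vector, which is one plausible reading of "fusion" in the abstract.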
5.
Methods ; 203: 558-563, 2022 07.
Article in English | MEDLINE | ID: mdl-34352373

ABSTRACT

N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes such as transcription regulation, replication and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded by the word embedding technique 'word2vec'. The obtained features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model could yield an overall accuracy of 0.861, which was about 4.3% higher than the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.


Subject(s)
DNA , Escherichia coli , DNA/genetics , Escherichia coli/genetics , Genome , Genomics , Software
6.
Med Phys ; 48(12): 7891-7899, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34669994

ABSTRACT

PURPOSE: This study aimed to explore the predictive ability of deep learning (DL) for the common epidermal growth factor receptor (EGFR) mutation subtypes in patients with lung adenocarcinoma. METHODS: A total of 665 patients with lung adenocarcinoma (528/137) were recruited from two different institutions. In the training set, an 18-layer convolutional neural network (CNN) and fivefold cross-validation strategy were used to establish a CNN model. Subsequently, an independent external validation cohort from the other institution was used to evaluate the predictive efficacy of the CNN model. Gradient-weighted class activation mapping (Grad-CAM) technology was used for the visual interpretation of the CNN model. In addition, this study also compared the prediction abilities of the radiomics and CNN models. Receiver operating characteristic (ROC) curves, accuracy, precision, recall, and F1-score were used to evaluate the effectiveness of the CNN model and compare its performance with that of the radiomics model. RESULTS: In the validation set, the micro- and macroaverage values of the area under the ROC curve of the CNN model to identify the three EGFR subtypes were 0.78 and 0.79, respectively. All evaluation indicators of the CNN model were better than those of the radiomics model. CONCLUSIONS: Our study confirmed the potential of DL for predicting the EGFR mutation status in lung adenocarcinoma. The imaging phenotypes of the three mutation subtypes were found to be different, which can provide a basis for choosing more accurate and personalized treatment in patients with lung adenocarcinoma.


Subject(s)
Adenocarcinoma of Lung , Deep Learning , Lung Neoplasms , Adenocarcinoma of Lung/diagnostic imaging , Adenocarcinoma of Lung/genetics , ErbB Receptors/genetics , Humans , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/genetics , Mutation , Retrospective Studies
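
The micro- and macro-averaged AUC values reported in the abstract above can be illustrated with a small, dependency-free sketch. It assumes the common one-vs-rest definitions: the macro average is the mean of the per-class AUCs, and the micro average is a single AUC computed over all (sample, class) pairs pooled together. The function names and the toy scores are hypothetical, and the study's own computation may differ.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """Return (macro, micro) one-vs-rest AUC for integer labels 0..n_classes-1."""
    one_hot = [[1 if y == c else 0 for c in range(n_classes)] for y in y_true]
    per_class = [
        binary_auc([row[c] for row in one_hot], [row[c] for row in y_score])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    # Micro: pool every (sample, class) pair into one binary problem.
    micro = binary_auc([v for row in one_hot for v in row],
                       [v for row in y_score for v in row])
    return macro, micro


# Toy three-class scores, for illustration only.
y_true = [0, 1, 2]
y_score = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
print(macro_micro_auc(y_true, y_score, 3))
```

The pairwise comparison is quadratic in the number of samples, which is fine for a sketch but would be replaced by a rank-based computation on large cohorts.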
7.
Comput Struct Biotechnol J ; 19: 4123-4131, 2021.
Article in English | MEDLINE | ID: mdl-34527186

ABSTRACT

Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model to identify cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary autocorrelation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boosting decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model could identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of the two recent studies on the same data.
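
Incremental feature selection (IFS), mentioned in the abstract above, is commonly described as follows: rank the features, evaluate a model on the top 1, top 2, ..., top n features, and keep the prefix with the best score. A minimal sketch under that assumption is given here. The `evaluate` callback stands in for a cross-validated classifier score and, like the feature names and toy scores, is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the best-scoring prefix of a ranked feature list and its score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict '>' keeps the smaller subset when two prefixes tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for a cross-validated score, keyed by subset size.
toy_scores = {1: 0.70, 2: 0.75, 3: 0.74}
subset, score = incremental_feature_selection(
    ["f1", "f2", "f3"], lambda feats: toy_scores[len(feats)]
)
print(subset, score)
```

In practice the ranking would come from a filter such as ANOVA or mRMR, and `evaluate` would train and cross-validate the chosen classifier on the given columns.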

8.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33751027

ABSTRACT

A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to cleavage by the DNase I enzyme. It is an important part of the noncoding region and contains a variety of regulatory elements, such as promoters, enhancers, and transcription factor-binding sites. Moreover, disease- or trait-related loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method performs well in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web-server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, by which users can easily distinguish DHS and non-DHS and obtain the corresponding developmental stage of DHS.


Subject(s)
Arabidopsis/genetics , DNA/genetics , Deep Learning , Deoxyribonuclease I/genetics , Oryza/genetics , Software , Arabidopsis/metabolism , Chromatin/metabolism , Chromatin/ultrastructure , DNA/chemistry , DNA/metabolism , Datasets as Topic , Deoxyribonuclease I/metabolism , Enhancer Elements, Genetic , Genetic Loci , Humans , Internet , Oryza/metabolism , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic
9.
Comput Math Methods Med ; 2021: 6664362, 2021.
Article in English | MEDLINE | ID: mdl-33505515

ABSTRACT

Bioluminescent proteins (BLPs) are a class of proteins that are widely distributed in many living organisms with various mechanisms of light emission including bioluminescence and chemiluminescence from luminous organisms. Bioluminescence has been commonly used in various analytical research methods of cellular processes, such as gene expression analysis, drug discovery, cellular imaging, and toxicity determination. However, the identification of bioluminescent proteins is challenging as they share poor sequence similarity. In this paper, we briefly reviewed the development of the computational identification of BLPs and subsequently proposed a novel prediction framework for identifying BLPs based on the eXtreme gradient boosting algorithm (XGBoost) and sequence-derived features. To train the models, we collected BLP data from bacteria, eukaryotes, and archaea. Then, to obtain more effective prediction models, we examined the performances of different feature extraction methods and their combinations as well as classification algorithms. Finally, based on the optimal model, a novel predictor named iBLP was constructed to identify BLPs. The robustness of iBLP has been demonstrated by experiments on training and independent datasets. Comparison with other published methods further demonstrated that the proposed method is powerful and could provide good performance for BLP identification. The webserver and software package for BLP identification are freely available at http://lin-group.cn/server/iBLP.


Subject(s)
Algorithms , Luminescent Proteins , Amino Acid Sequence , Chemical Phenomena , Computational Biology , Databases, Protein , Drug Discovery , Luminescence , Luminescent Proteins/chemistry , Luminescent Proteins/genetics , Luminescent Proteins/metabolism , Machine Learning , Software
10.
Med Phys ; 47(8): 3458-3466, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32416013

ABSTRACT

PURPOSE: To investigate the use of radiomics in the in-depth identification of epidermal growth factor receptor (EGFR) mutation status in patients with lung adenocarcinoma. METHODS: Computed tomography images of 438 patients with lung adenocarcinoma were collected in two different institutions, and 496 radiomic features were extracted. In the training set, lasso logistic regression was used to establish radiomic signatures. Combining radiomic index and clinical features, five machine learning methods and a tenfold cross-validation strategy were used to establish combined models for the EGFR+ vs EGFR- and 19Del vs L858R groups. The predictive power of the models was then evaluated using an independent external validation cohort. RESULTS: In the EGFR+ vs EGFR- and 19Del vs L858R groups, radiomic signatures consisting of 12 and 7 radiomic features were established, respectively; the areas under the curve (AUCs) of the lasso logistic regression model on the validation set were 0.76 and 0.71, respectively. After inclusion of the clinical features, the maximum AUCs of combined models on the validation set were 0.79 and 0.74, respectively. Logistic regression analysis showed good performance in the two groups, with AUCs of 0.79 and 0.71 on the validation set. Additionally, the AUC of combined models in the EGFR+ vs EGFR- group was higher than that of the 19Del vs L858R group. CONCLUSIONS: Our study shows the potential of radiomics to predict EGFR mutation status. There are imaging phenotypic differences between EGFR+ and EGFR-, and between 19Del and L858R; these can be used to allow patients with lung adenocarcinoma to choose more appropriate and personalized treatment options.


Subject(s)
Adenocarcinoma of Lung , ErbB Receptors , Lung Neoplasms , Adenocarcinoma of Lung/diagnostic imaging , Adenocarcinoma of Lung/genetics , ErbB Receptors/genetics , Humans , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/genetics , Machine Learning , Mutation , Retrospective Studies
11.
Comb Chem High Throughput Screen ; 23(6): 527-535, 2020.
Article in English | MEDLINE | ID: mdl-32334499

ABSTRACT

BACKGROUND: RNA methylation is a reversible post-transcriptional modification involved in numerous biological processes. Ribose 2'-O-methylation is one type of RNA methylation. It has been shown that ribose 2'-O-methylation plays an important role in immune recognition and in pathogenesis. OBJECTIVE: We aim to design a computational method to identify 2'-O-methylation. METHODS: In contrast to experimental methods, we propose a computational workflow to identify methylation sites based on a multi-feature extraction algorithm. RESULTS: With a voting procedure based on the 7 best feature-classifier combinations, we achieved an accuracy of 76.5% in 10-fold cross-validation. Furthermore, we optimized the features and input the optimized features into an SVM. As a result, the AUC reached 0.813. CONCLUSION: The RNA samples, especially the negative samples, used in this study are more objective and strict, so we obtained more representative results than state-of-the-art studies.


Subject(s)
Computational Biology , Machine Learning , RNA/metabolism , Methylation , RNA/chemistry
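
The voting procedure mentioned in the abstract above can be sketched as a simple majority vote over the predictions of several classifiers. The abstract does not state the exact voting rule, so hard majority voting is an assumption here, and the function names and toy predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the label predicted by the most classifiers for one sample."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


def vote_over_samples(per_classifier_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine predictions given as one list per classifier, aligned by sample."""
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


# Seven hypothetical classifiers voting on one sample (1 = methylated site).
print(majority_vote([1, 0, 1, 1, 0, 0, 1]))
```

With an odd number of binary voters, such as the 7 combinations mentioned in the abstract, a tie cannot occur, so no tie-breaking rule is needed in that setting.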