Search | BVS Regional Portal (test)

A novel feature selection method to predict protein structural class.

Yuan, Mingshun; Yang, Zijiang; Huang, Guangzao; Ji, Guoli.

Comput Biol Chem ; 76: 118-129, 2018 Oct.

Article in English | MEDLINE | ID: mdl-29990791

ABSTRACT

Integrating features derived from different protein properties helps improve the prediction accuracy of protein structural class, but it requires handling the resulting high-dimensional data. Selecting the informative features from the integrated set therefore becomes an indispensable step. This paper proposes a novel feature selection method, Partial-Maximum-Correlation-Information based Recursive Feature Elimination (PMCI-RFE), to quickly select the best feature subset from an integrated high-dimensional protein feature set and thereby improve the prediction of protein structural class. PMCI-RFE can also be used to find different types of informative features for further analysis of biological relationships. The method uses the correlation information between the feature space and the class encoding space to select informative features, based on the idea of orthogonal component projection in the feature space. Experimental results on six widely used benchmark datasets show that PMCI-RFE is fast and effective compared with four other state-of-the-art feature selection methods, making full use of the information in different protein properties and improving the predictability of protein structural class.
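The abstract does not give the PMCI-RFE scoring rule, so the following is only a minimal sketch of the general recursive-elimination idea, not the authors' method. It assumes a stand-in score (absolute Pearson correlation with the class label minus mean absolute correlation with the other remaining features); the function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def backward_eliminate(columns, y, keep):
    """Drop the lowest-scoring feature one at a time until `keep` remain.

    columns: dict mapping feature name -> list of values.
    The score is an assumed stand-in, NOT the PMCI criterion:
    relevance to y minus average redundancy with the other survivors.
    """
    remaining = list(columns)
    while len(remaining) > keep:
        def score(f):
            relevance = abs(pearson(columns[f], y))
            others = [g for g in remaining if g != f]
            if not others:
                return relevance
            redundancy = sum(abs(pearson(columns[f], columns[g])) for g in others) / len(others)
            return relevance - redundancy
        # Scores are recomputed each round because redundancy depends on the survivors.
        remaining.remove(min(remaining, key=score))
    return remaining

# Hypothetical toy data: one informative feature and two redundant noise features.
y = [0, 0, 0, 1, 1, 1]
columns = {
    "signal": [0.1, 0.2, 0.0, 0.9, 1.0, 1.1],
    "noise_a": [0.5, 0.1, 0.9, 0.4, 0.8, 0.2],
    "noise_b": [1.0, 0.2, 1.8, 0.8, 1.6, 0.4],
}
selected = backward_eliminate(columns, y, keep=1)
```

Recomputing the score after every removal is what makes the procedure recursive: a feature that looked redundant early on can become useful once its near-duplicate has been dropped.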

Subjects

Algorithms ; Models, Chemical ; Proteins/chemistry ; Proteins/classification ; Amino Acid Sequence ; Datasets as Topic ; Protein Conformation

Laser-Induced Breakdown Spectroscopy for Rapid Discrimination of Heavy-Metal-Contaminated Seafood Tegillarca granosa.

Ji, Guoli; Ye, Pengchao; Shi, Yijian; Yuan, Leiming; Chen, Xiaojing; Yuan, Mingshun; Zhu, Dehua; Chen, Xi; Hu, Xinyu; Jiang, Jing.

Sensors (Basel) ; 17(11), 2017 Nov 17.

Article in English | MEDLINE | ID: mdl-29149053

ABSTRACT

In this study, laser-induced breakdown spectroscopy (LIBS) and pattern recognition methods were used to distinguish Tegillarca granosa samples artificially contaminated with three toxic heavy metals: zinc (Zn), cadmium (Cd), and lead (Pb). The measured spectra were first processed by a wavelet transform algorithm (WTA), and the resulting characteristic information was then expressed by an information gain algorithm (IGA). The 30 variables obtained were used as inputs for three classifiers: partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and random forest (RF). The RF model performed best, with 93.3% discrimination accuracy. In addition, the extracted characteristic information was used to reconstruct the original spectra by inverse WTA, and the attribution of the reconstructed spectra was discussed. This work indicates that healthy Tegillarca granosa samples can be distinguished from those contaminated with toxic heavy metals by pattern recognition analysis combined with LIBS, which requires only minimal pretreatment.
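As a rough illustration of how information gain can rank candidate variables before classification, the sketch below scores each variable by the entropy reduction from a single median split. The median threshold, the function names, and the toy intensities are assumptions made for illustration; the abstract does not describe how the IGA was configured, and the wavelet and classifier steps are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction from splitting `values` at their (upper) median.

    The single median split is an assumed discretization, chosen for brevity.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional

def top_k(features, labels, k):
    """Return the names of the k variables with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Hypothetical toy intensities for two spectral variables.
labels = ["clean", "clean", "clean", "Pb", "Pb", "Pb"]
features = {
    "line_a": [1, 2, 3, 10, 11, 12],   # separates the classes perfectly
    "line_b": [1, 10, 2, 11, 3, 12],   # mostly uninformative
}
best = top_k(features, labels, k=1)
```

In a pipeline like the one described, the selected variables would then be passed to a classifier such as a random forest.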

Subjects

Food Analysis/instrumentation ; Food Analysis/methods ; Lasers ; Metals, Heavy/analysis ; Seafood/analysis ; Spectrum Analysis ; Least-Squares Analysis

Integrating multiple fitting regression and Bayes decision for cancer diagnosis with transcriptomic data from tumor-educated blood platelets.

Huang, Guangzao; Yuan, Mingshun; Chen, Moliang; Li, Lei; You, Wenjie; Li, Hanjie; Cai, James J; Ji, Guoli.

Analyst ; 142(19): 3588-3597, 2017 Oct 07.

Article in English | MEDLINE | ID: mdl-28853484

ABSTRACT

The application of machine learning in cancer diagnostics has shown great promise and is important in clinical settings. Here we consider applying machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes to increase the potential for facilitating personalized treatments. To this end, we present a novel classification method called MFRB (for Multiple Fitting Regression and Bayes decision), which integrates multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map the multidimensional features of the transcriptomic data into a one-dimensional feature. The probability density function of each class in the mapped space is then adjusted using the Gaussian probability density function. Finally, Bayes decision theory is used to build a probabilistic classifier from the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The classical support vector machine (SVM) and probabilistic SVM (PSVM) are used as baselines to evaluate the proposed method on simulated and real TEP datasets. Our results indicate that MFRB outperforms SVM and PSVM, mainly because of its strong generalization ability on limited, imbalanced, and noisy data.
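The last two stages described above (a Gaussian density per class on a one-dimensional score, followed by a Bayes decision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MFRB implementation: the one-dimensional scores are taken as given because the abstract does not specify the MFR mapping, reading "adjusted" as fitting a Gaussian by mean and standard deviation is an assumption, and all names and data are hypothetical.

```python
import math
from statistics import mean, pstdev

def fit_gaussians(scores, labels):
    """Estimate (prior, mean, std) for each class from 1-D scores."""
    n = len(scores)
    params = {}
    for c in set(labels):
        xs = [s for s, y in zip(scores, labels) if y == c]
        # A small floor on the std avoids division by zero for degenerate classes.
        params[c] = (len(xs) / n, mean(xs), max(pstdev(xs), 1e-9))
    return params

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, params):
    """Posterior class probabilities via Bayes' rule."""
    joint = {c: p * gaussian_pdf(x, mu, sd) for c, (p, mu, sd) in params.items()}
    total = sum(joint.values())
    if total == 0.0:
        # x is so far from every class that all densities underflow.
        return {c: 1.0 / len(joint) for c in joint}
    return {c: v / total for c, v in joint.items()}

def classify(x, params):
    """Return (predicted class, posterior probability used as a reliability value)."""
    post = posterior(x, params)
    label = max(post, key=post.get)
    return label, post[label]

# Hypothetical 1-D scores, standing in for the output of the mapping step.
scores = [0.0, 0.2, -0.1, 0.1, 2.0, 2.2, 1.9, 2.1]
labels = ["healthy"] * 4 + ["cancer"] * 4
params = fit_gaussians(scores, labels)
label, reliability = classify(0.05, params)
```

Using the posterior of the winning class as the reliability value is one natural reading of a probabilistic classifier: a score midway between the two class means yields a value near 0.5, which flags the prediction as uncertain.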

Subjects

Bayes Theorem ; Blood Platelets/metabolism ; Neoplasms/diagnosis ; Support Vector Machine ; Transcriptome ; Algorithms ; Humans ; Reproducibility of Results
