1.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536159

ABSTRACT

In this work we consider 148 semiochemicals reported for the family Scarabaeidae, whose chemical structures were characterized using a set of 200 molecular descriptors from five different classes. The most discriminating descriptors were selected with three techniques: principal component analysis (PCA), applied to each descriptor class separately, and random forests and Boruta-Shap, applied to the full descriptor set. Although the three techniques are conceptually different, they select a similar number of descriptors from each class. We proposed a combination of machine learning techniques to search for a structural pattern in the set of semiochemicals and then classify them. The pattern was established from the high membership of a subset of these metabolites in the clusters obtained with fuzzy C-means, a clustering method based on fuzzy logic; the discovered pattern corresponds to the biosynthetic pathways by which these compounds are produced biologically. This first classification was corroborated with Kohonen self-organizing maps. To classify those semiochemicals whose membership in a biosynthetic pathway was not clearly defined, we built two multilayer perceptron models, which had an acceptable performance.
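The clustering step above relies on fuzzy C-means, in which every compound receives a degree of membership in each cluster rather than a single hard label. The following minimal pure-Python sketch shows the standard alternating updates (weighted cluster centers, then inverse-distance memberships) and a split into high-membership and ambiguous compounds. The fuzzifier `m = 2`, the 0.8 membership threshold, the helper names and the toy 2-D points are illustrative assumptions, not values or code from the study; in practice each point would be a vector of selected molecular descriptors.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=200, seed=0, eps=1e-12):
    """Minimal fuzzy C-means with alternating center/membership updates.

    points: list of equal-length feature vectors.
    c: number of clusters; m: fuzzifier (> 1).
    Returns (centers, u) where u[i][j] is the membership of point i
    in cluster j; each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / tw
                            for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        # Distances are clamped at eps to avoid division by zero.
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), eps) for ctr in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                      for j in range(c)])
    return centers, u


def split_by_membership(u, threshold=0.8):
    """Hard-assign points whose highest membership reaches `threshold`;
    return the remaining (ambiguous) indices separately."""
    labels, ambiguous = {}, []
    for i, row in enumerate(u):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            labels[i] = best
        else:
            ambiguous.append(i)
    return labels, ambiguous


# Toy data (hypothetical): two tight groups plus one point halfway between them.
pts = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [2.5, 2.5]]
centers, u = fuzzy_c_means(pts, c=2)
labels, ambiguous = split_by_membership(u)
print(labels, ambiguous)
```

With this toy data the halfway point ends up with roughly equal membership in both clusters, so it is left for a downstream classifier, which loosely mirrors how the study passed compounds with unclear pathway membership to multilayer perceptrons.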


2.
Chinese Critical Care Medicine ; (12): 1071-1076, 2017.
Article in Chinese | WPRIM | ID: wpr-663347

ABSTRACT

Objective To establish a prognostic prediction model for severe sepsis/septic shock based on the random forest (RF) algorithm (RF model), and to evaluate its prognostic value in patients with severe sepsis/septic shock.
Methods A total of 497 patients with severe sepsis/septic shock admitted to the intensive care unit (ICU) of Zhejiang Hospital from September 2013 to May 2017 were enrolled. Baseline data, vital signs and symptoms, biochemical indexes and routine blood indexes on days 1, 3 and 5, together with prognosis, were collected. According to the 28-day outcome, patients were divided into a death group and a survival group, and indicators specifically associated with the prognosis of severe sepsis/septic shock were screened. An RF model was constructed from these specific indicators. The predictive performance of the RF model, the sequential organ failure assessment (SOFA) score and the acute physiology and chronic health evaluation Ⅱ (APACHE Ⅱ) score was evaluated by receiver operating characteristic (ROC) curve analysis.
Results Of the 497 patients with severe sepsis/septic shock, 201 died, giving a 28-day mortality of 40.4%. ① Based on the differences between the death and survival groups, 19 specific parameters were selected for the RF model: age; 24-hour urine output, blood urea nitrogen (BUN), serum creatinine (SCr) and platelet count (PLT) on day 1; heart rate (HR), mean arterial pressure (MAP), cyanosis and clammy skin on day 3; and temperature, HR, MAP, 24-hour urine output, PLT, fever, cyanosis, dyspnea, clammy skin and mottled skin on day 5. ② ROC curve analysis showed that the area under the ROC curve (AUC) of the RF model for predicting 28-day mortality was higher than those of the SOFA and APACHE Ⅱ scores on days 1, 3 and 5 (AUC: 0.836 vs. 0.643, 0.554 and 0.766 for SOFA, and 0.590, 0.670 and 0.758 for APACHE Ⅱ). The RF model predicted 28-day mortality with a sensitivity of 86.1%, a specificity of 77.0% and an accuracy of 80.7%.
Conclusion The random forest-based model can effectively predict the risk of death within 28 days in patients with severe sepsis/septic shock, and its predictive performance is better than that of the SOFA and APACHE Ⅱ scores.
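The sensitivity, specificity, accuracy and AUC reported above are standard quantities computed from a model's predicted risk scores. The following pure-Python sketch computes them for a binary outcome (1 = died within 28 days) at a chosen cut-off, with AUC obtained as the probability that a randomly chosen death receives a higher score than a randomly chosen survivor (ties count one half). The scores, labels and 0.5 threshold are invented toy values, not data from the study.

```python
def threshold_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity and accuracy when score >= threshold
    is read as a predicted death (label 1)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(y_true, y_score):
    """AUC as the Mann-Whitney probability P(score_death > score_survivor)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not from the study).
y = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(threshold_metrics(y, scores, 0.5))  # sensitivity 2/3, specificity 3/4, accuracy 5/7
print(roc_auc(y, scores))                 # 11/12, about 0.917

# The reported 28-day mortality follows from the counts in the abstract.
print(round(100 * 201 / 497, 1))          # 40.4
```

Because sensitivity and specificity depend on the chosen cut-off while AUC does not, the two kinds of figures in the abstract answer different questions about the same model.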

3.
Chinese Journal of Biochemical Pharmaceutics ; (6): 5-8, 2016.
Article in Chinese | WPRIM | ID: wpr-486528

ABSTRACT

Objective To screen the genes most relevant to lymph node metastasis of cervical cancer and to identify the genes at key nodes of the regulatory network, so as to provide potential targets for cervical cancer intervention.
Methods Using the TCGA transcriptional profiling database, a random forests algorithm was applied to rank the lymph node metastasis-related genes extracted from the GeneCards database. STRING and Cytoscape tools were used to build the interaction regulatory network and to identify the most highly weighted genes located at the center of the network. The DAVID platform was used to perform functional annotation of the whole gene set.
Results We ranked 2784 genes according to their potential contribution to lymph node metastasis of cervical cancer and identified the genes at key nodes. The metastasis-related genes were enriched in the cytokine pathway, MAPK pathway, Wnt pathway, intercellular interaction, adhesion junctions, cytoskeleton regulation, etc. Some of the identified key genes, such as EGFR, NOTCH1 and RHOA, have been verified in basic and clinical research to be closely related to cervical cancer metastasis.
Conclusion The random forests algorithm, combined with the TCGA database, is useful for identifying genes that play significant roles in cervical cancer metastasis. A majority of the genes in the analyzed gene set were indicated to be significantly correlated with lymph node metastasis.
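Picking the most highly weighted genes at the center of the network can be approximated, as one possible choice, by ranking genes on weighted degree (the sum of the interaction confidence scores on their edges). The sketch below assumes edges are given as (gene, gene, score) triples, similar in spirit to a STRING interaction export; the gene names and scores are placeholders, and the study may have used a different centrality measure in Cytoscape.

```python
from collections import defaultdict


def hub_genes(edges, top_k=3):
    """Rank nodes of an undirected weighted network by weighted degree.

    edges: iterable of (gene_a, gene_b, weight) triples.
    Returns the top_k (gene, weighted_degree) pairs, highest first;
    ties are broken alphabetically so the result is deterministic.
    """
    strength = defaultdict(float)
    for a, b, w in edges:
        strength[a] += w
        strength[b] += w
    ranked = sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical placeholder network (names and scores are made up).
edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.8),
    ("G1", "G4", 0.7),
    ("G2", "G3", 0.5),
    ("G4", "G5", 0.4),
]
print([gene for gene, _ in hub_genes(edges)])  # ['G1', 'G2', 'G3']
```

Weighted degree is the simplest centrality; betweenness or closeness centrality would favour genes that bridge modules rather than those with many confident partners.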

4.
J Biosci ; 2015 Oct; 40(4): 731-740
Article in English | IMSEAR | ID: sea-181456

ABSTRACT

Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
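The iterative extraction of reliable negative examples from unlabelled data can be sketched as follows. The study used SVM and random forest classifiers; to keep the example dependency-free, this sketch substitutes a simple nearest-centroid scorer, so it illustrates only the loop structure: score the unlabelled pool against positives versus current negatives, keep the least positive-looking fraction as negatives, and repeat until the selection stops changing. The function names, the `keep_frac` parameter and the toy feature vectors are assumptions for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def reliable_negatives(positives, unlabelled, keep_frac=0.5, max_iter=10):
    """Iteratively pick the unlabelled examples that look least like positives.

    Stand-in scorer (not SVM/RF): score(x) = dist(x, negative centroid)
    - dist(x, positive centroid); higher means more positive-looking.
    Returns sorted indices into `unlabelled` chosen as reliable negatives.
    """
    negatives = list(range(len(unlabelled)))  # start: treat all unlabelled as negative
    n_keep = max(1, int(len(unlabelled) * keep_frac))
    cp = centroid(positives)
    for _ in range(max_iter):
        cn = centroid([unlabelled[i] for i in negatives])
        scores = [math.dist(x, cn) - math.dist(x, cp) for x in unlabelled]
        # Keep the n_keep lowest-scoring (least positive-looking) examples.
        order = sorted(range(len(unlabelled)), key=scores.__getitem__)
        new_negatives = sorted(order[:n_keep])
        if new_negatives == negatives:  # converged: selection no longer changes
            break
        negatives = new_negatives
    return negatives


# Toy feature vectors for gene pairs (hypothetical): known regulatory pairs
# cluster near (1, 1); the unlabelled pool mixes similar and dissimilar pairs.
positives = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
unlabelled = [[1.0, 1.05], [0.95, 1.0], [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]]
print(reliable_negatives(positives, unlabelled))
```

The two unlabelled pairs that resemble the positives are never selected, so they would not be used as negative training data when the real classifier is retrained.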

5.
World Science and Technology-Modernization of Traditional Chinese Medicine ; (12): 1876-1881, 2013.
Article in Chinese | WPRIM | ID: wpr-440233

ABSTRACT

This study aimed to apply the electronic nose (E-nose) to research on traditional Chinese medicine (TCM). The difficulties of using the E-nose were discussed, solution plans were proposed, and a discrimination model was established. The work provides a simple, rapid and effective analysis method for the identification of TCM, as well as new ideas for the research and application of gas sensor arrays. The E-nose was used to extract the scent characteristics of TCM. Based on the ion mobility spectrometry of the metal oxide semiconductor (MOS) sensors, a fingerprint of each TCM scent was established, with the maximum sensor response used as the analysis index. To address the difficulties of identification, two solution plans were proposed. First, different detectors were employed to complete the classification. Second, radial basis function (RBF) and random forest (RF) classifiers were combined into a cascade classifier, in order to obtain the maximum information when the number of measurements and of MOS sensors in the E-nose was limited. The results showed that both plans were accurate and practical, with relatively high correct recognition rates and good cross-validation results (the highest correct recognition rates were 95% and 100%, and 96% and 80%, respectively). This study was the first to apply a cascade classifier to TCM identification with an E-nose; with a limited number of sensors, the maximum information was obtained through data mining. Identification of TCM with the E-nose was rapid and accurate, and compared with conventional sensory identification, the established pattern recognition method was easy to operate, with an accurate identification rate and good stability. It provides a simple and rapid analysis method for the identification of TCM.
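A cascade classifier passes a sample to a later stage only when an earlier stage is not confident enough. The sketch below shows that control flow with two deliberately simple stand-in stages: a Gaussian (RBF) similarity to one prototype per class, and a k-nearest-neighbour vote used here in place of the random forest. The prototypes, training examples, the 0.8 confidence threshold and all function names are hypothetical; in the study the inputs would be the maximum responses of the E-nose MOS sensors.

```python
import math
from collections import Counter


def rbf_stage(prototypes, gamma=1.0):
    """Stage stand-in: softmax over -gamma * squared distance to one
    prototype per class; confidence is the winning class's share."""
    def predict(x):
        logits = {lbl: -gamma * math.dist(x, p) ** 2 for lbl, p in prototypes.items()}
        top = max(logits.values())
        weights = {lbl: math.exp(v - top) for lbl, v in logits.items()}  # stable softmax
        label = max(weights, key=weights.get)
        return label, weights[label] / sum(weights.values())
    return predict


def knn_stage(examples, k=3):
    """Stage stand-in (used here instead of a random forest): majority
    vote of the k nearest labelled examples; confidence is the vote share."""
    def predict(x):
        nearest = sorted(examples, key=lambda e: math.dist(x, e[0]))[:k]
        label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
        return label, count / k
    return predict


def cascade(stages, x, min_conf=0.8):
    """Run a non-empty list of stages in order; return (label, index of
    the deciding stage). The last stage always decides if reached."""
    for i, stage in enumerate(stages):
        label, conf = stage(x)
        if conf >= min_conf or i == len(stages) - 1:
            return label, i


stages = [
    rbf_stage({"A": (0.0, 0.0), "B": (4.0, 0.0)}),
    knn_stage([((1.9, 0.1), "B"), ((2.1, 0.0), "B"), ((2.0, -0.2), "B"),
               ((0.0, 0.0), "A"), ((4.0, 0.0), "B")]),
]
print(cascade(stages, (0.2, 0.0)))  # ('A', 0): the first stage is confident
print(cascade(stages, (2.0, 0.0)))  # ('B', 1): a tie at the first stage, resolved by the second
```

The threshold `min_conf` trades off how often the cheaper first stage decides on its own against how often the second stage is consulted.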

6.
Genomics & Informatics ; : 168-173, 2007.
Article in English | WPRIM | ID: wpr-21118

ABSTRACT

In previous nuclear genomic association studies, Random Forests (RF), a modern machine learning method, has been used successfully to generate evidence of association between genetic polymorphisms and diseases or other phenotypes. Compared with traditional statistical methods such as chi-square tests or logistic regression models, RF has advantages in handling large numbers of predictor variables and in examining gene-gene interactions without specifying a model. Here, we applied the RF method to assess the association between mitochondrial single nucleotide polymorphisms (mtSNPs) and diabetes risk. Results from a chi-square test validated the use of RF for association studies with mtDNA. Variable importance indexes, such as the Gini index and the mean decrease in accuracy, performed well compared with chi-square tests in identifying mtSNPs associated with a real disease example, type 2 diabetes.
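The two kinds of evidence compared in this study can both be computed from a 2x2 table of one mtSNP against disease status. The sketch below gives the Pearson chi-square statistic (without continuity correction) with its 1-degree-of-freedom p-value, and the Gini impurity decrease obtained if a single decision-tree node split on that SNP; a random forest's Gini importance sums such decreases over every node, in every tree, that splits on the variable. The counts are invented for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
        carriers:     a cases, b controls
        non-carriers: c cases, d controls
    Returns (statistic, p_value) with 1 degree of freedom, using
    P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def gini(cases, controls):
    """Gini impurity 1 - p^2 - (1 - p)^2 of a node with two classes."""
    n = cases + controls
    if n == 0:
        return 0.0
    p = cases / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def gini_decrease(a, b, c, d):
    """Impurity decrease when a node holding all samples is split into
    carriers (a, b) and non-carriers (c, d)."""
    n = a + b + c + d
    parent = gini(a + c, b + d)
    children = (a + b) / n * gini(a, b) + (c + d) / n * gini(c, d)
    return parent - children


# Hypothetical counts: 30 of 50 cases and 10 of 50 controls carry the variant.
chi2, p = chi_square_2x2(30, 10, 20, 40)
print(chi2, p)                          # chi2 = 50/3, about 16.667; p is well below 0.001
print(gini_decrease(30, 10, 20, 40))    # 1/12, about 0.0833
```

The chi-square test scores each SNP in isolation, whereas the Gini importance accumulates over splits made in the context of other variables, which is where RF's ability to pick up gene-gene interactions comes from.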


Subject(s)
DNA, Mitochondrial , Logistic Models , Phenotype , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Machine Learning