Results 1 - 10 of 10
1.
bioRxiv ; 2024 Jan 21.
Article in English | MEDLINE | ID: mdl-38293135

ABSTRACT

Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE's prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting Multiple Sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10947-10959, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37015125

ABSTRACT

Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest and measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
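For orientation, the traditional proximity that this abstract contrasts with RF-GAP is usually described as the fraction of trees in which two samples land in the same leaf. The sketch below illustrates only that traditional definition, not RF-GAP itself, whose exact weighting the abstract does not give. The function name `classic_proximity` and the toy leaf table are hypothetical; in practice the leaf indices would come from a trained forest (in scikit-learn, for instance, a fitted forest's `apply` method returns them).

```python
def classic_proximity(leaves):
    """Traditional random forest proximity (not RF-GAP).

    leaves[i][t] is the leaf that sample i falls into in tree t.
    Returns an n x n matrix whose (i, j) entry is the fraction of
    trees in which samples i and j share a leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            shared = sum(
                1 for t in range(n_trees) if leaves[i][t] == leaves[j][t]
            )
            prox[i][j] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 3 samples (rows), 3 trees (columns).
toy_leaves = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
]
toy_prox = classic_proximity(toy_leaves)
```

In the toy table, samples 0 and 1 share a leaf in two of three trees, so their proximity is 2/3, while samples 0 and 2 never share a leaf and get proximity 0.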

3.
Am J Clin Nutr ; 98(5): 1263-71, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24047922

ABSTRACT

BACKGROUND: Healthy dietary patterns may protect against age-related cognitive decline, but results of studies have been inconsistent. OBJECTIVE: We examined associations between Dietary Approaches to Stop Hypertension (DASH)- and Mediterranean-style dietary patterns and age-related cognitive change in a prospective, population-based study. DESIGN: Participants included 3831 men and women ≥65 y of age who were residents of Cache County, UT, in 1995. Cognitive function was assessed by using the Modified Mini-Mental State Examination (3MS) ≤4 times over 11 y. Diet-adherence scores were computed by summing across the energy-adjusted rank-order of individual food and nutrient components and categorizing participants into quintiles of the distribution of the diet accordance score. Mixed-effects repeated-measures models were used to examine 3MS scores over time across increasing quintiles of dietary accordance scores and individual food components that comprised each score. RESULTS: The range of rank-order DASH and Mediterranean diet scores was 1661-25,596 and 2407-26,947, respectively. Higher DASH and Mediterranean diet scores were associated with higher average 3MS scores. People in quintile 5 of DASH averaged 0.97 points higher than those in quintile 1 (P = 0.001). The corresponding difference for Mediterranean quintiles was 0.94 (P = 0.001). These differences were consistent over 11 y. Higher intakes of whole grains and nuts and legumes were also associated with higher average 3MS scores [mean quintile 5 compared with 1 differences: 1.19 (P < 0.001), 1.22 (P < 0.001), respectively]. CONCLUSIONS: Higher levels of accordance with both the DASH and Mediterranean dietary patterns were associated with consistently higher levels of cognitive function in elderly men and women over an 11-y period. Whole grains and nuts and legumes were positively associated with higher cognitive functions and may be core neuroprotective foods common to various healthy plant-centered diets around the globe.
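The scoring step described in the DESIGN section (rank participants on each component, sum the ranks, then cut the totals into quintiles) can be sketched roughly as follows. This is a simplified illustration, not the study's procedure: it skips energy adjustment and any reverse scoring of components, and the function names, component names, and intake values are all hypothetical.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_sum_scores(components):
    """components maps a component name to a list of per-participant intakes."""
    n_participants = len(next(iter(components.values())))
    totals = [0.0] * n_participants
    for intakes in components.values():
        for idx, r in enumerate(ranks(intakes)):
            totals[idx] += r
    return totals


def quintiles(scores):
    """Assign quintile 1 (lowest scores) through 5 (highest) by sorted position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for position, idx in enumerate(order):
        out[idx] = position * 5 // len(scores) + 1
    return out


# Hypothetical intakes for 10 participants on two components.
toy_components = {
    "whole_grains": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "nuts_legumes": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
toy_scores = rank_sum_scores(toy_components)
toy_quintiles = quintiles(toy_scores)
```

With these toy intakes, each participant's score is twice their rank, and the ten participants fall two per quintile.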


Subjects
Aging/physiology; Dementia/epidemiology; Diet, Mediterranean; Feeding Behavior; Hypertension/prevention & control; Memory/physiology; Aged; Aged, 80 and over; Cognition/physiology; Dementia/diet therapy; Edible Grain/chemistry; Energy Intake; Female; Humans; Male; Nuts/chemistry; Prevalence; Prospective Studies; Surveys and Questionnaires
4.
BMC Genet ; 11: 49, 2010 Jun 14.
Article in English | MEDLINE | ID: mdl-20546594

ABSTRACT

BACKGROUND: As computational power improves, the application of more advanced machine learning techniques to the analysis of large genome-wide association (GWA) datasets becomes possible. While most traditional statistical methods can only elucidate main effects of genetic variants on risk for disease, certain machine learning approaches are particularly suited to discover higher order and non-linear effects. One such approach is the Random Forests (RF) algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies which are limited. RESULTS: Using a multiple sclerosis (MS) case-control dataset comprised of 300 K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD), and removing strong effects from RF analyses. The new RF results are compared to findings from the original MS GWA study and demonstrate overlap. In addition, four new interesting candidate MS genes are identified, MPHOSPH9, CTNNA3, PHACTR2 and IL7, by RF analysis and warrant further follow-up in independent studies. CONCLUSIONS: This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It is shown that RF is computationally feasible for GWA data and the results obtained make biologic sense based on previous studies. More importantly, new genes were identified as potentially being associated with MS, suggesting new avenues of investigation for this complex disease.


Subjects
Algorithms; Artificial Intelligence; Genome-Wide Association Study/methods; Computational Biology; Feasibility Studies; Genetic Predisposition to Disease; Genotype; Humans; Multiple Sclerosis/genetics; Polymorphism, Single Nucleotide
5.
Ecology ; 88(11): 2783-92, 2007 Nov.
Article in English | MEDLINE | ID: mdl-18051647

ABSTRACT

Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.


Subjects
Data Interpretation, Statistical; Ecology/methods; Ecosystem; Models, Statistical; Models, Theoretical; Algorithms; Animals; Birds/growth & development; Demography; Logistic Models; Population Density; Population Dynamics; Species Specificity; Trees/growth & development
6.
Methods Enzymol ; 411: 422-32, 2006.
Article in English | MEDLINE | ID: mdl-16939804

ABSTRACT

Random Forests is a powerful multipurpose tool for predicting and understanding data. If gene expression data come from known groups or classes (e.g., tumor patients and controls), Random Forests can rank the genes in terms of their usefulness in separating the groups. When the groups are unknown, Random Forests uses an intrinsic measure of the similarity of the genes to extract useful multivariate structure, including clusters. This chapter summarizes the Random Forests methodology and illustrates its use on freely available data sets.


Subjects
Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Terminology as Topic; Animals; Data Interpretation, Statistical; Humans
7.
Hum Immunol ; 67(4-5): 346-51, 2006.
Article in English | MEDLINE | ID: mdl-16720216

ABSTRACT

Previous research has revealed associations between autism and immune genes located in the human leukocyte antigen (HLA). To better understand which HLA genetic loci may be associated with autism, we compared the class I HLA-A and -B alleles in autistic probands with case control subjects from Caucasian families. The frequency of HLA-A2 alleles was significantly increased in autistic subjects compared with normal allelic frequencies from the National Marrow Donors Program (NMDP) (p = 0.0043 after allelic correction). The transmission disequilibrium test for the A2 allele revealed an increased frequency of inheritance for autistic children (p = 0.033). There were no significant associations of autism with HLA-B alleles; however, the A2-B44 and A2-B51 haplotypes were two times more frequent in autistic subjects. The association and linkage of the class I HLA-A2 allele with autism suggests its involvement in the etiology of autism. Possible roles are discussed for the HLA-A2 association in the presentation of microbial antigen within the central nervous system and/or in the establishment of synaptic and neuronal circuits in the developing brain.


Subjects
Autistic Disorder/genetics; Autistic Disorder/immunology; HLA-A2 Antigen/genetics; Histocompatibility Antigens Class I/genetics; Alleles; Genetic Linkage; Humans
8.
Hum Immunol ; 66(2): 140-5, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15694999

ABSTRACT

The objective of this study was to examine and attempt to confirm our previous findings of an increased frequency of the C4B null allele (C4BQ0) in subjects with autism. Newly identified subjects from Utah and Oregon were studied. Families evaluated included 85 who had a child with autism and 69 control families. Of the subjects with autism studied, 42.4% carried at least one C4BQ0, compared with 14.5% of the control subjects (p = 0.00013), with a relative risk of 4.33. Over half of the C4B null alleles in the subjects with autism involved C4A duplications. A marked increase in the ancestral haplotype 44.1 that lacks a C4B gene and has 2 C4A genes was also observed. The results of this study suggest that the human leukocyte antigen class III C4BQ0 significantly increases the risk for autism.
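As a rough arithmetic check on the figures above, the sketch below computes both an odds ratio and a risk ratio from carrier counts. The counts 36 of 85 and 10 of 69 are an assumption: they are not stated in the abstract and were back-calculated from the reported 42.4% and 14.5%. The function names are hypothetical.

```python
def odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Odds of carrying the allele among cases divided by the odds among controls."""
    case_odds = exposed_cases / (total_cases - exposed_cases)
    control_odds = exposed_controls / (total_controls - exposed_controls)
    return case_odds / control_odds


def risk_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Carrier proportion among cases divided by the proportion among controls."""
    return (exposed_cases / total_cases) / (exposed_controls / total_controls)


# Assumed counts, inferred from the reported percentages (not given in the source):
# 36/85 is about 42.4% and 10/69 is about 14.5%.
toy_or = odds_ratio(36, 85, 10, 69)
toy_rr = risk_ratio(36, 85, 10, 69)
```

Under these assumed counts the odds ratio comes out near 4.33 and the simple ratio of proportions near 2.92, so the reported value of 4.33 appears to correspond to an odds-ratio style calculation.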


Subjects
Autistic Disorder/genetics; Complement C4b/genetics; Gene Frequency; Genotype; Histocompatibility Antigens/genetics; Humans; Polymerase Chain Reaction
9.
Appl Environ Microbiol ; 70(11): 6738-47, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15528540

ABSTRACT

This report describes the use of an oligonucleotide macroarray to profile the expression of 375 genes in Lactococcus lactis subsp. lactis IL1403 during heat, acid, and osmotic stress. A set of known stress-associated genes in IL1403 was used as the internal control on the array. Every stress response was accurately detected using the macroarray, compared to data from previous reports. As a group, the expression patterns of the investigated metabolic genes were significantly altered by heat, acid, and osmotic stresses. Specifically, 13 to 18% of the investigated genes were differentially expressed in each of the environmental stress treatments. Interestingly, the methionine biosynthesis pathway genes (metA-metB1 and metB2-cysK) were induced during heat shock, but methionine utilization genes, such as metK, were induced during acid stress. These data provide a possible explanation for the differences between acid tolerance mechanisms of L. lactis strains IL1403 and MG1363 reported previously. Several groups of transcriptional responses were common among the stress treatments, such as repression of peptide transporter genes, including the opt operon (also known as dpp) and dtpT. Reduction of peptide transport due to environmental stress will have important implications in the cheese ripening process. Although stress responses in lactococci were extensively studied during the last decade, additional information about this bacterium was gained from the use of this metabolic array.


Subjects
Bacterial Proteins/metabolism; Gene Expression Profiling; Heat-Shock Response; Lactococcus lactis/growth & development; Oligonucleotide Array Sequence Analysis; Bacterial Proteins/genetics; Culture Media; Gene Expression Regulation, Bacterial; Hot Temperature; Hydrogen-Ion Concentration; Lactococcus lactis/genetics; Lactococcus lactis/metabolism; Lactococcus lactis/physiology; Osmotic Pressure
10.
Hum Immunol ; 63(4): 311-6, 2002 Apr.
Article in English | MEDLINE | ID: mdl-12039413

ABSTRACT

We have evaluated possible contributions of HLA-DRB1 alleles to autism spectrum disorder (ASD) in 103 families of Caucasian descent. The DR4 allele occurred more often in probands than controls (p = 0.007), whereas the DR13,14 alleles occurred less often in probands than controls (p = 0.003). The transmission disequilibrium test (TDT) indicated that the ASD probands inherited the DR4 allele more frequently than expected (p = 0.026) from the fathers. The TDT also revealed that fewer DR13 alleles than expected were inherited from the mother by ASD probands (p = 0.006). We conclude that the TDT results suggest that DR4 and DR13 are linked to ASD. Reasons for the parental inheritance of specific alleles are poorly understood but coincide with current genetic research noting possible parent-of-origin effects in autism.


Subjects
Autistic Disorder/genetics; HLA-DR Antigens/genetics; HLA-DR4 Antigen/genetics; Linkage Disequilibrium; Alleles; Autistic Disorder/immunology; Genetic Linkage; HLA-DR Serological Subtypes; Humans