Results 1 - 20 of 20
1.
Artif Intell Rev ; 55(3): 1803-1820, 2022.
Article in English | MEDLINE | ID: mdl-35370341

ABSTRACT

There exist a variety of distance measures which operate on time series kernels. The objective of this article is to compare those distance measures in a support vector machine setting. A support vector machine is a state-of-the-art classifier for static (non-time series) datasets and usually outperforms k-Nearest Neighbour; however, it is often noted that 1-NN DTW is a robust baseline for time-series classification. Through a collection of experiments we determine that the most effective distance measure is Dynamic Time Warping and the most effective classifier is kNN. However, a surprising result is that the pairing of kNN and DTW is not the most effective model. Instead, we have discovered via experimentation that Dynamic Time Warping paired with the Gaussian Support Vector Machine is the most accurate time series classifier. Finally, we recommend a slightly less accurate model, Time Warp Edit Distance paired with the Gaussian Support Vector Machine, because it has a better theoretical basis. We also discuss the reduction in computational cost achieved by using a Support Vector Machine, finding that the Negative Kernel paired with the Dynamic Time Warping distance produces the greatest reduction in computational cost.
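As a rough illustration of the pairing reported as most accurate, the Python sketch below computes a plain DTW distance and turns it into a Gaussian kernel, k(x, y) = exp(-DTW(x, y)^2 / (2 sigma^2)). The squared pointwise cost, the absence of a warping window, the exact kernel form and the bandwidth `sigma` are assumptions for illustration, not the article's definitions. A Gaussian kernel built on DTW is not guaranteed to be positive semi-definite.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain DTW: squared pointwise cost, no warping window (assumed choices)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: step in a only, step in b only, or diagonal match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def gaussian_dtw_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian kernel on the DTW distance; sigma is a hypothetical bandwidth.
    Not guaranteed to be positive semi-definite."""
    d = dtw_distance(a, b)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


def gram_matrix(series: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Pairwise kernel matrix, e.g. for an SVM that accepts precomputed kernels."""
    return [[gaussian_dtw_kernel(x, y, sigma) for y in series] for x in series]
```

The Gram matrix can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; for the 1-NN baseline, `dtw_distance` can be used directly as the neighbour distance.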

2.
Pediatrics ; 136(5): e1228-36, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26482666

ABSTRACT

BACKGROUND AND OBJECTIVES: Central apnea complicates, and may be the presenting complaint in, bronchiolitis. Our objective was to prospectively derive candidate clinical decision rules (CDRs) to identify infants in the emergency department (ED) who are at risk for central apnea. METHODS: We conducted a prospective observational study over 8 years. The primary outcome was central apnea subsequent to the initial ED visit. Infants were enrolled if they presented with central apnea or bronchiolitis. We excluded infants with obstructive apnea, neonatal jaundice, trauma, or suspected sepsis. We developed 3 candidate CDRs by using 3 techniques: (1) Poisson regression clustered on the individual, (2) classification and regression tree analysis (CART), and (3) a random forest (RF). RESULTS: We analyzed 990 ED visits for 892 infants. Central apnea subsequently occurred in the hospital in 41 (5%) patients. Parental report of apnea, previous history of apnea, congenital heart disease, birth weight ≤2.5 kg, lower weight, and age ≤6 weeks all identified a group at high risk for subsequent central apnea. All CDRs and RFs were 100% sensitive (95% confidence interval [CI] 91%-100%) and had a negative predictive value of 100% (95% CI 99%-100%) for the subsequent apnea. Specificity ranged from 61% to 65% (95% CI 58%-68%) for CDRs based on Poisson models; 65% to 77% (95% CI 62%-90%) for CART; and 81% to 91% (95% CI 78%-92%) for RF models. CONCLUSIONS: All candidate CDRs had a negative predictive value of 100% for subsequent central apnea.
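Why 100% sensitivity and 100% negative predictive value appear together follows from the confusion-matrix definitions: sensitivity = TP / (TP + FN) and NPV = TN / (TN + FN), so (given at least one true positive and one true negative) both equal 1 exactly when the rule produces no false negatives. A minimal Python sketch; the counts used with it are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a decision rule versus the outcome (hypothetical example values)."""
    tp: int  # rule flags high risk, apnea occurs
    fn: int  # rule says low risk, apnea occurs (a missed case)
    fp: int  # rule flags high risk, no apnea
    tn: int  # rule says low risk, no apnea

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def npv(self) -> float:
        # negative predictive value: share of "low risk" calls that were correct
        return self.tn / (self.tn + self.fn)
```

With the hypothetical counts `Confusion(tp=10, fn=0, fp=30, tn=60)`, sensitivity and NPV are both 1.0 while specificity is 60/90, roughly 0.67: a rule can miss no cases and still flag many infants who do not go on to develop apnea.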


Subjects
Clinical Decision-Making/methods , Sleep Apnea, Central/diagnosis , Bronchiolitis/epidemiology , Comorbidity , Female , Humans , Infant , Male , Predictive Value of Tests , Prospective Studies , ROC Curve , Sleep Apnea, Central/epidemiology
3.
Genome Biol Evol ; 5(6): 1049-59, 2013.
Article in English | MEDLINE | ID: mdl-23661563

ABSTRACT

In the budding yeast Saccharomyces cerevisiae, the subunits of any given protein complex are either mostly essential or mostly nonessential, suggesting that essentiality is a property of molecular machines rather than individual components. There are exceptions to this rule, however, that is, nonessential genes in largely essential complexes and essential genes in largely nonessential complexes. Here, we provide explanations for these exceptions, showing that redundancy within complexes, as revealed by genetic interactions, can explain many of the former cases, whereas "moonlighting," as revealed by membership of multiple complexes, can explain the latter. Surprisingly, we find that redundancy within complexes cannot usually be explained by gene duplication, suggesting alternate buffering mechanisms. In the distantly related Schizosaccharomyces pombe, we observe the same phenomenon of modular essentiality, suggesting that it may be a general feature of eukaryotes. Furthermore, we show that complexes flip essentiality in a cohesive fashion between the two species, that is, they tend to change from mostly essential to mostly nonessential, or vice versa, but not to mixed patterns. We show that these flips in essentiality can be explained by differing lifestyles of the two yeasts. Collectively, our results support a previously proposed model where proteins are essential because of their involvement in essential functional modules rather than because of specific topological features such as degree or centrality.


Subjects
Protein Interaction Maps , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces/genetics , Gene Ontology , Genes, Essential , Genes, Fungal , Protein Interaction Mapping , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/metabolism
4.
Mol Cell ; 46(5): 691-704, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22681890

ABSTRACT

To date, cross-species comparisons of genetic interactomes have been restricted to small or functionally related gene sets, limiting our ability to infer evolutionary trends. To facilitate a more comprehensive analysis, we constructed a genome-scale epistasis map (E-MAP) for the fission yeast Schizosaccharomyces pombe, providing phenotypic signatures for ~60% of the nonessential genome. Using these signatures, we generated a catalog of 297 functional modules, and we assigned function to 144 previously uncharacterized genes, including mRNA splicing and DNA damage checkpoint factors. Comparison with an integrated genetic interactome from the budding yeast Saccharomyces cerevisiae revealed a hierarchical model for the evolution of genetic interactions, with conservation highest within protein complexes, lower within biological processes, and lowest between distinct biological processes. Despite the large evolutionary distance and extensive rewiring of individual interactions, both networks retain conserved features and display similar levels of functional crosstalk between biological processes, suggesting general design principles of genetic interactomes.


Subjects
Epistasis, Genetic , Evolution, Molecular , Genes, Fungal , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Genome, Fungal , Saccharomyces cerevisiae/metabolism , Schizosaccharomyces/metabolism , Species Specificity
5.
Methods Mol Biol ; 781: 353-61, 2011.
Article in English | MEDLINE | ID: mdl-21877290

ABSTRACT

Mapping epistatic (or genetic) interactions has emerged as an important network biology approach for establishing functional relationships among genes and proteins. Epistasis networks are complementary to physical protein interaction networks, providing valuable insight into both the function of individual genes and the overall wiring of the cell. A high-throughput method termed "epistatic mini array profiles" (E-MAPs) was recently developed in yeast to quantify alleviating or aggravating interactions between gene pairs. The typical output of an E-MAP experiment is a large symmetric matrix of interaction scores. One problem with these data is the large number of missing values: interactions that cannot be measured during the high-throughput process or whose measurements were discarded by quality-filtering steps. These missing values can reduce the effectiveness of some data analysis techniques and prevent the use of others. Here, we discuss one solution to this problem, imputation using nearest neighbors, and give practical examples of the use of a freely available implementation of this method.


Subjects
Epistasis, Genetic , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Gene Regulatory Networks , High-Throughput Screening Assays , Models, Genetic , Predictive Value of Tests , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
6.
BMC Syst Biol ; 5: 80, 2011 May 23.
Article in English | MEDLINE | ID: mdl-21605386

ABSTRACT

BACKGROUND: Epistatic Miniarray Profiling (E-MAP) quantifies the net effect on growth rate of disrupting pairs of genes, often producing phenotypes that may be more (negative epistasis) or less (positive epistasis) severe than the phenotype predicted based on single gene disruptions. Epistatic interactions are important for understanding cell biology because they define relationships between individual genes, and between sets of genes involved in biochemical pathways and protein complexes. Each E-MAP screen quantifies the interactions between a logically selected subset of genes (e.g. genes whose products share a common function). Interactions that occur between genes involved in different cellular processes are not as frequently measured, yet these interactions are important for providing an overview of cellular organization. RESULTS: We introduce a method for combining overlapping E-MAP screens and inferring new interactions between them. We use this method to infer with high confidence 2,240 new strongly epistatic interactions and 34,469 weakly epistatic or neutral interactions. We show that accuracy of the predicted interactions approaches that of replicate experiments and that, like measured interactions, they are enriched for features such as shared biochemical pathways and knockout phenotypes. We constructed an expanded epistasis map for yeast cell protein complexes and show that our new interactions increase the evidence for previously proposed inter-complex connections, and predict many new links. We validated a number of these in the laboratory, including new interactions linking the SWR-C chromatin modifying complex and the nuclear transport apparatus. CONCLUSION: Overall, our data support a modular model of yeast cell protein network organization and show how prediction methods can considerably extend the information that can be extracted from overlapping E-MAP screens.


Subjects
Computational Biology/methods , Epistasis, Genetic , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Chromatin/metabolism , Protein Binding , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
7.
Comput Med Imaging Graph ; 35(7-8): 629-45, 2011.
Article in English | MEDLINE | ID: mdl-21269807

ABSTRACT

We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.


Subjects
Color , Histological Techniques/methods , Image Interpretation, Computer-Assisted , Prostatic Neoplasms/pathology , Algorithms , Humans , Male , Pattern Recognition, Automated , Prostatic Neoplasms/diagnosis
8.
BMC Genomics ; 11: 677, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-21118509

ABSTRACT

BACKGROUND: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques and result in different prediction sets. RESULTS: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. CONCLUSIONS: Supervised learning methods are a useful way to combine predictions from diverse sources.


Subjects
Computational Biology/methods , Software , Transcription Initiation Site , Base Pairing/genetics , Genome, Human/genetics , Humans , Principal Component Analysis , Promoter Regions, Genetic/genetics
9.
BMC Bioinformatics ; 11: 197, 2010 Apr 20.
Article in English | MEDLINE | ID: mdl-20406472

ABSTRACT

BACKGROUND: Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest-neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. RESULTS: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially expands the number of mapped epistatic interactions. In addition, we make implementations of our algorithms available for use by other researchers. CONCLUSIONS: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor based approaches as they offer consistently accurate imputations across multiple datasets in a tractable manner.
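A simplified Python sketch of symmetric nearest-neighbour imputation for a pairwise interaction matrix. It is one plausible variant, not the authors' released implementation: missing entries are `None`, row similarity is the root-mean-square difference over co-measured partners, each missing score (i, j) is estimated once from the neighbours of gene i and once from the neighbours of gene j, and the two estimates are averaged and written to both (i, j) and (j, i). The parameters `k` and `min_overlap`, and the fallback to 0.0 when no neighbour is usable, are assumptions.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # None marks a missing interaction score


def _row_distance(m: Matrix, a: int, b: int, min_overlap: int) -> float:
    """RMS difference between rows a and b over partners measured in both rows."""
    shared = [(x, y) for x, y in zip(m[a], m[b]) if x is not None and y is not None]
    if len(shared) < min_overlap:
        return math.inf  # too little overlap to judge similarity
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def _knn_estimate(m: Matrix, row: int, col: int, k: int, min_overlap: int) -> Optional[float]:
    """Mean of m[g][col] over the k rows g most similar to `row` that measured `col`."""
    candidates = []
    for g in range(len(m)):
        if g == row or m[g][col] is None:
            continue
        d = _row_distance(m, row, g, min_overlap)
        if math.isfinite(d):
            candidates.append((d, m[g][col]))
    if not candidates:
        return None
    candidates.sort(key=lambda t: t[0])
    top = candidates[:k]
    return sum(v for _, v in top) / len(top)


def impute_symmetric(m: Matrix, k: int = 3, min_overlap: int = 2) -> Matrix:
    """Fill missing off-diagonal scores, keeping the matrix symmetric."""
    n = len(m)
    out = [row[:] for row in m]  # neighbours are always searched in the original m
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] is not None:
                continue
            # estimate once from gene i's neighbours and once from gene j's
            estimates = [
                e
                for e in (_knn_estimate(m, i, j, k, min_overlap), _knn_estimate(m, j, i, k, min_overlap))
                if e is not None
            ]
            # assumed fallback: 0.0 (a neutral score) when no neighbour is usable
            value = sum(estimates) / len(estimates) if estimates else 0.0
            out[i][j] = out[j][i] = value
    return out
```

Because neighbours are always searched in the original matrix rather than the partially filled copy, the result does not depend on the order in which missing entries are visited, and the diagonal is left untouched.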


Subjects
Epistasis, Genetic , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Databases, Genetic , Gene Expression Profiling/methods
10.
Nucleic Acids Res ; 37(22): 7360-7, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19820114

ABSTRACT

The accurate computational prediction of transcription start sites (TSS) in vertebrate genomes is a difficult problem. The physicochemical properties of DNA can be computed in various ways, and many combinations of DNA features have been tested in the past for use as predictors of transcription. We looked in detail at melting temperature, the temperature at which the two strands of DNA separate, considering the cooperative nature of this process. We find that peaks in melting temperature correspond closely to experimentally determined transcription start sites in human and mouse chromosomes. Using melting temperature alone, and with simple thresholding, we can predict TSS with accuracy that is competitive with the most accurate state-of-the-art TSS prediction methods. Accuracy is measured using both experimentally and manually determined TSS. The method works especially well with promoters that contain CpG islands, but also works when CpG islands are absent. This result is clear evidence of the important role of the physical properties of DNA in the process of transcription. It also suggests that TSS prediction methods should include melting temperature as prior information.
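The thresholding idea can be sketched in Python as below, assuming a per-position melting-temperature profile has already been computed (the cooperative melting model itself is not reproduced here). The function reports local maxima at or above a threshold as candidate TSSs; the threshold value and the optional `min_gap` suppression of nearby weaker peaks are illustrative assumptions, not parameters taken from the article.

```python
from typing import List, Sequence


def predict_tss(tm_profile: Sequence[float], threshold: float, min_gap: int = 1) -> List[int]:
    """Return indices of local maxima in tm_profile that reach `threshold`.

    Peaks closer than `min_gap` positions to an already accepted, stronger
    peak are dropped (a hypothetical post-processing step)."""
    peaks = [
        i
        for i in range(1, len(tm_profile) - 1)
        if tm_profile[i] >= threshold
        and tm_profile[i] > tm_profile[i - 1]
        and tm_profile[i] >= tm_profile[i + 1]
    ]
    # accept the strongest peaks first, suppressing weaker neighbours within min_gap
    accepted: List[int] = []
    for i in sorted(peaks, key=lambda p: tm_profile[p], reverse=True):
        if all(abs(i - j) >= min_gap for j in accepted):
            accepted.append(i)
    return sorted(accepted)
```

For example, `predict_tss([60, 62, 75, 70, 61, 63, 80, 79, 60], threshold=70)` returns `[2, 6]`; with `min_gap=5` only the stronger peak at index 6 survives.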


Subjects
Algorithms , DNA/chemistry , Temperature , Transcription Initiation Site , Animals , CpG Islands , Humans , Mice , Nucleic Acid Denaturation , Promoter Regions, Genetic
11.
BMC Bioinformatics ; 9: 470, 2008 Nov 05.
Article in English | MEDLINE | ID: mdl-18986526

ABSTRACT

BACKGROUND: Metabolomics, or metabonomics, refers to the quantitative analysis of all metabolites present within a biological sample and is generally carried out using NMR spectroscopy or Mass Spectrometry. Such analysis produces a set of peaks, or features, indicative of the metabolic composition of the sample and may be used as a basis for sample classification. Feature selection may be employed to improve classification accuracy or aid model explanation by establishing a subset of class discriminating features. Factors such as experimental noise, choice of technique and threshold selection may adversely affect the set of selected features retrieved. Furthermore, the high dimensionality and multi-collinearity inherent within metabolomics data may exacerbate discrepancies between the set of features retrieved and those required to provide a complete explanation of metabolite signatures. Given these issues, the latter in particular, we present the MetaFIND application for 'post-feature selection' correlation analysis of metabolomics data. RESULTS: In our evaluation we show how MetaFIND may be used to elucidate metabolite signatures from the set of features selected by diverse techniques over two metabolomics datasets. Importantly, we also show how MetaFIND may augment standard feature selection and aid the discovery of additional significant features, including those which represent novel class discriminating metabolites. MetaFIND also supports the discovery of higher level metabolite correlations. CONCLUSION: Standard feature selection techniques may fail to capture the full set of relevant features in the case of high dimensional, multi-collinear metabolomics data. We show that the MetaFIND 'post-feature selection' analysis tool may aid metabolite signature elucidation, feature discovery and inference of metabolic correlations.


Subjects
Computational Biology/methods , Metabolomics/methods , Software , Databases, Protein , Discriminant Analysis , Internet , Least-Squares Analysis , Mass Spectrometry , Nuclear Magnetic Resonance, Biomolecular , Reproducibility of Results , Statistics, Nonparametric , User-Computer Interface
12.
BMC Genomics ; 9 Suppl 2: S20, 2008 Sep 16.
Article in English | MEDLINE | ID: mdl-18831786

ABSTRACT

BACKGROUND: Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. RESULTS: The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. CONCLUSION: In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation.


Subjects
Gene Expression Profiling/methods , Models, Statistical , Open Reading Frames , Algorithms , Cluster Analysis , Computational Biology/methods , Gene Expression , Genome, Fungal , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Saccharomyces cerevisiae/genetics
13.
Bioinformatics ; 24(15): 1722-8, 2008 Aug 01.
Article in English | MEDLINE | ID: mdl-18556670

ABSTRACT

MOTIVATION: When working with large-scale protein interaction data, an important analysis task is the assignment of pairs of proteins to groups that correspond to higher order assemblies. Previously, a common approach to this problem has been to apply standard hierarchical clustering methods to identify such groups. Here we propose a new algorithm for aggregating a diverse collection of matrix factorizations to produce a more informative clustering, which takes the form of a 'soft' hierarchy of clusters. RESULTS: We apply the proposed Ensemble non-negative matrix factorization (NMF) algorithm to a high-quality assembly of binary protein interactions derived from two proteome-wide studies in yeast. Our experimental evaluation demonstrates that the algorithm lends itself to discovering small localized structures in this data, which correspond to known functional groupings of complexes. In addition, we show that the algorithm also supports the assignment of putative functions for previously uncharacterized proteins, for instance the protein YNR024W, which may be an uncharacterized component of the exosome.


Subjects
Algorithms , Cluster Analysis , Models, Chemical , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Computer Simulation , Protein Binding
14.
West J Emerg Med ; 9(2): 74-80, 2008 May.
Article in English | MEDLINE | ID: mdl-19561711

ABSTRACT

BACKGROUND: Decision-support tools (DST) are typically developed by computer engineers for use by clinicians. Prototype testing of DSTs may be performed relatively easily by one or two clinical experts. The costly alternative is to test each prototype on a larger number of diverse clinicians, based on the untested assumption that these evaluations would more accurately reflect those of actual end users. HYPOTHESIS: We hypothesized substantial or better agreement (defined as a kappa statistic greater than 0.6) between the evaluations, by clinically diverse end users, of a case-based reasoning (CBR) DST predicting ED admission for bronchiolitis and the evaluations of the same DST output by two clinical experts. METHODS: Three outputs from a previously described DST were evaluated by the emergency physicians (EP) who originally saw the patients and by two pediatric EPs with an interest in bronchiolitis. The DST outputs were as follows: predicted disposition, an example of another previously seen patient to explain the prediction, and explanatory dialog. Each was rated using the scale Definitely Not, No, Maybe, Yes, and Absolutely. This was converted to a Likert scale for analysis. Agreement was measured using the kappa statistic. RESULTS: Agreement with the DST predicted disposition was moderate between end users and the expert reviewers, but was only fair or poor for the value of the explanatory case and dialog. CONCLUSION: Agreement between expert evaluators and end users on the value of a CBR DST's predicted dispositions was moderate. For the more subjective explicative components, agreement was fair, poor, or worse.
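The kappa statistic corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is computed from each rater's marginal category frequencies. The Python sketch below is the unweighted Cohen's kappa for two raters; the abstract does not say whether an unweighted or weighted variant was used for the five-point scale, so treat this as an illustration only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters rating the same items."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    # observed agreement: share of items both raters put in the same category
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from the two raters' marginal frequencies
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # both raters used one identical category: kappa is undefined; 1.0 by convention here
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For ratings `["Yes", "Yes", "No", "No"]` and `["Yes", "No", "No", "No"]`, p_o = 0.75 and p_e = 0.5, giving kappa = 0.5, which falls short of the 0.6 cut-off the study used for substantial agreement.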

15.
IEEE Trans Inf Technol Biomed ; 10(3): 519-25, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16871720

ABSTRACT

In a gene expression data matrix, a bicluster is a submatrix of genes and conditions that exhibits a high correlation of expression activity across both rows and columns. The problem of locating the most significant bicluster has been shown to be NP-complete. Heuristic approaches such as Cheng and Church's greedy node deletion algorithm have been previously employed. It is to be expected that stochastic search techniques such as evolutionary algorithms or simulated annealing might improve upon such greedy techniques. In this paper we show that an approach based on simulated annealing is well suited to this problem, and we present a comparative evaluation of simulated annealing and node deletion on a variety of datasets. We show that simulated annealing discovers more significant biclusters in many cases. Furthermore, we also test the ability of our technique to locate biologically verifiable biclusters within an annotated set of genes.
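Cheng and Church score a bicluster (I, J) by its mean squared residue, H(I, J) = 1/(|I||J|) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. The Python sketch below computes H and wraps it in a basic simulated-annealing search that toggles one row or column per step. The objective (reward area, penalize residue above `delta`), the move set, the cooling schedule and all parameter values are hypothetical choices; the abstract does not specify the authors' formulation.

```python
import math
import random
from typing import Iterable, Sequence, Set, Tuple


def mean_squared_residue(a: Sequence[Sequence[float]], rows: Iterable[int], cols: Iterable[int]) -> float:
    """Cheng and Church's H(I, J) for the submatrix a[rows][cols]."""
    r, c = sorted(rows), sorted(cols)
    row_mean = {i: sum(a[i][j] for j in c) / len(c) for i in r}
    col_mean = {j: sum(a[i][j] for i in r) / len(r) for j in c}
    total = sum(row_mean.values()) / len(r)  # overall mean of the submatrix
    sq = sum((a[i][j] - row_mean[i] - col_mean[j] + total) ** 2 for i in r for j in c)
    return sq / (len(r) * len(c))


def anneal_bicluster(
    a: Sequence[Sequence[float]],
    delta: float,
    steps: int = 5000,
    t0: float = 1.0,
    cooling: float = 0.999,
    seed: int = 0,
) -> Tuple[Set[int], Set[int]]:
    """Hypothetical SA search for a large bicluster with residue <= delta."""
    rng = random.Random(seed)
    n_rows, n_cols = len(a), len(a[0])

    def energy(r: Set[int], c: Set[int]) -> float:
        # assumed objective: reward area, heavily penalise residue above delta
        excess = max(0.0, mean_squared_residue(a, r, c) - delta)
        return -float(len(r) * len(c)) + 1000.0 * excess

    rows, cols = {0}, set(range(n_cols))  # a single row has zero residue: feasible start
    current = energy(rows, cols)
    best = (set(rows), set(cols))
    t = t0
    for _ in range(steps):
        r, c = set(rows), set(cols)
        if rng.random() < 0.5:
            r ^= {rng.randrange(n_rows)}  # toggle one row in or out
        else:
            c ^= {rng.randrange(n_cols)}  # toggle one column in or out
        if r and c:
            e = energy(r, c)
            # Metropolis rule: always accept improvements, sometimes accept worse moves
            if e <= current or rng.random() < math.exp((current - e) / t):
                rows, cols, current = r, c, e
                area = len(rows) * len(cols)
                if mean_squared_residue(a, rows, cols) <= delta and area > len(best[0]) * len(best[1]):
                    best = (set(rows), set(cols))
        t *= cooling  # geometric cooling schedule
    return best
```

Because a single row always has zero residue, the search starts from a feasible state and only records states whose residue is at most `delta`, so the returned bicluster always meets the threshold; Cheng and Church's greedy node deletion would instead start from the full matrix and remove rows and columns.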


Subjects
Algorithms , Gene Expression Profiling/methods , Gene Expression/physiology , Models, Biological , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Animals , Artificial Intelligence , Cluster Analysis , Computer Simulation , Humans , Information Storage and Retrieval/methods
16.
Appl Bioinformatics ; 4(3): 211-3, 2005.
Article in English | MEDLINE | ID: mdl-16231963

ABSTRACT

cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as the inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can also be used to represent clustering and validation results for other biomedical and physical data without restriction.


Subjects
Cluster Analysis , Computational Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Programming Languages , Algorithms , Computers , Databases, Genetic , Gene Expression Profiling , Information Storage and Retrieval , Pattern Recognition, Automated , Software
17.
Bioinformatics ; 21(10): 2546-7, 2005 May 15.
Article in English | MEDLINE | ID: mdl-15713738

ABSTRACT

UNLABELLED: This paper presents an approach to assessing cluster validity based on similarity knowledge extracted from the Gene Ontology. AVAILABILITY: The program is freely available for non-profit use on request from the authors.


Subjects
Artificial Intelligence , Database Management Systems , Databases, Protein , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Software , User-Computer Interface , Algorithms , Cluster Analysis , Saccharomyces cerevisiae Proteins/metabolism
18.
Bioinformatics ; 21(4): 451-5, 2005 Feb 15.
Article in English | MEDLINE | ID: mdl-15608048

ABSTRACT

UNLABELLED: In this paper we present a data mining system that allows the application of different clustering and cluster validity algorithms to DNA microarray data. This tool may improve the quality of the data analysis results, and may support the prediction of the number of relevant clusters in the microarray datasets. This systematic evaluation approach may significantly aid genome expression analyses for knowledge discovery applications. The software system may be used for clustering and validating not only DNA microarray expression data but also other biomedical and physical data, without restriction. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html CONTACT: Nadia.Bolshakova@cs.tcd.ie.


Subjects
Algorithms , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Software , User-Computer Interface , Benchmarking/methods , Cluster Analysis , Database Management Systems , Information Storage and Retrieval , Systems Integration
19.
Eur J Emerg Med ; 11(5): 259-64, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15359198

ABSTRACT

BACKGROUND: Artificial neural networks apply complex non-linear functions to pattern recognition problems. An ensemble is a 'committee' of neural networks that usually outperforms single neural networks. Bronchiolitis is a common manifestation of viral lower respiratory tract infection in infants and toddlers. OBJECTIVE: To train artificial neural network ensembles to predict the disposition and length of stay in children presenting to the Emergency Department with bronchiolitis. METHODS: A specifically constructed database of 119 episodes of bronchiolitis was used to train, validate, and test a neural network ensemble. We used EasyNN 7.0 on a 200 MHz Pentium PC with a maths co-processor. The ensemble of neural networks constructed was subjected to fivefold validation. Agreement between actual and predicted dispositions was measured using the kappa statistic; predictions of length of stay were assessed using Kaplan-Meier estimates and the log-rank test. RESULTS: The neural network ensembles correctly predicted disposition in 81% (range 75-90%) of test cases. When compared with actual disposition, the neural network performed similarly to a logistic regression model and significantly better than various 'dumb machine' strategies with which we compared it. The prediction of length of stay was poorer, 65% (range 60-80%), but observed and predicted lengths of stay were not significantly different. CONCLUSION: Artificial neural network ensembles can predict disposition for infants and toddlers with bronchiolitis; however, the prediction of length of hospital stay is not as good.


Subjects
Bronchiolitis/diagnosis , Bronchiolitis/epidemiology , Length of Stay/statistics & numerical data , Neural Networks, Computer , Bronchiolitis/therapy , Child , Child, Hospitalized/statistics & numerical data , Child, Preschool , Emergency Service, Hospital , Female , Humans , Male , Predictive Value of Tests , Prognosis , Registries , Regression Analysis , Risk Assessment , Sensitivity and Specificity , Severity of Illness Index , Treatment Outcome
20.
Artif Intell Med ; 28(2): 191-206, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12893119

ABSTRACT

The use of ensembles in machine learning (ML) has had a considerable impact in increasing the accuracy and stability of predictors. This increase in accuracy has come at the cost of comprehensibility as, by definition, an ensemble model is considerably more complex than its component models. This is of significance for decision support systems in medicine because of the reluctance to use models that are essentially black boxes. Work on making ensembles comprehensible has so far focused on global models that mirror the behaviour of the ensemble as closely as possible. With such global models there is a clear tradeoff between comprehensibility and fidelity. In this paper, we pursue another tack, looking at local comprehensibility where the output of the ensemble is explained on a case-by-case basis. We argue that this meets the requirements of medical decision support systems. The approach presented here identifies the ensemble members that best fit the case in question and presents the behaviour of these in explanation.


Subjects
Neural Networks, Computer , Anticoagulants/administration & dosage , Bronchiolitis/therapy , Child , Hospitalization/statistics & numerical data , Humans , International Normalized Ratio/statistics & numerical data , Reproducibility of Results , Warfarin/administration & dosage