1.
Methods Inf Med ; 48(6): 546-51, 2009.
Article in English | MEDLINE | ID: mdl-19696949

ABSTRACT

OBJECTIVES: Automated understanding of clinical records is a challenging task involving various legal and technical difficulties. Clinical free text is inherently redundant, unstructured, and full of acronyms, abbreviations and domain-specific language, which make it challenging to mine automatically. There is much effort in the field focused on creating specialized ontologies, lexicons and heuristics based on expert knowledge of the domain. However, ad-hoc solutions generalize poorly across diseases or diagnoses. This paper presents a successful approach for rapid prototyping of a diagnosis classifier based on a popular computational linguistics platform. METHODS: The corpus consists of several hundred full-length discharge summaries provided by Partners Healthcare. The goal is to identify a diagnosis and assign co-morbidity. Our approach is based on the rapid implementation of a logistic regression classifier using an existing toolkit: LingPipe (http://alias-i.com/lingpipe). We implement and compare three different classifiers. The baseline approach uses character 5-grams as features. The second approach uses a bag-of-words representation enriched with a small additional set of features. The third approach reduces the feature set to the most informative features according to the information content. RESULTS: The proposed systems achieve high performance (average F-micro 0.92) for the task. We discuss the relative merit of the three classifiers. Supplementary material with detailed results is available at: http://decsai.ugr.es/~ccano/LR/supplementary_material/ CONCLUSIONS: We show that our methodology for rapid prototyping of a domain-unaware system is effective for building an accurate classifier for clinical records.
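The baseline described in this abstract (character 5-grams fed to a logistic regression classifier) can be illustrated with a minimal sketch. This is not the authors' LingPipe implementation: it swaps in a plain-Python logistic regression trained by stochastic gradient descent, and the four example sentences, their labels, the learning rate and the epoch count are all invented for illustration.

```python
import math
from collections import Counter

def char_ngrams(text, n=5):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, feats):
    """Linear score of a sparse feature vector."""
    return bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())

def train_logreg(docs, labels, n=5, epochs=200, lr=0.1):
    """Fit binary logistic regression on n-gram counts with plain SGD."""
    weights, bias = {}, 0.0
    feats = [char_ngrams(d, n) for d in docs]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            err = sigmoid(score(weights, bias, x)) - y  # gradient of log loss
            bias -= lr * err
            for g, c in x.items():
                weights[g] = weights.get(g, 0.0) - lr * err * c
    return weights, bias

def predict(weights, bias, text, n=5):
    """Return 1 if the positive class is at least as likely as the negative."""
    return 1 if sigmoid(score(weights, bias, char_ngrams(text, n))) >= 0.5 else 0

# Hypothetical toy data: 1 = asthma mentioned as a diagnosis, 0 = not.
docs = [
    "history of asthma with wheezing",
    "asthma exacerbation treated with steroids",
    "admitted for hip fracture repair",
    "chest pain ruled out for infarction",
]
labels = [1, 1, 0, 0]

weights, bias = train_logreg(docs, labels)
print([predict(weights, bias, d) for d in docs])
```

Character n-grams need no tokenizer or domain lexicon, which fits the "domain-unaware" framing of the abstract; a real system would also need held-out evaluation and regularization, neither of which is shown here.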


Subjects
Automation , Comorbidity , Diagnosis , Medical Records/standards , Data Mining , Humans , Logistic Models
2.
J Biomed Inform ; 42(5): 967-77, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19232400

ABSTRACT

Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation extraction systems cannot be developed without substantial datasets annotated with ground truth for benchmarking and training. The creation of such datasets is hampered by the absence of a resource for launching a distributed annotation effort, as well as by the lack of a standardized annotation schema. We have developed an annotation schema and an annotation tool which can be widely adopted so that the resulting annotated corpora from a multitude of disease studies could be assembled into a unified benchmark dataset. The contribution of this paper is threefold. First, we provide an overview of available benchmark corpora and derive a simple annotation schema for specific binary relation extraction problems such as protein-protein and gene-disease relation extraction. Second, we present BioNotate: an open source annotation resource for the distributed creation of a large corpus. Third, we present and make available the results of a pilot annotation effort of the autism disease network.


Subjects
Information Storage and Retrieval/methods , Medical Informatics/methods , Natural Language Processing , Pattern Recognition, Automated/methods , User-Computer Interface , Autistic Disorder , Data Mining/methods , Databases, Factual , Genetic Predisposition to Disease , Humans , Internet , Protein Interaction Mapping , Terminology as Topic
3.
Bioinformatics ; 15(12): 980-6, 1999 Dec.
Article in English | MEDLINE | ID: mdl-10745987

ABSTRACT

MOTIVATION: Compositionally homogeneous segments of genomic DNA often correspond to meaningful biological units. Simple sliding window analysis is usually insufficient for compositional segmentation of natural sequences. Hidden Markov models (HMM) with a small number of states are a natural language for description of compositional properties of chromosome-size DNA sequences. RESULTS: The algorithms were applied to yeast Saccharomyces cerevisiae chromosomes (YC) I, III, IV, VI and IX. The optimal number of HMM states is found to be four. The optimal four-state HMMs for all chromosomes are very similar, as are the reconstructed segmentations. In most cases the models with k + 1 states are obtained by 'splitting' one of the states in the model with k states, with a corresponding increase in the level of detail of the segmentation. The high-AT states usually correspond to intergenic regions. We also explore the model's likelihood landscape and analyze the dynamics of the optimization process, thus addressing the problem of reliability of the obtained optima and efficiency of the algorithms.
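To make the idea of HMM-based compositional segmentation concrete, here is a minimal sketch that decodes a DNA string into AT-rich and GC-rich segments with the Viterbi algorithm. It is an illustration only: it uses two states rather than the four the abstract reports as optimal, the start, transition and emission probabilities are hand-picked assumptions rather than fitted values, and it does not reproduce the parameter optimization the paper analyzes.

```python
import math

# Hypothetical parameters: state 0 = AT-rich, state 1 = GC-rich.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1],
         [0.1, 0.9]]
EMIT = [{"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}]

def viterbi(seq, start, trans, emit):
    """Return the most probable hidden-state path for seq (log space)."""
    if not seq:
        return []
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][seq[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + math.log(trans[p][s]))
            new_score.append(score[best_prev]
                             + math.log(trans[best_prev][s])
                             + math.log(emit[s][ch]))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (start, end, state) runs, end exclusive."""
    runs = []
    for i, s in enumerate(path):
        if runs and runs[-1][2] == s:
            runs[-1] = (runs[-1][0], i + 1, s)
        else:
            runs.append((i, i + 1, s))
    return runs

seq = "ATATATATATGCGCGCGCGC"
path = viterbi(seq, START, TRANS, EMIT)
print(segments(path))  # [(0, 10, 0), (10, 20, 1)]
```

Unlike a sliding window, the decoded path places segment boundaries wherever switching states raises the overall path probability, so no window size has to be chosen in advance.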


Subjects
Algorithms , Markov Chains , Models, Statistical , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , DNA/chemistry , Hydrogen/chemistry , Hydrogen/metabolism , Models, Genetic