Results 1 - 20 of 45
1.
Disabil Rehabil ; : 1-10, 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37702040

ABSTRACT

PURPOSE OF THE ARTICLE: This article describes a conceptual and methodological approach to integrating functional information into an ontology to categorize mental functioning, which to date is an under-developed area of classification, and supports our work with the United States (U.S.) Social Security Administration (SSA). DESIGN AND METHODOLOGICAL PROCEDURES: Conceptualizing and defining mental functioning was paramount to develop natural language processing (NLP) tools to support our use case. The International Classification of Functioning, Disability, and Health (ICF) was the framework used to conceptualize mental functioning at the activities and participation level in clinical records. To address challenges that arose when applying the ICF as to what should or should not be classified as mental functioning, a mental functioning domain ontology was developed that rearranged, reclassified and incorporated all ICF key components, concepts, classifications, and their definitions. CONCLUSIONS: Challenges emerged in the extent to which we could directly align components in the ICF into an applied ontology of mental functioning. These conceptual challenges required rearrangement of ICF components to adequately support our use case within the social security disability determination process. Findings also have implications to support future NLP efforts for behavioral health outcomes and policy research.


Mental functioning in everyday life is an important area of inquiry from the perspectives of public health, health policy, healthcare, and overall individual level health and well-being. A domain ontology of mental functioning that defines concepts and their relationships, and provides a common terminology with definitions, would enable interdisciplinary communication, research, and collaboration. A clearer conceptual model of mental functioning can improve the development of software that can identify, codify, and organize mental functioning information within clinical records into data that can be analyzed. The International Classification of Functioning, Disability and Health was utilized to conceptualize mental functioning and to guide the development of a proposed domain ontology of mental functioning.

2.
Psychiatr Serv ; 74(1): 56-62, 2023 01 01.
Article in English | MEDLINE | ID: mdl-35652194

ABSTRACT

The disability determination process of the Social Security Administration's (SSA's) disability program requires assessing work-related functioning for individual claimants alleging disability due to mental impairment. This task is particularly challenging because the determination process involves the review of a large file of information, including objective medical evidence and self-reports from claimants, families, and former employers. To improve this decision-making process, SSA entered an interagency agreement with the Rehabilitation Medicine Department, Epidemiology and Biostatistics Section, in the Clinical Center of the National Institutes of Health, intending to use data science and informatics to develop decision support tools. This collaborative effort over the past decade has led to the development of the Work Disability-Functional Assessment Battery and has initiated an approach to applying natural language processing to the review of claimants' files for information on mental health functioning. This informatics research collaboration holds promise for improving the process of disability determination for individuals with mental impairments who make claims at the SSA.


Subjects
Disabled Persons , Mental Health , United States , Humans , United States Social Security Administration , Social Security , Disability Evaluation , Informatics
3.
Front Digit Health ; 4: 914171, 2022.
Article in English | MEDLINE | ID: mdl-36148210

ABSTRACT

This paper describes the identification of body function (BF) mentions within the clinical text within a large, national, heterogeneous corpus to highlight structural challenges presented by the clinical text. BF in clinical documents provides information on dysfunction or impairments in the function or structure of organ systems or organs. BF mentions are embedded in highly formatted structures where the formats include implied scoping boundaries that confound existing natural language processing segmentation and document decomposition techniques. This paper describes follow-up work to adapt a rule-based system created using National Institutes of Health records to a larger, more challenging corpus of Social Security Administration data. Results of these systems provide a baseline for future work to improve document decomposition techniques.

4.
J Am Med Inform Assoc ; 28(3): 516-532, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33319905

ABSTRACT

OBJECTIVES: Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research. MATERIALS AND METHODS: We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language. RESULTS: We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena. DISCUSSION: Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.
CONCLUSIONS: Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
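
Illustrative note (not part of the indexed abstract): the idea of an "ambiguous string" can be sketched as a surface string that is annotated with more than one concept identifier in a dataset. The Python sketch below assumes a toy list of `(string, concept_id)` pairs; the identifiers `CUI_A` to `CUI_E` are hypothetical placeholders, not real UMLS codes, and the study's own procedure may differ.

```python
from collections import defaultdict


def ambiguous_strings(annotations):
    """Map each case-folded string to its concept IDs, keeping only strings with more than one ID."""
    concepts_by_string = defaultdict(set)
    for text, concept_id in annotations:
        concepts_by_string[text.lower()].add(concept_id)
    return {s: ids for s, ids in concepts_by_string.items() if len(ids) > 1}


def ambiguity_rate(annotations):
    """Fraction of distinct strings that are ambiguous in the dataset."""
    strings = {text.lower() for text, _ in annotations}
    if not strings:
        return 0.0
    return len(ambiguous_strings(annotations)) / len(strings)


# Toy annotations; concept IDs are hypothetical placeholders.
TOY_ANNOTATIONS = [
    ("cold", "CUI_A"),   # e.g. the common cold
    ("cold", "CUI_B"),   # e.g. cold temperature
    ("Cold", "CUI_A"),
    ("fever", "CUI_C"),
    ("ms", "CUI_D"),
    ("ms", "CUI_E"),
]

if __name__ == "__main__":
    print(sorted(ambiguous_strings(TOY_ANNOTATIONS)))
    print(round(ambiguity_rate(TOY_ANNOTATIONS), 3))
```

Case folding is a simplifying assumption here; a real comparison against a vocabulary would likely need more careful string normalization.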


Subjects
Datasets as Topic , Electronic Health Records , Terminology as Topic , Unified Medical Language System , Deep Learning , Natural Language Processing , Semantics , Vocabulary, Controlled
5.
Stud Health Technol Inform ; 264: 452-456, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437964

ABSTRACT

Misspellings in clinical free text present potential challenges to pharmacovigilance tasks, such as monitoring for potential ineffective treatment of drug-resistant infections. We developed a novel method using Word2Vec, Levenshtein edit distance constraints, and a customized lexicon to identify correct and misspelled pharmaceutical word forms. We processed a large corpus of clinical notes in a real-world pharmacovigilance task, achieving positive predictive values of 0.929 and 0.909 in identifying valid misspellings and correct spellings, respectively, and negative predictive values of 0.994 and 0.333 as assessments where the program did not produce output. In a specific methicillin-resistant Staphylococcus aureus use case, the method identified 9,815 additional instances in the corpus for potential ineffective drug administration inspection. The findings suggest that this method could potentially achieve satisfactory results for other pharmacovigilance tasks.
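
Illustrative note (not part of the indexed abstract): the edit-distance constraint mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `NEIGHBORS` list stands in for nearest neighbours that an embedding model such as Word2Vec might return, and `LEXICON` and the `max_distance=2` threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def misspelling_candidates(neighbors, lexicon, max_distance=2):
    """Pair each out-of-lexicon token with lexicon words within max_distance edits."""
    pairs = []
    for token in neighbors:
        if token in lexicon:
            continue  # already a known correct spelling
        for word in sorted(lexicon):
            if levenshtein(token, word) <= max_distance:
                pairs.append((token, word))
    return pairs


# Hypothetical lexicon and embedding neighbours, for illustration only.
LEXICON = {"vancomycin", "linezolid", "daptomycin"}
NEIGHBORS = ["vancomycin", "vancomicin", "vanco", "linezolyd", "aspirin"]

if __name__ == "__main__":
    print(misspelling_candidates(NEIGHBORS, LEXICON))
```

Abbreviations such as "vanco" fall outside the distance threshold in this sketch, which is one reason a purely edit-distance-based filter would need to be combined with other evidence.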


Subjects
Pharmaceutical Preparations , Pharmacovigilance , Algorithms , Language , Methicillin-Resistant Staphylococcus aureus
6.
Med Care ; 57 Suppl 6 Suppl 2: S149-S156, 2019 06.
Article in English | MEDLINE | ID: mdl-31095054

ABSTRACT

BACKGROUND: Despite national screening efforts, military sexual trauma (MST) is underreported. Little is known of racial/ethnic differences in MST reporting in the Veterans Health Administration (VHA). OBJECTIVE: This study aimed to compare patterns of MST disclosure in VHA by race/ethnicity. RESEARCH DESIGN: Retrospective cohort study of MST disclosures in a national, random sample of Veterans who served in Afghanistan and Iraq and completed MST screens from October 2009 to 2014. We used natural language processing (NLP) to extract MST concepts from electronic medical notes in the year following Veterans' first MST screen. MEASURE(S): Any evidence of MST (positive MST screen or NLP concepts) and late MST disclosure (NLP concepts following a negative MST screen). Multivariable logistic regressions, stratified by sex, tested racial/ethnic differences in any MST evidence, and late disclosure. RESULTS: Of 6618 male and 6716 female Veterans with MST screen results, 1473 had a positive screen (68 male, 1%; 1405 female, 21%). Of those with a negative screen, 257 evidenced late MST disclosure by NLP (44 male, 39%; 213 female, 13%). Late MST disclosure was usually documented during mental health visits. There were no significant racial/ethnic differences in MST disclosure among men. Among women, blacks were less likely than whites to have any MST evidence (adjusted odds ratio=0.75). In the subsample with any MST evidence, black and Hispanic women were more likely than whites to disclose MST late (adjusted odds ratio=1.89 and 1.59, respectively). CONCLUSIONS: Combining NLP results with MST screen data facilitated the identification of under-reported sexual trauma experiences among men and racial/ethnic minority women.


Subjects
Disclosure/statistics & numerical data , Documentation , Natural Language Processing , Sex Offenses , Veterans/statistics & numerical data , Adult , Female , Humans , Male , Middle Aged , Retrospective Studies , Sex Offenses/ethnology , Sex Offenses/statistics & numerical data , United States , United States Department of Veterans Affairs
7.
BMC Res Notes ; 12(1): 42, 2019 Jan 18.
Article in English | MEDLINE | ID: mdl-30658682

ABSTRACT

OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. With an objective to identify misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora, surgical pathology reports and emergency department progress and visit notes, extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. RESULTS: In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979, respectively, for the surgical pathology reports and the emergency department progress and visit notes, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar among the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.


Subjects
Dictionaries as Topic , Medical Informatics/methods , Natural Language Processing , Vocabulary, Controlled , Algorithms , Humans , Language , Medical Informatics/standards , Medical Informatics/statistics & numerical data , Medical Records Systems, Computerized/standards , Medical Records Systems, Computerized/statistics & numerical data , Pathology, Surgical/methods , Reproducibility of Results , Research Report/standards , Unified Medical Language System/standards , Unified Medical Language System/statistics & numerical data
8.
AMIA Annu Symp Proc ; 2019: 514-522, 2019.
Article in English | MEDLINE | ID: mdl-32308845

ABSTRACT

Background: Experiences of sexual trauma are associated with adverse patient and health system outcomes, but are not systematically documented in electronic health records (EHR). Objective: To describe variations in how sexual trauma is documented in the Veterans Health Administration's EHR. Methods: Sexual trauma concepts were extracted from 362,559 clinical notes using a natural language processing pipeline. Results: We observed variations in the presence of sexual trauma in notes across five United States regions: Pacific, Continental, Midwest, North Atlantic, Southeast. We also observed variations in the types of notes used to document sexual trauma (e.g., mental health, primary care) and sources of sexual trauma (e.g., adult, childhood, military) mentioned in the EHR. Our findings illustrate potential differences in cultural norms related to patient disclosure of sensitive information, and provider documentation. Standardized protocols for eliciting and documenting sexual trauma histories are needed to ensure Veteran access to high quality, trauma-informed care.


Subjects
Electronic Health Records , Natural Language Processing , Sex Offenses , Veterans , Adult , Child , Disclosure , Documentation , Female , Humans , Male , Mental Health Services , Military Personnel , Primary Health Care , United States , United States Department of Veterans Affairs
9.
Stud Health Technol Inform ; 238: 128-131, 2017.
Article in English | MEDLINE | ID: mdl-28679904

ABSTRACT

Sexual trauma survivors are reluctant to disclose such a history due to stigma. This is likely the case when estimating the prevalence of sexual trauma experienced in the military. The Veterans Health Administration has a program by which all former US military service members (Veterans) are screened for military sexual trauma (MST) using a questionnaire. Administrative data on MST screens and a change of status from an initial negative answer to positive and natural language processing (NLP) on electronic medical notes to extract concepts related to MST were used to refine initial estimates of MST among a random sample of 20,000 Veterans. The initial MST positive screen of 15.4% among women was revised upward to 21.8% using administrative data and further to 24.5% by adding NLP results. The overall estimate of MST status in women and men in this sample was revised from 8.1% to 13.1% using both data elements.


Subjects
Electronic Health Records , Military Personnel , Sex Offenses , United States Department of Veterans Affairs , Veterans , Adult , Data Collection , Female , Humans , Male , Sexual Behavior , Stress Disorders, Post-Traumatic/epidemiology , United States
10.
Stud Health Technol Inform ; 238: 136-139, 2017.
Article in English | MEDLINE | ID: mdl-28679906

ABSTRACT

We investigate options for grouping templates for the purpose of template identification and extraction from electronic medical records. We sampled a corpus of 1000 documents originating from the Veterans Health Administration (VA) electronic medical record. We grouped documents through hashing and binning tokens (Hashed) as well as by the top 5% of tokens identified as important through the term frequency inverse document frequency metric (TF-IDF). We then compared the approaches on the number of groups with 3 or more documents and the resulting longest common subsequences (LCSs) common to all documents in the group. We found that the Hashed method had a higher success rate for finding LCSs, and longer LCSs, than the TF-IDF method. The TF-IDF approach, however, found more groups than the Hashed method and consequently more long sequences, although the average length of its LCSs was lower. In conclusion, each algorithm appears to have areas where it is superior.


Subjects
Algorithms , Electronic Health Records , Natural Language Processing , United States , United States Department of Veterans Affairs , Veterans
11.
J Med Syst ; 41(2): 32, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28050745

ABSTRACT

In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from the Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code interviews. We assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and datasets/tools to navigate spelling errors with consumer language, among others. Stakeholders also recommended the capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), to automatically monitor downloads of tools, and to automatically provide a summary of the downloads to Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. Interviews and discussion provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
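
Illustrative note (not part of the indexed abstract): percent agreement and the kappa statistic for two coders can be computed as sketched below. The sketch assumes Cohen's kappa for two raters (the abstract does not say which kappa variant was used), and the code labels in `CODER_A` and `CODER_B` are invented for illustration.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which the two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    return sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement from each coder's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six interview excerpts.
CODER_A = ["design", "design", "resources", "information", "design", "resources"]
CODER_B = ["design", "resources", "resources", "information", "design", "resources"]

if __name__ == "__main__":
    print(round(percent_agreement(CODER_A, CODER_B), 4))
    print(round(cohen_kappa(CODER_A, CODER_B), 4))
```

Because kappa discounts agreement expected by chance, it is typically lower than raw percent agreement, which is consistent with the gap between the two figures reported above.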


Subjects
Electronic Health Records/organization & administration , Natural Language Processing , United States Department of Veterans Affairs/organization & administration , Communication , Cooperative Behavior , Electronic Health Records/standards , Humans , Interviews as Topic , Reproducibility of Results , Terminology as Topic , United States
12.
J Biomed Inform ; 71S: S39-S45, 2017 07.
Article in English | MEDLINE | ID: mdl-27404849

ABSTRACT

OBJECTIVE: To develop a natural language processing pipeline to extract positively asserted concepts related to the presence of an indwelling urinary catheter in hospitalized patients from the free text of the electronic medical note. The goal is to assist infection preventionists and other healthcare professionals in determining whether a patient has an indwelling urinary catheter when a catheter-associated urinary tract infection is suspected. Currently, data on indwelling urinary catheters is not consistently captured in the electronic medical record in structured format and thus cannot be reliably extracted for clinical and research purposes. MATERIALS AND METHODS: We developed a lexicon of terms related to indwelling urinary catheters and urinary symptoms based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 1595 randomly selected documents from inpatient admissions was annotated by human reviewers to identify all positively and negatively asserted concepts related to indwelling urinary catheters. We trained a natural language processing pipeline based on the V3NLP framework using 1050 documents and tested on 545 documents to determine agreement with the human reference standard. Metrics reported are positive predictive value and recall. RESULTS: The lexicon contained 590 terms related to the presence of an indwelling urinary catheter in various categories including insertion, care, change, and removal of urinary catheters and 67 terms for urinary symptoms. Nursing notes were the most frequent inpatient note titles in the reference standard document corpus; these also yielded the highest number of positively asserted concepts with respect to urinary catheters. 
Comparing the performance of the natural language processing pipeline against the human reference standard, the overall recall was 75% and positive predictive value was 99% on the training set; on the testing set, the recall was 72% and positive predictive value was 98%. The performance on extracting urinary symptoms (including fever) was high with recall and precision greater than 90%. CONCLUSIONS: We have shown that it is possible to identify the presence of an indwelling urinary catheter and urinary symptoms from the free text of electronic medical notes from inpatients using natural language processing. These are two key steps in developing automated protocols to assist humans in large-scale review of patient charts for catheter-associated urinary tract infection. The challenges associated with extracting indwelling urinary catheter-related concepts also inform the design of electronic medical record templates to reliably and consistently capture data on indwelling urinary catheters.
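
Illustrative note (not part of the indexed abstract): positive predictive value and recall against a human reference standard reduce to simple set arithmetic over annotated mentions. The sketch below assumes exact-match comparison of `(document, start, end)` spans, which is a simplification; the spans in `REFERENCE` and `PREDICTED` are invented.

```python
def ppv_and_recall(predicted, reference):
    """Return (PPV, recall) for predicted mentions against reference mentions."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    ppv = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return ppv, recall


# Hypothetical mention spans: (document ID, start offset, end offset).
REFERENCE = {("doc1", 10, 23), ("doc1", 40, 52), ("doc2", 5, 18), ("doc3", 0, 12)}
PREDICTED = {("doc1", 10, 23), ("doc2", 5, 18), ("doc3", 30, 41)}

if __name__ == "__main__":
    ppv, recall = ppv_and_recall(PREDICTED, REFERENCE)
    print(f"PPV={ppv:.3f} recall={recall:.3f}")
```

A pipeline with very high PPV and lower recall, as reported above, rarely asserts a catheter mention wrongly but misses a share of the mentions the human reviewers found.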


Subjects
Electronic Health Records , Natural Language Processing , Urinary Catheters , Urinary Tract Infections , Data Mining , Humans
13.
Stud Health Technol Inform ; 245: 351-355, 2017.
Article in English | MEDLINE | ID: mdl-29295114

ABSTRACT

Patient history of sexual trauma is of clinical relevance to healthcare providers as survivors face adverse health-related outcomes. This paper describes a method for identifying mentions of sexual trauma within the free text of electronic medical notes. A natural language processing pipeline for information extraction was developed and scaled to handle a large corpus of electronic medical notes used for this study from US Veterans Health Administration medical facilities. The tool was used to identify sexual trauma mentions and create snippets around every asserted mention based on a domain-specific lexicon developed for this purpose. All snippets were evaluated by trained human reviewers. An overall positive predictive value (PPV) of 0.90 for identifying sexual trauma mentions from the free text and a PPV of 0.71 at the patient level are reported. The metrics are superior for records from female patients.


Subjects
Electronic Health Records , Natural Language Processing , Female , Humans , Information Storage and Retrieval
14.
Stud Health Technol Inform ; 245: 356-360, 2017.
Article in English | MEDLINE | ID: mdl-29295115

ABSTRACT

There is a need for cataloging signs and symptoms, but not all are documented in structured data. The text from clinical records is an additional source of signs and symptoms. We describe a Natural Language Processing (NLP) technique to identify symptoms from text. Using a human-annotated reference corpus from VA electronic medical notes, we trained and tested an NLP pipeline to identify and categorize symptoms. The technique includes a model created from an automatic machine learning model selection tool. Tested on a hold-out set, its precision at the mention level was 0.80, recall 0.74 and an overall F-score of 0.80. The tool was scaled up to process a large corpus of 964,105 patient records.


Subjects
Data Mining , Machine Learning , Natural Language Processing , Electronic Health Records , Humans
15.
J Biomed Inform ; 71S: S68-S76, 2017 07.
Article in English | MEDLINE | ID: mdl-27497780

ABSTRACT

RATIONALE: Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. METHODS: The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and performs a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. RESULTS: The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group was 5.9 and 5, respectively. DISCUSSION: The grouping method was successful in finding like documents containing templates. 
Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scaling up.
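
Illustrative note (not part of the indexed abstract): the extraction step, finding text shared by every document in a group, can be sketched with a token-level longest common subsequence (LCS). This is a simplified sketch, not the authors' module: it folds a pairwise LCS across the group, which yields a subsequence common to all documents but is not guaranteed to be the longest one, and whitespace tokenization and the sample notes are assumptions.

```python
def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one LCS.
    out, i, j = [], rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def group_template(documents):
    """Tokens shared, in order, by every document in a group (pairwise LCS fold)."""
    token_lists = [doc.split() for doc in documents]
    common = token_lists[0]
    for tokens in token_lists[1:]:
        common = lcs(common, tokens)
    return common


# Hypothetical notes that share a pull-down template.
GROUP = [
    "Pain level : 3 Location : knee",
    "Pain level : 7 Location : lower back",
    "Pain level : 0 Location : none",
]

if __name__ == "__main__":
    print(group_template(GROUP))
```

In this toy group the recovered tokens are the fixed template labels, while the patient-specific values drop out, which mirrors the distinction between template text and extraneous text described above.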


Subjects
Algorithms , Electronic Health Records , Heuristics , Natural Language Processing , Humans , Pilot Projects
16.
EGEMS (Wash DC) ; 4(3): 1228, 2016.
Article in English | MEDLINE | ID: mdl-27683667

ABSTRACT

INTRODUCTION: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of "best-of-breed" functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. BACKGROUND: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. INNOVATION: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. DISCUSSION: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. CONCLUSION: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records.

17.
Stud Health Technol Inform ; 226: 33-6, 2016.
Article in English | MEDLINE | ID: mdl-27350459

ABSTRACT

Medical text contains boilerplated content, an artifact of pull-down forms from EMRs. Boilerplated content is the source of challenges for concept extraction on clinical text. This paper introduces PlateRunner, a search engine on boilerplates from the US Department of Veterans Affairs (VA) EMR. Boilerplates containing concepts should be identified and reviewed to recognize challenging formats, identify high yield document titles, and fine tune section zoning. This search engine has the capability to filter negated and asserted concepts, save and search query results. This tool can save queries, search results, and documents found for later analysis.


Subjects
Electronic Health Records/organization & administration , Search Engine/methods , Humans , United States , United States Department of Veterans Affairs
18.
Stud Health Technol Inform ; 226: 79-82, 2016.
Article in English | MEDLINE | ID: mdl-27350471

ABSTRACT

Extracting evidence of the absence of a target of interest from medical text can be useful in clinical inferencing. The purpose of our study was to develop a natural language processing (NLP) pipeline to identify the presence of indwelling urinary catheters from electronic medical notes to aid in detection of catheter-associated urinary tract infections (CAUTI). Finding clear evidence that a patient does not have an indwelling urinary catheter is useful in making a determination regarding CAUTI. We developed a lexicon of seven core concepts to infer the absence of a urinary catheter. Of the 990,391 concepts extracted by NLP from a large corpus of 744,285 electronic medical notes from 5589 hospitalized patients, 63,516 were labeled as evidence of absence. Human review revealed three primary causes for false negatives. The lexicon and NLP pipeline were refined using this information, resulting in outputs with an acceptable false positive rate of 11%.


Subjects
Catheter-Related Infections/diagnosis , Documentation/statistics & numerical data , Electronic Health Records/statistics & numerical data , Natural Language Processing , Urinary Catheters/adverse effects , Diagnostic Errors , Humans , Inpatients
19.
J Biomed Inform ; 58: 19-27, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26362345

ABSTRACT

OBJECTIVE: To develop a method to exploit the UMLS Metathesaurus for extracting and categorizing concepts found in clinical text representing signs and symptoms to anatomically related organ systems. The overarching goal is to classify patient reported symptoms to organ systems for population health and epidemiological analyses. MATERIALS AND METHODS: Using the concepts' semantic types and the inter-concept relationships as guidance, a selective portion of the concepts within the UMLS Metathesaurus was traversed starting from the concepts representing the highest level organ systems. The traversed concepts were chosen, filtered, and reviewed to obtain the concepts representing clinical signs and symptoms by blocking deviations, pruning superfluous concepts, and manual review. The mapping process was applied to signs and symptoms annotated in a corpus of 750 clinical notes. RESULTS: The mapping process yielded a total of 91,000 UMLS concepts (with approximately 300,000 descriptions) possibly representing physical and mental signs and symptoms that were extracted and categorized to the anatomically related organ systems. Of 1864 distinct descriptions of signs and symptoms found in the 750 document corpus, 1635 of these (88%) were successfully mapped to the set of concepts extracted from the UMLS. Of 668 unique concepts mapped, 603 (90%) were correctly categorized to their organ systems. CONCLUSION: We present a process that facilitates mapping of signs and symptoms to their organ systems. By providing a smaller set of UMLS concepts to use for comparing and matching patient records, this method has the potential to increase efficiency of information extraction pipelines.
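
Illustrative note (not part of the indexed abstract): the traversal described above, starting from top-level organ-system concepts, following inter-concept relationships, blocking deviations, and keeping only concepts of the wanted semantic type, can be sketched as a breadth-first search. Everything in the toy data below (concept names, relationships, type labels, the blocked set) is a hypothetical stand-in; no real UMLS content or API is used.

```python
from collections import deque


def collect_symptoms(roots, children, semantic_type, allowed_types, blocked=frozenset()):
    """Breadth-first walk from each organ-system root; map symptom concepts to their roots."""
    assigned = {}
    for root in roots:
        queue, seen = deque([root]), {root}
        while queue:
            concept = queue.popleft()
            for child in children.get(concept, ()):
                if child in seen or child in blocked:
                    continue  # skip revisits and branches marked as deviations
                seen.add(child)
                queue.append(child)
                if semantic_type.get(child) in allowed_types:
                    assigned.setdefault(child, set()).add(root)
    return assigned


# Toy concept graph; names and types are invented for illustration.
CHILDREN = {
    "respiratory system": ["cough", "lung"],
    "lung": ["wheezing", "lung transplant"],
    "lung transplant": ["rejection symptom"],
    "cardiovascular system": ["chest pain", "heart"],
    "heart": ["palpitations"],
}
SEMANTIC_TYPE = {
    "cough": "symptom", "wheezing": "symptom", "chest pain": "symptom",
    "palpitations": "symptom", "rejection symptom": "symptom",
    "lung": "body part", "heart": "body part", "lung transplant": "procedure",
}

if __name__ == "__main__":
    result = collect_symptoms(
        ["respiratory system", "cardiovascular system"],
        CHILDREN, SEMANTIC_TYPE, allowed_types={"symptom"},
        blocked={"lung transplant"},
    )
    print(result)
```

Blocking a branch removes everything reachable only through it, which is how the sketch approximates pruning superfluous concepts; the manual review step described in the abstract has no counterpart here.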


Subjects
Anatomy , Concept Formation , Unified Medical Language System , Humans
20.
Stud Health Technol Inform ; 216: 639-42, 2015.
Article in English | MEDLINE | ID: mdl-26262129

ABSTRACT

Clinical notes contain important temporal information that is critical for clinical diagnosis and treatment as well as for retrospective analyses. Manually created regular expressions are commonly used for the extraction of temporal information; however, this can be a time-consuming and brittle approach. We describe a novel algorithm for automatic learning of regular expressions in recognizing temporal expressions. Five classes of temporal expressions are identified. Keywords specific to those classes are used to retrieve snippets of text representing the same keywords in context. Those snippets are used for Regular Expression Discovery Extraction (REDEx). These learned regular expressions are then evaluated using 10-fold cross validation. Precision and recall are very high, above 0.95 for most classes.
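
Illustrative note (not part of the indexed abstract): for contrast with learned expressions, the manually created baseline that the abstract calls time consuming and brittle might look like the sketch below. This is not REDEx and does not learn anything; the three class names and patterns are hypothetical, since the abstract does not name its five classes.

```python
import re

# Hand-written patterns for three hypothetical temporal classes.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "duration": re.compile(
        r"\b(?:for|x)\s*\d+\s*(?:day|week|month|year)s?\b", re.IGNORECASE
    ),
    "frequency": re.compile(
        r"\b(?:once|twice|\d+\s*times)\s+(?:a|per)\s+(?:day|week|month)\b",
        re.IGNORECASE,
    ),
}


def find_temporal(text):
    """Return (class, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group(0)))
    return [(label, matched) for _, label, matched in sorted(hits)]


SAMPLE = "Seen on 03/14/2015. Cough for 3 weeks; takes inhaler twice a day."

if __name__ == "__main__":
    for label, matched in find_temporal(SAMPLE):
        print(label, "->", matched)
```

Each new phrasing (for example "x3wks" or "q.d.") would need another hand-written pattern, which illustrates the brittleness that motivates learning the expressions from example snippets.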


Subjects
Chronology as Topic , Data Mining/methods , Electronic Health Records/classification , Machine Learning , Natural Language Processing , Time Factors , Reproducibility of Results , Semantics , Sensitivity and Specificity , Terminology as Topic , Vocabulary, Controlled