Results 1 - 20 of 40
1.
JMIR Hum Factors; 9(2): e35325, 2022 May 11.
Article in English | MEDLINE | ID: mdl-35544296

ABSTRACT

BACKGROUND: Patients' spontaneous speech can act as a biomarker for identifying pathological entities, such as mental illness. Despite this potential, audio recording patients' spontaneous speech is not part of clinical workflows, and health care organizations often do not have dedicated policies regarding the audio recording of clinical encounters. No previous studies have investigated the best practical approach for integrating audio recording of patient-clinician encounters into clinical workflows, particularly in the home health care (HHC) setting. OBJECTIVE: This study aimed to evaluate the functionality and usability of several audio-recording devices for recording patient-nurse verbal communications in the HHC setting and to elicit HHC stakeholder (patient and nurse) perspectives about the facilitators of and barriers to integrating audio recordings into clinical workflows. METHODS: This study was conducted at a large urban HHC agency located in New York, United States. We evaluated the usability and functionality of 7 audio-recording devices in a laboratory (controlled) setting. Three devices (Saramonic Blink500, Sony ICD-TX6, and Black Vox 365) were further evaluated in a clinical setting (patients' homes) by HHC nurses, who completed the System Usability Scale questionnaire and participated in a short, structured interview to provide feedback about each device. We also evaluated the accuracy of automatic transcription of the audio-recorded encounters for the 3 devices using Amazon Transcribe. Word error rate was used to measure the accuracy of automated speech transcription. To understand the facilitators of and barriers to integrating audio recording of encounters into clinical workflows, we conducted semistructured interviews with 3 HHC nurses and 10 HHC patients. Thematic analysis was used to analyze the transcribed interviews. RESULTS: Saramonic Blink500 received the best overall evaluation score. The System Usability Scale score and word error rate for Saramonic Blink500 were 65% and 26%, respectively, and nurses found it easier to approach patients with this device than with the other 2 devices. Overall, patients found the audio-recording process satisfactory and convenient, with minimal impact on their communication with nurses. Although nurses generally also found the process easy to learn and satisfactory, they suggested that audio recording of HHC encounters could affect their communication patterns. In addition, nurses were not aware of the potential to use audio-recorded encounters to improve health care services. Nurses also indicated that they would need to involve their managers to determine how audio recordings could be integrated into their clinical workflows and for any ongoing use of audio recordings during patient care management. CONCLUSIONS: This study established the feasibility of audio recording HHC patient-nurse encounters. Training HHC nurses on the importance of the audio-recording process and the support of clinical managers are essential factors for successful implementation.
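
Word error rate, the transcription metric used above, is the word-level edit distance between the reference and the hypothesis transcript, divided by the reference length. A minimal Python sketch (function and variable names are illustrative, not from the study):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the patient slept well", "the patient slept"))  # 0.25
```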

2.
J Am Heart Assoc; 7(20): e09841, 2018 Oct 16.
Article in English | MEDLINE | ID: mdl-30371257

ABSTRACT

Background Heart failure (HF) with "recovered" ejection fraction (HFrecEF) is an emerging phenotype, but no tools exist to predict ejection fraction (EF) recovery in acute HF. We hypothesized that indices of baseline cardiac structure and function predict HFrecEF in nonischemic cardiomyopathy with reduced EF. Methods and Results We identified a nonischemic cardiomyopathy cohort with EF <40% during the first HF hospitalization (n=166). We performed speckle-tracking echocardiography to measure longitudinal, circumferential, and radial strain, and the average of these measures (myocardial systolic performance). HFrecEF was defined as follow-up EF ≥40% and ≥10% improvement from baseline EF. Fifty-nine patients (36%) achieved HFrecEF (baseline EF 26±7%; follow-up EF 51±7%) within a median of 135 (interquartile range 58-239) days after the first HF hospitalization. Baseline demographics, biomarker profiles, and comorbid conditions (except a lower prevalence of chronic kidney disease in HFrecEF) were similar between the HFrecEF and persistent reduced-EF groups. HFrecEF patients had smaller baseline left ventricular end-systolic dimension (3.6 versus 4.8 cm; P<0.01), higher baseline myocardial systolic performance (9.2% versus 8.1%; P=0.02), and improved survival (adjusted hazard ratio 0.27, 95% confidence interval 0.11, 0.62). We found a significant interaction between baseline left ventricular end-systolic dimension and absolute longitudinal strain. Among patients with left ventricular end-systolic dimension >4.35 cm, higher absolute longitudinal strain (≥8%) was associated with HFrecEF (unadjusted odds ratio 3.9, 95% confidence interval 1.2, 12.8). Incorporation of baseline indices of cardiac mechanics with clinical variables resulted in a predictive model for HFrecEF with c-statistic=0.85. Conclusions Factors associated with achieving HFrecEF were specific to cardiac structure and indices of cardiac mechanics. Higher baseline absolute longitudinal strain is associated with HFrecEF among nonischemic cardiomyopathy patients with reduced EF and larger left ventricular dimensions.
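
The HFrecEF definition above is a simple threshold rule. A sketch of it as code, reading the abstract's "≥10% improvement" as an absolute 10-point gain in EF (the wording could also be read as a relative improvement; names are illustrative):

```python
def is_hfrecef(baseline_ef: float, followup_ef: float) -> bool:
    """HFrecEF per the study definition: follow-up EF >= 40%
    and an improvement of >= 10 EF points over baseline
    (absolute-gain reading; an assumption about the wording)."""
    return followup_ef >= 40 and (followup_ef - baseline_ef) >= 10

print(is_hfrecef(26, 51))  # True: mirrors the recovered group's mean EFs
print(is_hfrecef(35, 42))  # False: follow-up >= 40% but gain < 10 points
```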


Subjects
Cardiomyopathies/physiopathology, Heart Failure/physiopathology, Cardiomyopathies/therapy, Echocardiography, Female, Heart Failure/therapy, Hospitalization/statistics & numerical data, Humans, Kaplan-Meier Estimate, Longitudinal Studies, Male, Middle Aged, Treatment Outcome, Left Ventricular Dysfunction/physiopathology
3.
J Biomed Inform; 73: 95-103, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28756159

ABSTRACT

OBJECTIVES: The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist to clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact in PubMed®. METHODS: Our machine learning algorithms use a variety of features, including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, as well as PubMed's relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. RESULTS: The mean top 20 precision of our high impact classifier was 34%, versus 11% for the state-of-the-art classifier and 4% for PubMed's relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision = 34% vs. 36%; p=0.085). CONCLUSION: The high impact classifier, using features such as bibliometrics, social media attention, and MEDLINE® metadata, outperformed previous approaches and is a promising approach for identifying high impact studies for clinical decision support.
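
"Mean top 20 precision" is precision@k with k=20, averaged over queries. A minimal sketch with hypothetical identifiers:

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked citations that are in the gold standard."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

ranked = ["pmid_%d" % i for i in range(100)]        # classifier output, best first
gold = {"pmid_0", "pmid_3", "pmid_7", "pmid_42"}    # gold-standard high-impact studies
print(precision_at_k(ranked, gold))                 # 0.15: 3 of the top 20 are relevant
```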


Subjects
Bibliometrics, Clinical Decision-Making, Evidence-Based Medicine, Machine Learning, PubMed, Algorithms, Humans, Information Storage and Retrieval, MEDLINE, Metadata, Social Media
4.
J Cardiovasc Transl Res; 10(3): 313-321, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28585184

ABSTRACT

Precision medicine requires clinical trials that are able to efficiently enroll subtypes of patients in whom targeted therapies can be tested. To reduce the large amount of time spent screening, identifying, and recruiting patients with specific subtypes of heterogeneous clinical syndromes (such as heart failure with preserved ejection fraction [HFpEF]), we need prescreening systems that are able to automate data extraction and decision-making tasks. However, a major obstacle is the vast amount of unstructured free-form text in medical records. Here we describe an information extraction-based approach that automatically converts unstructured text into structured data, which is cross-referenced against eligibility criteria using a rule-based system to determine which patients qualify for a major HFpEF clinical trial (PARAGON). We show that we can achieve a sensitivity and positive predictive value of 0.95 and 0.86, respectively. Our open-source algorithm could be used to efficiently identify and subphenotype patients with HFpEF and other disorders.
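
A rule-based cross-reference of extracted data against eligibility criteria can be expressed as one predicate per criterion. The sketch below uses hypothetical, heavily simplified criteria for illustration only; it is not the actual PARAGON protocol logic:

```python
# Hypothetical, simplified criteria for illustration; the real PARAGON
# eligibility logic is far more extensive.
CRITERIA = {
    "ef_at_least_45": lambda p: p.get("ef") is not None and p["ef"] >= 45,
    "age_at_least_50": lambda p: p.get("age") is not None and p["age"] >= 50,
    "on_diuretic": lambda p: "diuretic" in p.get("medications", []),
}

def screen(patient: dict) -> dict:
    """Return per-criterion results; a patient qualifies if all rules pass."""
    results = {name: rule(patient) for name, rule in CRITERIA.items()}
    results["eligible"] = all(results.values())
    return results

# 'patient' would come from the information-extraction step over free text.
patient = {"ef": 55, "age": 67, "medications": ["diuretic", "statin"]}
print(screen(patient))
```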


Subjects
Clinical Trials as Topic/methods, Data Mining/methods, Electronic Health Records, Eligibility Determination/methods, Heart Failure/physiopathology, Natural Language Processing, Patient Selection, Stroke Volume, Algorithms, Echocardiography, Heart Failure/classification, Heart Failure/diagnosis, Heart Failure/therapy, Humans, Phenotype, Predictive Value of Tests, Reproducibility of Results
5.
Drug Saf; 40(11): 1075-1089, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28643174

ABSTRACT

The goal of pharmacovigilance is to detect, monitor, characterize, and prevent adverse drug events (ADEs) associated with pharmaceutical products. This article is a comprehensive structured review of recent advances in applying natural language processing (NLP) to electronic health record (EHR) narratives for pharmacovigilance. We review methods of varying complexity and problem focus, summarize the current state of the art in methodology, discuss limitations, and point out several promising future directions. The ability to accurately capture both semantic and syntactic structures in clinical narratives is increasingly critical for efficient and accurate ADE detection. Significant progress has been made in algorithm development and resource construction since 2000. Since 2012, statistical analysis and machine learning methods have gained traction in the automation of ADE mining from EHR narratives. Current state-of-the-art methods for NLP-based ADE detection from EHRs show promise for integration into production pharmacovigilance systems. In addition, integrating multifaceted, heterogeneous data sources has shown promise in improving ADE detection and has become increasingly adopted. On the other hand, challenges and opportunities remain across the frontier of NLP application to EHR-based pharmacovigilance, including proper characterization of ADE context, differentiation between off- and on-label drug-use ADEs, recognition of the importance of polypharmacy-induced ADEs, better integration of heterogeneous data sources, creation of shared corpora, and organization of shared-task challenges to advance the state of the art.


Subjects
Adverse Drug Reaction Reporting Systems/standards, Drug-Related Side Effects and Adverse Reactions/diagnosis, Electronic Health Records/standards, Natural Language Processing, Pharmacovigilance, Humans
6.
J Biomed Inform; 64: 265-272, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27989816

ABSTRACT

OBJECTIVES: Extracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process. METHODS: We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries at the sentence and fragment levels were evaluated for finding common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information necessary for data extraction, as presented in the study characteristics tables of Cochrane reviews. RESULTS: At the sentence level, the computer-generated summaries covered more of the information needed for systematic reviews than the human-written summaries did (recall 91.2% vs. 83.8%, p<0.001). They also had a higher density of relevant sentences (precision 59% vs. 39%, p<0.001). At the fragment level, the ensemble approach combining rule-based, concept mapping, and dictionary-based methods performed better than the individual methods alone, achieving an 84.7% F-measure. CONCLUSION: Computer-generated summaries are a potential alternative information source for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system.
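
Fragment-level extraction of elements such as sample size is commonly seeded with surface patterns. A minimal sketch (the patterns are illustrative, not the study's actual rules):

```python
import re

# Illustrative patterns for sample-size mentions such as
# "n = 120", "120 patients were randomized", "enrolled 45 participants".
PATTERNS = [
    re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE),
    re.compile(r"\b(\d+)\s+(?:patients|participants|subjects)\b", re.IGNORECASE),
    re.compile(r"\benrolled\s+(\d+)\b", re.IGNORECASE),
]

def extract_sample_sizes(sentence: str) -> list:
    sizes = []
    for pattern in PATTERNS:
        sizes.extend(int(m) for m in pattern.findall(sentence))
    return sizes

print(extract_sample_sizes("A total of 120 patients were randomized (n = 120)."))
# [120, 120]; duplicate hits across patterns would be reconciled downstream
```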


Subjects
Machine Learning, Natural Language Processing, Systematic Reviews as Topic, Humans, Data Mining, Language, Publications
7.
JMIR Med Inform; 4(3): e24, 2016 Aug 02.
Article in English | MEDLINE | ID: mdl-27485666

ABSTRACT

BACKGROUND: Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities. OBJECTIVE: In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand the information embedded within Web-based health content that serves as good features for identifying valid answers. METHODS: Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline. RESULTS: On our dataset, the semi-supervised learning algorithm achieved an accuracy of 86.2%. Unified Medical Language System-based (health-related) features used in the model enhanced the algorithm's performance by approximately 8%. A reasonably high rate of accuracy was obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions. CONCLUSIONS: Overall, our automated QA system based on historical QA pairs was shown to be effective on the dataset in this case study. It was developed for general use in the health care domain and can also be applied to other CQA sites.
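
The candidate-retrieval step can be approximated with TF-IDF cosine similarity over previously resolved questions. A simplified stand-in for the study's retrieval component, using scikit-learn and toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resolved_questions = [
    "What are common side effects of statins?",
    "How much sleep do adults need?",
    "Is walking good exercise for heart health?",
]
answers = ["Muscle aches ...", "Seven to nine hours ...", "Yes, regular walking ..."]

vectorizer = TfidfVectorizer(stop_words="english")
question_matrix = vectorizer.fit_transform(resolved_questions)

def candidate_answers(new_question: str, top_k: int = 2):
    """Rank resolved QA pairs by similarity to the new question."""
    scores = cosine_similarity(vectorizer.transform([new_question]), question_matrix)[0]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(resolved_questions[i], answers[i], scores[i]) for i in ranked[:top_k]]

print(candidate_answers("Are statins associated with muscle pain?"))
```

The ranking stage described in the abstract (the semi-supervised learner) would then re-score these candidates using the answer-quality features the authors list.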

8.
AMIA Jt Summits Transl Sci Proc; 2016: 203-12, 2016.
Article in English | MEDLINE | ID: mdl-27570671

ABSTRACT

Precision medicine is an emerging approach to disease prevention and treatment that considers individual variability in genes, environment, and lifestyle. The dissemination of individualized evidence by automatically identifying population information in the literature is key to evidence-based precision medicine at the point of care. We propose a hybrid approach using natural language processing techniques to automatically extract population information from the biomedical literature. Our approach first applies a binary classifier to identify sentences with or without population information. A rule-based system based on syntactic-tree regular expressions is then applied to sentences containing population information to extract the population named entities. The proposed two-stage approach achieved an F-score of 0.81 using a MaxEnt classifier and the rule-based system, and an F-score of 0.87 using a Naïve Bayes classifier and the rule-based system, performing relatively well compared with many existing systems. The system and evaluation dataset are being released as open source.
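
The second, rule-based stage can be sketched with a surface pattern standing in for the paper's syntactic-tree regular expressions (the pattern below is a simplification):

```python
import re

# Surface-pattern stand-in for the paper's syntactic-tree regular expressions.
POPULATION_PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+((?:\w+\s+){0,3}?(?:patients|adults|women|men|participants))",
    re.IGNORECASE,
)

def extract_population(sentence: str):
    return [(m.group(1), m.group(2)) for m in POPULATION_PATTERN.finditer(sentence)]

# Stage 1 (a trained MaxEnt or Naive Bayes sentence filter) would run first;
# stage 2 then extracts the population entity from the positive sentences.
print(extract_population("We randomized 4,320 postmenopausal women to treatment."))
# [('4,320', 'postmenopausal women')]
```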

9.
J Biomed Inform; 61: 141-8, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27044929

ABSTRACT

OBJECTIVES: Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithms. Our goal is to categorize PDF texts for strategic use by IE systems. METHODS: We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with machine learning classifiers, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. RESULTS: The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% higher (p<0.001) than that of the best performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved the performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). CONCLUSIONS: The rule-based multi-pass sieve framework can be used effectively to categorize texts extracted from PDF documents. Text classification is an important prerequisite step for leveraging information extraction from PDF documents.
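
A multi-pass sieve applies ordered rules, from highest to lowest precision, and the first matching pass assigns the label. A schematic sketch with illustrative rules (not the study's actual passes):

```python
import re

def sieve_classify(text: str, position: int, total: int) -> str:
    """Apply passes in decreasing order of precision; the first match wins."""
    # Pass 1: metadata such as DOIs, copyright lines, page headers.
    if re.search(r"\bdoi:|©|\bcopyright\b", text, re.IGNORECASE):
        return "METADATA"
    # Pass 2: semi-structured text, e.g. table-like rows with many separators.
    if text.count("|") + text.count("\t") >= 3:
        return "SEMISTRUCTURE"
    # Pass 3: the abstract is an early, explicitly labelled block.
    if position < total * 0.1 and text.lower().startswith("abstract"):
        return "ABSTRACT"
    # Pass 4: title heuristics: first snippet, short, no final period.
    if position == 0 and len(text) < 200 and not text.rstrip().endswith("."):
        return "TITLE"
    # Default pass: everything else is body text.
    return "BODYTEXT"

print(sieve_classify("doi:10.1000/xyz © 2015 Elsevier", 5, 400))  # METADATA
```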


Subjects
Algorithms, Information Storage and Retrieval, Machine Learning, Natural Language Processing, Review Literature as Topic, Humans, Narration, Publications
10.
PLoS One; 11(4): e0153749, 2016.
Article in English | MEDLINE | ID: mdl-27124000

ABSTRACT

Large volumes of data are continuously generated from clinical notes and diagnostic studies catalogued in electronic health records (EHRs). Echocardiography is one of the most commonly ordered diagnostic tests in cardiology. This study sought to explore the feasibility and reliability of using natural language processing (NLP) for large-scale and targeted extraction of multiple data elements from echocardiography reports. An NLP tool, EchoInfer, was developed to automatically extract data pertaining to cardiovascular structure and function from heterogeneously formatted echocardiographic data sources. EchoInfer was applied to echocardiography reports (2004 to 2013) available from 3 different ongoing clinical research projects. EchoInfer analyzed 15,116 echocardiography reports from 1684 patients, and extracted 59 quantitative and 21 qualitative data elements per report. EchoInfer achieved a precision of 94.06%, a recall of 92.21%, and an F1-score of 93.12% across all 80 data elements in 50 reports. Physician review of 400 reports demonstrated that EchoInfer achieved a recall of 92-99.9% and a precision of >97% in four data elements, including three quantitative and one qualitative data element. Failure of EchoInfer to correctly identify or reject reported parameters was primarily related to non-standardized reporting of echocardiography data. EchoInfer provides a powerful and reliable NLP-based approach for the large-scale, targeted extraction of information from heterogeneous data sources. The use of EchoInfer may have implications for the clinical management and research analysis of patients undergoing echocardiographic evaluation.
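
Targeted extraction from echocardiography narratives typically pairs a lexicon of report phrasings with numeric patterns. A minimal sketch in the spirit of EchoInfer (the patterns and element names are illustrative, not the tool's actual rules):

```python
import re

# Map report phrasings to a canonical data element (illustrative lexicon).
ELEMENT_PATTERNS = {
    "lvef_percent": re.compile(
        r"(?:ejection fraction|LVEF|EF)\s*(?:is|of|=|:)?\s*(\d{1,2})\s*%",
        re.IGNORECASE,
    ),
    "lvedd_cm": re.compile(
        r"(?:LVEDD|left ventricular end[- ]diastolic (?:dimension|diameter))"
        r"\s*(?:is|of|=|:)?\s*(\d+(?:\.\d+)?)\s*cm",
        re.IGNORECASE,
    ),
}

def extract_echo_elements(report_text: str) -> dict:
    found = {}
    for element, pattern in ELEMENT_PATTERNS.items():
        match = pattern.search(report_text)
        if match:
            found[element] = float(match.group(1))
    return found

print(extract_echo_elements("LVEF is 55%. LVEDD of 4.8 cm, mildly dilated."))
# {'lvef_percent': 55.0, 'lvedd_cm': 4.8}
```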


Subjects
Echocardiography/methods, Natural Language Processing, Aged, Electronic Health Records, Female, Humans, Information Storage and Retrieval, Male, Reproducibility of Results
11.
J Biomed Inform; 60: 14-22, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26774763

ABSTRACT

BACKGROUND: Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. OBJECTIVE: To design and assess a method for extracting, from synthesized online clinical resources, the sentences that represent the most clinically useful information for directly answering clinicians' information needs. MATERIALS AND METHODS: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. RESULTS: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept, or text-based features reduced the F-measure to 62%, 66%, 58%, and 69%, respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62). CONCLUSIONS: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.
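
The feature-rich representation can be sketched as one feature dictionary per sentence, assembled from the outputs of the upstream extractors named above. A minimal illustration (names and example values are hypothetical):

```python
def sentence_features(sentence, umls_concepts, predications, population, cue_words):
    """Assemble the four feature families named in the abstract into one
    sparse feature dictionary; upstream extractors (UMLS mapping, SemRep,
    the population NLP algorithm) are assumed to have run already."""
    tokens = sentence.lower().split()
    return {
        **{f"concept={c}": 1 for c in umls_concepts},
        **{f"predication={p}": 1 for p in predications},
        "has_population": 1 if population else 0,
        **{f"cue={w}": 1 for w in cue_words if w in tokens},
    }

features = sentence_features(
    "Aspirin reduces mortality in adults with suspected MI.",
    umls_concepts={"Aspirin", "Mortality"},
    predications={"Aspirin-TREATS-Myocardial_Infarction"},
    population="adults with suspected MI",
    cue_words={"recommend", "reduces"},
)
print(features)  # sparse dict ready for a downstream classifier
```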


Subjects
Clinical Decision Support Systems, Information Storage and Retrieval/methods, Supervised Machine Learning, Algorithms, Bayes Theorem, Humans, Language, MEDLINE, Natural Language Processing, Semantics, Terminology as Topic, Unified Medical Language System
12.
J Med Internet Res; 18(1): e11, 2016 Jan 13.
Article in English | MEDLINE | ID: mdl-26764193

ABSTRACT

BACKGROUND: An increasing number of people visit online health communities to seek health information. In these communities, people share experiences and information with others, often complemented with links to different websites. Understanding how people share websites can help us understand patients' needs in online health communities and improve how peer patients share health information online. OBJECTIVE: Our goal was to understand (1) what kinds of websites are shared, (2) information quality of the shared websites, (3) who shares websites, (4) community differences in website-sharing behavior, and (5) the contexts in which patients share websites. We aimed to find practical applications and implications of website-sharing practices in online health communities. METHODS: We used regular expressions to extract URLs from 10 WebMD online health communities. We then categorized the URLs based on their top-level domains. We counted the number of trust codes (eg, accredited agencies' formal evaluation and PubMed authors' institutions) for each website to assess information quality. We used descriptive statistics to determine website-sharing activities. To understand the context of the URL being discussed, we conducted a simple random selection of 5 threads that contained at least one post with URLs from each community. Gathering all other posts in these threads resulted in 387 posts for open coding analysis with the goal of understanding motivations and situations in which website sharing occurred. RESULTS: We extracted a total of 25,448 websites. The majority of the shared websites were .com (59.16%, 15,056/25,448) and WebMD internal (23.2%, 5905/25,448) websites; the least shared websites were social media websites (0.15%, 39/25,448). High-posting community members and moderators posted more websites with trust codes than low-posting community members did. The heart disease community had the highest percentage of websites containing trust codes compared to other communities. Members used websites to disseminate information, supportive evidence, resources for social support, and other ways to communicate. CONCLUSIONS: Online health communities can be used as important health care information resources for patients and caregivers. Our findings inform patients' health information-sharing activities. This information assists health care providers, informaticians, and online health information entrepreneurs and developers in helping patients and caregivers make informed choices.
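
URL extraction and top-level-domain categorization of this kind can be done with a regular expression plus the standard library's URL parser. An illustrative sketch, not the study's exact pipeline:

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)

def categorize_urls(posts: list) -> Counter:
    """Count shared URLs by top-level domain (e.g. 'com', 'gov', 'org')."""
    tlds = Counter()
    for post in posts:
        for url in URL_PATTERN.findall(post):
            host = urlparse(url).netloc.lower()
            tlds[host.rsplit(".", 1)[-1]] += 1
    return tlds

posts = [
    "See https://www.cdc.gov/heartdisease and http://example.com/forum",
    "WebMD thread: https://exchanges.webmd.com/heart-disease",
]
print(categorize_urls(posts))  # Counter({'com': 2, 'gov': 1})
```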


Subjects
Consumer Health Information, Internet, Social Support, Health Personnel, Humans, Internet/standards
13.
Int J Med Inform; 86: 126-34, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26612774

ABSTRACT

OBJECTIVE: To iteratively design a prototype of a computerized clinical knowledge summarization (CKS) tool aimed at helping clinicians find answers to their clinical questions, and to conduct a formative assessment of the usability, usefulness, efficiency, and impact of the CKS prototype on physicians' perceived decision quality compared with standard search of UpToDate and PubMed. MATERIALS AND METHODS: Mixed-methods observations of the interactions of 10 physicians with the CKS prototype vs. standard search in an effort to solve clinical problems posed as case vignettes. RESULTS: The CKS tool automatically summarizes patient-specific and actionable clinical recommendations from PubMed (high quality randomized controlled trials and systematic reviews) and UpToDate. Two-thirds of the study participants completed 15 of the 17 usability tasks. The median time to task completion was less than 10 seconds for 12 of the 17 tasks. The difference in search time between the CKS and standard search was not significant (median = 4.9 vs. 4.5 minutes). Physicians' perceived decision quality was significantly higher with the CKS than with manual search (mean = 16.6 vs. 14.4; p=0.036). CONCLUSIONS: The CKS prototype was well accepted by physicians in terms of both usability and usefulness. Physicians perceived better decision quality with the CKS prototype compared to standard search of PubMed and UpToDate within a similar search time. Due to the formative nature of this study and the small sample size, conclusions regarding efficiency and efficacy are exploratory.


Subjects
Clinical Decision Support Systems/statistics & numerical data, Knowledge Management/standards, Medical Record Linkage, Patient-Specific Computational Modeling, Humans, Automated Pattern Recognition, Problem Solving, Systems Integration
14.
AMIA Annu Symp Proc; 2016: 705-714, 2016.
Article in English | MEDLINE | ID: mdl-28269867

ABSTRACT

Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians.
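
Rule-based extraction of trial attributes can be sketched as reading named fields from a registry record. The tag names below follow the legacy ClinicalTrials.gov XML export and should be treated as an assumption (the current API serves a different, JSON-based schema):

```python
import xml.etree.ElementTree as ET

# A toy record; tag names assume the legacy ClinicalTrials.gov XML format.
RECORD = """<clinical_study>
  <brief_title>Example Trial of Drug X</brief_title>
  <overall_status>Completed</overall_status>
  <enrollment>250</enrollment>
  <condition>Heart Failure</condition>
</clinical_study>"""

def extract_attributes(xml_text, fields=("brief_title", "overall_status",
                                         "enrollment", "condition")):
    """Pull each named field's text, defaulting to an empty string."""
    root = ET.fromstring(xml_text)
    return {f: (root.findtext(f) or "").strip() for f in fields}

print(extract_attributes(RECORD))
```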


Subjects
Clinical Trials as Topic/statistics & numerical data, Factual Databases, Information Storage and Retrieval, PubMed, Algorithms, Clinical Trials as Topic/standards, Evidence-Based Medicine, Feasibility Studies, Humans, Information Storage and Retrieval/methods, Patient Care
15.
J Biomed Inform; 57: 436-45, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26363352

ABSTRACT

OBJECTIVE: Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. Even in the age of information technology, the literature search process is still conducted manually; it is therefore costly, slow, and subject to human error. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. METHODS: We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. RESULTS: Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with a small loss in precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and a machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). CONCLUSIONS: Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is a promising way to augment the traditional literature search.
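
Unsupervised query expansion of this kind can be sketched as building one Boolean OR-group per concept from a synonym table. The synonyms below are illustrative; the study derives its expansion terms automatically:

```python
# Illustrative synonym table; the study's expansions are derived automatically.
SYNONYMS = {
    "myocardial infarction": ["heart attack", "MI"],
    "aspirin": ["acetylsalicylic acid", "ASA"],
}

def expand_query(concepts: list) -> str:
    """AND together one OR-group per concept, PubMed-style."""
    groups = []
    for concept in concepts:
        terms = [concept] + SYNONYMS.get(concept, [])
        groups.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(groups)

print(expand_query(["myocardial infarction", "aspirin"]))
# One quoted OR-group per concept, joined with AND, ready for PubMed
```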


Subjects
Algorithms, Information Storage and Retrieval, MEDLINE, PubMed, Factual Databases, Humans, Machine Learning, Practice Guidelines as Topic, Search Engine
16.
J Biomed Inform; 58 Suppl: S120-S127, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26209007

ABSTRACT

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics, and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percentage point behind the top-performing system.


Subjects
Cardiovascular Diseases/epidemiology, Data Mining/methods, Diabetes Complications/epidemiology, Electronic Health Records/organization & administration, Narration, Natural Language Processing, Aged, Cardiovascular Diseases/diagnosis, Cohort Studies, Comorbidity, Computer Security, Confidentiality, Diabetes Complications/diagnosis, Female, Humans, Incidence, Longitudinal Studies, Male, Middle Aged, Automated Pattern Recognition/methods, Risk Assessment/methods, United Kingdom/epidemiology, Controlled Vocabulary
17.
Syst Rev; 4: 78, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26073888

ABSTRACT

BACKGROUND: Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods to automate data extraction for systematic reviews. METHODS: We systematically searched PubMed, IEEEXplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: (1) the methods or results section described what entities were or needed to be extracted, and (2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations of included reports. RESULTS: Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48%) of the data elements used in systematic reviews, there were attempts by various researchers to extract the information automatically from publication text. Of these, 14 (27%) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70%. CONCLUSIONS: We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1-7) of data elements. Biomedical natural language processing techniques have not yet been fully utilized to automate, even partially, the data extraction step of systematic reviews.


Subjects
Data Mining/methods, Publishing, Review Literature as Topic, Humans, Information Storage and Retrieval, Research Report
18.
AMIA Annu Symp Proc; 2015: 2015-24, 2015.
Article in English | MEDLINE | ID: mdl-26958301

ABSTRACT

OBJECTIVE: In a previous study, we investigated a sentence classification model that uses semantic features to extract clinically useful sentences from UpToDate, a synthesized clinical evidence resource. In the present study, we assess the generalizability of the sentence classifier to Medline abstracts. METHODS: We applied the classification model to an independent gold standard of high quality clinical studies from Medline. The classifier trained on UpToDate sentences was then optimized by retraining it with Medline abstracts and adding a sentence location feature. RESULTS: The previous classifier yielded an F-measure of 58% on Medline versus 67% on UpToDate. Retraining the classifier on Medline improved the F-measure to 68%, and to 76% (p<0.01) after adding the sentence location feature. CONCLUSIONS: The classifier's model and input features generalized to Medline abstracts, but the classifier needed to be retrained on Medline to achieve equivalent performance. Sentence location provided an additional contribution to overall classification performance.
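
The sentence location feature can be sketched as a normalized position appended to each sentence's representation (a minimal illustration, not the study's exact encoding):

```python
def add_location_feature(sentences: list) -> list:
    """Attach a normalized position in [0, 1]; abstracts tend to place
    background early and conclusions late, which a classifier can exploit."""
    n = len(sentences)
    return [
        {"text": s, "location": i / (n - 1) if n > 1 else 0.0}
        for i, s in enumerate(sentences)
    ]

abstract = [
    "Background: heart failure is common.",
    "We trained a classifier on labelled sentences.",
    "The model improved F-measure significantly.",
]
for feat in add_location_feature(abstract):
    print(feat)  # location 0.0, 0.5, 1.0 across the three sentences
```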


Subjects
MEDLINE, Semantics, Humans, Machine Learning
19.
J Biomed Inform; 52: 457-67, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25016293

ABSTRACT

OBJECTIVE: The amount of information available to clinicians and clinical researchers is growing exponentially. Text summarization reduces information in an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on summarization of textual documents in the biomedical domain. MATERIALS AND METHODS: MEDLINE (2000 to October 2013), the IEEE Digital Library, and the ACM Digital Library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was derived from the selected articles on five dimensions: input, purpose, output, method, and evaluation. RESULTS: Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique combining statistical, natural language processing, and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. DISCUSSION: This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. CONCLUSION: Recent research has focused on hybrid techniques combining statistical, natural language processing, and machine learning approaches. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.


Subjects
Artificial Intelligence, Information Storage and Retrieval/methods, Natural Language Processing, Abstracting and Indexing, Humans, MEDLINE
20.
Methods Mol Biol; 1159: 147-57, 2014.
Article in English | MEDLINE | ID: mdl-24788266

ABSTRACT

The combination of scientific knowledge and experience is key to success in biomedical research. This chapter demonstrates some of the strategies used to help identify key opinion leaders with the expertise you need, thereby supporting efforts to increase collaborative biomedical research.


Subjects
Biomedical Research, Expert Testimony, Natural Language Processing, Social Support