Results 1 - 4 of 4
1.
J Am Med Inform Assoc; 26(11): 1314-1322, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31294792

ABSTRACT

OBJECTIVE: Active learning (AL) attempts to reduce annotation cost (ie, time) by selecting the most informative examples for annotation. Most approaches tacitly (and unrealistically) assume that the cost of annotating each sample is identical. This study introduces a cost-aware AL method that simultaneously models the annotation cost and the informativeness of the samples, and evaluates it via both simulation and user studies. MATERIALS AND METHODS: We designed a novel, cost-aware AL algorithm (Cost-CAUSE) for annotating clinical named entities; we first utilized lexical and syntactic features to estimate annotation cost, then incorporated this cost measure into an existing AL algorithm. Using the 2010 i2b2/VA data set, we conducted a simulation study comparing Cost-CAUSE with non-cost-aware AL methods, and a user study comparing Cost-CAUSE with passive learning. RESULTS: Our cost model fit empirical annotation data well, and Cost-CAUSE increased the simulation area under the learning curve (ALC) scores by up to 5.6% and 4.9% compared with random sampling and alternate AL methods, respectively. Moreover, in a user annotation task, Cost-CAUSE outperformed passive learning on the ALC score and reduced annotation time by 20.5%-30.2%. DISCUSSION: Although AL has proven effective in simulations, our user study shows that a real-world environment is far more complex. Other factors have a noticeable effect on the AL method, such as users' annotation accuracy, fatigue, and even physical and mental condition. CONCLUSION: Cost-CAUSE saves significant annotation cost compared with random sampling.
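The core idea of cost-aware active learning described above can be sketched as ranking candidates by informativeness per unit of estimated annotation cost, rather than by informativeness alone. The function names and the length-based cost model below are illustrative assumptions, not the Cost-CAUSE implementation:

```python
# Hypothetical sketch of cost-aware sample selection: rank candidates by
# informativeness divided by estimated annotation cost, so that cheap,
# informative samples are annotated first. The toy cost model assumes
# annotation time grows with sentence length.

def estimate_cost(sentence_tokens):
    """Toy annotation-cost model: a base cost plus a per-token cost."""
    return 1.0 + 0.1 * len(sentence_tokens)

def select_batch(candidates, informativeness, batch_size):
    """Pick the samples with the best informativeness-to-cost ratio.

    candidates: list of token lists.
    informativeness: parallel list of scores (e.g., model uncertainty).
    Returns the indices of the selected samples.
    """
    scored = [
        (informativeness[i] / estimate_cost(tokens), i)
        for i, tokens in enumerate(candidates)
    ]
    scored.sort(reverse=True)  # highest value-per-cost first
    return [i for _, i in scored[:batch_size]]
```

A plain non-cost-aware selector would sort on `informativeness[i]` alone; the division by `estimate_cost` is the only change needed to prefer cheaper samples at equal informativeness.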


Subject(s)
Algorithms , Electronic Health Records/economics , Information Storage and Retrieval/economics , Natural Language Processing , Big Data , Computer Simulation , Humans , Models, Economic
2.
J Am Med Inform Assoc; 25(7): 913-918, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29701854

ABSTRACT

Objective: The Safety Assurance Factors for EHR Resilience (SAFER) guides were released in 2014 to help health systems conduct proactive risk assessments of electronic health record (EHR)-related safety policies, processes, procedures, and configurations. The extent to which SAFER recommendations are followed is unknown. Methods: We conducted risk assessments of 8 organizations of varying size, complexity, EHR, and EHR adoption maturity. Each organization self-assessed adherence to all 140 unique SAFER recommendations contained within 9 guides (range, 10-29 recommendations per guide). In each guide, recommendations were organized into 3 broad domains: "safe health IT" (45 recommendations total), "using health IT safely" (80 recommendations total), and "monitoring health IT" (15 recommendations total). Results: The 8 sites fully implemented 25 of 140 (18%) SAFER recommendations. The mean proportion of "fully implemented" recommendations per guide ranged from 94% (System Interfaces, 18 recommendations) to 63% (Clinical Communication, 12 recommendations). Adherence was higher for the "safe health IT" domain (82.1%) than for "using health IT safely" (72.5%) and "monitoring health IT" (67.3%). Conclusions: Despite the availability of recommendations on how to improve the use of EHRs, most recommendations were not fully implemented. New national policy initiatives are needed to stimulate implementation of these best practices.


Subject(s)
Electronic Health Records/standards , Guideline Adherence , Health Facility Administration/standards , Guidelines as Topic , Humans , Organizational Policy , Patient Safety/standards , Quality Assurance, Health Care , Risk Assessment , United States
3.
J Am Med Inform Assoc; 25(3): 300-308, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29346583

ABSTRACT

OBJECTIVE: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. MATERIALS AND METHODS: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, was developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. RESULTS AND CONCLUSION: Our manual review shows that the ingestion pipeline achieved an accuracy of 90% and that the core elements of DATS varied in frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
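The retrieval metrics quoted above have simple definitions: P@10 is the fraction of relevant results among the top 10, and average precision averages the precision observed at each relevant rank (the "inferred" variant used by DataMed estimates this from incomplete judgments; the sketch below shows only the standard form, with my own function names):

```python
# Illustrative computation of ranked-retrieval metrics. Input is a list
# of 0/1 relevance judgments in ranked order (index 0 = top result).

def precision_at_k(relevance, k=10):
    """Fraction of the top k results that are relevant."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Mean of precision values at each rank where a relevant hit occurs."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0
```

Averaging `average_precision` over a set of benchmark queries gives mean average precision, the query-set-level analogue of the single figure reported for DataMed.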

4.
J Am Med Inform Assoc; 25(3): 337-344, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29202203

ABSTRACT

OBJECTIVE: To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. MATERIALS AND METHODS: We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers' information and user interface needs. RESULTS: Biomedical researchers' information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. DISCUSSION: Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. CONCLUSION: While available data and researchers' information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers' information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs.
