Results 1 - 20 of 20
1.
Learn Health Syst ; 4(4): e10243, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33083542

ABSTRACT

OBJECTIVES: To develop and evaluate the classification accuracy of a computable phenotype for pediatric Crohn's disease using electronic health record data from PEDSnet, a large, multi-institutional research network and Learning Health System. STUDY DESIGN: Using clinician and informatician input, algorithms were developed from combinations of diagnostic and medication data drawn from the PEDSnet clinical dataset, which comprises 5.6 million children from eight U.S. academic children's health systems. Six test algorithms (four case, two non-case) that combined use of specific medications for Crohn's disease with the presence of a Crohn's diagnosis were initially tested against the entire PEDSnet dataset. From these, three were selected for performance assessment using manual chart review (primary case algorithm, n = 360; primary non-case algorithm, n = 360; alternative case algorithm, n = 80). Non-cases were patients having gastrointestinal diagnoses other than inflammatory bowel disease. Sensitivity, specificity, and positive predictive value (PPV) were assessed for the primary case and primary non-case algorithms. RESULTS: Of the six algorithms tested, the least restrictive, requiring ≥1 Crohn's diagnosis code, yielded 11 950 cases across PEDSnet (prevalence 21/10 000). The most restrictive, requiring ≥3 Crohn's disease diagnoses plus at least one medication, yielded 7868 patients (prevalence 14/10 000) and had the highest PPV (95%), with high sensitivity (91%) and specificity (94%). False positives were due primarily to a diagnosis reversal (from Crohn's disease to ulcerative colitis) or a diagnosis of "indeterminate colitis." False negatives were rare. CONCLUSIONS: Using diagnosis codes and medications available from PEDSnet, we developed a computable phenotype for pediatric Crohn's disease with high specificity, sensitivity, and predictive value. This process will be useful for developing computable phenotypes for other pediatric diseases, facilitating cohort identification for retrospective and prospective studies, and optimizing clinical care through the PEDSnet Learning Health System.
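
As a rough illustration of how such a rule-based phenotype could be applied to tabular EHR extracts, the sketch below encodes the most restrictive algorithm (≥3 Crohn's diagnosis codes plus ≥1 disease-specific medication); the column names and toy data are assumptions for illustration, not PEDSnet's actual schema.

```python
# Sketch: apply the most restrictive phenotype rule (>=3 Crohn's diagnosis
# codes plus >=1 Crohn's-specific medication) to tabular EHR extracts.
# Column names, codes, and the toy data layout are illustrative assumptions.
import pandas as pd

diagnoses = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 3, 3, 3],
    "dx_code":    ["K50.0"] * 7,          # Crohn's disease ICD-10 family
})
medications = pd.DataFrame({
    "patient_id": [1, 3],
    "drug":       ["infliximab", "adalimumab"],  # CD-specific biologics
})

dx_counts = diagnoses.groupby("patient_id").size()
has_med = medications["patient_id"].unique()

# Case definition: >=3 diagnosis codes AND at least one qualifying medication.
cases = dx_counts[dx_counts >= 3].index.intersection(has_med)
print(sorted(cases))  # -> [1, 3]
```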

2.
EGEMS (Wash DC) ; 7(1): 36, 2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31531382

ABSTRACT

BACKGROUND: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation in building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining assessment coverage for big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs. IMPLEMENTATION: Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resulting toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile them against a history of known issues, and a GitHub-based reporting mechanism for organized tracking. RESULTS: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (mean 15 days, IQR 24 days) is driven by the underlying cause of the issue, the perceived importance of the domain, and the complexity of assessment. CONCLUSIONS: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive, and sufficient resources should be dedicated to investigating problems and optimizing data for research.
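
The abstract does not specify what form the 685 checks take; as a hypothetical illustration, a check can be written as a function that inspects a table and emits a structured issue record suitable for filing in a GitHub-style tracker.

```python
# Hypothetical shape of a single data quality check: inspect a table and
# emit a structured issue record that can be filed in an issue tracker.
# The check name, table layout, and record fields are assumptions.
import pandas as pd

def check_future_birth_dates(person: pd.DataFrame) -> dict | None:
    """Flag rows whose birth date lies in the future (an implausible value)."""
    bad = person[person["birth_date"] > pd.Timestamp.today()]
    if bad.empty:
        return None
    return {
        "check": "plausibility.birth_date_in_future",
        "table": "person",
        "rows_affected": len(bad),
        "examples": bad["person_id"].head(5).tolist(),
    }

person = pd.DataFrame({
    "person_id": [1, 2],
    "birth_date": pd.to_datetime(["2010-01-01", "2999-01-01"]),
})
print(check_future_birth_dates(person))
```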

3.
EGEMS (Wash DC) ; 7(1): 17, 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31065558

ABSTRACT

INTRODUCTION: Existing data quality (DQ) checks are currently represented in heterogeneous formats, making it difficult to compare, categorize, and index checks. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks and explores the feasibility of leveraging natural language processing (NLP) for scalable acquisition of knowledge of common data elements and functions from DQ check narratives. METHODS: The model defines a "data element", the primary focus of the check, and a "function", the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente's Center for Effectiveness and Safety Research (CESR). RESULTS: The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The top five data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114). CONCLUSIONS: This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates early promise of NLP-assisted knowledge acquisition, and reveals great heterogeneity in the focus of DQ checks, confirming variation between intrinsic checks and use-case-specific "fitness-for-use" checks.
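
A toy sketch of the data element-function model in action; the keyword lexicons below are invented stand-ins for the paper's actual NLP techniques.

```python
# Toy sketch of the data element-function model: map a free-text DQ check
# description to a (data element, function) pair using keyword rules.
# The vocabularies below are illustrative, not the paper's actual lexicons.
ELEMENTS = {"person": "Person", "medication": "Medication", "condition": "Condition"}
FUNCTIONS = {"count": "Count", "missing": "Missing", "distribution": "Distribution"}

def classify_check(narrative: str) -> tuple[str, str]:
    text = narrative.lower()
    element = next((v for k, v in ELEMENTS.items() if k in text), "Unknown")
    function = next((v for k, v in FUNCTIONS.items() if k in text), "Unknown")
    return element, function

print(classify_check("Count of medication records per year"))
# -> ('Medication', 'Count')
```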

4.
AMIA Jt Summits Transl Sci Proc ; 2017: 113-121, 2018.
Article in English | MEDLINE | ID: mdl-29888053

ABSTRACT

Clinical data research networks (CDRNs) invest substantially in identifying and investigating data quality problems. While identification is largely automated, investigation and resolution are carried out manually at individual institutions. In the PEDSnet CDRN, we found that only approximately 35% of identified data quality issues are resolvable, as they are caused by errors in the extract-transform-load (ETL) code. Nonetheless, with no prior knowledge of issue causes, partner institutions end up spending significant time investigating issues that represent either inherent data characteristics or false alarms. This work investigates whether the cause (ETL, Characteristic, or False alarm) can be predicted before time is spent investigating an issue. We trained a classifier on the metadata from 10,281 real-world data quality issues and achieved a cause-prediction F1-measure of up to 90%. While initially tested on PEDSnet, the proposed methodology is applicable to other CDRNs facing similar bottlenecks in handling data quality results.
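
A minimal sketch of the cause-prediction idea using an off-the-shelf text classifier; the issue descriptions, labels, and model choice are illustrative assumptions, not the paper's actual features or classifier.

```python
# Sketch of cause prediction for DQ issues (ETL / Characteristic / False
# alarm) from issue metadata, using a bag-of-words baseline. Toy data;
# the real model was trained on 10,281 issues, per the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

issues = [
    "missing values in drug_exposure end dates",
    "outlier counts in measurement table after refresh",
    "duplicate person_id rows introduced by mapping script",
    "seasonal dip in visit counts every December",
]
causes = ["ETL", "False alarm", "ETL", "Characteristic"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(issues, causes)
print(model.predict(["end dates missing after mapping script change"]))
```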

5.
Pediatr Dent ; 40(2): 131-135, 2018 Mar 15.
Article in English | MEDLINE | ID: mdl-29663914

ABSTRACT

PURPOSE: The purpose of this study was to assess whether there is an association between oral thrush or other Candida-related conditions in infancy and early childhood caries (ECC) diagnosed by pediatricians. METHODS: We conducted a retrospective cohort study using electronic health records from six national children's hospitals that participate in the PEDSnet research network. There were 1,012,668 children with a visit at ages one to 12 months and another visit at ages 13 to 71 months. The independent variables were diagnoses of thrush or Candida-related conditions in the first year of life; the dependent variable was a diagnosis of ECC between 13 and 71 months of age. RESULTS: Oral thrush detection was strongly associated with ECC, particularly between 13 and 36 months (rate ratios between 2.7 [95% confidence interval (CI) 2.5 to 2.9; P<.001] and 3.0 [95% CI 2.8 to 3.4; P<.001]). A similar trend was observed for other Candida-related conditions. CONCLUSIONS: Oral thrush may be a risk factor for early childhood caries.
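
For readers who want the arithmetic behind such estimates, the sketch below computes a rate ratio with a 95% CI on the log scale; the counts are invented, and only the quoted estimates reflect the study.

```python
# Sketch: rate ratio of ECC in exposed (thrush) vs. unexposed children,
# with a 95% CI computed on the log scale (Poisson-style SE = sqrt(1/a+1/b)).
# Counts below are invented; the study's actual estimates are those quoted.
import math

cases_exp, n_exp = 300, 10_000      # ECC cases among children with thrush
cases_unexp, n_unexp = 100, 10_000  # ECC cases among children without

rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
se_log_rr = math.sqrt(1 / cases_exp + 1 / cases_unexp)
lo, hi = (rr * math.exp(s * 1.96 * se_log_rr) for s in (-1, 1))
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # RR = 3.00 (2.39 to 3.76)
```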


Subject(s)
Candidiasis, Oral/complications; Dental Caries/etiology; Child, Preschool; Female; Humans; Infant; Male; Retrospective Studies; Risk Factors
6.
J Am Med Inform Assoc ; 24(6): 1072-1079, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28398525

ABSTRACT

OBJECTIVE: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet. MATERIALS AND METHODS: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed programmatic data quality issues and conducted discussions with the data partners' extract-transform-load analysts to determine the cause of each issue. RESULTS: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle include annotations for 850 issues: the most frequent types, including missing data (>300) and outliers (>100); the most complex domains, including medications (>160) and lab measurements (>140); and the primary causes, including source data characteristics (83%) and extract-transform-load errors (9%). DISCUSSION: The longitudinal findings trace the network's evolution from identifying difficulties in aligning data to a common data model, to learning the norms of clinical pediatrics and determining research capability. CONCLUSION: While data quality is recognized as a critical aspect of establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.


Subject(s)
Biomedical Research; Data Accuracy; Datasets as Topic/standards; Electronic Health Records/standards; Hospitals, Pediatric; Longitudinal Studies
7.
EGEMS (Wash DC) ; 5(1): 8, 2017 Jun 12.
Article in English | MEDLINE | ID: mdl-29881733

ABSTRACT

OBJECTIVE: To compare rule-based data quality (DQ) assessment approaches across multiple national clinical data sharing organizations. METHODS: Six organizations with established data quality assessment (DQA) programs provided documentation or source code describing their current DQ checks. DQ checks were mapped to the categories within the data verification context of the harmonized DQA terminology. To ensure all DQ checks were consistently mapped, conventions were developed and four iterations of mapping were performed. Difficult-to-map DQ checks were discussed with research team members until consensus was achieved. RESULTS: Participating organizations provided 11,026 DQ checks, of which 99.97 percent were successfully mapped to a DQA category. Of the mapped DQ checks (N=11,023), 214 (1.94 percent) mapped to multiple DQA categories. The majority of DQ checks mapped to the Atemporal Plausibility (49.60 percent), Value Conformance (17.84 percent), and Atemporal Completeness (12.98 percent) categories. DISCUSSION: Using the common DQA terminology, near-complete (99.97 percent) coverage across a wide range of DQA programs and specifications was reached. Comparing the distributions of mapped DQ checks revealed important differences between participating organizations. This variation may be related to an organization's stakeholder requirements, primary analytical focus, or the maturity of its DQA program. Although outside the scope of this study, mapping checks within the data validation context of the terminology may provide additional insights into differences in DQA practice. CONCLUSION: A common DQA terminology provides a means to help organizations and researchers understand the coverage of their current DQA efforts and highlights potential areas for additional DQA development. Sharing DQ checks between organizations could help expand the scope of DQA across clinical data networks.
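
The core analysis, comparing distributions of mapped checks across organizations, reduces to a tally; a minimal sketch with invented category assignments.

```python
# Sketch: tally DQ checks mapped to harmonized DQA categories and report
# the distribution, mirroring the percentages quoted above. The category
# assignments below are invented placeholders, not the study's data.
from collections import Counter

mapped = (
    ["Atemporal Plausibility"] * 5
    + ["Value Conformance"] * 2
    + ["Atemporal Completeness"] * 1
)
counts = Counter(mapped)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} ({100 * n / total:.1f}%)")
```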

8.
EGEMS (Wash DC) ; 4(1): 1239, 2016.
Article in English | MEDLINE | ID: mdl-28154833

ABSTRACT

INTRODUCTION: Data quality and fitness for analysis are crucial if outputs of analyses of electronic health record data or administrative claims data are to be trusted by the public and the research community. METHODS: We describe a data quality analysis tool (called Achilles Heel) developed by the Observational Health Data Sciences and Informatics (OHDSI) collaborative and compare outputs from this tool as applied to 24 large healthcare datasets across seven different organizations. RESULTS: We highlight 12 data quality rules that identified issues in at least 10 of the 24 datasets and provide the full set of 71 rules that identified issues in at least one dataset. Achilles Heel is freely available software that provides a useful starter set of data quality rules, with the ability to add additional rules. We also present the results of a structured email-based interview of all participating sites that collected qualitative comments about the value of Achilles Heel for data quality evaluation. DISCUSSION: Our analysis represents the first comparison of outputs from a data quality tool that implements a fixed (but extensible) set of data quality rules. Thanks to a common data model, we were able to quickly compare multiple datasets originating from several countries in the Americas, Europe, and Asia.
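
A minimal sketch of the cross-dataset comparison described above, finding rules that fire in many datasets; the rule IDs and fire matrix are invented.

```python
# Sketch of the cross-dataset comparison: given which data quality rules
# fired in which datasets, find rules flagged in many datasets. Rule IDs
# and the fire matrix are invented for illustration.
from collections import defaultdict

# dataset name -> set of rule IDs that fired
fires = {
    "site_a": {1, 4, 7},
    "site_b": {1, 7},
    "site_c": {4, 7, 12},
    # ... one entry per dataset; 24 in the actual study
}

rule_hits = defaultdict(set)
for dataset, rules in fires.items():
    for rule in rules:
        rule_hits[rule].add(dataset)

# The study used a threshold of >=10 of 24 datasets; >=2 fits the toy data.
widespread = {r: ds for r, ds in rule_hits.items() if len(ds) >= 2}
print(widespread)
```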

9.
Brief Bioinform ; 17(1): 23-32, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25888696

ABSTRACT

The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.


Subject(s)
Crowdsourcing/trends; Computational Biology/trends; Data Mining; Humans; Internet; Search Engine; Smartphone; Social Media; Video Games
10.
J Biomed Inform ; 57: 28-37, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26187250

ABSTRACT

BACKGROUND: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated reduced performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives compared with biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions. METHODS: We use closure properties to compare the richness of the vocabulary in clinical narrative text to that of biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data. RESULTS: We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders. We apply our system, DNorm-C, to locate and normalize disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, f-score=0.753. For the normalization task (strict span+concept) it achieves precision=0.712, recall=0.637, f-score=0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high-recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision. DISCUSSION: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, in addition to mentions that the annotators could not identify within the scope of the controlled vocabulary. CONCLUSION: Disorder mentions in clinical narrative text use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narratives. We show that pairwise learning to rank offers high performance in this context, and we introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high-performing, open-source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable to a wide variety of domains and entities. (DNorm-C is open-source software and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.)
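
A toy sketch of pairwise learning to rank for normalization in the spirit of the abstract (not DNorm-C's actual implementation): score mention-concept pairs with a bilinear similarity and update the weights when a wrong candidate violates a ranking margin.

```python
# Sketch: pairwise learning-to-rank normalization. Score a mention against
# candidate vocabulary terms with a bilinear similarity over token vectors,
# and nudge the weights whenever a wrong candidate comes too close to the
# correct one. Vectors, margin, and learning rate are toy-scale assumptions.
import numpy as np

dim = 4          # toy vocabulary of 4 token features
W = np.eye(dim)  # similarity weights, initialized to identity

def score(mention: np.ndarray, concept: np.ndarray) -> float:
    return float(mention @ W @ concept)

# One pairwise update: the correct concept should outrank the wrong one.
mention = np.array([1.0, 1.0, 0.0, 0.0])
correct = np.array([1.0, 0.0, 1.0, 0.0])
wrong   = np.array([0.0, 1.0, 0.0, 1.0])

lr = 0.1
if score(mention, correct) <= score(mention, wrong) + 1.0:  # margin violated
    W += lr * (np.outer(mention, correct) - np.outer(mention, wrong))

print(score(mention, correct) > score(mention, wrong))  # True after update
```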


Subject(s)
Electronic Health Records; Machine Learning; Natural Language Processing; Vocabulary, Controlled; Humans; Narration; Software
11.
Article in English | MEDLINE | ID: mdl-25797061

ABSTRACT

Motivated by the high cost of human curation of biological databases, there is increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Toward the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we investigate the feasibility of scaling drug indication annotation through crowdsourcing, in which a network of unknown workers is recruited through Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for average workers on MTurk, we first simplify the complex task so that each HIT involves only a binary judgment of whether a highlighted disease, in the context of a given drug label, is an indication. In addition, this study is novel in its crowdsourcing interface design, which encodes the annotation guidelines into the user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time savings, but also achieves accuracy comparable to that of domain experts.
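
A minimal sketch of the evaluation step described above: aggregate binary worker judgments by majority vote and score them against control HITs with known gold answers. All judgments below are invented.

```python
# Sketch: aggregate binary worker judgments per HIT by majority vote and
# measure accuracy on control HITs with known gold answers. The judgments
# and HIT IDs below are invented for illustration.
from collections import Counter

judgments = {  # HIT id -> worker answers (True = "is an indication")
    "hit_1": [True, True, False],
    "hit_2": [False, False, True],
}
gold = {"hit_1": True, "hit_2": False}  # control HITs

def majority(votes: list[bool]) -> bool:
    return Counter(votes).most_common(1)[0][0]

aggregated = {h: majority(v) for h, v in judgments.items()}
accuracy = sum(aggregated[h] == g for h, g in gold.items()) / len(gold)
print(aggregated, f"accuracy on controls = {accuracy:.0%}")
```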


Subject(s)
Crowdsourcing; Data Curation/methods; Databases, Pharmaceutical; Pharmaceutical Preparations; Drug Labeling; Humans; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/classification
12.
Article in English | MEDLINE | ID: mdl-25246425

ABSTRACT

BACKGROUND: This article describes the capture of biological information using a hybrid approach that combines natural language processing to extract biological entities with crowdsourcing, using annotators recruited via Amazon Mechanical Turk to judge the correctness of candidate biological relations. These techniques were applied to extract gene-mutation relations from biomedical abstracts, with the goal of supporting production-scale capture of gene-mutation-disease findings as an open source resource for personalized medicine. RESULTS: The hybrid system could be configured to provide good performance for gene-mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene-mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against parts of the gold standard hidden from annotator view (e.g., an incorrect EntrezGene identifier or an incorrect extracted mutation position) or to incomplete task instructions (e.g., the need to exclude nonhuman mutations). CONCLUSIONS: The hybrid curation model provides a readily scalable, cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software. DATABASE URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated.
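
A minimal sketch of the crowd-filtering step in such a hybrid pipeline: keep NLP-extracted candidates whose aggregated worker vote clears a threshold. The candidates, vote shares, and threshold are invented assumptions.

```python
# Sketch: the crowd-filtering step of a hybrid NLP + crowdsourcing pipeline.
# Keep NLP-extracted gene-mutation candidates whose aggregated worker vote
# share clears a threshold. Candidates and shares are invented.
candidates = [  # (gene, mutation, fraction of workers judging "correct")
    ("BRCA1", "c.68_69delAG", 0.9),
    ("TP53", "R175H", 0.4),
    ("EGFR", "L858R", 1.0),
]
THRESHOLD = 0.5  # assumption; the paper does not state its cutoff here
accepted = [(g, m) for g, m, share in candidates if share >= THRESHOLD]
print(accepted)  # [('BRCA1', 'c.68_69delAG'), ('EGFR', 'L858R')]
```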


Subject(s)
Crowdsourcing/methods; Data Curation/methods; Genetic Predisposition to Disease; Information Storage and Retrieval/methods; Mutation/genetics; Natural Language Processing; Computational Biology/methods; Crowdsourcing/economics; Data Curation/economics; Databases, Genetic; Genomics; Humans
13.
J Biomed Inform ; 52: 448-56, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25220766

ABSTRACT

Drug-disease treatment relationships, i.e., which drug(s) are indicated to treat which disease(s), are among the most frequently sought types of information in PubMed®. Such information is useful for feeding the Google Knowledge Graph, designing computational methods to predict novel drug indications, and validating clinical information in EMRs. Given the importance and utility of this information, there have been several efforts to create repositories of drugs and their indications. However, existing resources are incomplete. Furthermore, they neither label indications in a structured way nor differentiate them by drug-specific properties such as dosage form, and thus do not support computer processing or semantic interoperability. More recently, several studies have proposed automatic methods to extract structured indications from drug descriptions; however, their performance is limited by natural language challenges in disease named entity recognition and indication selection. In response, we report LabeledIn: a human-reviewed, machine-readable and source-linked catalog of labeled indications for human drugs. More specifically, we describe our semi-automatic approach to deriving LabeledIn from drug descriptions through human annotations aided by automatic methods. As the data source, we use the drug labels (or package inserts) submitted to the FDA by drug manufacturers and made available in DailyMed. Our machine-assisted human annotation workflow comprises: (i) a grouping method to remove redundancy and identify representative drug labels to be used for human annotation, (ii) an automatic method to recognize and normalize mentions of diseases in drug labels as candidate indications, and (iii) a two-round annotation workflow in which human experts judge the pre-computed candidates and deliver the final gold standard. In this study, we focused on 250 highly accessed drugs in PubMed Health, a newly developed public web resource for consumers and clinicians on prevention and treatment of diseases. These 250 drugs corresponded to more than 8000 drug labels (500 unique) in DailyMed, in which 2950 candidate indications were pre-tagged by an automatic tool. After being reviewed independently by two experts, 1618 indications were selected, and an additional 97 (missed by the computer) were manually added, with an inter-annotator agreement of 88.35% as measured by the Kappa coefficient. Our final annotation results in LabeledIn consist of 7805 drug-disease treatment relationships in which drugs are represented as a triplet of ingredient, dose form, and strength. A systematic comparison of LabeledIn with an existing computer-derived resource revealed significant discrepancies, confirming the need to involve humans in the creation of such a resource. In addition, LabeledIn is unique in that it contains the detailed textual context of the selected indications in drug labels, making it suitable for the development of advanced computational methods for the automatic extraction of indications from free text. Finally, motivated by studies on drug nomenclature and medication errors in EMRs, we adopted a fine-grained drug representation scheme, which enables the automatic identification of drugs with indications specific to certain dose forms or strengths. Future work includes expanding our coverage to more drugs and integration with other resources. The LabeledIn dataset and the annotation guidelines are available at http://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/.
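
The agreement statistic quoted above can be computed as Cohen's kappa; a self-contained sketch with invented labels.

```python
# Sketch: Cohen's kappa for two annotators' indication judgments, the
# agreement statistic quoted above (88.35%). Labels below are invented.
def cohens_kappa(a: list[str], b: list[str]) -> float:
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement from each annotator's marginal label frequencies
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)

ann1 = ["ind", "ind", "not", "ind", "not"]
ann2 = ["ind", "not", "not", "ind", "not"]
print(f"kappa = {cohens_kappa(ann1, ann2):.3f}")  # kappa = 0.615
```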


Subject(s)
Drug Labeling/methods; Natural Language Processing; Documentation; Drug Therapy/classification; Humans; Software
14.
Article in English | MEDLINE | ID: mdl-24980129

ABSTRACT

BioC is a new, simple XML format for sharing biomedical text and annotations, together with libraries to read and write that format. It promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format. These contributions included additional implementations of BioC, many new corpora in the format, biomedical NLP tools that consume and produce the format, and online services using the format. The ease of use, broad support, and rapidly growing number of tools demonstrate the need for and value of the BioC format. Database URL: http://bioc.sourceforge.net/.
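
A minimal sketch of a BioC-style document built with Python's standard library; the element names follow the collection/document/passage/annotation structure the format defines, but treat the exact fields as an approximation rather than a schema-complete example.

```python
# Sketch: build a minimal BioC-style XML document with the standard library.
# The element and attribute names approximate the BioC structure; consult
# the schema at bioc.sourceforge.net for the authoritative definition.
import xml.etree.ElementTree as ET

collection = ET.Element("collection")
ET.SubElement(collection, "source").text = "example"
doc = ET.SubElement(collection, "document")
ET.SubElement(doc, "id").text = "PMC000001"
passage = ET.SubElement(doc, "passage")
ET.SubElement(passage, "offset").text = "0"
ET.SubElement(passage, "text").text = "BRCA1 mutations raise cancer risk."
ann = ET.SubElement(passage, "annotation", id="T1")
ET.SubElement(ann, "infon", key="type").text = "Gene"
ET.SubElement(ann, "location", offset="0", length="5")
ET.SubElement(ann, "text").text = "BRCA1"

print(ET.tostring(collection, encoding="unicode"))
```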


Subject(s)
Computational Biology; Data Mining; Natural Language Processing; Software; Biomedical Research; Database Management Systems; Databases, Factual; Internet
15.
Article in English | MEDLINE | ID: mdl-25062914

ABSTRACT

The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to the heterogeneity and variety of data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, along with easy-to-use functions to read and write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and we repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool was designed to process input articles and output annotations in a different format. We modified these tools to read and write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC-wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be integrated more efficiently with each other as well as with external tools: our experimental results show that using BioC reduces the lines of code required for text-mining tool integration by more than 60%. The tmBioC toolkit is publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/.


Subject(s)
Computational Biology/methods; Data Curation/methods; Data Mining; Internet; Software
16.
Methods Mol Biol ; 1159: 11-31, 2014.
Article in English | MEDLINE | ID: mdl-24788259

ABSTRACT

Biomedical and life sciences literature is unique because of its exponentially increasing volume and interdisciplinary nature. Biomedical literature access is essential for several types of users, including biomedical researchers, clinicians, database curators, and bibliometricians. In the past few decades, several online search tools and literature archives, both generic and biomedicine-specific, have been developed. We present this chapter in light of three consecutive steps of literature access: searching for citations, retrieving full text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools users turn to most frequently, such as PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study of biomedical literature archives such as PubMed Central. The next section describes current research and state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas of text and data mining: text similarity search, semantic search, query support, relevance ranking, and result clustering. Finally, the last section describes some predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices and adoption of open access policies.


Subject(s)
Data Mining/methods; Periodicals as Topic; PubMed
17.
AMIA Annu Symp Proc ; 2014: 787-94, 2014.
Article in English | MEDLINE | ID: mdl-25954385

ABSTRACT

Extracting computable indications, i.e. drug-disease treatment relationships, from narrative drug resources is key to building a gold-standard drug indication repository. The two steps of the extraction problem are disease named-entity recognition (NER), to identify disease mentions in a free-text description, and disease classification, to distinguish indications from other disease mentions in the description. While many tools exist for disease NER, disease classification is mostly achieved through human annotation. For example, we recently resorted to human annotation to prepare a corpus, LabeledIn, capturing structured indications from the drug labels submitted to the FDA by pharmaceutical companies. In this study, we present an automatic end-to-end framework to extract structured and normalized indications from FDA drug labels. In addition to automatic disease NER, a key component of our framework is a machine learning method, trained on the LabeledIn corpus, that classifies the NER-computed disease mentions as "indication vs. non-indication." In experiments with 500 drug labels, our end-to-end system delivered an 86.3% F1-measure in drug indication extraction, a 17% improvement over the baseline. Further analysis shows that the indication classifier delivers performance comparable to that of human experts and that the remaining errors are mostly due to disease NER (more than 50%). Given its performance, we conclude that our end-to-end approach has the potential to significantly reduce human annotation costs.
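
The quoted F1-measure follows from set overlap between predicted and gold indications; a minimal sketch with invented extractions.

```python
# Sketch: precision/recall/F1 for end-to-end indication extraction, the
# metric quoted above (86.3% F1). Predicted and gold sets are invented.
def prf1(pred: set, gold: set) -> tuple[float, float, float]:
    tp = len(pred & gold)  # true positives: extractions matching the gold set
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if tp else 0.0
    return precision, recall, f1

pred = {("drug_A", "hypertension"), ("drug_A", "migraine")}
gold = {("drug_A", "hypertension"), ("drug_A", "angina")}
print("P=%.2f R=%.2f F1=%.2f" % prf1(pred, gold))  # P=0.50 R=0.50 F1=0.50
```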


Subject(s)
Artificial Intelligence; Disease/classification; Drug Labeling; Information Storage and Retrieval/methods; Humans; United States; United States Food and Drug Administration
18.
Dig Surg ; 22(6): 446-51; discussion 452, 2005.
Article in English | MEDLINE | ID: mdl-16479114

ABSTRACT

BACKGROUND/AIMS: External pancreatic fistula (EPF) is a common sequela of surgical or percutaneous intervention for infective complications of acute severe pancreatitis. The present study aimed to examine the clinical profile, course, and outcome of patients with EPF following surgical or percutaneous management of these infective complications. METHODS: We retrospectively analyzed the clinical data, recorded in a prospective database, of patients with EPF following intervention (surgical or percutaneous) for acute severe pancreatitis managed between January 1989 and April 2002. Univariate analysis of various factors (etiology, imaging findings prior to intervention, fistula characteristics, and management) that could predict early closure of the fistula was performed. RESULTS: Of 210 patients with acute severe pancreatitis, 43 (20%) developed EPF (mean age 38 [range 16-78] years, M:F ratio 5:1) following intervention for infected pancreatic necrosis (n=23) or pancreatic abscess (n=20) and constituted the study group. The fistula output was categorized as low (<200 ml), moderate (200-500 ml), or high (>500 ml) in 29 (67%), 11 (26%), and 3 (7%) patients, respectively. Fifteen patients (35%) had morbidity in the form of abscess (n=5), bleeding (n=1), pseudoaneurysm (n=2), or fever with no other focus of infection (n=7). Spontaneous closure of the fistula occurred in 38 (88%) patients. The average time to closure was 109 ± 26 (median 70) days. The fistula closed after intervention in 5 patients (2 after endoscopic papillotomy, 1 after fistulojejunostomy, and 2 after downsizing the drains). Of the 38 patients with spontaneous closure, 9 (24%) developed a pseudocyst after a mean interval of 123 days, of whom 7 underwent surgical drainage of the cyst. Univariate analysis failed to identify any factors that could predict early closure of the fistula. CONCLUSIONS: EPF is a common sequela of intervention in acute severe pancreatitis. The majority are low-output fistulae and close spontaneously with conservative management. One-fourth of patients with spontaneous closure develop a pseudocyst as a sequela, requiring surgical management.


Subject(s)
Cutaneous Fistula/etiology; Pancreatic Fistula/etiology; Pancreatitis, Acute Necrotizing/therapy; Adolescent; Adult; Aged; Cutaneous Fistula/surgery; Female; Humans; Male; Middle Aged; Pancreatic Fistula/surgery; Pancreatic Pseudocyst/etiology; Pancreatitis, Acute Necrotizing/surgery; Postoperative Complications; Retrospective Studies; Time Factors
19.
J Gastroenterol Hepatol ; 20(1): 56-61, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15610447

ABSTRACT

BACKGROUND: Patients with long-standing extrahepatic portal venous obstruction (EHPVO) develop extensive collaterals in the hepatoduodenal ligament as a result of enlargement of the periportal veins. These patients are also prone to develop obstructive jaundice as a result of strictures and/or choledocholithiasis. Surgical management of obstructive jaundice in such patients is difficult in the presence of these collaterals. AIM: To review the approach to management of patients with EHPVO and obstructive jaundice. METHODS: Retrospective review of patients with EHPVO and obstructive jaundice requiring surgical and/or endoscopic management between 1992 and 2002. RESULTS: Thirteen patients (nine males, aged 12-50 years) with EHPVO and obstructive jaundice were evaluated. No patient had underlying cirrhosis or hepatocellular carcinoma. Five patients (group A) had biliary stricture; three (group B) had choledocholithiasis; and five (group C) had biliary stricture with choledocholithiasis. Primary surgical management was performed in group A (portosystemic shunt in four, with strictures resolving in three; hepaticojejunostomy in one). In group B (n = 3), endoscopic stone extraction was successful in two patients. One patient underwent a staged procedure (portosystemic shunt followed by biliary surgery). In group C, initial endoscopic management failed in the four patients in whom it was attempted. All five patients thereafter underwent surgery (staged procedure, one; choledochoduodenostomy, one; devascularization, one; abandoned, two). Repeat postoperative endoscopic management was successful in two of the group C patients. Overall (groups B and C), massive intraoperative hemorrhage occurred in three patients (one died), and postoperative hemorrhage occurred in one patient. CONCLUSION: In patients with EHPVO and obstructive jaundice, primary biliary tract surgery has significant morbidity and mortality. Endoscopic management should be the preferred modality. In patients in whom endoscopy fails, a staged procedure (portosystemic shunt followed by biliary surgery) should be preferred. Strictures alone may resolve after a portosystemic shunt. Endoscopic stenting may be required as an adjunct.


Subject(s)
Jaundice, Obstructive/therapy; Portal Vein; Adolescent; Adult; Child; Endoscopy; Female; Humans; Male; Middle Aged; Retrospective Studies; Vascular Diseases/therapy
20.
J Hepatobiliary Pancreat Surg ; 11(1): 40-4, 2004.
Article in English | MEDLINE | ID: mdl-15754045

ABSTRACT

BACKGROUND/PURPOSE: Laparoscopic cholecystectomy is the procedure of choice for patients with symptomatic cholelithiasis. The procedure is contraindicated in patients with gallbladder cancer (GBC) because of fear of dissemination of the disease. One of the findings raising suspicion of GBC is a thick-walled gallbladder (TWGB). METHODS: A prospective study of patients with TWGB was conducted over a period of 10 months at a tertiary-level referral hospital in northern India. We studied the clinical profiles, investigations (ultrasound [US] and computed tomography [CT]), and management plans in these patients. RESULTS: A total of 60 patients were included in the study. After cholecystectomy, histopathology of the gallbladders showed GBC in 2 (3.3%) patients. The remaining 58 patients had chronic cholecystitis, of whom 28 (48%) had the xanthogranulomatous variant. Cholecystectomy was attempted laparoscopically in 46 (77%) patients and by the open technique in the remaining 14 (23%). Laparoscopic cholecystectomy was successful in 40 of the 46 (87%) patients in whom it was attempted; obscure anatomy, suspicion of GBC, and bile duct injury were the causes of conversion in the remaining 13% (6/46). None of the 11 patients who had a CT examination because of clinical or US suspicion of malignancy turned out to have GBC at final histology. Both cases of GBC in this study were incidental findings on final histopathology. CONCLUSIONS: With appropriate selection, laparoscopic cholecystectomy can be performed successfully in the majority of patients with diffuse TWGB. There is, however, an increased chance of conversion to open cholecystectomy in these patients. If there is intraoperative suspicion of GBC, early conversion to open cholecystectomy and frozen section/imprint cytology will help guide further treatment during surgery.


Subject(s)
Cholecystectomy, Laparoscopic; Gallbladder/pathology; Adolescent; Adult; Aged; Aged, 80 and over; Cholecystectomy; Cholelithiasis/surgery; Contraindications; Frozen Sections; Gallbladder/diagnostic imaging; Gallbladder Neoplasms/surgery; Humans; Male; Middle Aged; Prospective Studies; Ultrasonography