Results 1 - 6 of 6
1.
J Biomed Inform ; 63: 295-306, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27597572

ABSTRACT

In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which adapts dynamically to each partition. We evaluate our approach on two different types of textual corpora: clinical trial descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method is superior to existing dynamic pruning methods and to state-of-the-art taxonomy learning methods. It yields higher concept coverage (95.75%) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).
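The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: a word-overlap (Jaccard) distance stands in for their syntactic matching and semantic relatedness functions, and a fixed distance cutoff stands in for their dynamic dendrogram pruning. The function names, the example terms, and the cutoff value are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Word-overlap distance; an assumed stand-in for syntactic matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(terms, cutoff):
    """Merge the closest pair of clusters until the closest pair is
    farther apart than `cutoff` (a fixed cut, not dynamic pruning)."""
    clusters = [[t] for t in terms]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical extracted concepts, chosen only for illustration.
terms = [
    "heart failure",
    "chronic heart failure",
    "renal failure",
    "breast cancer",
    "metastatic breast cancer",
]
clusters = agglomerative_clusters(terms, cutoff=0.6)
```

With this toy input, the two heart failure terms and the two breast cancer terms each merge, while "renal failure" stays on its own because its average distance to the heart failure cluster exceeds the cutoff. A fixed cutoff has to be tuned per corpus, which is the limitation the abstract's per-partition pruning is meant to address.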


Subject(s)
Cluster Analysis , MEDLINE , Semantics , Unsupervised Machine Learning , Electronic Data Processing , Humans , Knowledge
2.
Int J Med Inform ; 91: 1-9, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27185504

ABSTRACT

OBJECTIVES: The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is an effective technique for framing a clinical question. We aim to develop the counterpart of PICO to structure clinical research data needs. METHODS: We use a data-driven approach to abstracting key concepts representing clinical research data needs by adapting and extending an expert-derived framework originally developed for defining cancer research data needs. We annotated clinical trial eligibility criteria, EHR data request logs, and data queries to electronic health records (EHR), to extract and harmonize concept classes representing clinical research data needs. We evaluated the class coverage, class preservation from the original framework, schema generalizability, schema understandability, and schema structural correctness through a semi-structured interview with eight multidisciplinary domain experts. We iteratively refined the schema based on the evaluations. RESULTS: Our data-driven schema preserved 68% of the 63 classes from the original framework and covered 88% (73/82) of the classes proposed by evaluators. Class coverage for participants of different backgrounds ranged from 60% to 100% with a median value of 95% agreement among the individual evaluators. The schema was found understandable and structurally sound. CONCLUSIONS: Our proposed schema may serve as the counterpart to PICO for improving the research data needs communication between researchers and informaticians.


Subject(s)
Biomedical Research/methods , Data Collection/methods , Clinical Trials as Topic/methods , Comparative Effectiveness Research , Data Collection/standards , Humans , Models, Theoretical , Needs Assessment
3.
J Biomed Inform ; 61: 176-84, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27067901

ABSTRACT

The worldwide adoption of electronic health records (EHR) promises to accelerate clinical research, which lies at the heart of medical advances. However, the interrogation of such Big Data by clinical researchers can be laborious and error-prone, involving iterative and ineffective communication of data requests to data analysts. Research on this communication process is rare. There also exists no contemporary system that offers intelligent solutions to assist clinical researchers in their quest for clinical data. In this article, we first provide a detailed characterization of the challenges encountered in this communication space. Second, we identify promising synergies between fields studying human-to-human and human-machine communication that can shed light on biomedical data query mediation. We propose a mixed-initiative dialog-based approach to support autonomous clinical data access and recommend needed technology development and communication study for accelerating clinical research.


Subject(s)
Electronic Health Records , Medical Informatics/trends , Research Personnel , Biomedical Research , Communication , Humans
4.
J Biomed Inform ; 60: 66-76, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26820188

ABSTRACT

OBJECTIVE: To develop a multivariate method for quantifying the population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies. METHODS: We extended a published metric named Generalizability Index for Study Traits (GIST) to include multiple study traits for quantifying the population representativeness of a set of related studies by assuming the independence and equal importance among all study traits. On this basis, we compared the effectiveness of GIST and multivariate GIST (mGIST) qualitatively. We further developed an algorithm called "Multivariate Underrepresented Subgroup Identification" (MAGIC) for constructing optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables in a set of clinical studies. We profiled the T2DM target population using the National Health and Nutrition Examination Survey (NHANES) data. RESULTS: According to the mGIST scores for four example variables, i.e., age, HbA1c, BMI, and gender, the included observational T2DM studies had better population representativeness than the interventional T2DM studies. For the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with HbA1c values between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed established knowledge and demonstrated the effectiveness of our methods in population representativeness assessment. CONCLUSIONS: mGIST is effective at quantifying population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies. Both data-driven methods can be used to improve the transparency of design bias in participation selection at the research community level.


Subject(s)
Algorithms , Biomedical Research/standards , Demography/methods , Selection Bias , Clinical Trials as Topic , Databases, Factual , Diabetes Mellitus, Type 2 , Humans , Medical Informatics Computing , Multivariate Analysis , Nutrition Surveys , Observational Studies as Topic , Patient Selection
5.
J Biomed Inform ; 59: 89-101, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26657707

ABSTRACT

Clinical data access involves complex but opaque communication between medical researchers and query analysts. Understanding such communication is indispensable for designing intelligent human-machine dialog systems that automate query formulation. This study investigates email communication and proposes a novel scheme for classifying dialog acts in clinical research query mediation. We analyzed 315 email messages exchanged in the communication for 20 data requests obtained from three institutions. The messages were segmented into 1333 utterance units. Through a rigorous process, we developed a classification scheme and applied it for dialog act annotation of the extracted utterances. Evaluation results with high inter-annotator agreement demonstrate the reliability of this scheme. We use this dataset to provide a preliminary understanding of dialog act distribution and conversation flow in this dialog space.
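The abstract reports high inter-annotator agreement but does not name the statistic used. As a minimal sketch, assuming two annotators and Cohen's kappa (one common choice, not confirmed by the source), agreement on dialog act labels could be computed as follows. The label names and annotations are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same utterances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of utterances given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical dialog act labels for six utterance units.
annotator_1 = ["request", "request", "clarify", "confirm", "clarify", "request"]
annotator_2 = ["request", "clarify", "clarify", "confirm", "clarify", "request"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on five of six utterances, and kappa works out to 17/23 (about 0.74) after correcting for chance agreement.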


Subject(s)
Biomedical Research/methods , Communication , Electronic Health Records , Humans
6.
AMIA Annu Symp Proc ; 2015: 386-95, 2015.
Article in English | MEDLINE | ID: mdl-26958170

ABSTRACT

Terminologies can suffer from poor concept coverage due to delays in the addition of new concepts. This study tests a similarity-based approach to recommending concepts from a text corpus to a terminology. Our approach involves extraction of candidate concepts from a given text corpus, which are represented using a set of features. The model learns the important features to characterize a concept and recommends new concepts to a terminology. Further, we propose a cost-effective evaluation methodology to estimate the effectiveness of terminology enrichment methods. To test our methodology, we use free-text clinical trial eligibility criteria as an example text corpus to recommend concepts for SNOMED CT. We computed precision at various rank intervals to measure the performance of the methods. Results indicate that our automated algorithm is an effective method for concept recommendation.
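Precision at a rank cutoff, the evaluation measure mentioned in this abstract, can be sketched as below. This assumes the standard precision-at-k definition; the exact rank intervals and judging procedure used in the study are not given here, and the candidate names are placeholders.

```python
def precision_at_k(ranked_candidates, relevant, k):
    """Fraction of the top-k ranked candidates that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_candidates[:k]
    hits = sum(1 for candidate in top_k if candidate in relevant)
    return hits / k


# Hypothetical ranked recommendations and expert-judged relevant concepts.
ranked = ["concept_a", "concept_b", "concept_c", "concept_d", "concept_e"]
relevant = {"concept_a", "concept_c", "concept_e"}

p_at_1 = precision_at_k(ranked, relevant, 1)
p_at_2 = precision_at_k(ranked, relevant, 2)
p_at_5 = precision_at_k(ranked, relevant, 5)
```

Reporting the measure at several cutoffs shows how quickly recommendation quality falls off down the ranked list, which matters when reviewers can only inspect the top few candidates.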


Subject(s)
Information Storage and Retrieval/methods , Systematized Nomenclature of Medicine , Terminology as Topic , Algorithms , Computer Simulation