Results 1 - 18 of 18
1.
IEEE Trans Knowl Data Eng ; 30(3): 573-584, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-30034201

ABSTRACT

Privacy concerns in data sharing, especially for health data, are receiving increasing attention. Some patients now agree to open their information for research use, which raises a new question: how can the public information be used effectively to better understand a private dataset without breaching privacy? In this paper, we formulate this question as selecting an optimal subset of the public dataset for M-estimators in the framework of differential privacy (DP) in [1]. From the perspective of non-interactive learning, we first construct a weighted private density estimate from the hybrid datasets under DP. Along the same lines as [2], we analyze the accuracy of the DP M-estimators based on the hybrid datasets. Our main contributions are (1) we find that the bias-variance tradeoff in the performance of our M-estimators can be characterized by the sample size of the released dataset; and (2) based on this finding, we develop an algorithm to select the optimal subset of the public dataset to release under DP. Our simulation studies and application to real datasets confirm our findings and provide guidance for real applications.
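The weighted private density estimate described above can be illustrated with a simple histogram. The sketch below is not the authors' estimator: it assumes a fixed set of bins, applies the standard Laplace mechanism to the private bin counts, and mixes the result with an un-noised public histogram using a hypothetical mixing weight `weight`. The helper names (`histogram`, `hybrid_density`) are illustrative only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def normalize(counts):
    # Turn counts into a probability vector (uniform if everything was clipped to zero).
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else [1.0 / len(counts)] * len(counts)


def hybrid_density(private, public, edges, epsilon, weight, seed=0):
    """Mix a Laplace-noised private histogram with an un-noised public one.

    Adding or removing one private record changes a single bin count by 1,
    so Laplace noise with scale 1/epsilon makes the private part epsilon-DP.
    The public records are assumed to be released with consent, so they get no noise.
    """
    rng = random.Random(seed)
    noisy = [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in histogram(private, edges)]
    pub = normalize(histogram(public, edges))
    priv = normalize(noisy)
    return [weight * p + (1 - weight) * q for p, q in zip(pub, priv)]
```

Clipping and normalizing the noisy counts are post-processing steps, so they do not weaken the privacy guarantee of the private histogram.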

2.
J Am Med Inform Assoc ; 25(3): 300-308, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29346583

ABSTRACT

OBJECTIVE: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. MATERIALS AND METHODS: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. RESULTS AND CONCLUSION: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the proportion of relevant results among the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
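P@10 is conventionally computed as the fraction of the top 10 results that are relevant, averaged over queries, which is consistent with a reported value of 0.6022. A minimal sketch, assuming each query's results arrive as a ranked list of dataset IDs and the relevance judgments as a set of IDs (both hypothetical data structures, not DataMed's):

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved items that are relevant.

    Divides by k even if fewer than k items were returned, following the
    usual convention, so short result lists are not rewarded.
    """
    relevant = set(relevant_ids)
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant)
    return hits / k


def mean_precision_at_k(runs, qrels, k=10):
    """Average P@k over queries.

    runs:  query -> ranked list of retrieved IDs
    qrels: query -> collection of relevant IDs (missing queries count as no relevant items)
    """
    scores = [precision_at_k(runs[q], qrels.get(q, ()), k) for q in runs]
    return sum(scores) / len(scores)
```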

3.
J Am Med Inform Assoc ; 24(6): 1211-1220, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29016974

ABSTRACT

OBJECTIVES: To introduce blockchain technologies, including their benefits, pitfalls, and the latest applications, to the biomedical and health care domains. TARGET AUDIENCE: Biomedical and health care informatics researchers who would like to learn about blockchain technologies and their applications in the biomedical/health care domains. SCOPE: The covered topics include: (1) introduction to the well-known Bitcoin cryptocurrency and the underlying blockchain technology; (2) features of blockchain; (3) review of alternative blockchain technologies; (4) emerging nonfinancial distributed ledger technologies and applications; (5) benefits of blockchain for biomedical/health care applications when compared to traditional distributed databases; (6) overview of the latest biomedical/health care applications of blockchain technologies; and (7) discussion of the potential challenges of adopting blockchain technologies in biomedical/health care domains and proposed solutions.
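The core property behind topics (1) and (2), tamper evidence through hash chaining, can be sketched in a few lines. This toy example assumes a single writer and omits networking, consensus, and proof-of-work entirely; `Block`, `append`, and `is_valid` are hypothetical names, not part of any blockchain platform.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str
    block_hash: str = field(init=False)

    def __post_init__(self):
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Hash a canonical JSON encoding of the block contents plus the link to its predecessor.
        payload = json.dumps({"index": self.index, "data": self.data, "prev": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, data):
    prev = chain[-1].block_hash if chain else GENESIS_PREV
    chain.append(Block(len(chain), data, prev))


def is_valid(chain):
    # Any edit to a block changes its hash, which breaks either its own check or the next block's link.
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV
        if block.prev_hash != expected_prev:
            return False
    return True
```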


Subject(s)
Computer Security , Data Mining , Medical Informatics , Algorithms , Commerce , Confidentiality , Data Mining/economics
4.
Sci Data ; 4: 170059, 2017 06 06.
Article in English | MEDLINE | ID: mdl-28585923

ABSTRACT

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
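Because DATS is also available as a schema.org serialization, a dataset description can be expressed as JSON-LD using the schema.org `Dataset` type. The sketch below uses only a few generic schema.org properties (`name`, `description`, `identifier`, `keywords`, `creator`); it does not reproduce the DATS core element set, and the function name and example values are hypothetical.

```python
import json


def to_schema_org(name, description, identifier, keywords, creator_names):
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "keywords": list(keywords),
        "creator": [{"@type": "Person", "name": n} for n in creator_names],
    }


# Hypothetical example record.
record = to_schema_org(
    name="Example cohort phenotype table",
    description="Illustrative dataset description for a discovery index.",
    identifier="hypothetical-0001",
    keywords=["phenotype", "cohort"],
    creator_names=["A. Researcher"],
)
print(json.dumps(record, indent=2))
```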

6.
J Natl Cancer Inst ; 109(2), 2017 02.
Article in English | MEDLINE | ID: mdl-27688295

ABSTRACT

Biospecimen donation is key to the Precision Medicine Initiative, which pioneers a model for accelerating biomedical research through individualized care. Personalized medicine should be made available to medically underserved populations, including the large and growing US Hispanic population. We present results of a study of 140 Hispanic women who underwent a breast biopsy at a safety-net hospital and were randomly assigned to receive information and request for consent for biospecimen and data sharing by the patient's physician or a research assistant. Consent rates were high (97.1% and 92.9% in the physician and research assistant arms, respectively) and not different between groups (relative risk [RR] = 1.05, 95% confidence interval [CI] = 0.96 to 1.10). Consistent with a small but growing literature, we show that perceptions of Hispanics' unwillingness to participate in biospecimen sharing for research are not supported by data. Safety-net clinics and hospitals offer untapped possibilities for enhancing participation of underserved populations in the exciting Precision Medicine Initiative.
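The reported relative risk is the ratio of the two consent rates and can be checked directly from the percentages above. The confidence interval cannot be reproduced this way, because it depends on the arm sizes and the interval method, which the abstract does not report.

```latex
\mathrm{RR} = \frac{p_{\text{physician}}}{p_{\text{research assistant}}}
            = \frac{0.971}{0.929} \approx 1.045 \approx 1.05
```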


Subject(s)
Biological Specimen Banks , Breast/pathology , Hispanic or Latino , Information Dissemination , Informed Consent , Adult , Biopsy , Cooperative Behavior , Female , Humans , Middle Aged , Precision Medicine , Random Allocation , Safety-net Providers , Vulnerable Populations
7.
BMC Med Inform Decis Mak ; 16 Suppl 3: 73, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27454233

ABSTRACT

BACKGROUND: Accurately assessing pain in patients who cannot self-report it, such as minimally responsive or severely brain-injured patients, is challenging. In this paper, we attempted to address this challenge by answering the following questions: (1) whether pain has dependency structures in electronic signals and, if so, (2) how to apply this pattern to predict the pain state. To this end, we investigated and compared the performance of several machine learning techniques. METHODS: We first adopted different strategies to convert the collected original n-dimensional numerical data into binary data. Pain states are represented in binary format and combined with the above binary features to construct (n + 1)-dimensional data. We then modeled the joint distribution over all variables in these data using the Restricted Boltzmann Machine (RBM). RESULTS: Seventy-eight pain data items were collected. Four individuals with more than 1000 recorded labels were used in the experiment. The number of available data items for the four patients varied from 22 to 28. Discriminant RBM achieved better accuracy in all four experiments. CONCLUSION: The experimental results show that RBM models the distribution of our binary pain data well. We showed that discriminant RBM can be used in a classification task, and the initial results compare favorably with other classifiers such as support vector machine (SVM) using PCA representation and the LDA discriminant method.
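Once an RBM models the joint distribution of the (n + 1)-dimensional vectors, one way to classify is to compare the free energy of the features with each candidate label appended, since p(v) is proportional to exp(-F(v)). The sketch below assumes a binary RBM with already-trained parameters (visible biases `a`, hidden biases `b`, weights `W[i][j]`); training, for example by contrastive divergence, is omitted, and the paper's discriminant RBM may be parameterized differently.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def free_energy(v, a, b, W):
    """Binary RBM free energy: F(v) = -sum_i a_i v_i - sum_j softplus(b_j + sum_i v_i W[i][j])."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(
        softplus(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    )
    return -visible_term - hidden_term


def label_probabilities(x, a, b, W):
    """p(y | x) for a binary label stored as the last visible unit.

    p(y | x) is proportional to exp(-F([x, y])); subtracting the minimum
    free energy before exponentiating avoids overflow.
    """
    energies = [free_energy(list(x) + [y], a, b, W) for y in (0, 1)]
    m = min(energies)
    weights = [math.exp(-(e - m)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def predict_label(x, a, b, W):
    p0, p1 = label_probabilities(x, a, b, W)
    return 1 if p1 > p0 else 0
```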


Subject(s)
Pain/diagnosis , Pattern Recognition, Automated , Humans , Neural Networks, Computer
8.
J Am Med Inform Assoc ; 22(6): 1153-63, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26555018

ABSTRACT

Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics. Its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We used tensors (i.e., multidimensional arrays) to represent the interaction among topics, time and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of the most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, articles related to the MeSH terms Methods, Organization & Administration, and Algorithms increased significantly in both number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
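The abstract does not say which tensor decomposition was used. As a hedged illustration, the sketch below computes a rank-1 CP (CANDECOMP/PARAFAC) approximation with the higher-order power method, the basic building block of a full CP decomposition; the input could be, hypothetically, a topic × publication-year × citing-year tensor of citation counts. Libraries such as TensorLy provide complete CP implementations for real analyses.

```python
import math


def unit(v):
    # Scale a vector to unit length (left unchanged if it is all zeros).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def rank1_cp(T, iters=50):
    """Rank-1 CP approximation T[i][j][k] ~ lam * a[i] * b[j] * c[k].

    Higher-order power method: each factor is updated by contracting the
    tensor with the other two factors, then normalized to unit length.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a = unit([1.0] * I)
    b = unit([1.0] * J)
    c = unit([1.0] * K)
    for _ in range(iters):
        a = unit([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) for i in range(I)])
        b = unit([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) for j in range(J)])
        c = unit([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) for k in range(K)])
    # The weight lam is the tensor contracted with all three unit factors.
    lam = sum(T[i][j][k] * a[i] * b[j] * c[k] for i in range(I) for j in range(J) for k in range(K))
    return lam, a, b, c
```

In a topic × time × citation reading, `a` would give a topic profile and `b` and `c` the matching temporal patterns of that single component; additional components would be extracted by a full CP algorithm rather than this rank-1 sketch.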


Subject(s)
Bibliometrics , Biomedical Research/trends , Medical Informatics/trends , Medical Subject Headings , Periodicals as Topic , Societies, Medical
9.
J Am Med Inform Assoc ; 21(1): 31-6, 2014.
Article in English | MEDLINE | ID: mdl-23989082

ABSTRACT

The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net.
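The abstract does not describe PhenDisco's ranking function. As a generic illustration of returning ranked results, the sketch below scores study descriptions against a query with a smoothed TF-IDF dot product; the simple lowercase tokenizer stands in for the much richer standardization step described above, and all names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Minimal normalization: lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Rank docs (id -> text) by a smoothed TF-IDF dot product with the query.

    Ties are broken by document ID so the ordering is deterministic.
    """
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for counts in tokenized.values() for term in counts)
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1 for t in df}
    q_terms = tokenize(query)
    scores = {
        doc_id: sum(counts[t] * idf.get(t, 0.0) for t in q_terms)
        for doc_id, counts in tokenized.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))
```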


Subject(s)
Algorithms , Databases, Genetic , Information Systems , Phenotype , Databases, Genetic/standards , Genome-Wide Association Study , Genotype , Humans , Subject Headings
10.
Article in English | MEDLINE | ID: mdl-24303228

ABSTRACT

The database of Genotypes and Phenotypes (dbGaP) archives the results of different Genome Wide Association Studies (GWAS). dbGaP has a multitude of phenotype variables, but they are not harmonized across studies. We proposed a method to standardize phenotype variables by classifying similar variables based on semantic distances. We first extracted variable descriptions, enriched them using domain knowledge, and computed the distances among them. We used clustering techniques to classify the most similar variables. We used domain experts to audit clusters, annotated the clusters with appropriate labels, and used re-clustering to build a semantically-driven Genotypes and Phenotypes (sdGaP) ontology using the UMLS semantic network and metathesaurus. The sdGaP ontology allowed us to expand user queries and retrieve information using a semantic metric called density measure (DM). We illustrated the potential improvement of information retrieval using the sdGaP ontology in one search scenario using the variables from the Cleveland Family Study.
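The distance-then-cluster step can be sketched as follows. This is a stand-in, not the sdGaP method: it uses Jaccard distance over raw word sets instead of knowledge-enriched semantic distances, and single-linkage clustering with a hypothetical `threshold`; the function names are illustrative.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a, b):
    # 1 - |A intersect B| / |A union B|; two empty sets are treated as identical.
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(descriptions, threshold=0.5):
    """Single-linkage clustering: join any two variables closer than `threshold`."""
    parent = list(range(len(descriptions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [tokens(d) for d in descriptions]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard_distance(sets[i], sets[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, d in enumerate(descriptions):
        groups.setdefault(find(i), []).append(d)
    return sorted(groups.values())
```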

11.
PLoS One ; 8(9): e76384, 2013.
Article in English | MEDLINE | ID: mdl-24058713

ABSTRACT

The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated from genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discoveries and to reduce cost and time for exploratory research. However, idiosyncrasies and inconsistencies in phenotype variable names are a major barrier to reusing these data. We addressed these challenges in standardizing phenotype variables by formalizing their descriptions using Clinical Element Models (CEM). Designed to represent clinical data, CEMs were highly expressive and thus were able to represent a majority (77.5%) of the 215 phenotype variable descriptions. However, their high expressivity also made it difficult to directly apply them to research data such as phenotype variables in dbGaP. Our study suggested that simplification of the template models makes it more straightforward to formally represent the key semantics of phenotype variables.


Subject(s)
Databases, Genetic , Models, Genetic , Phenotype , Female , Genome-Wide Association Study , Humans , Male
12.
Med Care ; 51(8 Suppl 3): S45-52, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23774519

ABSTRACT

INTRODUCTION: The need for a common format for electronic exchange of clinical data prompted federal endorsement of applicable standards. However, despite obvious similarities, a consensus standard has not yet been selected in the comparative effectiveness research (CER) community. METHODS: Using qualitative metrics for data retrieval and information loss across a variety of CER topic areas, we compare several existing models from a representative sample of organizations associated with clinical research: the Observational Medical Outcomes Partnership (OMOP), Biomedical Research Integrated Domain Group, the Clinical Data Interchange Standards Consortium, and the US Food and Drug Administration. RESULTS: While the models examined captured a majority of the data elements that are useful for CER studies, data elements related to insurance benefit design and plans were most detailed in OMOP's CDM version 4.0. Standardized vocabularies that facilitate semantic interoperability were included in the OMOP and US Food and Drug Administration Mini-Sentinel data models, but are left to the discretion of the end-user in Biomedical Research Integrated Domain Group and Analysis Data Model, limiting reuse opportunities. Among the challenges we encountered was the need to model data specific to a local setting. This was handled by extending the standard data models. DISCUSSION: We found that the Common Data Model from the OMOP met the broadest complement of CER objectives. Minimal information loss occurred in mapping data from institution-specific data warehouses onto the data models from the standards we assessed. However, to support certain scenarios, we found a need to enhance existing data dictionaries with local, institution-specific information.
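The study assessed information loss qualitatively. As a hypothetical illustration of the idea, the sketch below maps a local warehouse record onto target-model fields and reports the local fields with no target, which are the candidates for either information loss or a local extension of the standard model. The column names are OMOP-style and used for illustration only; the local record and field map are invented.

```python
def map_record(local, field_map):
    """Map a local record onto target-model fields.

    field_map: local field name -> target field name.
    Returns (mapped record, sorted list of local fields with no target).
    """
    mapped, unmapped = {}, []
    for name, value in local.items():
        target = field_map.get(name)
        if target is None:
            unmapped.append(name)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


# Hypothetical local record and mapping.
local = {"pat_id": 17, "birth_yr": 1960, "sex": "F", "copay_tier": 2}
field_map = {"pat_id": "person_id", "birth_yr": "year_of_birth", "sex": "gender_source_value"}

mapped, unmapped = map_record(local, field_map)
loss = len(unmapped) / len(local)  # share of local fields with no place in the target model
```

Here the insurance-related `copay_tier` field has no target, echoing the abstract's point that benefit-design data were covered unevenly across models.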


Subject(s)
Comparative Effectiveness Research/organization & administration , Models, Theoretical , Systems Integration , Humans , Information Storage and Retrieval/methods , Vocabulary, Controlled
14.
NI 2012 (2012) ; 2012: 103, 2012.
Article in English | MEDLINE | ID: mdl-24199064

ABSTRACT

Nursing terminology development efforts in the United States and globally provide concept coverage across many domains of nursing practice. Efforts to integrate concepts from across terminology systems into a single reference terminology support broad concept coverage but do not provide a means to leverage the full benefits of the individual terminology systems. The purpose of this paper is to explore the feasibility of harmonizing the 198 Clinical Care Classification (CCC) System core intervention concepts with intervention concepts in the International Classification for Nursing Practice (ICNP®) as a means to leverage both the information model components of the CCC system and the broad concept coverage of the ICNP®. Findings suggest that the CCC system and ICNP® are largely interoperable and a common framework underlying the two terminology systems provides a foundation for harmonization.

15.
AMIA Annu Symp Proc ; 2011: 356-63, 2011.
Article in English | MEDLINE | ID: mdl-22195088

ABSTRACT

As health care systems and providers move towards meaningful use of electronic health records, the once distant vision of collaborative patient-centric, interdisciplinary plans of care, generated and updated across organizations and levels of care, may soon become a reality. Effective care planning is included in the proposed Stages 2-3 Meaningful Use quality measures. To facilitate interoperability, standardization of plan of care messaging, content, information, and terminology models is needed. This degree of standardization requires local and national coordination. The purpose of this paper is to review some existing standards that may be leveraged to support development of interdisciplinary patient-centric plans of care. Standards are then applied to a use case to demonstrate one method for achieving patient-centric and interoperable interdisciplinary plan of care documentation. Our pilot work suggests that existing standards provide a foundation for adoption and implementation of patient-centric plans of care that are consistent with federal requirements.


Subject(s)
Electronic Health Records/standards , Meaningful Use , Patient Care Planning/standards , Patient-Centered Care , Health Level Seven , Humans , Pilot Projects , Systematized Nomenclature of Medicine , Terminology as Topic , United States
16.
J Am Med Inform Assoc ; 18 Suppl 1: i166-70, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22180873

ABSTRACT

Biomedical informatics is a young, highly interdisciplinary field that is evolving quickly. It is important to know which published topics in generalist biomedical informatics journals elicit the most interest from the scientific community, and whether this interest changes over time, so that journals can better serve their readers. It is also important to understand whether free access to biomedical informatics articles impacts their citation rates in a significant way, so authors can make informed decisions about unlock fees, and journal owners and publishers understand the implications of open access. The topics and JAMIA articles from years 2009 and 2010 that have been most cited according to the Web of Science are described. To better understand the effects of free access in article dissemination, the number of citations per month after publication for articles published in 2009 versus 2010 was compared, since there was a significant change in free access to JAMIA articles between those years. Results suggest that there is a positive association between free access and citation rate for JAMIA articles.
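The abstract compares citations per month after publication for the 2009 and 2010 cohorts but does not give the exact procedure or statistical test. One simple, hypothetical normalization divides each article's citation count by its months since publication and averages within a cohort:

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    # Whole calendar months from start to end (day of month ignored).
    return (end.year - start.year) * 12 + (end.month - start.month)


def citations_per_month(articles, as_of: date) -> float:
    """Mean citations per month since publication for (pub_date, citations) pairs.

    Articles published in the same month as `as_of` (zero months) are skipped.
    """
    rates = []
    for pub_date, citations in articles:
        months = months_between(pub_date, as_of)
        if months > 0:
            rates.append(citations / months)
    return sum(rates) / len(rates) if rates else 0.0
```

Comparing cohorts would then mean computing this value for each publication year against the same `as_of` date, with the caveat that older articles have had longer to accumulate citations.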


Subject(s)
Bibliometrics , Medical Informatics/trends , Biomedical Research/trends , Medical Subject Headings , Periodicals as Topic
17.
J Am Med Inform Assoc ; 16(2): 238-46, 2009.
Article in English | MEDLINE | ID: mdl-19074298

ABSTRACT

OBJECTIVES: The purpose of this study was to evaluate the adequacy of the International Classification for Nursing Practice (ICNP) Version 1.0 as a representational model for nursing assessment documentation. DESIGN AND MEASUREMENTS: To identify representational requirements of nursing assessments, the authors mapped key concepts and semantic relations extracted from standardized and local nursing admission assessment documentation forms/templates and inpatient admission assessment records to the ICNP. Next, they expanded the list of ICNP semantic relations with those obtained from the admission assessment forms/templates. The expanded ICNP semantic relations were then validated against the semantic relations identified from an additional set of admission assessment records and a set of 300 randomly selected North American Nursing Diagnosis Association defining characteristic phrases. The concept coverage of the ICNP was evaluated by mapping the concepts extracted from these sources to the ICNP concepts. The UMLS Metathesaurus was then used to map concepts without exact matches to other American Nurses Association (ANA)-recognized terminologies. RESULTS: The authors found that along with the 30 existing ICNP semantic relations, an additional 17 are required for the ICNP to function as a representational model for nursing assessment documentation. Eight hundred and five unique assessment concepts were extracted from all sources. Forty-three percent of these unique assessment concepts had exact matches in the ICNP. An additional 20% had matches in the ICNP classified as narrower, broader, or "other." Of the concepts without exact matches in the ICNP, 81% had exact matches found in other ANA-recognized terminologies. CONCLUSIONS: The broad concept coverage and the logic-based structure of the ICNP make it a flexible and robust standard. The ICNP provides a framework from which to capture and reuse atomic level data to facilitate evidence-based practice.
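The coverage figures above (exact matches, then narrower, broader, or "other" matches) are percentages over the set of extracted concepts. A minimal sketch of that tally, assuming each concept has already been assigned one match-type label by a reviewer; the label set and function name are hypothetical:

```python
from collections import Counter

MATCH_TYPES = ("exact", "narrower", "broader", "other", "none")


def coverage(mappings):
    """Percentage of concepts per match type.

    mappings: concept -> one of MATCH_TYPES.
    """
    counts = Counter(mappings.values())
    unknown = set(counts) - set(MATCH_TYPES)
    if unknown:
        raise ValueError(f"unexpected match types: {sorted(unknown)}")
    total = len(mappings)
    return {t: 100.0 * counts[t] / total for t in MATCH_TYPES}
```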


Subject(s)
Nursing Assessment/classification , Nursing Records/classification , Vocabulary, Controlled , Evidence-Based Nursing , Models, Theoretical , Semantics
18.
AMIA Annu Symp Proc ; : 954, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999149

ABSTRACT

The purpose of this study was to identify key concepts and semantic relations necessary to represent standardized and local patient assessment items in an electronic documentation system and to evaluate the degree to which both are covered by ICNP. A total of 805 unique assessment concepts were identified. Forty-three percent had exact matches in ICNP, and an additional 20% had matches in the ICNP classified as narrower, broader, or other.


Subject(s)
Nursing Assessment/classification , Nursing Informatics/statistics & numerical data , Nursing Records/statistics & numerical data , Semantics , Terminology as Topic , Vocabulary, Controlled , Boston , Internationality