1.
Patterns (N Y) ; 2(6): 100255, 2021 Jun 11.
Article in English | MEDLINE | ID: mdl-34179842

ABSTRACT

The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data. Detected identifiers are then transformed into plausible, though fictional, surrogates to further obfuscate any leaked identifier. Our approach outperforms existing tools, with a recall of 0.992 and precision of 0.979 on the i2b2 2014 dataset and a recall of 0.994 and precision of 0.967 on a dataset of 10,000 notes from the Mayo Clinic. The de-identification system presented here enables the generation of de-identified patient data at the scale required for modern machine-learning applications to help accelerate medical discoveries.
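
The detect-then-replace flow described in this abstract can be illustrated with a toy sketch. This is not the authors' system: the three detectors, the `SURROGATES` table, and the function names are hypothetical, and a simple regex stands in for the attention-based models. The sketch only shows the general shape of an ensemble that pools spans from several detectors and then substitutes fictional surrogates.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, PII label)


def detect_dates(text: str) -> List[Span]:
    """Rule-based detector: numeric dates such as 03/14/2019."""
    return [(m.start(), m.end(), "DATE")
            for m in re.finditer(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)]


def detect_phones(text: str) -> List[Span]:
    """Rule-based detector: phone numbers written as 507-555-1234."""
    return [(m.start(), m.end(), "PHONE")
            for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", text)]


def detect_titled_names(text: str) -> List[Span]:
    """Hypothetical stand-in for a learned model: a capitalized word after a title."""
    return [(m.start(1), m.end(1), "NAME")
            for m in re.finditer(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+([A-Z][a-z]+)", text)]


# Assumed fixed surrogates; a real system would generate varied, plausible values.
SURROGATES = {"DATE": "01/01/2000", "PHONE": "555-555-0100", "NAME": "Smith"}

DETECTORS: List[Callable[[str], List[Span]]] = [
    detect_dates, detect_phones, detect_titled_names,
]


def deidentify(text: str, detectors: List[Callable[[str], List[Span]]]) -> str:
    """Pool spans from every detector, then swap each span for a surrogate."""
    spans = sorted({span for detect in detectors for span in detect(text)})
    # Keep the earliest span whenever two detections overlap.
    kept: List[Span] = []
    for span in spans:
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    # Replace from right to left so earlier offsets stay valid.
    for start, end, label in reversed(kept):
        text = text[:start] + SURROGATES[label] + text[end:]
    return text


note = "Dr. Jones saw the patient on 03/14/2019; call 507-555-1234."
print(deidentify(note, DETECTORS))
# prints: Dr. Smith saw the patient on 01/01/2000; call 555-555-0100.
```

Pooling detections as a union favors recall, which matches the abstract's emphasis on limiting residual identifiers; replacing spans with surrogates rather than blanking them means a missed identifier is harder to tell apart from the fictional ones.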

3.
Biomed Inform Insights ; 8(Suppl 1): 13-22, 2016.
Article in English | MEDLINE | ID: mdl-27385912

ABSTRACT

The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.

4.
Mayo Clin Proc ; 89(1): 25-33, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24388019

ABSTRACT

OBJECTIVE: To report the design and implementation of the Right Drug, Right Dose, Right Time-Using Genomic Data to Individualize Treatment protocol that was developed to test the concept that prescribers can deliver genome-guided therapy at the point of care by using preemptive pharmacogenomics (PGx) data and clinical decision support (CDS) integrated into the electronic medical record (EMR). PATIENTS AND METHODS: We used a multivariate prediction model to identify patients with a high risk of initiating statin therapy within 3 years. The model was used to target a study cohort most likely to benefit from preemptive PGx testing among the Mayo Clinic Biobank participants, with a recruitment goal of 1000 patients. We used a Cox proportional hazards model with variables selected through the Lasso shrinkage method. An operational CDS model was adapted to implement PGx rules within the EMR. RESULTS: The prediction model included age, sex, race, and 6 chronic diseases categorized by the Clinical Classifications Software for International Classification of Diseases, Ninth Revision codes (dyslipidemia, diabetes, peripheral atherosclerosis, disease of the blood-forming organs, coronary atherosclerosis and other heart diseases, and hypertension). Of the 2000 Biobank participants invited, 1013 (51%) provided blood samples, 256 (13%) declined participation, 555 (28%) did not respond, and 176 (9%) consented but did not provide a blood sample within the recruitment window (October 4, 2012, through March 20, 2013). Preemptive PGx testing included CYP2D6 genotyping and targeted sequencing of 84 PGx genes. Synchronous real-time CDS was integrated into the EMR and flagged potential patient-specific drug-gene interactions and provided therapeutic guidance. CONCLUSION: This translational project provides an opportunity to begin to evaluate the impact of preemptive sequencing and EMR-driven genome-guided therapy. These interventions will improve understanding and implementation of genomic data in clinical practice.
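
The recruitment figures in the RESULTS can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the dictionary keys are shorthand labels, not terms from the paper. It shows that the four groups account for all 2000 invitees and that the reported percentages add up to 101% only because each one is rounded separately.

```python
# Counts as reported in the abstract; keys are shorthand labels.
invited = 2000
outcomes = {
    "provided blood sample": 1013,
    "declined": 256,
    "no response": 555,
    "consented, no sample in window": 176,
}

# The four groups should account for every invited participant.
total = sum(outcomes.values())

# Each share is rounded to a whole percent on its own, as in the abstract.
percentages = {label: round(100 * count / invited)
               for label, count in outcomes.items()}

for label, count in outcomes.items():
    print(f"{label}: {count} ({percentages[label]}%)")
print("total participants:", total)
print("sum of rounded percentages:", sum(percentages.values()))
```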


Subject(s)
Genetic Testing/standards; Pharmacogenetics/methods; Practice Guidelines as Topic; Precision Medicine/methods; Atherosclerosis/drug therapy; Cohort Studies; Decision Making; Diabetes Mellitus/drug therapy; Dyslipidemias/drug therapy; Electronic Health Records; Female; Genotyping Techniques; Hematopoiesis/drug effects; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Hypertension/drug therapy; Male; Middle Aged; Pharmacogenetics/standards; Pilot Projects; Precision Medicine/standards; Predictive Value of Tests; United States

5.
Mayo Clin Proc ; 86(7): 606-14, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21646302

ABSTRACT

OBJECTIVE: To create a cohort for cost-effective genetic research, the Mayo Genome Consortia (MayoGC) has been assembled with participants from research studies across Mayo Clinic with high-throughput genetic data and electronic medical record (EMR) data for phenotype extraction. PARTICIPANTS AND METHODS: Eligible participants include those who gave general research consent in the contributing studies to share high-throughput genotyping data with other investigators. Herein, we describe the design of the MayoGC, including the current participating cohorts, expansion efforts, data processing, and study management and organization. A genome-wide association study to identify genetic variants associated with total bilirubin levels was conducted to test the genetic research capability of the MayoGC. RESULTS: Genome-wide significant results were observed on 2q37 (top single nucleotide polymorphism, rs4148325; P=5.0 × 10⁻⁶²) and 12p12 (top single nucleotide polymorphism, rs4363657; P=5.1 × 10⁻⁸) corresponding to a gene cluster of uridine 5'-diphospho-glucuronosyltransferases (the UGT1A cluster) and solute carrier organic anion transporter family, member 1B1 (SLCO1B1), respectively. CONCLUSION: Genome-wide association studies have identified genetic variants associated with numerous phenotypes but have been historically limited by inadequate sample size due to costly genotyping and phenotyping. Large consortia with harmonized genotype data have been assembled to attain sufficient statistical power, but phenotyping remains a rate-limiting factor in gene discovery research efforts. The EMR consists of an abundance of phenotype data that can be extracted in a relatively quick and systematic manner. The MayoGC provides a model of a unique collaborative effort in the environment of a common EMR for the investigation of genetic determinants of diseases.


Subject(s)
Bilirubin/blood; Genome-Wide Association Study; Glucuronosyltransferase/genetics; Organic Anion Transporters/genetics; Polymorphism, Genetic/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Bilirubin/genetics; Cohort Studies; Cost-Benefit Analysis; Electronic Health Records; Female; Genome-Wide Association Study/economics; Humans; Liver-Specific Organic Anion Transporter 1; Male; Middle Aged; Phenotype; Young Adult
...