Results 1 - 19 of 19
1.
Stud Health Technol Inform ; 294: 28-32, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612010

ABSTRACT

Sharing observational and interventional health data within a common data space enables university hospitals to leverage such data for biomedical discovery and for moving towards a learning health system. OBJECTIVE: To describe the AP-HP Health Data Space (AHDS) and the IT services supporting piloting, research, innovation and patient care. METHODS: Built on three pillars - governance and ethics, technology and valorization - the AHDS and its major component, the Clinical Data Warehouse (CDW), have been developed since 2015. RESULTS: The AP-HP CDW was made available at scale to both AP-HP healthcare professionals and public or private partners in January 2017. Supported by a secure, high-performance institutional cloud and an ecosystem of mostly open-source tools, the AHDS integrates massive amounts of healthcare data collected during care and research activities. As of December 2021, the AHDS operates electronic data capture for almost 840 clinical trials sponsored by AP-HP; the CDW enables the processing of health data from more than 11 million patients and has generated more than 200 secondary data marts from IRB-authorized research projects. During the COVID-19 pandemic, the AHDS had to evolve quickly to support administrative professionals and caregivers heavily involved in the reorganization of both patient care and biomedical research. CONCLUSION: The AP-HP Health Data Space is a key facilitator of data-driven evidence generation, making the health system more efficient and personalized.


Subject(s)
COVID-19 , Data Warehousing , Information Dissemination , COVID-19/epidemiology , Data Warehousing/methods , Health Personnel , Humans , Information Dissemination/methods , Pandemics
2.
J Med Internet Res ; 22(6): e18579, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32496199

ABSTRACT

BACKGROUND: Health services researchers spend a substantial amount of time integrating, cleansing, interpreting, and aggregating raw data from multiple public or private data sources. Often, each researcher (or someone on their team) duplicates this effort for their own project, facing the same challenges and experiencing the same pitfalls discovered by those before them. OBJECTIVE: This paper describes a design process for creating a data warehouse that includes the most frequently used databases in health services research. METHODS: The design is based on a conceptual iterative process model framework that utilizes the sociotechnical systems theory approach and includes the capacity for subsequent updates of the existing data sources and the addition of new ones. We introduce the theory and the framework and then explain how they inform the methodology of this study. RESULTS: The application of the iterative process model to the design research process of problem identification and solution design for the Healthcare Research and Analytics Data Infrastructure Solution (HRADIS) is described. Each phase of the iterative model produced end products to inform the implementation of HRADIS. The analysis phase produced the problem statement and requirements documents. The projection phase produced a list of tasks and goals for the ideal system. Finally, the synthesis phase provided the plan to implement HRADIS. HRADIS structures and integrates data dictionaries provided by the data sources, allowing the creation of dimensions and measures for a multidimensional business intelligence system. We discuss how HRADIS is complemented by a set of data mining, analytics, and visualization tools to enable researchers to apply multiple methods to a given research project more efficiently. HRADIS also includes a built-in security and account management framework for data governance, ensuring customized authorization depending on user roles and the parts of the data each role is authorized to access. CONCLUSIONS: To address existing inefficiencies in the acquisition, extraction, preprocessing, cleansing, and filtering stages of data processing in health services research, we envision HRADIS as a full-service data warehouse integrating frequently used data sources, processes, and methods along with a variety of data analytics and visualization tools. This paper presents the application of the iterative process model to build such a solution. It also discusses several prominent issues, lessons learned, reflections and recommendations, and future considerations arising as this model was applied.


Subject(s)
Data Science/methods , Data Warehousing/methods , Databases, Factual/standards , Health Services Research/methods , Humans
3.
Cancer Epidemiol Biomarkers Prev ; 29(4): 777-786, 2020 04.
Article in English | MEDLINE | ID: mdl-32051191

ABSTRACT

BACKGROUND: Large-scale cancer epidemiology cohorts (CEC) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more findable, accessible, interoperable, and reusable, or FAIR. How CECs should approach this transformation is unclear. METHODS: The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995-1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data and a secure and shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses. RESULTS: Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets from the data warehouse designed for efficiency. The secure CTS workspace utilizes a remote desktop service that operates within a Health Insurance Portability and Accountability Act (HIPAA)- and Federal Information Security Management Act (FISMA)-compliant platform. Our infrastructure offers broad access to CTS data, includes statistical analysis and data visualization software and tools, flexibly manages other key data activities (e.g., cleaning, updates, and data sharing), and will continue to evolve to advance FAIR principles. CONCLUSIONS: Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework. IMPACT: The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation. See all articles in this CEBP Focus section, "Modernizing Population Science."


Subject(s)
Cloud Computing/legislation & jurisprudence , Data Collection/methods , Data Warehousing/methods , Health Information Management/methods , Neoplasms/epidemiology , Big Data , Computer Security , Data Collection/legislation & jurisprudence , Data Warehousing/legislation & jurisprudence , Health Information Management/legislation & jurisprudence , Health Insurance Portability and Accountability Act , Humans , Longitudinal Studies , Observational Studies as Topic/legislation & jurisprudence , Observational Studies as Topic/methods , Prospective Studies , United States
4.
JCO Clin Cancer Inform ; 3: 1-15, 2019 10.
Article in English | MEDLINE | ID: mdl-31633999

ABSTRACT

PURPOSE: Data collection in clinical trials is becoming complex, with a huge number of variables that need to be recorded, verified, and analyzed to effectively measure clinical outcomes. In this study, we used data warehouse (DW) concepts to achieve this goal. A DW was developed to accommodate data from a large clinical trial, including all the characteristics collected. We present the results related to baseline variables with two objectives: developing a data quality (DQ) control strategy and improving outcome analysis according to the clinical trial's primary end points. METHODS: Data were retrieved from the electronic case report forms (eCRFs) of the phase III, multicenter MCL0208 trial (ClinicalTrials.gov identifier: NCT02354313) of the Fondazione Italiana Linfomi for younger patients with untreated mantle cell lymphoma (MCL). The DW was created with a relational database management system. Recommended DQ dimensions were tracked to monitor the activity of each site and to manage DQ during patient follow-up. To assess its impact, DQ management was applied to clinically relevant parameters that predicted progression-free survival. RESULTS: The DW encompassed 16 tables, which included 226 variables for 300 patients and 199,500 items of data. The tool allowed cross-comparison analysis and detected incongruities in the eCRFs, prompting queries to clinical centers. This had an impact on clinical end points, as the DQ control strategy improved prognostic stratification according to single parameters, such as tumor infiltration by flow cytometry, and even using established prognosticators, such as the MCL International Prognostic Index. CONCLUSION: The DW is a powerful tool for organizing results from large phase III clinical trials and for effectively improving DQ through the application of well-engineered tools.
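
The abstract does not detail the authors' actual checks; below is a minimal sketch, in Python with pandas, of the kind of cross-table incongruity detection a trial DW enables. All table and column names here are hypothetical, not taken from the MCL0208 database.

```python
# Hedged sketch of a cross-table data-quality (DQ) check; names are invented.
import pandas as pd

# Hypothetical eCRF extracts loaded into a relational-style model.
enrollment = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "enrollment_date": pd.to_datetime(["2015-01-10", "2015-02-03", "2015-03-15"]),
})
follow_up = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "visit_date": pd.to_datetime(["2015-06-01", "2015-01-20", "2015-02-28"]),
    "flow_infiltration_pct": [12.0, 250.0, 8.5],  # % tumor infiltration by flow cytometry
})

merged = follow_up.merge(enrollment, on="patient_id")

# Incongruity 1: a follow-up visit recorded before enrollment.
bad_dates = merged[merged["visit_date"] < merged["enrollment_date"]]

# Incongruity 2: a biologically implausible value (a percentage above 100).
bad_values = merged[merged["flow_infiltration_pct"] > 100]

# Each hit would translate into a query sent back to the clinical center.
print(bad_dates[["patient_id", "visit_date", "enrollment_date"]])
print(bad_values[["patient_id", "flow_infiltration_pct"]])
```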


Subject(s)
Data Warehousing/methods , Data Warehousing/standards , Lymphoma, Mantle-Cell/mortality , Lymphoma, Mantle-Cell/therapy , Quality Assurance, Health Care/methods , Aged , Clinical Trials, Phase III as Topic , Disease Progression , Female , Humans , Lymphoma, Mantle-Cell/diagnosis , Male , Multicenter Studies as Topic , Neoplasm Staging , Randomized Controlled Trials as Topic , Survival Rate , Treatment Outcome
5.
Semin Diagn Pathol ; 36(5): 294-302, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31227427

ABSTRACT

Application of lean process management strategies to process improvement in clinical and anatomic pathology laboratories affords opportunities to enhance workflow processes, lowering costs while improving patient safety. Barcodes are now employed in most modern anatomic pathology laboratories to track specimens from the clinician's office or the operating room through the entire continuum of service to specimen disposal. To enhance patient safety and optimize workload, novel computer hardware and software assets are being developed to enable monitoring, analysis, and improvement of specimen workflow and diagnostic accuracy. More recently, data warehouse technologies from the retail industry have been adapted to permit high-throughput, real-time analysis of granular laboratory data. In this review we describe the application of an in-house designed data warehouse to the anatomic pathology assets of a large regional reference laboratory.


Subject(s)
Data Warehousing/methods , Laboratories/organization & administration , Pathology, Clinical/organization & administration , Quality Assurance, Health Care , Workflow , Humans , Pathology, Clinical/methods , Patient Safety
6.
J Digit Imaging ; 32(5): 870-879, 2019 10.
Article in English | MEDLINE | ID: mdl-31201587

ABSTRACT

Over the last few decades, the volume of medical imaging studies and associated metadata has increased rapidly. Although such studies are mostly used to support diagnosis and treatment, many recent initiatives advocate their use in clinical research and to improve the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes real-time analysis of medical imaging repositories difficult with conventional tools and methodologies. These archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. Exploring these data can increase the efficiency and quality of medical practice. In major centers this is a big data scenario, where Business Intelligence (BI) and Data Analytics (DA) are rare and, when present, implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories able to feed a BI application in real time. The solution was designed to support research on top of live institutional repositories without requiring the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports and an intuitive web-based interface that supports novel data mining techniques, namely a variety of data cleansing tools, filters, and clustering functions. The user is therefore not required to master the programming skills commonly needed by data analysts and scientists, such as Python and R.
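
As a rough illustration of the extract/transform step of such a pipeline, the sketch below reads DICOM headers without pixel data and flattens them into BI-ready rows. It assumes the pydicom library; the directory path and output row layout are illustrative, not the paper's implementation.

```python
# Hedged sketch: metadata-only extraction from a DICOM folder tree.
from pathlib import Path
import pydicom

def extract_metadata(dicom_dir: str):
    """Yield one flat metadata row per DICOM file, ready for a BI feed."""
    for path in Path(dicom_dir).rglob("*.dcm"):
        # stop_before_pixels skips bulk pixel data: this ETL needs headers only.
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        yield {
            "study_uid": getattr(ds, "StudyInstanceUID", None),
            "modality": getattr(ds, "Modality", None),
            "study_date": getattr(ds, "StudyDate", None),
            "institution": getattr(ds, "InstitutionName", None),
        }

# In a real-time setup these rows would stream to the dashboard's store as
# studies arrive, instead of batch-loading a separate warehouse.
for row in extract_metadata("/data/dicom"):  # hypothetical repository path
    print(row)
```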


Subject(s)
Data Mining/methods , Data Warehousing/methods , Metadata/statistics & numerical data , Radiology Information Systems/organization & administration , Radiology Information Systems/statistics & numerical data , Data Mining/statistics & numerical data , Data Warehousing/statistics & numerical data , Humans
7.
BMC Med ; 17(1): 68, 2019 03 27.
Article in English | MEDLINE | ID: mdl-30914045

ABSTRACT

Blockchain is a shared distributed digital ledger technology that can better facilitate data management, provenance, and security, and has the potential to transform healthcare. Importantly, blockchain represents a data architecture whose application goes far beyond Bitcoin, the cryptocurrency that relies on blockchain and has popularized the technology. In the health sector, blockchain is being aggressively explored by various stakeholders to optimize business processes, lower costs, improve patient outcomes, enhance compliance, and enable better use of healthcare-related data. However, critical to assessing whether blockchain can fulfill the hype of a technology characterized as 'revolutionary' and 'disruptive' is the need to ensure that blockchain design elements consider actual healthcare needs from the diverse perspectives of consumers, patients, providers, and regulators. In addition to answering the real needs of healthcare stakeholders, blockchain approaches must also be responsive to the unique challenges faced in healthcare compared with other sectors of the economy. In this sense, ensuring that a health blockchain is 'fit-for-purpose' is pivotal. This concept forms the basis for this article, in which we share views from a multidisciplinary group of practitioners at the forefront of blockchain conceptualization, development, and deployment.


Subject(s)
Biomedical Technology , Computer Communication Networks , Delivery of Health Care/trends , Management Information Systems , Medical Informatics , Biomedical Technology/methods , Biomedical Technology/organization & administration , Biomedical Technology/trends , Computer Communication Networks/organization & administration , Computer Communication Networks/standards , Computer Communication Networks/supply & distribution , Computer Communication Networks/trends , Data Warehousing/methods , Data Warehousing/trends , Delivery of Health Care/methods , Delivery of Health Care/organization & administration , Electronic Data Processing/methods , Electronic Data Processing/organization & administration , Electronic Data Processing/trends , Equipment and Supplies Utilization/organization & administration , Equipment and Supplies Utilization/trends , High-Throughput Screening Assays/standards , Humans , Management Information Systems/standards , Management Information Systems/trends , Medical Informatics/methods , Medical Informatics/organization & administration , Medical Informatics/trends , Medical Records/standards
8.
Comput Methods Programs Biomed ; 181: 104825, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30612785

ABSTRACT

OBJECTIVE: To identify common temporal evolution profiles in biological data and propose a semi-automated method to detect these patterns in a clinical data warehouse (CDW). MATERIALS AND METHODS: We leveraged the CDW of the European Hospital Georges Pompidou and tracked the evolution of 192 biological parameters over a period of 17 years (more than 445,000 patients and 131 million laboratory test results). RESULTS: We identified three common profiles of evolution: discretization, breakpoints, and trends. We developed computational and statistical methods to identify these profiles in the CDW. Overall, of the 192 observed biological parameters (87,814,136 values), 135 presented at least one evolution. We identified breakpoints in 30 distinct parameters, discretizations in 32, and trends in 79. DISCUSSION AND CONCLUSION: Our method allowed the identification of several temporal events in the data. Considering the distribution of these events over time, we identified probable causes for the observed profiles: instrument or software upgrades and changes in computation formulas. We evaluated the potential impact for data reuse. Finally, we formulated recommendations to enable safe use and sharing of biological data collections and to limit the impact of data evolution in retrospective and federated studies (e.g., the annotation of laboratory parameters presenting breakpoints or trends).
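
The abstract names breakpoints as one of the three profiles; below is a minimal Python sketch of breakpoint detection via a rolling-mean shift, on synthetic data. The window size, threshold logic, and simulated instrument-upgrade shift are assumptions for illustration, not the authors' method.

```python
# Hedged sketch: locate a candidate breakpoint in a lab-parameter time series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic parameter: an instrument upgrade shifts the mean mid-series.
values = np.concatenate([rng.normal(5.0, 0.5, 200), rng.normal(6.5, 0.5, 200)])
dates = pd.date_range("2004-01-01", periods=400, freq="W")
series = pd.Series(values, index=dates)

window = 30
before = series.rolling(window).mean()                 # trailing mean
after = series[::-1].rolling(window).mean()[::-1]      # leading mean

# A large gap between trailing and leading means suggests a breakpoint.
shift = (after - before).abs()
breakpoint_date = shift.idxmax()
print(f"Candidate breakpoint around {breakpoint_date.date()}, "
      f"shift of {shift.max():.2f} units")
```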


Subject(s)
Clinical Laboratory Services/statistics & numerical data , Data Accuracy , Data Warehousing/methods , Electronic Health Records/statistics & numerical data , Information Storage and Retrieval , Medical Informatics/methods , Automation , Database Management Systems , France/epidemiology , Humans , Pattern Recognition, Automated , Reproducibility of Results , Retrospective Studies , Software , Systems Integration , Time Factors
9.
Arch Pathol Lab Med ; 143(4): 518-524, 2019 04.
Article in English | MEDLINE | ID: mdl-30525932

ABSTRACT

CONTEXT: The laboratory total testing process includes preanalytic, analytic, and postanalytic phases, but most laboratory quality improvement efforts address the analytic phase. Expanding quality improvement to the preanalytic and postanalytic phases via medical data warehouses, repositories that include clinical, utilization, and administrative data, can improve patient care by ensuring appropriate test utilization. Cross-department, multidisciplinary collaboration to address gaps and improve patient and system outcomes is beneficial. OBJECTIVE: To demonstrate medical data warehouse utility for characterizing laboratory-associated quality gaps amenable to preanalytic or postanalytic interventions. DESIGN: A multidisciplinary team identified quality gaps. Medical data warehouse data were queried to characterize the gaps. Organizational leaders were interviewed about quality improvement priorities. A decision aid with elements including national guidelines, local and national importance, and measurable outcomes was completed for each gap. RESULTS: Gaps identified included (1) test ordering; (2) diagnosis, detection, and documentation; and (3) high-risk medication monitoring. After examination of medical data warehouse data, including enrollment, diagnoses, laboratory, pharmacy, and procedures for baseline performance, high-risk medication monitoring was selected, specifically alanine aminotransferase, aspartate aminotransferase, complete blood count, and creatinine testing among patients receiving disease-modifying antirheumatic drugs. The test utilization gap was in monitoring timeliness (e.g., >60% of patients had a monitoring gap exceeding the guideline-recommended frequency). Other contributors to selecting this gap were organizational enthusiasm, regulatory labeling, and the feasibility of a significant laboratory role in addressing the gap. CONCLUSIONS: A multidisciplinary process facilitated identification and selection of a laboratory medicine quality gap. Medical data warehouse data were instrumental in characterizing gaps.
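
The monitoring-timeliness metric lends itself to a short worked example. Below is a minimal pandas sketch of computing per-patient gaps between consecutive tests and the share of patients exceeding a guideline interval; the 90-day interval and the table layout are assumptions, as the abstract does not state the recommended frequency.

```python
# Hedged sketch: quantify monitoring-timeliness gaps from a warehouse extract.
import pandas as pd

# Hypothetical extract: one row per liver-enzyme test for patients on
# disease-modifying antirheumatic drugs (DMARDs).
labs = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "test_date": pd.to_datetime(
        ["2018-01-05", "2018-04-01", "2018-11-20", "2018-02-10", "2018-03-15"]),
})

GUIDELINE_DAYS = 90  # assumed recommended monitoring interval

labs = labs.sort_values(["patient_id", "test_date"])
# Days elapsed between consecutive tests for the same patient.
labs["gap_days"] = labs.groupby("patient_id")["test_date"].diff().dt.days

# Share of patients with at least one gap exceeding the guideline.
has_gap = labs.groupby("patient_id")["gap_days"].max() > GUIDELINE_DAYS
print(f"{has_gap.mean():.0%} of patients had a monitoring gap")
```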


Subject(s)
Data Warehousing/methods , Laboratories/standards , Laboratory Proficiency Testing/methods , Quality Assurance, Health Care/methods , Humans
10.
J Am Med Inform Assoc ; 25(10): 1331-1338, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30085008

ABSTRACT

Objective: Healthcare organizations use research data models supported by the projects and tools that interest them, which often means organizations must support the same data in multiple models. The healthcare research ecosystem would benefit if tools and projects could be adopted independently of the underlying data model. Here, we introduce the concept of a reusable application programming interface (API) for healthcare and show that the i2b2 API can be adapted to support diverse patient-centric data models. Materials and Methods: We develop a methodology for extending i2b2's pre-existing API to query additional data models, using i2b2's recent "multi-fact-table querying" feature. Our method involves developing data-model-specific i2b2 ontologies and mapping these to query non-standard table structures. Results: We implement this methodology to query the OMOP and PCORnet models, which we validate with the i2b2 query tool. We implement the entire PCORnet data model and a five-domain subset of the OMOP model. We also demonstrate that additional, ancillary data model columns can be modeled and queried as i2b2 "modifiers." Discussion: i2b2's REST API can be used to query multiple healthcare data models, enabling shared tooling to have a choice of backend data stores. This separates the data model from the software tooling for some of the more popular open analytic data models in healthcare. Conclusion: This methodology immediately allows querying OMOP and PCORnet using the i2b2 API. It is released as an open-source set of Docker images, and also on the i2b2 community wiki.
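
To make the mapping idea concrete, here is a hedged Python sketch of an ontology row that points an i2b2 concept at a non-standard fact table (OMOP's condition_occurrence) and of the SQL a query engine might derive from it. The column names follow the commonly documented i2b2 ontology-cell layout but should be treated as assumptions, not the authors' exact artifacts; the concept IDs are illustrative.

```python
# Hedged sketch: an i2b2-style ontology row redirected at an OMOP table.
ontology_row = {
    "c_fullname": "\\OMOP\\Conditions\\Diabetes\\",
    "c_name": "Diabetes mellitus",
    # Which table/column the query engine should hit for this concept:
    "c_tablename": "condition_occurrence",   # OMOP table, not observation_fact
    "c_facttablecolumn": "person_id",
    "c_columnname": "condition_concept_id",
    "c_operator": "IN",
    "c_dimcode": "201820, 201826",            # illustrative OMOP concept IDs
}

# The engine would expand such a row into SQL along these lines:
sql = (
    f"SELECT {ontology_row['c_facttablecolumn']} "
    f"FROM {ontology_row['c_tablename']} "
    f"WHERE {ontology_row['c_columnname']} "
    f"{ontology_row['c_operator']} ({ontology_row['c_dimcode']})"
)
print(sql)
```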


Subject(s)
Big Data , Data Warehousing/methods , Electronic Health Records , Internet , Biomedical Research , Databases, Factual , Humans , Models, Theoretical , Software , Vocabulary, Controlled
11.
J Diabetes Complications ; 32(7): 650-654, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29903409

ABSTRACT

AIMS: This study validated enterprise data warehouse (EDW) data for a cohort of hospitalized patients with a primary diagnosis of diabetic ketoacidosis (DKA). METHODS: 247 patients with 319 admissions for DKA (ICD-9 codes 250.12, 250.13, or 250.xx with biochemical criteria for DKA) were admitted to Northwestern Memorial Hospital from 1/1/2010 to 9/1/2013. Validation was performed by electronic medical record (EMR) review of 10% of admissions (N = 32). Classification of diabetes type (Type 1 vs. Type 2) and DKA clinical status were compared between the EMR review and the EDW data. RESULTS: Key findings included incorrect classification of diabetes type in 5 of 32 (16%) admissions and indeterminable classification in 5 admissions. Based on the review, DKA was not present in 11 of 32 (34%) admissions; based on biochemical criteria, DKA was not present in 15 of 32 (47%) admissions. CONCLUSIONS: This study found that EDW data contained substantial errors. Some discrepancies can be addressed by refining the EDW query code, while others, related to diabetes classification and DKA diagnosis, cannot be corrected without improving clinical coding accuracy, consistency of medical record documentation, or EMR design. These results support the need for comprehensive validation of data for complex clinical populations obtained through data repositories such as the EDW.
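
The validation design reduces to a label-by-label comparison on a sampled subset. Below is a minimal Python sketch of that comparison; the data frame and its values are fabricated for illustration and do not reproduce the study's numbers.

```python
# Hedged sketch: EDW-derived labels vs. manual EMR review on a sample.
import pandas as pd

sample = pd.DataFrame({
    "admission_id": range(1, 7),
    "edw_diabetes_type": ["T1", "T1", "T2", "T1", "T2", "T1"],
    "emr_diabetes_type": ["T1", "T2", "T2", "T1", "T1", "T1"],
    "edw_dka": [True] * 6,                       # EDW flagged all as DKA
    "emr_dka": [True, True, False, True, False, True],
})

# Rate of diabetes-type disagreement between warehouse and chart review.
type_mismatch = (sample["edw_diabetes_type"] != sample["emr_diabetes_type"]).mean()
# Rate of admissions where review did not confirm DKA.
dka_absent = (~sample["emr_dka"]).mean()

print(f"Diabetes type misclassified in {type_mismatch:.0%} of sampled admissions")
print(f"DKA not confirmed on review in {dka_absent:.0%} of sampled admissions")
```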


Subject(s)
Data Warehousing , Diabetic Ketoacidosis/epidemiology , Electronic Health Records , Adult , Aged , Cohort Studies , Data Warehousing/methods , Data Warehousing/standards , Datasets as Topic/standards , Diabetes Mellitus, Type 1/epidemiology , Diabetes Mellitus, Type 2/epidemiology , Electronic Health Records/organization & administration , Electronic Health Records/standards , Electronic Health Records/supply & distribution , Female , Hospitalization/statistics & numerical data , Humans , Male , Middle Aged , Retrospective Studies
13.
Healthc Q ; 20(3): 22-28, 2017.
Article in English | MEDLINE | ID: mdl-29132446

ABSTRACT

Closed Loop Analytics© is receiving growing interest in healthcare as a term referring to information technology, local data, and clinical analytics working together to generate evidence for improvement. The Closed Loop Analytics model consists of three loops corresponding to the decision-making levels of an organization and the associated data within each loop - Patients, Protocols, and Populations. The authors propose that each of these levels should utilize the same ecosystem of electronic health record (EHR)- and enterprise data warehouse (EDW)-enabled data, in a closed-loop fashion, with that data being repackaged and delivered to suit the analytic and decision-support needs of each level, in support of better outcomes.


Subject(s)
Data Warehousing/methods , Delivery of Health Care/organization & administration , Electronic Health Records/organization & administration , Decision Making , Humans , Quality of Health Care
14.
Database (Oxford) ; 20172017 01 01.
Article in English | MEDLINE | ID: mdl-28605774

ABSTRACT

Database URL: http://GenomeHubs.org. As the generation and use of genomic datasets become increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool for making genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and the import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with the results of standard analyses and functional annotations, and support export to flat files, including EMBL format for submission of assemblies and annotations to the International Nucleotide Sequence Database Collaboration.


Subject(s)
Data Warehousing/methods , Databases, Nucleic Acid , Genome , Internet , Sequence Analysis, DNA/methods , Web Browser , Animals , Humans
15.
AMIA Annu Symp Proc ; 2017: 1411-1420, 2017.
Article in English | MEDLINE | ID: mdl-29854210

ABSTRACT

Research data warehouses integrate research and patient data from one or more sources into a single data model designed for research. Typically, institutions update their warehouse by fully reloading it periodically. The alternative is to update the warehouse incrementally with new, changed, and/or deleted data. Full reloads avoid having to correct and add to a live system, but they can render the data outdated for clinical trial accrual. They place a substantial burden on source systems, involve intermittent work that is challenging to resource, and may require tight coordination across IT and informatics units. We have implemented daily incremental updating for our i2b2 data warehouse. Incremental updating requires substantial up-front development, and it can expose provisional data to investigators. However, it may support more use cases, it may be a better fit for academic healthcare IT organizational structures, and ongoing support needs appear to be similar or lower.
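
The core of incremental updating is applying only new or changed rows rather than truncating and reloading. Here is a minimal sketch using SQLite's upsert for portability; the observation_fact-style table is a simplification of a real i2b2 star schema, and the column set is an assumption.

```python
# Hedged sketch: incremental upsert instead of a full truncate-and-reload.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE obs_fact (
    patient_num INTEGER, concept_cd TEXT, start_date TEXT, value REAL,
    PRIMARY KEY (patient_num, concept_cd, start_date))""")

def incremental_update(rows):
    """Apply only new/changed rows; existing keys are updated in place."""
    conn.executemany("""
        INSERT INTO obs_fact (patient_num, concept_cd, start_date, value)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(patient_num, concept_cd, start_date)
        DO UPDATE SET value = excluded.value""", rows)
    conn.commit()

incremental_update([(1, "LOINC:1742-6", "2017-05-01", 31.0)])
incremental_update([(1, "LOINC:1742-6", "2017-05-01", 33.0),   # changed value
                    (2, "LOINC:1742-6", "2017-05-02", 22.0)])  # new row
print(conn.execute("SELECT * FROM obs_fact").fetchall())
```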


Subject(s)
Biomedical Research/organization & administration , Data Warehousing/methods , Databases as Topic/organization & administration , Humans
16.
Mod Healthc ; 47(8): 20-22, 2017 Feb.
Article in English | MEDLINE | ID: mdl-30605579

ABSTRACT

Hospitals treating childhood emergencies need instant access to records. By building data warehouses to centralize that information, they've discovered a new tool for preventing emergencies in the first place.


Subject(s)
Data Warehousing/methods , Electronic Health Records , Emergency Service, Hospital/organization & administration , Hospitals, Pediatric/organization & administration , Humans , Information Storage and Retrieval , Organizational Case Studies , Practice Patterns, Physicians' , Quality Improvement , Texas
17.
BMJ Open ; 6(8): e010962, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27491665

ABSTRACT

INTRODUCTION: Blood transfusion has health-related, economic, and safety implications. In order to optimise the transfusion chain, comprehensive research data are needed. The Dutch Transfusion Data warehouse (DTD) project aims to establish a data warehouse where data from donors and transfusion recipients are linked. This paper describes the design of the data warehouse, its challenges, and illustrative applications. STUDY DESIGN AND METHODS: Quantitative data on blood donors (eg, age, blood group, antibodies) and products (type of product, processing, storage time) are obtained from the national blood bank. These are linked to data on the transfusion recipients (eg, transfusions administered, patient diagnosis, surgical procedures, laboratory parameters), which are extracted from hospital electronic health records. APPLICATIONS: Expected scientific contributions are illustrated for four applications: determining risk factors, predicting blood use, benchmarking blood use, and optimising process efficiency. For each application, examples of research questions are given and planned analyses described. CONCLUSIONS: The DTD project aims to build a national, continuously updated transfusion data warehouse. These data have a wide range of applications on the donor/production side, in recipient studies on blood usage and benchmarking, and in donor-recipient studies, and can ultimately contribute to the efficiency and safety of blood transfusion.
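
The donor-product-recipient linkage is the structural heart of the design. Below is a minimal pandas sketch of that three-way join; identifiers and columns are illustrative only, not the DTD's actual schema.

```python
# Hedged sketch: link donor, product, and recipient records for analysis.
import pandas as pd

donors = pd.DataFrame({"donor_id": [10, 11], "blood_group": ["O+", "A-"]})
products = pd.DataFrame({
    "product_id": [100, 101], "donor_id": [10, 11],
    "product_type": ["RBC", "platelets"], "storage_days": [12, 4],
})
transfusions = pd.DataFrame({
    "product_id": [100, 101], "patient_id": [900, 901],
    "diagnosis": ["hip surgery", "AML"],
})

# Donor -> product -> recipient chain: the unit of analysis for
# donor-recipient studies such as predicting or benchmarking blood use.
linked = (transfusions
          .merge(products, on="product_id")
          .merge(donors, on="donor_id"))
print(linked[["patient_id", "diagnosis", "product_type",
              "blood_group", "storage_days"]])
```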


Subject(s)
Blood Transfusion , Data Warehousing/methods , Blood Donors , Data Collection , Data Warehousing/standards , Evaluation Studies as Topic , Humans , Netherlands , Research Design , Risk Factors
19.
J Digit Imaging ; 29(3): 309-13, 2016 06.
Article in English | MEDLINE | ID: mdl-26518194

ABSTRACT

In 2010, the DICOM Data Warehouse (DDW) was launched as a data warehouse for DICOM metadata. Its chief design goals were a flexible database schema able to index standard patient and study information as well as modality-specific tags (public and private), and a framework to derive computable information (derived tags) from those items. Furthermore, it was to map the above information to an internally standard lexicon that enables a programmer who is not DICOM-savvy to write standard SQL queries and retrieve the equivalent data from a cohort of scanners, regardless of which tag held that data element across the changing epochs of DICOM and the ensuing migration of elements from private to public tags. After 5 years, the original design has scaled astonishingly well. Very little has changed in the database schema. The knowledge base is now fluent in over 90 device types. Additional stored procedures have been written to compute data that is derivable from standard or mapped tags. Finally, an early concern that the system would not be able to handle the variability of DICOM-SR objects has been addressed. As of this writing, the system is indexing 300 MR, 600 CT, and 2,000 other (XA, DR, CR, MG) imaging studies per day. The only remaining issue is the case of tags that were not prospectively indexed; indeed, this final challenge may lead to a NoSQL, big data approach in a subsequent version.
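
The "internally standard lexicon" idea can be shown in a few lines: a mapping table translates scanner-specific DICOM tags to one canonical name so a plain SQL query works across vendors. The schema and the vendor-B private tag below are illustrative assumptions, not the DDW's actual design.

```python
# Hedged sketch: query by canonical name across vendor-specific DICOM tags.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_map (device_type TEXT, dicom_tag TEXT, canonical_name TEXT);
CREATE TABLE study_tags (study_id INTEGER, device_type TEXT,
                         dicom_tag TEXT, value TEXT);
-- The same quantity lives in different tags on different scanners.
INSERT INTO tag_map VALUES ('CT_vendor_A', '(0018,9345)', 'ctdi_vol');
INSERT INTO tag_map VALUES ('CT_vendor_B', '(0019,10B3)', 'ctdi_vol');
INSERT INTO study_tags VALUES (1, 'CT_vendor_A', '(0018,9345)', '12.4');
INSERT INTO study_tags VALUES (2, 'CT_vendor_B', '(0019,10B3)', '9.8');
""")

# A non-DICOM-savvy analyst queries by canonical name only.
rows = conn.execute("""
    SELECT s.study_id, s.value
    FROM study_tags s
    JOIN tag_map m ON m.device_type = s.device_type
                  AND m.dicom_tag = s.dicom_tag
    WHERE m.canonical_name = 'ctdi_vol'
""").fetchall()
print(rows)  # dose values across both vendors, one query
```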


Subject(s)
Data Warehousing/methods , Information Storage and Retrieval/methods , Radiology Information Systems , Software Design , Databases, Factual , Humans