Results 1 - 20 of 71
1.
Sensors (Basel) ; 24(13)2024 Jun 29.
Article in English | MEDLINE | ID: mdl-39001017

ABSTRACT

The transition to smart manufacturing introduces heightened complexity in regard to the machinery and equipment used within modern collaborative manufacturing landscapes, presenting significant risks associated with equipment failures. The core ambition of smart manufacturing is to elevate automation through the integration of state-of-the-art technologies, including artificial intelligence (AI), the Internet of Things (IoT), machine-to-machine (M2M) communication, cloud technology, and expansive big data analytics. This technological evolution underscores the necessity for advanced predictive maintenance strategies that proactively detect equipment anomalies before they escalate into costly downtime. Addressing this need, our research presents an end-to-end platform that merges the organizational capabilities of data warehousing with the computational efficiency of Apache Spark. This system adeptly manages voluminous time-series sensor data, leverages big data analytics for the seamless creation of machine learning models, and utilizes an Apache Spark-powered engine for the instantaneous processing of streaming data for fault detection. This comprehensive platform exemplifies a significant leap forward in smart manufacturing, offering a proactive maintenance model that enhances operational reliability and sustainability in the digital manufacturing era.

2.
Drug Alcohol Depend ; 262: 111392, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39029371

ABSTRACT

BACKGROUND: Little is known about how use patterns of medications for opioid use disorder (MOUDs) evolve from pre-incarceration to post-incarceration among incarcerated individuals with opioid use disorder. This article describes pre- and post-incarceration MOUD receipt during a period when naltrexone was the only type of MOUD offered in a state prison system, the Massachusetts Department of Correction (MADOC). METHODS: A retrospective cohort study of individuals with opioid use disorder who had an incarceration episode in MADOC during January 2015 to March 2019. The data source was the Massachusetts Public Health Data Warehouse, a multi-sector data platform that links individual-level data from multiple statewide datasets. We described patterns of MOUD receipt during the four weeks prior to and after an incarceration episode. Multivariable logistic regression models characterized predictors of post-incarceration MOUD receipt. RESULTS: In the male sample (n=691 incarcerations), from the pre- to post-incarceration periods, receipt of buprenorphine increased (14.3 % to 18.3 %), naltrexone increased (5.0 % to 10.5 %), and methadone decreased (4.7 % to 1.7 %). Similarly, in the female sample (n=892 incarcerations), from the pre- to post-incarceration periods, receipt of buprenorphine increased (10.3 % to 12.3 %), naltrexone increased (4.5 % to 9.3 %), and methadone decreased (5.0 % to 2.9 %). Much of the post-release naltrexone receipt occurred among participants in MADOC's pre-release naltrexone program. CONCLUSIONS: MOUD receipt was low but increased slightly in the post-incarceration period. This change was driven by increases in buprenorphine and naltrexone and despite decreases in methadone.

3.
J Med Internet Res ; 26: e56686, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38749399

ABSTRACT

BACKGROUND: Asia consists of diverse nations with extremely variable health care systems. Integrated real-world data (RWD) research warehouses provide vast interconnected data sets that uphold statistical rigor. Yet, their intricate details remain underexplored, restricting their broader applications. OBJECTIVE: Building on our previous research that analyzed integrated RWD warehouses in India, Thailand, and Taiwan, this study extends the research to 7 distinct health care systems: Hong Kong, Indonesia, Malaysia, Pakistan, the Philippines, Singapore, and Vietnam. We aimed to map the evolving landscape of RWD, preferences for methodologies, and database use and archetype the health systems based on existing intrinsic capability for RWD generation. METHODS: A systematic scoping review methodology was used, centering on contemporary English literature on PubMed (search date: May 9, 2023). Rigorous screening as defined by eligibility criteria identified RWD studies from multiple health care facilities in at least 1 of the 7 target Asian nations. Point estimates and their associated errors were determined for the data collected from eligible studies. RESULTS: Of the 1483 real-world evidence citations identified on May 9, 2023, a total of 369 (24.9%) fulfilled the requirements for data extraction and subsequent analysis. Singapore, Hong Kong, and Malaysia contributed to ≥100 publications, with each country marked by a higher proportion of single-country studies at 51% (80/157), 66.2% (86/130), and 50% (50/100), respectively, and were classified as solo scholars. Indonesia, Pakistan, Vietnam, and the Philippines had fewer publications and a higher proportion of cross-country collaboration studies (CCCSs) at 79% (26/33), 58% (18/31), 74% (20/27), and 86% (19/22), respectively, and were classified as global collaborators. Collaboration with countries outside the 7 target nations appeared in 84.2% to 97.7% of the CCCSs of each nation. 
Among target nations, Singapore and Malaysia emerged as preferred research partners for other nations. From 2018 to 2023, most nations showed an increasing trend in study numbers, with Vietnam (24.5%) and Pakistan (21.2%) leading the growth; the only exception was the Philippines, which declined by 14.5%. Clinical registry databases were predominant across all CCCSs from every target nation. For single-country studies, Indonesia, Malaysia, and the Philippines favored clinical registries; Singapore had a balanced use of clinical registries and electronic medical or health records, whereas Hong Kong, Pakistan, and Vietnam leaned toward electronic medical or health records. Overall, 89.9% (310/345) of the studies took >2 years from completion to publication. CONCLUSIONS: The observed variations in contemporary RWD publications across the 7 nations in Asia exemplify distinct research landscapes across nations that are partially explained by their diverse economic, clinical, and research settings. Nevertheless, recognizing these variations is pivotal for fostering tailored, synergistic strategies that amplify RWD's potential in guiding future health care research and policy decisions. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/43741.


Subject(s)
Delivery of Health Care; Humans; Delivery of Health Care/statistics & numerical data; Asia; Vietnam; Philippines; Indonesia; Malaysia; Pakistan; Singapore; Databases, Factual

5.
JMIR Med Inform ; 11: e42477, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38100200

ABSTRACT

BACKGROUND: In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE: The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS: This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS: We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS: CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
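The symbolic approach described above (regular expressions driven by a specialized lexicon such as a drug list) can be illustrated with a minimal sketch. The lexicon contents, the function names, and the sample sentence are assumptions made for illustration; they are not taken from any of the reviewed studies.

```python
import re

# Hypothetical drug lexicon; a real system would load a curated terminology.
DRUG_LEXICON = ["amiodarone", "bisoprolol", "rituximab"]


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any lexicon term as a whole word."""
    # Longest terms first so that longer entries win over shorter prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b({alternatives})\b", flags=re.IGNORECASE)


def extract_mentions(text, pattern):
    """Return (normalized term, character offset) for every lexicon hit in the text."""
    return [(m.group(1).lower(), m.start()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    pattern = build_pattern(DRUG_LEXICON)
    note = "Patient on Bisoprolol 5 mg; amiodarone started."
    for term, offset in extract_mentions(note, pattern):
        print(term, offset)
```

Pattern matching of this kind is transparent and needs no training data, which may be one reason such methods remain common, but it does not handle misspellings, negation, or context.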

6.
JMIR Med Inform ; 11: e46725, 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38153801

ABSTRACT

Background: In recent years, many researchers have focused on the use of legacy data, such as pooled analyses that collect and reanalyze data from multiple studies. However, the methodology for the integration of preexisting databases whose data were collected for different purposes has not been established. Previously, we developed a tool to efficiently generate Study Data Tabulation Model (SDTM) data from hypothetical clinical trial data using the Clinical Data Interchange Standards Consortium (CDISC) SDTM. Objective: This study aimed to design a practical model for integrating preexisting databases using the CDISC SDTM. Methods: Data integration was performed in three phases: (1) the confirmation of the variables, (2) SDTM mapping, and (3) the generation of the SDTM data. In phase 1, the definitions of the variables in detail were confirmed, and the data sets were converted to a vertical structure. In phase 2, the items derived from the SDTM format were set as mapping items. Three types of metadata (domain name, variable name, and test code), based on the CDISC SDTM, were embedded in the Research Electronic Data Capture (REDCap) field annotation. In phase 3, the data dictionary, including the SDTM metadata, was outputted in the Operational Data Model (ODM) format. Finally, the mapped SDTM data were generated using REDCap2SDTM version 2. Results: SDTM data were generated as a comma-separated values file for each of the 7 domains defined in the metadata. A total of 17 items were commonly mapped to 3 databases. Because the SDTM data were set in each database correctly, we were able to integrate 3 independently preexisting databases into 1 database in the CDISC SDTM format. Conclusions: Our project suggests that the CDISC SDTM is useful for integrating multiple preexisting databases.
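Phase 1 (conversion to a vertical structure) and phase 2 (attaching domain name, variable name, and test code to each item) can be sketched roughly as follows. This is an illustrative simplification, not the REDCap2SDTM implementation: the field names, the mapping table, and the output keys are all hypothetical, and the rows produced are not claimed to be SDTM-conformant.

```python
# Hypothetical mapping: source field -> (domain name, variable name, test code).
FIELD_MAPPING = {
    "sbp": ("VS", "VSORRES", "SYSBP"),
    "weight_kg": ("VS", "VSORRES", "WEIGHT"),
}


def to_vertical(records, id_field, mapping):
    """Turn one-row-per-subject records into one-row-per-measurement rows."""
    rows = []
    for record in records:
        for field, (domain, variable, test_code) in mapping.items():
            value = record.get(field)
            if value is None:
                continue  # skip items missing in this source database
            rows.append(
                {
                    "subject": record[id_field],
                    "domain": domain,
                    "variable": variable,
                    "test_code": test_code,
                    "value": value,
                }
            )
    return rows


if __name__ == "__main__":
    source = [
        {"id": "A1", "sbp": 120, "weight_kg": 70.5},
        {"id": "A2", "sbp": 135, "weight_kg": None},
    ]
    for row in to_vertical(source, "id", FIELD_MAPPING):
        print(row)
```

Once every database has been reduced to the same vertical shape with shared metadata, rows from different sources can simply be concatenated, which is the property the integration relies on.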

7.
JAMIA Open ; 6(3): ooad068, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37583654

ABSTRACT

Objective: i2b2 offers the possibility to store biomedical data of different projects in subject-oriented data marts of the data warehouse, which potentially requires data replication between different projects and also data synchronization in case of data changes. We present an approach that can save this effort and assess its query performance in a case study that reflects real-world scenarios. Material and Methods: For data segregation, we used PostgreSQL's row-level security (RLS) feature, the unit test framework pgTAP for validation and testing as well as the i2b2 application. No change of the i2b2 code was required. Instead, to leverage orchestration and deployment, we additionally implemented a command-line interface (CLI). We evaluated performance using 3 different queries generated by i2b2, which we performed on an enlarged Harvard demo dataset. Results: We introduce the open-source Python CLI i2b2rls, which orchestrates and manages security roles to implement data marts so that they do not need to be replicated and synchronized as different i2b2 projects. Our evaluation showed that our approach is on average 3.55 and on median 2.71 times slower compared to classic i2b2 data marts, but has more flexibility and easier setup. Conclusion: The RLS-based approach is particularly useful in a scenario with many projects, where data is constantly updated, user and group requirements change frequently or complex user authorization requirements have to be defined. The approach applies to both the i2b2 interface and direct database access.

8.
J Am Med Inform Assoc ; 30(12): 1985-1994, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37632234

ABSTRACT

OBJECTIVE: Patients who receive most care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts have low data missingness, which can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort. MATERIALS AND METHODS: We implemented a computable program for the widely adopted i2b2 platform that identifies loyalty cohorts in EHRs based on a machine-learning model, which was previously validated using linked claims data. We developed a novel validation approach, which tests, using only EHR data, whether patients returned to the same healthcare system after the training period. We evaluated these tools at 3 institutions using data from 2017 to 2019. RESULTS: Loyalty cohort calculations to identify patients who returned during a 1-year follow-up yielded a mean area under the receiver operating characteristic curve of 0.77 using the original model and 0.80 after calibrating the model at individual sites. Factors such as multiple medications or visits contributed significantly at all sites. Screening tests' contributions (eg, colonoscopy) varied across sites, likely due to coding and population differences. DISCUSSION: This open-source implementation of a "loyalty score" algorithm had good predictive power. Enriching research cohorts by utilizing these low-missingness patients is a way to obtain the data completeness necessary for accurate causal analysis. CONCLUSION: i2b2 sites can use this approach to select cohorts with mostly complete EHR data.
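The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen patient who returned receives a higher score than a randomly chosen patient who did not. The sketch below computes it directly from that pairwise definition; the function name and the toy scores are assumptions for illustration and have no connection to the published loyalty-score model.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 1 = patient returned during follow-up, 0 = did not.
    toy_scores = [0.9, 0.2, 0.8, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(auroc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in cohort size, so it is suitable only for small illustrations; rank-based formulations give the same value far more efficiently.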


Subject(s)
Algorithms; Electronic Health Records; Humans; Machine Learning; Delivery of Health Care; Electronics

9.
Stud Health Technol Inform ; 305: 287-290, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37387019

ABSTRACT

Data harmonization is an important step in large-scale data analysis and for generating evidence on real-world data in healthcare. With the OMOP common data model, a relevant instrument for data harmonization is available that is being promoted by different networks and communities. At the Hannover Medical School (MHH) in Germany, an Enterprise Clinical Research Data Warehouse (ECRDW) is established and harmonization of that data source is the focus of this work. We present MHH's first implementation of the OMOP common data model on top of the ECRDW data source and demonstrate the challenges concerning the mapping of German healthcare terminologies to a standardized format.


Subject(s)
Data Analysis; Data Warehousing; Germany; Health Facilities; Schools, Medical

10.
J Biomed Inform ; 140: 104325, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36870586

ABSTRACT

Monoclonal antibodies (MAs) are increasingly used in the therapeutic arsenal. Clinical Data Warehouses (CDWs) offer unprecedented opportunities for research on real-world data. The objective of this work is to develop a knowledge organization system on MAs for therapeutic use (MATUs) applicable in Europe to query CDWs from a multi-terminology server (HeTOP). After expert consensus, three main health thesauri were selected: the MeSH thesaurus, the National Cancer Institute thesaurus (NCIt) and the SNOMED CT. These thesauri contain 1,723 MAs concepts, but only 99 (5.7 %) are identified as MATUs. The knowledge organisation system proposed in this article is a six-level hierarchical system according to their main therapeutic target. It includes 193 different concepts organised in a cross lingual terminology server, which will allow the inclusion of semantic extensions. Ninety-nine (51.3 %) MATUs concepts and 94 (48.7 %) hierarchical concepts composed the knowledge organisation system. Two separate groups (an expert group and a validation group) carried out the selection, creation and validation processes. Queries identify, for unstructured data, 83 out of 99 (83.8 %) MATUs corresponding to 45,262 patients, 347,035 hospital stays and 427,544 health documents, and for structured data, 61 out of 99 (61.6 %) MATUs corresponding to 9,218 patients, 59,643 hospital stays and 104,737 hospital prescriptions. The volume of data in the CDW demonstrated the potential for using these data in clinical research, although not all MATUs are present in the CDW (16 missing for unstructured data and 38 for structured data). The knowledge organisation system proposed here improves the understanding of MATUs and the quality of queries, and helps clinical researchers retrieve relevant medical information. The use of this model in a CDW allows for the rapid identification of a large number of patients and health documents, either directly by a MATU of interest (e.g. Rituximab) or by searching for parent concepts (e.g. Anti-CD20 Monoclonal Antibody).
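Searching by a parent concept amounts to expanding that concept into all of its descendants in the hierarchy before querying. A minimal sketch of that expansion follows; the toy hierarchy contains only the parent and child named in the abstract plus a hypothetical root, and the data structure and function name are assumptions, not the HeTOP data model.

```python
from collections import deque

# Toy hierarchy: parent concept -> direct child concepts.
# The root label is hypothetical; the other two labels come from the abstract.
CHILDREN = {
    "Monoclonal Antibody for Therapeutic Use": ["Anti-CD20 Monoclonal Antibody"],
    "Anti-CD20 Monoclonal Antibody": ["Rituximab"],
}


def expand_concept(concept, children):
    """Return the concept plus all of its descendants (breadth-first traversal)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # A query on the parent concept would then search for every expanded term.
    print(sorted(expand_concept("Anti-CD20 Monoclonal Antibody", CHILDREN)))
```

The `seen` set also guards against cycles, which can occur when concepts from several thesauri are merged into one hierarchy.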


Subject(s)
Antibodies, Monoclonal; Vocabulary, Controlled; Humans; Antibodies, Monoclonal/therapeutic use; Systematized Nomenclature of Medicine; Data Warehousing; Europe

11.
Int J Med Inform ; 170: 104976, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36599261

ABSTRACT

INTRODUCTION: The cytochrome P450 (CYP450) enzyme system is involved in the metabolism of certain drugs and is responsible for most drug interactions. These interactions result in either an enzymatic inhibition or an enzymatic induction mechanism that has an impact on the therapeutic management of patients. Detecting these drug interactions will allow for better predictability in therapeutic response. Therefore, computerized solutions can represent a valuable help for clinicians in their tasks of detection. OBJECTIVE: The objective of this study is to provide a structured data-source of interactions involving the CYP450 enzyme system. These interactions are aimed to be integrated in the cross-lingual multi-terminology server HeTOP (Health Terminologies and Ontologies Portal), to support the query processing of the clinical data warehouse (CDW) EDSaN (Entrepôt de Données de Santé Normand). MATERIAL AND METHODS: A selection and curation of drug components (DCs) that share a relationship with the CYP450 system was performed from several international data sources. The DCs were linked according to the type of relationship which can be substrate, inhibitor, or inducer. These relationships were then integrated into the HeTOP server. To validate the CYP450 relationships, a semantic query was performed on the CDW, whose search engine is founded on HeTOP data (concepts, terms, and relations). RESULTS: A total of 776 DCs are associated by a new interaction relationship, integrated in HeTOP, by 14 enzymes. These are CYP450 1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4, 3A7, 11B1,11B2 mitochondrial and P-glycoprotein, constituting a total of 2,088 relationships. A general modelling of cytochromic interactions was performed. From this model, 233,006 queries were processed in less than two hours, demonstrating the usefulness and performance of our CDW implementation. 
Moreover, they showed that in our university hospital, the concurrent prescription that could cause a cytochromic interaction is Bisoprolol with Amiodarone by enzymatic inhibition for 2,493 patients. DISCUSSION: The queries submitted to the CDW EDSaN made it possible to highlight the molecules most often prescribed simultaneously that are potentially responsible for cytochromic interactions. In a second step, it would be interesting to evaluate the real clinical impact by looking for possible adverse effects of these interactions in the patients' files. Other computational solutions for cytochromic interactions exist. The impact of CYP450 is particularly important for drugs with a narrow therapeutic window (NTW) as they can lead to increased toxicity or therapeutic failure. It is also important to define which drug component is a pro-drug and to consider the many genetic polymorphisms of patients. CONCLUSION: The HeTOP server contains a non-negligible number of relationships between drug components and CYP450 from multiple reference sources. These data allow us to query our Clinical Data Warehouse to highlight these cytochromic interactions. It would be interesting in the future to assess the actual clinical impact in hospital reports.


Subject(s)
Cytochrome P-450 Enzyme System; Data Warehousing; Humans; Cytochrome P-450 Enzyme System/genetics; Cytochrome P-450 Enzyme System/metabolism

12.
Clin Epidemiol ; 14: 1547-1560, 2022.
Article in English | MEDLINE | ID: mdl-36540898

ABSTRACT

Purpose: Antibiotic-resistant bacteremia is a leading global cause of infectious disease morbidity and mortality. Clinical data warehouses (CDWs) allow for the secure, real-time coupling of diverse data sources from real-world clinical settings, including care-based medical-administrative data and laboratory-based microbiological data. The main purpose of this study was to assess the contribution of CDWs in the epidemiological study of antibiotic resistance by constructing a database of bacteremia patients, BactHub, and describing their main clinico-microbiological features and outcomes. Patients and Methods: Adult patients with bacteremia hospitalized between January 1, 2016 and December 31, 2019 in 14 acute care university hospitals from the Greater Paris area were identified; their first bacteremia episode was included. Data describing patients, episodes of bacteremia, bacterial isolates, and antimicrobial resistance were structured. Results: Among 29,228 patients with bacteremia, 41% of episodes were community-onset (CO) and 59% were hospital-acquired (HA). Thirty-day and ninety-day mortality rates were 15% and 20% in CO episodes, and 18% and 36% in HA episodes. Overall resistance rates were high, including third-generation cephalosporin resistance among Klebsiella pneumoniae (CO 21%, HA 37%) and Escherichia coli (CO 13%, HA 17%), and methicillin resistance among Staphylococcus aureus (CO 11%, HA 14%). Annual incidence rates increased significantly from 2017 to 2019, from 20.0 to 20.9 to 22.1 stays with bacteremia per 1000 stays (p < 0.0001). Conclusion: The Bacthub database provides accurate clinico-microbiological data describing bacteremia across France's largest hospital group. Data from Bacthub may inform surveillance and the clinical decision-making process for bacteremia patients, including choice of antimicrobial therapy. 
The database also offers opportunities for research, including analysis of hospital care pathways and significant patient outcomes such as mortality and recurrence of infection.

13.
Eur J Cancer ; 177: 72-79, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36332437

ABSTRACT

AIMS: We analysed the impact of the SARS-CoV-2 pandemic (COVID-19) on the quality of breast cancer care in certified EUSOMA (European Society of Breast Cancer Specialists) breast centres. MATERIALS AND METHODS: The results of the EUSOMA quality indicators were compared, based on pseudonymised individual records, for the periods 1 March 2020 till 30 June 2020 (first COVID-19 peak in most countries in Europe) and 1 March 2019 till 30 June 2019. In addition, a questionnaire was sent to the participating Centres for investigating the impact of the COVID-19 pandemic on the organisation and the quality of breast cancer care. RESULTS: Forty-five centres provided data and 31 (67%) responded to the questionnaire. The total number of new cases dropped by 19% and there was a small but significantly higher tumour (p = 0.003) and lymph node (p = 0.011) stage at presentation. Comparing quality indicators (12,736 patients) by multivariable analysis showed mostly non-significant differences. Surgery could be performed in a COVID-free zone in 94% of the centres, COVID testing was performed before surgery in 96% of the centres, and surgical case load was reduced in 55% of the centres. Modifications of the indications for neoadjuvant endocrine therapy, chemotherapy, and targeted therapy were necessary in 23%, 23%, and 10% of the centres; changes in indications for adjuvant endocrine, chemo-, targeted, immune, and radiotherapy in 3%, 19%, 3%, 6%, and 10%, respectively. CONCLUSION: Quality of breast cancer care was well maintained in EUSOMA breast centres during the first wave of the COVID-19 pandemic. A small but significantly higher tumour and lymph node stage at presentation was observed.


Subject(s)
Breast Neoplasms; COVID-19; Humans; Female; Pandemics; SARS-CoV-2; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/therapy; Breast Neoplasms/pathology; COVID-19 Testing

14.
JMIR Med Inform ; 10(11): e36711, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36318244

ABSTRACT

BACKGROUND: Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDWs records with the FNMD combines multiple challenges: absence of unique common identifiers between the 2 databases, names changing over life, clerical errors, and the exponential growth of the number of comparisons to compute. OBJECTIVE: We aimed to develop a new algorithm for matching BDW records to the FNMD and evaluated its performance. METHODS: We developed a deterministic algorithm based on advanced data cleaning and knowledge of the naming system and the Damerau-Levenshtein distance (DLD). The algorithm's performance was independently assessed using BDW data of 3 university hospitals: Lille, Nantes, and Rennes. Specificity was evaluated with living patients on January 1, 2016 (ie, patients with at least 1 hospital encounter before and after this date). Sensitivity was evaluated with patients recorded as deceased between January 1, 2001, and December 31, 2020. The DLD-based algorithm was compared to a direct matching algorithm with minimal data cleaning as a reference. RESULTS: All centers combined, sensitivity was 11% higher for the DLD-based algorithm (93.3%, 95% CI 92.8-93.9) than for the direct algorithm (82.7%, 95% CI 81.8-83.6; P<.001). Sensitivity was superior for men at 2 centers (Nantes: 87%, 95% CI 85.1-89 vs 83.6%, 95% CI 81.4-85.8; P=.006; Rennes: 98.6%, 95% CI 98.1-99.2 vs 96%, 95% CI 94.9-97.1; P<.001) and for patients born in France at all centers (Nantes: 85.8%, 95% CI 84.3-87.3 vs 74.9%, 95% CI 72.8-77.0; P<.001). The DLD-based algorithm revealed significant differences in sensitivity among centers (Nantes, 85.3% vs Lille and Rennes, 97.3%, P<.001). Specificity was >98% in all subgroups. 
Our algorithm matched tens of millions of death records from BDWs, with parallel computing capabilities and low RAM requirements. We used the Inseehop open-source R script for this measurement. CONCLUSIONS: Overall, sensitivity/recall was 11% higher using the DLD-based algorithm than that using the direct algorithm. This shows the importance of advanced data cleaning and knowledge of a naming system through DLD use. Statistically significant differences in sensitivity between groups could be found and must be considered when performing an analysis to avoid differential biases. Our algorithm, originally conceived for linking a BDW with the FNMD, can be used to match any large-scale databases. While matching operations using names are considered sensitive computational operations, the Inseehop package released here is easy to run on premises, thereby facilitating compliance with local cybersecurity frameworks. The use of an advanced deterministic matching algorithm such as the DLD-based algorithm is an insightful example of combining open-source external data to improve the usage value of BDWs.
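To make the role of the Damerau-Levenshtein distance concrete, the sketch below implements its restricted form (optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. It is not the Inseehop code: the normalization step, the threshold of one edit, and all function names are assumptions chosen for illustration.

```python
import unicodedata


def normalize(name):
    """Strip accents, punctuation, and spaces, then uppercase (illustrative cleaning)."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [c for c in decomposed if not unicodedata.combining(c) and c.isalnum()]
    return "".join(kept).upper()


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def names_match(left, right, max_distance=1):
    """Hypothetical rule: names match if within max_distance edits after cleaning."""
    return osa_distance(normalize(left), normalize(right)) <= max_distance


if __name__ == "__main__":
    print(osa_distance("MARITN", "MARTIN"))
    print(names_match("Hélène-Marie", "HELENE MARIE"))
```

A plain exact comparison would reject both pairs in the usage example, which illustrates how cleaning plus a small edit-distance tolerance can recover matches lost to clerical errors.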

15.
BMC Bioinformatics ; 23(1): 401, 2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36175857

ABSTRACT

BACKGROUND: Population variant analysis is of great importance for gathering insights into the links between human genotype and phenotype. The 1000 Genomes Project established a valuable reference for human genetic variation; however, the integrative use of the corresponding data with other datasets within existing repositories and pipelines is not fully supported. Particularly, there is a pressing need for flexible and fast selection of population partitions based on their variant and metadata-related characteristics. RESULTS: Here, we target general germline or somatic mutation data sources for their seamless inclusion within an interoperable-format repository, supporting integration among them and with other genomic data, as well as their integrated use within bioinformatic workflows. In addition, we provide VarSum, a data summarization service working on sub-populations of interest selected using filters on population metadata and/or variant characteristics. The service is developed as an optimized computational framework with an Application Programming Interface (API) that can be called from within any existing computing pipeline or programming script. Provided example use cases of biological interest show the relevance, power and ease of use of the API functionalities. CONCLUSIONS: The proposed data integration pipeline and data set extraction and summarization API pave the way for solid computational infrastructures that quickly process cumbersome variation data, and allow biologists and bioinformaticians to easily perform scalable analysis on user-defined partitions of large cohorts from increasingly available genetic variation studies. With the current tendency to large (cross)nation-wide sequencing and variation initiatives, we expect an ever growing need for the kind of computational support hereby proposed.


Subject(s)
Genomics; Metadata; Computational Biology; Genotype; Humans; Software

16.
JAMIA Open ; 5(3): ooac071, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35936991

ABSTRACT

Objectives: Manual record review is a crucial step for electronic health record (EHR)-based research, but it has poor workflows and is error-prone. We sought to build a tool that provides a unified environment for data review and chart abstraction data entry. Materials and Methods: ReviewR is an open-source R Shiny application that can be deployed on a single machine or made available to multiple users. It supports multiple data models and database systems, and integrates with the REDCap API for storing abstraction results. Results: We describe 2 real-world uses and extensions of ReviewR. Since its release in April 2021 as a package on CRAN it has been downloaded 2204 times. Discussion and Conclusion: ReviewR provides an easily accessible review interface for clinical data warehouses. Its modular, extensible, and open-source nature affords future expansion by other researchers.

17.
Stud Health Technol Inform ; 290: 150-153, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672989

ABSTRACT

Clinical Data Warehouses (CDWs) are gold mines and may be useful in managing the COVID-19 outbreak. This article details the use of a CDW to retrieve patients for vaccination purposes. A list of 34 diseases (or conditions) was published by French Health Authorities to target individuals at a high risk of developing a severe form of COVID. Using a multilevel search engine, 23 queries were built based on structured or unstructured data using natural language processing features. The Diagnosis Related Group coding system was used alone in three queries (13.0%), coupled with unstructured data in four queries (17.4%), and unstructured data were used alone in 16 queries (69.6%). Eleven diseases (conditions) were too broad to be translated into queries. Finally, 6,006 unique re-identified patients were retrieved. This use case demonstrates the usefulness of the Rouen University Hospital CDW in retrieving patients for purposes other than translational research.


Subject(s)
COVID-19 , Data Warehousing , COVID-19/prevention & control , Electronic Health Records , Humans , Natural Language Processing , Vaccination
18.
Stud Health Technol Inform ; 290: 282-286, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673018

ABSTRACT

With the development of clinical databases and the ubiquity of EHRs, physicians and researchers alike have access to an unprecedented amount of data. The complexity of the available data has also increased, since clinical reports are now included and require frameworks with natural language processing capabilities to process them and extract information not found in other types of documents. In the following work we implement a data processing pipeline performing phenotyping, disambiguation, negation and subject prediction on such reports. We compare it to an existing solution routinely used in a children's hospital with a special focus on genetic diseases. We show that by replacing components based on rules and pattern matching with components leveraging deep learning models and fine-tuned word embeddings we obtain performance improvements of 7%, 10% and 27% in terms of F1 measure for each task. The solution we devised will help build more reliable decision support systems.


Subject(s)
Deep Learning , Child , Databases, Factual , Humans , Natural Language Processing
19.
Stud Health Technol Inform ; 290: 1046-1047, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673198

ABSTRACT

PREDIMED, the Clinical Data Warehouse of Grenoble Alps University Hospital, is currently participating in daily COVID-19 epidemic follow-up via spatial and chronological analysis of geographical maps. This monitoring aims to detect clusters and identify vulnerable populations. Our real-time geographical representations allow us to track the epidemic both inside and outside the hospital.


Subject(s)
COVID-19 , COVID-19/epidemiology , Data Warehousing , Geography , Hospitals, University , Humans
20.
J Med Internet Res ; 24(5): e32845, 2022 May 11.
Article in English | MEDLINE | ID: mdl-35544299

ABSTRACT

Organizational, administrative, and educational challenges in establishing and sustaining biomedical data science infrastructures lead to the inefficient use of Research Patient Data Repositories (RPDRs). These challenges, which include but are not limited to deployment, sustainability, cost optimization, collaboration, governance, security, rapid response, reliability, stability, scalability, and convenience, constrain one another and may not be naturally alleviated through traditional hardware upgrades or protocol enhancements. This article attempts to borrow data science thinking and practices from the business realm, which we call the data industry viewpoint, to improve RPDRs.


Subject(s)
Databases as Topic , Humans