1.
Int J Med Inform ; 141: 104203, 2020 09.
Article in English | MEDLINE | ID: mdl-32485553

ABSTRACT

OBJECTIVE: This article introduces SCALPEL3 (Scalable Pipeline for Health Data), a scalable open-source framework for studies involving Large Observational Databases (LODs). It focuses on scalable medical concept extraction, easy interactive analysis, and helpers for data flow analysis to accelerate studies performed on LODs. MATERIALS AND METHODS: Inspired by web analytics, SCALPEL3 relies on distributed computing, data denormalization, and columnar storage. It was compared to the existing SAS-Oracle SNDS infrastructure by running several queries on a dataset containing a three-year history of healthcare claims for 13.7 million patients. RESULTS AND DISCUSSION: SCALPEL3's horizontal scalability lets it handle large tasks faster than the existing infrastructure, while its performance is comparable when only a few executors are used. SCALPEL3 provides precise interactive control of data processing through legible code, which helps build fully reproducible studies and improves the maintainability and auditability of studies performed on LODs. CONCLUSION: SCALPEL3 makes studies based on SNDS much easier and more scalable than the existing framework [1]. It is now used at the agency collecting SNDS data and at the French Ministry of Health, and will soon be used at the National Health Data Hub in France [2].
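
The two storage ideas named in the abstract, data denormalization and columnar storage, can be illustrated with a small standard-library Python sketch. This is not SCALPEL3 code and does not use the SNDS schema: the tables, field names (`patient_id`, `birth_year`, `drug_code`, `date`), and helper functions are hypothetical, chosen only to show the general technique.

```python
from collections import defaultdict

# Hypothetical normalized tables (assumed for illustration, not the SNDS schema).
patients = [
    {"patient_id": 1, "birth_year": 1950},
    {"patient_id": 2, "birth_year": 1972},
]
claims = [
    {"patient_id": 1, "drug_code": "A10", "date": "2015-01-03"},
    {"patient_id": 1, "drug_code": "C09", "date": "2015-02-11"},
    {"patient_id": 2, "drug_code": "A10", "date": "2016-07-20"},
]


def denormalize(patients, claims):
    """Join each claim with its patient's attributes into one flat row."""
    by_id = {p["patient_id"]: p for p in patients}
    return [{**by_id[c["patient_id"]], **c} for c in claims]


def to_columnar(rows):
    """Turn a list of row dicts into a dict mapping column name to value list."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


flat = denormalize(patients, claims)
table = to_columnar(flat)

# A query on one column reads only that column's list, with no join at query time.
a10_count = sum(1 for code in table["drug_code"] if code == "A10")
```

The join is paid once, when the flat table is built, so later queries scan only the columns they need. That is the usual motivation for this layout in analytics workloads; whether SCALPEL3 organizes its data exactly this way is not stated in the abstract.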


Subject(s)
Delivery of Health Care; Databases, Factual; France; Humans; Reproducibility of Results
2.
Biostatistics ; 21(4): 758-774, 2020 10 01.
Article in English | MEDLINE | ID: mdl-30851046

ABSTRACT

With the increased availability of large electronic health record databases comes the chance to enhance health risk screening. Most post-marketing detection of adverse drug reactions (ADRs) relies on physicians' spontaneous reports, which leads to under-reporting. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features (drug exposures) on a rare longitudinal outcome. Our procedure is based on a conditional Poisson regression model, also known as the self-controlled case series (SCCS). To avoid the need to specify risk periods precisely, we model the intensity of outcomes using a convolution between exposures and step functions, which are penalized using a combination of group-Lasso and total-variation penalties. To our knowledge, this is the first SCCS model with flexible intensity able to handle multiple longitudinal features in a single model. We show that this approach improves on the state of the art in terms of mean absolute error and computation time for the estimation of relative risks on simulated data. We apply this method to an ADR detection problem, using a cohort of diabetic patients extracted from the large French national health insurance database (SNIIRAM), a claims database containing medical reimbursements of more than 53 million people. This work was done in the context of a research partnership between Ecole Polytechnique and CNAMTS (in charge of SNIIRAM).
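
The modeling idea in the abstract, a convolution between exposures and step functions with a combined group-Lasso and total-variation penalty, can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the unit-length time bins, the coding of an exposure as a 0/1 start indicator, and the exact form of the penalty are all assumptions made for the example.

```python
import math


def log_relative_intensity(exposure_starts, theta, n_steps):
    """Discrete convolution of exposure-start indicators with step-function
    coefficients: out[t] = sum over lags k of theta[k] * exposure_starts[t - k]."""
    out = [0.0] * n_steps
    for t in range(n_steps):
        for k, coef in enumerate(theta):
            if t - k >= 0:
                out[t] += coef * exposure_starts[t - k]
    return out


def penalty(thetas, lam_tv, lam_group):
    """Assumed penalty: for each feature, the total variation of its lag
    coefficients plus a group-Lasso (Euclidean norm) term on the same block."""
    total = 0.0
    for theta in thetas:
        tv = sum(abs(b - a) for a, b in zip(theta, theta[1:]))
        group = math.sqrt(sum(c * c for c in theta))
        total += lam_tv * tv + lam_group * group
    return total


# One hypothetical drug: exposure starts in bin 1; the log relative risk is
# raised by 0.5 at lags 1 and 2 and is zero elsewhere.
starts = [0, 1, 0, 0, 0, 0]
theta = [0.0, 0.5, 0.5, 0.0]
log_rr = log_relative_intensity(starts, theta, n_steps=6)
pen = penalty([theta], lam_tv=1.0, lam_group=1.0)
```

In this sketch the total-variation term favors coefficients that stay constant across neighboring lags, so the estimated risk profile is piecewise constant, and the group term can shrink a whole feature's block to zero. The risk window is therefore read off the fitted coefficients rather than fixed in advance.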


Subject(s)
Drug-Related Side Effects and Adverse Reactions; Cohort Studies; Databases, Factual; Drug-Related Side Effects and Adverse Reactions/epidemiology; Electronic Health Records; Humans; Research Design