Search | VHL Regional Portal (test)

ZiMM: A deep learning model for long term and blurry relapses with non-clinical claims data.

Kabeshova, Anastasiia; Yu, Yiyang; Lukacs, Bertrand; Bacry, Emmanuel; Gaïffas, Stéphane.

J Biomed Inform ; 110: 103531, 2020 10.

Article in English | MEDLINE | ID: mdl-32818667

ABSTRACT

This paper considers the problems of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), in order to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events that are observed through a claims-only database. ZiMM ED is applied to a "non-clinical" claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than one where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its population-wide exhaustiveness, compared to clinical electronic health records coming from a single hospital or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history between 1.5 and 5 years. We consider a long-term (18 months) relapse (urination problems still occur despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing work.

Subjects

Deep Learning, Factual Databases, Electronic Health Records, Humans, Recurrence

SCALPEL3: A scalable open-source library for healthcare claims databases.

Bacry, Emmanuel; Gaïffas, Stéphane; Leroy, Fanny; Morel, Maryan; Nguyen, Dinh-Phong; Sebiat, Youcef; Sun, Dian.

Int J Med Inform ; 141: 104203, 2020 09.

Article in English | MEDLINE | ID: mdl-32485553

ABSTRACT

OBJECTIVE: This article introduces SCALPEL3 (Scalable Pipeline for Health Data), a scalable open-source framework for studies involving Large Observational Databases (LODs). It focuses on scalable medical concept extraction, easy interactive analysis, and helpers for data flow analysis to accelerate studies performed on LODs. MATERIALS AND METHODS: Inspired by web analytics, SCALPEL3 relies on distributed computing, data denormalization and columnar storage. It was compared to the existing SAS-Oracle SNDS infrastructure by performing several queries on a dataset containing a three-year history of healthcare claims of 13.7 million patients. RESULTS AND DISCUSSION: SCALPEL3's horizontal scalability allows it to handle large tasks more quickly than the existing infrastructure, while it has comparable performance when using only a few executors. SCALPEL3 provides a sharp interactive control of data processing through legible code, which helps to build fully reproducible studies, leading to improved maintainability and auditability of studies performed on LODs. CONCLUSION: SCALPEL3 makes studies based on SNDS much easier and more scalable than the existing framework [1]. It is now used at the agency collecting SNDS data and at the French Ministry of Health, and will soon be used at the National Health Data Hub in France [2].

Subjects

Delivery of Health Care, Factual Databases, France, Humans, Reproducibility of Results

ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection.

Morel, Maryan; Bacry, Emmanuel; Gaïffas, Stéphane; Guilloux, Agathe; Leroy, Fanny.

Biostatistics ; 21(4): 758-774, 2020 10 01.

Article in English | MEDLINE | ID: mdl-30851046

ABSTRACT

With the increased availability of large electronic health records databases comes the chance of enhancing health risk screening. Most post-marketing detection of adverse drug reactions (ADRs) relies on physicians' spontaneous reports, leading to under-reporting. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features (drug exposures) on a rare longitudinal outcome. Our procedure is based on a conditional Poisson regression model, also known as self-controlled case series (SCCS). To overcome the need for precise specification of risk periods, we model the intensity of outcomes using a convolution between exposures and step functions, which are penalized using a combination of group-Lasso and total-variation. To our knowledge, this is the first SCCS model with flexible intensity able to handle multiple longitudinal features in a single model. We show that this approach improves on the state of the art in terms of mean absolute error and computation time for the estimation of relative risks on simulated data. We apply this method to an ADR detection problem, using a cohort of diabetic patients extracted from the large French national health insurance database (SNIIRAM), a claims database containing medical reimbursements of more than 53 million people. This work has been done in the context of a research partnership between Ecole Polytechnique and CNAMTS (in charge of SNIIRAM).
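
The idea of convolving exposures with a step function can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single exposure type, the lag bins and the coefficient values are all hypothetical, chosen only to show how a piecewise-constant lag effect is summed over past exposures and how a total-variation term measures jumps between adjacent steps.

```python
import math
from bisect import bisect_right


def log_relative_incidence(t, exposure_starts, step_edges, step_values):
    """Sum, over all exposures, a step function evaluated at the lag t - start.

    step_edges are increasing lag boundaries; step_values[k] applies to lags
    in [step_edges[k], step_edges[k + 1]). All names are illustrative.
    """
    total = 0.0
    for start in exposure_starts:
        lag = t - start
        if lag < step_edges[0] or lag >= step_edges[-1]:
            continue  # lag falls outside the support of the step function
        k = bisect_right(step_edges, lag) - 1
        total += step_values[k]
    return total


def total_variation(step_values):
    """Sum of absolute jumps between consecutive step heights."""
    return sum(abs(b - a) for a, b in zip(step_values, step_values[1:]))


# Hypothetical example: lags measured in days, three 30-day bins.
edges = [0, 30, 60, 90]
values = [0.1, 0.7, 0.2]
relative_incidence = math.exp(log_relative_incidence(50, [10, 45], edges, values))
```

Because the effect is learned per lag bin, no risk window has to be fixed in advance; a total-variation penalty then discourages spurious jumps between neighbouring bins.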

Subjects

Drug-Related Side Effects and Adverse Reactions, Cohort Studies, Factual Databases, Drug-Related Side Effects and Adverse Reactions/epidemiology, Electronic Health Records, Humans, Research Design

Self-Exclusion among Online Poker Gamblers: Effects on Expenditure in Time and Money as Compared to Matched Controls.

Luquiens, Amandine; Dugravot, Aline; Panjo, Henri; Benyamina, Amine; Gaïffas, Stéphane; Bacry, Emmanuel.

Int J Environ Res Public Health ; 16(22), 2019 11 11.

Article in English | MEDLINE | ID: mdl-31717923

ABSTRACT

Background: No comparative data are available to report on the effect of online self-exclusion. The aim of this study was to assess the effect of self-exclusion in online poker gambling as compared to matched controls, after the end of the self-exclusion period. Methods: We included all gamblers who were first-time self-excluders over a 7-year period (n = 4887) on a poker website, and gamblers matched for gender, age and account duration (n = 4451). We report the effects over time of self-exclusion after it ended, on money (net losses) and time spent (session duration) using an analysis of variance procedure between mixed models with and without the interaction of time and self-exclusion. Analyses were performed on the whole sample, on the sub-groups that were the most heavily involved in terms of time or money (higher quartiles) and among short-duration self-excluders (<3 months). Results: Significant effects of self-exclusion and short-duration self-exclusion were found for money and time spent over 12 months. Among the gamblers who were the most heavily involved financially, no significant effect on the amount spent was found. Among the gamblers who were the most heavily involved in terms of time, a significant effect was found on time spent. Short-duration self-exclusions showed no significant effect on the most heavily involved gamblers. Conclusions: Self-exclusion seems efficient in the long term. However, the effect on money spent of self-exclusions and of short-duration self-exclusions should be further explored among the most heavily involved gamblers.

Subjects

Addictive Behavior, Personal Financing, Gambling/psychology, Adult, Case-Control Studies, Female, Humans, Male, Time and Motion Studies, Young Adult

Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework.

Bussy, Simon; Veil, Raphaël; Looten, Vincent; Burgun, Anita; Gaïffas, Stéphane; Guilloux, Agathe; Ranque, Brigitte; Jannot, Anne-Sophie.

BMC Med Res Methodol ; 19(1): 50, 2019 03 06.

Article in English | MEDLINE | ID: mdl-30841867

ABSTRACT

BACKGROUND: Choosing the best-performing method in terms of outcome prediction or variable selection is a recurring problem in prognosis studies, leading to many publications on method comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are either compared within a binary outcome setting (where we want to predict whether the readmission will occur within an arbitrarily chosen delay or not) or within a survival analysis setting (where the outcomes are directly the censored times), but not both. In this paper, we propose a comparison methodology to weigh up those different settings both in terms of prediction and variable selection, while incorporating advanced machine learning strategies. METHODS: Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); while in the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We also propose a method using Gaussian Processes to extract meaningful structured covariates from longitudinal data. RESULTS: Among all assessed statistical methods, the survival analysis ones obtain the best results. In particular, the C-mix model yields the best performance in both considered settings (AUC = 0.94 in the binary outcome setting), as well as interesting interpretation aspects. There is some consistency in selected covariates across methods within a setting, but not much across the two settings. CONCLUSIONS: It appears that learning within the survival analysis setting first (so using all the temporal information), and then going back to a binary prediction using the survival estimates, gives significantly better prediction performance than that obtained by models trained "directly" within the binary outcome setting.
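
One way to read "going back to a binary prediction using the survival estimates" is to score each patient by the estimated probability that the event occurs before the chosen delay, that is, one minus the survival function at that delay. The abstract does not say how the conversion was done, so the sketch below is only an assumption-laden illustration using the Cox proportional hazards form; the function names and all numeric values are hypothetical.

```python
import math


def linear_predictor(covariates, coefficients):
    """Dot product x . beta for one patient."""
    return sum(x * b for x, b in zip(covariates, coefficients))


def readmission_probability(baseline_cum_hazard_at_delay, lin_pred):
    """P(T <= delay | x) = 1 - exp(-H0(delay) * exp(x . beta)) under Cox PH.

    baseline_cum_hazard_at_delay is an estimate of the baseline cumulative
    hazard H0 at the chosen delay (assumed to be given here).
    """
    return 1.0 - math.exp(-baseline_cum_hazard_at_delay * math.exp(lin_pred))


# Hypothetical patient with two covariates and made-up coefficients.
score = readmission_probability(0.2, linear_predictor([1.0, 0.5], [0.3, -0.1]))
```

The resulting score can be evaluated with the same binary metrics (for instance AUC at the chosen delay) as models trained directly on the binary outcome, which makes the two settings comparable.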

Subjects

Sickle Cell Anemia/diagnosis, Sickle Cell Anemia/therapy, Health Care Outcome Assessment/statistics & numerical data, Patient Readmission/statistics & numerical data, Cohort Studies, Humans, Logistic Models, Machine Learning, Multivariate Analysis, Computer Neural Networks, Health Care Outcome Assessment/methods, Prognosis, Proportional Hazards Models, Reproducibility of Results, Support Vector Machine, Survival Analysis

C-mix: A high-dimensional mixture model for censored durations, with applications to genetic data.

Bussy, Simon; Guilloux, Agathe; Gaïffas, Stéphane; Jannot, Anne-Sophie.

Stat Methods Med Res ; 28(5): 1523-1539, 2019 05.

Article in English | MEDLINE | ID: mdl-29658407

ABSTRACT

We introduce a supervised learning mixture model for censored durations (C-mix) to simultaneously detect subgroups of patients with different prognoses and order them based on their risk. Our method is applicable in a high-dimensional setting, i.e. with a large number of biomedical covariates. Indeed, we penalize the negative log-likelihood by the Elastic-Net, which leads to a sparse parameterization of the model and automatically pinpoints the relevant covariates for the survival prediction. Inference is achieved using an efficient Quasi-Newton Expectation Maximization algorithm, for which we provide convergence properties. The statistical performance of the method is examined in an extensive Monte Carlo simulation study and finally illustrated on three publicly available genetic cancer datasets with high-dimensional covariates. We show that our approach outperforms the state-of-the-art survival models in this context, namely both the CURE and Cox proportional hazards models penalized by the Elastic-Net, in terms of C-index, AUC(t) and survival prediction. Thus, we propose a powerful tool for personalized medicine in cancerology.
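
Penalizing a negative log-likelihood by the Elastic-Net means adding a weighted mix of an L1 term, which drives coefficients to exactly zero, and a squared L2 term, which stabilizes correlated covariates. The sketch below uses one common parameterization; the abstract does not give the exact form used in C-mix, so the argument names `strength` and `l2_ratio` and the example values are assumptions.

```python
def elastic_net_penalty(beta, strength, l2_ratio):
    """strength * ((1 - l2_ratio) * ||beta||_1 + (l2_ratio / 2) * ||beta||_2^2)."""
    l1_norm = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return strength * ((1.0 - l2_ratio) * l1_norm + 0.5 * l2_ratio * l2_squared)


def penalized_objective(neg_log_likelihood, beta, strength, l2_ratio):
    """Objective to minimize: data-fit term plus the Elastic-Net penalty."""
    return neg_log_likelihood + elastic_net_penalty(beta, strength, l2_ratio)


# Hypothetical coefficient vector; the zero entry contributes nothing.
penalty = elastic_net_penalty([1.0, -2.0, 0.0], strength=0.5, l2_ratio=0.5)
```

With `l2_ratio` set to 0 the penalty reduces to the Lasso, and with 1 it reduces to ridge; intermediate values trade sparsity against stability.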

Subjects

Statistical Models, Neoplasms/genetics, Precision Medicine, Algorithms, Humans, Monte Carlo Method, Neoplasms/mortality, Prognosis, Proportional Hazards Models

Description and assessment of trustability of motives for self-exclusion reported by online poker gamblers in a cohort using account-based gambling data.

Luquiens, Amandine; Vendryes, Delphine; Aubin, Henri-Jean; Benyamina, Amine; Gaïffas, Stéphane; Bacry, Emmanuel.

BMJ Open ; 8(12): e022541, 2018 12 22.

Article in English | MEDLINE | ID: mdl-30580263

ABSTRACT

OBJECTIVE: Self-exclusion is one of the main responsible gambling tools. The aim of this study was to assess the reliability of self-exclusion motives in self-reports to the gambling service provider. SETTINGS: This is a retrospective cohort using prospective account-based gambling data obtained from a poker gambling provider. PARTICIPANTS: Over a period of 7 years we included all poker gamblers self-excluding for the first time, and reporting a motive for their self-exclusion (n=1996). We explored two groups: self-excluders who self-reported a motive related to addiction and those who reported a commercial motive. RESULTS: No between-group adjusted difference was found on gambling summary variables. Sessions in the two groups were poorly discriminated one from another on four different machine-learning models. More than two-thirds of the gamblers resumed poker gambling after a first self-exclusion (n=1368), half of them within the first month. No between-group difference was found for the course of gambling after the first self-exclusion. 60.1% of first-time self-excluders self-excluded again (n=822). Losses in the previous month were greater before second self-exclusions than before the first. CONCLUSIONS: Reported motives for self-exclusion appear non-informative, and could be misleading. Multiple self-exclusions seem to be more the rule than the exception. The process of self-exclusion should therefore be optimised from the first occurrence to protect heavy gamblers.

Subjects

Behavior Control, Addictive Behavior/psychology, Gambling/psychology, Trust/psychology, Psychological Adaptation, Adult, Cohort Studies, Factual Databases, Female, Gambling/epidemiology, Humans, Male, Middle Aged, Motivation, Retrospective Studies, Self Report
