Search | BVS Regional Portal (test)

Streaming constrained binary logistic regression with online standardized data.

Lalloué, Benoît; Monnez, Jean-Marie; Albuisson, Eliane.

J Appl Stat ; 49(6): 1519-1539, 2022.

Article in English | MEDLINE | ID: mdl-35707109

ABSTRACT

Online learning is a method for analyzing very large datasets ('big data') as well as data streams. In this article, we consider the case of constrained binary logistic regression and show the benefit of using processes with an online standardization of the data, in particular to avoid numerical explosions or to allow the use of shrinkage methods. We prove the almost sure convergence of such a process and propose using a piecewise constant step size that does not decrease too quickly, so that it does not reduce the speed of convergence. We compare twenty-four stochastic approximation processes with raw or online standardized data on five real or simulated datasets. Results show that, unlike processes with raw data, processes with online standardized data can prevent numerical explosions and yield the best results.
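
As an illustration of the kind of process this abstract describes, here is a minimal Python sketch of one-pass stochastic gradient logistic regression in which every incoming observation is standardized with running means and variances before the update. It is not the authors' algorithm: the Welford updates, the L2-ball projection standing in for "constrained", the step-size schedule (step_size with c, block and alpha) and the simulate_stream helper are all illustrative assumptions.

```python
import math
import random


class OnlineStandardizer:
    """Running per-feature mean and variance (Welford's algorithm)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x):
        out = []
        for j, v in enumerate(x):
            var = self.m2[j] / self.n if self.n > 1 else 0.0
            sd = math.sqrt(var) if var > 0 else 1.0  # avoid dividing by zero early on
            out.append((v - self.mean[j]) / sd)
        return out


def sigmoid(t):
    """Logistic function, written to avoid overflow in math.exp."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def project_l2(theta, radius):
    """Hypothetical constraint: keep the parameter inside an L2 ball."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return theta
    return [t * radius / norm for t in theta]


def step_size(k, c=0.5, block=100, alpha=0.66):
    """Piecewise-constant step: fixed inside each block of `block` steps,
    then lowered like c / b**alpha (illustrative schedule only)."""
    b = k // block + 1
    return c / b ** alpha


def streaming_logistic(stream, dim, radius=10.0):
    """One pass over (x, y) pairs; returns weights on the standardized scale."""
    scaler = OnlineStandardizer(dim)
    theta = [0.0] * (dim + 1)  # intercept followed by one weight per feature
    for k, (x, y) in enumerate(stream):
        scaler.update(x)
        z = [1.0] + scaler.transform(x)
        p = sigmoid(sum(t * zj for t, zj in zip(theta, z)))
        a = step_size(k)
        # Stochastic gradient ascent on the log-likelihood: gradient = (y - p) * z
        theta = [t + a * (y - p) * zj for t, zj in zip(theta, z)]
        theta = project_l2(theta, radius)
    return theta, scaler


def simulate_stream(n, seed=0):
    """Hypothetical synthetic stream: x0 ~ N(5, 10^2) drives y, x1 is pure noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(5.0, 10.0), rng.gauss(0.0, 1.0)]
        y = 1 if rng.random() < sigmoid(0.2 * (x[0] - 5.0)) else 0
        yield x, y
```

Keeping the update on standardized inputs means that a feature with a standard deviation of 10, as in simulate_stream, does not multiply the size of each gradient step by 10, which is one way raw-data processes can drift toward the numerical explosions the abstract mentions.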

Sequential linear regression with online standardized data.

Duarte, Kévin; Monnez, Jean-Marie; Albuisson, Eliane.

PLoS One ; 13(1): e0191186, 2018.

Article in English | MEDLINE | ID: mdl-29346392

ABSTRACT

The present study addresses the problem of sequential least-squares multidimensional linear regression, particularly in the case of a data stream, using a stochastic approximation process. To avoid the numerical explosions that can be encountered, and to reduce computing time so that as many arriving observations as possible can be taken into account, we propose using a process with online standardized data instead of raw data, using either several observations per step or all observations received up to the current step. Herein, we define and study the almost sure convergence of three processes with online standardized data: a classical process with a variable step size and a varying number of observations per step; an averaged process with a constant step size and a varying number of observations per step; and a process with a variable or constant step size that uses all observations received up to the current step. Their convergence is obtained under more general assumptions than the classical ones. These processes are compared to classical processes on 11 datasets, first for a fixed total number of observations used and then for a fixed processing time. Analyses indicate that the third process typically yields the best results.

Subjects

Linear Models, Algorithms, Statistical Data Interpretation, Stochastic Processes
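
The third process in the abstract above uses all observations received up to the current step. A minimal Python sketch of that idea, under our own assumptions, keeps raw running sums so that the correlation matrix R of x and the correlations r of x with y can be rebuilt after each batch, then takes one gradient step on the standardized least-squares criterion, whose minimizer R^{-1} r is the vector of standardized regression coefficients. The class and function names, the constant step of 0.5 and the simulated data are hypothetical; a production version would use numerically stabler (Welford-style) updates than raw sums.

```python
import math
import random


class RunningMoments:
    """Raw sums over every observation seen so far, enough to rebuild the
    correlation matrix of x and the correlations of x with y at any step."""

    def __init__(self, p):
        self.n = 0
        self.sx = [0.0] * p
        self.sy = 0.0
        self.sxx = [[0.0] * p for _ in range(p)]
        self.sxy = [0.0] * p
        self.syy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sy += y
        self.syy += y * y
        for i, xi in enumerate(x):
            self.sx[i] += xi
            self.sxy[i] += xi * y
            for j, xj in enumerate(x):
                self.sxx[i][j] += xi * xj

    def stats(self):
        """Return (R, r, sd_x, sd_y) computed from population moments."""
        n, p = self.n, len(self.sx)
        mx = [s / n for s in self.sx]
        my = self.sy / n
        sd = [math.sqrt(max(self.sxx[i][i] / n - mx[i] ** 2, 1e-12)) for i in range(p)]
        sdy = math.sqrt(max(self.syy / n - my ** 2, 1e-12))
        R = [[(self.sxx[i][j] / n - mx[i] * mx[j]) / (sd[i] * sd[j]) for j in range(p)]
             for i in range(p)]
        r = [(self.sxy[i] / n - mx[i] * my) / (sd[i] * sdy) for i in range(p)]
        return R, r, sd, sdy


def sequential_regression(batches, p, step=0.5):
    """One gradient step per incoming batch on the standardized least-squares
    criterion, using all observations received so far."""
    moments = RunningMoments(p)
    theta = [0.0] * p
    for batch in batches:
        for x, y in batch:
            moments.add(x, y)
        if moments.n < 2:
            continue
        R, r, _, _ = moments.stats()
        # Gradient of (1/2) theta'R theta - r'theta is R theta - r
        grad = [sum(R[i][j] * theta[j] for j in range(p)) - r[i] for i in range(p)]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta, moments


def raw_coefficients(theta, moments):
    """Map standardized slopes back to the original units of x and y."""
    _, _, sd, sdy = moments.stats()
    return [t * sdy / s for t, s in zip(theta, sd)]


def simulate_batches(n_batches, size, seed=0):
    """Hypothetical data: y = 2*x0 - 4*x1 + noise, features on very different scales."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(size):
            x = [rng.gauss(10.0, 3.0), rng.gauss(-5.0, 0.5)]
            batch.append((x, 2.0 * x[0] - 4.0 * x[1] + rng.gauss(0.0, 1.0)))
        yield batch
```

The constant step has to stay below 2 divided by the largest eigenvalue of R. Because R is a correlation matrix, that eigenvalue is at most p, so step = 0.5 is safe whenever p ≤ 3 regardless of how the features are scaled in raw units.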

Prognostic Value of Estimated Plasma Volume in Heart Failure.

Duarte, Kévin; Monnez, Jean-Marie; Albuisson, Eliane; Pitt, Bertram; Zannad, Faiez; Rossignol, Patrick.

JACC Heart Fail ; 3(11): 886-93, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26541787

ABSTRACT

OBJECTIVES: The purpose of this study was to assess the prognostic value of the estimation of plasma volume or of its variation beyond clinical examination in a post-hoc analysis of EPHESUS (Eplerenone Post-Acute Myocardial Infarction Heart Failure Efficacy and Survival Study). BACKGROUND: Assessing congestion after discharge is challenging but of paramount importance to optimize patient management and to prevent hospital readmissions. METHODS: The present analysis was performed in a subset of 4,957 patients with available data (within a full dataset of 6,632 patients). The study endpoint was cardiovascular death or hospitalization for heart failure (HF) between months 1 and 3 after post-acute myocardial infarction HF. The plasma volume variation (ΔePVS) between baseline and month 1 was estimated with the Strauss formula, which includes hemoglobin and hematocrit ratios. Other potential predictors, including congestion surrogates, hemodynamic and renal variables, and medical history variables, were tested. An instantaneous estimation of plasma volume at month 1 was defined and also tested. RESULTS: Multivariate analysis was performed with stepwise logistic regression. ΔePVS was selected in the model (odds ratio: 1.01; p = 0.004). The corresponding prognostic gain measured by integrated discrimination improvement was significant (7.57%; p = 0.01). Nevertheless, instantaneous estimation of plasma volume at month 1 was found to be a better predictor than ΔePVS. CONCLUSIONS: In HF complicating myocardial infarction, congestion as assessed by the Strauss formula and an instantaneous derived measurement of plasma volume provided a predictive value of early cardiovascular events beyond routine clinical assessment. Prospective trials to assess congestion management guided by this simple tool to monitor plasma volume are warranted.

Subjects

Heart Failure/physiopathology, Myocardial Infarction/physiopathology, Plasma Volume, Adult, Aged, Female, Heart Failure/diagnosis, Heart Failure/mortality, Hematocrit, Hemoglobins/metabolism, Humans, Logistic Models, Male, Middle Aged, Myocardial Infarction/diagnosis, Myocardial Infarction/mortality, Predictive Value of Tests, Prognosis, Risk Assessment, Risk Factors, Sensitivity and Specificity
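
The Strauss formula mentioned in the abstract above relates the change in plasma volume to hemoglobin and hematocrit measured at two time points, assuming the circulating red cell mass does not change between the two samples. A minimal Python sketch follows; the function names are hypothetical, and the instantaneous estimate (1 - Hct)/Hb is our assumption about the "instantaneous estimation of plasma volume", not a definition given in this abstract.

```python
def strauss_delta_pv(hb_before, hct_before, hb_after, hct_after):
    """Percentage change in plasma volume between two samples (Strauss formula).

    hb_*  : hemoglobin, in any consistent unit (e.g. g/dL)
    hct_* : hematocrit as a fraction between 0 and 1
    A positive value means plasma volume expanded (hemodilution).
    """
    return 100.0 * (hb_before / hb_after) * (1.0 - hct_after) / (1.0 - hct_before) - 100.0


def instantaneous_epvs(hb_g_dl, hct):
    """Assumed single-sample estimate of plasma volume status: (1 - Hct) / Hb, in dL/g."""
    return (1.0 - hct) / hb_g_dl
```

For example, hemoglobin falling from 14 to 12 g/dL while hematocrit falls from 0.42 to 0.36 gives strauss_delta_pv(14.0, 0.42, 12.0, 0.36) of about +28.7%, i.e. an estimated expansion of plasma volume.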

Data analysis techniques: a tool for cumulative exposure assessment.

Lalloué, Benoît; Monnez, Jean-Marie; Padilla, Cindy; Kihal, Wahida; Zmirou-Navier, Denis; Deguen, Séverine.

J Expo Sci Environ Epidemiol ; 25(2): 222-30, 2015.

Article in English | MEDLINE | ID: mdl-25248936

ABSTRACT

Everyone is subject to environmental exposures from various sources, with negative health impacts (air, water and soil contamination, noise, etc.) or with positive effects (e.g. green space). Studies considering such complex environmental settings in a global manner are rare. We propose to use statistical factor and cluster analyses to create a composite exposure index with a data-driven approach, in order to assess the environmental burden experienced by populations. We illustrate this approach in a large French metropolitan area. The study was carried out in the Greater Lyon area (France, 1.2 M inhabitants) at the census Block Group (BG) scale. We used as environmental indicators ambient air NO2 annual concentrations, noise levels and proximity to green spaces, to industrial plants, to polluted sites and to road traffic. They were synthesized using Multiple Factor Analysis (MFA), a data-driven technique without a priori modeling, followed by a Hierarchical Clustering to create BG classes. The first four components of the MFA explained 30, 14, 11 and 9% of the total variance, respectively. Clustering into five classes grouped: (1) a particular type of large BGs without population; (2) BGs of green residential areas, with fewer negative exposures than average; (3) BGs of residential areas near midtown; (4) BGs close to industries; and (5) midtown urban BGs, with higher negative exposures than average and fewer green spaces. Other numbers of classes were also tested to explore alternative clusterings. We present an approach using statistical factor and cluster analysis techniques, which seem to have been overlooked for assessing cumulative exposure in complex environmental settings. Although it cannot be applied directly for risk or health effect assessment, the resulting index can help to identify hot spots of cumulative exposure, to prioritize urban policies or to compare the environmental burden across study areas in an epidemiological framework.

Subjects

Air Pollutants/analysis, Cluster Analysis, Environmental Exposure/analysis, Factor Analysis, Nitrogen Dioxide/analysis, Environmental Monitoring, France/epidemiology, Humans, Industry, Infant, Infant Mortality, Newborn Infant, Noise, Socioeconomic Factors, Spatial Analysis, Statistics as Topic
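
Multiple Factor Analysis balances groups of variables by dividing each group by the square root of the first eigenvalue of its own PCA before running a global PCA, so that a group made of many redundant indicators cannot dominate the first axes. The Python sketch below illustrates only that weighting step and the resulting percentages of inertia, using power iteration in pure Python. It is a simplified toy under our own assumptions (no supplementary variables, no hierarchical clustering step), and all names are hypothetical rather than the software used by the authors.

```python
import math
import random


def standardize(col):
    """Center a variable and scale it to unit (population) variance."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
    return [(v - m) / sd for v in col]


def cross_products(cols):
    """p x p matrix of mean cross-products of already-centered columns."""
    n = len(cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]


def top_eigenvalues(mat, k, iters=500, seed=0):
    """Leading eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    a = [row[:] for row in mat]
    p = len(a)
    values = []
    for _ in range(min(k, p)):
        v = [rng.random() - 0.5 for _ in range(p)]  # random start, unlikely to be orthogonal
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        values.append(lam)
        # Remove the component just found before searching for the next one
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return values


def mfa_inertia(groups, n_components=4):
    """groups: list of variable groups, each a list of columns.
    Weights each group by 1/sqrt(its own first eigenvalue), then returns the
    percentage of total inertia carried by each leading global component."""
    weighted = []
    for group in groups:
        z = [standardize(col) for col in group]
        lam1 = max(top_eigenvalues(cross_products(z), 1)[0], 1e-12)
        weight = 1.0 / math.sqrt(lam1)
        weighted.extend([weight * v for v in col] for col in z)
    c = cross_products(weighted)
    total = sum(c[i][i] for i in range(len(c)))
    return [100.0 * lam / total for lam in top_eigenvalues(c, n_components)]
```

For instance, with one group made of two nearly identical variables and a second group made of one independent variable, an unweighted PCA would give the first axis about two thirds of the total inertia, whereas mfa_inertia returns roughly 50% for each of the first two axes.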

A statistical procedure to create a neighborhood socioeconomic index for health inequalities analysis.

Lalloué, Benoît; Monnez, Jean-Marie; Padilla, Cindy; Kihal, Wahida; Le Meur, Nolwenn; Zmirou-Navier, Denis; Deguen, Séverine.

Int J Equity Health ; 12: 21, 2013 Mar 28.

Article in English | MEDLINE | ID: mdl-23537275

ABSTRACT

INTRODUCTION: In order to study social health inequalities, contextual (or ecologic) data may constitute an appropriate alternative to individual socioeconomic characteristics. Indices can be used to summarize the multiple dimensions of the neighborhood socioeconomic status. This work proposes a statistical procedure to create a neighborhood socioeconomic index. METHODS: The study setting is composed of three French urban areas. Socioeconomic data at the census block scale come from the 1999 census. Successive principal components analyses are used to select variables and create the index. Both metropolitan area-specific and global indices are tested and compared. Socioeconomic categories are drawn with hierarchical clustering as a reference to determine "optimal" thresholds able to create categories along a one-dimensional index. RESULTS: Among the twenty variables finally selected in the index, 15 are common to the three metropolitan areas. The index explains at least 57% of the variance of these variables in each metropolitan area, with a contribution of more than 80% of the 15 common variables. CONCLUSIONS: The proposed procedure is statistically justified and robust. It can be applied to multiple geographical areas or socioeconomic variables and provides meaningful information to public health bodies. We highlight the importance of the classification method. We propose an R package in order to use this procedure.

Subjects

Health Status Disparities, Residence Characteristics/statistics & numerical data, Socioeconomic Factors, Cluster Analysis, France, Humans, Small-Area Analysis, Urban Population
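
The abstract above builds its index with successive principal components analyses. A minimal Python sketch of the last step only, assuming the variables have already been selected, takes the first principal component of the standardized variables as the index and reports the share of variance it explains (first eigenvalue divided by the number of variables). The function name, the power-iteration solver and the sign convention are assumptions; the authors' R package may proceed differently.

```python
import math
import random


def standardize(col):
    """Center a variable and scale it to unit (population) variance."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
    return [(v - m) / sd for v in col]


def pca_index(cols, iters=1000, seed=0):
    """First principal component of the standardized variables.

    Returns (scores, loadings, share), where share is the fraction of the
    total variance carried by the component (first eigenvalue / p)."""
    z = [standardize(c) for c in cols]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(p)]
    for _ in range(iters):  # power iteration on the correlation matrix
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the first eigenvalue
    lam = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    if sum(v) < 0:  # illustrative sign convention: mostly positive loadings
        v = [-x for x in v]
    scores = [sum(v[j] * z[j][i] for j in range(p)) for i in range(n)]
    return scores, v, lam / p
```

The share returned here plays the role of the "variance explained" figure quoted in the abstract; turning the continuous scores into categories would still require thresholds, which this sketch does not attempt.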

Nonrandom variations in human cancer ESTs indicate that mRNA heterogeneity increases during carcinogenesis.

Brulliard, Marie; Lorphelin, Dalia; Collignon, Olivier; Lorphelin, Walter; Thouvenot, Benoit; Gothié, Emmanuel; Jacquenet, Sandrine; Ogier, Virginie; Roitel, Olivier; Monnez, Jean-Marie; Vallois, Pierre; Yen, Frances T; Poch, Olivier; Guenneugues, Marc; Karcher, Gilles; Oudet, Pierre; Bihain, Bernard E.

Proc Natl Acad Sci U S A ; 104(18): 7522-7, 2007 May 01.

Article in English | MEDLINE | ID: mdl-17452638

ABSTRACT

Virtually all cancer biological attributes are heterogeneous. Because of this, it is currently difficult to reconcile results of cancer transcriptome and proteome experiments. It is also established that cancer somatic mutations arise at rates higher than suspected, but are still insufficient to explain all cancer cell heterogeneity. We have analyzed sequence variations of 17 abundantly expressed genes in a large set of human ESTs originating from either normal or cancer samples. We show that cancer ESTs have greater variations than normal ESTs for >70% of the tested genes. These variations cannot be explained by known and putative SNPs. Furthermore, cancer EST variations were not random, but were determined by the composition of the substituted base (b0) as well as that of the bases located upstream (up to b - 4) and downstream (up to b + 3) of the substitution event. The replacement base was also not randomly selected but corresponded in most cases (73%) to a repetition of b - 1 or of b + 1. Base substitutions follow a specific pattern of affected bases: A and T substitutions were preferentially observed in cancer ESTs. In contrast, cancer somatic mutations [Sjoblom T, et al. (2006) Science 314:268-274] and SNPs identified in the genes of the current study occurred preferentially with C and G. On the basis of these observations, we developed a working hypothesis that cancer EST heterogeneity results primarily from increased transcription infidelity.

Subjects

Neoplastic Cell Transformation/genetics, Expressed Sequence Tags, Genetic Variation/genetics, Neoplasms/genetics, Base Sequence, Humans, Messenger RNA/genetics, Vimentin/genetics
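
The substitution-context analysis in the abstract above can be illustrated with a toy Python sketch that lists substitutions between an already aligned, gap-free EST and a reference sequence, counts which reference bases were substituted, and checks whether each replacement base repeats the base at b - 1 or b + 1. Alignment, quality filtering, SNP exclusion and the wider b - 4 to b + 3 context are out of scope, and all names are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter


def substitutions(reference, variant):
    """(position, reference base, replacement base) for each mismatch.
    Assumes both sequences are already aligned, gap-free and of equal length."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def repeats_neighbor(reference, pos, new_base):
    """True if the replacement base copies the reference base at b - 1 or b + 1."""
    left = reference[pos - 1] if pos > 0 else None
    right = reference[pos + 1] if pos + 1 < len(reference) else None
    return new_base in (left, right)


def summarize(reference, variants):
    """Count substituted reference bases and the share of neighbor-repeating replacements."""
    substituted = Counter()
    repeats = total = 0
    for variant in variants:
        for pos, ref_base, new_base in substitutions(reference, variant):
            substituted[ref_base] += 1
            repeats += repeats_neighbor(reference, pos, new_base)
            total += 1
    share = repeats / total if total else 0.0
    return substituted, share
```

For example, summarize("ACGTTGCA", ["ACGGTGCA", "ACGTTGAA", "TCGTTGCA"]) finds three substitutions (at a T, a C and an A), two of which copy a neighboring base, giving a share of 2/3.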