1.
Sci Rep ; 14(1): 8855, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632488

ABSTRACT

Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In this paper, we present a novel semi-supervised methodology known as Randomized Feature Selection based Latent Dirichlet Allocation (RFSLDA) to study the impact of the gut microbiome on a subject's health status. Since the data in our study consist of fuzzy, self-reported health labels, traditional supervised learning approaches may not be suitable. As a first step, based on the similarity between documents in text analysis and gut-microbiome data, we employ Latent Dirichlet Allocation (LDA), a topic modeling approach that uses microbiome counts as features to group subjects into relatively homogeneous clusters, without invoking any knowledge of the observed health status (labels) of subjects. We then leverage information from the observed health status of subjects to associate these clusters with the most similar health status, making it a semi-supervised approach. Finally, a feature selection technique is incorporated into the model to improve the overall classification performance. The proposed method provides a semi-supervised topic modeling approach that can help handle the high dimensionality of microbiome data in association studies. Our experiments reveal that our semi-supervised classification algorithm is effective and efficient, achieving high classification accuracy compared to popular supervised learning approaches such as SVM and the multinomial logistic model. 
The RFSLDA framework is attractive because it (i) enhances clustering accuracy by identifying key bacteria types as indicators of health status, (ii) identifies key bacteria types within each group based on estimates of the proportion of bacteria types within the groups, and (iii) computes a measure of within-group similarity to identify highly similar subjects in terms of their health status.
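
The step that associates unsupervised clusters with observed health labels can be illustrated with a minimal sketch. It assumes a simple majority vote within each cluster, which is only one plausible reading of "most similar health status"; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, labels):
    """Assign each cluster the most common observed label among its members.

    Majority voting is an assumption made for this sketch, not the paper's rule.
    """
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def classify(cluster_ids, mapping):
    """Predict a label for each subject from the cluster it belongs to."""
    return [mapping[c] for c in cluster_ids]


# Hypothetical toy data: cluster assignments (e.g. from a topic model)
# and self-reported health labels for seven subjects.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
labels = ["healthy", "healthy", "ill", "ill", "ill", "healthy", "ill"]

mapping = map_clusters_to_labels(cluster_ids, labels)
predicted = classify(cluster_ids, mapping)
agreement = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

Here the labels are used only after clustering, which is what makes the overall procedure semi-supervised rather than supervised.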


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Algorithms
2.
Sci Rep ; 13(1): 3292, 2023 Feb 25.
Article in English | MEDLINE | ID: mdl-36841850

ABSTRACT

Recent advances in technology have led to an explosion of data in virtually all domains of our lives. Modern biomedical devices can acquire a large number of physical readings from patients. Often, these readings are stored in the form of time series data. Such time series data can form the basis for important research to advance healthcare and well-being. Due to several considerations, including data size and patient privacy, the original, full data may not be available to secondary parties or researchers. Instead, suppose that a subset of the data is made available. A fast and reliable record linkage algorithm enables us to accurately match patient records in the original and subset databases while maintaining privacy. The problem of record linkage when the attributes include time series has not been studied much in the literature. We introduce two main contributions in this paper. First, we propose a novel, very efficient, and scalable record linkage algorithm that is employed on time series data. This algorithm is 400× faster than the previous work. Second, we introduce a privacy-preserving framework that enables health institutions to safely release their raw time series records to researchers with the bare minimum amount of identifying information.

3.
Accid Anal Prev ; 98: 157-166, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27723517

ABSTRACT

This paper describes a comparison of pedestrian compliance at traffic signals with two types of pedestrian phasing: concurrent, where both pedestrians and vehicular traffic are directed to move in the same directions at the same time, and exclusive, where pedestrians are directed to move during their own dedicated phase while all vehicular traffic is stopped. Exclusive phasing is usually perceived to be safer, especially by senior and disabled advocacy groups, although these safety benefits depend upon pedestrians waiting for the walk signal. This paper investigates whether or not there are differences between pedestrian compliance at signals with exclusive pedestrian phasing and those with concurrent phasing and whether these differences continue to exist when compliance at exclusive phasing signals is evaluated as if they had concurrent phasing. Pedestrian behavior was observed at 42 signalized intersections in central Connecticut with both concurrent and exclusive pedestrian phasing. Binary regression models were estimated to predict pedestrian compliance as a function of the pedestrian phasing type and other intersection characteristics, such as vehicular and pedestrian volume, crossing distance and speed limit. We found that pedestrian compliance is significantly higher at intersections with concurrent pedestrian phasing than at those with exclusive pedestrian phasing, but this difference is not significant when compliance at exclusive phase intersections is evaluated as if it had concurrent phasing. This suggests that pedestrians treat exclusive phase intersections as though they have concurrent phasing, rendering the safety benefits of exclusive pedestrian phasing elusive. No differences were observed for senior or non-senior pedestrians.


Subjects
Traffic Accidents/prevention & control , Pedestrians/statistics & numerical data , Safety Management/statistics & numerical data , Walking , Connecticut , Environment Design , Humans , Theoretical Models , Urban Population
4.
Accid Anal Prev ; 99(Pt A): 6-19, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27846421

ABSTRACT

In an effort to improve traffic safety, there has been considerable interest in estimating crash prediction models and identifying factors contributing to crashes. To account for crash frequency variations among crash types and severities, crash prediction models have been estimated by type and severity. Univariate crash count models have been used by researchers to estimate crashes by crash type or severity, in which the crash counts by type or severity are assumed to be independent of one another and modelled separately. When considering crash types and severities simultaneously, this may neglect the potential correlations between crash counts due to the presence of shared unobserved factors across crash types or severities for a specific roadway intersection or segment, and might lead to biased parameter estimation and reduced model accuracy. The focus of this study is to estimate crashes by both crash type and crash severity using the Integrated Nested Laplace Approximation (INLA) Multivariate Poisson Lognormal (MVPLN) model, and to identify the different effects of contributing factors on different crash type and severity counts on rural two-lane highways. The INLA MVPLN model can simultaneously model crash counts by crash type and crash severity by accounting for the potential correlations among them, and it significantly decreases the computational time compared with a fully Bayesian fitting of the MVPLN model using the Markov Chain Monte Carlo (MCMC) method. This paper describes estimation of MVPLN models for three-way stop controlled (3ST) intersections, four-way stop controlled (4ST) intersections, four-way signalized (4SG) intersections, and roadway segments on rural two-lane highways. Annual Average Daily Traffic (AADT) and variables describing roadway conditions (including presence of lighting, presence of left-turn/right-turn lane, lane width and shoulder width) were used as predictors. 
A Univariate Poisson Lognormal (UPLN) model was estimated by crash type and severity for each highway facility, and its prediction results are compared with those of the MVPLN model based on the Average Predicted Mean Absolute Error (APMAE) statistic. A UPLN model for total crashes was also estimated to compare the coefficients of contributing factors with the models that estimate crashes by crash type and severity. The model coefficient estimates show that the signs of coefficients for presence of left-turn lane, presence of right-turn lane, lane width and speed limit are different across crash type or severity counts, which suggests that estimating crashes by crash type or severity might be more helpful in identifying crash contributing factors. The standard errors of covariates in the MVPLN model are slightly lower than those in the UPLN model when the covariates are statistically significant, and the crash counts by crash type and severity are significantly correlated. The model prediction comparisons illustrate that the MVPLN model outperforms the UPLN model in prediction accuracy. Therefore, when predicting crash counts by crash type and crash severity for rural two-lane highways, the MVPLN model should be considered to avoid estimation error and to account for the potential correlations among crash type counts and crash severity counts.
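
For orientation, a multivariate Poisson lognormal model is commonly written in the following generic form. The notation is an assumption made here for illustration (site i, crash category k = 1, ..., K, covariate vector x_i) and is not taken from the paper.

```latex
\begin{aligned}
y_{ik} \mid \lambda_{ik} &\sim \operatorname{Poisson}(\lambda_{ik}), \\
\log \lambda_{ik} &= \mathbf{x}_i^{\top}\boldsymbol{\beta}_k + \varepsilon_{ik}, \\
\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \dots, \varepsilon_{iK})^{\top} &\sim \mathcal{N}_K(\mathbf{0}, \boldsymbol{\Sigma}).
\end{aligned}
```

In this form, the off-diagonal entries of Σ carry the correlation between counts of different crash types or severities at the same site. Restricting Σ to a diagonal matrix gives K separate univariate Poisson lognormal models, which corresponds to the independence assumption the abstract describes for the univariate approach.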


Subjects
Traffic Accidents/statistics & numerical data , Automobile Driving/statistics & numerical data , Automobiles/statistics & numerical data , Theoretical Models , Rural Population , Safety/statistics & numerical data , Humans , Markov Chains , Statistical Models , Monte Carlo Method , Poisson Distribution , Regression Analysis
5.
Accid Anal Prev ; 83: 26-36, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26162641

ABSTRACT

This paper describes the estimation of pedestrian crash count and vehicle interaction severity prediction models for a sample of signalized intersections in Connecticut with either concurrent or exclusive pedestrian phasing. With concurrent phasing, pedestrians cross at the same time as motor vehicle traffic in the same direction receives a green phase, while with exclusive phasing, pedestrians cross during their own phase when all motor vehicle traffic on all approaches is stopped. Pedestrians crossing at each intersection were observed and classified according to the severity of interactions with motor vehicles. Observation intersections were selected to represent both types of signal phasing while controlling for other physical characteristics. In the nonlinear mixed models for interaction severity, pedestrians crossing on the walk signal at an exclusive signal experienced lower interaction severity compared to those crossing on the green light with concurrent phasing; however, pedestrians crossing on a green light where an exclusive phase was available experienced higher interaction severity. Intersections with concurrent phasing have fewer total pedestrian crashes than those with exclusive phasing but more crashes at higher severity levels. It is recommended that exclusive pedestrian phasing only be used at locations where pedestrians are more likely to comply.


Subjects
Traffic Accidents/statistics & numerical data , Cues , Environment Design , Pedestrians , Safety , Wounds and Injuries/epidemiology , Connecticut , Humans , Theoretical Models , Motor Vehicles , Walking
6.
Accid Anal Prev ; 64: 78-85, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24333771

ABSTRACT

Uncovering the temporal trend in crash counts provides a good understanding of the context for pedestrian safety. Because pedestrian crashes are rare, it is impossible to investigate monthly temporal effects with individual segment/intersection level data; thus the time dependence should be derived from aggregated data. Most previous studies have used annual data to investigate the differences in pedestrian crashes between different regions or countries in a given year, and/or to look at time trends of fatal pedestrian injuries annually. Use of annual data unfortunately does not provide sufficient information on patterns in time trends or seasonal effects. This paper describes statistical methods for uncovering patterns in monthly pedestrian crashes aggregated on urban roads in Connecticut from January 1995 to December 2009. We investigate the temporal behavior of injury severity levels, including fatal (K), severe injury (A), evident minor injury (B), and non-evident possible injury and property damage only (C and O), as proportions of all pedestrian crashes in each month, taking into consideration effects of time trend, seasonal variations and VMT (vehicle miles traveled). This type of dependent multivariate data is characterized by positive components which sum to one, and occurs in several applications in science and engineering. We describe a dynamic framework with vector autoregressions (VAR) for modeling and predicting compositional time series. Combining these predictions with predictions from a univariate statistical model for total crash counts will then enable us to predict pedestrian crash counts with the different injury severity levels. We compare these predictions with those obtained from fitting separate univariate models to time series of crash counts at each injury severity level. We also show that the dynamic models perform better than the corresponding static models. 
We implement the Integrated Nested Laplace Approximation (INLA) approach to enable fast Bayesian posterior computation. Taking CO injury severity level as a baseline for the compositional analysis, we conclude that there was a noticeable shift in the proportion of pedestrian crashes from injury severity A to B, while the increase for injury severity K was extremely small over time. This shift to the less severe injury level (from A to B) suggests that the overall safety on urban roads in Connecticut is improving. In January and February, there was some increase in the proportions for levels A and B over the baseline, indicating a seasonal effect. We found evidence that an increase in VMT would result in a decrease of proportions over the baseline for all injury severity levels. Our dynamic model uncovered a decreasing trend in all pedestrian crash counts before April 2005, followed by a noticeable increase and a flattening out until the end of the fitting period. This appears to be largely due to the behavior of injury severity level A pedestrian crashes.
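
Compositional data with a designated baseline category are often mapped to unconstrained values with the additive log-ratio (ALR) transform before a time series model is fitted. The sketch below assumes that transform, with the baseline stored as the last component; the abstract does not name the transform used, and the monthly shares shown are hypothetical.

```python
import math


def alr(proportions):
    """Additive log-ratio transform with the LAST component as baseline.

    Maps K positive parts that sum to one to K-1 unconstrained log-ratios.
    """
    base = proportions[-1]
    return [math.log(p / base) for p in proportions[:-1]]


def alr_inverse(log_ratios):
    """Map K-1 log-ratios back to K proportions (baseline returned last)."""
    exps = [math.exp(v) for v in log_ratios]
    total = 1.0 + sum(exps)
    return [e / total for e in exps] + [1.0 / total]


# Hypothetical shares of one month's pedestrian crashes by severity,
# ordered K, A, B, CO, with CO as the baseline.
shares = [0.02, 0.18, 0.45, 0.35]

z = alr(shares)        # three unconstrained values that a VAR could model
back = alr_inverse(z)  # recovers the original proportions
```

Predicted log-ratios can be passed through `alr_inverse` to obtain predicted proportions, which can then be multiplied by a predicted total crash count to give counts by severity level.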


Subjects
Traffic Accidents/statistics & numerical data , Trauma Severity Indices , Connecticut , Humans , Linear Models , Statistical Models , Time Factors
7.
Accid Anal Prev ; 58: 53-8, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23702439

ABSTRACT

The question of whether crash injury severity should be modeled using an ordinal response model or a non-ordered (multinomial) response model is persistent in traffic safety engineering. This paper proposes the use of the partial proportional odds (PPO) model as a statistical modeling technique that both bridges the gap between ordered and non-ordered response modeling, and avoids violating the key assumptions in the behavior of crash severity inherent in these two alternatives. The partial proportional odds model is a type of logistic regression that allows certain individual predictor variables to ignore the proportional odds assumption which normally forces predictor variables to affect each level of the response variable with the same magnitude, while other predictor variables retain this proportional odds assumption. This research looks at the effectiveness of this PPO technique in predicting vehicular crash severities on Connecticut state roads using data from 1995 to 2009. The PPO model is compared to ordinal and multinomial response models on the basis of adequacy of model fit, significance of covariates, and out-of-sample prediction accuracy. The results of this study show that the PPO model has adequate fit and performs best overall in terms of covariate significance and holdout prediction accuracy. Combined with the ability to accurately represent the theoretical process of crash injury severity prediction, this makes the PPO technique a favorable approach for crash injury severity modeling by adequately modeling and predicting the ordinal nature of the crash severity process and addressing the non-proportional contributions of some covariates.
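
To make the distinction concrete, the following sketch computes category probabilities under one common parameterization of a partial proportional odds model, P(Y > j) = sigmoid(alpha_j + beta·x + gamma_j·z), where x holds predictors that keep the proportional odds assumption and z holds predictors whose effects vary by cut point. This parameterization, the function names, and all coefficient values are assumptions for illustration and are not taken from the paper.

```python
import math


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def ppo_category_probs(alphas, beta, gammas, x, z):
    """Category probabilities for an ordinal response with J levels.

    alphas: J-1 intercepts, one per cut point.
    beta:   coefficients shared by all cut points (proportional odds part).
    gammas: J-1 coefficient lists, one per cut point (non-proportional part).
    """
    common = sum(b * v for b, v in zip(beta, x))
    # P(Y > j) for each of the J-1 cut points.
    exceed = [
        sigmoid(a + common + sum(g * v for g, v in zip(gj, z)))
        for a, gj in zip(alphas, gammas)
    ]
    bounds = [1.0] + exceed + [0.0]
    # P(Y = j) is the difference between successive exceedance probabilities.
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]


# Hypothetical coefficients for a four-level severity scale.
alphas = [1.0, -0.5, -2.0]
beta = [0.3]                      # same effect at every cut point
gammas = [[0.0], [0.2], [0.6]]    # effect that grows with severity
probs = ppo_category_probs(alphas, beta, gammas, x=[1.0], z=[1.0])
```

Setting every entry of `gammas` to zero recovers the ordinary proportional odds model. Because the cut-point-specific slopes are not parallel, some covariate values can yield exceedance probabilities that are not monotone, so fitted probabilities should be checked for negativity.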


Subjects
Traffic Accidents/statistics & numerical data , Trauma Severity Indices , Aged , Connecticut , Humans , Logistic Models , Middle Aged , Statistical Models , Odds Ratio
8.
Accid Anal Prev ; 50: 1003-13, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22954370

ABSTRACT

This paper introduces dynamic time series modeling in a Bayesian framework to uncover temporal patterns in highway crashes in Connecticut. Existing state sources provide data describing the time for each crash and demographic attributes of persons involved over the time period from January 1995 to December 2009 as well as the traffic volumes and the characteristics of the roads on which these crashes occurred. Induced exposure techniques are used to estimate the exposure for senior and non-senior drivers by road access type (limited access and surface roads) and area type (urban or rural). We show that these dynamic models fit the data better than the usual GLM framework while also permitting discovery of temporal trends in the estimation of parameters, and that computational difficulties arising from Markov Chain Monte Carlo (MCMC) techniques can be handled by the innovative Integrated Nested Laplace Approximations (INLA). Using these techniques we find that while overall safety is increasing over time, the level of safety for senior drivers has remained more stagnant than for non-senior drivers, particularly on rural limited access roads. The greatest opportunity for improvement of safety for senior drivers is on rural surface roads.


Subjects
Traffic Accidents/statistics & numerical data , Automobile Driving , Aged , Bayes Theorem , Connecticut/epidemiology , Female , Humans , Injury Severity Score , Linear Models , Male , Markov Chains , Monte Carlo Method , Risk Factors , Safety , Time Factors
9.
Accid Anal Prev ; 38(6): 1071-80, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16782038

ABSTRACT

The study describes an investigation of the relationship between crash occurrence and hourly volume counts for small samples of highway segments from two states: Michigan and Connecticut. We used a hierarchical Bayesian framework to fit binary regression models for predicting crash occurrence for each of four crash types: (1) single-vehicle, (2) multi-vehicle same direction, (3) multi-vehicle opposite direction, and (4) multi-vehicle intersecting direction, as a function of the hourly volume, segment length, speed limit and pavement width. The results reveal how the relationship between crashes and hourly volume varies by time of day, thus improving the accuracy of crash occurrence predictions. The results show that even accounting for time of day, the disaggregate exposure measure - hourly volume - is indeed non-linear for each of the four crash types. This implies that at any time of day, the crash occurrence is not proportional to the hourly volume. These findings help us to further understand the relationship between crash occurrence and hourly volume, segment length and other risk factors, and facilitate more meaningful comparisons of the safety record of seemingly similar highway locations.


Subjects
Traffic Accidents/statistics & numerical data , Automobile Driving/statistics & numerical data , Bayes Theorem , Rural Population/statistics & numerical data , Connecticut , Humans , Michigan , Regression Analysis
10.
Accid Anal Prev ; 36(2): 183-91, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14642873

ABSTRACT

A critical part of any risk assessment is identifying how to represent exposure to the risk involved. Recent research shows that the relationship between crash count and traffic volume is non-linear; consequently, a simple crash rate computed as the ratio of crash count to volume is not appropriate for comparing the safety of sites with different traffic volumes. To solve this problem, we describe a new approach for relating traffic volume and crash incidence. Specifically, we disaggregate crashes into four types: (1) single-vehicle, (2) multi-vehicle same direction, (3) multi-vehicle opposite direction, and (4) multi-vehicle intersecting, and define candidate exposure measures for each that we hypothesize will be linear with respect to each crash type. This paper describes an initial investigation using crash and physical characteristics data for highway segments in Michigan from the Highway Safety Information System (HSIS). We use zero-inflated Poisson (ZIP) modeling to estimate models for predicting counts for each of the above crash types as a function of the daily volume, segment length, speed limit and roadway width. We found that the relationship between crashes and the daily volume (AADT) is non-linear and varies by crash type, and is significantly different from the relationship between crashes and segment length for all crash types. Our research will provide information to improve the accuracy of crash predictions and, thus, facilitate more meaningful comparison of the safety record of seemingly similar highway locations.
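
As a reminder of what a zero-inflated Poisson distribution looks like, the sketch below evaluates its probability mass function for fixed parameters. In a regression setting such as the one described, the Poisson mean (and possibly the zero-inflation probability) would depend on covariates like daily volume and segment length; the constant values used here are hypothetical.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    pi  is the probability of a structural zero (e.g. a segment-period
        in which no crash can be recorded).
    lam is the mean of the Poisson component.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical parameters for illustration only.
lam, pi = 2.0, 0.3
probs = [zip_pmf(k, lam, pi) for k in range(60)]

# The mean of a ZIP variable is (1 - pi) * lam.
mean = sum(k * p for k, p in enumerate(probs))
```

Compared with a plain Poisson distribution with the same mean, the ZIP distribution places more mass at zero, which suits crash data where many segments record no crashes.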


Subjects
Traffic Accidents/prevention & control , Traffic Accidents/statistics & numerical data , Risk Assessment/methods , Automobile Driving/statistics & numerical data , Humans , Michigan , Statistical Models , Motor Vehicles/classification , Rural Population/statistics & numerical data