Results 1 - 17 of 17
1.
PLoS One ; 19(5): e0297544, 2024.
Article in English | MEDLINE | ID: mdl-38809823

ABSTRACT

Statistical quality control is concerned with the analysis of production and manufacturing processes. Control charts are process control techniques commonly applied to observe and control deviations. Shewhart control charts are sensitive to large shifts and rely on the basic assumption of normality. Cumulative Sum (CUSUM) control charts are effective for identifying deviations that may have special causes, such as outliers or excessive variability in subgroup means. This study uses a CUSUM control chart structure to evaluate the performance of robust dispersion parameters. We investigated the design structure features of various control charts, based on currently defined estimators and some new robust scale estimators using trimming and winsorization in different scenarios. The Median Absolute Deviation based on trimming and winsorization is introduced. The effectiveness of CUSUM control charts based on these estimators is evaluated in terms of average run length (ARL) and standard deviation of the run length (SDRL) using a simulation study. The results show the robustness of the CUSUM chart in observing small changes in magnitude for both normal and contaminated data. In general, CUSUM charts based on the robust estimators MADTM and MADWM outperform the alternatives in all environments.
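
The idea of pairing a CUSUM chart with a robust scale estimate can be sketched as follows. This is a minimal illustration only: it uses the plain Median Absolute Deviation, not the trimmed or winsorized MADTM and MADWM variants, whose definitions are not given in the abstract. The function names, the reference value k = 0.5, the decision limit h = 5 and the sample data are all assumptions chosen for illustration.

```python
from statistics import median


def mad_scale(values):
    """Plain MAD, scaled by 1.4826 so it estimates sigma for normal data.

    Assumption: this is the ordinary MAD, not the article's MADTM/MADWM.
    """
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)


def cusum_first_signal(samples, target, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized values.

    Returns the index of the first out-of-control signal, or None.
    k (reference value) and h (decision limit) are conventional defaults,
    assumed here rather than taken from the article.
    """
    upper = lower = 0.0
    for i, x in enumerate(samples):
        z = (x - target) / sigma
        upper = max(0.0, upper + z - k)  # accumulates upward drift
        lower = max(0.0, lower - z - k)  # accumulates downward drift
        if upper > h or lower > h:
            return i
    return None


# Hypothetical reference data containing one gross outlier (25.0).
reference = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
# Hypothetical monitoring data with a small upward shift from the fourth value on.
shifted = [10.0, 10.1, 9.9, 10.4, 10.5, 10.6, 10.4, 10.5, 10.6, 10.5]

sigma_hat = mad_scale(reference)
signal_at = cusum_first_signal(shifted, target=median(reference), sigma=sigma_hat)
```

Because the scale comes from medians, the single outlier in the reference data barely moves `sigma_hat`, so the chart limits stay tight enough to pick up the small shift.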


Subjects
Quality Control , Models, Statistical , Computer Simulation , Algorithms
2.
BMC Med Inform Decis Mak ; 24(1): 120, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38715002

ABSTRACT

In recent times, time-to-event data such as time to failure or death are routinely collected alongside high-throughput covariates. These high-dimensional bioinformatics data often challenge classical survival models, which are either infeasible to fit or produce low prediction accuracy due to overfitting. To address this issue, the focus has shifted towards novel approaches for feature selection and survival prediction. In this article, we propose a new hybrid feature selection approach that handles high-dimensional bioinformatics datasets for improved survival prediction. This study explores the efficacy of four distinct variable selection techniques, LASSO, RSF-vs, SCAD, and CoxBoost, in the context of non-parametric biomedical survival prediction. Leveraging these methods, we conducted comprehensive variable selection processes. Subsequently, survival analysis models, specifically CoxPH, RSF, and DeepHit NN, were employed to construct predictive models based on the selected variables. Furthermore, we introduce a novel approach wherein only variables consistently selected by a majority of the aforementioned feature selection techniques are considered. This strategy, referred to as the proposed method, aims to enhance the reliability and robustness of variable selection, subsequently improving the predictive performance of the survival analysis models. To evaluate its effectiveness, we compare the proposed method with the existing LASSO, RSF-vs, SCAD, and CoxBoost techniques using various performance metrics, including the integrated Brier score (IBS), concordance index (C-Index) and integrated absolute error (IAE), on numerous high-dimensional survival datasets. The real data applications reveal that the proposed method outperforms the competing methods in terms of survival prediction accuracy.
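
The majority-vote step described in the abstract can be sketched as below. This is an illustrative reading, not the authors' implementation: the abstract does not say how ties are handled, so a strict majority (more than half of the selectors) is assumed, and the gene names and selected sets are hypothetical.

```python
from collections import Counter


def majority_vote_features(selections):
    """Return features chosen by more than half of the selectors.

    `selections` maps a selector name to the set of features it picked.
    Assumption: a strict majority is required; ties are excluded.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    threshold = len(selections) / 2
    return sorted(f for f, n in votes.items() if n > threshold)


# Hypothetical output of the four selectors named in the abstract.
selections = {
    "LASSO": {"g1", "g2", "g3"},
    "RSF-vs": {"g2", "g3", "g4"},
    "SCAD": {"g1", "g2", "g3", "g5"},
    "CoxBoost": {"g3", "g4", "g6"},
}

consensus = majority_vote_features(selections)
```

The consensus set would then be passed to whichever survival model is being fitted; that step is omitted here because it depends on the modelling library in use.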


Subjects
Neural Networks, Computer , Humans , Survival Analysis , Statistics, Nonparametric , Computational Biology/methods
3.
Sci Rep ; 13(1): 20020, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973894

ABSTRACT

The article introduces a novel Bayesian AEWMA control chart that integrates different loss functions (LFs), such as the squared error loss function and the LINEX loss function, under an informative prior for posterior and posterior predictive distributions, implemented across diverse ranked set sampling (RSS) designs. The main objective is to detect small to moderate shifts in the process mean, with the average run length and standard deviation of run length serving as performance measures. The study employs a hard bake process in semiconductor production to demonstrate the effectiveness of the proposed chart, comparing it with existing control charts through Monte Carlo simulations. The results underscore the superiority of the proposed approach, particularly under RSS designs compared to simple random sampling (SRS), in identifying out-of-control signals. Overall, this study contributes a comprehensive method integrating various LFs and RSS schemes, offering a more precise and efficient approach for detecting shifts in the process mean. Real-world applications highlight the heightened sensitivity of the suggested chart in identifying out-of-control signals compared to existing Bayesian charts using SRS.
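
For orientation, the two loss functions named in the abstract have well-known Bayes estimators. The notation below is generic and assumed for illustration (parameter \theta, estimate \hat\theta, data x, LINEX shape constant c); it is not taken from the article.

```latex
% Squared error loss and its Bayes estimator (the posterior mean)
L_{SE}(\theta, \hat\theta) = (\hat\theta - \theta)^2,
\qquad \hat\theta_{SE} = E[\theta \mid x]

% LINEX loss with shape constant c \neq 0 and its Bayes estimator
L_{LINEX}(\theta, \hat\theta) = e^{c(\hat\theta - \theta)} - c(\hat\theta - \theta) - 1,
\qquad \hat\theta_{LINEX} = -\frac{1}{c} \ln E\!\left[ e^{-c\theta} \mid x \right]

% Special case: normal posterior \theta \mid x \sim N(m, v)
\hat\theta_{LINEX} = m - \frac{c\,v}{2}
```

The normal special case shows the asymmetry: for c > 0 overestimation is penalized more heavily, so the estimate is pulled below the posterior mean by c v / 2.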

4.
Front Public Health ; 10: 922795, 2022.
Article in English | MEDLINE | ID: mdl-35968475

ABSTRACT

In this article, a new hybrid time series model is proposed to predict COVID-19 daily confirmed cases and deaths. Due to the variations and complexity in the data, it is very difficult to predict its future trajectory using linear time series or mathematical models. Specifically, a novel hybrid ensemble empirical mode decomposition and error trend seasonal (EEMD-ETS) model is developed to forecast the COVID-19 pandemic. The proposed hybrid model applies EEMD to decompose the complex, nonlinear, and nonstationary data into different intrinsic mode functions (IMFs), from low to high frequencies, and a single monotone residue. The stationarity of each IMF component is checked with the augmented Dickey-Fuller (ADF) test, the components are then used to build the EEMD-ETS model, and finally future predictions are obtained from the proposed hybrid model. To illustrate and check the performance of the proposed model, four datasets of daily confirmed cases and deaths from COVID-19 in Italy, Germany, the United Kingdom (UK), and France are used. Four statistical metrics, i.e., root mean square error (RMSE), symmetric mean absolute percentage error (sMAPE), mean absolute error (MAE), and mean absolute percentage error (MAPE), are used to compare different time series models. The results show that the proposed hybrid EEMD-ETS model outperforms the other time series and machine learning models. Hence, it is worth using as an effective model for the prediction of COVID-19.
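
The four accuracy metrics listed in the abstract can be computed as in the sketch below. The sMAPE definition varies between sources; the form used here (absolute error divided by the mean of the absolute actual and predicted values) is an assumption, as are the function name and the sample numbers. Actual values are assumed to be nonzero, otherwise MAPE is undefined.

```python
from math import sqrt


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, MAPE (%) and sMAPE (%) for two equal-length series.

    Assumptions: no actual value is zero, and sMAPE uses the
    2|e| / (|a| + |p|) form, which is one of several conventions.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 / n * sum(abs(e) / abs(a) for e, a in zip(errors, actual))
    smape = 100 / n * sum(
        2 * abs(e) / (abs(a) + abs(p))
        for e, a, p in zip(errors, actual, predicted)
    )
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "sMAPE": smape}


# Hypothetical daily counts and forecasts, for illustration only.
metrics = forecast_errors([100, 200, 300], [110, 190, 330])
```

RMSE and MAE are in the units of the series, while MAPE and sMAPE are percentages, which is why studies comparing countries with very different case counts often report both kinds.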


Subjects
COVID-19 , COVID-19/epidemiology , Forecasting , Humans , Models, Theoretical , Pandemics , Seasons
5.
Sci Rep ; 12(1): 10992, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35768449

ABSTRACT

Outlying observations have a large influence on the linear model selection process. In this article, we present a novel approach to robust model selection in linear regression to accommodate situations where outliers are present in the data. The model selection criterion is based on two components: the robust conditional expected prediction loss, and a robust goodness-of-fit with a penalty term. We estimate the conditional expected prediction loss by using the out-of-bag stratified bootstrap approach. In the presence of outliers, the stratified bootstrap ensures that we obtain bootstrap samples that are similar to the original sample data. Furthermore, to control the undue effect of outliers, we use the robust MM-estimator and a bounded loss function in the proposed criterion. Specifically, we observe that instead of minimizing the penalized loss function or the conditional expected prediction loss separately, it is better to minimize them simultaneously. The simulation and real-data based studies confirm the consistent and satisfactory behavior of our bootstrap model selection procedure in the presence of response outliers and covariate outliers.


Subjects
Models, Statistical , Computer Simulation , Linear Models
6.
Sci Total Environ ; 793: 148595, 2021 Nov 01.
Article in English | MEDLINE | ID: mdl-34174604

ABSTRACT

In the present study, hydro-meteorological variables of the Chitral Basin in the Hindukush region of Pakistan were studied to predict changes in climatic components such as temperature, precipitation, humidity and river flow, based on observed data from 1990 to 2019. Uncertainties in climate change projection were studied using various statistical methods, such as trend variability analysis via stationarity tests and validation of regression assumptions prior to fitting the regression models. Multiple regression models were also estimated for each hydro-meteorological variable over the 30 years of observed data. Results demonstrated that temperature and precipitation were inversely related. According to the regression model, temperature decreases by 0.309 °C on average for a one-unit increase in precipitation. Temperature also decreases by 0.086 °C on average for an increase in humidity. Since precipitation is negatively related to temperature, annual precipitation decreases by 0.278 mm for an increase in temperature. Humidity, on the other hand, increases by 0.207% with an increase in precipitation and decreases by 0.99% with an increase in temperature. The flow in the Chitral river increases by 0.306 m³/s for a one-unit change in precipitation. Findings from the present study negated the general perception that flow in the Chitral river has increased due to the recession of glaciers with increasing temperature.


Subjects
Climate Change , Rivers , Meteorology , Regression Analysis , Temperature
7.
PeerJ Comput Sci ; 7: e562, 2021.
Article in English | MEDLINE | ID: mdl-34141889

ABSTRACT

In this paper, a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), is proposed; it utilizes a robust measure of dispersion, the Median Absolute Deviation (MAD). This method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded and the ones that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods based on classification error, Brier score, and sensitivity, using eleven gene expression datasets. Classification of observations for different sets of selected genes by the proposed method is carried out by three different classifiers, i.e., random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also shown in this paper. The results reveal that in most cases the proposed method outperforms the other methods.
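
The three performance measures named in the abstract can be computed for a binary classifier as sketched below. The function name, the 0.5 decision threshold and the sample labels and probabilities are assumptions for illustration; the sketch says nothing about how RPOS itself scores genes, which the abstract does not specify.

```python
def classification_metrics(y_true, prob_pos, threshold=0.5):
    """Return (classification error, Brier score, sensitivity).

    y_true holds 0/1 labels; prob_pos holds predicted probabilities of
    class 1. The 0.5 threshold is an assumed default. At least one
    positive label is required for sensitivity to be defined.
    """
    preds = [1 if p >= threshold else 0 for p in prob_pos]
    n = len(y_true)
    error = sum(p != y for p, y in zip(preds, y_true)) / n
    brier = sum((p - y) ** 2 for p, y in zip(prob_pos, y_true)) / n
    true_pos = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    false_neg = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 1)
    sensitivity = true_pos / (true_pos + false_neg)
    return error, brier, sensitivity


# Hypothetical labels and predicted probabilities.
error, brier, sensitivity = classification_metrics(
    [1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6]
)
```

Classification error and sensitivity depend on the threshold, whereas the Brier score uses the probabilities directly, so the three measures can rank gene sets differently.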

8.
PLoS One ; 15(11): e0242762, 2020.
Article in English | MEDLINE | ID: mdl-33253248

ABSTRACT

OBJECTIVES: Forecasting epidemics like COVID-19 is of crucial importance: it helps not only governments but also medical practitioners to know the future trajectory of the spread, which might help them with the best possible treatments, precautionary measures and protections. In this study, the popular autoregressive integrated moving average (ARIMA) model is used to forecast the cumulative number of confirmed cases, recovered cases, and deaths from COVID-19 in Pakistan from June 25, 2020 to July 04, 2020 (a 10-day-ahead forecast). METHODS: To meet the desired objectives, data for this study were taken from the website of the Ministry of National Health Service of Pakistan, covering February 27, 2020 to June 24, 2020. Two different ARIMA models are used to obtain 10-day-ahead point forecasts and 95% interval forecasts of the cumulative confirmed cases, recovered cases, and deaths. The statistical software RStudio, with the "forecast", "ggplot2", "tseries", and "seasonal" packages, was used for data analysis. RESULTS: The forecasted cumulative confirmed cases, recovered cases, and deaths up to July 04, 2020 are 231239 with a 95% prediction interval of (219648, 242832), 111616 with a prediction interval of (101063, 122168), and 5043 with a 95% prediction interval of (4791, 5295), respectively. Root mean square error (RMSE) and mean absolute error (MAE) are used to measure model accuracy. The analysis shows that the ARIMA and seasonal ARIMA models are better than the other time series models in terms of forecasting accuracy and are hence recommended for forecasting epidemics like COVID-19. CONCLUSION: The forecasting accuracy of ARIMA models in terms of RMSE and MAE is better than that of the other time series models, and they could therefore be considered a good tool for forecasting the spread, recoveries, and deaths from the current outbreak of COVID-19. In addition, this study can help decision-makers develop short-term strategies with regard to the current number of disease occurrences until an appropriate medication is developed.


Subjects
COVID-19/epidemiology , Forecasting , Humans , Models, Statistical , Pakistan/epidemiology , Seasons
9.
J Pak Med Assoc ; 70(7): 1169-1172, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32799268

ABSTRACT

OBJECTIVE: To assess the risk factors associated with tonsillitis. METHODS: The cross-sectional study was conducted at Mardan Medical Complex and District Headquarter Hospital, Mardan, Pakistan, from January to June 2018, and comprised tonsillitis patients. Data was collected using a questionnaire covering risk factors such as age (1-10 years), gender, residential area and dietary habits. Data was analysed using SPSS 20. RESULTS: Of the 325 subjects, 200 (61.54%) were clinically diagnosed with tonsillitis, of whom 138 (69%) were male. Age, unhygienic living conditions, balanced diet, stressful environment and the use of sour/spicy foods were identified as significantly associated factors (p<0.05). CONCLUSIONS: Age, unhygienic living conditions, balanced diet, stressful environment and the use of sour/spicy food were found to have a strong association with tonsillitis.


Subjects
Tonsillitis , Child , Child, Preschool , Cross-Sectional Studies , Feeding Behavior , Humans , Infant , Male , Pakistan/epidemiology , Risk Factors , Tonsillitis/epidemiology
10.
J Health Econ ; 70: 102257, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31923782

ABSTRACT

We investigate whether social interactions among pregnant women can lead to increased Medicaid participation within this population. Using geographically fine vital statistics data, we exploit variation in Medicaid use among recently pregnant mothers, within small neighborhoods, to study the impact on participation among currently pregnant women. Women are more likely to use Medicaid benefits while pregnant, including prenatal care, when previously pregnant women on their census block also received similar benefits. Network effects are relatively larger for young first-time mothers as well as for women within neighborhoods with lower initial levels of welfare program knowledge.


Subjects
Community Networks , Insurance Coverage , Medicaid , Adolescent , Adult , Female , Humans , Observation , Peer Group , Pregnancy , United States , Vital Statistics , Young Adult
11.
J Pak Med Assoc ; 70(12(B)): 2356-2362, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33475543

ABSTRACT

OBJECTIVE: To filter out the most informative genes that mainly regulate the target tissue class, increase classification accuracy, reduce the curse of dimensionality, and discard redundant and irrelevant genes. METHOD: This paper presents the idea of gene selection using bagging sub-forest (BSF). The proposed method derives gene importance from the idea specified in the standard random forest algorithm. The new method is compared with three state-of-the-art methods, i.e., Wilcoxon, masked painter and proportional overlapped score (POS). These methods were applied to 5 datasets, i.e., Colon, Lymph node breast cancer, Leukaemia, Serrated colorectal carcinomas, and Breast Cancer. The comparison was done by selecting the top 20 genes with each gene selection method and applying random forest (RF) and support vector machine (SVM) classifiers to assess predictive performance on the datasets with the selected genes. Classification accuracy, Brier score, and sensitivity were used as performance measures. RESULTS: The proposed method gave better results than the other feature selection methods with both the random forest and SVM classifiers on all the datasets. CONCLUSIONS: The proposed method showed improved performance in terms of classification accuracy, Brier score and sensitivity, and hence could be used as a novel method for gene selection to classify tissue samples into their correct classes.


Subjects
Machine Learning , Support Vector Machine , Algorithms , Genes, Regulator , Genomics , Humans
12.
J Pak Med Assoc ; 69(12): 1767-1770, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31853100

ABSTRACT

OBJECTIVE: To estimate the prevalence of asthma in children aged <10 years, and to identify important risk factors for asthma. METHODS: The case-control study was conducted at Mardan Medical Complex and District Head Quarters Hospital, Mardan, Pakistan, from June to September 2017. Data was collected from paediatric asthma patients as well as healthy controls through a self-designed questionnaire. SPSS 19 was used for data analysis. RESULTS: Of the 647 subjects, 349 (54%) were asthmatic cases and 298 (46%) were controls. Among the cases, 201 (57.6%) were females, while 148 (42.4%) were males. There were 332 (51%) subjects whose fathers were smokers, and of them 224 (67%) had asthma and 125 (37%) were non-asthmatic. Overall, 323 (50%) subjects had carpet in their rooms, and of them 221 (68%) had asthma. Among other risk factors, subjects aged <5 years were 1.49 times more likely to have asthma (odds ratio: 1.49, 95% confidence interval: 0.963-1.988). CONCLUSIONS: Female gender, fathers' smoking, having carpet in the room and age <5 years were found to be the main risk factors associated with asthma.


Subjects
Asthma/epidemiology , Case-Control Studies , Child , Child, Preschool , Female , Health Knowledge, Attitudes, Practice , Humans , Infant , Infant, Newborn , Male , Pakistan/epidemiology , Parents , Risk Factors , Smoking
13.
PLoS One ; 14(11): e0225427, 2019.
Article in English | MEDLINE | ID: mdl-31756205

ABSTRACT

Educational researchers, psychologists, and social, epidemiological and medical scientists often deal with multilevel data. Sometimes the response variable in multilevel data is categorical in nature and needs to be analyzed through multilevel logistic regression models. The main aim of this paper is to provide guidelines for analysts to select an appropriate sample size when fitting multilevel logistic regression models for different threshold parameters and different estimation methods. Simulation studies were performed to obtain the optimum sample size for the Penalized Quasi-likelihood (PQL) and Maximum Likelihood (ML) methods of estimation. Our results suggest that the ML method performs better than the PQL method and requires a relatively small sample under the chosen conditions. To achieve sufficient accuracy of fixed and random effects under the ML method, we established the "50/50" and "120/50" rules, respectively. On the basis of our findings, "50/60" and "120/70" rules under the PQL method of estimation are also recommended.


Subjects
Multilevel Analysis/methods , Research Design/standards , Computer Simulation , Guidelines as Topic , Humans , Likelihood Functions , Logistic Models , Sample Size
14.
Soc Sci Med ; 237: 112453, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31442823

ABSTRACT

OBJECTIVE: To study how the recent rise in terrorist activity affects the health of children exposed to violence. METHOD: Using spatial and temporal variation in terrorist attacks in Pakistan, combined with a fixed effect strategy at various levels, we identify the causal effect of terrorist activity on height, weight, and health behaviors of children. RESULTS: A one-standard-deviation increase in attack intensity, defined as the number of fatalities per attack, leads to approximately 5 more children per 1000 being stunted if attacks occur during gestation and between 12 and 19 more children per 1000 being stunted if attacks occur post birth. For low weight, a measure of short-term malnutrition, we find that a one-standard-deviation increase in attack intensity leads to between 8 and 12 more children per 1000 being low weight if attacks occur post birth. For both severely stunted and very low weight, we find statistically significant effects only for attacks during gestation. We also document a reduction of between 2 and 8 per 1000 children in vaccination take-up in response to terrorism immediately before birth. CONCLUSIONS: Overall, we conclude that violent events experienced in utero or in early childhood can have long-lasting impacts on health and human capital development. Reduced interaction with healthcare infrastructure is a possible mechanism at work.


Subjects
Child Health/statistics & numerical data , Terrorism , Body Height , Body Weight , Child, Preschool , Female , Growth Disorders/epidemiology , Growth Disorders/etiology , Health Behavior , Humans , Infant , Infant, Newborn , Male , Pakistan/epidemiology , Terrorism/statistics & numerical data , Vaccination Coverage/statistics & numerical data , Young Adult
15.
Comput Math Methods Med ; 2019: 9089856, 2019.
Article in English | MEDLINE | ID: mdl-30992712

ABSTRACT

Medical data are often recorded for each patient in clinical studies in order to inform decision-making. Medical data are usually skewed to the right, and skewed distributions are therefore appropriate candidates for making inferences within a Bayesian framework. Furthermore, the Bayesian estimators of a skewed distribution can be used to tackle the problem of decision-making in medicine and health management under uncertainty. For medical diagnosis, physicians can use the Bayesian estimators to quantify the effect of the evidence in increasing the probability that the patient has a particular disease, given the prior information. The present study focuses on the development of Bayesian estimators for the three-parameter Fréchet distribution using a noninformative prior and a gamma prior under LINEX (linear exponential) and general entropy (GE) loss functions. Since the Bayesian estimators cannot be expressed in closed form, approximate Bayesian estimates are obtained via Lindley's approximation. These results are compared with their maximum likelihood counterparts using Monte Carlo simulations. Our results indicate that Bayesian estimators under the general entropy loss function with a noninformative prior (BGENP) provide the smallest mean square error for all sample sizes and different parameter values. Furthermore, a data set of the survival times of a group of patients suffering from head and neck cancer is analyzed for illustration purposes.


Subjects
Bayes Theorem , Models, Statistical , Computational Biology , Computer Simulation , Decision Making, Computer-Assisted , Head and Neck Neoplasms/mortality , Head and Neck Neoplasms/therapy , Humans , Likelihood Functions , Mathematical Computing , Monte Carlo Method , Survival Analysis
16.
PLoS One ; 12(3): e0172807, 2017.
Article in English | MEDLINE | ID: mdl-28253278

ABSTRACT

Gene-mapping studies regularly rely on examination for Mendelian transmission of marker alleles in a pedigree as a way of screening for genotyping errors and mutations. For analysis of family data sets, it is usually necessary to resolve or remove the genotyping errors prior to consideration. The Center for Inherited Disease Research (CIDR), to deal with its large-scale data flow, formalized its data cleaning approach in a set of rules based on PedCheck output. We examine, via carefully designed simulations, how well CIDR's data cleaning rules work in practice. We found that genotype errors in siblings are detected more often than in parents for less polymorphic SNPs, and vice versa for more polymorphic SNPs. Through computer simulations, we conclude that some of CIDR's rules work poorly in some circumstances, and we suggest a set of modified data cleaning rules that may work better than CIDR's rules.


Subjects
Alleles , Genetic Markers/genetics , Pedigree , Statistics as Topic/methods , Adult , Child , Female , Gene Frequency , Humans , Male , Research Design
17.
PLoS One ; 11(11): e0166990, 2016.
Article in English | MEDLINE | ID: mdl-27898702

ABSTRACT

Exponential Smooth Transition Autoregressive (ESTAR) models can capture non-linear adjustment of deviations from equilibrium conditions, which may explain the economic behavior of many variables that appear nonstationary from a linear viewpoint. Many researchers employ the Kapetanios test, which has a unit root as the null and a stationary nonlinear model as the alternative. However, this test statistic is based on the assumption of normally distributed errors in the DGP. Cook analyzed the size of this nonlinear unit root test in the presence of a heavy-tailed innovation process and obtained critical values for both the finite variance and infinite variance cases. However, the test statistics of Cook are oversized. Researchers have found that using conventional tests is dangerous, although the best performer among these is the HCCME. The oversizing of LM tests can be reduced by employing fixed design wild bootstrap remedies, which provide a valuable alternative to the conventional tests. In this paper, the size of the Kapetanios test statistic employing heteroskedasticity-consistent covariance matrices is derived, and the results are reported for various sample sizes in which size distortion is reduced. The properties of the estimates of ESTAR models are investigated when errors are assumed non-normal. We compare the results obtained by nonlinear least squares fitting with those of quantile regression fitting in the presence of outliers, with errors drawn from a t-distribution, for various sample sizes.
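
For orientation, the unit root test setting referred to in the abstract is commonly written as below. This is the usual textbook form associated with Kapetanios, Shin and Snell, given here as an assumption about the notation (y_t, \gamma, \theta, \delta, \varepsilon_t) and not as a quotation from the article.

```latex
% ESTAR model in differenced form, with transition parameter \theta \ge 0
\Delta y_t = \gamma \, y_{t-1} \left[ 1 - \exp\!\left( -\theta \, y_{t-1}^{2} \right) \right] + \varepsilon_t

% Hypotheses: unit root under the null, stationary ESTAR under the alternative
H_0 : \theta = 0 \qquad \text{versus} \qquad H_1 : \theta > 0

% Auxiliary regression from a first-order Taylor expansion around \theta = 0;
% the test is the t-statistic for \delta = 0 against \delta < 0
\Delta y_t = \delta \, y_{t-1}^{3} + \text{error}_t
```

Because \gamma is not identified under the null, the test works with the auxiliary regression, and its t-statistic has a nonstandard distribution, which is why the size under non-normal errors is a question in the first place.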


Subjects
Data Collection/statistics & numerical data , Models, Statistical , Nonlinear Dynamics , Humans , Normal Distribution , Regression Analysis , Sample Size