Results 1 - 20 of 121
1.
Water Sci Technol; 90(1): 156-167, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39007312

ABSTRACT

Model parameter estimation is a well-known inverse problem, provided that single-valued point data are available as observations of system performance. However, classical statistical methods, such as minimization of an objective function or maximum likelihood, are no longer straightforward when measurements are imprecise in nature. Typical examples of such imprecision include censored data and binary information. Here, we explore Approximate Bayesian Computation (ABC) as a simple method for model parameter estimation with such imprecise information. We demonstrate the method on a simple rainfall-runoff model and illustrate its advantages and shortcomings. Finally, we outline the value of Shapley values for determining which types of observations contribute to the parameter estimation and which are of minor importance.
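A minimal sketch of the rejection-ABC idea described above, with a toy linear-reservoir model standing in for the rainfall-runoff model and threshold exceedance standing in for the binary observations (all model details, names, and numbers below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_runoff(k, rain):
    """Toy linear stand-in for the rainfall-runoff model."""
    return k * rain  # placeholder dynamics

def to_imprecise(runoff, threshold=5.0):
    """Reduce simulated output to binary information: did runoff exceed a threshold?"""
    return (runoff > threshold).astype(int)

rain = rng.gamma(2.0, 2.0, size=100)
observed_binary = to_imprecise(simulate_runoff(1.4, rain))  # pseudo-observations

accepted = []
for _ in range(20000):
    k = rng.uniform(0.5, 3.0)                    # draw from the prior
    sim = to_imprecise(simulate_runoff(k, rain))
    distance = np.mean(sim != observed_binary)   # mismatch rate as ABC distance
    if distance <= 0.05:                         # tolerance
        accepted.append(k)

print(f"ABC posterior mean of k: {np.mean(accepted):.2f} ({len(accepted)} draws)")
```

The accepted draws approximate the posterior of k under the mismatch-rate distance; tightening the tolerance trades acceptance rate for accuracy.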


Subjects
Bayes Theorem; Models, Theoretical; Rain; Models, Statistical
2.
Stat Med; 43(20): 3943-3957, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-38951953

ABSTRACT

Latent classification models are a class of statistical methods for identifying unobserved class membership among study samples using observed data. In this study, we propose a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (i.e., small bias) with reasonable confidence intervals (i.e., approximately 95% coverage probability). Furthermore, we compared our model to four existing latent class models and demonstrated that our approach yields higher prediction accuracy for latent classes. We applied the proposed method to COVID-19 data collected in Houston, Texas, US between January 1, 2021 and December 31, 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly across cases. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were also associated with the disease.
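A compact sketch of the class-specific continuous-time Markov chain machinery the abstract relies on: each latent class has its own generator matrix, the transition probability over a gap of length dt is the matrix exponential expm(Q*dt), and Bayes' rule converts class-specific path likelihoods into posterior class membership. The generators, class proportions, and observed path below are illustrative:

```python
import numpy as np
from scipy.linalg import expm

# Two latent classes, each with its own 2-state (outcome 0/1) CTMC generator.
Q = {
    0: np.array([[-0.2, 0.2], [0.5, -0.5]]),   # "mild" class: slow 0->1, fast 1->0
    1: np.array([[-1.0, 1.0], [0.1, -0.1]]),   # "severe" class: fast 0->1, slow 1->0
}
pi = np.array([0.6, 0.4])  # latent class proportions

def path_likelihood(states, times, Qc):
    """Likelihood of an observed binary path under one class's CTMC."""
    lik = 1.0
    for s0, s1, dt in zip(states[:-1], states[1:], np.diff(times)):
        P = expm(Qc * dt)          # transition probability matrix over gap dt
        lik *= P[s0, s1]
    return lik

states, times = [0, 0, 1, 1, 0], [0.0, 1.0, 2.5, 4.0, 7.0]
liks = np.array([path_likelihood(states, times, Q[c]) for c in (0, 1)])
posterior = pi * liks / (pi * liks).sum()
print("P(class | path):", posterior.round(3))
```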


Subjects
Algorithms; COVID-19; Latent Class Analysis; Markov Chains; Humans; COVID-19/epidemiology; Longitudinal Studies; Computer Simulation; Models, Statistical; Texas/epidemiology; SARS-CoV-2; Female
3.
JMIR Public Health Surveill; 10: e52353, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39024001

ABSTRACT

BACKGROUND: Multimorbidity is a significant public health concern, characterized by the coexistence and interaction of multiple preexisting medical conditions. It has been associated with an increased risk of COVID-19, and individuals with multimorbidity who contract COVID-19 often face a significant reduction in life expectancy. The postpandemic period has also highlighted an increase in frailty, emphasizing the importance of integrating existing multimorbidity details into epidemiological risk assessments. Managing clinical data that include medical histories presents significant challenges, particularly because the rarity of multimorbidity conditions makes the data sparse. In addition, enumerating combinatorial multimorbidity features introduces challenges associated with combinatorial explosion. OBJECTIVE: This study aims to assess the severity of COVID-19 in individuals with multiple medical conditions, considering demographic characteristics such as age and sex. We propose an evolutionary machine learning model designed to handle sparsity, analyzing the preexisting multimorbidity profiles of patients hospitalized with COVID-19 based on their medical histories. Our objective is to identify the optimal set of multimorbidity feature combinations strongly associated with COVID-19 severity. We also apply the Apriori algorithm to these evolutionarily derived predictive feature combinations to identify those with high support. METHODS: We used data from 3 administrative sources in Piedmont, Italy, covering 12,793 individuals aged 45-74 years who tested positive for COVID-19 between February and May 2020. From their 5-year pre-COVID-19 medical histories, we extracted multimorbidity features, including drug prescriptions, disease diagnoses, sex, and age. Focusing on COVID-19 hospitalization, we segmented the data into 4 cohorts based on age and sex. After addressing data imbalance through random resampling, we compared various machine learning algorithms to identify the optimal classification model for our evolutionary approach, evaluating each model's performance with 5-fold cross-validation. Our evolutionary algorithm, using a deep learning classifier, generated prediction-based fitness scores to pinpoint multimorbidity combinations associated with COVID-19 hospitalization risk. Finally, the Apriori algorithm was applied to identify frequent combinations with high support. RESULTS: We identified multimorbidity predictors associated with COVID-19 hospitalization, indicating more severe COVID-19 outcomes. Frequently occurring morbidity features in the final evolved combinations were age > 53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics) in cohort 1; A10BA (biguanide or metformin) and N02BE (anilides) in cohort 2; N02AX (other opioids) and M04AA (preparations inhibiting uric acid production) in cohort 3; and G04CA (alpha-adrenoceptor antagonists) in cohort 4. CONCLUSIONS: When combined with other multimorbidity features, even less prevalent medical conditions show associations with the outcome. This study provides insights beyond COVID-19, demonstrating how repurposed administrative data can be adapted to enhance risk assessment for vulnerable populations.
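The post-hoc Apriori step described above can be sketched with the mlxtend implementation of the algorithm, assuming the evolved feature combinations have been one-hot encoded; the feature columns and support threshold below are illustrative, not the study's data:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori

# Toy one-hot multimorbidity matrix; column names echo the feature codes above.
X = pd.DataFrame({
    "age_gt_53": [1, 1, 0, 1, 1, 0],
    "R03BA":     [1, 0, 0, 1, 1, 0],
    "N03AX":     [1, 1, 0, 0, 1, 0],
    "A10BA":     [0, 0, 1, 0, 0, 1],
}, dtype=bool)

# Frequent feature combinations with support >= 0.3 (illustrative threshold).
frequent = apriori(X, min_support=0.3, use_colnames=True)
print(frequent.sort_values("support", ascending=False))
```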


Subjects
COVID-19; Hospitalization; Machine Learning; Multimorbidity; Humans; COVID-19/epidemiology; Italy/epidemiology; Male; Female; Aged; Hospitalization/statistics & numerical data; Middle Aged; Longitudinal Studies; Aged, 80 and over
4.
Int J Biostat; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38943460

ABSTRACT

Traditional methods for Sample Size Determination (SSD) based on power analysis exploit fixed values or preliminary estimates for the unknown parameters. A hybrid classical-Bayesian approach can instead formally incorporate information or model uncertainty on unknown quantities through prior distributions, following the Bayesian approach, while still analysing the data in a frequentist framework. In this paper, we propose a hybrid procedure for SSD in two-arm superiority trials that takes into account the different roles played by the unknown parameters involved in the statistical power. Thus, different prior distributions are used to formalize design expectations and to model information or uncertainty on the preliminary estimates involved at the analysis stage. To illustrate the method, we consider binary data and derive the proposed hybrid criteria for three possible parameters of interest, i.e., the difference between proportions of successes, the logarithm of the relative risk, and the logarithm of the odds ratio. Numerical examples taken from the literature show how to implement the proposed procedure.
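A rough sketch of the hybrid classical-Bayesian logic for the difference-between-proportions case: draw the unknown rates from design priors, average the frequentist power over those draws, and search for the smallest per-arm n meeting a target. The priors, the test (a simple Wald test), and the step size are illustrative assumptions, not the paper's criteria:

```python
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(1)
alpha, target = 0.025, 0.80   # one-sided level and desired prior-averaged power

def expected_power(n, draws=5000):
    """Average frequentist power over design priors on (p_control, p_treatment)."""
    p0 = beta.rvs(30, 70, size=draws, random_state=rng)   # design prior, control
    p1 = beta.rvs(45, 55, size=draws, random_state=rng)   # design prior, treatment
    se = np.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
    z = (p1 - p0) / se                                    # Wald test of p1 - p0 > 0
    return norm.sf(norm.ppf(1 - alpha) - z).mean()

n = 10
while expected_power(n) < target:
    n += 10
print(f"~{n} patients per arm for prior-averaged power >= {target}")
```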

5.
BMC Bioinformatics; 25(1): 155, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38641616

ABSTRACT

BACKGROUND: Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is avoiding overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in the training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies. One such approach is hybrid mechanistic/data-driven modeling, which integrates prior knowledge on the input features into the learning process, enhancing the model's ability to extrapolate. RESULTS: We present NoiseCut, a Python package for noise-tolerant classification of binary data employing a hybrid modeling approach that leverages solutions of defined max-cut problems. In a comparative analysis on synthetically generated binary datasets, NoiseCut exhibits better overfitting prevention than the early stopping technique employed by different supervised machine learning algorithms. The noise tolerance of NoiseCut stems from a dropout strategy that leverages prior knowledge of the input features and is further enhanced by the integration of max-cut problems into the learning process. CONCLUSIONS: NoiseCut is a Python package implementing hybrid modeling for the classification of binary data. It facilitates the structured integration of mechanistic knowledge on the input features into learning from data and proves to be a valuable classification tool when the available training data are noisy and/or limited in size. This advantage is especially prominent in medical and biomedical applications where data scarcity and noise are common challenges. The codebase, illustrations, and documentation for NoiseCut are available at https://pypi.org/project/noisecut/. The implementation detailed in this paper corresponds to version 0.2.1 of the software.


Subjects
Algorithms; Software; Humans; Supervised Machine Learning; Machine Learning
6.
BMC Med Res Methodol; 24(1): 95, 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658821

ABSTRACT

BACKGROUND: Multimorbidity is typically associated with reduced health-related quality of life in mid-life, and women have an elevated likelihood of developing multimorbidity. We address the issue of data sparsity in non-prevalent features by clustering the binary data of various rare medical conditions in a cohort of middle-aged women. This study aims to enhance understanding of how multimorbidity affects COVID-19 severity by clustering rare medical conditions and combining them with prevalent features for predictive modeling. The insights gained can guide the development of targeted interventions and improved management strategies for individuals with multiple health conditions. METHODS: The study focuses on a cohort of 4477 female patients (aged 45-60) in Piedmont, Italy, and utilizes multimorbidity data drawn from their medical histories from 2015 to 2019, prior to the COVID-19 pandemic. COVID-19 severity is determined by the hospitalization status of the patients from February to May 2020. Each patient profile in the dataset is represented as a binary vector, where each feature denotes the presence or absence of a specific multimorbidity condition. By clustering the sparse medical data, newly engineered bin features are generated and combined with the prevalent features for COVID-19 severity predictive modeling. RESULTS: From sparse data consisting of 174 input features, we created a low-dimensional feature matrix of 17 features. Machine learning algorithms were applied to the reduced, sparsity-free data to predict the COVID-19 hospital admission outcome. The performance obtained for the corresponding models is as follows: logistic regression (accuracy 0.72, AUC 0.77, F1-score 0.69), linear discriminant analysis (accuracy 0.70, AUC 0.77, F1-score 0.67), and AdaBoost (accuracy 0.70, AUC 0.77, F1-score 0.68). CONCLUSION: Mapping higher-dimensional data to a low-dimensional space can result in information loss, but reducing sparsity can benefit machine learning modeling through improved predictive ability. In this study, we addressed the issue of data sparsity in electronic health records and created a model that incorporates both prevalent and rare medical conditions, leading to more accurate and effective predictive modeling. The identification of complex associations between multimorbidity and the severity of COVID-19 highlights potential areas of focus for future research, including long COVID and intervention efforts.
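One plausible way to sketch the bin-feature construction described above: cluster the rare condition columns by Jaccard distance, collapse each cluster into a single any-condition-present indicator, and feed the binned matrix to a classifier. The data, cluster count, and model below are synthetic and illustrative, not the study's pipeline:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = (rng.random((500, 40)) < 0.03).astype(int)   # 40 rare binary conditions
y = rng.integers(0, 2, 500)                      # toy hospitalization outcome

# Cluster *columns* (conditions) by Jaccard distance between presence vectors.
D = pdist(X.T, metric="jaccard")
labels = fcluster(linkage(D, method="average"), t=5, criterion="maxclust")

# Each bin feature = "any condition from this cluster present".
X_binned = np.column_stack([X[:, labels == k].max(axis=1)
                            for k in np.unique(labels)])

model = LogisticRegression().fit(X_binned, y)
print("binned feature matrix:", X_binned.shape)
```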


Subjects
COVID-19; Multimorbidity; SARS-CoV-2; Humans; COVID-19/epidemiology; Female; Middle Aged; Italy/epidemiology; Cluster Analysis; Severity of Illness Index; Hospitalization/statistics & numerical data; Quality of Life; Cohort Studies; Machine Learning
7.
Stat Methods Med Res; 32(11): 2226-2239, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37776847

ABSTRACT

Sparse correlated binary data are frequently encountered in applications involving rare events or small sample sizes. In this study, we consider correlated binary data within a logit random effects model framework. We discuss h-likelihood estimation and how the computational procedure is affected by sparseness. We propose an adjustment to the fitting process that adapts the regression calibration method to the estimation of random effects. Using this adjustment, we correct the bias in the random effects estimates, which yields better properties for the fixed effects estimates of the model. This is supported by the results of a simulation study conducted under different sparseness levels. The proposed adjusted h-likelihood estimation approach is also used to analyse two real meta-analysis data sets.


Subjects
Models, Statistical; Likelihood Functions; Computer Simulation; Logistic Models; Regression Analysis; Bias
8.
Lifetime Data Anal; 29(4): 807-822, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37438585

ABSTRACT

In modern biomedical datasets, it is common for recurrent outcomes data to be collected in an incomplete manner. More specifically, information on recurrent events is routinely recorded as a mixture of recurrent event data, panel count data, and panel binary data; we refer to this structure as general mixed recurrent event data. Although these data types are individually well studied, there does not appear to be an established approach for regression analysis of the three-component combination. Often, ad hoc measures such as imputation or discarding of data are used to homogenize records prior to analysis, but such measures raise obvious concerns regarding robustness, loss of efficiency, and other issues. This work proposes a maximum likelihood regression estimation procedure for general mixed recurrent event data and establishes the asymptotic properties of the proposed estimators. In addition, we generalize the approach to allow for terminal events, a common complicating feature in recurrent event analysis. Numerical studies and an application to the Childhood Cancer Survivor Study suggest that the proposed procedures work well in practical situations.


Subjects
Regression Analysis; Humans; Child; Computer Simulation
9.
Nan Fang Yi Ke Da Xue Xue Bao; 43(1): 105-110, 2023 Jan 20.
Article in Chinese | MEDLINE | ID: mdl-36856217

ABSTRACT

OBJECTIVE: To compare different methods for calculating sample size based on confidence interval estimation for a single proportion under different event incidences and precisions. METHODS: We compared 7 methods for confidence interval estimation for a single proportion: Wald, Agresti-Coull add z², Agresti-Coull add 4, Wilson score, Clopper-Pearson, mid-p, and Jeffreys. The sample size was calculated using the search method under different parameter settings (proportion of specified events and half-width of the confidence interval, ω = 0.05 or 0.1). Using Monte Carlo simulation, the estimated sample size was used to simulate and compare the width of the confidence interval, the coverage of the confidence interval, and the ratio of the non-coverage probabilities. RESULTS: For a high precision requirement (ω = 0.05), the mid-p and Clopper-Pearson methods performed better when the incidence of events was low (P < 0.15). In the other settings, the performance of the 7 methods did not differ significantly except for the poor symmetry of the Wald method. In the setting of ω = 0.1 with a very low p (0.01-0.05), the iteration failed for nearly all methods except the Clopper-Pearson method. CONCLUSION: Different sample size determination methods based on confidence interval estimation should be selected for single proportions under different parameter settings.
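The search method is easy to sketch for one of the seven intervals: compute the Wilson score half-width as a function of n and increase n until it drops below ω (a minimal sketch; the abstract's Monte Carlo evaluation step is omitted):

```python
import numpy as np
from scipy.stats import norm

def wilson_halfwidth(p, n, conf=0.95):
    """Half-width of the Wilson score interval for a single proportion."""
    z = norm.ppf(1 - (1 - conf) / 2)
    return z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)

def n_for_halfwidth(p, omega, conf=0.95):
    """Smallest n whose Wilson half-width is <= omega (the search method)."""
    n = 1
    while wilson_halfwidth(p, n, conf) > omega:
        n += 1
    return n

for p, omega in [(0.1, 0.05), (0.3, 0.05), (0.1, 0.10)]:
    print(f"p={p}, omega={omega}: n = {n_for_halfwidth(p, omega)}")
```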


Subjects
Confidence Intervals; Sample Size; Computer Simulation; Monte Carlo Method; Probability
10.
Biometrics; 79(3): 1788-1800, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35950524

ABSTRACT

Correlated binary response data with covariates are ubiquitous in longitudinal or spatial studies. Among the existing statistical models, the most well-known one for this type of data is the multivariate probit model, which uses a Gaussian link to model dependence at the latent level. However, a symmetric link may not be appropriate if the data are highly imbalanced. Here, we propose a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case. Furthermore, we perform Bayesian inference for this new model and prove that the regression coefficients have a closed-form unified skew-elliptical posterior with an elliptical prior. The new methodology is illustrated by an application to COVID-19 data from three different counties of the state of California, USA. By jointly modeling extreme spikes in weekly new cases, our results show that the spatial dependence cannot be neglected. Furthermore, the results also show that the skewed latent structure of our proposed model improves the flexibility of the multivariate probit model and provides a better fit to our highly imbalanced dataset.


Subjects
COVID-19; Humans; Bayes Theorem; COVID-19/epidemiology; Models, Statistical; Normal Distribution; Spatial Analysis; Monte Carlo Method; Markov Chains
11.
Birth Defects Res; 115(3): 327-337, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36345811

ABSTRACT

BACKGROUND: When analyzing fetal defect incidence in laboratory animal studies, correlation in responses within litters (i.e., litter effects) can lead to increased false-positive rates if not incorporated into the analysis. Studies of fetal defects require analysis methods that are robust across a broad range of defect types, including those with zero or near-zero incidence rates in control groups. METHODS: A simulation study compared power and false-positive rates for six approaches across a range of background defect rates and litter size distributions. The statistical methods evaluated included ignoring the litter effect as well as parametric and nonparametric approaches based on litter proportions, generalized linear mixed models (GLMMs), the Rao-Scott Cochran-Armitage (RSCA) trend test, and a modification of the RSCA (mRSCA) introduced here to improve estimation at low background rates. These methods were also applied to a common and a rare defect from two prenatal developmental toxicology studies conducted by the National Toxicology Program (NTP). RESULTS: At background defect rates of 1%, the mRSCA and parametric litter proportion methods provided gains in power over the nonparametric litter proportion method, the GLMM method, and the RSCA method. Simulations involving litter loss in high-dose groups showed a loss of power for both litter proportion methods. CONCLUSIONS: The mRSCA test developed here compares favorably with other litter-based approaches and is robust across a range of background defect rates and litter size distributions, making it a practical choice for prenatal developmental toxicology studies involving both common and rare fetal defects.
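A rough sketch of the standard (unmodified) RSCA procedure: inflate each dose group's variance by a Rao-Scott design effect estimated from litter-level overdispersion, then run a Cochran-Armitage trend test on the adjusted counts. The litter data below are invented, and the guard on the design effect is a pragmatic choice for sparse groups, not the paper's mRSCA modification:

```python
import numpy as np
from scipy.stats import norm

def rsca_trend(groups, doses):
    """Sketch of the Rao-Scott-adjusted Cochran-Armitage trend test.
    groups: per dose level, a list of (affected, litter_size) pairs."""
    x_adj, n_adj = [], []
    for litters in groups:
        x = np.array([a for a, _ in litters], float)
        n = np.array([s for _, s in litters], float)
        m, X, N = len(x), x.sum(), n.sum()
        p = X / N
        v = max(p * (1 - p), 1e-12)
        d = m * np.sum((x - n * p) ** 2) / ((m - 1) * N * v)  # design effect
        d = max(d, 1.0)   # guard: do not deflate variance in sparse groups
        x_adj.append(X / d)
        n_adj.append(N / d)
    x_adj, n_adj, s = map(np.asarray, (x_adj, n_adj, doses))
    pbar = x_adj.sum() / n_adj.sum()
    sbar = (n_adj * s).sum() / n_adj.sum()
    z = (x_adj * (s - sbar)).sum() / np.sqrt(
        pbar * (1 - pbar) * (n_adj * (s - sbar) ** 2).sum())
    return z, norm.sf(z)   # one-sided p-value for an increasing trend

control = [(0, 12), (1, 10), (0, 11), (0, 13)]
high = [(3, 12), (2, 9), (4, 11), (1, 10)]
print(rsca_trend([control, high], doses=[0.0, 1.0]))
```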


Subjects
Fetus; Prenatal Care; Animals; Female; Pregnancy; Correlation of Data; Incidence; Litter Size
12.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-971501

ABSTRACT

OBJECTIVE: To compare different methods for calculating sample size based on confidence interval estimation for a single proportion under different event incidences and precisions. METHODS: We compared 7 methods for confidence interval estimation for a single proportion: Wald, Agresti-Coull add z², Agresti-Coull add 4, Wilson score, Clopper-Pearson, mid-p, and Jeffreys. The sample size was calculated using the search method under different parameter settings (proportion of specified events and half-width of the confidence interval, ω = 0.05 or 0.1). Using Monte Carlo simulation, the estimated sample size was used to simulate and compare the width of the confidence interval, the coverage of the confidence interval, and the ratio of the non-coverage probabilities. RESULTS: For a high precision requirement (ω = 0.05), the mid-p and Clopper-Pearson methods performed better when the incidence of events was low (P < 0.15). In the other settings, the performance of the 7 methods did not differ significantly except for the poor symmetry of the Wald method. In the setting of ω = 0.1 with a very low p (0.01-0.05), the iteration failed for nearly all methods except the Clopper-Pearson method. CONCLUSION: Different sample size determination methods based on confidence interval estimation should be selected for single proportions under different parameter settings.


Subjects
Confidence Intervals; Sample Size; Computer Simulation; Monte Carlo Method; Probability
13.
J Appl Stat; 49(16): 4122-4136, 2022.
Article in English | MEDLINE | ID: mdl-36353303

ABSTRACT

With the rapid development of modern sensor technology, high-dimensional data streams now appear frequently, creating an urgent need for effective statistical process control (SPC) tools. In this context, online monitoring of high-dimensional and correlated binary data streams is becoming very important. Conventional SPC methods for monitoring multivariate binary processes may fail in high-dimensional applications due to high computational complexity and a lack of efficiency. In this paper, motivated by an application in extreme weather surveillance, we propose a novel pairwise approach that considers the most informative pairwise correlation between any two data streams. This information is then integrated into an exponentially weighted moving average (EWMA) charting scheme to monitor abnormal mean changes in high-dimensional binary data streams. An extensive simulation study together with a real-data analysis demonstrates the efficiency and applicability of the proposed control chart.
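The EWMA building block for a single binary stream can be sketched as follows (the pairwise-correlation integration that is the paper's actual contribution is not reproduced here); the rates, smoothing weight, and limit width are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
p0, lam, L = 0.1, 0.1, 3.0          # in-control rate, smoothing weight, limit width

# One binary stream with an upward mean shift at t = 150.
x = np.concatenate([rng.binomial(1, p0, 150), rng.binomial(1, 0.25, 100)])

z, alarms = p0, []
for t, xt in enumerate(x):
    z = lam * xt + (1 - lam) * z                       # EWMA update
    var = p0 * (1 - p0) * lam / (2 - lam) * (1 - (1 - lam) ** (2 * (t + 1)))
    if z > p0 + L * np.sqrt(var):                      # upper control limit
        alarms.append(t)

print("first alarm at t =", alarms[0] if alarms else None)
```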

14.
BMC Bioinformatics; 23(1): 381, 2022 Sep 19.
Article in English | MEDLINE | ID: mdl-36123637

ABSTRACT

Biclustering algorithms are effective tools for processing gene expression datasets. Biclustering methods handle two kinds of data matrices: binary and non-binary. A binary matrix is usually converted from pre-processed gene expression data, which effectively reduces interference from noise and abnormal data, and is then processed with a biclustering algorithm. However, existing biclustering algorithms for binary data strike a poor balance between running time and performance. In this paper, we propose a new biclustering algorithm, the Adjacency Difference Matrix Binary Biclustering algorithm (AMBB), for binary data to address this drawback. The AMBB algorithm constructs an adjacency matrix based on adjacency difference values, and the submatrix obtained by continuously updating the adjacency difference matrix is called a bicluster. The adjacency matrix allows genes that undergo similar reactions under different conditions to be grouped into clusters, which is important for subsequent gene analysis. Experiments on synthetic and real datasets demonstrate that the AMBB algorithm is highly practicable.


Subjects
Data Analysis; Gene Expression Profiling; Algorithms; Gene Expression; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods
15.
Educ Psychol Meas; 82(5): 880-910, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35989724

ABSTRACT

Exploratory graph analysis (EGA) is a commonly applied technique intended to help social scientists discover latent variables. Yet the results can be influenced by the methodological decisions the researcher makes along the way. In this article, we focus on the choice of the number of factors to retain: we compare the performance of the recently developed EGA with various traditional factor retention criteria. We use both continuous and binary data, as evidence regarding the accuracy of such criteria in the latter case is scarce. Simulation results, based on scenarios varying sample size, communalities from major factors, interfactor correlations, skewness, and correlation measure, show that EGA outperforms the traditional factor retention criteria considered in most cases in terms of bias and accuracy. In addition, we show that factor retention decisions for binary data are preferably made using Pearson, instead of tetrachoric, correlations, contrary to popular belief.
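One of the traditional retention criteria EGA is compared against, Horn's parallel analysis, is easy to sketch on binary items with Pearson correlations, in line with the article's recommendation (the data-generating design below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 12
f = rng.normal(size=(n, 2))            # two latent factors
load = np.zeros((k, 2))
load[:6, 0], load[6:, 1] = 0.8, 0.8    # six items per factor
X = (f @ load.T + rng.normal(size=(n, k)) > 0).astype(int)  # dichotomized items

def n_factors_parallel(X, reps=200):
    """Horn's parallel analysis on Pearson correlations of the binary items."""
    eig = np.linalg.eigvalsh(np.corrcoef(X.T))[::-1]
    ref = np.array([
        np.linalg.eigvalsh(np.corrcoef(rng.permuted(X, axis=0).T))[::-1]
        for _ in range(reps)   # column-wise permutation breaks the correlations
    ])
    return int(np.sum(eig > np.percentile(ref, 95, axis=0)))

print("factors retained:", n_factors_parallel(X))
```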

16.
J Appl Stat; 49(10): 2535-2549, 2022.
Article in English | MEDLINE | ID: mdl-35757040

ABSTRACT

Asymptotic approaches are traditionally used to calculate confidence intervals for the intraclass correlation coefficient in clustered binary studies. When the sample size is small to medium, or the correlation or response rate is near the boundary, asymptotic intervals often do not have satisfactory coverage. We propose using the importance sampling method to construct profile confidence limits for the intraclass correlation coefficient. Importance sampling is a simulation-based approach that reduces the variance of the estimated parameter. Four existing asymptotic limits are used as statistical quantities for sample-space ordering in the importance sampling method. Simulation studies are performed to evaluate the performance of the proposed accurate intervals with regard to coverage and interval width. Simulation results indicate that the accurate intervals based on the asymptotic limits of Fleiss and Cuzick generally have shorter width than the others in many cases, while the accurate intervals based on the Zou and Donner asymptotic limits outperform the others when the correlation and response rate are close to their boundaries.

17.
Life (Basel); 12(5), 2022 May 11.
Article in English | MEDLINE | ID: mdl-35629386

ABSTRACT

Quantitative and binary results are ubiquitous in biology. Inasmuch as an underlying genetic basis for the observed variation can be assumed, it is pertinent to infer the evolutionary relationships among the entities being measured. I present a computer program, PhyloM, that takes measurement or binary data as input and directly generates a pairwise distance matrix, which can then be subjected to the popular neighbor-joining (NJ) algorithm to produce a phylogenetic tree. PhyloM also offers nonparametric bootstrapping for testing the level of support for the inferred phylogeny, and allows the user to root the tree on any desired branch. PhyloM was tested on Biolog GEN III growth data from isolates within the genus Chromobacterium and the closely related Aquitalea sp. This allowed a comparison with the genotypic tree inferred from whole-genome sequences for the same set of isolates, from which parallel evolution could be inferred. PhyloM is a stand-alone, easy-to-use computer program with a user-friendly graphical user interface that computes pairwise distances from measurement or binary data, which can then be used to infer phylogeny using NJ within the same program. Alternatively, the distance matrix can be downloaded for use in another program for phylogenetic inference or other purposes. PhyloM does not require any software installation or computer code and is open source. The executable and computer code are available on GitHub.
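The distance-then-NJ pipeline can be sketched with generic tools, here simple matching (Hamming) distances plus Biopython's neighbor-joining; this is not PhyloM's own code, and the binary growth profiles are invented:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

profiles = {                      # toy binary growth profiles (1 = grows on substrate)
    "isolate_A": [1, 0, 1, 1, 0, 1],
    "isolate_B": [1, 0, 1, 0, 0, 1],
    "isolate_C": [0, 1, 0, 1, 1, 0],
    "isolate_D": [0, 1, 1, 1, 1, 0],
}
names = list(profiles)
D = squareform(pdist(np.array(list(profiles.values())), metric="hamming"))

# Biopython expects a lower-triangular matrix, including the zero diagonal.
lower = [list(D[i, : i + 1]) for i in range(len(names))]
tree = DistanceTreeConstructor().nj(DistanceMatrix(names, lower))
print(tree)
```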

18.
Biodivers Data J; 10: e77875, 2022.
Article in English | MEDLINE | ID: mdl-35437391

ABSTRACT

Processing and visualising trends in binary data (the presence or absence of electropherogram peaks) obtained from fragment analysis methods in molecular biology can be a time-consuming and often cumbersome process. Scoring and analysing binary data (from methods such as AFLPs, ISSRs, and RFLPs) entail complex workflows that require a high level of computational and bioinformatic skill. The application presented here (BinMat) is a free, open-source and user-friendly R Shiny programme (https://clarkevansteenderen.shinyapps.io/BINMAT/) that automates the analysis pipeline on one platform. It is also available as an R package on the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/BinMat/index.html). BinMat consolidates replicate sample pairs of binary data into consensus reads, produces summary statistics, and allows the user to visualise their data as ordination plots and clustering trees without having to use multiple programmes and input files or rely on previous programming experience.
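The consolidation logic (not BinMat's actual API) reduces to an elementwise rule on replicate pairs: agreement keeps the score, disagreement becomes missing. A minimal sketch with invented peak data:

```python
import numpy as np
import pandas as pd

# Replicate pairs of peak presence/absence scores (rows = samples, columns = loci).
rep1 = pd.DataFrame([[1, 0, 1, 1], [0, 0, 1, 0]], index=["s1", "s2"])
rep2 = pd.DataFrame([[1, 0, 0, 1], [0, 1, 1, 0]], index=["s1", "s2"])

# Consensus rule: agreement keeps the value, disagreement becomes missing (NaN).
consensus = rep1.where(rep1.eq(rep2), np.nan)
mismatch_rate = rep1.ne(rep2).to_numpy().mean()   # simple replicate-error summary

print(consensus)
print(f"replicate mismatch rate: {mismatch_rate:.2f}")
```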

19.
BMC Med Res Methodol; 22(1): 50, 2022 Feb 20.
Article in English | MEDLINE | ID: mdl-35184731

ABSTRACT

BACKGROUND: Adaptive designs offer added flexibility in the execution of clinical trials, including the possibility of allocating more patients to the treatments that turn out more successful and of early stopping due to declared success or futility. Commonly applied adaptive designs, such as group sequential methods, are based on the frequentist paradigm and on ideas from statistical significance testing. Interim checks during the trial have the effect of inflating the Type I error rate or, if this rate is controlled and kept fixed, lowering the power. RESULTS: The purpose of this paper is to demonstrate the usefulness of the Bayesian approach in the design and actual running of randomized clinical trials during phases II and III. The approach compares the performance of the different treatment arms in terms of their joint posterior probabilities, evaluated sequentially from the accruing outcome data, and takes a control action if such posterior probabilities fall below a pre-specified critical threshold value. Two types of actions are considered: treatment allocation, putting further accrual of patients to a treatment arm on hold at least temporarily, and treatment selection, removing an arm from the trial permanently. The main development in the paper is in terms of binary outcomes, but extensions for handling time-to-event data, including data from vaccine trials, are also discussed. The performance of the proposed methodology is tested in extensive simulation experiments, with numerical results and graphical illustrations documented in a Supplement to the main text. As a companion to this paper, an implementation of the methods is provided in the form of a freely available R package, 'barts'. CONCLUSION: The proposed methods for trial design provide an attractive alternative to their frequentist counterparts.
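A schematic of the posterior-probability monitoring rule for binary outcomes, using conjugate Beta-Binomial updating and Monte Carlo estimates of P(arm is best) at interim looks; this illustrates the general idea, not the 'barts' package, and all rates and thresholds are invented:

```python
import numpy as np

rng = np.random.default_rng(11)
true_p = [0.30, 0.45, 0.50]          # unknown success rates of three arms
succ, fail = np.ones(3), np.ones(3)  # Beta(1, 1) priors
active = [True, True, True]
threshold = 0.05                     # drop an arm if P(arm is best) falls below this

for t in range(400):
    arm = rng.choice([a for a in range(3) if active[a]])
    y = rng.random() < true_p[arm]   # observe one binary outcome
    succ[arm] += y
    fail[arm] += 1 - y
    if (t + 1) % 50 == 0:            # interim look
        draws = rng.beta(succ[:, None], fail[:, None], size=(3, 4000))
        p_best = (draws.argmax(axis=0)[None, :]
                  == np.arange(3)[:, None]).mean(axis=1)
        for a in range(3):
            if active[a] and p_best[a] < threshold:
                active[a] = False    # treatment selection: remove the arm

print("P(best):", p_best.round(3), "active:", active)
```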


Subjects
Clinical Trials as Topic; Research Design; Bayes Theorem; Computer Simulation; Humans; Medical Futility; Probability
20.
Educ Psychol Meas; 82(2): 254-280, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35185159

ABSTRACT

This article studies the Type I error rates, false positive rates, and power of four versions of the Lagrange multiplier test for detecting measurement noninvariance in item response theory (IRT) models for binary data under model misspecification. The tests considered are the Lagrange multiplier test computed with the Hessian and cross-product approaches, the generalized Lagrange multiplier test, and the generalized jackknife score test. The two model misspecifications are local dependence among items and a nonnormal distribution of the latent variable. The power of the tests is computed in two ways: empirically, through Monte Carlo simulation, and asymptotically, using the asymptotic distribution of each test under the alternative hypothesis. The performance of these tests is evaluated by means of a simulation study. The results highlight that under mild model misspecification all tests perform well, while under strong model misspecification performance deteriorates, especially for false positive rates under local dependence and for power at small sample sizes under misspecification of the latent variable distribution. In general, the Lagrange multiplier test computed with the Hessian approach and the generalized Lagrange multiplier test perform better in terms of false positive rates, while the Lagrange multiplier test computed with the cross-product approach has the highest power for small sample sizes. The asymptotic power turns out to be a good alternative to the classic empirical power because it is less time-consuming. The Lagrange tests studied here have also been applied to a real data set.
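The Hessian versus cross-product distinction can be sketched on a plain logistic model: fit the null model, evaluate the full score vector, and form the Lagrange multiplier statistic with the two information estimates (a generic illustration, not the IRT setup of the article):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.5 * x1 + 0.3 * x2))))

# Fit the null logistic model (intercept + x1 only) by Newton-Raphson.
X0, b = np.column_stack([np.ones(n), x1]), np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X0 @ b))
    W = mu * (1 - mu)
    b += np.linalg.solve((X0 * W[:, None]).T @ X0, X0.T @ (y - mu))

# Score test for adding x2, with the two information-matrix estimates.
X = np.column_stack([np.ones(n), x1, x2])
mu = 1 / (1 + np.exp(-X0 @ b))
score = X.T @ (y - mu)                           # full score at the null fit
I_hess = (X * (mu * (1 - mu))[:, None]).T @ X    # Hessian-based information
U = X * (y - mu)[:, None]
I_opg = U.T @ U                                  # cross-product (OPG) information

for name, info in [("Hessian", I_hess), ("cross-product", I_opg)]:
    stat = score @ np.linalg.solve(info, score)  # LM statistic, 1 restriction
    print(f"{name}: LM = {stat:.2f}, p = {chi2.sf(stat, df=1):.4f}")
```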
