ABSTRACT
A central challenge in hypothesis testing (HT) lies in determining the optimal balance between Type I (false positive) and Type II (non-detection, or false negative) error probabilities. Analyzing the exponential rates at which these errors decay, known as error exponents, provides crucial insight into system performance. Error exponents offer a lens through which we can understand how operational restrictions, such as resource constraints and communication impairments, affect the accuracy of distributed inference in networked systems. This survey presents a comprehensive review of key results in HT, from the foundational Stein's Lemma to recent advances in distributed HT, all unified through the framework of error exponents. We explore asymptotic and non-asymptotic results, highlighting their implications for designing robust and efficient networked systems, such as event detection through lossy wireless sensor monitoring networks, collective-perception-based object detection in vehicular environments, and clock synchronization in distributed environments, among others. We show that understanding the role of error exponents provides a valuable tool for optimizing decision-making and improving the reliability of networked systems.
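For readers new to the error-exponent framework, a textbook statement of the Chernoff-Stein lemma is sketched below; the notation (distributions P_0 and P_1, Type II error beta_n) is generic and not tied to any specific result in the survey.

```latex
% Chernoff-Stein lemma (standard i.i.d. form): with the Type I error
% constrained to at most \epsilon \in (0,1), the smallest achievable
% Type II error \beta_n^{(\epsilon)} over n observations decays
% exponentially at a rate given by the Kullback-Leibler divergence.
\[
  \lim_{n \to \infty} \frac{1}{n} \log \beta_n^{(\epsilon)}
    = -\,D(P_0 \,\|\, P_1),
  \qquad
  D(P_0 \,\|\, P_1) = \sum_{x} P_0(x)\,\log \frac{P_0(x)}{P_1(x)} .
\]
```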
ABSTRACT
Information theory explains how systems encode and transmit information. This article examines the neuronal system, which processes information via neurons that react to stimuli and transmit electrical signals. Specifically, we focus on transfer entropy as a measure of the flow of information between sequences and explore its use in determining effective neuronal connectivity. We analyze the causal relationship between two discrete time series, X := {X_t : t ∈ Z} and Y := {Y_t : t ∈ Z}, taking values in binary alphabets. When the bivariate process (X, Y) is a jointly stationary, ergodic, variable-length Markov chain with memory no larger than k, we demonstrate that the null hypothesis of the test (no causal influence) requires a zero transfer entropy rate. The plug-in estimator of this quantity is identified with the log-likelihood-ratio test statistic. Since, under the null hypothesis, this estimator has an asymptotic chi-squared distribution, it allows p-values to be computed for empirical data. The efficacy of the hypothesis test is illustrated with data simulated from a neuronal network model of stochastic neurons with variable-length memory. The test results identify biologically relevant information, validating the underlying theory and highlighting the applicability of the method for understanding effective connectivity between neurons.
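To make the plug-in construction concrete, here is a minimal sketch in Python for the simplest case of memory-1 binary chains; the estimator, the statistic G = 2n·TE, and the 2 degrees of freedom follow the usual conditional-independence counting, whereas the paper's variable-length chains with memory up to k would require the appropriate context counts and degrees of freedom. The simulated series are purely illustrative, not the paper's neuronal data.

```python
import numpy as np
from collections import Counter
from scipy.stats import chi2

def plug_in_transfer_entropy(x, y):
    """Plug-in transfer entropy from x to y (in nats), assuming binary
    alphabets and memory 1: empirical probabilities of the triples
    (y_{t+1}, y_t, x_t) are plugged into the conditional mutual
    information that defines the transfer entropy rate."""
    n = len(x) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))      # (y_next, y_prev, x_prev)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    singles_y = Counter(y[:-1])
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_joint = c / n
        p_full = c / pairs_yx[(yp, xp)]                 # p(y_next | y_prev, x_prev)
        p_reduced = pairs_yy[(yn, yp)] / singles_y[yp]  # p(y_next | y_prev)
        te += p_joint * np.log(p_full / p_reduced)
    return te, n

# Illustrative use on simulated binary series with a weak causal influence of X on Y.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=5000)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = rng.random() < 0.3 + 0.4 * x[t - 1]

te, n = plug_in_transfer_entropy(x, y)
g = 2.0 * n * te                      # log-likelihood ratio statistic
p_value = chi2.sf(g, df=2)            # df = 2 for binary alphabets with memory 1
print(f"TE = {te:.4f} nats, G = {g:.1f}, p = {p_value:.3g}")
```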
ABSTRACT
The Non-Informative Nuisance Parameter Principle concerns how inferences about a parameter of interest should be made in the presence of nuisance parameters. The principle is examined in the context of the hypothesis testing problem. We prove that the mixed test obeys the principle for discrete sample spaces. We also show how the mixed test's adherence to the principle can make the test much easier to perform. These findings are illustrated with new solutions to well-known problems of testing hypotheses for count data.
ABSTRACT
Proficiency testing (PT) determines the performance of individual laboratories for specific tests or measurements and is used to monitor the reliability of laboratories' measurements. PT plays a highly valuable role because it provides objective evidence of the competence of the participating laboratories. In this paper, we propose a multivariate calibration model to assess equivalence among laboratories' measurements in PT. Our method can handle multivariate data in which the item under test is measured at several levels. Although intuitive, the proposed model is nonergodic, meaning that the asymptotic Fisher information matrix is random. As a consequence, a detailed asymptotic analysis was carried out to establish the strategy for comparing the results of the participating laboratories. As an illustration, we apply the method to data from the Brazilian engine test group PT program, in which the power of an engine was measured by eight laboratories at several levels of rotation.
ABSTRACT
A continuous-time multivariate stochastic model is proposed for assessing, as it unfolds, the damage that a multi-type epidemic causes to a population. The instants at which cases occur and the magnitudes of the resulting injuries are random. We therefore define a cumulative damage based on counting processes and a multivariate mark process. For a large population, we approximate the behavior of this damage process by its asymptotic distribution. We also analyze the distribution of the stopping times at which the numbers of cases caused by the epidemic exceed certain thresholds. We focus on introducing tools for statistical inference on the parameters related to the epidemic. In this regard, we present a general hypothesis test for homogeneity in epidemics and apply it to COVID-19 data from Chile.
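The abstract does not give formulas; one natural (assumed) formalization of the construction it describes, with J epidemic types, counting processes N_j(t) of case times, and non-negative marks Y_{j,i} for the damage of the i-th case of type j, is:

```latex
\[
  D(t) \;=\; \sum_{j=1}^{J} \sum_{i=1}^{N_j(t)} Y_{j,i}, \qquad t \ge 0,
\]
% and the stopping time at which the case count of type j first
% reaches a threshold m_j is
\[
  T_j(m_j) \;=\; \inf\{\, t \ge 0 : N_j(t) \ge m_j \,\}.
\]
```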
Subjects
COVID-19, Communicable Diseases, Epidemics, Humans, Stochastic Processes, Biological Models, COVID-19/epidemiology, Communicable Diseases/epidemiology
ABSTRACT
Given the limitations of the frequentist method for null hypothesis significance testing, different authors recommend alternatives such as Bayesian inference. A poor understanding of both statistical frameworks is common among clinicians. This is a gentle narrative review of the frequentist and Bayesian methods intended for physicians not familiar with mathematics. The frequentist p-value is the probability of finding a value equal to or higher than that observed in a study, assuming that the null hypothesis (H0) is true. H0 is rejected or not based on a p threshold of 0.05, and this dichotomous approach does not express the probability that the alternative hypothesis (H1) is true. The Bayesian method calculates the probabilities of H1 and H0 considering prior odds and the Bayes factor (Bf). Prior odds are the researcher's belief about the probability of H1, and the Bf quantifies how consistent the data are with H1 relative to H0. The Bayesian prediction is not dichotomous but is expressed on the continuous scales of the Bf and of the posterior odds. The JASP software enables both frequentist and Bayesian analyses to be performed in a friendly and intuitive way, and its application is displayed at the end of the paper. In conclusion, the frequentist method expresses how consistent the data are with H0 in terms of p-values, with no consideration of the probability of H1. The Bayesian model is a more comprehensive prediction because it quantifies, on continuous scales, the evidence for H1 versus H0 in terms of the Bf and the posterior odds.
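The contrast is easy to reproduce outside JASP. The sketch below (plain Python, illustrative data) computes a two-sample p-value and a rough Bayes factor BF10 via the BIC approximation exp((BIC0 − BIC1)/2); the specific models, priors, and numbers are assumptions of this toy example, not the review's recommended defaults.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=40)        # hypothetical group A
b = rng.normal(0.4, 1.0, size=40)        # hypothetical group B, small true effect

# Frequentist side: two-sample t-test p-value.
p_value = stats.ttest_ind(a, b).pvalue

# Bayesian side (rough): BIC approximation BF10 ~ exp((BIC0 - BIC1) / 2),
# comparing "one common mean" (H0) against "two different means" (H1),
# both normal models with an ML-estimated error variance.
def gaussian_bic(residuals, n_params):
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)                       # ML variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)   # profiled log-likelihood
    return n_params * np.log(n) - 2 * loglik

pooled = np.concatenate([a, b])
bic0 = gaussian_bic(pooled - pooled.mean(), n_params=2)                        # mean + variance
bic1 = gaussian_bic(np.concatenate([a - a.mean(), b - b.mean()]), n_params=3)  # two means + variance
bf10 = np.exp((bic0 - bic1) / 2)

print(f"p-value = {p_value:.3f},  approximate BF10 = {bf10:.2f}")
```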
Subjects
Humans, Hypothesis Testing, Bayes Theorem, Histones, Urologists
ABSTRACT
Although null hypothesis testing (NHT) is the primary method for analyzing data in many natural sciences, it has been increasingly criticized. Recently, approaches based on information theory (IT) have become popular and are held by many to be superior because they enable researchers to properly assess the strength of the evidence that data provide for competing hypotheses. Many studies have compared IT and NHT in the context of model selection and stepwise regression, but a systematic comparison of the most basic uses of statistics by ecologists is still lacking. We used computer simulations to compare how both approaches perform in four basic test designs (t-test, ANOVA, correlation tests, and multiple linear regression). Performance was measured by the proportion of simulated samples for which each method provided the correct conclusion (power), the proportion of detected effects with the wrong sign (S-error), and the mean ratio of the estimated effect to the true effect (M-error). We also checked whether the p-value from significance tests is correlated with a measure of strength of evidence, the Akaike weight. In general, both methods performed equally well. The concordance is explained by the monotonic relationship between p-values and evidence weights in simple designs, which agrees with analytic results. Our results show that researchers can agree on the conclusions drawn from a data set even when they are using different statistical approaches. By focusing on the practical consequences of inferences, such a pragmatic view of statistics can promote insightful dialogue among researchers on how to find common ground from different pieces of evidence. A less dogmatic view of statistical inference can also help broaden the debate about the role of statistics in science to the entire path that leads from a research hypothesis to a statistical hypothesis.
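A minimal sketch of the kind of comparison described, assuming a two-group design analyzed with a t-test (NHT side) and with Akaike weights of a "no effect" versus "effect" normal model (IT side); the sample sizes, effect size, and models are illustrative choices, not those of the study.

```python
import numpy as np
from scipy import stats

def gaussian_aic(residuals, n_params):
    """AIC of a normal model with ML-estimated variance."""
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * n_params - 2 * loglik

rng = np.random.default_rng(2)
p_values, effect_weights = [], []
for _ in range(1000):
    a = rng.normal(0.0, 1.0, 20)
    b = rng.normal(0.3, 1.0, 20)                     # modest true effect
    p_values.append(stats.ttest_ind(a, b).pvalue)
    pooled = np.concatenate([a, b])
    aic_null = gaussian_aic(pooled - pooled.mean(), 2)                        # common mean
    aic_alt = gaussian_aic(np.concatenate([a - a.mean(), b - b.mean()]), 3)   # two means
    delta = np.array([aic_null, aic_alt]) - min(aic_null, aic_alt)
    w = np.exp(-delta / 2) / np.exp(-delta / 2).sum()
    effect_weights.append(w[1])                      # Akaike weight of the "effect" model

# In this simple design the weight of the effect model falls monotonically as the
# p-value grows, so both criteria rank the simulated samples the same way.
rho, _ = stats.spearmanr(p_values, effect_weights)
print(f"Spearman correlation between p-values and Akaike weights: {rho:.3f}")
```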
ABSTRACT
The Bayes factor is a recommended test for verifying statistical hypotheses in view of the current status of p-values, preferably using Jeffreys' classification scale.
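For reference, one common rendering of Jeffreys' classification scale (cut-points vary slightly across textbooks) can be coded as a small helper that labels a Bayes factor BF10:

```python
def jeffreys_label(bf10: float) -> str:
    """Label a Bayes factor BF10 using one common rendering of
    Jeffreys' scale (exact cut-points vary across textbooks)."""
    if bf10 < 1:
        return "evidence favors H0 (invert BF10 to grade it)"
    for cut, label in [(3, "anecdotal"), (10, "substantial"),
                       (30, "strong"), (100, "very strong")]:
        if bf10 < cut:
            return label
    return "decisive"

print(jeffreys_label(7.2))   # -> "substantial"
```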
Subjects
Humans, Male, Female, Hypothesis Testing, Factor Analysis, Operations Research, Software, Statistics
ABSTRACT Objective: To determine the prevalence of and risk factors for insufficient knowledge related to p-values among critical care physicians and respiratory therapists in Argentina. Methods: This cross-sectional online survey contained 25 questions about respondents' characteristics, self-perception and p-value knowledge (theory and practice). Descriptive and multivariable logistic regression analyses were conducted. Results: Three hundred seventy-six respondents were analyzed. Two hundred thirty-seven respondents (63.1%) did not know about p-values. According to the multivariable logistic regression analysis, a lack of training on scientific research methodology (adjusted OR 2.50; 95%CI 1.37 - 4.53; p = 0.003) and the amount of reading (< 6 scientific articles per year; adjusted OR 3.27; 95%CI 1.67 - 6.40; p = 0.001) were found to be independently associated with the respondents' lack of p-value knowledge. Conclusion: The prevalence of insufficient knowledge regarding p-values among critical care physicians and respiratory therapists in Argentina was 63%. A lack of training on scientific research methodology and the amount of reading (< 6 scientific articles per year) were found to be independently associated with the respondents' lack of p-value knowledge.
Subjects
Humans, Health Knowledge, Attitudes and Practice, Critical Care, Cross-Sectional Studies, Surveys and Questionnaires, Risk Factors
ABSTRACT
To perform statistical inference for time series, one should be able to assess whether they present deterministic or stochastic trends. For univariate analysis, one way to detect stochastic trends is to test whether the series has unit roots; for multivariate studies, it is often relevant to search for stationary linear relationships between the series, that is, to test whether they cointegrate. The main goal of this article is to briefly review the shortcomings of the unit root and cointegration tests proposed under the Bayesian approach to statistical inference and to show how they can be overcome by the Full Bayesian Significance Test (FBST), a procedure designed to test sharp or precise hypotheses. We compare its performance with the most commonly used frequentist alternatives, namely the Augmented Dickey-Fuller test for unit roots and the maximum eigenvalue test for cointegration.
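For readers who want to try the frequentist baseline mentioned above, the Augmented Dickey-Fuller test is available in Python's statsmodels; a minimal sketch on simulated series (the Bayesian FBST procedure itself is not shown here):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=500))   # nonstationary: has a unit root
white_noise = rng.normal(size=500)              # stationary: no unit root

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    adf_stat, p_value, *_ = adfuller(series)    # H0: the series has a unit root
    print(f"{name}: ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")
```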
ABSTRACT
Theory-free characterizations of experimental systems miss normative and conceptual components that are sometimes crucial to understanding their historical development. In this paper, we show that these components may be part of the intrinsic capacities of experimental systems themselves. We study a case of non-exploratory, theory-oriented research in experimental neuroscience concerning the construction of free-viewing as an experimental system to test one particular pre-existing hypothesis, the Temporal Correlation Hypothesis (TCH), at a laboratory in Santiago de Chile during 2002-2008. We show that the system does not take well-formulated pre-existing predictions or hypotheses and test them directly, but re-creates and re-signifies them in terms that are not implied by the theoretical background from which they originally derived. We therefore conclude that there is a sui generis way in which experimental systems produce proper theoretical knowledge.
Subjects
Knowledge, Neurosciences, Chile, Humans, Spatio-Temporal Analysis, Time Factors
ABSTRACT
HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.
Subjects
Genetic Techniques, Phylogeny, Software
ABSTRACT
Several biological systems, such as the biomechanics of the human heart, locomotion, and the phyllotaxis of plants, present harmonic behavior because their fractal structures are associated with the golden ratio. The golden ratio (Φ = 1.618033988749...), also known as Phi, the golden mean, the golden section, or the divine proportion, is an irrational constant found in various forms in nature and has recently been used in many health areas. However, there is no literature on a specific statistical test to identify golden ratio structures. To validate the results of each study, statistical techniques must be correctly selected and implemented, and the absence of a test to identify the golden ratio may undermine scientific papers with this goal. Since the golden number is a ratio, some tests have been wrongly applied in its identification. The objective of this paper is to present and evaluate methods for identifying the golden ratio. Four tests were evaluated: Student's t-test with the ratio statistic (TR), with the delta statistic (TΔ), and with the difference statistic (TED), and the Wilcoxon test with the difference statistic (WD). Data simulating different sample sizes (n = 2-200) and variability scenarios were used. The tests were assessed regarding Type I error rate and power. For TΔ, the Type I error rate increased with sample size and variability, reaching 50% in the scenario with relative standard deviations of 12.5% and 20.0% for line segments of lengths a and b and a sample size of 200. This test also showed lower power than the others in all scenarios. Similarly, for TR, the Type I error rate was sensitive to increasing sample size, varying from 5% to 60%. On the other hand, WD and TED were associated with low Type I error rates (around 5%) and high power (from 6.1% for a sample size of 2 to 100% for a sample size of 200). TΔ and TR were inadequate for identifying the golden ratio, since they did not control the Type I error rate and/or presented low power, leading to possibly erroneous conclusions. Therefore, WD and TED, both based on the difference statistic, appear to be the most appropriate methods for testing golden ratio structures.
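A minimal sketch of the two difference-based tests the abstract favors, under the assumption (made here for illustration) that the "difference statistic" for paired segment lengths (a, b) is d = a − Φ·b, which has zero location when a/b equals the golden ratio:

```python
import numpy as np
from scipy import stats

PHI = (1 + np.sqrt(5)) / 2                    # golden ratio, ~1.6180339887

rng = np.random.default_rng(4)
b = rng.normal(10.0, 1.0, size=30)            # hypothetical shorter segments
a = PHI * b + rng.normal(0.0, 0.5, size=30)   # longer segments: golden-ratio structure plus noise

d = a - PHI * b                               # assumed difference statistic, centered at 0 under H0

p_ted = stats.ttest_1samp(d, 0.0).pvalue      # TED-style test (Student's t on the differences)
p_wd = stats.wilcoxon(d).pvalue               # WD-style test (Wilcoxon signed-rank on the differences)
print(f"t-test on differences: p = {p_ted:.3f}; Wilcoxon on differences: p = {p_wd:.3f}")
```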
Subjects
Statistical Data Interpretation, Statistical Models, Monte Carlo Method, Humans
ABSTRACT
Dengue viruses (DENVs) are classified into four serotypes, each of which contains multiple genotypes. DENV genotypes introduced into the Americas over the past five decades have exhibited different rates and patterns of spatial dispersal. In order to understand factors underlying these patterns, we utilized a statistical framework that allows for the integration of ecological, socioeconomic, and air transport mobility data as predictors of viral diffusion while inferring the phylogeographic history. Predictors describing spatial diffusion based on several covariates were compared using a generalized linear model approach, where the support for each scenario and its contribution is estimated simultaneously from the data set. Although different predictors were identified for different serotypes, our analysis suggests that overall diffusion of DENV-1, -2, and -3 in the Americas was associated with airline traffic. The other significant predictors included human population size, the geographical distance between countries and between urban centers and the density of people living in urban environments.
ABSTRACT
Continuing the series of articles "Questions you have always wanted to ask, but never had the courage to", which aims to answer the most common questions of researchers at Hospital de Clínicas de Porto Alegre regarding statistics and to suggest references for a better understanding, this second article addresses the topic of hypothesis testing. The hypothesis testing method is discussed from the classical conception of statistical inference, including effect size, types of errors, p-value, and power. The concepts are explained in plain language for lay readers, and several references are suggested for those curious about the topic.
Subjects
Humans, Hypothesis Testing, Statistical Data Interpretation
ABSTRACT
The highest-order function of the mind is as a theorist. The memory system accumulates information about the outside world. The mind's theorist must sort through the information to formulate a theory about that world. The basic component of the system for theory building is a process called trolling. When the conscious mind is not being bombarded by external stimuli, or during certain stages of sleep, the mind's theorist trolls through memory searching for traces that contain similar information. If several traces are identified, analysis may yield information that was not evident when each was examined individually; reification of this sort can add new information to memory. The trolling process, and its ability to form new memory traces in the absence of external stimulation, is key to understanding many psychological phenomena.
ABSTRACT
The Dm index has many applications in psychometrics, specifically as a measure of item construct validity. In this paper, we present an alternative application of the Dm index for testing general hypotheses in empirical research when a general construct is involved and no total score is available to perform the test. For this purpose, the information from each specific hypothesis test is considered and systematized in Dm. Examples are offered for correlation and group-comparison procedures, with sound results. Thus, the application of Dm in the context of testing general hypotheses is promising.
ABSTRACT
The problem of event detection in general noisy signals arises in many applications. Usually, either a functional form of the event is available, or a previously annotated sample with instances of the event is available and can be used to train a classification algorithm. There are situations, however, where neither functional forms nor annotated samples are available; it is then necessary to apply other strategies to separate and characterize events. In this work, we analyze 15-min samples of an acoustic signal and are interested in separating sections, or segments, of the signal that are likely to contain significant events. To do so, we apply a sequential algorithm with the only assumption being that an event alters the energy of the signal. The algorithm is entirely based on Bayesian methods.
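The abstract does not spell out the algorithm, so the sketch below is only a crude surrogate built on the same single assumption, that an event raises the signal's energy: it computes windowed log-energy and flags windows that are improbably high under a robustly fitted Gaussian background model. The window length, threshold, and synthetic data are arbitrary choices, and the paper's sequential Bayesian treatment is not reproduced.

```python
import numpy as np

def flag_energy_events(signal, fs, window_s=1.0, z_cut=3.0):
    """Flag windows whose log-energy is unusually high relative to a
    robust Gaussian background fit. A crude surrogate for energy-based
    event segmentation; not the paper's Bayesian algorithm."""
    win = int(window_s * fs)
    n_win = len(signal) // win
    frames = signal[: n_win * win].reshape(n_win, win)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-12)
    center = np.median(log_energy)                            # robust location
    scale = 1.4826 * np.median(np.abs(log_energy - center))   # robust scale (MAD)
    z = (log_energy - center) / scale
    return np.flatnonzero(z > z_cut)                          # indices of candidate event windows

# Illustrative use on synthetic data (not the 15-min acoustic recordings):
fs = 8000
rng = np.random.default_rng(5)
x = rng.normal(0, 1.0, size=60 * fs)
x[20 * fs : 22 * fs] += rng.normal(0, 4.0, size=2 * fs)       # injected high-energy "event"
print("candidate event windows:", flag_energy_events(x, fs))
```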
ABSTRACT
Objective: To compare the performance of four statistical tests for assessing the test/retest reliability of continuous variables. Methods: Statistical simulation study developed within the framework of an in vitro diagnostic test study including 120 teeth that met the inclusion criteria. Each tooth was positioned in a standardization device and two digital radiographs (T0 and T1) were taken, on which tooth length was assessed. Data were analyzed with descriptive statistics, and the statistical comparison was then made with the paired Student's t-test, the intraclass correlation coefficient, Pearson's correlation coefficient, and Lin's concordance correlation coefficient in Stata v.13.2 for Windows (StataCorp., TX, USA). Results: The average tooth length was 21.15 mm at T0 and 21.07 mm at T1. Student's t-test revealed a mean difference of 0.089 (P = 0.00). The intraclass correlation coefficient was 0.877 (95% CI: 0.43-0.98), Pearson's product-moment correlation coefficient was 0.93, and Lin's concordance correlation coefficient was 0.93 (95% CI: 0.908-0.956). Conclusions: The selection of a statistical test for assessing test/retest agreement should take into account the objectives of the study in each context and the ability of each statistical method to assess the presence of error in the data. A method that currently meets this essential requirement is Lin's concordance correlation coefficient, which is therefore recommended for future test/retest studies.
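Lin's coefficient has a simple closed form, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), which makes this kind of comparison easy to reproduce; a brief sketch on simulated paired measurements (illustrative values, not the study's radiographic data):

```python
import numpy as np
from scipy import stats

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Simulated paired tooth-length measurements (mm); values are illustrative only.
rng = np.random.default_rng(6)
t0 = rng.normal(21.1, 1.0, size=120)
t1 = t0 - 0.09 + rng.normal(0.0, 0.15, size=120)   # small systematic shift plus noise

p_paired = stats.ttest_rel(t0, t1).pvalue          # paired Student's t-test
pearson_r = np.corrcoef(t0, t1)[0, 1]              # Pearson's correlation
print(f"paired t-test p = {p_paired:.3g}, Pearson r = {pearson_r:.3f}, Lin CCC = {lin_ccc(t0, t1):.3f}")
```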