RESUMEN
Introducción. La prueba de significancia de la hipótesis nula (PSHN) constituye la herramienta más usada para evaluar hipótesis científicas y tomar decisiones al respecto, en especial en ciencias de la salud. Sin embargo, por décadas ha estado en el centro del debate, ya que se han identificado varios problemas conceptuales y de interpretación. Se realizó una revisión de artículos científicos que ilustran las críticas de esta controversia y su relevancia en el ámbito de la investigación en salud. Algunas alternativas para la PSHN son una adecuada interpretación del valor p, uso de intervalos de confianza, incluir el tamaño del efecto y adoptar un marco de inferencia bayesiana. En todos los casos en que se utilice PSHN, su uso debe ser claramente justificado.
Background. Null hypothesis significance testing (NHST) is the most widely used tool for evaluating scientific hypotheses and making decisions about them, especially in the health sciences. However, for decades it has been at the centre of a heated debate, as several conceptual and interpretational problems have been identified. A review was carried out of scientific articles that illustrate the criticisms raised in this controversy and their relevance to health research. Alternatives to NHST include an adequate interpretation of the p-value, the use of confidence intervals, reporting the effect size, and adopting a Bayesian inference framework. Whenever NHST is used, its use should be clearly justified.
RESUMEN
Given the limitations of the frequentist method of null hypothesis significance testing, different authors recommend alternatives such as Bayesian inference. A poor understanding of both statistical frameworks is common among clinicians. This is a gentle narrative review of the frequentist and Bayesian methods intended for physicians not familiar with mathematics. The frequentist p-value is the probability of finding a value equal to or more extreme than that observed in a study, assuming that the null hypothesis (H0) is true. H0 is rejected or not based on a p threshold of 0.05, and this dichotomous approach does not express the probability that the alternative hypothesis (H1) is true. The Bayesian method calculates the probability of H1 and H0 from the prior odds and the Bayes factor (Bf). The prior odds express the researcher's belief about the probability of H1, and the Bf quantifies how consistent the data are with H1 relative to H0. The Bayesian prediction is not dichotomous but is expressed on continuous scales of the Bf and of the posterior odds. The JASP software enables both frequentist and Bayesian analyses to be performed in a friendly and intuitive way, and its application is shown at the end of the paper. In conclusion, the frequentist method expresses how consistent the data are with H0 in terms of p-values, with no consideration of the probability of H1. The Bayesian model is a more comprehensive prediction because it quantifies on continuous scales the evidence for H1 versus H0 in terms of the Bf and the posterior odds.
Dadas las limitaciones del método de significancia frecuentista basado en la hipótesis nula, diferentes autores recomiendan alternativas como la inferencia bayesiana. Es común entre los médicos una comprensión deficiente de ambos marcos estadísticos. Esta es una revisión narrativa amigable de los métodos frecuentista y bayesiano dirigida quienes no están familiarizados con las matemáticas. El valor de p frecuentista es la probabilidad de encontrar un valor igual o superior al observado en un estudio, asumiendo que la hipótesis nula (H0) es cierta. La H0 se rechaza o no con base en un umbral p de 0.05, y este enfoque dicotómico no expresa la probabilidad de que la hipótesis alternativa (H1) sea verdadera. El método bayesiano calcula la probabilidad de H1 y H0 considerando las probabilidades a priori y el factor de Bayes (fB). Las probabilidades a priori son la creencia del investigador sobre la probabilidad de H1, y el fB cuantifica cuán consistentes son los datos con respecto a H1 y H0. La predicción bayesiana no es dicotómica, sino que se expresa en escalas continuas del fB y de las probabilidades a posteriori. El programa JASP permite realizar análisis frecuentista y bayesiano de una forma simple e intuitiva, y su aplicación se muestra al final del documento. En conclusión, el método frecuentista expresa cuán consistentes son los datos con H0 en términos de valores p, sin considerar la probabilidad de H1. El modelo bayesiano es una predicción más completa porque cuantifica en escalas continuas la evidencia de H1 versus H0 en términos del fB y de las probabilidades a posteriori.
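The relationship described in this abstract (posterior odds obtained from prior odds and the Bayes factor) can be illustrated with a minimal sketch. The function names and the numeric inputs below are hypothetical and chosen only for illustration; they are not taken from the article or from JASP.

```python
def posterior_odds(prior_odds: float, bayes_factor: float) -> float:
    """Posterior odds of H1 versus H0 = prior odds x Bayes factor (BF10)."""
    return prior_odds * bayes_factor


def odds_to_probability(odds: float) -> float:
    """Convert odds of H1 versus H0 into the probability of H1."""
    return odds / (1.0 + odds)


# Hypothetical inputs: equal prior belief in H1 and H0, and data that are
# six times more likely under H1 than under H0.
prior = 1.0
bf10 = 6.0

post = posterior_odds(prior, bf10)
prob_h1 = odds_to_probability(post)
print(f"posterior odds = {post:.2f}, P(H1 | data) = {prob_h1:.3f}")
```

Both outputs are continuous quantities, which is the contrast the abstract draws with the reject/do-not-reject decision at p = 0.05.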
Asunto(s)
Humanos , Pruebas de Hipótesis , Teorema de Bayes , Histonas , Urólogos
RESUMEN
Resumen ANTECEDENTES: El valor de p es el método más empleado para estimar la significación estadística de cualquier hallazgo; sin embargo, en los últimos años se ha intensificado su debate al respecto, debido a la baja credibilidad y reproducibilidad de diversos estudios. OBJETIVO: Describir el estado actual del concepto del valor de p y la significación estadística (prueba de significación de la hipótesis nula [por sus siglas en inglés: Null Hypothesis Significance Testing: NHST]), especificar los problemas más importantes y puntualizar las soluciones propuestas para la mejor utilización de los conceptos. METODOLOGÍA: Se llevó a cabo la búsqueda bibliográfica en MEDLINE y Google Scholar, con los términos: "NHST", "Statistical significance; P value" en idioma inglés y español, de 2018-2019, limitándose a la selección de artículos publicados entre 2005 y 2019, mediante la revisión de tipo narrativo con búsqueda manual; sobre todo estudios de metodología. RESULTADOS: La búsqueda global reportó 1411 artículos: 875 de PubMed y 536 de Google Scholar. Se excluyeron 817 por duplicación, 155 sin acceso completo y 414 ensayos clínicos (sin metodología estadística); los 25 restantes fueron el motivo del análisis. CONCLUSIONES: El concepto del valor de p no es simple, tiene varias falacias y malas interpretaciones que deben considerarse para evitarlas en lo posible. Se recomienda no usar el término "estadísticamente significativo" o "significativo", sustituir el umbral de 0.05 por 0.005, informar valores de p precisos y con IC95%, riesgo relativo, razón de momios, tamaño del efecto o potencia y métodos bayesianos.
Abstract BACKGROUND: The P value is the most widely used method for estimating the statistical significance of a finding; however, in recent years the debate over it has intensified because of the low credibility and reproducibility of many studies. OBJECTIVE: To describe the current state of the concepts of the P value and statistical significance (Null Hypothesis Significance Testing, NHST), specify the most important problems, and point out the solutions proposed in the literature for their best use. METHODOLOGY: A search was carried out in MEDLINE and Google Scholar during 2018-2019 with the terms "NHST" and "Statistical significance; P value", in English and Spanish, limited to articles published from 2005 to 2019, as a narrative review with manual search. Articles on methodology were preferentially selected. RESULTS: The global search yielded 1411 articles: 875 from PubMed and 536 from Google Scholar. Of these, 817 were excluded as duplicates, 155 for lack of full-text access, and 414 because they were clinical trials without statistical methodology. The remaining 25 articles were analyzed. CONCLUSIONS: The concept of the P value is not simple, and it is subject to several fallacies and misinterpretations that must be taken into account so that they can be avoided as far as possible. Recommendations: do not use the terms "statistically significant" or "significant"; replace the threshold of 0.05 with 0.005; report exact P values together with the 95% CI, relative risk, odds ratio, effect size or power; and use Bayesian methods.
RESUMEN
In a large number of randomized controlled trials, researchers provide P values for demographic data, commonly reported in Table 1 of the article, to emphasize the lack of differences between or among groups. The authors thereby intend to show that statistically insignificant P values in the demographic data confirm that group randomization was adequately performed. However, statistically insignificant P values do not necessarily reflect successful randomization. It is more important to rigorously establish a plan for statistical analysis during the design and planning stage of the study, and to consider whether any of the variables included in the demographic data could potentially affect the research results. If a researcher rigorously designed and planned a study, and performed it accordingly, the conclusions drawn from the results would not be influenced by these P values, regardless of whether they were significant. In contrast, imbalanced variables could affect the results even after controlling for variance, and even when the whole study process is well planned and executed. In this situation, the researcher can provide results from both the initial method and a second stage of analysis that includes such variables. Otherwise, for brief conclusions, it would be pointless to report P values in a table that simply lists the baseline data of the participants.
Asunto(s)
Sesgo , Métodos , Distribución Aleatoria
RESUMEN
Most parametric tests start with the basic assumption on the distribution of populations. The conditions required to conduct the t-test include the measured values in ratio scale or interval scale, simple random extraction, normal distribution of data, appropriate sample size, and homogeneity of variance. The normality test is a kind of hypothesis test which has Type I and II errors, similar to the other hypothesis tests. It means that the sample size must influence the power of the normality test and its reliability. It is hard to find an established sample size for satisfying the power of the normality test. In the current article, the relationships between normality, power, and sample size were discussed. As the sample size decreased in the normality test, sufficient power was not guaranteed even with the same significance level. In the independent t-test, the change in power according to sample size and sample size ratio between groups was observed. When the sample size of one group was fixed and that of another group increased, power increased to some extent. However, it was not more efficient than increasing the sample sizes of both groups equally. To ensure the power in the normality test, sufficient sample size is required. The power is maximized when the sample size ratio between two groups is 1 : 1.
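The statements about power and the sample size ratio can be checked with a short sketch. It uses a normal approximation to the two-sample t-test rather than the exact noncentral t calculation, and the effect size (0.5) and group sizes are hypothetical values chosen for illustration, not figures from the article.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n1: int, n2: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power of a two-sample test (normal approximation).

    effect_size is the standardized mean difference (difference / common SD).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized difference divided by its standard error.
    shift = effect_size / sqrt(1 / n1 + 1 / n2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same total sample size (80), different allocation ratios.
power_balanced = approx_power(40, 40, 0.5)
power_unbalanced = approx_power(20, 60, 0.5)

# One group fixed at 40 while the other grows, versus growing both equally
# (both designs use 120 participants in total).
power_grow_one = approx_power(40, 80, 0.5)
power_grow_both = approx_power(60, 60, 0.5)

for label, value in [
    ("40 vs 40", power_balanced),
    ("20 vs 60", power_unbalanced),
    ("40 vs 80", power_grow_one),
    ("60 vs 60", power_grow_both),
]:
    print(f"{label}: power ~ {value:.3f}")
```

Under this approximation the standard error depends on 1/n1 + 1/n2, which for a fixed total is smallest when the two groups are equal in size.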
Asunto(s)
Bioestadística , Distribución Normal , Tamaño de la Muestra
RESUMEN
A growing number of publications has recently been observed in all fields of Dentistry. These publications act as scientific evidence, as well as a basis for clinical decision-making in routine dental care. It is important to note that the results and conclusions of articles are based on the p-value, which is a purely probabilistic and statistical parameter that assists the researcher in accepting or rejecting the null hypothesis being tested. The p-value was proposed by Fisher in 1925, and in Dentistry it is usual to adopt a threshold of 0.05 [1]. In practical terms, when a statistical test results in a p-value of less than 0.05, the null hypothesis (equality between groups) must be rejected, assuming that there is a difference between the assessed groups [2]. In other words, p<0.05 indicates a statistically significant difference between groups. Looking critically, the researcher and the reader should keep in mind that a statistical difference does not always reflect true clinical importance. In addition, a lack of statistical significance does not necessarily imply the absence of clinical significance. Clinical importance goes far beyond statistical calculations based on the p-value [3]. A study has clinical importance when the intervention being tested has a clinical effect capable of changing the behavior of the dentist in daily routine. This judgment should be made by the researcher based on the results of his/her research, clinical experience and current knowledge. In addition, estimates of effect sizes should be presented. This facilitates assessment of how large or small the observed effect could actually be in the population of interest, and hence how clinically important it could be. Kassab et al. (2006) [4] compared periodontal parameters in groups with and without chemical biomodification of the root prior to surgical coverage in cases of gingival recession. 
In the group that used edetic acid, the periodontal parameters improved to a statistically significant degree relative to the group without surface biomodification. However, this difference was imperceptible to both dentist and patient. That is, the clinical result of root coverage will be the same with or without acid biomodification of the root. In other words, this step had no important clinical effect, although there was a statistically significant difference. The above example makes clear that a statistically significant test result does not mean that the effect it measures is large or clinically important. Then, researchers
Asunto(s)
Humanos , Editorial , Odontología , Ortodoncia , Periodoncia , Prostodoncia , Cirugía Bucal , Bioestadística , Probabilidad , Epidemiología y Bioestadística , Odontología Pediátrica , Estadística , Endodoncia , Odontología Geriátrica
RESUMEN
The aim of this paper is to present an alternative approach to measuring effect size. The proposed model belongs to the r family...
O objetivo deste artigo é apresentar uma medida alternativa de cálculo de tamanho de efeito. o modelo proposto pertence a família r....(AU)
Asunto(s)
Humanos , Masculino , Femenino , Historia del Siglo XX , Probabilidad , Investigación , Estadística
RESUMEN
Resumen La "Práctica Basada en la Evidencia" requiere a los profesionales valorar de forma crítica los resultados de las investigaciones psicológicas. Sin embargo, las interpretaciones incorrectas de los valores p de probabilidad son abundantes y repetitivas. Estas interpretaciones incorrectas pueden afectar las decisiones profesionales y poner en riesgo la calidad de las intervenciones y la acumulación de un conocimiento científico válido. Por lo tanto, identificar el tipo de falacia que subyace a las decisiones estadísticas es fundamental para abordar y planificar estrategias de educación estadística dirigidas a intervenir sobre las interpretaciones incorrectas. En consecuencia, el objetivo de este estudio es analizar la interpretación del valor p en estudiantes y profesores universitarios de psicología. La muestra estuvo formada por 161 participantes (43 profesores y 118 estudiantes). La antigüedad media como profesor fue de 16.7 años (DE = 10.07). La edad media de los estudiantes fue de 21.59 (DE = 1.3). Los hallazgos sugieren que los estudiantes y profesores universitarios no conocen la interpretación correcta del valor p. La falacia de la probabilidad inversa presentó mayores problemas de comprensión. Además, se confundieron la significación estadística y la significación práctica o clínica. Estos resultados destacan la necesidad de la educación estadística y reeducación estadística.
Abstract "Evidence-Based Practice" requires professionals to critically assess the results of psychological research. However, incorrect interpretations of p values are abundant and recurrent. These misconceptions may affect professional decisions and compromise the quality of interventions and the accumulation of valid scientific knowledge. Therefore, identifying the types of fallacies that underlie statistical decisions is fundamental for approaching and planning statistical education strategies designed to address incorrect interpretations. Consequently, the aim of this study is to analyze the interpretation of the p value among university students of psychology and academic psychologists. The sample was composed of 161 participants (43 academics and 118 students). The mean number of years as an academic was 16.7 (SD = 10.07). The mean age of the university students was 21.59 years (SD = 1.3). The findings suggest that college students and academics do not know the correct interpretation of p values. The inverse probability fallacy presented the greatest problems of comprehension. In addition, the participants confused statistical significance with the practical or clinical significance of the findings. There is a need for statistical education and statistical re-education.
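The inverse probability fallacy mentioned above treats P(data | H0) as if it were P(H0 | data). A minimal sketch of the difference follows, applying Bayes' theorem to a significant result. The prior probability of H0 and the power are hypothetical inputs chosen for illustration; they do not come from the study.

```python
def prob_h0_given_significant(prior_h0: float, alpha: float, power: float) -> float:
    """P(H0 is true | test is significant), by Bayes' theorem.

    alpha = P(significant | H0 true), power = P(significant | H1 true).
    """
    false_positives = alpha * prior_h0
    true_positives = power * (1.0 - prior_h0)
    return false_positives / (false_positives + true_positives)


# Hypothetical scenario: 90% of tested hypotheses are true nulls,
# tests are run at alpha = 0.05 with 80% power.
risk = prob_h0_given_significant(prior_h0=0.9, alpha=0.05, power=0.8)
print(f"P(significant | H0) = 0.05, but P(H0 | significant) = {risk:.2f}")
```

In this scenario the two probabilities differ by a factor of about seven, which is why reading p < 0.05 as "H0 has less than a 5% chance of being true" is incorrect.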
Asunto(s)
Universidades , Interpretación Estadística de Datos , Docentes
RESUMEN
The previous articles of the Statistical Round in the Korean Journal of Anesthesiology raised serious questions about null hypothesis significance testing (NHST). P values lie at the core of NHST and are used to classify all treatments into two groups: "has a significant effect" or "does not have a significant effect." NHST is frequently criticized because its results are easily misinterpreted and because it is of limited use in assessing practical importance. It has also been criticized for merely separating treatments that "have a significant effect" from those that do not. Effect sizes and CIs expand the approach to statistical thinking. These estimates help authors and readers to discriminate between a multitude of treatment effects. In this article, I illustrate the concept and estimation principles of effect sizes and CIs.
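As an illustration of reporting an effect size and a CI instead of only a p value, the sketch below computes Cohen's d and an approximate 95% CI for a difference in means. The data are invented for illustration, and the interval uses a normal approximation; for small samples a t-based interval would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mean_diff_ci(a: list[float], b: list[float], level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for mean(a) - mean(b) (normal approximation)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical outcome scores for two treatment groups.
treatment = [7.1, 6.4, 8.0, 7.5, 6.9, 7.8, 7.2, 6.6]
control = [6.2, 6.8, 5.9, 6.5, 7.0, 6.1, 6.4, 6.0]

d = cohens_d(treatment, control)
low, high = mean_diff_ci(treatment, control)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = ({low:.2f}, {high:.2f})")
```

The interval shows the range of plausible differences, so a reader can judge magnitude and precision rather than only whether a threshold was crossed.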
Asunto(s)
Anestesiología , Intervalos de Confianza , Pensamiento
RESUMEN
All statistical tests have a p value that is considered significant when < 0.050. This value was arbitrarily determined by RA Fisher and accepted by consensus over time. Since its genesis, this value has been questioned, and nowadays it is under the scrutiny of many statisticians. This has led to a debate in the scientific community, where obtaining a significant p value was classically regarded as a seal of guarantee that the research project was capable of accepting or rejecting the hypothesis. The purpose of this paper is to discuss the questions raised about the significance of p.
Todas la pruebas estadísticas tienen un valor de p significativo a partir de < de 0.050, el cual fue arbitrariamente determinado por RA Fisher y aceptado por consenso a través del tiempo. Desde su génesis, este valor ha sido cuestionado y actualmente está bajo la mirada escrupulosa de muchos estadígrafos, por lo que se establece un debate en la comunidad científica donde clásicamente se consideraba obtener la significancia de p un sello de garantía, que el proyecto de investigación era capaz de aceptar o rechazar la hipótesis. El objetivo de este artículo es discutir los cuestionamientos de la significancia de p.
RESUMEN
Objective To study the influence of the common logarithm of the partition coefficient (log P) of insoluble drugs on the properties of nano-lipid emulsions, including drug-loading amount, in vitro release, and phase distribution. Methods Six insoluble drugs, nimodipine (NIM), docetaxel (DTX), curcumin (CUR), paclitaxel (PTX), teniposide (TEN), and silybin (SLB), were selected as model drugs to investigate the relationship between the log P value and solubility in PEG400, drug-loading amount, particle diameter, Zeta potential, in vitro release, and phase distribution in the nano-lipid emulsion. Results As the log P value increased, drug solubility in PEG400 first increased and then decreased, drug loading in the nano-lipid emulsion increased, the in vitro release rate slowed, and drug distribution increased in the oil phase while decreasing in the emulsion layer. The log P value showed no correlation with particle diameter or Zeta potential. Conclusion The properties of a drug-loaded nano-lipid emulsion can be preliminarily judged from the log P value of the drug and its solubility in PEG400.
RESUMEN
Background: In 1985, the Centers for Disease Control coined the name “Acquired Immune Deficiency Syndrome (AIDS)” to refer to a deadly illness. The World Health Organization (WHO) estimated that about 33.4 million people were suffering from AIDS and that two million people (including 330,000 children) died in 2009 alone in many parts of the world. A worrying fact is that, according to the survey results reported in Meulders et al. (2001), the public worries about situations that might spread AIDS. This article develops and illustrates an appropriate statistical methodology to understand the meaning of the data. Methods: While the binomial model is a suitable underlying model for the responses, the mean and dispersion of the data violate the functional balance between them that the model requires. This violation is called over-under dispersion. This article creates an innovative approach to assess whether the functional imbalance is strong enough to reject the binomial model for the data. If the model is rejected, what is a correct way of warning the public about the spread of AIDS in a specified situation? This question is answered. Results: In the survey data about how AIDS/HIV might spread according to fifty respondents in thirteen nations, the functional balance exists in only three cases, “needle”, “blood” and “sex”, justifying the use of the usual binomial model (1). In the other seven cases, “glass”, “eating”, “object”, “toilet”, “hands”, “kissing”, and “care” of an AIDS or HIV patient, there is a significant imbalance between the dispersion and its functional equivalent in terms of the mean, suggesting that the new binomial model of this article, called the imbalanced binomial distribution (6), should be used. The statistical power of this methodology is reported to be excellent, and hence practitioners should make use of it. 
Conclusion: The new model called imbalanced binomial distribution (6) of this article is versatile enough to be useful in other research topics in the disciplines such as medicine, drug assessment, clinical trial outcomes, business, marketing, finance, economics, engineering and public health.
RESUMEN
Background: In times of an outbreak of a contagious deadly epidemic1-4 such as severe acute respiratory syndrome (SARS), the patients are quarantined and rushed to an emergency department of a hospital for treatment. Paradoxically, the nurses who treat them to become healthy get infected in spite of the nurses’ precautionary defensive alertness. This is so unfortunate because the nurses might not have been in close contact with the virus otherwise in their life. The nurses’ sufficient immunity level is a key factor to avert hospital site infection. Is it possible to quantify informatics about the nurses’ immunity from the virus? Methods: The above question is answered with a development of an appropriate new model and methodology. This new frequency trend is named Bumped-up Binomial Distribution (BBD). Several useful properties of the BBD are derived, applied, and explained using SARS data5 in the literature. Though SARS data are considered in the illustration, the contents of the article are versatile enough to analyze and interpret data from other contagious diseases. Results: With the help of BBD (3) and the Toronto data in Table 1, we have identified the informatics about the attending nurses’ sufficient immunity level. There were 32 nurses providing 16 patient care services. Though the nurses were precautionary to avoid infection, not all of them were immune to infection in those activities. Using the new methodology of this article, their sufficient immunity level is estimated to be only 0.25 in a scale of zero to one with a p-value of 0.001. It suggests that the nurses’ sufficient immunity level is statistically significant. The power of accepting the true alternative hypothesis of 0.50 immunity level, if it occurs, is calculated to be 0.948 in a scale of zero to one. It suggests that the methodology is powerful. 
Conclusions: The estimate of the nurses’ sufficient immunity level is a helpful factor for hospital administrators when making work schedules and assigning nurses to highly contagious patients who come to the emergency or regular wings of the hospital for treatment. Applying the approach and methodology of this article would reduce, if not totally eliminate, hospital site infections among nurses and physicians.
RESUMEN
Background: Smoking is generally known to be carcinogenic and hazardous to health. What is not clear is whether smoking affects a woman’s reproductive process. There have been medical debates on whether pregnancy may be delayed by smoking in a woman of child-bearing age. A definitive conclusion on this issue has not been reached, perhaps because of a lack of appropriate data. The missing link needed to answer the question might be a suitable model for extracting the pertinent information from data on the number of menstrual cycles missed by smoking women versus non-smoking women. This article develops and demonstrates a statistical methodology to answer the question. Methods: To construct such a methodology, a new statistical distribution is introduced as an underlying model for the data on the number of menstrual cycles missed by women who smoke. This new distribution is named the Tweaked Geometric Distribution (TGD). Several useful properties of the TGD are derived and explained using historical data from the literature. Results: In the data on 100 smokers and 486 non-smokers, smoking women missed on average 3.22 menstrual cycles and non-smoking women missed only 1.96 menstrual cycles before becoming pregnant. The smoking women exhibited more variation than the non-smoking women, which suggests that the non-smoking women are more homogeneous while the smoking women are more heterogeneous. Furthermore, the impairment level to pregnancy due to smoking among the 486 women is estimated to be 5% on a possible scale of zero to one. The 5% impairment level appears to be a small amount, but its impact can be felt once it is cast in terms of fecundity. Fecundity refers to the chance for a woman to become pregnant. The fecundity is 0.24 for a smoking woman and 0.34 for a non-smoking woman. The fecundity of a non-smoking woman is thus about 1.4 times that of a smoking woman. 
Conclusion: Smoking is clearly disadvantageous to anyone in general and particularly to a woman who wants to become pregnant.
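The link between the mean number of missed cycles and fecundity can be sketched with the ordinary geometric distribution. This is an assumption made for illustration: the article's Tweaked Geometric Distribution is not reproduced here. If the number of missed cycles before pregnancy is geometric with per-cycle success probability p, its mean is (1 - p) / p, so p = 1 / (1 + mean).

```python
def fecundity_from_mean_missed(mean_missed: float) -> float:
    """Per-cycle chance of pregnancy under a plain geometric model.

    mean_missed is the average number of cycles missed before pregnancy,
    so mean_missed = (1 - p) / p and therefore p = 1 / (1 + mean_missed).
    """
    return 1.0 / (1.0 + mean_missed)


# Means reported in the abstract above.
p_smokers = fecundity_from_mean_missed(3.22)
p_non_smokers = fecundity_from_mean_missed(1.96)
ratio = p_non_smokers / p_smokers

print(f"smokers: {p_smokers:.2f}, non-smokers: {p_non_smokers:.2f}, ratio: {ratio:.2f}")
```

Rounded to two decimals, the plain geometric model gives the same fecundity values as the abstract (0.24 and 0.34).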
RESUMEN
Se realizó un estudio comparativo entre pruebas de permutación y asintóticas, aplicadas a tablas de contingencia de dimensión R×C no ordenadas, utilizando como medida de comparación la diferencia entre el p-valor exacto y asintótico. Se analizaron cinco (05) ejemplos que presentan tablas de contingencia no ordenadas, publicados en la literatura científica internacional relacionados con estudios biomédicos, con el objeto de mostrar bajo cuales condiciones ambos enfoques difieren o convergen para las pruebas de independencia de Pearson, Razón de Verosimilitud y Freeman-Halton. Los resultados mostraron que el comportamiento de las metodologías exacta y asintótica depende del tamaño de muestra, dimensión, balanceo y dispersión de la tabla de contingencia y prueba aplicada. Para los casos estudiados se encontró que los p-valores exactos y asintóticos presentaron diferencias notables para tamaños de muestras pequeños; sobre todo en tablas de contingencia desbalanceadas y dispersas; y mostraron convergencia de los p-valores asintóticos a los exactos en la medida que el tamaño de muestra y dimensión de la tabla era mayor.
A comparative study of permutation and asymptotic tests applied to unordered R×C contingency tables was carried out, using the difference between the exact and the asymptotic p-value as the measure of comparison. Five examples of unordered contingency tables from biomedical studies published in the international scientific literature were analyzed to show under which conditions the two approaches differ or converge for the Pearson, likelihood ratio and Freeman-Halton tests of independence. The results showed that the behavior of the exact and asymptotic methodologies depends on the sample size, dimensions, balance and sparseness of the contingency table, as well as on the test applied. The exact and asymptotic p-values showed striking differences for small sample sizes, mainly in unbalanced and sparse contingency tables, but the asymptotic p-values converged to the exact ones as the sample size and table dimensions increased.
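The comparison described above can be sketched for a single small table. The code below is an illustration under stated assumptions, not the study's procedure: the 2×3 table is invented, the permutation p-value is a Monte Carlo approximation to the exact one, and the asymptotic p-value uses the closed form of the chi-square survival function for 2 degrees of freedom, exp(-x/2), which applies only to 2×3 (or 3×2) tables.

```python
import random
from math import exp


def pearson_chi2(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an R x C table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def asymptotic_p_df2(stat: float) -> float:
    """Chi-square upper tail for 2 degrees of freedom (2 x 3 tables only)."""
    return exp(-stat / 2)


def permutation_p(table: list[list[int]], n_perm: int = 5000, seed: int = 0) -> float:
    """Monte Carlo permutation p-value with both margins held fixed."""
    rng = random.Random(seed)
    # One (row label, column label) pair per observation in the table.
    rows = [i for i, r in enumerate(table) for _ in range(sum(r))]
    cols = [j for r in table for j, c in enumerate(r) for _ in range(c)]
    observed = pearson_chi2(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # break any row/column association
        perm = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            perm[i][j] += 1
        if pearson_chi2(perm) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical small, unbalanced, sparse 2 x 3 table.
small_table = [[3, 1, 0], [1, 2, 5]]
stat = pearson_chi2(small_table)
p_asym = asymptotic_p_df2(stat)
p_perm = permutation_p(small_table)
print(f"chi2 = {stat:.3f}, asymptotic p = {p_asym:.4f}, permutation p = {p_perm:.4f}")
```

Running the same comparison on larger, better balanced tables is one way to explore the convergence the abstract reports.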
RESUMEN
Las Pruebas de Hipótesis son el procedimiento de análisis más conocido por los investigadores y utilizado en las revistas científicas pero, a su vez, ellas han sido fuertemente criticadas, su uso ha sido cuestionado y restringido en algunos casos por las inconsistencias observadas en su aplicación. Este problema se analiza, en este artículo, tomando como punto de partida los Fundamentos de la Metodología Estadística y los diferentes enfoques que históricamente se han desarrollado para abordar el problema del análisis de las Hipótesis Estadísticas. Resaltándose un punto poco conocido por algunos: el carácter aleatorio de los valores P. Se presentan los fundamentos de las soluciones de Fisher, Neyman-Pearson y Bayesiana y a partir de ellas se identifican las inconsistencias del procedimiento de conducta que indica identificar un valor P, compararlo con el valor del error de tipo I, que usualmente es considerado como 0,05, y a partir de ahí decidir las conclusiones del análisis. Adicionalmente se identifican recomendaciones sobre cómo proceder en un problema, así como los retos a enfrentar, en lo docente y en lo metodológico, para analizar correctamente los datos y determinar la validez de las hipótesis de interés...
Hypothesis testing is the data analysis procedure best known to researchers and most widely used in scientific journals but, at the same time, it has been strongly criticized, and its use has been questioned and in some cases restricted because of inconsistencies observed in its application. This issue is analyzed in this paper on the basis of the fundamentals of statistical methodology and the different approaches that have historically been developed to address the problem of analyzing statistical hypotheses, highlighting a point that is not well known: the P value is a random variable. The fundamentals of the Fisher, Neyman-Pearson and Bayesian solutions are presented and, on their basis, inconsistencies are identified in the commonly used procedure of determining a P value, comparing it with a type I error value (usually 0.05) and drawing the conclusions of the analysis from that comparison. Additionally, recommendations on how to proceed when solving a problem are presented, as well as the methodological and teaching challenges to be faced in order to analyze the data correctly and determine the validity of the hypotheses of interest...
Os testes de hipóteses são o método de análise melhor conhecido por pesquisadores e utilizado em revistas científicas; mas por sua vez, têm sido fortemente criticados, seu uso tem sido questionado e, em alguns casos restritos pelas inconsistências observadas na sua aplicação. Esse problema é discutido neste artigo, tendo como ponto de partida os Fundamentos da Metodologia Estatística e as diferentes abordagens que historicamente têm sido desenvolvidas para resolver o problema da análise das Hipóteses Estatísticas. Destacando-se um ponto pouco conhecido por alguns: o caráter aleatório do p-valor. Apresentam-se os fundamentos das soluções de Fisher, Neyman-Pearson e Bayesiana e delas são identificadas as inconsistências do procedimento de conduta que orienta identificar um p-valor para compará-lo com o valor do erro de tipo I, que é geralmente considerado como 0,05, e, posteriormente, decidir as conclusões da análise. Além disso, se identificam recomendações sobre como proceder num problema, e os desafios a serem enfrentados no ensino e no metodológico, para analisar corretamente os dados e determinar a validade das hipóteses de interesse...
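The point that the P value is itself a random variable can be illustrated by simulation. The sketch below repeats the same experiment many times with the null hypothesis true; the test (a two-sided z-test with known variance), the sample size and the number of repetitions are assumptions chosen for simplicity.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_z_p(sample: list[float], mu0: float = 0.0, sigma: float = 1.0) -> float:
    """Two-sided p-value of a z-test for the mean with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)

# 2000 repetitions of the same study, each with 20 observations drawn
# from a population in which H0 (mean = 0) is exactly true.
p_values = [
    two_sided_z_p([rng.gauss(0.0, 1.0) for _ in range(20)])
    for _ in range(2000)
]

share_below_005 = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.4f}")
print(f"share of p-values below 0.05 = {share_below_005:.3f}")
```

Each repetition yields a different P value even though nothing about the population changes. Under H0 the P values of this test are uniformly distributed on (0, 1), so roughly 5% of them fall below 0.05.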
Subject(s)
Behavior/physiology, Hypothesis Testing
ABSTRACT
The tendency to commit serious and trivial errors when certifying death remains widespread. One might even say that the death certificate is the one certificate in the world that is full of errors. This study evaluates errors in the medical and non-medical parts of death certificates and assesses the causes of errors in the cause of death (COD). A total of 353 death certificates from a teaching hospital were evaluated to detect different errors. Causes of COD errors were scrutinized and confirmed after extensive examination of the COD statements. Of the certificates, 21% were incompletely written and 99% were incorrectly written. The P value comparing correct and complete certificates with incorrect and incomplete ones was not significant. The commonest errors were the use of 'with' instead of 'due to' and the mention of 'MOD' at I a. The causes of these errors (99%) were lack of training and diagnostic difficulty. Several errors were found in the non-medical part, which highlights a 'routine attitude' on the part of the certifier. Changing this scenario requires teamwork: a team of doctors needs to certify and closely supervise death certificates. The possibility of legal action against a frequently erring certifying doctor may be made publicly known.
Subject(s)
Cause of Death, Death/etiology, Death/legislation & jurisprudence, Death Certificates, Diagnostic Errors/legislation & jurisprudence, Humans, Probability
ABSTRACT
Introduction: Medicine has privileged the p-value and what it contributes. However, other criteria are increasingly used, such as the confidence interval, along with new formulations of hypothesis tests that can provide more depth in identifying clinically relevant results. Objectives: To present criteria and hypothesis tests that go beyond the p-value. Results: Confidence intervals and different hypothesis tests are explained as ways to identify clinically relevant values when analyzing research data. Conclusion: The p-value, confidence intervals and the identification of clinically relevant differences through superiority, non-inferiority and equivalence hypotheses are fundamental to clinical research...
Introduction: In medicine the p-value has held an important place because of what it contributes. In addition, confidence intervals and new formulations of significance tests are increasingly used as a way to identify clinically relevant results. Objective: To describe criteria and significance tests that go beyond the p-value. Results: Confidence intervals and significance tests are reviewed as ways to identify clinically relevant findings in data analysis. Conclusion: The p-value, confidence intervals and the identification of clinically relevant findings by means of superiority, non-inferiority and equivalence hypotheses are fundamental to clinical research...
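How a confidence interval supports superiority, non-inferiority and equivalence conclusions can be sketched in a few lines. The example below is an illustration and not the article's method: it assumes a normal approximation for the difference between treatments (new minus control, higher is better), and the observed difference, standard error and clinical margin are hypothetical numbers. The confidence level used for each type of hypothesis varies by convention; 95% is assumed here for simplicity.

```python
from statistics import NormalDist


def diff_ci(diff, se, level=0.95):
    """Normal-approximation confidence interval for a difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * se, diff + z * se


def classify(lower, upper, margin):
    """Read the three hypotheses off the interval.

    margin is the smallest difference regarded as clinically relevant
    (a positive, hypothetical value chosen before the study).
    """
    return {
        "superior": lower > 0.0,          # whole interval above zero
        "non_inferior": lower > -margin,  # not worse by more than the margin
        "equivalent": lower > -margin and upper < margin,
    }


# Hypothetical study: observed difference 1.0, standard error 1.0, margin 3.0.
lower, upper = diff_ci(diff=1.0, se=1.0)
result = classify(lower, upper, margin=3.0)

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(result)
```

In this hypothetical case the interval includes zero, so a conventional significance test would not declare a difference, yet the interval lies entirely within the margin, which supports non-inferiority and equivalence.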
Subject(s)
Biomedical Research, Confidence Intervals
ABSTRACT
When health workers or people with little knowledge of biostatistics become involved in research, especially quantitative research, they apply statistical techniques to analyze information obtained from a data collection process whose planning did not anticipate the type of analysis that might be needed for the results to be consistent with the hypotheses that are tied from the outset to any process of empirical, systematic, controlled and reproducible inquiry (research) that seeks to solve a specific problem. For this reason, when the results of a study are interpreted, errors can arise regarding the validity of the results obtained, especially when the significance level is to be established empirically and, in addition, when clarifying the error that occurs when a finding is accepted as valid that arises from not having formulated the working hypothesis (type I error).
Health professionals and people with little knowledge of statistics who become involved in quantitative research usually have to apply statistical techniques to analyze data from a previous data collection. Generally, they state hypotheses and later use the analysis to provide evidence for or against them. At that point they are commonly confused when trying to interpret the p-value and the type I error. This paper addresses the concepts of p-value and significance level and clarifies the difference between them.
Subject(s)
Statistics as Topic/methods, Predictive Value of Tests, Hypothesis Testing
ABSTRACT
This article reviews the dangers of using the term statistical significance and the importance of analyzing the magnitude of the differences found at the end of research studies. To that end, it presents the concept of statistical significance, type I and type II errors, and the concept of clinical relevance. It also discusses the use of other measures such as confidence intervals. Finally, two basic ideas are presented by way of conclusion: the first concerns the importance of identifying the statistical test that best fits the study in order to reject or accept the null hypothesis, and the second concerns the need to establish whether the magnitude of the differences obtained has any importance from a clinical point of view.
This article reviews the potential hazards of using the term statistical significance as well as the importance of analyzing the effect sizes of differences found in research reports. To that end, it reviews concepts such as statistical significance, type I and type II errors, and clinical relevance. Similarly, a discussion of other statistical measures, such as confidence intervals, is presented. Finally, two ideas are presented as the main conclusions of this analysis: the first deals with the importance of identifying the best statistical tests to either accept or reject the null hypothesis in a research study. The second idea highlights the need to clarify the clinical relevance of the effect size of the differences.