Search | VHL Regional Portal

1.

Better to be in agreement than in bad company : A critical analysis of many kappa-like tests.

Silveira, Paulo Sergio Panse; Siqueira, Jose Oliveira.

Behav Res Methods ; 55(7): 3326-3347, 2023 10.

Article in English | MEDLINE | ID: mdl-36114386

ABSTRACT

We assessed several agreement coefficients applied to 2x2 contingency tables, which are common in research because of dichotomization. We not only studied specific estimators but also developed a general method for studying any estimator that is a candidate agreement measure. This method was implemented in open-source R code and is available to researchers. We tested it by verifying the performance of several traditional estimators over all possible configurations with sizes ranging from 1 to 68 (a total of 1,028,789 tables). Cohen's kappa showed handicapped behavior similar to Pearson's r, Yule's Q, and Yule's Y. Scott's pi and Shankar and Bangdiwala's B seem to assess situations of disagreement better than situations of agreement between raters. Krippendorff's alpha emulates Scott's pi, without any advantage, in cases with nominal variables and two raters. Dice's F1 and McNemar's chi-squared incompletely assess the information in the contingency table and showed the poorest performance of all. We concluded that Cohen's kappa is a measure of association and that McNemar's chi-squared assesses neither association nor agreement; the only two authentic agreement estimators are Holley and Guilford's G and Gwet's AC1. These two estimators also showed the best performance over the range of table sizes and should be considered the first choices for measuring agreement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from Harvard Dataverse https://doi.org/10.7910/DVN/HMYTCK.

Subject(s)

Dissent and Disputes , Humans , Observer Variation , Reproducibility of Results
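
To make the contrast between the coefficients named in this abstract concrete, the sketch below computes Cohen's kappa, Holley and Guilford's G, and Gwet's AC1 for one 2x2 table. It is written in Python rather than the authors' R code, uses the standard textbook formulas, and the function name `agreement_2x2` and the cell labels `a`, `b`, `c`, `d` are assumptions made for illustration only.

```python
from typing import NamedTuple


class Agreement(NamedTuple):
    observed: float           # proportion of cases on the main diagonal
    cohen_kappa: float
    holley_guilford_g: float
    gwet_ac1: float


def agreement_2x2(a: int, b: int, c: int, d: int) -> Agreement:
    """Agreement coefficients for the table [[a, b], [c, d]].

    a and d are the cells where the two raters agree (both positive,
    both negative); b and c are the disagreement cells.
    """
    n = a + b + c + d
    if n <= 0:
        raise ValueError("the table must contain at least one observation")

    po = (a + d) / n          # observed agreement
    p1 = (a + b) / n          # rater 1 marginal proportion of positives
    p2 = (a + c) / n          # rater 2 marginal proportion of positives

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Holley and Guilford's G: chance agreement fixed at 1/2.
    g = 2 * po - 1

    # Gwet's AC1: chance agreement 2*pi*(1 - pi), pi = mean positive rate.
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)   # never exceeds 0.5, so no division by zero
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return Agreement(po, kappa, g, ac1)
```

For a strongly unbalanced table such as `agreement_2x2(90, 5, 5, 0)`, the raters agree on 90% of cases, yet kappa is slightly negative (about -0.05) while G is 0.80 and AC1 is about 0.89. This is only one hand-picked table, not a reproduction of the exhaustive comparison described in the abstract.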

2.

Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty.

Gómez-Guerrero, Santiago; Ortiz, Inocencio; Sosa-Cabrera, Gustavo; García-Torres, Miguel; Schaerer, Christian E.

Entropy (Basel) ; 24(1)2021 Dec 30.

Article in English | MEDLINE | ID: mdl-35052090

ABSTRACT

Interaction between variables is often found in statistical models, and it is usually expressed in the model as an additional term when the variables are numeric. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, and measuring interactions is not a simple task. In this work, based on an entropy-based correlation measure for n nominal variables (named Multivariate Symmetrical Uncertainty, MSU), we propose a formal and broader definition for the interaction of the variables. Two series of experiments are presented. In the first series, we observe that datasets in which some record types or combinations of categories are absent, forming patterns of records, often display interactions among their attributes. In the second series, the interaction/non-interaction behavior of a regression model (built entirely on continuous variables) is successfully replicated in a discretized version of the dataset. It is shown that there is an interaction-wise correspondence between the continuous and the discretized versions of the dataset. Hence, we demonstrate that the proposed definition of interaction enabled by the MSU is a valuable tool for detecting and measuring interactions within linear and non-linear models.
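
The abstract does not state the MSU formula. The sketch below assumes the form usually given for it in the literature, MSU = n/(n-1) * (1 - H(X1,...,Xn) / sum of H(Xi)), which reduces to the ordinary symmetrical uncertainty when n = 2. The function names `entropy` and `msu` are hypothetical, and the XOR-style dataset is a made-up illustration of "absent record types", not data from the paper.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def msu(columns: Sequence[Sequence[Hashable]]) -> float:
    """Multivariate symmetrical uncertainty of n categorical columns.

    Assumed formula: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    """
    n_vars = len(columns)
    if n_vars < 2:
        raise ValueError("MSU needs at least two variables")
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        return 0.0  # every column is constant: no uncertainty to share
    joint = entropy(list(zip(*columns)))  # each record treated as one symbol
    return (n_vars / (n_vars - 1)) * (1 - joint / marginal_sum)


# Only 4 of the 8 possible records occur: x3 is the XOR of x1 and x2.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
x3 = [0, 1, 1, 0]
pairwise = msu([x1, x3])         # no pairwise association
three_way = msu([x1, x2, x3])    # association appears only jointly
```

Under these assumptions, `pairwise` is 0 and `three_way` is 0.5: no pair of variables is associated, but the three together are, which is the kind of pattern the abstract describes as an interaction.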

3.

A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning.

Lopez-Arevalo, Ivan; Aldana-Bobadilla, Edwin; Molina-Villegas, Alejandro; Galeana-Zapién, Hiram; Muñiz-Sanchez, Victor; Gausin-Valle, Saul.

Entropy (Basel) ; 22(12)2020 Dec 09.

Article in English | MEDLINE | ID: mdl-33316972

ABSTRACT

The most common machine-learning methods solve supervised and unsupervised problems based on datasets whose features belong to a numerical space. However, many problems involve data in which numerical and categorical variables coexist, which makes them challenging to handle. Preprocessing is required to transform categorical data into numeric form. Methods such as one-hot and feature hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This effect introduces unexpected challenges in dealing with the overabundance of variables and/or noisy data. In this paper we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon's theory to model the amount of information contained in the original data. We evaluated our proposal with ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, we applied it to prepare these datasets for classification, regression, and clustering tasks. We demonstrate that our encoding proposal is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency, and that it can preserve the information conveyed by the original data.
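
The dimensionality cost of one-hot encoding mentioned in this abstract can be seen in a few lines. The second function below is not the authors' encoding, which the abstract does not specify; it is a deliberately simple stand-in that replaces each category with its Shannon self-information, -log2(p), so that a categorical column stays a single numeric column. Both function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def one_hot(values: Sequence[Hashable]) -> List[List[int]]:
    """One indicator column per distinct category (columns in sorted order)."""
    categories = sorted(set(values))
    index = {cat: j for j, cat in enumerate(categories)}
    return [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]


def self_information(values: Sequence[Hashable]) -> List[float]:
    """Replace each category by -log2 of its relative frequency (one column)."""
    n = len(values)
    counts = Counter(values)
    return [-log2(counts[v] / n) for v in values]


colors = ["red", "red", "blue", "green"]
wide = one_hot(colors)             # 4 rows x 3 columns
narrow = self_information(colors)  # 4 rows x 1 column: [1.0, 1.0, 2.0, 2.0]
```

The single-column version uses a third of the cells here, and the gap grows with the number of categories. Note the limitation of this naive stand-in: "blue" and "green" have the same frequency and therefore collide on the same value, so it does not by itself preserve all the information in the column.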

4.

The multinomial logistic regression model for predicting the discharge status after liver transplantation: estimation and diagnostics analysis.

Hashimoto, E M; Ortega, E M M; Cordeiro, G M; Suzuki, A K; Kattan, M W.

J Appl Stat ; 47(12): 2159-2177, 2020.

Article in English | MEDLINE | ID: mdl-35706842

ABSTRACT

The multinomial logistic regression model (MLRM) can be interpreted as a natural extension of the binomial model with logit link function to situations where the response variable can have three or more possible outcomes. In addition, when the categories of the response variable are nominal, the MLRM can be expressed in terms of two or more logistic models and analyzed in both frequentist and Bayesian approaches. However, few discussions about post modeling in categorical data models are found in the literature, and they mainly use Bayesian inference. The objective of this work is to present classic and Bayesian diagnostic measures for categorical data models. These measures are applied to a dataset (status) of patients undergoing kidney transplantation.
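
The statement that the MLRM "can be expressed in terms of two or more logistic models" corresponds to the usual baseline-category form: each non-reference outcome has its own logit against a reference outcome. The sketch below computes the outcome probabilities under that standard parameterization. The function name, the choice of outcome 0 as reference, and the coefficient values in the test are assumptions, not taken from the paper.

```python
from math import exp
from typing import List, Sequence


def mlrm_probabilities(x: Sequence[float],
                       betas: Sequence[Sequence[float]]) -> List[float]:
    """Outcome probabilities for a baseline-category multinomial logit.

    betas holds one coefficient vector per non-reference outcome; the
    reference outcome (index 0 in the result) has its linear predictor
    fixed at zero.
    """
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    # Subtract the largest score before exponentiating to avoid overflow.
    shift = max([0.0] + scores)
    reference = exp(-shift)
    others = [exp(s - shift) for s in scores]
    total = reference + sum(others)
    return [reference / total] + [e / total for e in others]
```

With this form, `log(p[k] / p[0])` equals the linear predictor of outcome k, so a model with three outcomes is described by two logistic-type equations, and the binomial logit model is the special case with a single coefficient vector.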

5.

Compressed kNN: K-Nearest Neighbors with Data Compression.

Salvador-Meneses, Jaime; Ruiz-Chavez, Zoila; Garcia-Rodriguez, Jose.

Entropy (Basel) ; 21(3)2019 Feb 28.

Article in English | MEDLINE | ID: mdl-33266949

ABSTRACT

The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods. However, its memory consumption grows with the size of the dataset, which makes it impractical to apply to large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, for working with categorical data. Categorical data, by their nature, can be compressed to decrease the memory required when running the classification. The method adds a preliminary phase that compresses the data and then applies the algorithm to the compressed data. This makes it possible to keep the whole dataset in memory while considerably reducing the amount of memory required. Experiments and tests carried out on known datasets show a reduction in the volume of information stored in memory while the accuracy of the classification is maintained. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running.
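
The abstract does not say which compression scheme is used, so the sketch below substitutes a simple one for illustration: each record's categorical codes are packed into a single integer with a fixed number of bits per attribute, and distances are computed directly on the packed values by counting mismatching attributes. The function names, the fixed bit width, and the mismatch-count distance are all assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def pack(row: Sequence[int], bits: int) -> int:
    """Pack small non-negative category codes into one integer."""
    packed = 0
    for code in row:
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        packed = (packed << bits) | code
    return packed


def mismatches(p: int, q: int, n_attrs: int, bits: int) -> int:
    """Number of attributes on which two packed records differ."""
    mask = (1 << bits) - 1
    diff = p ^ q            # a non-zero field marks a differing attribute
    count = 0
    for _ in range(n_attrs):
        if diff & mask:
            count += 1
        diff >>= bits       # fields are read on the fly, never unpacked to a list
    return count


def knn_predict(train: Sequence[int], labels: Sequence[Hashable],
                query: int, k: int, n_attrs: int, bits: int) -> Hashable:
    """Majority label among the k packed records closest to the query."""
    ranked = sorted(range(len(train)),
                    key=lambda i: mismatches(train[i], query, n_attrs, bits))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


rows = [[0, 1, 2], [0, 1, 3], [3, 2, 0], [3, 2, 1]]
labels = ["a", "a", "b", "b"]
train = [pack(r, bits=2) for r in rows]   # 6 bits of payload per record
prediction = knn_predict(train, labels, pack([0, 1, 0], bits=2),
                         k=3, n_attrs=3, bits=2)
```

Here three attributes with four categories each fit in 6 bits per record, and the query is classified without ever materializing the uncompressed rows, which is the general idea the abstract describes.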

6.

Estimate of genetic parameters for carcass traits and visual scores in meat sheep using Bayesian inference via threshold and linear models / Estimativa de parâmetros genéticos para características de carcaça e escore corporal em ovinos de corte usando Inferência Bayesiana e modelos de limiar e linear

Figueiredo Filho, Luiz Antonio Silva; Sarmento, José Lindenberg Rocha; Ó, Alan Oliveira do; Santos, Natanael Pereira da Silva; Sena, Luciano Silva; de Sousa Júnior, Antonio.

Ciênc. rural (Online) ; 47(3): 1-6, 2017. ilus, tab

Article in English | VETINDEX | ID: biblio-1479875

ABSTRACT

The aim of this study was to estimate the variance components and genetic parameters for marbling in the ribeye area (MRA) and body condition score (BCS) using Bayesian inference via mixed linear and threshold animal models. Data were obtained from Santa Inês sheep reared in the Brazilian Mid-North region. Markov chain Monte Carlo analyses were run with chains of 500000 cycles. An initial burn-in of 200000 cycles was discarded and values were sampled every 250 cycles, for a total of 1200 samples. Monte Carlo errors were low for the mean heritability estimates in all chains under both the linear and threshold models. Additive variances estimated by the threshold model were higher than those estimated by the linear model. Marbling in the ribeye area and body condition score can be used as selection criteria to obtain genetic progress in Santa Inês sheep.

Subject(s)

Animals , Sheep , Reference Standards , Bayes Theorem , Fats , Linear Models
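
The sampling figures in this abstract are mutually consistent if 500000 is read as the total chain length, which is an assumption about the wording rather than something the record states. The small check below uses a hypothetical helper name.

```python
def retained_samples(total_cycles: int, burn_in: int, thin: int) -> int:
    """Number of draws kept after discarding burn-in and thinning the chain."""
    if thin <= 0 or not 0 <= burn_in <= total_cycles:
        raise ValueError("need thin > 0 and 0 <= burn_in <= total_cycles")
    return (total_cycles - burn_in) // thin


kept = retained_samples(total_cycles=500_000, burn_in=200_000, thin=250)
```

This gives (500000 - 200000) / 250 = 1200 retained samples, matching the total reported in the abstract.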

7.

Estimate of genetic parameters for carcass traits and visual scores in meat sheep using Bayesian inference via threshold and linear models / Estimativa de parâmetros genéticos para características de carcaça e escore corporal em ovinos de corte usando Inferência Bayesiana e modelos de limiar e linear

Figueiredo Filho, Luiz Antonio Silva; Sarmento, José Lindenberg Rocha; Ó, Alan Oliveira do; Santos, Natanael Pereira da Silva; Sena, Luciano Silva; de Sousa Júnior, Antonio.

Ci. Rural ; 47(3): 1-6, 2017. ilus, tab

Article in English | VETINDEX | ID: vti-686955

ABSTRACT

The aim of this study was to estimate the variance components and genetic parameters for marbling in the ribeye area (MRA) and body condition score (BCS) using Bayesian inference via mixed linear and threshold animal models. Data were obtained from Santa Inês sheep reared in the Brazilian Mid-North region. Markov chain Monte Carlo analyses were run with chains of 500000 cycles. An initial burn-in of 200000 cycles was discarded and values were sampled every 250 cycles, for a total of 1200 samples. Monte Carlo errors were low for the mean heritability estimates in all chains under both the linear and threshold models. Additive variances estimated by the threshold model were higher than those estimated by the linear model. Marbling in the ribeye area and body condition score can be used as selection criteria to obtain genetic progress in Santa Inês sheep.

Subject(s)

Animals , Reference Standards , Sheep , Bayes Theorem , Linear Models , Fats

8.

Estimate of genetic parameters for carcass traits and visual scores in meat sheep using Bayesian inference via threshold and linear models

Antonio Silva Figueiredo Filho, Luiz; Lindenberg Rocha Sarmento, José; Oliveira do Ó, Alan; Pereira da Silva Santos, Natanael; Silva Sena, Luciano; de Sousa Júnior, Antonio.

Ci. Rural ; 47(3)2017.

Article in English | VETINDEX | ID: vti-710038

ABSTRACT

The aim of this study was to estimate the variance components and genetic parameters for marbling in the ribeye area (MRA) and body condition score (BCS) using Bayesian inference via mixed linear and threshold animal models. Data were obtained from Santa Inês sheep reared in the Brazilian Mid-North region. Markov chain Monte Carlo analyses were run with chains of 500000 cycles. An initial burn-in of 200000 cycles was discarded and values were sampled every 250 cycles, for a total of 1200 samples. Monte Carlo errors were low for the mean heritability estimates in all chains under both the linear and threshold models. Additive variances estimated by the threshold model were higher than those estimated by the linear model. Marbling in the ribeye area and body condition score can be used as selection criteria to obtain genetic progress in Santa Inês sheep.

9.

Bifactorial Structure of Locus of Control Cross-culturally Invariant across Spain, Chile and United Kingdom

Suárez-Álvarez, Javier; García-Cueto, Eduardo; Pedrosa, Ignacio; Muñiz, José.

Actual. psicol. (Impr.) ; 29(119)dic. 2015.

Article in English | LILACS-Express | LILACS | ID: biblio-1505546

ABSTRACT

Locus of control (LOC) is a variable often studied owing to the important role that it plays in different contexts. Nonetheless, there is no unanimous agreement about how many dimensions make up the factorial structure of the locus of control. The goal of this research was to add new evidence of cross-cultural validity in relation to the bifactorial invariance of the LOC. The test was given to a total of 1781 participants from Spain (697), Chile (890) and the United Kingdom (194). Factorial invariance between the groups was studied using multigroup confirmatory factor analysis models for ordered-categorical data. The progressive evaluation of factorial invariance confirms that factor loadings, thresholds and error variances are invariant across groups. Relevant cross-cultural differences in LOC between Spain, Chile, and the United Kingdom were not found (PS < .50).

10.

A precarização do emprego na Europa / Precarious employment in Europe / La fragilité de l'emploi en Europe

Oliveira, Luísa; Carvalho, Helena.

Dados rev. ciênc. sociais ; 51(3): 541-567, 2008. graf

Article in Portuguese | LILACS | ID: lil-598438

ABSTRACT

This article examines the hypothesis of the emergence of a post-Fordist wage relationship, exploring one of this concept's components: the transformation of permanent employment into precarious employment. Based on Eurostat data, the article analyzes the evolution of the temporary work indicator in the last twenty years and the reasons that lead workers to accept this situation, according to a generational matrix. The article concludes that there has been a structural change to the extent that all countries have moved towards greater flexibilization of employment relations through liberalization of layoffs, expansion of temporary work, or a combination of the two.

11.

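Mensuração da variação em saúde, por escores ordinais: aspectos técnicos e práticos e proposta de um novo indicador / Measurement of variation in health using ordinal scores: technical and practical aspects and proposal of a new indicator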

Ferreira, Mário Luiz Pinto.

Rio de Janeiro; s.n; 2007. 62 p. tab, graf.

Thesis in Portuguese | LILACS, Coleciona SUS, Inca | ID: biblio-934705

ABSTRACT

Measuring the variation of ordinal variables for the same unit between two points in time (before and after), in the context of health science, is a controversial topic. The main points of this discussion are (a) the subjectivity (imprecision) of ordinal categorical data and (b) the most appropriate statistical treatment to use in analyzing the results. This work emphasizes the technical and practical aspects that directly affect how the variation is measured and how its results are interpreted, given the statistical treatment applied. In addition, it presents a new indicator for measuring variation with ordinal categorical data.
