ABSTRACT
To study linguistically coded concepts, researchers often resort to the Property Listing Task (PLT). In a PLT, participants are asked to list properties that describe a concept (e.g., for DOG, subjects may list "is a pet", "has four legs", etc.). When PLT data are collected for many concepts, researchers obtain Conceptual Properties Norms (CPNs), which are used to study semantic content and as a source of control variables. Though the PLT and CPNs are widely used across psychology, only recently has a model been developed and validated that describes the listing course of a PLT. That original model describes the listing course using the order of production of properties. Here we go a step further and validate the model using response times (RT), i.e., the time from cue onset to property listing. Our results show that RT data exhibit the same regularities observed in the previous model, but now we can also analyze the time course, i.e., the dynamics of the PLT. As such, the RT-validated model may be applied to study several similar memory retrieval tasks, such as the Free Listing Task and the Verbal Fluency Task, and to research related cognitive processes. To illustrate these kinds of analyses, we present a brief example of the difference in PLT dynamics between listing properties for abstract versus concrete concepts, which shows that the model may be fruitfully applied to study concepts.
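As a minimal sketch of the kind of dynamics analysis described above (the model's actual equations are not reproduced in the abstract), one might aggregate response times over order of production; the column names and data below are purely illustrative.

```python
# Hypothetical data layout: one row per listed property, with "order"
# giving production order (1st, 2nd, ... property listed) and "rt" the
# seconds from cue onset to property listing. All names are illustrative.
import pandas as pd

plt_data = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2],
    "concept": ["DOG"] * 5,
    "order":   [1, 2, 3, 1, 2],
    "rt":      [1.2, 2.9, 5.1, 1.5, 3.8],
})

# Mean RT per production order: a simple view of the PLT's time course,
# where later-listed properties typically take longer to retrieve.
dynamics = plt_data.groupby("order")["rt"].agg(["mean", "std", "count"])
print(dynamics)
```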
Subject(s)
Memory, Semantics, Humans, Reaction Time
ABSTRACT
In this paper, we present a novel algorithm that uses machine learning and natural language processing techniques to facilitate the coding of feature listing data. Feature listing is a method in which participants are asked to provide a list of features that are typically true of a given concept or word. This method is commonly used in research studies to gain insights into people's understanding of various concepts. The standard procedure for extracting meaning from feature listings is to manually code the data, which can be time-consuming and prone to errors, leading to reliability concerns. Our algorithm addresses these challenges by automatically assigning human-created codes to feature listing data, achieving quantitatively good agreement with human coders. Our preliminary results suggest that the algorithm has the potential to improve the efficiency and accuracy of content analysis of feature listing data. Additionally, this tool is an important step toward the fully automated coding algorithm we are currently devising.
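The abstract does not specify the algorithm's internals; as a minimal illustration of the general idea of matching raw feature phrases to human-created codes, one could use TF-IDF vectors with nearest-neighbor cosine matching. All codes and features below are made up.

```python
# Illustrative sketch only (not the paper's actual algorithm): assign each
# raw feature string to the most lexically similar human-created code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

codes = ["is a pet", "has four legs", "barks"]          # human-created codes
raw_features = ["it is a household pet", "it has 4 legs", "it barks loudly"]

vec = TfidfVectorizer().fit(codes + raw_features)
sim = cosine_similarity(vec.transform(raw_features), vec.transform(codes))

for feature, row in zip(raw_features, sim):
    print(feature, "->", codes[row.argmax()])  # best-matching code
```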
ABSTRACT
In conceptual properties norming studies (CPNs), participants list properties that describe a set of concepts. From CPNs, many different parameters are calculated, such as semantic richness. A generally overlooked issue is that those values are only point estimates of the true unknown population parameters. In the present work, we present an R package that allows researchers to treat those values as population parameter estimates. Relatedly, a common practice in CPNs is to use an equal number of participants to list properties for each concept (i.e., to standardize sample size). As we illustrate through examples, this procedure has negative effects on statistical analyses of the data. Here, we argue that a better method is to standardize coverage (i.e., the proportion of sampled properties to the total number of properties that describe a concept), such that a similar coverage is achieved across concepts. When standardizing coverage rather than sample size, it is more likely that all the concepts in a CPN exhibit a similar representativeness. Moreover, by computing coverage the researcher can decide whether the CPN has reached a sufficiently high coverage for its results to generalize to other studies. The R package we make available in the current work allows one to compute coverage and to estimate the number of participants needed to reach a target coverage. We illustrate this sampling procedure by applying the R package to real and simulated CPN data.
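The package itself is written in R and is not reproduced here; as a language-neutral sketch of the coverage notion, the classic Good-Turing estimate approximates coverage as C ≈ 1 − f1/n, where f1 is the number of properties listed exactly once and n the total number of property tokens (this estimator matches the package's idea in spirit, not necessarily in detail).

```python
# Good-Turing sample-coverage sketch for one concept's pooled property tokens.
from collections import Counter

def coverage(property_tokens):
    counts = Counter(property_tokens)
    n = sum(counts.values())                        # total property tokens
    f1 = sum(1 for c in counts.values() if c == 1)  # properties seen once
    return 1 - f1 / n

# Example: tokens pooled across participants for one concept.
tokens = ["is a pet", "is a pet", "barks", "has four legs", "barks", "is loyal"]
print(coverage(tokens))  # low coverage suggests more participants are needed
```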
Subject(s)
Research Design, Semantics, Humans, Sample Size
ABSTRACT
Agreement probability p(a) is a homogeneity measure of the lists of properties produced by participants in a Property Listing Task (PLT) for a concept. Agreement probability's mathematical properties allow a rich analysis of property-based descriptions. To illustrate, we use p(a) to delve into the differences between concrete and abstract concepts in sighted and blind populations. Results show that concrete concepts are more homogeneous within the sighted and blind groups than abstract ones (i.e., they exhibit a higher p(a)), and that concrete concepts are less homogeneous in the blind group than in the sighted sample. This supports the idea that listed properties for concrete concepts should be more similar across subjects due to the influence of visual/perceptual information on the learning process. In contrast, abstract concepts are learned mainly from social and linguistic information, which exhibits more variability among people, making the listed properties more dissimilar across subjects. For abstract concepts, the difference in p(a) between the sighted and blind groups is not statistically significant. Though this is a null result and should be interpreted with care, it is expected: abstract concepts should be learned by attending to the same social and linguistic input in both blind and sighted populations, so there is no reason to expect the respective lists of properties to differ. Finally, we used p(a) to classify concrete and abstract concepts with a good level of certainty. All these analyses suggest that p(a) can be fruitfully used to study data obtained in a PLT.
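The abstract does not reproduce the formula for p(a). Under one plausible reading (the probability that a property sampled from one participant's list also appears in another randomly chosen participant's list), it could be estimated as follows, with illustrative data.

```python
# Hedged sketch of an agreement-probability estimate; not necessarily the
# paper's exact definition of p(a). Lists below are made up.
from itertools import permutations

lists = {
    "s1": {"is a pet", "barks", "has four legs"},
    "s2": {"is a pet", "is loyal"},
    "s3": {"barks", "is a pet"},
}

def agreement_probability(lists):
    total, agree = 0, 0
    for a, b in permutations(lists, 2):   # ordered pairs of participants
        for prop in lists[a]:             # property sampled from a's list
            total += 1
            agree += prop in lists[b]     # does participant b also list it?
    return agree / total

print(agreement_probability(lists))
```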
ABSTRACT
We use a feature-based association model to fit group-level and individual-level category learning and transfer data. The model assumes that people use corrective feedback to learn individual feature-to-categorization-criterion correlations and combine those correlations additively to produce classifications. The model is an Adaptive Linear Filter (ALF) with a logistic output function, trained with the Least Mean Squares learning algorithm; categorization probabilities are computed by the logistic function. Our data span 31 published data sets. At both levels of analysis, the model performs remarkably well, accounting for a large proportion of the available variance. When fitted to grouped data, it outperforms alternative models. When fitted to individual-level data, it captures learning and transfer performance with high explained variance. Notably, the model achieves its fits with very few free parameters. We discuss the ALF's advantages as a model of procedural categorization in terms of its simplicity, its ability to capture empirical trends, and its ability to solve challenges faced by other associative models. In particular, we discuss why the model is not equivalent to a prototype model, as previously thought.
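A minimal sketch of the ALF as described (a single layer of feature weights, logistic output, Least Mean Squares/delta-rule updates driven by corrective feedback); the learning rate, epochs, and stimuli below are illustrative, not the paper's.

```python
# Sketch of an Adaptive Linear Filter with logistic output and LMS learning.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_alf(X, y, lr=0.1, epochs=50):
    w = np.zeros(X.shape[1])              # one weight per feature
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(w @ x)            # categorization probability
            w += lr * (target - p) * x    # corrective-feedback (LMS) update
    return w

# Four binary-feature stimuli and their category feedback (0/1).
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)

w = train_alf(X, y)
print(sigmoid(X @ w))  # predicted classification probabilities
```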
Subject(s)
Probability, Humans
ABSTRACT
Conceptual properties norming studies (CPNs) ask participants to produce properties that describe concepts. From those data, different metrics may be computed (e.g., semantic richness, similarity measures), which are then used in studying concepts and as a source of carefully controlled stimuli for experimentation. Notwithstanding those metrics' demonstrated usefulness, researchers have customarily overlooked that they are only point estimates of the true unknown population values, and therefore only rough approximations. Thus, though research based on CPN data may produce reliable results, those results are likely to be general and coarse-grained. In contrast, we suggest viewing CPNs as parameter estimation procedures, in which researchers obtain only estimates of the unknown population parameters. More specific and fine-grained analyses must therefore consider those parameters' variability. To this end, we introduce a probabilistic model from the field of ecology. Its associated statistical expressions can be used to compute estimates of CPN parameters and their corresponding variances, and to guide the sampling process. The traditional practice in CPN studies is to use the same number of participants for every concept, in the intuitive belief that this will render the computed metrics comparable across concepts and CPNs. In contrast, the current work shows why an equal number of participants per concept is generally not desirable. Using CPN data, we show how to use the equations, and we discuss how they may allow more reasonable analyses and comparisons of parameter values among different concepts in a CPN and across different CPNs.
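The specific ecological model is not named in the abstract; as an illustration of the family of estimators involved, the widely used Chao1 estimator infers the total number of property types (observed plus unseen) for a concept from the counts of rare properties.

```python
# Chao1 richness sketch: f1 = property types seen once, f2 = types seen twice.
from collections import Counter

def chao1(property_tokens):
    counts = Counter(property_tokens)
    s_obs = len(counts)                              # observed property types
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    # Bias-corrected form, safe when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

tokens = ["is a pet"] * 3 + ["barks"] * 2 + ["has four legs", "is loyal"]
print(chao1(tokens))  # estimated true number of property types
```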
Subject(s)
Semantics, Humans
ABSTRACT
Asking subjects to list semantic properties for concepts is essential for predicting performance in several linguistic and non-linguistic tasks and for creating carefully controlled stimuli for experiments. The property elicitation task and the ensuing norms are widely used across the field to investigate the organization of semantic memory and to design computational models thereof. The contributions of the current Special Topic discuss several core issues concerning how semantic property norms are constructed and how they may be used in research aimed at understanding cognitive processing.
Subject(s)
Linguistics, Semantics, Comprehension, Humans, Memory
ABSTRACT
To study concepts that are coded in language, researchers often collect lists of conceptual properties produced by human subjects. From these data, different measures can be computed. In particular, inter-concept similarity is an important variable used in experimental studies. Among possible similarity measures, the cosine of conceptual property frequency vectors seems to be a de facto standard. However, there is a lack of comparative studies testing the merit of different similarity measures computed from property frequency data. The current work compares four similarity measures (cosine, correlation, Euclidean, and Chebyshev) across five types of data structures. To that end, we compare the informational content (i.e., entropy) delivered by each of those 4 × 5 = 20 combinations, and use a clustering procedure as a concrete example of how informational content affects statistical analyses. Our results lead us to conclude that similarity measures computed from lower-dimensional data fare better than those calculated from higher-dimensional data, and suggest that researchers should be more aware of data sparseness and dimensionality, and of their consequences for statistical analyses.
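For concreteness, the four measures can be computed from two concepts' property-frequency vectors as follows (toy counts; SciPy returns distances, so the cosine and correlation similarities are obtained as 1 minus the corresponding distance).

```python
# The four measures compared above, on two toy property-frequency vectors
# (each entry = number of participants listing that property for the concept).
import numpy as np
from scipy.spatial.distance import cosine, correlation, euclidean, chebyshev

dog = np.array([10.0, 8.0, 0.0, 5.0])   # e.g., pet, four legs, flies, barks
cat = np.array([9.0, 8.0, 0.0, 1.0])

print("cosine similarity:  ", 1 - cosine(dog, cat))
print("Pearson correlation:", 1 - correlation(dog, cat))
print("Euclidean distance: ", euclidean(dog, cat))
print("Chebyshev distance: ", chebyshev(dog, cat))
```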
Subject(s)
Algorithms, Language, Cluster Analysis, Humans
ABSTRACT
It is generally believed that concepts can be characterized by their properties (or features). When investigating concepts encoded in language, researchers often ask subjects to produce lists of properties that describe them (i.e., the Property Listing Task, PLT). These lists are accumulated to produce Conceptual Property Norms (CPNs). CPNs contain frequency distributions of properties for individual concepts. It is widely believed that these distributions represent the underlying semantic structure of those concepts. Here, instead of focusing on the underlying semantic structure, we aim at characterizing the PLT itself. An often disregarded aspect of the PLT is that individuals show intersubject variability (i.e., they produce only partially overlapping lists). In our study we use a mathematical analysis of this intersubject variability to guide our inquiry. To this end, we resort to a set of publicly available norms that contain information about the specific properties reported at the individual-subject level. Our results suggest that when an individual performs the PLT, he or she generates a list that mixes general and distinctive properties, with a non-linear tendency to produce more general than distinctive properties. Furthermore, the low-generality properties are precisely those that tend not to be repeated across lists, thereby accounting for part of the intersubject variability. Consequently, any manipulation that affects the mixture of general and distinctive properties in lists is bound to change intersubject variability. We discuss why these results are important for researchers using the PLT.
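As an illustration of the generality notion used above (with hypothetical data), a property's generality can be operationalized as the proportion of participants who list it; low-generality, i.e., distinctive, properties are then the ones that fail to repeat across lists.

```python
# Sketch: per-property generality from individual-level listing data.
from collections import Counter

lists = [
    {"is a pet", "barks", "has four legs"},
    {"is a pet", "is loyal"},
    {"is a pet", "barks", "chases cats"},
]

counts = Counter(p for properties in lists for p in properties)
generality = {p: c / len(lists) for p, c in counts.items()}
for prop, g in sorted(generality.items(), key=lambda kv: -kv[1]):
    print(f"{prop:15s} generality = {g:.2f}")  # low values = distinctive
```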