Search | VHL Regional Portal

1.

Universal and cultural factors shape body part vocabularies.

Tjuka, Annika; Forkel, Robert; List, Johann-Mattis.

Sci Rep ; 14(1): 10486, 2024 05 07.

Article in English | MEDLINE | ID: mdl-38714717

ABSTRACT

Every human has a body. Yet, languages differ in how they divide the body into parts to name them. While universal naming strategies exist, there is also variation in the vocabularies of body parts across languages. In this study, we investigate the similarities and differences in naming two separate body parts with one word, i.e., colexifications. We use a computational approach to create networks of body part vocabularies across languages. The analyses focus on body part networks in large language families, on perceptual features that lead to colexifications of body parts, and on a comparison of network structures in different semantic domains. Our results show that adjacent body parts are colexified frequently. However, preferences for perceptual features such as shape and function lead to variations in body part vocabularies. In addition, body part colexification networks are less varied across language families than networks in the semantic domains of emotion and colour. The study presents the first large-scale comparison of body part vocabularies in 1,028 language varieties and provides important insights into the variability of a universal human domain.

Subject(s)

Language, Semantics, Vocabulary, Humans, Human Body, Culture

2.

Artifact geochemistry demonstrates long-distance voyaging in the Polynesian Outliers.

Hermann, Aymeric; Gutiérrez, Pamela; Chauvel, Catherine; Maury, René; Liorzou, Céline; Willie, Edson; Phillip, Iarawai; Forkel, Robert; Rzymski, Christoph; Bedford, Stuart.

Sci Adv ; 9(16): eadf4487, 2023 Apr 21.

Article in English | MEDLINE | ID: mdl-37083531

ABSTRACT

Although the peopling of Remote Oceania is well-documented as a general process of eastward migrations from Island Southeast Asia and Near Oceania toward the archipelagos of Remote Oceania, the origin and the development of Polynesian societies in the Western Pacific (Polynesian Outliers), far away from the Polynesian triangle, remain unclear. Here, we present a large-scale geochemical sourcing study of stone artifacts excavated from archeological sites in central Vanuatu, the Solomon Islands, and the Caroline Islands and provide unambiguous evidence of multiple long-distance voyages, with exotic stone materials being transported up to 2500 kilometers from their source. Our results emphasize high mobility in the Western Pacific during the last millennium CE and offer insights on the scale and timing of contacts between the Polynesian Outliers, their neighbors in the Western Pacific, and societies of Western Polynesia.

3.

A global analysis of matches and mismatches between human genetic and linguistic histories.

Barbieri, Chiara; Blasi, Damián E; Arango-Isaza, Epifanía; Sotiropoulos, Alexandros G; Hammarström, Harald; Wichmann, Søren; Greenhill, Simon J; Gray, Russell D; Forkel, Robert; Bickel, Balthasar; Shimizu, Kentaro K.

Proc Natl Acad Sci U S A ; 119(47): e2122084119, 2022 11 22.

Article in English | MEDLINE | ID: mdl-36399547

ABSTRACT

Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo that speak languages of the same language family (i.e., that descend from the same ancestor language) are also genetically highly similar. However, we also identify nearly 20% mismatches in populations genetically close to linguistically unrelated groups. These mismatches, which occur within the time depth of known linguistic relatedness up to about 10,000 y, are scattered around the world, suggesting that they are a regular outcome in human history. Most mismatches result from populations shifting to the language of a neighboring population that is genetically different because of independent demographic histories. In line with the regularity of such shifts, we find that only half of the language families in GeLaTo are genetically more cohesive than expected under spatial autocorrelations. Moreover, the genetic and linguistic divergence times of population pairs match only rarely, with Indo-European standing out as the family with most matches in our sample. Together, our database and findings pave the way for systematically disentangling demographic and cultural history and for quantifying processes of shifts in language and social identities on a global scale.

Subject(s)

Genetic Variation, Linguistics, Humans, Language, Human Genetics

4.

Linking norms, ratings, and relations of words and concepts across multiple language varieties.

Tjuka, Annika; Forkel, Robert; List, Johann-Mattis.

Behav Res Methods ; 54(2): 864-884, 2022 04.

Article in English | MEDLINE | ID: mdl-34357536

ABSTRACT

Psychologists and linguists collect various data on word and concept properties. In psychology, scholars have accumulated norms and ratings for a large number of words in languages with many speakers. In linguistics, scholars have accumulated cross-linguistic information about the relations between words and concepts. Until now, however, there have been no efforts to combine information from the two fields, which would allow comparison of psychological and linguistic properties across different languages. The Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts (NoRaRe) is the first attempt to close this gap. Building on a reference catalog that offers standardization of concepts used in historical and typological language comparison, it integrates data from psychology and linguistics, collected from 98 data sets, covering 65 unique properties for 40 languages. The database is curated with the help of manual, automated, semi-automated workflows and uses a software API to control and access the data. The database is accessible via a web application, the software API, or using scripting languages. In this study, we present how the database is structured, how it can be extended, and how we control the quality of the data curation process. To illustrate its application, we present three case studies that test the validity of our approach, the accuracy of our workflows, and the integrative potential of the database. Due to regular version updates, the NoRaRe database has the potential to advance research in psychology and linguistics by offering researchers an integrated perspective on both fields.

Subject(s)

Language, Linguistics, Data Management, Databases, Factual, Humans, Reference Standards

5.

Curating and extending data for language comparison in Concepticon and NoRaRe.

Tjuka, Annika; Forkel, Robert; List, Johann-Mattis.

Open Res Eur ; 2: 141, 2022.

Article in English | MEDLINE | ID: mdl-37645322

ABSTRACT

Language comparison requires user-friendly tools that facilitate the standardization of linguistic data. We present two resources built on the basis of a standardized cross-linguistic format and show how the data is curated and extended. The first resource, the Concepticon, is a reference catalog for standardized concepts from linguistic research. While curating the Concepticon, we found that a variety of studies in distinct research fields collected information on word properties. However, until recently, no resource existed that contained these data to enable the comparison of the different word properties across languages. This gap was filled by the Database of Norms, Ratings, and Relations (NoRaRe), which is an extension of the Concepticon. Here, we present the major release of both resources - Concepticon Version 3.0 and NoRaRe Version 1.0 - which represents an important step in our data development. We show that extending and adapting the data curation workflow in Concepticon to NoRaRe is useful for the standardization of cross-linguistic datasets. In addition, combining datasets from different research fields enables studies grounded in language comparison. Concepticon and NoRaRe include lexical data for various languages, tools for test-driven data curation, and the possibility for data reuse. The first major release of NoRaRe is also accompanied by a new web application that allows convenient access to the data.

6.

Automated identification of borrowings in multilingual wordlists.

List, Johann-Mattis; Forkel, Robert.

Open Res Eur ; 1: 79, 2021.

Article in English | MEDLINE | ID: mdl-37645101

ABSTRACT

Although lexical borrowing is an important aspect of language evolution, there have been few attempts to automate the identification of borrowings in lexical datasets. Moreover, none of the solutions which have been proposed so far identify borrowings across multiple languages. This study proposes a new method for the task and tests it on a newly compiled large comparative dataset of 48 South-East Asian languages from Southern China. The method yields very promising results, while it is conceptually straightforward and easy to apply. This makes the approach a perfect candidate for computer-assisted exploratory studies on lexical borrowing in contact areas.

7.

Pofatu, a curated and open-access database for geochemical sourcing of archaeological materials.

Hermann, Aymeric; Forkel, Robert; McAlister, Andrew; Cruickshank, Arden; Golitko, Mark; Kneebone, Brendan; McCoy, Mark; Reepmeyer, Christian; Sheppard, Peter; Sinton, John; Weisler, Marshall.

Sci Data ; 7(1): 141, 2020 05 11.

Article in English | MEDLINE | ID: mdl-32393758

ABSTRACT

Compositional analyses have long been used to determine the geological sources of artefacts. Geochemical "fingerprinting" of artefacts and sources is the most effective way to reconstruct strategies of raw material and artefact procurement, exchange or interaction systems, and mobility patterns during prehistory. The efficacy and popularity of geochemical sourcing has led to many projects using various analytical techniques to produce independent datasets. In order to facilitate access to this growing body of data and to promote comparability and reproducibility in provenance studies, we designed Pofatu, the first online and open-access database to present geochemical compositions and contextual information for archaeological sources and artefacts in a form that can be readily accessed by the scientific community. This relational database currently contains 7759 individual samples from archaeological sites and geological sources across the Pacific Islands. Each sample is comprehensively documented and includes elemental and isotopic compositions, detailed archaeological provenance, and supporting analytical metadata, such as sampling processes, analytical procedures, and quality control.

8.

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies.

Rzymski, Christoph; Tresoldi, Tiago; Greenhill, Simon J; Wu, Mei-Shin; Schweikhard, Nathanael E; Koptjevskaja-Tamm, Maria; Gast, Volker; Bodt, Timotheus A; Hantgan, Abbie; Kaiping, Gereon A; Chang, Sophie; Lai, Yunfan; Morozova, Natalia; Arjava, Heini; Hübler, Nataliia; Koile, Ezequiel; Pepper, Steve; Proos, Mariann; Van Epps, Briana; Blanco, Ingrid; Hundt, Carolin; Monakhov, Sergei; Pianykh, Kristina; Ramesh, Sallona; Gray, Russell D; Forkel, Robert; List, Johann-Mattis.

Sci Data ; 7(1): 13, 2020 01 13.

Article in English | MEDLINE | ID: mdl-31932593

ABSTRACT

Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.

Subject(s)

Databases, Factual, Linguistics, Humans, Language

9.

Emotion semantics show both cultural variation and universal structure.

Jackson, Joshua Conrad; Watts, Joseph; Henry, Teague R; List, Johann-Mattis; Forkel, Robert; Mucha, Peter J; Greenhill, Simon J; Gray, Russell D; Lindquist, Kristen A.

Science ; 366(6472): 1517-1522, 2019 12 20.

Article in English | MEDLINE | ID: mdl-31857485

ABSTRACT

Many human languages have words for emotions such as "anger" and "fear," yet it is not clear whether these emotions have similar meanings across languages, or why their meanings might vary. We estimate emotion semantics across a sample of 2474 spoken languages using "colexification," a phenomenon in which languages name semantically related concepts with the same word. Analyses show significant variation in networks of emotion concept colexification, which is predicted by the geographic proximity of language families. We also find evidence of universal structure in emotion colexification networks, with all families differentiating emotions primarily on the basis of hedonic valence and physiological activation. Our findings contribute to debates about universality and diversity in how humans understand and experience emotion.

Subject(s)

Anger, Cross-Cultural Comparison, Fear, Language, Humans, Semantics

10.

Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics.

Forkel, Robert; List, Johann-Mattis; Greenhill, Simon J; Rzymski, Christoph; Bank, Sebastian; Cysouw, Michael; Hammarström, Harald; Haspelmath, Martin; Kaiping, Gereon A; Gray, Russell D.

Sci Data ; 5: 180205, 2018 10 16.

Article in English | MEDLINE | ID: mdl-30325347

ABSTRACT

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.

11.

BEASTling: A software tool for linguistic phylogenetics using BEAST 2.

Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D.

PLoS One ; 12(8): e0180908, 2017.

Article in English | MEDLINE | ID: mdl-28796784

ABSTRACT

We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.

Subject(s)

Language, Linguistics/methods, Software, Bayes Theorem, Europe, Expert Systems, Humans, Programming Languages
