Search | VHL Regional Portal

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies.

Rzymski, Christoph; Tresoldi, Tiago; Greenhill, Simon J; Wu, Mei-Shin; Schweikhard, Nathanael E; Koptjevskaja-Tamm, Maria; Gast, Volker; Bodt, Timotheus A; Hantgan, Abbie; Kaiping, Gereon A; Chang, Sophie; Lai, Yunfan; Morozova, Natalia; Arjava, Heini; Hübler, Nataliia; Koile, Ezequiel; Pepper, Steve; Proos, Mariann; Van Epps, Briana; Blanco, Ingrid; Hundt, Carolin; Monakhov, Sergei; Pianykh, Kristina; Ramesh, Sallona; Gray, Russell D; Forkel, Robert; List, Johann-Mattis.

Sci Data ; 7(1): 13, 2020 01 13.

Article in English | MEDLINE | ID: mdl-31932593

ABSTRACT

Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigor for preparing and curating datasets. Here we present the Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.

Subject(s)

Databases, Factual; Linguistics; Humans; Language

LexiRumah: An online lexical database of the Lesser Sunda Islands.

Kaiping, Gereon A; Klamer, Marian.

PLoS One ; 13(10): e0205250, 2018.

Article in English | MEDLINE | ID: mdl-30332446

ABSTRACT

The Lesser Sunda Islands in eastern Indonesia cover a longitudinal distance of some 600 kilometres. They are the westernmost place where languages of the Austronesian family come into contact with a family of Papuan languages and constitute an area of high linguistic diversity. Despite their diversity, the Lesser Sundas are little studied and for most of the region, written historical records, as well as archaeological and ethnographic data, are lacking. In such circumstances the study of relationships between languages through their lexicon is a unique tool for making inferences about human (pre-)history and tracing population movements. However, the lack of a collective body of lexical data has severely limited our understanding of the history of the languages and peoples in the Lesser Sundas. The LexiRumah database fills this gap by assembling lexicons of Lesser Sunda languages from published and unpublished sources, and making those lexicons available online in a consistent format. This database makes it possible for researchers to explore the linguistic data collated from different primary sources, to formulate hypotheses on how the languages of the two families might be internally related and to compare competing hypotheses about subgroupings and language contact in the region. In this article, we present observations from aggregating lexical data from sources of different types and quality, including fieldwork, and generalize our lessons learned towards practical guidelines for creating a consistent database of comparable lexical items, derived from the design and development of LexiRumah. Databases like this are instrumental in developing theories of language evolution and change in understudied regions where small-scale, pre-industrial, pre-literate societies are the majority. It is therefore vital to follow reliable design choices when creating such databases, as described in this paper.

Subject(s)

Databases, Factual; Human Migration; Language; Linguistics/methods; Research Design; Asian People; Guidelines as Topic; Humans; Indonesia; Linguistics/standards

Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics.

Forkel, Robert; List, Johann-Mattis; Greenhill, Simon J; Rzymski, Christoph; Bank, Sebastian; Cysouw, Michael; Hammarström, Harald; Haspelmath, Martin; Kaiping, Gereon A; Gray, Russell D.

Sci Data ; 5: 180205, 2018 10 16.

Article in English | MEDLINE | ID: mdl-30325347

ABSTRACT

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.

BEASTling: A software tool for linguistic phylogenetics using BEAST 2.

Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D.

PLoS One ; 12(8): e0180908, 2017.

Article in English | MEDLINE | ID: mdl-28796784

ABSTRACT

We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.

Subject(s)

Language; Linguistics/methods; Software; Bayes Theorem; Europe; Expert Systems; Humans; Programming Languages
