Search | VHL Regional Portal

Translating Akkadian to English with neural machine translation.

Gutherz, Gai; Gordin, Shai; Sáenz, Luis; Levy, Omer; Berant, Jonathan.

PNAS Nexus ; 2(5): pgad096, 2023 May.

Article in English | MEDLINE | ID: mdl-37143863

ABSTRACT

Cuneiform is one of the earliest writing systems in recorded human history (ca. 3,400 BCE-75 CE). Hundreds of thousands of such texts were found over the last two centuries, most of which are written in Sumerian and Akkadian. We demonstrate the strong potential of natural language processing (NLP) methods, such as convolutional neural networks (CNN), to assist scholars and interested laypeople alike by automatically translating Akkadian from cuneiform Unicode glyphs directly to English (C2E) and from transliteration to English (T2E). We show that high-quality translations can be obtained when translating directly from cuneiform to English: we obtain Best Bilingual Evaluation Understudy 4 (BLEU4) scores of 36.52 and 37.47 for C2E and T2E, respectively. For C2E, our model outperforms the translation memory baseline by 9.43, and for T2E the margin is even larger, at 13.96. The model achieves its best results on short and medium-length sentences (ca. 118 characters or fewer). As the number of digitized texts grows, the model can be improved by further training as part of a human-in-the-loop system in which the results are corrected.
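The baseline scores themselves are not stated in the abstract, but they can be worked out from the reported figures. The short Python sketch below does that arithmetic, assuming that the reported differences (9.43 and 13.96) are absolute BLEU4 points, i.e. model score minus baseline score. The dictionary layout and the function name `implied_baseline` are illustrative only and are not taken from the paper or its code.

```python
# BLEU4 scores and margins over the translation memory baseline,
# as reported in the abstract.
reported = {
    "C2E": {"model_bleu4": 36.52, "margin": 9.43},
    "T2E": {"model_bleu4": 37.47, "margin": 13.96},
}


def implied_baseline(model_bleu4: float, margin: float) -> float:
    """Return the baseline score implied by a model score and its margin.

    Assumption (not stated in the abstract): margin = model - baseline,
    measured in absolute BLEU4 points.
    """
    return round(model_bleu4 - margin, 2)


baselines = {task: implied_baseline(**vals) for task, vals in reported.items()}

for task, score in baselines.items():
    print(f"{task}: implied baseline BLEU4 = {score:.2f}")
```

Under that assumption, the translation memory baseline would score about 27.09 for C2E and 23.51 for T2E.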

Reading Akkadian cuneiform using natural language processing.

Gordin, Shai; Gutherz, Gai; Elazary, Ariel; Romach, Avital; Jiménez, Enrique; Berant, Jonathan; Cohen, Yoram.

PLoS One ; 15(10): e0240511, 2020.

Article in English | MEDLINE | ID: mdl-33112872

ABSTRACT

In this paper we present a new method for automatic transliteration and segmentation of Unicode cuneiform glyphs using Natural Language Processing (NLP) techniques. Cuneiform is one of the earliest known writing systems in the world, documenting millennia of human civilizations in the ancient Near East. Hundreds of thousands of cuneiform texts were found in the nineteenth and twentieth centuries CE, most of which are written in Akkadian. However, tens of thousands of texts are still to be published. Using models based on machine learning algorithms such as recurrent neural networks (RNN), we reach an accuracy of up to 97% in automatically transliterating and segmenting standard Unicode cuneiform glyphs into words. Our method and results are therefore a major step towards a human-machine interface for creating digitized editions. Our code, Akkademia, is made publicly available for use via a web application, a Python package, and a GitHub repository.

Subject(s)

Language/history, Reading, History, Ancient, Humans, Middle East, Natural Language Processing
