1.
Cogn Sci; 44(12): e12921, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33314282

ABSTRACT

Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. While surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity: do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and one transformed). We find that programmers do prefer more predictable variants, and that stronger language models such as the Transformer align more often and more consistently with these preferences.


Subject(s)
Comprehension, Language, Probability, Programming Languages, Cognition, Humans, Reading, Writing
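The corpus study above scores code expressions by language-model surprisal and compares developer-written code against meaning-preserving alternatives. As a much simpler stand-in for the models used in the paper, the measurement can be sketched with a bigram model over code tokens with add-one smoothing; the tiny corpus, the tokenizer, and the example expressions here are all hypothetical illustrations, not the study's actual setup.

```python
import math
from collections import Counter

def tokenize(expr):
    # naive whitespace tokenizer; a real study would use a language-aware lexer
    return expr.split()

def train_bigram(corpus):
    # unigram and bigram counts over a corpus of tokenized expressions
    unigrams, bigrams = Counter(), Counter()
    for expr in corpus:
        toks = ["<s>"] + tokenize(expr)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def surprisal(expr, unigrams, bigrams):
    # average per-token surprisal in bits: -log2 P(t_i | t_{i-1}),
    # with add-one (Laplace) smoothing over the observed vocabulary
    vocab = len(unigrams)
    toks = ["<s>"] + tokenize(expr)
    bits = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        bits -= math.log2(p)
    return bits / (len(toks) - 1)

# toy "developer-written" corpus of loop guards (hypothetical)
corpus = ["i < n", "i < len", "j < n", "x < n", "count < n"]
u, b = train_bigram(corpus)

# two semantically equivalent variants of the same guard:
# the developer-style form scores lower surprisal under this toy model
print(surprisal("i < n", u, b))  # developer-written form
print(surprisal("n > i", u, b))  # meaning-preserving transformation
```

Under this toy model, the transformed variant `n > i` receives a higher average surprisal than the conventional `i < n`, which is the kind of asymmetry the corpus study measures at scale with much stronger models.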
2.
PLoS One; 14(6): e0215059, 2019.
Article in English | MEDLINE | ID: mdl-31199802

ABSTRACT

Successful open source software (OSS) projects comprise freely observable, task-oriented social networks with hundreds or thousands of participants and large amounts of (textual and technical) discussion. The sheer volume of interactions and participants makes it challenging for participants to find relevant tasks, discussions, and people. Tagging (e.g., @AmySmith) is a socio-technical practice that enables more focused discussion. By tagging important and relevant people, discussions can be advanced more effectively. However, for all but a few insiders, it can be difficult to identify important and/or relevant people. In this paper, we study tagging in OSS projects from a socio-linguistic perspective. First, we argue that textual content per se reveals a great deal about the status and identity of who is speaking and who is being addressed. Next, we suggest that this phenomenon can be usefully modeled using modern deep-learning methods. Finally, we illustrate the value of these approaches with tools that could assist participants in finding the important and relevant people for a discussion.


Subject(s)
Databases, Factual, Language, Deep Learning, Humans, Linguistics, Software, User-Computer Interface
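The tools described above suggest whom to tag by matching a discussion's text against what candidate participants have written before. As a deliberately simple stand-in for the paper's deep-learning models, the idea can be sketched with bag-of-words profiles and cosine similarity; the people, their message histories, and the query are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    # bag-of-words vector as a Counter (a crude stand-in for learned text representations)
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two sparse word-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical per-person profiles built from each person's past messages
profiles = {
    "AmySmith": bow("build failure ci pipeline release tag version"),
    "BobLee": bow("parser grammar tokenizer syntax error recovery"),
}

def rank_people(discussion):
    # rank candidate taggees by similarity between the discussion and their history
    d = bow(discussion)
    return sorted(profiles, key=lambda p: cosine(d, profiles[p]), reverse=True)

print(rank_people("the ci build is failing after the release"))
```

A discussion about a failing CI build ranks the hypothetical release-focused contributor first, illustrating how textual content alone can point toward relevant people; the paper's actual approach replaces these hand-built profiles with representations learned by deep networks.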