Results 1 - 6 of 6
1.
Front Artif Intell ; 5: 903077, 2022.
Article in English | MEDLINE | ID: mdl-35937141

ABSTRACT

Automatic item generation (AIG) has the potential to greatly expand the number of items available for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems), and it depends on highly skilled content experts to create each model. In this paper, we describe the interactive reading task, a transformer-based deep language modeling approach for creating reading comprehension assessments. This approach allows a fully automated process for creating source passages together with a wide range of comprehension questions about them. The format of the questions (e.g., selected-response questions) allows automatic scoring of responses with high fidelity. We present the results of a large-scale pilot of the interactive reading task, with hundreds of passages and thousands of questions, administered as part of the practice test of the Duolingo English Test. Human review of the materials and psychometric analyses of test-taker results demonstrate the feasibility of this approach for the automatic creation of complex educational assessments.
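A minimal sketch of the generation idea in citation 1: use a generative language model to draft a short passage, then prompt for a selected-response comprehension question about it. The model name ("gpt2") and prompts below are illustrative stand-ins, not the system described in the paper.

```python
# Hedged sketch only: any causal LM can be substituted for "gpt2"; the paper's
# actual passage/question generation pipeline is not reproduced here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def generate_passage(topic: str, max_new_tokens: int = 180) -> str:
    prompt = f"Write a short informational passage about {topic}.\n\n"
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return out[0]["generated_text"][len(prompt):].strip()

def generate_question(passage: str, max_new_tokens: int = 80) -> str:
    prompt = (
        f"{passage}\n\n"
        "Write one multiple-choice comprehension question about the passage, "
        "with options A-D and the correct answer marked.\n\n"
    )
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return out[0]["generated_text"][len(prompt):].strip()

passage = generate_passage("the water cycle")
print(passage)
print("---")
print(generate_question(passage))
```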

2.
Appl Psychol Meas ; 41(7): 495-511, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29881102

ABSTRACT

Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have over the number of items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining the issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback about response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive), either with or without immediate feedback about the correctness of their responses. Results showed little evidence of test type effects on performance. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item test, but no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. These effects were not related to ability level, as measured by state assessment achievement levels. The possibility that test experiences in adaptive tests may not, in practice, differ significantly from those in fixed-item tests is raised and discussed to explain the results of this and previous studies.
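A hedged sketch of how the adaptive and "easier adaptive" conditions in citation 2 can differ in item selection: the adaptive condition picks the most informative item at the current ability estimate, while an easier variant targets items below that estimate so examinees answer more items correctly. The 2PL model, the offset mechanism, and the item parameters below are assumptions for illustration, not the study's implementation.

```python
import numpy as np

def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def next_item(theta_hat, items, administered, easiness_offset=0.0):
    """Pick the unused item most informative at theta_hat - easiness_offset."""
    target = theta_hat - easiness_offset
    best, best_info = None, -np.inf
    for idx, (a, b) in enumerate(items):
        if idx in administered:
            continue
        info = fisher_information(target, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

items = [(1.2, -1.0), (0.9, 0.0), (1.5, 0.5), (1.1, 1.2)]  # made-up (a, b) pairs
print(next_item(0.3, items, set()))                        # adaptive: hardest informative item
print(next_item(0.3, items, set(), easiness_offset=1.0))   # easier adaptive: targets easier items
```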

3.
Educ Psychol Meas ; 76(5): 787-802, 2016 Oct.
Article in English | MEDLINE | ID: mdl-29795888

ABSTRACT

There are many reasons to believe that open-ended (OE) and multiple-choice (MC) items place different cognitive demands on students. However, empirical evidence supporting this view is lacking. In this study, we investigated the reactions of test takers to an interactive assessment with immediate feedback and answer-revision opportunities for the two item types. Eighth-grade students solved mathematics problems, both MC and OE, with standard instructions and feedback-and-revision opportunities. An analysis of scores based on revised answers in feedback mode revealed gains in measurement precision for OE items but not for MC items. These results are explained through the concept of effortful engagement: the OE format encourages more mindful engagement with the items in interactive mode. This interpretation is supported by analyses of response times and test takers' reports.
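A sketch of the kind of precision comparison implied in citation 3, under assumed data: item-score matrices (examinees x items) for initial and revised answers, split by item format, with Cronbach's alpha standing in for "measurement precision." The paper's actual scoring and precision analyses may differ.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency reliability of an examinees-by-items score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Placeholder matrices; real data would hold initial and post-revision item scores.
rng = np.random.default_rng(0)
oe_initial = rng.integers(0, 2, size=(200, 10)).astype(float)
oe_revised = oe_initial.copy()  # revised OE scores would replace this copy

print("OE alpha, initial:", round(cronbach_alpha(oe_initial), 3))
print("OE alpha, revised:", round(cronbach_alpha(oe_revised), 3))
```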

4.
Educ Psychol Meas ; 76(6): 1045-1058, 2016 Dec.
Article in English | MEDLINE | ID: mdl-29795900

ABSTRACT

Performance of students in low-stakes testing situations has been a concern and a focus of recent research. However, researchers who have examined the effect of stakes on performance have not been able to compare low-stakes performance with truly high-stakes performance of the same students. Results of such a comparison are reported in this article. GRE test takers volunteered to take an additional low-stakes test of either verbal or quantitative reasoning, as part of a research study, immediately following their operational high-stakes test. Analyses of performance under the high- and low-stakes conditions revealed that the level of effort in the low-stakes situation (as measured by the amount of time on task) strongly predicted the stakes effect on performance (the difference between test scores in the low- and high-stakes situations). Moreover, the stakes effect virtually disappeared for participants who spent at least one-third of the allotted time in the low-stakes situation. For this group of test takers (more than 80% of the total sample), the correlations between the low- and high-stakes scores approached the upper bound possible given the reliability of the test.
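The "upper bound possible given the reliability of the test" in citation 4 refers to the classical attenuation bound: an observed correlation between two fallible scores cannot exceed the square root of the product of their reliabilities. A small worked example with illustrative numbers (not the study's values):

```python
import math

def max_correlation(rel_x: float, rel_y: float) -> float:
    """Upper bound on the observed correlation between two scores with the given reliabilities."""
    return math.sqrt(rel_x * rel_y)

# e.g., two administrations of a test whose reliability is 0.90 each
print(max_correlation(0.90, 0.90))  # 0.90
```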

5.
Appl Psychol Meas ; 39(4): 303-313, 2015 Jun.
Article in English | MEDLINE | ID: mdl-29881010

ABSTRACT

From their earliest origins, automated essay scoring systems have striven to emulate human essay scores and have viewed them as their ultimate validity criterion. Consequently, the importance (or weight) and even the identity of the computed essay features in the composite machine score were determined by statistical techniques that sought to optimally predict human scores from essay features. However, machine evaluation of essays is fundamentally different from human evaluation and therefore is not likely to measure the same set of writing skills. As a result, the feature weights of human-prediction machine scores (reflecting the features' importance in the composite scores) are bound to reflect statistical artifacts. This article suggests alternative feature weighting schemes based on the premise of maximizing the reliability and internal consistency of the composite score. In the context of a large-scale writing assessment, these alternative weighting schemes are shown to differ significantly from human-prediction weights and to give rise to comparable or even superior reliability and validity coefficients.
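A sketch contrasting the two kinds of weighting discussed in citation 5, on assumed data: (a) "human-prediction" weights obtained by regressing human scores on essay features, and (b) an internal-consistency-oriented alternative, here taken as the first principal component of the standardized features. The paper's actual weighting schemes may differ; this only illustrates the distinction.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(size=(500, 5))  # essays x computed features (placeholder data)
human = features @ np.array([0.5, 0.1, 0.3, 0.05, 0.05]) + rng.normal(scale=0.5, size=500)

# (a) human-prediction weights: ordinary least squares regression on human scores
X = np.column_stack([np.ones(len(features)), features])
beta = np.linalg.lstsq(X, human, rcond=None)[0]
prediction_weights = beta[1:]

# (b) internal-consistency weights: first principal component of the feature correlations
z = (features - features.mean(0)) / features.std(0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
consistency_weights = eigvecs[:, -1]  # loading vector for the largest eigenvalue

print("prediction weights: ", np.round(prediction_weights, 2))
print("consistency weights:", np.round(consistency_weights, 2))
```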

6.
Psychol Sci ; 24(7): 1151-6, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23630221

ABSTRACT

Although "hot hands" in basketball are illusory, the belief in them is so robust that it has not only sparked many debates but may also affect the behavior of players and coaches. On the basis of an entire National Basketball Association season's worth of data, the research reported here shows that even a single successful shot suffices to increase a player's likelihood of taking the team's next shot, increase the average distance from which that next shot is taken, decrease the probability that the next shot is successful, and decrease the probability that the coach will replace the player.


Subject(s)
Athletes/psychology, Athletic Performance/psychology, Basketball/psychology, Decision Making, Perception, Probability, Humans, Judgment, Male
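A hedged sketch of the kind of conditional comparison described in citation 6, on a hypothetical play-by-play table (column names are assumptions, not the paper's data): for each player, compare shot behavior after a made shot versus after a miss.

```python
import pandas as pd

# Toy play-by-play records; a real analysis would use a full season of shots.
shots = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "made":   [1, 1, 0, 0, 1, 0],
    "distance_ft": [12, 22, 25, 8, 15, 24],
})

# Outcome of each player's previous shot (NaN for the first shot of each player)
shots["prev_made"] = shots.groupby("player")["made"].shift()

after = shots.dropna(subset=["prev_made"]).groupby("prev_made").agg(
    p_make=("made", "mean"),
    avg_distance=("distance_ft", "mean"),
)
print(after)  # compare the row after a miss (0.0) with the row after a make (1.0)
```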