David vs. Goliath: comparing conventional machine learning and a large language model for assessing students' concept use in a physics problem.
Kieser, Fabian; Tschisgale, Paul; Rauh, Sophia; Bai, Xiaoyu; Maus, Holger; Petersen, Stefan; Stede, Manfred; Neumann, Knut; Wulff, Peter.
Affiliation
  • Kieser F; Physics and Physics Education Research, Heidelberg University of Education, Heidelberg, Germany.
  • Tschisgale P; Department of Physics Education, Leibniz Institute for Science and Mathematics Education, Kiel, Germany.
  • Rauh S; Applied Computational Linguistics, University of Potsdam, Potsdam, Germany.
  • Bai X; Applied Computational Linguistics, University of Potsdam, Potsdam, Germany.
  • Maus H; Department of Physics Education, Leibniz Institute for Science and Mathematics Education, Kiel, Germany.
  • Petersen S; Department of Physics Education, Leibniz Institute for Science and Mathematics Education, Kiel, Germany.
  • Stede M; Applied Computational Linguistics, University of Potsdam, Potsdam, Germany.
  • Neumann K; Department of Physics Education, Leibniz Institute for Science and Mathematics Education, Kiel, Germany.
  • Wulff P; Physics and Physics Education Research, Heidelberg University of Education, Heidelberg, Germany.
Front Artif Intell ; 7: 1408817, 2024.
Article in En | MEDLINE | ID: mdl-39359648
ABSTRACT
Large language models have been shown to excel at many different tasks across disciplines and research sites. They provide novel opportunities to enhance educational research and instruction, for example in assessment. However, these models have also been shown to have fundamental limitations, relating, among others, to hallucinated knowledge, the explainability of model decisions, and resource expenditure. Conventional machine learning algorithms may therefore be more suitable for specific research problems because they give researchers greater control over their research. Yet the circumstances under which either conventional machine learning or large language models are the preferable choice are not well understood. This study investigates the extent to which conventional machine learning algorithms or a recently advanced large language model performs better in assessing students' concept use in a physics problem-solving task. We found that the conventional machine learning algorithms, used in combination, outperformed the large language model. We then analyzed model decisions through closer examination of the models' classifications. We conclude that in specific contexts, conventional machine learning can supplement large language models, especially when labeled data are available.
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Front Artif Intell Publication year: 2024 Document type: Article Affiliation country: Germany Publication country: Switzerland