Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items

Bengs, Daniel; Brefeld, Ulf; Kroehne, Ulf; Zehner, Fabian

doi:10.48548/pubdata-3371

Journal Article; Parallel publication; Published version

Downloads

Bengs_et_al_Joint_Item_Response_Models_for_Manual_and_Automatic_Scores.pdf (4 MB)

Open Access

Chronological data

Date of first publication: 2025-06-16

Date of publication in PubData: 2026-04-20

Language of the resource

English

Related external resources

Variant form of

DOI: 10.1017/psy.2025.10018
Bengs, D., Brefeld, U., Kroehne, U., & Zehner, F. (2025). Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items. Psychometrika, 90(4), 1346–1367.

Published in

ISSN: 1860-0980
Psychometrika

Abstract

Test items using open-ended response formats can increase an instrument’s construct validity. However, traditionally, their application in educational testing requires human coders to score the responses. Manual scoring not only increases operational costs but also prohibits the use of evidence from open-ended items to inform routing decisions in adaptive designs. Using machine learning and natural language processing, automatic scoring provides classifiers that can instantly assign scores to text responses. Although optimized for agreement with manual scores, automatic scoring is not perfectly accurate and introduces an additional source of error into the response process, leading to a misspecification of the measurement model used with the manual score. We propose two joint models for manual and automatic scores of automatically scored open-ended items. Our models extend a given model from Item Response Theory for the manual scores by a component for the automatic scores, accounting for classification errors. The models were evaluated using data from the Programme for International Student Assessment (2012) and simulated data, demonstrating their capacity to mitigate the impact of classification errors on ability estimation compared to a baseline that disregards classification errors.
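
The abstract does not spell out the two proposed models, so the following is only an illustrative sketch of the general idea, not the authors' specification. It assumes a dichotomous Rasch model for the manual score and a classifier whose errors are described by a fixed sensitivity and specificity. All function and parameter names (`p_manual_correct`, `sens`, `spec`, and so on) are hypothetical.

```python
import math


def p_manual_correct(theta, b):
    """Rasch model (assumed here): probability that the manual score is 1."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_auto_correct(theta, b, sens, spec):
    """Marginal probability that the automatic score is 1.

    sens = P(auto = 1 | manual = 1), spec = P(auto = 0 | manual = 0).
    Both are assumed constant across persons in this sketch.
    """
    p = p_manual_correct(theta, b)
    return sens * p + (1.0 - spec) * (1.0 - p)


def joint_prob(manual, auto, theta, b, sens, spec):
    """Joint probability of one (manual, auto) score pair for a single item."""
    p = p_manual_correct(theta, b)
    p_m = p if manual == 1 else 1.0 - p
    # Probability that the classifier outputs 1, given the manual score.
    p_auto_is_1 = sens if manual == 1 else 1.0 - spec
    p_a = p_auto_is_1 if auto == 1 else 1.0 - p_auto_is_1
    return p_m * p_a
```

Under these assumptions, a classifier with `sens` or `spec` below 1 flattens the item response curve of the automatic score relative to the manual one. A baseline that treats automatic scores as if they were manual scores ignores this distortion, which is the misspecification the abstract refers to.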

Keywords

Automatic Scoring; Item Response Modeling; Large-scale Assessment

Leuphana Institution

Fakultät Management und Technologie

More information

Creation Context

Research

Collections

Literaturpublikationen
