Human–GenAI Score Alignment in Rubric-Constrained Essay Assessment: Procedural Convergence Without Pedagogical Equivalence

Muhamad Akda Fathul Barri; Rizki Hikmawan

doi:10.20961/joive.v9i2.3338

Authors

Muhamad Akda Fathul Barri, Universitas Pendidikan Indonesia, https://orcid.org/0009-0009-2505-2089
Rizki Hikmawan, Universitas Pendidikan Indonesia

Keywords:

Generative Artificial Intelligence, Rubric-Based Assessment, Formative Assessment, Human–AI Alignment, Secondary Education

Abstract

The increasing availability of Generative Artificial Intelligence (GenAI) systems has raised critical questions regarding their role in educational assessment, particularly in evaluating open-ended student responses that traditionally rely on professional teacher judgment. This study investigates the extent to which GenAI-generated scores align with teacher assessments when both are guided by an explicit and standardized rubric framework. Using a parallel scoring design, written responses from junior secondary school students were independently evaluated by an Informatics teacher and a GenAI system (ChatGPT) based on an identical rubric encompassing progressively increasing cognitive demands from lower-order to higher-order thinking skills. To examine scoring alignment, an Intraclass Correlation Coefficient (ICC) analysis with a two-way mixed-effects model and absolute agreement approach was employed. The results indicate a meaningful level of score alignment between teacher and GenAI assessments, suggesting that GenAI can apply rubric-based evaluative criteria in a procedurally consistent manner. However, qualitative analysis of written feedback reveals substantive differences in pedagogical depth. While GenAI feedback demonstrates high structural consistency and transparent rubric justification, teacher feedback exhibits greater contextual sensitivity, incorporating instructional intent, student misconceptions, and classroom dynamics. These findings suggest that GenAI systems hold potential as assessment support tools capable of enhancing scoring consistency and efficiency in formative assessment contexts. Nevertheless, score alignment should not be interpreted as pedagogical equivalence. The study concludes that GenAI is best positioned as an augmentative decision-support system operating under teacher supervision, rather than as an autonomous assessor. 
This research contributes empirical evidence to ongoing discussions on Human–AI score alignment and highlights the critical role of rubric design in mediating responsible GenAI integration within educational assessment practices.
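
As an illustration of the agreement statistic named in the abstract, the following is a minimal sketch of a single-measures, absolute-agreement ICC computed from two-way ANOVA mean squares. The abstract does not state whether single or average measures were reported, so the single-measures form is an assumption here. The function name `icc_a1` and the example scores are hypothetical and are not taken from the study.

```python
def icc_a1(scores):
    """Single-measures, absolute-agreement ICC for a two-way layout.

    scores: list of rows, one row per essay, one column per rater
    (for example, column 0 = teacher, column 1 = GenAI).
    """
    n = len(scores)       # number of essays (subjects)
    k = len(scores[0])    # number of raters

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for essays, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA table.
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Rater variance (msc) stays in the denominator, so a systematic
    # offset between raters lowers the coefficient.
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Hypothetical scores: the second rater is always one point higher.
example = [[1, 2], [2, 3], [3, 4]]
print(round(icc_a1(example), 3))  # 0.667
```

In this made-up example the two raters rank the essays identically, yet the coefficient is below 1 because absolute agreement penalizes the constant one-point offset; a consistency-type ICC would not.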

Journal of Informatics and Vocational Education