When rater reliability is not enough: Teacher observation systems and a case for the G-study


Citation:

Hill HC, Charalambous CY, Kraft MA. When rater reliability is not enough: Teacher observation systems and a case for the G-study. Educational Researcher. 2012;41(2):56-64.

Abstract:

In recent years, interest has grown in using classroom observation for several ends, including teacher development, teacher evaluation, and impact evaluations of classroom-based interventions. While educational practitioners and researchers have developed numerous observational instruments for these purposes, many fail to specify important criteria regarding their use. In this paper, we argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments but also scoring designs capable of producing reliable, cost-efficient scores, along with processes for rater recruitment, training, and certification. To illustrate how such a system might be developed and improved, we provide an empirical example that applies Generalizability Theory to data from a mathematics observational instrument.
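Generalizability Theory splits observed score variance into components, such as true differences among teachers, rater severity, and residual error. A follow-up decision study (D-study) then uses those components to project how reliable scores would be under alternative scoring designs, for example with more raters per teacher. The sketch below illustrates only the simplest case. It assumes a fully crossed teacher × rater design with one score per cell and uses the standard two-way random-effects ANOVA estimators. The function names `g_study` and `d_study` and the toy scores are hypothetical and invented for illustration. The design is not a reconstruction of the paper's analysis, which may involve additional facets such as lessons.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    teacher: float   # sigma^2_p: true-score differences among teachers
    rater: float     # sigma^2_r: rater severity/leniency
    residual: float  # sigma^2_pr,e: teacher-by-rater interaction plus error


def g_study(scores: list[list[float]]) -> VarianceComponents:
    """Estimate variance components for a fully crossed teacher x rater design.

    scores[i][j] is the score rater j gave teacher i (one score per cell).
    Hypothetical helper; assumes a complete, balanced matrix.
    """
    n_p = len(scores)
    n_r = len(scores[0]) if scores else 0
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete matrix with >= 2 teachers and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    teacher_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    # Sums of squares for the two-way ANOVA without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in teacher_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-squares estimators; negative estimates are truncated at zero.
    return VarianceComponents(
        teacher=max((ms_p - ms_res) / n_r, 0.0),
        rater=max((ms_r - ms_res) / n_p, 0.0),
        residual=max(ms_res, 0.0),
    )


def d_study(vc: VarianceComponents, n_raters: int) -> tuple[float, float]:
    """Project reliability when each teacher's score averages n_raters ratings.

    Returns (generalizability coefficient for relative decisions,
             dependability coefficient Phi for absolute decisions).
    """
    rel_error = vc.residual / n_raters               # only interaction/error affects rankings
    abs_error = (vc.rater + vc.residual) / n_raters  # rater severity also affects absolute scores
    g_coef = vc.teacher / (vc.teacher + rel_error)
    phi = vc.teacher / (vc.teacher + abs_error)
    return g_coef, phi


# Invented toy scores: 3 teachers (rows) rated by the same 2 raters (columns).
toy_scores = [[1, 3], [5, 5], [6, 10]]
components = g_study(toy_scores)
print(components)
for k in (1, 2, 4):
    g, phi = d_study(components, k)
    print(f"{k} rater(s): G = {g:.3f}, Phi = {phi:.3f}")
```

Projecting both coefficients shows why rater agreement alone may not suffice. Rater severity (`rater`) lowers the absolute-decision coefficient even when raters rank teachers similarly. Averaging over more raters shrinks both error terms, and this trade-off against cost is what a scoring design must weigh.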