ABSTRACT

There can be no doubt that evaluating Intelligent Tutoring Systems (ITSs) is costly, frustrating, and time-consuming. In fact, in our own work to build PROUST, one component of an ITS for introductory programming students, evaluation has consumed nearly as much effort as the design of PROUST itself. If evaluating ITSs is so costly, why do it at all? Wouldn't it be better simply to finish one ITS and then build the next, perhaps letting the marketplace determine which systems survive? On the contrary: our experience with PROUST has taught us that, far from being a useless burden, evaluation pays off by helping to answer two evaluation questions that are central to cognitive science, Artificial Intelligence (AI), and education:

Evaluation Question 1: What is the educational impact of an ITS on students?

Evaluation Question 2: What is the relationship between the architecture of an ITS and its behavior?