This chapter makes explicit some of the evaluative decision-making that goes into patching and redesigning our automatic debuggers. It describes the performance tests to which we submitted the instructional debugger PROUST, reports the results of those tests, and discusses what we believe are their design implications. CHIRON embodies some answers to questions raised by the tests performed on PROUST. The chapter examines PROUST's educational effectiveness by asking three questions: whether PROUST globally improves programming performance on an exam, whether PROUST helps students find and correct bugs, and whether correcting bugs in homework improves students' subsequent ability to correct bugs. Students who had access to PROUST's bug identification reports while doing programming assignments performed better on a midterm examination specifically designed to exercise bug identification and repair skills.