The quality of any test depends on the quality of the questions or items that comprise the test. The first step in ensuring item quality is to write each item so that it is accurate, clearly worded, free of irrelevant or misleading content, free of bias, and aligned to the test specifications or blueprint. Even when items appear to meet these criteria, they may have flaws that cannot be identified when they are written and reviewed. Once the items are assembled into a test, an important second step is to administer the test and see how the items function when examinees respond to them.
After a test is administered to a group of examinees, the results can be used to generate a set of item statistics. These item statistics help test developers to identify questions that may have problems. Item statistics enable us to evaluate each question in a test in several ways:
How difficult is the question?
How well do all the answer choices attract responses?
How well does the question distinguish those who perform well from those who don’t?
How fair is the question when considering student characteristics such as gender, race/ethnicity, or disability?
This kind of information can help a test developer identify potential problems and where they exist, refine test questions, and adjust which questions comprise the test.
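As a rough illustration of the first three questions in the list, the sketch below computes three classical item statistics from a small set of multiple-choice responses: the proportion of examinees answering correctly (difficulty), the proportion choosing each answer option, and the correlation between an item score and the total score on the remaining items (discrimination). The text above does not name specific formulas, so these choices are assumptions, and the response data, answer key, and function names are hypothetical. Fairness, the fourth question, usually calls for comparing groups of examinees and is not sketched here.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical data: one row per examinee, one chosen option per item.
responses = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["B", "C", "C"],
    ["C", "D", "A"],
]
key = ["A", "B", "C"]  # hypothetical answer key, one correct option per item


def score(responses, key):
    """Convert chosen options to 1 (correct) or 0 (incorrect)."""
    return [
        [1 if choice == correct else 0 for choice, correct in zip(row, key)]
        for row in responses
    ]


def difficulty(scored, item):
    """Proportion of examinees who answered the item correctly."""
    column = [row[item] for row in scored]
    return sum(column) / len(column)


def option_proportions(responses, item):
    """Proportion of examinees who chose each answer option."""
    counts = Counter(row[item] for row in responses)
    n = len(responses)
    return {option: count / n for option, count in sorted(counts.items())}


def discrimination(scored, item):
    """Correlation between the item score and the total on the other items.

    Returns None when either variable has no variance, because the
    correlation is undefined in that case.
    """
    x = [row[item] for row in scored]
    y = [sum(row) - row[item] for row in scored]  # total excluding this item
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return None
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (sx * sy)


scored = score(responses, key)

for item in range(len(key)):
    print(
        f"item {item}: "
        f"difficulty={difficulty(scored, item):.2f}, "
        f"options={option_proportions(responses, item)}, "
        f"discrimination={discrimination(scored, item)}"
    )
```

Under these assumptions, a higher difficulty value means more examinees answered correctly, an option that almost no one chooses may not be working as a plausible alternative, and a discrimination value near zero or below suggests the item does not separate higher scorers from lower scorers. With only six examinees the numbers are purely illustrative.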