Standard setting is defined as the process of determining the level of performance on a test that is required to be classified into a given performance category (e.g. Cizek 2012a). The numerical point on the score scale associated with that level is known as a cut score. In other words, standard setting is the process of determining the recommended cut scores that are used to classify examinees into performance categories.
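To make the role of a cut score concrete, the following minimal Python sketch classifies examinee scores against a set of cut scores. The cut scores, the category labels, and the function name `classify` are hypothetical and chosen only for illustration. The sketch also assumes the convention that a score equal to a cut score falls in the higher category; a testing programme may define this differently.

```python
from bisect import bisect_right


def classify(score, cut_scores, labels):
    """Return the performance category for a single score.

    cut_scores must be sorted in ascending order, and labels must contain
    exactly one more entry than cut_scores. A score equal to a cut score
    is placed in the higher category (an assumed convention).
    """
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("labels must have one more entry than cut_scores")
    if list(cut_scores) != sorted(cut_scores):
        raise ValueError("cut_scores must be sorted in ascending order")
    # bisect_right counts how many cut scores are <= score,
    # which is the index of the category the score falls into.
    return labels[bisect_right(cut_scores, score)]


# Hypothetical cut scores and category labels for illustration only.
cut_scores = [40, 60, 80]
labels = ["Below Basic", "Basic", "Proficient", "Advanced"]

for s in (39, 59, 60, 80):
    print(s, classify(s, cut_scores, labels))
```

With three cut scores, the score scale is divided into four categories, so each additional performance category requires one additional cut score.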

Many standard setting methods exist, most of which entail a neutral facilitator training a group of subject matter experts to perform a rating task that produces a recommended cut score. For example, there are methods that involve rating test items (e.g. Angoff, Bookmark); methods that involve rating examinees (e.g. Borderline Group, Contrasting Groups); methods that involve reviewing work products (e.g. Body of Work method); methods that involve rating profiles of scores (e.g. Judgmental Policy Capturing Method); methods that rely on the cognitive response demands of items (e.g. Item-Descriptor Matching); compromise methods that combine absolute and relative information (e.g. Hofstee); and methods that incorporate policy values in advance of the standard setting meeting so that the resulting cut scores align with policy (e.g. Briefing Book). When determining the most appropriate method to implement, practitioners must consider many factors, including the type of items on the exam (e.g. multiple-choice items, performance assessments), the amount of available data, the amount of time the task will take, the amount of time available to conduct the meeting, the consequences of the cut scores, the risk of legal action, and any precedent set by standard setting methods previously used for the testing programme.
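As an illustration of how an item-rating task can produce a recommended cut score, the sketch below follows the basic logic of an Angoff-style procedure: each panellist estimates, for every item, the probability that a borderline examinee would answer correctly; a panellist's ratings are summed across items, and the sums are averaged across panellists. The ratings, the panel size, and the function name `angoff_cut_score` are hypothetical, and operational studies typically add multiple rounds, feedback, and discussion that are not shown here.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Compute a recommended raw cut score from Angoff-style ratings.

    ratings[p][i] is panellist p's estimate of the probability that a
    borderline examinee answers item i correctly (between 0 and 1).
    Returns the panel's mean cut score and each panellist's own cut score.
    """
    n_items = len(ratings[0])
    for panellist_ratings in ratings:
        if len(panellist_ratings) != n_items:
            raise ValueError("every panellist must rate every item")
        if any(not 0.0 <= r <= 1.0 for r in panellist_ratings):
            raise ValueError("ratings must be probabilities between 0 and 1")
    # Each panellist's cut score is the expected raw score of a
    # borderline examinee: the sum of that panellist's item ratings.
    panellist_cuts = [sum(panellist_ratings) for panellist_ratings in ratings]
    # The recommended cut score is the mean across panellists.
    return mean(panellist_cuts), panellist_cuts


# Hypothetical ratings: 3 panellists, 4 dichotomously scored items.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.6, 0.7],
    [0.7, 0.8, 0.4, 0.9],
]

recommended, per_panellist = angoff_cut_score(ratings)
print(round(recommended, 2), [round(c, 2) for c in per_panellist])
```

The result is on the raw-score scale of the rated items, so with four items the recommended cut score lies between 0 and 4.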

Designing and conducting a standard setting study to determine recommended cut scores requires careful consideration before, during, and after the standard setting meeting. Prior to the meeting, a plan must be defined, a facilitator must be selected, and panellists must be recruited. During the meeting, panellists must, in this general order, be oriented to the purpose of the study, experience the exam if possible, understand the concept of the borderline examinee, conceptualise the performance level descriptions, complete training on the standard setting task (including practice), complete the standard setting task (typically over multiple rounds), and complete evaluation surveys throughout the meeting. After the meeting, the facilitator prepares a technical report for the policy board, which decides the final cut scores.

Gathering validity evidence throughout the standard setting study is critical. Internal validity evidence demonstrates the quality of the ratings that are collected within a standard setting meeting. External validity evidence demonstrates that the resulting cut scores are related as expected to external outcomes. Procedural validity evidence demonstrates that the standard setting meeting was implemented as intended. Documentation of the standard setting procedures and decisions throughout the entire process – before, during, and after – is essential to the success of any standard setting initiative.
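One commonly reported piece of internal validity evidence is the consistency of cut scores across panellists, often summarised as the standard error of the panel's mean cut score. The sketch below is a minimal illustration under stated assumptions: the panellist cut scores are hypothetical, the function name `standard_error_of_cut` is invented for this example, and panellists are treated as a simple random sample, which real panels only approximate.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_cut(panellist_cuts):
    """Return the mean cut score and its standard error across panellists.

    The standard error is the sample standard deviation of the panellists'
    cut scores divided by the square root of the number of panellists.
    """
    n = len(panellist_cuts)
    if n < 2:
        raise ValueError("at least two panellists are needed")
    return mean(panellist_cuts), stdev(panellist_cuts) / sqrt(n)


# Hypothetical final-round cut scores from five panellists.
panellist_cuts = [61.0, 58.5, 63.0, 60.0, 59.5]

cut, se = standard_error_of_cut(panellist_cuts)
print(round(cut, 2), round(se, 2))
```

A smaller standard error indicates closer agreement among panellists; what counts as acceptably small depends on the score scale and the stakes of the decision.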