Standard setting is the process of establishing passing scores, or 'cut scores', for exams. A cut score serves to classify candidates into categories: those who score at or above the cut score are judged to possess the minimum level of proficiency required for inclusion in the category, while those who score below it are deemed not to possess that minimum level of proficiency and are classified accordingly.
Standard setting is most often a judgmental process. It typically involves a panel of subject matter experts (SMEs) and stakeholders who must estimate the difficulty of each question for so-called minimally competent, or borderline, candidates. Those judgments are then aggregated to arrive at a passing score across a set of questions. Standard setting methods differ both in how the SMEs make those judgments and in how the question-level judgments are aggregated into a passing score. One of the most common methods is the Modified Angoff method.
The central task of the Modified Angoff method is for SMEs to estimate the percentage of minimally competent candidates who would answer each question correctly. Panelists are instructed to examine each question carefully, both in terms of its structure and the difficulty of the competency being tested, and to use that information to judge the expected performance of the minimally competent candidate on that question. Two forms of judgment are common: the probability that a single minimally competent candidate would answer the question correctly, or the number out of 100 minimally competent candidates who would answer it correctly.
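The two forms express the same quantity on different scales, so ratings collected in either format can be normalized before aggregation. Below is a minimal sketch in Python; the helper name to_probability and its signature are illustrative assumptions, not part of any standard tool.

```python
# Minimal sketch (hypothetical helper, not from any standard library):
# normalizing the two common Angoff judgment formats to one 0-1 scale.

def to_probability(judgment, out_of_100=False):
    """Convert an Angoff rating to a proportion in [0, 1].

    judgment -- either a probability (e.g., 0.7) or a count of
                minimally competent candidates out of 100 (e.g., 70).
    """
    p = judgment / 100.0 if out_of_100 else float(judgment)
    if not 0.0 <= p <= 1.0:
        raise ValueError("Angoff rating must lie between 0 and 1")
    return p

# Both formats express the same judgment:
assert to_probability(0.7) == to_probability(70, out_of_100=True)
```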
These judgments are summed across questions for each SME to create that SME's recommended passing score. This works because each question-level judgment is itself a passing standard for that single question, so their sum is the expected raw score of a minimally competent candidate across the exam. For example, if an SME provided a judgment of 0.7, or 70%, for each and every question, the passing standard would logically be 70% for the entire exam.
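To make the aggregation concrete, here is a minimal sketch in Python. The ratings matrix is invented for illustration; in practice it would come from the panelists' rating forms.

```python
# Minimal sketch with invented data: aggregating item-level Angoff
# ratings into a recommended passing score. Rows are SMEs, columns
# are questions; values are probabilities on the 0-1 scale.
ratings = [
    [0.70, 0.55, 0.80, 0.65],  # SME 1
    [0.75, 0.60, 0.85, 0.60],  # SME 2
    [0.65, 0.50, 0.75, 0.70],  # SME 3
]

# Each SME's recommended raw cut score is the sum of their item
# ratings: the expected number of questions a minimally competent
# candidate would answer correctly.
sme_cut_scores = [sum(row) for row in ratings]

# The panel recommendation is commonly the mean (or median) across SMEs.
panel_cut_score = sum(sme_cut_scores) / len(sme_cut_scores)

print([round(s, 2) for s in sme_cut_scores])  # [2.7, 2.8, 2.6]
print(round(panel_cut_score, 2))              # 2.7 (of 4 items, i.e., 67.5%)
```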
Typically these judgments are made over multiple rounds, with the judgments becoming increasingly refined from round to round. Between rounds, panelists can be given various types of information about the reasonableness of their judgments. A common type is impact data: the number and percentage of candidates who would pass based on the panel's average or median recommended passing score. Other types of feedback include the empirical difficulty of each question for actual candidates and the consistency of each SME's judgments.
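As a minimal sketch of how impact data might be computed, the example below carries over the invented cut score from the previous sketch; the candidate scores are likewise invented for illustration.

```python
# Minimal sketch with invented data: impact data for a recommended cut
# score. Given candidates' raw scores, report how many would pass.
candidate_scores = [3, 2, 4, 1, 3, 2, 4, 3]  # raw scores on a 4-item exam
cut_score = 2.7                              # panel recommendation from above

n_pass = sum(score >= cut_score for score in candidate_scores)
pass_rate = 100.0 * n_pass / len(candidate_scores)

print(f"{n_pass} of {len(candidate_scores)} candidates would pass "
      f"({pass_rate:.1f}%)")
# -> 5 of 8 candidates would pass (62.5%)
```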
As shown in the figure below, changes in the question-level ratings shift the overall recommended passing score. The passing score recommendation made in the final round is the one that is ultimately adopted.
For more detailed information on standard setting procedures and methods, refer to the book Setting Performance Standards: Concepts, Methods, and Perspectives, edited by Gregory J. Cizek.