Guidelines for Ensuring the Technical Quality of Assessments Affecting English Language Learners and Students with Disabilities: Development and Implementation of Regulations
Standard Setting
Setting defensible cut scores and establishing meaningful performance levels are key concerns for state departments of education. While there are a number of standard setting methods used across states, there is no agreed-upon best method for setting standards (Berk, 1986; Linn, 2003).
Here are general descriptions of several standard setting methods:
- Reasoned judgment: The full range of possible scores (score scale) is divided into categories determined by experts. Exemplars and decision rules are used to connect descriptors with student work (Kingston, Kahl, Sweeney, & Bay, 2001).
- Contrasting groups: Comparisons are made between the expected performance and actual performance of different ability groups. Prior to testing, teachers familiar with the students separate them into pre-defined ability groups. The distribution of test scores across the groups is then examined (Livingston & Zieky, 1982).
- Modified Angoff: Experts examine the test items and estimate, for each item, the percentage of minimally proficient students (those just at the threshold of a given level) who would answer it correctly. The estimates are averaged across experts and summed across items, yielding an expected score, expressible as a percentage of items correct, that corresponds to the minimum passing score for that level. This method is typically used with multiple-choice items (Berk, 1986).
- Bookmarking: Experts review an ordered item booklet that contains test items arranged in order of difficulty. The experts are asked to mark the places in the booklet (i.e., between sequential items) where the skill range for one level ends and the next begins (Lewis, Mitzel, & Green, 1996).
- Body of work: Experts examine the full body of a student's work and use it to place the student in a performance level. Standard-setters are given a set of papers that exemplify the complete range of possible scores from low to high; for each student, they determine which performance level most reasonably reflects that student's work (Kingston, Kahl, Sweeney, & Bay, 2001).
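To make the arithmetic behind the Modified Angoff description concrete, here is a minimal sketch in Python. It assumes each expert supplies, for every item, an estimated probability (0 to 1) that a minimally proficient student answers correctly. The function name `angoff_cut_score` and the sample ratings are hypothetical and are not drawn from any of the cited sources. Operational procedures typically add rounds of discussion and revision that this sketch omits.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Compute a Modified Angoff cut score (illustrative sketch).

    ratings[j][i] is expert j's estimated probability (0-1) that a
    minimally proficient student answers item i correctly.
    Returns (raw_cut, fraction_correct).
    """
    if not ratings or not ratings[0]:
        raise ValueError("ratings must contain at least one expert and one item")
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every expert must rate the same number of items")

    # Average the experts' estimates for each item.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]

    # Sum the item averages to get the expected raw score at the cut point.
    raw_cut = sum(item_means)
    return raw_cut, raw_cut / n_items


# Hypothetical ratings: two experts, four items.
example_ratings = [
    [0.5, 0.75, 0.25, 1.0],
    [0.5, 0.25, 0.75, 1.0],
]
raw_cut, fraction = angoff_cut_score(example_ratings)
print(raw_cut, fraction)
```

With these made-up ratings, the item averages are 0.5, 0.5, 0.5, and 1.0, so the expected raw score at the cut point is 2.5 of 4 items, or 62.5 percent correct.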
As noted above, there is no agreed-upon best method for setting standards. Research also suggests that the standards set vary considerably across methods, owing both to differences among groups of standard-setters and to the methods themselves (Jaeger, 1989).
Therefore, using multiple standard setting methods and considering their results together to determine cut scores (Jaeger, 1989) seems apt. Although the use of multiple methods may be cost-prohibitive, the practice warrants consideration, given the consequences associated with the results of standard setting efforts.
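One simple way to consider the results of several methods together is to lay the cut scores side by side and look at their spread. The sketch below is only an illustration of that idea: the function name `summarize_cut_scores` and the cut score values are hypothetical, and the cited sources do not prescribe any particular way of combining results. The final cut score remains a policy judgment rather than a computed value.

```python
from statistics import median


def summarize_cut_scores(cuts_by_method):
    """Summarize cut scores produced by different standard setting methods.

    cuts_by_method maps a method name to the cut score it produced,
    all on the same score scale. Returns the low, high, median, and
    spread, which a panel could review when settling on a final cut.
    """
    values = list(cuts_by_method.values())
    if not values:
        raise ValueError("at least one method's cut score is required")
    low, high = min(values), max(values)
    return {
        "low": low,
        "high": high,
        "median": median(values),
        "spread": high - low,
    }


# Hypothetical cut scores for one performance level.
example = {"modified_angoff": 30, "bookmark": 34, "body_of_work": 32}
print(summarize_cut_scores(example))
```

A large spread would signal the kind of cross-method variability described above and might prompt closer review before a cut score is adopted.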
Resources for standard setting
Guidelines and criteria are available for the selection and implementation of a standard setting method or methods. The following resources contain such guidelines and considerations for general education assessments.
Hambleton, R. K. (2001). Setting performance standards on educational assessments and criteria for evaluating the process. In G. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 89–116). Mahwah, NJ: Lawrence Erlbaum.
Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 53–88). Mahwah, NJ: Lawrence Erlbaum.
Raymond, M. R., & Reid, J. B. (2001). Who made thee judge? Selecting and training participants for standard setting. In G. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 119–157). Mahwah, NJ: Lawrence Erlbaum.
The following resources offer guidelines and considerations for defensible adaptation of traditional general education standard setting methods for tests for students with disabilities.
Olson, B., Mead, R., & Payne, D. (2002). A report of a standard setting method for alternate assessments for students with significant disabilities (NCEO Synthesis Report 47). Minneapolis: University of Minnesota, National Center on Educational Outcomes.
Roeber, E. (2002). Setting standards on alternate assessments (NCEO Synthesis Report 42). Minneapolis: University of Minnesota, National Center on Educational Outcomes.
Thurlow, M. L., & Ysseldyke, J. E. (2001). Standard-setting challenges for special populations. In G. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 387–410). Mahwah, NJ: Lawrence Erlbaum.
Additional resources relevant to standard setting are as follows:
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: AERA.
Cizek, G. (2001). Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.
Mitzel, H. C. (2005). Consistency for state achievement standards under NCLB. Paper presented to CAS SCASS Study Group. Washington, DC: Council of Chief State School Officers.
Note: Additional resources will be provided as they become available and are reviewed using the AACC vetting criteria.