Guidelines for Ensuring the Technical Quality of Assessments Affecting English Language Learners and Students with Disabilities:

| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.1 For each assessment, including alternate assessment(s), has the State documented the issue of validity (in addition to the alignment of the assessment with the content standards), as described in the Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999), with respect to all of the following categories: (a) Has the State specified the purposes of the assessments, delineating the types of uses and decisions most appropriate to each? and (b) Has the State ascertained that the assessments, including alternate assessments, are measuring the knowledge and skills described in its academic content standards and not knowledge, skills, or other characteristics that are not specified in the academic content standards or grade level expectations? and (c) Has the State ascertained that its assessment items are tapping the intended cognitive processes and that the items and tasks are at the appropriate grade level? and (d) Has the State ascertained that the scoring and reporting structures are consistent with the sub-domain structures of its academic content standards (i.e., are item interrelationships consistent with the framework from which the test arises)? and (e) Has the State ascertained that test and item scores are related to outside variables as intended (e.g., scores are correlated strongly with relevant measures of academic achievement and are weakly correlated, if at all, with irrelevant characteristics, such as demographics)? and (f) Has the State ascertained that the decisions based on the results of its assessments are consistent with the purposes for which the assessments were designed? and (g) Has the State ascertained whether the assessment produces intended and unintended consequences? | For each assessment, including alternate assessment(s), the State has documented the existing validity evidence in each of the categories and has taken steps to address any deficiencies either in validity or in its approach to establishing and documenting validity evidence. | The State has not provided evidence in all categories (a) – (g) or has not taken steps to address any deficiencies either in validity or in its approach to establishing and documenting validity evidence. |

| Critical Element | Examples of Acceptable Evidence |
|---|---|
| 3.1 (c) ELP standards are linked to State content and achievement standards in reading/language arts, math, and science (science in 2005–2006) | Acceptable evidence includes a process and documentation for linkage and alignment, findings from linkage and alignment studies, and state responses to findings. |
| 3.2 (c) ELP assessments are aligned to ELP standards | |
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |

| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.2 For each assessment, including alternate assessment(s), has the State considered the issue of reliability, as described in the Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999), with respect to all of the following categories: (a) Has the State determined the reliability of the scores it reports, based on data for its own student population and each reported subpopulation? and (b) Has the State quantified and reported within the technical documentation for its assessments the conditional standard error of measurement and student classification that are consistent at each cut score specified in its academic achievement standards? and (c) Has the State reported evidence of generalizability for all relevant sources, such as variability of groups, internal consistency of item responses, variability among schools, consistency from form to form of the test, and inter-rater consistency in scoring? | For each assessment, including alternate assessment(s), the State has documented reliability evidence in each of the categories and has taken steps to address any deficiencies either in reliability or in the State’s approach to establishing and documenting reliability evidence. | The State has not provided evidence in all categories (a) – (c) or has not taken steps to address any deficiencies either in reliability or in the State’s approach to establishing and documenting reliability evidence. |

| Critical Element | Examples of Acceptable Evidence |
|---|---|
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |
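
Critical Element 4.2 asks for internal-consistency evidence and a standard error of measurement. As a rough illustration only, the sketch below computes coefficient alpha and the overall classical standard error of measurement (the standard deviation of total scores times the square root of one minus the reliability) for a small, made-up matrix of item scores. The function names and the data are hypothetical and do not come from the guidance. The conditional standard error of measurement at each cut score, named in 4.2 (b), generally requires a fuller measurement model and is not shown here.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of per-student item-score lists."""
    n_items = len(item_scores[0])
    # Variance of each item across students.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of students' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def overall_sem(item_scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    totals = [sum(row) for row in item_scores]
    return stdev(totals) * sqrt(1 - cronbach_alpha(item_scores))


# Hypothetical data: 5 students x 4 dichotomously scored items.
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"alpha = {cronbach_alpha(example_scores):.3f}")
print(f"SEM   = {overall_sem(example_scores):.3f}")
```

In practice a State would run such statistics on its own student population and on each reported subpopulation, as 4.2 (a) requires, rather than on a toy sample like this one.
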

| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.3 Has the State ensured that its assessment system is fair and accessible to all students, including students with disabilities and students with limited English proficiency, with respect to each of the following issues: (a) Has the State ensured that the assessments provide an appropriate variety of accommodations for students with disabilities? and (b) Has the State ensured that the assessments provide an appropriate variety of linguistic accommodations for students with limited English proficiency? and (c) Has the State taken steps to ensure fairness in the development of the assessments? and (d) Does the use of accommodations and/or alternate assessments yield meaningful scores? | The State has taken appropriate judgmental (e.g., committee review) and data-based (e.g., bias studies) steps to ensure that its assessment system is fair and accessible to all students. Review committees have included representation of identified subgroups. The State assessment system must be designed to be valid and accessible for use by the widest possible range of students. The State is conducting studies to determine the appropriateness of accommodations and the impact on test scores. | The State has conducted data-based bias studies but has not convened committees of stakeholders to review its assessment items. The State has convened committees of stakeholders to review its assessment items, but these committees have not included representation of identified subgroups. The State assessment system is not designed to be valid and accessible for use by the widest possible range of students. The State does not have a policy on the appropriate selection and use of accommodations and alternate assessments. The State does not train or monitor personnel at the school, LEA, and State levels with regard to the appropriate selection and use of accommodations and alternate assessments. There are no appropriate accommodations for students with particular disabilities (e.g., no allowable accommodations on the regular assessment or alternate assessments for students who are visually impaired and need large print or Braille or for students who are significantly physically impaired and need assistive technology). |

| Critical Element | Examples of Acceptable Evidence |
|---|---|
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |

| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.4 When different test forms or formats are used, the State must ensure that the meaning and interpretation of results are consistent. (a) Has the State taken steps to ensure consistency of test forms over time? (b) If the State administers both an online and a paper-and-pencil test, has the State documented the comparability of the electronic and paper forms of the test? | The State has conducted appropriate equating or linking studies and has presented data that support the success of the equating or linking. | The State has not conducted or documented equating studies to establish whether test forms are comparable across time. |

| Critical Element | Examples of Acceptable Evidence |
|---|---|
| 3.4 (b) If the State plans to transition to a new ELP assessment, its plan for doing so, including how the State plans to address “comparability” (the relationship between the old and new ELP assessments), e.g., through double-testing, bridge studies, judgment procedures, data analysis, or another method. | Acceptable evidence includes a plan for establishing comparability (e.g., use of double-testing, bridge studies, judgment procedures, data analysis, or other method), results if available, and a plan for developing new AMAOs, if applicable. |
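
The equating and linking studies called for under Critical Element 4.4 can take many forms. As one simple illustration, the sketch below applies linear equating, which places a raw score from one form on the scale of another by matching the two forms' means and standard deviations. It assumes a random-groups design in which comparable groups of students took each form; the function name and the score lists are hypothetical and are not drawn from the guidance, and operational programs often use other methods (for example, IRT-based or equipercentile equating).

```python
from statistics import mean, stdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Place a Form X raw score on the Form Y scale (mean/SD matching)."""
    mu_x, sd_x = mean(form_x_scores), stdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), stdev(form_y_scores)
    # Same z-score on both forms: (y - mu_y) / sd_y == (x - mu_x) / sd_x
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical raw scores from two comparable groups of students.
form_x = [10, 12, 14, 16, 18]  # older form
form_y = [13, 14, 15, 16, 17]  # newer form

for raw in (10, 14, 18):
    print(raw, "->", round(linear_equate(raw, form_x, form_y), 2))
```

A State would still need to present data showing that the equating worked, for instance by checking that the equated scores of comparable groups have similar distributions.
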

| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.5 Has the State established clear criteria for the administration, scoring, analysis, and reporting components of its assessment system, including alternate assessment(s), and does the State have a system for monitoring and improving the on-going quality of its assessment system? | The State developed a set of management controls or standards for each of these components and has communicated these criteria to its contractor(s), LEAs, and schools. It requires its contractor(s) to provide specific information on the degree to which each criterion is met. The State uses an extensive system of training and monitoring to ensure that each person who is responsible for handling or administering any portion of its assessments does so in a way that protects the security of the assessments and maintains equivalence of administration conditions across students and schools. | The State does not have a test security policy. The State does not train or monitor personnel at the school, LEA, and State levels with regard to its test administration procedures and security policy. The State provides no criteria to its contractor(s) regarding the quality control and security measures it requires for its assessment system. The State provides no criteria to its contractor(s) to ensure that the procedures for scoring of open-ended tasks meet industry standards for accuracy. |

| Critical Element | Examples of Acceptable Evidence |
|---|---|
| 3.2 (e) If multiple ELP assessments are being used, data can be aggregated for comparison and reporting purposes | Acceptable evidence includes description of how the State ensures that data can be aggregated for comparison and reporting purposes. |
| 3.3 (a) (b) (c) Has the State established and implemented clear criteria for the administration, scoring, analysis, and reporting components of its ELP assessments, and does the State have a system for monitoring and improving the ongoing quality of its assessment systems? (Critical Element 3.3) (a) ELP assessments are administered in a uniform manner statewide. (b) Methods for administration, scoring, analysis, and reporting have been established. (c) The state monitors ELP assessment administration practices. | Acceptable evidence includes: |


| Critical Element | Examples of Acceptable Evidence | Examples of Incomplete Evidence |
|---|---|---|
| 4.6 Has the State evaluated its use of accommodations? (a) How has the State ensured that appropriate accommodations are available to students with disabilities and that these accommodations are used in a manner that is consistent with instructional approaches for each student, as determined by a student’s IEP or 504 plan? (b) How has the State determined that scores for students with disabilities that are based on accommodated administration conditions will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration conditions? (c) How has the State ensured that appropriate accommodations are available to limited English proficient students and that these accommodations are used as necessary to yield accurate and reliable information about what limited English proficient students know and can do? (d) How has the State determined that scores for limited English proficient students that are based on accommodated administration circumstances will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration circumstances? | The State provides for the use of appropriate accommodations and has conducted studies to ensure that scores based on accommodated administrations can be meaningfully combined with scores based on the standard administrations. | No analyses have been carried out to determine whether specific accommodations produce the effect intended. The State does not require that decisions about how students with disabilities will participate in the assessment system be made on an individual basis or specify that these decisions must be consistent with the routine instructional approaches as identified by each student’s IEP and/or 504 plan. The State uses the same accommodations for limited English proficient students as it uses for students with disabilities. |

| Critical Element |
|---|
| Per Title III OELA Monitoring Reports, if accommodations are provided on the ELP assessment to students with disabilities, then the state should provide documentation of which accommodations were provided, the method for determining accommodations, and the number and percentage of students receiving such accommodations |
Notes: Table 9 provides another overview of technical criteria for evaluating the quality of assessments. It lists validated technical criteria by type (validity, reliability, bias and sensitivity) and evidence/method elements one would expect to see in support of each type vis-à-vis the various aspects of test development (e.g., test design and development, item level, test level). These criteria are cross-referenced with the critical elements for technical quality identified in Standards and Assessment Peer Review Guidance (USED, 2004). An “X” indicates evidence that state officials might consider in order to support the technical quality (per Standards and Assessment Peer Review Guidance) of their assessments for special student populations. For more information about the technical criteria presented here, see the document titled Evaluation of the Technical Evidence of Assessments for Special Student Populations (PDF).

| TECHNICAL CRITERIA | TYPE | ELEMENT: EVIDENCE/METHOD | PEER REVIEW CRITICAL ELEMENTS: TECHNICAL QUALITY (X marks among 4.1–4.6) |
|---|---|---|---|
| Test Design and Development | | | |
| Item/Test level | Construct validity | Test purpose | X |
| | | Population/classification | X X X X X |
| | | Theoretical foundation/framework | X |
| | | Universal design | X X |
| | | Readability | X X X |
| Item level | Content validity | Alignment (items-to-standards) | X X |
| | | Linkage (items-to-standards, standards-to-standards) | X X |
| | | Expert judgment | X X |
| | | p-values/point biserials | X X X |
| | | IRT/item fit | X X |
| | | Structural equation modeling | X X |
| | | t-tests | X X |
| | | ANOVA | X X |
| | | Factor analysis | X X |
| Test level | Construct validity | Equivalence/comparability | X X X |
| | | Multi-trait/multi-method/subtest inter-correlation | X X X |
| | Content validity | Test blueprint | X |
| | | Alignment (test form-to-blueprint) | X X |
| | Content validity | Descriptive statistics (e.g., central tendency, variation) | X X X |
| | | IRT/test fit | X X |
| | | Linking/equating | X X |
| | Criterion validity (predictive/concurrent) | Cross tabulations | X X |
| | | Pearson correlation | X X |
| | Consequential validity | Use of results | X X X X X |
| Administration | Construct validity | Accommodation | X X X X X X |
| | | Fidelity | X X X |
| | | Standardization | X X X |
| Item/Test level | Reliability — Stability & consistency | Standard error of measurement/confidence intervals | X X |
| | | Test-retest | X X |
| | | Alternate form | X X X |
| | Reliability — Internal consistency | Coefficient alpha | X X |
| | | KR-21 | X X |
| | | Test length/power estimates | X X |
| | | Split-half | X X |
| | Reliability — Generalizability | G-coefficient | X X |
| | Reliability — Classification consistency | Correlation coefficient | X X |
| | | Percent correspondence | X X |
| | | Classification error | X X |
| | Bias and sensitivity — Linguistic | Expert review | X X X |
| | | DIF analysis | X |
| | Bias and sensitivity — Ethnicity/race | Expert review | X X X |
| | | DIF analysis | X |
| | Bias and sensitivity — Cultural/religious | Expert review | X X X |
| | Bias and sensitivity — Geographic | Expert review | X X X |
| | | DIF analysis | X |
| | Bias and sensitivity — SES | Expert review | X X X |
| | | DIF analysis | X |
| | Bias and sensitivity — Disability | Expert review | X X X |
| | | DIF analysis | X |
| | Bias and sensitivity — Gender | Expert review | X X X |
| | | DIF analysis | X |
| Field Testing | | | |
| | Content validity | Blueprint | X |
| | | Sampling | X X |
| | | Norming | X X X |
| Scoring | | | |
| | Content validity | Rubric | X X X |
| | | Scale | X X X |
| | | Standard setting (cut score and proficiency levels) | X X X X |
| | | Training of scorers/scoring protocol | X X |
| | Reliability — | Correlation (Kappa) | X X |
| | | Percent correspondence | X X |
| Reporting | | | |
| | Consequential validity | Reporting category | X X X |
| | | N | X X X |
| | | Central tendency/variation | X X X |
| | | Effect size | X X X |
| Security | | | |
| | Consequential validity | Protocols | X X X X |
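
The Scoring rows of Table 9 list percent correspondence and a kappa statistic as reliability evidence. As a minimal sketch of what those two figures measure, the code below computes the proportion of exact agreement between two raters and Cohen's kappa, which adjusts that proportion for the agreement expected by chance. The function names and the rubric scores are hypothetical and are not taken from the guidance or from any State's data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal score distribution.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical rubric scores (0-2) assigned by two raters to 8 responses.
scores_a = [0, 1, 1, 2, 2, 2, 0, 1]
scores_b = [0, 1, 2, 2, 2, 1, 0, 1]

print(f"percent agreement = {percent_agreement(scores_a, scores_b):.2f}")
print(f"kappa             = {cohens_kappa(scores_a, scores_b):.2f}")
```

Kappa is lower than raw agreement whenever some matches would be expected by chance alone, which is why the two statistics are usually reported together.
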