Assessment & Accountability Comprehensive Center at WestEd
   

Guidelines for Ensuring the Technical Quality of Assessments Affecting English Language Learners and Students with Disabilities:
Development and Implementation of Regulations

Assessments for Special Student Populations: Technical Quality

According to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), multiple elements contribute to the technical quality of assessments. Key among them are validity, reliability, and freedom from bias. Each of these key elements is discussed below.

Validity

According to the Standards, a primary consideration in determining validity is whether the state has evidence that the assessment results can be interpreted in a manner consistent with the assessment’s intended purpose(s). Construct validity is the extent to which an assessment measures what it is intended to measure as well as the extent to which inferences and actions made on the basis of test scores are appropriate and accurate.

There are four broad categories of evidence that can be used to support validity (AERA, APA, & NCME, 1999; Kane, 2002; Messick, 1989):

  1. Test content: the degree to which the standards and the assessment (items and forms) align.
  2. The assessment’s relation to other variables: the relationship between the assessment and other measures known to be accurate indicators of student knowledge/ability.
  3. Student response processes: the degree to which factors that contribute to assessment ambiguity and inaccuracy have been eliminated or minimized such that assessment results accurately reflect student knowledge/ability vis-à-vis the tested content.
  4. Internal structure: the degree to which a variety of statistical techniques have been applied to the test to determine its validity and reliability and to ensure a balanced assessment in terms of breadth and depth of knowledge, skills, and content assessed.

Tables 3–8 present examples of evidence that state officials can consider when documenting the validity of their assessments.

Additionally, according to Messick (1989), consideration also must be given to the consequences of the test’s interpretations and uses. The validity and accuracy of test interpretation and use are critical because misinterpretation and misuse could result in unintended and negative consequences.

State officials should address and document the validity of each of the state’s assessments, including alternate assessments, in all of the following key areas (based on USED, 2004, Critical Element 4.1):

  1. Specify the purposes of the assessments, delineating the types of uses and decisions most appropriate to each.
  2. Ascertain that the assessments, including alternate assessments, are measuring the knowledge and skills described in the state’s academic content standards and not knowledge, skills, or other characteristics that are not specified in the academic content standards or grade level expectations.
  3. Ascertain that the state’s assessment items are tapping the intended cognitive processes and that the items and tasks are at the appropriate grade level.
  4. Ascertain that the scoring and reporting structures are consistent with the sub-domain structures of its academic content standards (i.e., item interrelationships are consistent with the framework from which the test arises).
  5. Ascertain that test and item scores are related to outside variables as intended (e.g., scores are correlated strongly with relevant measures of academic achievement and are weakly correlated, if at all, with irrelevant characteristics, such as demographics).
  6. Ascertain that decisions that are based on the results of the state’s assessments are consistent with the purposes for which the assessments were designed.
  7. Determine the intended and unintended consequences that result from the state’s assessments.
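
Item 5 in the list above can be made concrete with a small calculation. The following is a minimal sketch, not a method prescribed by the Standards or by USED: it computes a Pearson correlation between test scores and an outside variable. The function name `pearson_r` and all data values are hypothetical and are used only to illustrate the expected pattern, namely a strong correlation with a relevant measure and a weak one with an irrelevant characteristic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six students (illustrative only, not real results).
state_test = [310, 342, 355, 371, 390, 402]      # scale scores
course_grade = [2.1, 2.6, 2.9, 3.0, 3.6, 3.8]    # a relevant outside measure
group_flag = [0, 1, 0, 1, 1, 0]                  # an irrelevant characteristic, coded 0/1

convergent = pearson_r(state_test, course_grade)
discriminant = pearson_r(state_test, group_flag)
print(f"relevant measure: r = {convergent:.2f}")
print(f"irrelevant characteristic: r = {discriminant:.2f}")
```

In practice, a state would run such analyses on data from its own student population and would likely rely on an established statistics package rather than hand-written code.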

Reliability

Reliability refers to the consistency with which an assessment yields dependable indicators of particular student knowledge/skills. Such consistency can exist over time, across raters, or across different items/tasks intended to measure the same content. Test reliability has implications for test validity because sources of error that lead to unwanted variation in assessment results may distort the interpretation and use of the results (AERA, APA, & NCME, 1999; Anastasi, 1988; Berkowitz, Wolkowitz, Fitch, & Kopriva, 2000).

There are three major sources of error:

  • Factors in the test itself;
  • Factors in the students taking the test; and
  • Scoring factors.

State officials should address and document the reliability of each of the state’s assessments, including alternate assessments, in all of the following ways (based on USED, 2004, Critical Element 4.2):

  1. Based on data for the state’s own student population and each reported subpopulation, determine the reliability of the scores that the state reports.
  2. Quantify and report within the technical documentation for the state’s assessments the conditional standard errors of measurement and student classification that are consistent at each cut score specified in the state’s academic achievement standards.
  3. Report evidence of generalizability for all relevant sources, such as variability of groups, internal consistency of item responses, variability among schools, consistency from form to form of the test, and inter-rater consistency in scoring.

Tables 3–8 present examples of evidence that state officials can consider when documenting the reliability of their assessments.
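
Two of the sources of evidence named above, internal consistency of item responses and inter-rater consistency in scoring, are often summarized with simple statistics. The sketch below is an illustration under stated assumptions, not a procedure required by the guidance: it uses Cronbach's alpha for internal consistency and Cohen's kappa for agreement between two raters, both chosen here only as common examples. The function names and all data are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of student rows, each a list of item scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across students.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical responses: 5 students x 4 items, scored 1 (correct) or 0.
responses = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
alpha = cronbach_alpha(responses)

# Hypothetical performance levels assigned by two raters to six papers.
rater_a = ["Proficient", "Basic", "Proficient", "Advanced", "Basic", "Proficient"]
rater_b = ["Proficient", "Basic", "Basic", "Advanced", "Basic", "Proficient"]
kappa = cohen_kappa(rater_a, rater_b)

print(f"alpha = {alpha:.2f}, kappa = {kappa:.2f}")
```

Consistent with item 1 above, such statistics would be computed for the state's own student population and for each reported subpopulation, since a value obtained for all students may not hold for ELLs or students with disabilities.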

Bias

Bias is the presence of information in a test or a condition of the test that unfairly advantages or disadvantages a student (or group of students) such that the student is unable to accurately demonstrate what he or she knows and can do vis-à-vis the tested content. Consequently, test results might underestimate the student’s achievement or reflect abilities that are not related to the intended test content (Abedi & Lord, 2001; AERA, APA, & NCME, 1999; Kopriva, 2000).

Sources of bias include:

  • Gender;
  • Racial/ethnic;
  • Cultural;
  • Geographic;
  • Disability; and
  • Linguistic.

Bias can be introduced during various phases of a test’s development and use (AERA, APA, & NCME, 1999):

  • Design/development: The items or tasks do not provide an equal opportunity for all students to fully demonstrate their knowledge and skills.
  • Administration: The assessments are not administered in ways that ensure fairness.
  • Reporting: The results are not reported in ways that ensure fairness.
  • Interpretation: The results are not interpreted or used in ways that lead to equal treatment.

Additionally, bias could be attributed to the insufficient opportunity of students to access and learn the standards.

Therefore, states must ensure that during each stage of their assessments’ development and use, potential sources of bias are identified and efforts are made to reduce or eliminate the effects of bias on student performance. For all assessments in the state’s assessment system, state officials should ensure that the assessments are fair and accessible to all students, including SWDs and ELLs, in the following manner (based on USED, 2004, Critical Element 4.3):

  1. Ensure that the assessments provide an appropriate variety of accommodations for students with disabilities.
  2. Ensure that the assessments provide an appropriate variety of linguistic accommodations for students with limited English proficiency.
  3. Take steps to ensure fairness in the development of the assessments.
  4. Ensure that the use of accommodations and/or alternate assessments yields meaningful scores.

Tables 3–8 present examples of evidence that state officials can consider when documenting the manner in which they have controlled for bias in the state’s assessments.
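
The guidance refers to data-based bias studies without naming a method. As one hedged illustration, the sketch below uses the Mantel-Haenszel procedure, a commonly used approach to differential item functioning that is chosen here only as an example. It compares a reference group and a focal group (for instance, non-ELL and ELL students) on a single item after matching students on total score. The function names and all counts are hypothetical.

```python
from math import log


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched score levels.

    Each stratum is a tuple of counts for students at the same total-score
    level: (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Rescale the odds ratio to the delta metric: -2.35 * ln(odds ratio)."""
    return -2.35 * log(odds_ratio)


# Hypothetical item where matched students in both groups perform alike.
balanced_item = [(20, 20, 10, 10), (30, 10, 15, 5), (36, 4, 18, 2)]
odds_ratio = mantel_haenszel_odds_ratio(balanced_item)
delta = mh_delta(odds_ratio)

# Hypothetical item that is easier for the reference group at the same score level.
flagged_item = [(30, 10, 10, 10)]
flagged_delta = mh_delta(mantel_haenszel_odds_ratio(flagged_item))

print(f"balanced item: odds ratio = {odds_ratio:.2f}")
print(f"flagged item: delta = {flagged_delta:.2f}")
```

An odds ratio near 1 (delta near 0) suggests that matched students in the two groups perform similarly on the item. A statistical flag of this kind is a prompt for the judgmental review described in this section, not a finding of bias in itself.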

Additional factors impacting assessment validity, reliability, and freedom from bias

Aspects of validity, reliability, and bias often are interrelated, and each element is affected by a number of factors. In addition to the factors described above, state officials ought to consider the following (based on USED, 2004, Critical Elements 4.4, 4.5, and 4.6):

  1. When different test forms or formats are used, state officials must ensure that the meaning and interpretation of results are consistent.
    1. Ensure consistency of test forms over time.
    2. If the state administers both an online and paper-and-pencil test, document the comparability of these two forms of the test.
  2. Establish clear criteria for the administration, scoring, analysis, and reporting components of the state’s assessment system, including alternate assessment(s), and maintain a system for monitoring and improving the ongoing quality of the state’s assessment system.
  3. Evaluate the state’s use of accommodations.
    1. Ensure that appropriate accommodations are available to students with disabilities and that these accommodations are used in a manner that is consistent with instructional approaches for each student, as determined by the student’s IEP or 504 plan.
    2. Determine that scores for students with disabilities that are based on accommodated administration conditions will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration conditions.
    3. Ensure that appropriate accommodations are available to limited English proficient students and that these accommodations are used as necessary to yield accurate and reliable information about what limited English proficient students know and can do.
    4. Determine that scores for limited English proficient students that are based on accommodated administration circumstances will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration circumstances.
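
Item 1 in the list above concerns the comparability of scores across test forms and formats. A minimal sketch of one way to place scores from two forms on a common scale is linear equating, which matches the mean and standard deviation of the two score distributions. This is an assumption made for illustration: the guidance does not prescribe an equating method, and the sketch presumes that the two forms were taken by equivalent groups of students. The function name and the scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(score_x, form_x_scores, form_y_scores):
    """Map a Form X score onto the Form Y scale by matching mean and SD."""
    sd_x = pstdev(form_x_scores)
    if sd_x == 0:
        raise ValueError("Form X scores have no spread")
    slope = pstdev(form_y_scores) / sd_x
    return mean(form_y_scores) + slope * (score_x - mean(form_x_scores))


# Hypothetical raw scores from equivalent groups on two formats of a test.
paper_scores = [12, 15, 18, 21, 24]    # Form X: paper-and-pencil
online_scores = [14, 18, 22, 26, 30]   # Form Y: online

equated = linear_equate(21, paper_scores, online_scores)
print(f"A paper score of 21 corresponds to about {equated:.1f} on the online scale")
```

Operational equating or linking studies typically involve more elaborate designs and checks than this, and the state would document how well the chosen approach worked.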

Validation efforts should occur during each phase of an assessment’s development and use, and state officials should carefully gather and document evidence of their assessments’ validity, reliability, and freedom from bias.

Tables 3–9 provide relevant information from three key resources in order to assist state officials in their consideration of the evidence that they need to establish the technical quality of their assessments. The three main sources for these tables are:

  • Standards and Assessments Peer Review Guidance (USED, 2004)
    In response to NCLB legislation (Sec. 111[b][3]) and regulations (Sec. 200.2), the U.S. Department of Education (USED) has provided states with guidance regarding the evidence that can be used to demonstrate state compliance with NCLB requirements. See Tables 3a–8a for examples of acceptable and incomplete evidence of technical quality.
  • Title III OELA Monitoring Reports (OELA, 2006)
    The Office of English Language Acquisition, Language Enhancement, and Academic Achievement for Limited English Proficient Students (OELA) has issued guidance for its grantees to use in preparing annual reports. This guidance includes descriptions of critical elements for English Language Proficiency standards and assessments as well as acceptable evidence for these elements. Many of the elements and evidence presented in this OELA document are similar to those in the USED’s Standards and Assessment Peer Review Guidance. Therefore, Tables 3b–8b also present information from the OELA document that is related to the critical elements identified by the Federal Peer Review.
  • Evaluation of the Technical Evidence of Assessments for Special Student Populations (AACC, 2007)
    The AACC offers a comprehensive set of criteria for evaluating the technical evidence associated with assessments for ELLs in particular and special student populations in general. The criteria are based on those developed by Rabinowitz and Sato (2005, 2006) and were validated by a team with expertise in assessment, linguistics, and English language development. These technical criteria are sensitive to the unique characteristics of the student population, the particular purposes of the assessments, and the stage of development and maturity of the assessments. The technical criteria can be found in the document titled Evaluation of the Technical Evidence of Assessments for Special Student Populations.

See Table 9 for a crosswalk between these technical criteria and the critical elements for technical quality identified in the USED’s Standards and Assessment Peer Review Guidance.

Table 3a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—
Critical Element 4.1 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.1 For each assessment, including alternate assessment(s), has the State documented the issue of validity (in addition to the alignment of the assessment with the content standards), as described in the Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999), with respect to all of the following categories:

(a) Has the State specified the purposes of the assessments, delineating the types of uses and decisions most appropriate to each? and

(b) Has the State ascertained that the assessments, including alternate assessments, are measuring the knowledge and skills described in its academic content standards and not knowledge, skills, or other characteristics that are not specified in the academic content standards or grade level expectations? and

(c) Has the State ascertained that its assessment items are tapping the intended cognitive processes and that the items and tasks are at the appropriate grade level? and

(d) Has the State ascertained that the scoring and reporting structures are consistent with the sub-domain structures of its academic content standards (i.e., are item interrelationships consistent with the framework from which the test arises)? and

(e) Has the State ascertained that test and item scores are related to outside variables as intended (e.g., scores are correlated strongly with relevant measures of academic achievement and are weakly correlated, if at all, with irrelevant characteristics, such as demographics)? and

(f) Has the State ascertained that the decisions based on the results of its assessments are consistent with the purposes for which the assessments were designed? and

(g) Has the State ascertained whether the assessment produces intended and unintended consequences?

Examples of Acceptable Evidence

For each assessment, including alternate assessment(s), the State has documented the existing validity evidence in each of the categories and has taken steps to address any deficiencies either in validity or in its approach to establishing and documenting validity evidence.

Possible Evidence

  • For category (a), existing written documentation, such as minutes or policies of the State Board of Education or state legislative code, that defines the purpose(s) of the State’s assessment system.
  • For each of the categories (b) – (g), documentation of the studies that provide evidence in support of the validity of using results from the State’s assessment system for their stated purpose(s).

Examples of Incomplete Evidence

The State has not provided evidence in all categories (a) – (g) or has not taken steps to address any deficiencies either in validity or in its approach to establishing and documenting validity evidence.


Table 3b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Validity

| Critical Element | Examples of Acceptable Evidence |
| 3.1 (c) ELP standards are linked to State content and achievement standards in reading/language arts, math, and science (science in 2005–2006) | Acceptable evidence includes a process and documentation for linkage and alignment, findings from linkage and alignment studies, and state responses to findings. |
| 3.2 (c) ELP assessments are aligned to ELP standards | |
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |


Table 4a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—
Critical Element 4.2 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.2 For each assessment, including alternate assessment(s), has the State considered the issue of reliability, as described in the Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999), with respect to all of the following categories:

(a) Has the State determined the reliability of the scores it reports, based on data for its own student population and each reported subpopulation? and

(b) Has the State quantified and reported within the technical documentation for its assessments the conditional standard error of measurement and student classification that are consistent at each cut score specified in its academic achievement standards? and

(c) Has the State reported evidence of generalizability for all relevant sources, such as variability of groups, internal consistency of item responses, variability among schools, consistency from form to form of the test, and inter-rater consistency in scoring?

Examples of Acceptable Evidence

For each assessment, including alternate assessment(s), the State has documented reliability evidence in each of the categories and has taken steps to address any deficiencies either in reliability or in the State’s approach to establishing and documenting reliability evidence.

Possible Evidence

  • For each of the categories (a) – (c), documentation of the studies that support the reliability of each of the State’s assessments with the State’s own student population.
  • Documentation of the precision of the assessments at cut scores and evidence of a systematic process for addressing any deficiencies identified in these studies.
  • Documentation of consistency of student level classification and evidence of a systematic process for addressing any deficiencies identified in these studies.

Examples of Incomplete Evidence

The State has not provided evidence in all categories (a) – (c) or has not taken steps to address any deficiencies either in reliability or in the State’s approach to establishing and documenting reliability evidence.

Table 4b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Reliability

| Critical Element | Examples of Acceptable Evidence |
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |


Table 5a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—
Critical Element 4.3 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.3 Has the State ensured that its assessment system is fair and accessible to all students, including students with disabilities and students with limited English proficiency, with respect to each of the following issues:

(a) Has the State ensured that the assessments provide an appropriate variety of accommodations for students with disabilities? and

(b) Has the State ensured that the assessments provide an appropriate variety of linguistic accommodations for students with limited English proficiency? and

(c) Has the State taken steps to ensure fairness in the development of the assessments? and

(d) Does the use of accommodations and/or alternate assessments yield meaningful scores?

Examples of Acceptable Evidence

The State has taken appropriate judgmental (e.g., committee review) and data-based (e.g., bias studies) steps to ensure that its assessment system is fair and accessible to all students. Review committees have included representation of identified subgroups.

The State assessment system must be designed to be valid and accessible for use by the widest possible range of students.

The State is conducting studies to determine the appropriateness of accommodations and the impact on test scores.

Possible Evidence

  • Existing written documents describe how the principles of universal design and/or appropriate language simplification were incorporated into each of the State’s assessments.
  • Evidence that students with disabilities were included in the test development process.
  • Existing written documentation of the State’s policies and procedures for the selection and use of accommodations and alternate assessments, including evidence of training for educators who administer these assessments.

Examples of Incomplete Evidence

The State has conducted data-based bias studies but has not convened committees of stakeholders to review its assessment items.

The State has convened committees of stakeholders to review its assessment items but these committees have not included representation of identified subgroups.

The State assessment system is not designed to be valid and accessible for use by the widest possible range of students.

The State does not have a policy on the appropriate selection and use of accommodations and alternate assessments.

The State does not train or monitor personnel at the school, LEA, and State levels with regard to the appropriate selection and use of accommodations and alternate assessments.

There are no appropriate accommodations for students with particular disabilities (e.g., no allowable accommodations on the regular assessment or alternate assessments for students who are visually impaired and need large print or Braille or for students who are significantly physically impaired and need assistive technology).


Table 5b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Fairness

| Critical Element | Examples of Acceptable Evidence |
| 3.2 (d) ELP assessments are of high technical quality, including being valid, reliable, and fair | Acceptable evidence includes technical manuals for ELP assessment(s), including scoring guides, and other documents that describe the ELP assessment(s). |


Table 6a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—
Critical Element 4.4 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.4 When different test forms or formats are used, the State must ensure that the meaning and interpretation of results are consistent.

(a) Has the State taken steps to ensure consistency of test forms over time?

(b) If the State administers both an online and paper-and-pencil test, has the State documented the comparability of the electronic and paper forms of the test?

Examples of Acceptable Evidence

The State has conducted appropriate equating or linking studies and has presented data that support the success of the equating or linking.

Possible Evidence

  • Documentation describing the State’s approach to ensuring comparability of assessments and assessment results across groups and time.
  • Documentation of equating studies that confirm the comparability of the State’s assessments and assessment results across groups and across time, as well as follow-up documentation describing how the State has addressed any deficiencies.

Examples of Incomplete Evidence

The State has not conducted or documented equating studies to establish whether test forms are comparable across time.


Table 6b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Comparability

| Critical Element | Examples of Acceptable Evidence |
| 3.4 (b) If State plans to transition to a new ELP assessment, plan for doing so, including: How State plans to address “comparability” (relationship between old and new ELP assessment; i.e., use of double-testing, bridge studies, judgment procedures, data analysis, or other method). | Acceptable evidence includes plan for establishing comparability (e.g., use of double-testing, bridge studies, judgment procedures, data analysis, or other method), results if available, and plan for developing new AMAOs, if applicable. |


Table 7a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—Critical Element 4.5 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.5 Has the State established clear criteria for the administration, scoring, analysis, and reporting components of its assessment system, including alternate assessment(s), and does the State have a system for monitoring and improving the ongoing quality of its assessment system?

Examples of Acceptable Evidence

The State developed a set of management controls or standards for each of these components and has communicated these criteria to its contractor(s), LEAs, and schools. It requires its contractor(s) to provide specific information on the degree to which each criterion is met.

The State uses an extensive system of training and monitoring to ensure that each person who is responsible for handling or administering any portion of its assessments does so in a way that protects the security of the assessments and maintains equivalence of administration conditions across students and schools.

Possible Evidence

  • The State’s criteria for administration, scoring, analysis, and reporting are communicated to its contractor(s).
  • The State’s test security policy and consequences for violation are communicated to the public and to local educators.
  • Existing written documentation of the State’s plan for training and monitoring assessment administration conditions across the State, even when its assessment system is comprised of only local assessments.
  • Documentation that the tests clearly delineate which accommodations may be used for specific sections of the test (e.g., specify the items/sections for which a calculator may be used without invalidating the test).

Examples of Incomplete Evidence

The State does not have a test security policy.

The State does not train or monitor personnel at the school, LEA, and State levels with regard to its test administration procedures and security policy.

The State provides no criteria to its contractor(s) regarding the quality control and security measures it requires for its assessment system.

The State provides no criteria to its contractor(s) to ensure that the procedures for scoring of open-ended tasks meet industry standards for accuracy.


Table 7b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Test Administration, Scoring, and Reporting

Critical Element Examples of Acceptable Evidence

3.2 (e) If multiple ELP assessments are being used, data can be aggregated for comparison and reporting purposes

Acceptable evidence includes description of how the State ensures that data can be aggregated for comparison and reporting purposes.

3.3 (a) (b) (c) Has the State established and implemented clear criteria for the administration, scoring, analysis, and reporting components of its ELP assessments, and does the State have a system for monitoring and improving the ongoing quality of its assessment systems? (Critical Element 3.3)

(a) ELP assessments are administered in a uniform manner statewide.

(b) Methods for administration, scoring, analysis, and reporting have been established.

(c) The state monitors ELP assessment administration practices.

Acceptable evidence includes:

  • Test administration manuals;
  • Evidence of training on test administration, scoring guides, or other documentation that ELP assessments are administered in a uniform manner Statewide;
  • If accommodations were provided on the ELP assessment to students with disabilities, which accommodations, method for determining accommodations, and number and percentage of students receiving such accommodations;
  • Procedure used by State to ensure that criteria for administration, scoring, analysis, and reporting have been communicated to LEAs;
  • Evidence that the State monitors LEA/school administration of ELP assessments, including process for monitoring assessment administration; and
  • Documentation of the State’s plan for training and monitoring assessment administration conditions.


Table 8a. Standards and Assessment Peer Review Guidance Section 4: Technical Quality—
Critical Element 4.6 (USED, 2004)

Critical Element Examples of Acceptable Evidence Examples of Incomplete Evidence

4.6 Has the State evaluated its use of accommodations?

(a) How has the State ensured that appropriate accommodations are available to students with disabilities and that these accommodations are used in a manner that is consistent with instructional approaches for each student, as determined by a student’s IEP or 504 plan?

(b) How has the State determined that scores for students with disabilities that are based on accommodated administration conditions will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration conditions?

(c) How has the State ensured that appropriate accommodations are available to limited English proficient students and that these accommodations are used as necessary to yield accurate and reliable information about what limited English proficient students know and can do?

(d) How has the State determined that scores for limited English proficient students that are based on accommodated administration circumstances will allow for valid inferences about these students’ knowledge and skills and can be combined meaningfully with scores from non-accommodated administration circumstances?

Examples of Acceptable Evidence

The State provides for the use of appropriate accommodations and has conducted studies to ensure that scores based on accommodated administrations can be meaningfully combined with scores based on the standard administrations.

Possible Evidence

  • The State has analyzed the use of specific accommodations for different groups of students with disabilities and has provided training to support sound decisions by IEP teams.
  • The State routinely monitors the extent to which test accommodations are consistent with those provided during instruction.
  • The State has analyzed the effect of specific accommodations for students with limited English proficiency and has shared results with LEAs and schools.
  • Documentation of the quality and consistency of the accommodations it offers for limited English proficient students (e.g., training of translators, simplified English, standardized translation of instructions for test administration that are comparable to the regular assessment).

Examples of Incomplete Evidence

No analyses have been carried out to determine whether specific accommodations produce the effect intended.

The State does not require that decisions about how students with disabilities will participate in the assessment system be made on an individual basis or specify that these decisions must be consistent with the routine instructional approaches as identified by each student’s IEP and/or 504 plan.

The State uses the same accommodations for limited English proficient students as it uses for students with disabilities.


Table 8b. Critical Elements from Title III OELA Monitoring Reports for ELL Assessments (2006) Related to Accommodations

Critical Element
Per Title III OELA Monitoring Reports, if accommodations are provided on the ELP assessment to students with disabilities, then the state should provide documentation of which accommodations were provided, the method for determining accommodations, and the number and percentage of students receiving such accommodations.
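
As a minimal sketch of how the counts requested in this critical element might be tabulated, the Python snippet below tallies, for a hypothetical list of student records, how many students received each accommodation and what percentage of the group that represents. The record layout (`student_id`, `accommodations`) and the function name `accommodation_summary` are illustrative assumptions, not part of the OELA reporting requirements, and the denominator is assumed to be every student in the list passed in.

```python
from collections import Counter


def accommodation_summary(records):
    """Return {accommodation: (count, percent)} for a list of student records.

    Assumed (hypothetical) record layout:
        {"student_id": "A", "accommodations": ["extended time", ...]}
    Percentages use len(records) as the denominator.
    """
    total = len(records)
    if total == 0:
        return {}
    counts = Counter()
    any_count = 0
    for record in records:
        # A set ensures each student is counted once per accommodation.
        received = set(record["accommodations"])
        counts.update(received)
        if received:
            any_count += 1
    summary = {name: (n, round(100 * n / total, 1)) for name, n in counts.items()}
    summary["any accommodation"] = (any_count, round(100 * any_count / total, 1))
    return summary


# Illustrative data only; not drawn from any State's results.
records = [
    {"student_id": "A", "accommodations": ["extended time"]},
    {"student_id": "B", "accommodations": ["extended time", "large print"]},
    {"student_id": "C", "accommodations": []},
    {"student_id": "D", "accommodations": ["large print"]},
]

summary = accommodation_summary(records)
for name, (n, pct) in sorted(summary.items()):
    print(f"{name}: {n} students ({pct}%)")
```

A State would also need to decide which group forms the denominator (for example, all students with disabilities who took the ELP assessment); the sketch leaves that choice to whoever assembles the list of records.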


Table 9. Crosswalk Between Critical Elements Identified in Standards and Assessment Peer Review Guidance (USED, 2004) and Evaluation of the Technical Evidence of Assessments for Special Student Populations (AACC, 2007)

Notes: Table 9 provides another overview of technical criteria for evaluating the quality of assessments. It lists validated technical criteria by type (validity, reliability, bias and sensitivity) and evidence/method elements one would expect to see in support of each type vis-à-vis the various aspects of test development (e.g., test design and development, item level, test level). These criteria are cross-referenced with the critical elements for technical quality identified in Standards and Assessment Peer Review Guidance (USED, 2004). An “X” indicates evidence that state officials might consider in order to support the technical quality (per Standards and Assessment Peer Review Guidance) of their assessments for special student populations. For more information about the technical criteria presented here, see the document titled Evaluation of the Technical Evidence of Assessments for Special Student Populations (PDF).

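TECHNICAL CRITERIA (first four columns) and PEER REVIEW CRITICAL ELEMENTS: TECHNICAL QUALITY (columns 4.1 through 4.6)

|  |  | Type | Element: Evidence/Method | 4.1 Validity | 4.2 Reliability | 4.3 Fairness/Access | 4.4 Comparability | 4.5 Administration, Scoring, Analysis, Reporting | 4.6 Accommodations |
|---|---|---|---|---|---|---|---|---|---|
| Test Design and Development | Item/Test level | Construct validity | Test purpose | X |  |  |  |  |  |
|  |  |  | Population/classification | X | X | X | X | X |  |
|  |  |  | Theoretical foundation/framework | X |  |  |  |  |  |
|  |  |  | Universal design | X |  | X |  |  |  |
|  |  |  | Readability | X |  | X |  | X |  |
| Test Design and Development | Item level | Content validity | Alignment (items-to-standards) | X |  |  |  | X |  |
|  |  |  | Linkage (items-to-standards, standards-to-standards) | X |  |  |  | X |  |
|  |  |  | Expert judgment | X |  |  |  | X |  |
|  |  |  | p-values/point biserials | X | X |  |  | X |  |
|  |  |  | IRT/item fit | X |  |  |  | X |  |
|  |  |  | Structural equation modeling | X |  |  |  | X |  |
|  |  |  | t-tests | X |  |  |  | X |  |
|  |  |  | ANOVA | X |  |  |  | X |  |
|  |  |  | Factor analysis | X |  |  |  | X |  |
| Test Design and Development | Test level | Construct validity | Equivalence/comparability | X |  |  | X |  | X |
|  |  |  | Multi-trait/multi-method/subtest inter-correlation | X |  |  | X | X |  |
|  |  | Content validity | Test blueprint | X |  |  |  |  |  |
|  |  |  | Alignment (test form-to-blueprint) | X |  |  |  | X |  |
|  |  | Content validity | Descriptive statistics (e.g., central tendency, variation) | X | X |  |  | X |  |
|  |  |  | IRT/test fit | X |  |  |  | X |  |
|  |  |  | Linking/equating | X |  |  | X |  |  |
|  |  | Criterion validity (predictive/concurrent) | Cross tabulations | X |  |  |  | X |  |
|  |  |  | Pearson correlation | X |  |  |  | X |  |
|  |  | Consequential validity | Use of results | X | X | X | X | X |  |
| Test Design and Development | Administration | Construct validity | Accommodation | X | X | X | X | X | X |
|  |  |  | Fidelity | X |  |  | X |  | X |
|  |  |  | Standardization |  | X | X |  | X |  |
| Test Design and Development | Item/Test level | Reliability: Stability & consistency | Standard error of measurement/confidence intervals |  | X |  |  | X |  |
|  |  |  | Test-retest |  | X |  |  | X |  |
|  |  |  | Alternate form |  | X |  | X | X |  |
|  |  | Reliability: Internal consistency | Coefficient alpha |  | X |  |  | X |  |
|  |  |  | KR-21 |  | X |  |  | X |  |
|  |  |  | Test length/power estimates |  | X |  |  | X |  |
|  |  |  | Split-half |  | X |  |  | X |  |
|  |  | Reliability: Generalizability | G-coefficient |  | X |  |  | X |  |
|  |  | Reliability: Classification consistency | Correlation coefficient |  | X |  |  | X |  |
|  |  |  | Percent correspondence |  | X |  |  | X |  |
|  |  |  | Classification error |  | X |  |  | X |  |
|  |  | Bias and sensitivity: Linguistic | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
|  |  | Bias and sensitivity: Ethnicity/race | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
|  |  | Bias and sensitivity: Cultural/religious | Expert review | X |  | X | X |  |  |
|  |  | Bias and sensitivity: Geographic | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
|  |  | Bias and sensitivity: SES | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
|  |  | Bias and sensitivity: Disability | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
|  |  | Bias and sensitivity: Gender | Expert review | X |  | X | X |  |  |
|  |  |  | DIF analysis |  |  |  |  | X |  |
| Field Testing |  | Content validity | Blueprint | X |  |  |  |  |  |
|  |  |  | Sampling | X | X |  |  |  |  |
|  |  |  | Norming | X |  |  | X | X |  |
| Scoring |  | Content validity | Rubric | X | X |  |  | X |  |
|  |  |  | Scale | X | X |  |  | X |  |
|  |  |  | Standard setting (cut score and proficiency levels) | X | X | X |  | X |  |
|  |  |  | Training of scorers/scoring protocol |  | X |  |  | X |  |
|  |  | Reliability: Inter-rater | Correlation (Kappa) |  | X |  |  | X |  |
|  |  |  | Percent correspondence |  | X |  |  | X |  |
| Reporting |  | Consequential validity | Reporting category | X |  |  |  | X | X |
|  |  |  | N | X | X |  |  | X |  |
|  |  |  | Central tendency/variation | X | X |  |  | X |  |
|  |  |  | Effect size | X |  |  | X | X |  |
| Security |  | Consequential validity | Protocols | X | X |  |  | X | X |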
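
Two of the reliability criteria listed in Table 9, coefficient alpha and the standard error of measurement, can be illustrated with a short Python sketch. It uses the standard textbook formulas, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores) and SEM = SD of total scores × sqrt(1 − reliability). The function names and the tiny score matrix are hypothetical and serve only to show the arithmetic; an operational analysis would use full response data, often disaggregated by student group, and typically dedicated psychometric software.

```python
from math import sqrt
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a students-by-items score matrix.

    scores: list of rows, one per student, each a list of item scores.
    Population variances are used for both items and totals; the ratio is
    the same if sample variances are used consistently instead.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(scores, reliability):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    total_sd = sqrt(pvariance([sum(row) for row in scores]))
    return total_sd * sqrt(1 - reliability)


# Illustrative data only: 4 students, 3 dichotomously scored items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = coefficient_alpha(scores)
sem = standard_error_of_measurement(scores, alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.3f}")
```

Running the same computation separately for subgroups such as English language learners or students with disabilities is one way a State might check whether reliability evidence holds for the special populations this document addresses.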


Please cite as: Sato, E., Rabinowitz, S., Worth, P., Gallagher, C., Lagunoff, R., & McKeag, H. (2007). Guidelines for Ensuring the Technical Quality of Assessments Affecting English Language Learners and Students with Disabilities: Development and Implementation of Regulations. (Assessment and Accountability Comprehensive Center report). San Francisco: WestEd.

© 2007 WestEd. All rights reserved.
