Evaluating Four Methods for Detecting Differential Item Functioning in Large-Scale Assessments with More Than Two Groups

doi:10.48550/arXiv.2408.11922

This study evaluated four multi-group differential item functioning (DIF) methods (the root mean square deviation (RMSD) approach, Wald-1, the generalized logistic regression procedure, and the generalized Mantel-Haenszel method) via Monte Carlo simulation under controlled testing conditions. These conditions varied in the number of groups, the ability and sample size of the DIF-contaminated group, the parameter associated with DIF, and the proportion of DIF items. Comparing the Type-I error rates and power of the methods, we found that the RMSD approach yielded the best Type-I error rates when used with model-predicted cutoff values. With the commonly used cutoff value of 0.1, however, this approach was overly conservative. Implications for future research for educational researchers and practitioners were discussed.

Publication:

arXiv e-prints

Pub Date:

August 2024

DOI:

10.48550/arXiv.2408.11922

arXiv:

arXiv:2408.11922

Bibcode:

2024arXiv240811922K

Keywords:

Statistics - Applications

E-Print:

preprint, 16 pages (excluding figures, references, and title page)
