Session Overview
Session
IS3: Recent Methodological Developments for Testing Measurement Invariance
Time:
Friday, 24/Jul/2015, 9:45am - 11:15am

Session Chair: Carolin Strobl
Location: KO2-F-180 (Ⅵ), capacity: 372

Presentations

Recent methodological developments for testing measurement invariance

Chair(s): Carolin Strobl (Universität Zürich, Switzerland)

This symposium gives an overview of recent methodological developments for testing measurement invariance in item response theory, factor analysis, and cognitive diagnosis models.
 

Presentations of the Symposium

 

Detecting violations of measurement invariance in item response theory

Carolin Strobl1, Julia Kopf2, Basil Abou El-Komboz2, Achim Zeileis3; carolin.strobl@uzh.ch
1Universität Zürich, Switzerland, 2LMU München, Germany, 3Universität Innsbruck, Austria

The main aim of educational and psychological testing is to provide a means for objective and fair comparisons between different test takers by establishing measurement invariance. However, in practical test development, measurement invariance is often violated by differential item functioning (DIF), which can lead to an unfair advantage or disadvantage for certain groups of test takers. A variety of statistical methods have been suggested for detecting DIF in item response theory (IRT) models, such as the Rasch model, that are increasingly used in educational and psychological testing. However, most of these methods are designed for the comparison of pre-specified focal and reference groups, such as females vs. males, whereas in reality the group of disadvantaged test takers may be formed by a complex combination of several covariates, such as females only up to a certain age. In this talk, a new framework for DIF detection based on model-based recursive partitioning is presented that can detect groups of test takers exhibiting DIF in a data-driven way. The talk outlines the statistical methodology behind the new approach as well as its practical application for binary and polytomous IRT models.
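[Editor's sketch] Recursive partitioning of the Rasch model is implemented, for example, in the R package "psychotree" (function raschtree()). The following minimal sketch uses simulated data and purely hypothetical variable names to illustrate how such a tree could, in principle, recover a DIF group defined by a combination of covariates (here, females up to a certain age); exact arguments should be checked against the package documentation.

    library("psychotree")                      # provides raschtree()
    set.seed(1)
    n <- 500
    gender <- factor(sample(c("female", "male"), n, replace = TRUE))
    age    <- sample(18:40, n, replace = TRUE)
    theta  <- rnorm(n)                         # person abilities
    beta   <- c(-1, -0.5, 0, 0.5, 1)           # item difficulties
    resp   <- sapply(beta, function(b) rbinom(n, 1, plogis(theta - b)))
    ## introduce DIF in item 3 for females up to age 25 (illustrative only)
    dif <- gender == "female" & age <= 25
    resp[dif, 3] <- rbinom(sum(dif), 1, plogis(theta[dif] - (beta[3] + 1)))
    d <- data.frame(gender = gender, age = age)
    d$resp <- resp
    ## model-based recursive partitioning of the Rasch model over the covariates
    rt <- raschtree(resp ~ gender + age, data = d)
    plot(rt)                                   # item difficulty profiles per subgroup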
 

Score-based tests of measurement invariance with respect to continuous and ordinal variables

Achim Zeileis1, Edgar C. Merkle2, Ting Wang2; Achim.Zeileis@uibk.ac.at
1Universität Innsbruck, Austria, 2University of Missouri, USA

The issue of measurement invariance commonly arises in psychometric models and is typically assessed via likelihood ratio tests, Lagrange multiplier tests, and Wald tests, all of which require advance definition of the number of groups, group membership, and offending model parameters. We present a family of recently proposed measurement invariance tests that are based on the scores of a fitted model (i.e., observation-wise derivatives of the log-likelihood with respect to the model parameters). This family can be used to test for measurement invariance with respect to a continuous auxiliary variable, without pre-specification of subgroups. Moreover, the family can be used when one wishes to test for measurement invariance with respect to an ordinal auxiliary variable, yielding test statistics that are sensitive to violations that are monotonically related to the ordinal variable (and less sensitive to non-monotonic violations). The tests can be viewed as generalizations of the Lagrange multiplier (or score) test, and they are especially useful for identifying subgroups of individuals that violate measurement invariance (without prespecified thresholds) as well as for identifying specific parameters impacted by measurement invariance violations. We illustrate how the tests can be applied in practice in factor-analytic contexts using the R packages "lavaan" for model estimation and "strucchange" for carrying out the tests and visualizing the results.
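[Editor's sketch] As a rough illustration, assuming the generic sctest() from "strucchange" can be applied to a fitted lavaan model via its score (estfun) method, such a test might be invoked as follows; the dataset is lavaan's built-in HolzingerSwineford1939 example, and argument values such as the functional name are from memory and may differ across package versions.

    library("lavaan")                          # model estimation
    library("strucchange")                     # score-based tests via sctest()
    data("HolzingerSwineford1939", package = "lavaan")
    model <- 'visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6'
    fit <- cfa(model, data = HolzingerSwineford1939, meanstructure = TRUE)
    ## test measurement invariance w.r.t. the continuous covariate age (in years),
    ## without pre-specifying subgroups; "DM" = double-maximum functional
    sctest(fit, order.by = HolzingerSwineford1939$ageyr, functional = "DM")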
 

Exact versus approximate measurement invariance. Theoretical overview and empirical examples

Jan Cieciuch, Eldad Davidov, René Algesheimer; jancieciuch@gazeta.pl
Universität Zürich, Switzerland

Measurement invariance is a necessary condition for conducting meaningful comparisons of means and relationships between variables across groups (Vandenberg & Lance, 2000). Measurement invariance implies that the parameters of a measurement model (factor loadings, intercepts) are equal across groups. One of the most frequently used procedures for measurement invariance testing is multigroup confirmatory factor analysis (MGCFA), which compares fit indices between models with parameters constrained to be equal across groups and models with freely estimated parameters. Three levels of measurement invariance are usually distinguished: configural (the same items load on the same factors in each group), metric (factor loadings are constrained to be exactly equal across groups), and scalar (factor loadings and intercepts are constrained to be exactly equal across groups). Establishing measurement invariance in this approach is very difficult, and the method has been criticized as unrealistic and too strict. Muthén and Asparouhov (2013) recently proposed a new approach that tests for approximate rather than exact measurement invariance using Bayesian MGCFA. Approximate measurement invariance permits small differences between parameters (loadings and intercepts) that would be constrained to be exactly equal in the classical exact approach. In the presentation we will discuss the main differences between the exact and approximate approaches to testing for measurement invariance. Furthermore, we will compare the results obtained with both approaches when testing for measurement invariance of the Portrait Value Questionnaire developed by Schwartz and colleagues (2001, 2012) to measure values. The results suggest that the approximate approach is more likely than the exact approach to establish measurement invariance and thereby enable meaningful cross-group comparisons.
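[Editor's sketch] For the exact approach, the three classical invariance levels can be fitted and compared with standard MGCFA software; the following lavaan sketch (built-in example data, illustrative one-factor model) shows the idea. The approximate approach additionally requires Bayesian estimation with small-variance priors on the parameter differences (e.g., in Mplus) and is not shown here.

    library("lavaan")
    data("HolzingerSwineford1939", package = "lavaan")
    model <- 'visual =~ x1 + x2 + x3'
    ## configural: same structure, all parameters free across groups
    configural <- cfa(model, data = HolzingerSwineford1939, group = "school")
    ## metric: factor loadings constrained to exact equality
    metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")
    ## scalar: loadings and intercepts constrained to exact equality
    scalar <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))
    ## compare the nested models (likelihood ratio tests)
    anova(configural, metric, scalar)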
 

Differential item functioning in cognitive diagnosis models

Michel Philipp1, Carolin Strobl1, Achim Zeileis2; m.philipp@psychologie.uzh.ch
1Universität Zürich, Switzerland, 2Universität Innsbruck, Austria

Cognitive diagnosis models (CDMs) are a family of psychometric models for analyzing dichotomous response data. They provide detailed information about mastery or non-mastery of predefined skills, which are required to solve the tested items, and can thus reflect the strengths and weaknesses of the examinees in the form of a skills profile. In the context of educational testing, this means that students can be given detailed feedback on which particular skills they need to practice more, rather than only receiving their overall test performance. However, for reliable interpretation and fair comparisons these models also rely on measurement invariance, which may be violated in practice by differential item functioning (DIF). Taking the simplest version of a CDM, the non-compensatory DINA model, as an example, the talk introduces the general principles of CDMs, explains what DIF means in this context, and presents an overview of recent approaches for detecting DIF in CDMs.
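[Editor's sketch] To make the non-compensatory structure of the DINA model concrete, the following self-contained R sketch simulates responses under its item response rule: an item is solved with probability 1 - slip only if all skills required by the Q-matrix are mastered, and guessed otherwise. The numbers are purely illustrative; such data could then presumably be fitted with a CDM package (e.g., the din() function in the R package "CDM").

    ## DINA response rule: P(X_ij = 1) = (1 - slip_j) if person i masters all
    ## skills required by item j (per the Q-matrix), and guess_j otherwise.
    set.seed(1)
    N <- 1000                                    # examinees
    Q <- rbind(c(1, 0), c(0, 1), c(1, 1))        # Q-matrix: 3 items x 2 skills
    alpha <- matrix(rbinom(N * 2, 1, 0.5), N, 2) # latent skill profiles
    guess <- c(0.20, 0.25, 0.15)
    slip  <- c(0.10, 0.10, 0.20)
    ## eta[i, j] = 1 if person i has mastered every skill required by item j
    eta <- t(apply(alpha, 1,
                   function(a) as.numeric(apply(Q, 1, function(q) all(a >= q)))))
    prob <- eta * matrix(1 - slip, N, 3, byrow = TRUE) +
            (1 - eta) * matrix(guess, N, 3, byrow = TRUE)
    resp <- matrix(rbinom(N * 3, 1, prob), N, 3) # dichotomous responses
    ## DIF would mean that guess/slip parameters of some item differ between
    ## groups of examinees with identical skill profiles.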