Item-level equivalence in the PISA 2022 creative thinking assessment: Gender-related differential item functioning using MIMIC modeling


SÖZER BOZ E., Akbaş D.

Thinking Skills and Creativity, vol. 61, 2026 (SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 61
  • Publication Date: 2026
  • DOI: 10.1016/j.tsc.2026.102211
  • Journal Name: Thinking Skills and Creativity
  • Indexed In: Social Sciences Citation Index (SSCI), Scopus, PsycINFO
  • Keywords: Creative thinking, Differential item functioning, Item-level, MIMIC, PISA 2022
  • Affiliated with Çanakkale Onsekiz Mart University: Yes

Abstract

This study examined item-level measurement equivalence in the PISA 2022 Creative Thinking Assessment, focusing on the presence of gender-related Differential Item Functioning (DIF) and identifying potential sources of DIF effects. The data were drawn from the PISA 2022 international dataset, which included 142,415 students with valid responses; from this full sample, three subsamples of 2000 students were randomly selected for further analyses. A sequential analytic procedure was employed, beginning with Confirmatory Factor Analysis (CFA), followed by the Multiple Indicator Multiple Causes (MIMIC) model for DIF detection, and a mediated MIMIC approach incorporating proficiency scores (mathematics, science, and reading) as mediators. In the full sample, DIF effects were generally trivial, whereas in the random samples, two items demonstrated gender-related DIF. For both items, the DIF effect was small and favored male students. Results from mediated MIMIC models indicated that mathematics and science scores did not significantly account for the observed DIF, whereas reading scores partially mediated the group differences in item responses. These findings suggest that reading proficiency alone does not fully account for the observed gender-related DIF in creative thinking items. Overall, this study provides item-level information relevant to the interpretation of gender-related score differences in large-scale assessments.
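For readers unfamiliar with the method, the MIMIC approach to DIF detection referenced above can be sketched in its standard textbook form (this formulation is a general illustration, not taken from the article itself):

```latex
\[
y_i^{*} = \lambda_i \eta + \beta_i z + \varepsilon_i,
\qquad
\eta = \gamma z + \zeta ,
\]
% \eta      : latent trait (here, creative thinking)
% z         : grouping covariate (here, gender)
% \gamma    : latent mean difference between groups (impact)
% \beta_i   : direct effect of z on item i; \beta_i \neq 0 indicates uniform DIF
```

In the mediated MIMIC extension, the direct path from the covariate to the item is decomposed through a mediator \(m\) (e.g., a proficiency score), \(y_i^{*} = \lambda_i \eta + \beta_i z + \delta_i m + \varepsilon_i\) with \(m = \alpha z + \upsilon\), so that a reduction in \(\beta_i\) after including \(m\) suggests the mediator partially accounts for the DIF effect.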