A semiannual International Research Journal

A Bias Guideline Preempts Experimental DIF Detection

Document Type: Original Article

Author

Assistant Professor of TEFL, Department of English Translation, Hazrat-e Masoumeh University, Qom, Iran

Abstract
The present article reports on a judgmental analysis of the items in the English subtest of the Iranian University Entrance Exam (2009), investigating possibly biased items. Many statistical techniques, such as logistic regression, the Mantel-Haenszel method, and IRT-based approaches, have been developed to detect differential item functioning (DIF) and biased items. These techniques require a pilot administration of the test to a sizeable number of subjects. Sometimes, however, pre-testing is not possible, and only subjective, judgmental analysis can be used to detect potentially biased items. Of course, such judgments should be informed by research findings and experts' opinions. This study suggests that research findings and expert opinion can be combined to create a bias guideline for language test development. If research sheds enough light on the issue of biased items and tests, a bias guideline may preempt experimental DIF detection. This study drew on previous research findings, experts' opinions on bias, and the author's intuition to propose a bias guideline and identify possibly biased items or bundles (groups of items) in the Iranian University Entrance Exam.
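As a point of comparison with the judgmental approach the abstract argues for, the Mantel-Haenszel statistic it mentions can be sketched in a few lines of plain Python. The function name, the counts, and the ETS delta cut-off below are illustrative assumptions, not material from the article itself.

```python
import math

def mantel_haenszel_delta(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    Each stratum (one level of matched total score) is a 2x2 table:
    (reference_correct, reference_wrong, focal_correct, focal_wrong).
    """
    num = den = 0.0
    for a, b, c, d in strata:          # a, b: reference group; c, d: focal group
        t = a + b + c + d              # stratum size
        num += a * d / t
        den += b * c / t
    alpha = num / den                  # MH common odds ratio (1.0 = no DIF)
    delta = -2.35 * math.log(alpha)    # ETS delta scale; |delta| >= 1.5 is a common flag
    return alpha, delta

# Hypothetical counts for one item, stratified by total test score
strata = [(40, 10, 30, 20), (30, 5, 20, 15), (20, 2, 10, 8)]
alpha, delta = mantel_haenszel_delta(strata)
print(f"MH odds ratio = {alpha:.2f}, ETS delta = {delta:.2f}")
```

An odds ratio above 1 means the item favors the reference group at matched ability levels. The sketch also makes the abstract's practical point concrete: the counts must come from a pilot administration, which is exactly what a judgmental bias guideline dispenses with.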

Keywords


References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.
Brown, A., & Iwashita, N. (1996). Language background and item difficulty: The development of a computer adaptive test of Japanese. System, 24, 199–206.
Brown, J. D. (1996). Testing in language programs. Prentice Hall Regents.
Chapelle, C., & Roberts, C. (1986). Ambiguity tolerance and field dependence as predictors of proficiency in English as a second language. Language Learning, 36, 27–45.
Chen, Z., & Henning, G. (1985). Linguistic and cultural bias in language proficiency tests. Language Testing, 2, 155–63.
Elder, C. (1996). The effect of language background on foreign language test performance: the case of Chinese, Italian, and modern Greek. Language Learning, 46, 233–82.
Elder, C. (1997). What does test bias have to do with fairness? Language Testing, 14, 261–277.
Farhady, H. (1982). Measures of language proficiency from the learner’s perspective. TESOL Quarterly, 16, 43–59.
Gipps, C. V. (1995). Beyond Testing. London: The Falmer Press.
Hale, G. A. (1988). Student major field and text content: interactive effects on reading comprehension in the Test of English as a Foreign Language. Language Testing, 5, 49–61.
Hansen, J., & Stanfield, C. (1981). The relationship between field dependence- independence cognitive styles and foreign language achievement. Language Learning, 31, 349–367.
Hansen, L. (1984). Field dependence-independence and language testing: evidence from six Pacific island cultures. TESOL Quarterly, 18, 311–324.
Hellekant, J. (1994). Are multiple-choice tests unfair to girls? System, 22, 349–352.
Henning, G. (1987). A guide to language testing. Newbury House Publishers.
Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18, 89–114.
Kunnan, A.J. (1990). DIF in native language and gender groups in an ESL placement test. TESOL Quarterly, 24, 741–746.
Lee, Y. W., Breland, H., & Muraki, E. (2004). Comparability of TOEFL CBT writing prompts for different native language groups (TOEFL Research Reports No. RR–77). Princeton, NJ: Educational Testing Service. Retrieved February 22, 2006, from http://www.ets.org/Media/Research/pdf/RR-04-24.pdf
McNamara, T. F., & Roever, C. (2007). Language testing: the social dimension. Blackwell Publishing.
Pae, T.-I. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21, 53–73.
Pae, T.-I., & Park, G.-P. (2006). Examining the relationship between differential item functioning and differential test functioning. Language Testing, 23, 475–496.
Richards, J. C., & Schmidt, R. (2002). Dictionary of language teaching and applied linguistics. Pearson Education Limited.
Ryan, K., & Bachman, L. F. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9, 12–29.
Shohamy, E. (1997). Testing methods, testing consequences: are they ethical? Are they fair? Language Testing, 14, 340–349.
Spurling, S., & Ilyin, D. (1985). The impact of learner variables on language test performance. TESOL Quarterly, 19, 283–301.
Stanfield, C., & Hansen, J. (1983). Field dependence-independence as a variable in second language cloze test performance. TESOL Quarterly, 17, 29–38.
Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17, 323–40.
Zeidner, M. (1986). Are English language aptitude tests biased towards culturally different minority groups? Some Israeli findings. Language Testing, 3, 80–95.
Zeidner, M. (1987). A comparison of ethnic, sex, and age biases in the predictive validity of English language aptitude tests: Some Israeli data. Language Testing, 4, 55–71.
Zumbo, B. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defence. Retrieved February 22, 2006, from http://www.educ.ubc.ca/faculty/zumbo/DIF/index.html
Zumbo, B. (2003). Does item-level DIF manifest itself in scale-level analyses? Implications for translating language tests. Language Testing, 20, 136–47.
Volume 1, Issue 1
March 2018
Pages 147–172

Receive Date: 17 April 2024