Research Article

An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test

Year 2019, Volume: 10, Issue: 3, 219-234, 04.09.2019
https://doi.org/10.21031/epod.477857

Abstract

Great emphasis is given to the development of high-stakes tests around the world and in Turkey; however, limited attention is paid to adequate score reporting. Heavy emphasis on rankings, with almost no emphasis on performance level descriptors (the meaning of the scores), has led to a "ranking culture" in Turkey, and there is an immense need to raise awareness about score reporting and performance level descriptions. This study aims to raise awareness about the use of performance level descriptors in a high-stakes exam in Turkey, an English proficiency exam. The study sample consisted of 630 undergraduate students who took the 2016-2017 English proficiency exam of a public university in southwestern Turkey. To identify potential exemplar items, two types of item mapping methods (an empirical method and a model-based method) were used. Grouping items for performance level descriptors produced a hierarchical and interpretable structure. Using these descriptors, it is possible to give each student criterion-referenced feedback on his or her reading ability.
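
The abstract summarizes the two item mapping approaches without computational detail. The following is a minimal sketch of the general idea behind response-probability item mapping, assuming a Rasch/2PL-style IRT model and the widely used RP67 convention (an item maps to the ability level at which a test taker has a 67% chance of answering it correctly); the rp_location function and the item difficulties are hypothetical illustrations, not the authors' actual items or procedure.

```python
import math

def rp_location(b: float, a: float = 1.0, rp: float = 0.67) -> float:
    """Ability level at which the response probability equals `rp`.

    Under a 2PL IRT model, P(correct | theta) = 1 / (1 + exp(-a * (theta - b))).
    Solving P(theta) = rp for theta gives the item's mapped location.
    """
    return b + math.log(rp / (1.0 - rp)) / a

# Hypothetical item difficulties in logits. With RP67, each item maps to a
# point roughly 0.71 logits above its difficulty (ln(0.67 / 0.33) / a).
items = {"easy_item": -1.2, "medium_item": 0.0, "hard_item": 1.4}
for name, b in sorted(items.items(), key=lambda kv: kv[1]):
    print(f"{name}: RP67 location = {rp_location(b):+.2f} logits")
```

Items whose mapped locations fall inside a given score band can then serve as exemplars when drafting that band's performance level descriptor.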


Details

Primary Language English
Section Articles
Authors

Serkan Arıkan
Sevilay Kilmen (ORCID: 0000-0002-5432-7338)
Mehmet Abi (ORCID: 0000-0002-4976-5173)
Eda Üstünel (ORCID: 0000-0003-2137-1671)

Publication Date September 4, 2019
Acceptance Date June 30, 2019
Published in Issue Year 2019, Volume: 10, Issue: 3

Cite

APA Arıkan, S., Kilmen, S., Abi, M., & Üstünel, E. (2019). An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test. Journal of Measurement and Evaluation in Education and Psychology, 10(3), 219-234. https://doi.org/10.21031/epod.477857