Research Article

An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test

Year 2019, Volume: 10, Issue: 3, 219-234, 04.09.2019
https://doi.org/10.21031/epod.477857

Abstract

Great emphasis is given to the development of high-stakes tests around the world and in Turkey; however, limited attention is paid to adequate score reporting. Heavy emphasis on rankings, with almost no emphasis on performance level descriptors (the meaning of the scores), has led to a "ranking culture" in Turkey, and there is an immense need to raise awareness about score reporting and performance level descriptions. This study aims to raise awareness about the use of performance level descriptors in a high-stakes exam in Turkey, an English proficiency exam. The study sample consisted of 630 undergraduate students who took the 2016-2017 English proficiency exam of a public university in southwestern Turkey. To identify potential exemplar items, two types of item mapping methods (an empirical method and a model-based method) were used. Grouping items for performance level descriptors produced a hierarchical and interpretable structure. Using these descriptors, it is possible to give each student criterion-referenced feedback on his or her reading ability.
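
The abstract summarizes the two item mapping approaches without computational detail. The following is a minimal sketch of the general idea behind response-probability item mapping, assuming a Rasch/2PL-style IRT model and the widely used RP67 convention (an item maps to the ability level at which a test taker has a 67% chance of answering it correctly); the rp_location function and the item difficulties are hypothetical illustrations, not the authors' actual items or procedure.

```python
import math

def rp_location(b: float, a: float = 1.0, rp: float = 0.67) -> float:
    """Ability level at which the response probability equals `rp`.

    Under a 2PL IRT model, P(correct | theta) = 1 / (1 + exp(-a * (theta - b))).
    Solving P(theta) = rp for theta gives the item's mapped location.
    """
    return b + math.log(rp / (1.0 - rp)) / a

# Hypothetical item difficulties in logits. With RP67, each item maps to a
# point roughly 0.71 logits above its difficulty (ln(0.67 / 0.33) / a).
items = {"easy_item": -1.2, "medium_item": 0.0, "hard_item": 1.4}
for name, b in sorted(items.items(), key=lambda kv: kv[1]):
    print(f"{name}: RP67 location = {rp_location(b):+.2f} logits")
```

Items whose mapped locations fall inside a given score band can then serve as exemplars when drafting that band's performance level descriptor.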


Details

Primary Language English
Section Articles
Authors

Serkan Arıkan
Sevilay Kilmen (ORCID: 0000-0002-5432-7338)
Mehmet Abi (ORCID: 0000-0002-4976-5173)
Eda Üstünel (ORCID: 0000-0003-2137-1671)

Publication Date September 4, 2019
Acceptance Date June 30, 2019
Published in Issue Year 2019, Volume: 10, Issue: 3

Cite

APA Arıkan, S., Kilmen, S., Abi, M., & Üstünel, E. (2019). An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test. Journal of Measurement and Evaluation in Education and Psychology, 10(3), 219-234. https://doi.org/10.21031/epod.477857