Enhancing content validity assessment with item response theory modeling

  1. Rodrigo Schames Kreitchmann
  2. Pablo Nájera
  3. Susana Sanz
  4. Miguel Ángel Sorrel
Journal: Psicothema

ISSN: 0214-9915, 1886-144X

Year of publication: 2024

Volume: 36

Issue: 2

Pages: 145-153

Type: Article

DOI: 10.7334/psicothema2023.208 (open access)

Abstract

Background: Ensuring the validity of assessments requires a thorough examination of a test's content. Subject-matter experts (SMEs) are commonly employed to judge the relevance, representativeness, and appropriateness of the items. This article proposes integrating item response theory (IRT) into SME ratings. IRT yields discrimination and threshold parameters for each SME, providing evidence of their performance in differentiating relevant from irrelevant items, detecting suboptimal performance, and also improving the estimation of item relevance. Method: The IRT approach was compared with traditional indices (the content validity index and Aiken's V) on conscientiousness items. We evaluated how accurately the SMEs discriminated whether or not the items measured conscientiousness, and whether their ratings predicted the items' factor loadings. Results: IRT scores identified the conscientiousness items well (R2 = 0.57) and predicted their factor loadings (R2 = 0.45). They also showed incremental validity, explaining between 11% and 17% more variance than the traditional indices. Conclusions: Incorporating IRT into SME ratings improves item alignment and better predicts factor loadings, strengthening the content validity of measurement instruments.
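
Note on the indices compared: Aiken's V for an item is V = sum(r_i - l) / [n(c - 1)], where the r_i are the n expert ratings, l is the lowest rating category, and c is the number of categories; the content validity index is the proportion of experts rating the item as relevant. The IRT approach instead fits a graded response model (Samejima, 1968) to the item-by-expert rating matrix, treating each SME as a model "item", so that discrimination and threshold parameters characterize each expert and the latent scores estimate item relevance. The following R sketch, using the mirt package cited below (Chalmers, 2012), is only a minimal illustration under assumed inputs; the 1-5 rating scale, sample sizes, and object names (ratings, grm_fit) are hypothetical and do not reproduce the authors' analysis code.

    # Hypothetical data: one row per test item, one column per subject-matter
    # expert (SME); entries are relevance ratings on a 1-5 scale
    set.seed(123)
    ratings <- matrix(sample(1:5, 30 * 8, replace = TRUE), nrow = 30, ncol = 8)

    # Traditional index: Aiken's V per item, V = sum(r - l) / (n * (c - 1))
    aiken_v <- apply(ratings, 1, function(r) sum(r - 1) / (length(r) * (5 - 1)))

    # IRT approach: unidimensional graded response model in which each SME
    # plays the role of an "item"
    library(mirt)
    grm_fit <- mirt(data.frame(ratings), model = 1, itemtype = "graded")

    # Discrimination (a) and threshold (b) parameters per expert: a low a
    # flags SMEs who barely separate relevant from irrelevant items
    coef(grm_fit, IRTpars = TRUE, simplify = TRUE)

    # Latent relevance estimate per test item, to be compared with Aiken's V
    # or used to predict empirical factor loadings
    relevance <- fscores(grm_fit)

In the study summarized above, it is latent relevance scores of this kind that are reported to predict the items' factor loadings better than the traditional indices.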

References

  • Abad, F. J., Sorrel, M. A., Garcia, L. F., & Aluja, A. (2018). Modeling general, specific, and method variance in personality measures: Results for ZKA-PQ and NEO-PI-R. Assessment, 25(8), 959–977. https://doi.org/10.1177/1073191116667547
  • Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40(4), 955–959. https://doi.org/10.1177/001316448004000419
  • Almanasreh, E., Moles, R., & Chen, T. F. (2019). Evaluation of methods used for estimating content validity. Research in Social and Administrative Pharmacy, 15(2), 214–221. https://doi.org/10.1016/j.sapharm.2018.03.066
  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME] (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with States’ content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21–29. https://doi.org/10.1111/j.1745-3992.2003.tb00134.x
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
  • Collado, S., Corraliza, J. A., & Sorrel, M. A. (2015). Spanish version of the Children’s Ecological Behavior (CEB) scale. Psicothema, 27(1), 82–87. https://doi.org/10.7334/psicothema2014.117
  • Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources.
  • Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological Measurement, 7(1), 3–13. https://doi.org/10.1177/014662168300700102
  • García, P. E., Díaz, J. O., & Torre, J. de la. (2014). Application of cognitive diagnosis models to competency-based situational judgment tests. Psicothema, 26(3), 372–377. https://doi.org/10.7334/psicothema2013.322
  • Gómez-Benito, J., Sireci, S., & Padilla, J.-L. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30, 104–109. https://doi.org/10.7334/psicothema2017.183
  • Jennrich, R. I., & Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika, 76(4), 537–549. https://doi.org/10.1007/s11336-011-9218-4
  • Kreitchmann, R. S., Abad, F. J., Ponsoda, V., Nieto, M. D., & Morillo, D. (2019). Controlling for response biases in self-report scales: Forced-choice vs. psychometric modeling of likert items. Frontiers in Psychology, 10, Article 2309. https://doi.org/10.3389/fpsyg.2019.02309
  • Li, X., & Sireci, S. G. (2013). A new method for analyzing content validity data using multidimensional scaling. Educational and Psychological Measurement, 73(3), 365–385. https://doi.org/10.1177/0013164412473825
  • Lunz, M. E., Stahl, J. A., & Wright, B. D. (1994). Interjudge reliability and decision reproducibility. Educational and Psychological Measurement, 54(4), 913–925. https://doi.org/10.1177/0013164494054004007
  • Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331–345. https://doi.org/10.1207/s15324818ame0304_3
  • Martone, A., & Sireci, S. G. (2009). Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research, 79(4), 1332–1361. https://doi.org/10.3102/0034654309341375
  • Martuza, V. R. (1977). Applying norm-referenced and criterion-referenced measurement in education. Allyn and Bacon.
  • Mastaglia, B., Toye, C., & Kristjanson, L. J. (2003). Ensuring content validity in instrument development: Challenges and innovative approaches. Contemporary Nurse, 14(3), 281–291. https://doi.org/10.5172/conu.14.3.281
  • McCoach, D. B., Gable, R. K., & Madura, J. P. (2013). Instrument development in the affective domain: School and corporate applications. Springer. https://doi.org/10.1007/978-1-4614-7135-6
  • Nájera, P., Abad, F. J., & Sorrel, M. A. (2021). Determining the number of attributes in cognitive diagnosis modeling. Frontiers in Psychology, 12, Article 614470.
  • Nieto, M. D., Abad, F. J., Hernández-Camacho, A., Garrido, L. E., Barrada, J. R., Aguado, D., & Olea, J. (2017). Calibrating a new item pool to adaptively assess the Big Five. Psicothema, 29(3), 390–395. https://doi.org/10.7334/psicothema2016.391
  • Oltmanns, J. R., & Widiger, T. A. (2020). The five-factor personality inventory for ICD-11: A facet-level assessment of the ICD-11 trait model. Psychological Assessment, 32(1), 60–71. https://doi.org/10.1037/pas0000763
  • Penfield, R. D., & Giacobbi, P. R., Jr. (2004). Applying a score confidence interval to Aiken’s item content-relevance index. Measurement in Physical Education and Exercise Science, 8(4), 213–225. https://doi.org/10.1207/s15327841mpee0804_3
  • Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
  • Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14. https://doi.org/10.3102/0013189X031007003
  • R Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
  • Rios, J., & Wells, C. (2014). Validity evidence based on internal structure. Psicothema, 26(1), 108–116. https://doi.org/10.7334/psicothema2013.260
  • Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101–138.
  • Rovinelli, R. J., & Hambleton, R. K. (1977). On the use of content specialists in the assessment of criterion-referenced test item validity. Dutch Journal of Educational Research, 2, 49–60.
  • Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., & Rauch, S. (2003). Objectifying content validity: Conducting a content validity study in social work research. Social Work Research, 27(2), 94–104. https://doi.org/10.1093/swr/27.2.94
  • Samejima, F. (1968). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2), 100.
  • Sireci, S. G. (1998a). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299–321. https://doi.org/10.1207/s15326977ea0504_2
  • Sireci, S. G. (1998b). The construct of content validity. Social Indicators Research, 45(1/3), 83–117.
  • Sireci, S., & Benítez, I. (2023). Evidence for test validation: A guide for practitioners. Psicothema, 35(3), 217–226. https://doi.org/10.7334/psicothema2022.477
  • Sireci, S. G., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 26(1), 100–107. https://doi.org/10.7334/psicothema2013.256
  • Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20(4), 345–354.
  • Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4), 397–412. https://doi.org/10.1007/BF02293705
  • Waugh, M. H., McClain, C. M., Mariotti, E. C., Mulay, A. L., DeVore, E. N., Lenger, K. A., Russell, A. N., Florimbio, A. R., Lewis, K. C., Ridenour, J. M., & Beevers, L. G. (2021). Comparative content analysis of self-report scales for level of personality functioning. Journal of Personality Assessment, 103, 161–173. https://doi.org/10.1080/00223891.2019.1705464
  • Webb, N. L. (2007). Issues related to judging the alignment of curriculum standards and assessments. Applied Measurement in Education, 20(1), 7–25. https://doi.org/10.1080/08957340709336728
  • Wu, M. (2017). Some IRT-based analyses for interpreting rater effects. Psychological Test and Assessment Modeling, 59(4), 453–470.