Enhancing computational semantic representations in distributional models of language for Automated Summary Evaluation
- José Antonio León Cascón, Supervisor
- Ricardo Olmos Albacete, Supervisor
Defending university: Universidad Autónoma de Madrid
Defense date: 23 October 2020
- Juan Botella Ausina, Chair
- Rafael Marcos Ruiz Rodríguez, Secretary
- Paul van den Broek, Member
Type: Thesis
Abstract
The general aim of this doctoral dissertation is to compare and analyze the performance of two computational methods, the cosine-based similarity method and the Inbuilt Rubric method, in different Automated Summary Evaluation tasks. Although both methods use the same knowledge corpus, they generate qualitatively different vector representations. The cosine-based similarity method is a classic, standard measure in distributional models of language that generates a holistic evaluation of texts. In contrast, the Inbuilt Rubric method transforms the latent semantic space and generates a multi-vector evaluation that is more analytic. In this dissertation, the performance of both methods is compared against the evaluations of expert human judges. We then aim to obtain different sources of validity evidence for the Inbuilt Rubric method as an evaluation tool for constructed responses (in our case, student summaries) through a progressive and systematic approach that combines computational methods and psychometrics, using criteria such as convergent and discriminant validity, exploratory factor analysis, or structural equation models. The performance of both computational methods was compared on a total of 1,236 summaries written by 726 high school and undergraduate students. These summaries are distributed among four empirical chapters. In the first empirical chapter, we conducted a between-subjects study with 166 participants to analyze some relevant parameters of the Inbuilt Rubric method (the number of lexical descriptors per concept, and the weighting of meaningful dimensions by abstract dimensions) in comparison with the Golden Summary method (Martínez-Huertas, Jastrzebska, Mencu, Moraleda, Olmos, & León, 2018).
In the second empirical chapter, we conducted a within-subjects study with 100 participants to analyze the performance of the Inbuilt Rubric and Golden Summary methods and to obtain different sources of validity evidence, such as sensitivity to the students' academic level, using human criteria based on assessment rubrics and multiple-choice tests (Martínez-Huertas, Jastrzebska, Olmos, & León, 2019). In the third empirical chapter, we conducted a between-subjects study with 255 participants to analyze the convergent and discriminant validity of the specific scores of the Inbuilt Rubric method in comparison with the partial contents similarity method; the multicollinearity and similarity of the semantic representations of both computational methods were also compared (Martínez-Huertas, Olmos, & León, submitted). In the fourth empirical chapter, we conducted a within-subjects study with 205 participants to analyze the scores of the Inbuilt Rubric method from a perspective centered on the measurement model of factor scores, and we showed the usefulness of latent factors for improving the convergent and discriminant validity of computational scores through exploratory factor analysis and structural equation models. Moreover, in this study, we developed and validated an alternative version of the Inbuilt Rubric method that does not require selecting lexical descriptors. The results of the different studies in this dissertation show that the Inbuilt Rubric method performs better than the cosine-based similarity method in Automated Summary Evaluation. Moreover, we analyzed this method using a validity-centered approach, which is closely related to classic psychometric concepts that are not very common in computational science (such as convergent and discriminant validity or construct validity via factor analysis). This can lead to the design of better psychoeducational assessments using computational models of language.
We studied student summaries here, but the findings can be useful for measuring different psychological constructs. Thus, we think that the theoretical and methodological framework of this doctoral dissertation can open new research lines that generate assessment tools for relevant psychological constructs through the combination of computational semantics and psychometrics.
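
As an illustration of the holistic, cosine-based evaluation mentioned in the abstract, the following minimal sketch scores a student summary against an expert reference (a "golden summary") by the cosine between their vectors. The vectors, their four dimensions, and the names `cosine_similarity`, `golden_summary`, and `student_summary` are hypothetical and chosen only for illustration; they are not taken from the dissertation, whose vectors come from a latent semantic space trained on a knowledge corpus.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention assumed here: an all-zero (empty) text gets a score of 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; a real latent semantic space has many more dimensions.
golden_summary = [0.8, 0.1, 0.3, 0.0]
student_summary = [0.6, 0.2, 0.4, 0.1]

# A single number summarizes the whole text, which is why the evaluation is holistic.
holistic_score = cosine_similarity(student_summary, golden_summary)
print(round(holistic_score, 3))
```

The single score says how close the summary is to the reference overall, but not which contents are present or missing. That limitation is what the more analytic, multi-vector evaluation described in the abstract is meant to address.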