E-assessment of free-text answers based on domain specific sublanguages and knowledge representation

González Barbone, Víctor

Supervised by:

Martín Llamas Nistal Director

Defence university: Universidade de Vigo

Defence date: 17 June 2013

Committee:

Manuel Alonso Castro Gil Chair
Luis E. Anido Rifón Secretary
Pablo Belzarena García Committee member

Type: Thesis

Teseo: 338162

Abstract

Free-text answers can broadly be assessed from two viewpoints: correctness of knowledge content and quality of writing. Content evaluation ultimately amounts to comparing students' answers against a reference answer or model. The reference may be obtained from teachers' answers, manually marked students' answers, a corpus of reference texts, or structures such as a semantic network. Techniques used for free-text assessment include keyword analysis, latent semantic analysis (LSA), surface linguistic features, text categorization, information extraction, and clustering. Many different systems exist, applied to different subject areas and using different evaluation metrics, which makes them very difficult to compare. The main weakness of most of these systems is the lack of a large enough corpus of marked answers or of some other reliable reference. Variation in marking among teachers has also proved to be a source of distortion. In this work, the reference model is a semantic network inferred from text written in an easy-to-use sublanguage, built by the students themselves as a learning activity. This is expected to overcome the drawbacks of existing systems, since no training set is required, the reference is agreed upon by teachers and students, texts are written in a predictable way, terms are used uniformly, and the assessment system works in the same way the reference was created. Moreover, the proposal in this thesis is conceived mainly as a learning activity, supported by a learning tool, which can also be used in assessment. The construction of a sublanguage usable by secondary school students is discussed, and some experimental sublanguages are developed. Answers written in the sublanguage can be converted into a semantic network, which can be compared against a reference semantic network.
The reference semantic network is put together gradually by the students themselves, as an activity in the learning process, similar to taking notes, writing a summary or building a concept map. A proof-of-concept prototype includes facilities for guiding the writing in a sublanguage, transforming the resulting text into a semantic network, and colouring the reference semantic network to show which parts of the answer have been recognized, together with a mark. Assessment is prepared in a learning activity, marking is automatic, and feedback to students is immediate. Several lexical resources are identified, tested and used to build a general-purpose lexicon suitable for different domains. Several versions of generative grammars are proposed and analyzed. A methodology for sublanguage development is proposed. Knowledge representation is studied in its applicability to education and learning, identifying the most promising techniques. Several possible schemes of sublanguage and knowledge representation are analyzed, each targeted at a different purpose. A complete example of a syntax-based sublanguage, with a semantic network as the knowledge representation model, is presented and tested using the prototype application developed. The original contributions of this thesis result from the combination of assessment, sublanguages and knowledge representation models in a practicable proposal that effectively integrates assessment into the learning process, and from a satisfactory end-to-end test of the system that demonstrates its feasibility.
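
The pipeline described in the abstract (sublanguage text, semantic network, comparison against a reference, colouring and mark) can be illustrated with a minimal Python sketch. It is not the thesis prototype: the three-relation toy sublanguage, the representation of a semantic network as a set of (subject, relation, object) triples, the function names and the mark computed as the fraction of reference triples recognized are all assumptions made for illustration only.

```python
import re

# Hypothetical toy sublanguage: "<subject> <relation> <object>." using a
# closed set of relations. The thesis sublanguages are not reproduced here.
RELATIONS = ("is a", "has", "produces")


def strip_article(phrase):
    """Drop a leading article so 'a leaf' and 'leaf' name the same node."""
    return re.sub(r"^(a|an|the) ", "", phrase.strip())


def parse(text):
    """Convert sublanguage sentences into a set of (subject, relation, object) triples."""
    triples = set()
    for sentence in re.split(r"[.\n]+", text.lower()):
        words = " ".join(sentence.split())  # normalise whitespace
        for rel in RELATIONS:
            subject, sep, obj = words.partition(f" {rel} ")
            if sep:  # the relation occurs in this sentence
                triples.add((strip_article(subject), rel, strip_article(obj)))
                break
    return triples


def assess(reference_text, answer_text):
    """Colour each reference triple and compute a mark (assumed: fraction recognized)."""
    reference = parse(reference_text)
    answer = parse(answer_text)
    colouring = {
        triple: ("recognized" if triple in answer else "missing")
        for triple in reference
    }
    mark = len(reference & answer) / len(reference) if reference else 0.0
    return mark, colouring


reference_text = (
    "A plant is a living being. A plant has leaves. "
    "A leaf produces oxygen. A leaf has chlorophyll."
)
answer_text = "Plant has leaves. Leaf produces oxygen. Leaf has stomata."

mark, colouring = assess(reference_text, answer_text)
print(f"mark: {mark:.2f}")
for triple, colour in sorted(colouring.items()):
    print(colour, triple)
```

In this sketch the answer statement about stomata is simply ignored because it has no counterpart in the reference; whether such statements should be penalized or reported is a design choice the sketch leaves open.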