Adverse drug reaction extraction on electronic health records written in Spanish

SANTISO GONZALEZ, SARA

Supervised by:

Alicia Pérez Ramírez Director
Arantza Casillas Rubio Director

Defence university: Universidad del País Vasco - Euskal Herriko Unibertsitatea

Defence date: 13 June 2019

Committee:

Raquel Martínez Unanue Chair
Arantza Díaz de Ilarraza Sánchez Secretary
Lluís Padró Cirera Committee member

Type: Thesis

Teseo: 149860

Abstract

This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) from Electronic Health Records (EHRs), that is, on extracting responses to a medicine that are noxious and unintended and that occur at doses normally used. From a Natural Language Processing (NLP) perspective, this was approached as a relation extraction task in which the drug is the causative agent of a disease, sign or symptom, that is, the adverse reaction.

ADR extraction from EHRs involves major challenges. First, ADRs are rare events: relations between drugs and diseases found in an EHR are seldom ADRs (the entities are often unrelated or, instead, related as treatment). This implies inference from samples with a skewed class distribution. Second, EHRs are written by experts, often under time pressure, employing rich medical jargon together with colloquial expressions (not always grammatical), and it is not infrequent to find misspellings and both standard and non-standard abbreviations. All this leads to high lexical variability.

We explored several ADR detection algorithms and representations to characterize the ADR candidates. In addition, we assessed the tolerance of the ADR detection model to external noise, such as the incorrect detection of the medical entities involved in the ADR extraction, i.e. drugs and diseases. We established the first steps toward ADR extraction in Spanish using a corpus of real EHRs.