Learning-based detection of scientific terms in patient information

Publication type
P1
Publication status
Published
Authors
Hoste, V., Lefever, E., Vanopstal, K., & Delaere, I.
Series
LREC 2008: sixth international conference on language resources and evaluation
Pagination
585-591
Publisher
European Language Resources Association (ELRA) (Paris, France)
Conference
6th International conference on Language Resources and Evaluation (LREC 2008) (Marrakech, Morocco)

Abstract

In this paper, we investigate a machine-learning-based approach to the specific problem of scientific term detection in patient information. Lacking lexical databases that distinguish scientific from popular medical terms, we used local context, morphosyntactic, morphological and statistical information to design a learner that accurately detects scientific medical terms. This study is the first step towards the automatic replacement of a scientific term by its popular counterpart, which should have a beneficial effect on readability. We show an F-score of 84% for the prediction of scientific terms in an English and Dutch EPAR corpus. Since recasting the term extraction problem as a classification problem leads to a strongly skewed data set, we rebalanced the data set by applying some simple TF-IDF-based and log-likelihood-based filters. We show that filtering indeed has a beneficial effect on the learner's performance. However, the results of the filtering approach combined with the learning-based approach remain below those of the learning-based approach alone.
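
The abstract does not specify how the TF-IDF-based filter is defined, so the following is only a minimal sketch of the general idea, under our own assumptions: tokens whose best TF-IDF score across documents does not exceed a threshold are discarded before classification, which removes many obvious non-terms and reduces class skew. The function names `tfidf_scores` and `filter_candidates`, the threshold value, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return the highest TF-IDF score each token reaches in any document.

    `docs` is a list of token lists. This scoring scheme is an assumption
    made for illustration; the paper's exact filter may differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for tok, count in counts.items():
            tf = count / len(doc)               # relative term frequency
            idf = math.log(n_docs / df[tok])    # 0.0 if the token is in every document
            best[tok] = max(best.get(tok, 0.0), tf * idf)
    return best


def filter_candidates(docs, threshold):
    """Keep only tokens whose best TF-IDF score is above the threshold."""
    return {tok for tok, score in tfidf_scores(docs).items() if score > threshold}


# Toy documents, invented for this example (not from the EPAR corpus).
docs = [
    ["the", "patient", "may", "develop", "neutropenia"],
    ["the", "patient", "should", "take", "the", "tablet"],
    ["the", "doctor", "may", "reduce", "the", "dose"],
]
kept = filter_candidates(docs, threshold=0.0)
```

With a threshold of 0.0, a token that occurs in every document (here "the") is filtered out, while a rarer token such as "neutropenia" survives as a candidate for the classifier. A log-likelihood-based filter would follow the same pattern with a different scoring function.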