Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

Publication type
A2
Publication status
Published
Authors
Van Hee, C., Van de Kauter, M., De Clercq, O., Lefever, E., Desmet, B., & Hoste, V.
Journal
TRAITEMENT AUTOMATIQUE DES LANGUES
Volume
58
Issue
1
Pagination
63-87

Abstract

In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.