Word level language identification in online multilingual communication

Publication type
C1
Publication status
Published
Authors
Nguyen, D., & Doğruöz, A.S.
Editor
David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu and Steven Bethard
Series
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
Pagination
857-862
Publisher
Association for Computational Linguistics (ACL) (Seattle, Washington, USA)
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP 2013) (Seattle, Washington, USA)

Abstract

Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.