APARSIN: a multi-variety sentiment and translation benchmark for Iranic languages

Publication type: C1
Publication status: Published
Authors: Jafari, Sadegh, Azin, T., Roodi, F., Dehghani Tafti, Z., Ghadrdan, M., Vatankhahan Esfahani, E., Naebzadeh, A., Shahhosseini, M., Khan, G., Forghani, K., Namazi, D., Hossein Hashemi, M., Farsi, F., Osoolian, M., Mohammadi, M., Erfan Zare, M., Hasnain Khan, M., Hussain, M., Zaki, N., Mohammadi, J., Bali, S., Javad Ranjbar, M., Lefever, E., & Hoste, V.
Series: The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Pagination: 83-97
Publisher: Association for Computational Linguistics (ACL)
Conference: Association for Computational Linguistics (Rabat, Morocco)

Abstract

The Iranic language family includes many underrepresented languages and dialects that remain largely unexplored in modern NLP research. We introduce APARSIN, a multi-variety benchmark covering 14 Iranic languages, dialects, and accents, designed for sentiment analysis and machine translation. The dataset includes both high- and low-resource varieties, several of which are endangered, capturing linguistic variation across them. We evaluate a set of instruction-tuned Large Language Models (LLMs) on these tasks and analyze their performance across the varieties. Our results highlight substantial performance gaps between standard Persian and other Iranic languages and dialects, demonstrating the need for more inclusive multilingual and dialectally diverse NLP benchmarks.
