
Publications

Proceedings Article

EMNLP 2025

Suzhou, China

Sep 2025

We Politely Insist: Your LLM Must Learn the Persian Art of Taarof

Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami

Large language models (LLMs) struggle to navigate culturally specific communication norms, limiting their effectiveness in global contexts. We focus on Persian taarof, a core norm of Iranian interaction: a sophisticated system of ritual politeness that emphasizes deference, modesty, and indirectness, yet remains absent from existing cultural benchmarks. We introduce TaarofBench, the first benchmark for evaluating LLM understanding of taarof, comprising 450 role-play scenarios covering 12 common social interaction topics, validated by native speakers. Our evaluation of five frontier LLMs reveals substantial gaps in cultural competence, with accuracy rates 40-48% below those of native speakers when taarof is culturally appropriate. Performance varies across interaction topics, improves with Persian-language prompts, and exhibits gender-based asymmetries. We also show that responses rated "polite" by standard metrics often violate taarof norms, indicating the limitations of Western politeness frameworks. Through supervised fine-tuning and Direct Preference Optimization, we achieve 21.8% and 42.3% improvements, respectively, in model alignment with cultural expectations. Our human study with 33 participants (11 native Persian, 11 heritage, and 11 non-Iranian speakers) establishes baselines across varying degrees of familiarity with Persian norms. This work lays the foundation for developing diverse and culturally aware LLMs, enabling applications that better navigate complex social interactions.
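
As a rough illustration of the Direct Preference Optimization step mentioned above (a minimal sketch of the general DPO objective, not the paper's code; the tensor names and beta value are assumptions), the loss can be computed from policy and reference log-probabilities of preferred versus dispreferred responses:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a tensor of summed token log-probabilities for a batch of
    (prompt, response) pairs; 'chosen' would be the culturally appropriate
    (taarof-consistent) response, 'rejected' the norm-violating alternative.
    Names and the beta value are illustrative, not the paper's settings.
    """
    # Log-ratio of the trained policy vs. the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer the chosen response over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -8.0]),
                torch.tensor([-13.0, -9.0]), torch.tensor([-14.0, -8.5]))
print(loss.item())
```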

Presentation

4th North American Conference on Iranian Linguistics

University of Toronto Mississauga

May 2025

AI-Enabled Narrative Analytics for Persian and Kurdish

Karine Megerdoomian and Emmanuel Garcia

Narratives are foundational to human expression across cultures. They are in the stories we tell, in folktales, news reports, memoirs, podcasts, and visual media. The linguist Bill Labov describes a narrative as "a recounting of things that have happened, involving a sequence of events meaningfully connected in a temporal and often causal relation, typically structured with a beginning, middle, and end". We used prompt engineering with large language models (LLMs) to identify the structural elements of a narrative--in other words, the system automatically extracts from a text the information needed to answer who did what to whom, where, when, and why (see the sketch below). We found that LLMs perform quite well on this task for Persian and Sorani Kurdish, especially in inferring implicit information and discontinuous elements, without requiring the integration of NLP pipeline components or structured resources such as WordNet or a treebank. However, these systems are inconsistent (especially for Kurdish) and do not perform as well on more complex analyses such as coreference resolution. For researchers working on endangered or minority languages, this finding opens exciting doors.
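
A minimal sketch of this kind of prompt-based narrative-element extraction (illustrative only; the prompt wording, the call_llm helper, and the JSON output format are assumptions, not the authors' implementation):

```python
import json

NARRATIVE_PROMPT = """You are analyzing a narrative text in {language}.
Identify the core structural elements and answer in JSON with the keys:
"who", "did_what", "to_whom", "where", "when", "why".
If an element is only implied, infer it and mark it with "(inferred)".

Text:
{text}
"""

def extract_narrative_elements(text, language, call_llm):
    """Ask an LLM to extract who/what/whom/where/when/why from a text.

    `call_llm` is a placeholder for whatever chat-completion client is used;
    it is assumed to take a prompt string and return the model's reply.
    """
    prompt = NARRATIVE_PROMPT.format(language=language, text=text)
    reply = call_llm(prompt)
    return json.loads(reply)  # assumes the model returns valid JSON

# Usage sketch with a hypothetical client function:
# elements = extract_narrative_elements(persian_story, "Persian", my_client)
# print(elements["who"], elements["did_what"])
```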
