
Publications

Proceedings Article
EMNLP 2025
Suzhou, China
Sep 2025
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof
Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami
Large language models (LLMs) struggle to navigate culturally specific communication norms, limiting their effectiveness in global contexts. We focus on Persian taarof, a social norm in Iranian interactions, which is a sophisticated system of ritual politeness that emphasizes deference, modesty, and indirectness, yet remains absent from existing cultural benchmarks. We introduce TaarofBench, the first benchmark for evaluating LLM understanding of taarof, comprising 450 role-play scenarios covering 12 common social interaction topics, validated by native speakers. Our evaluation of five frontier LLMs reveals substantial gaps in cultural competence, with accuracy rates 40-48% below native speakers when taarof is culturally appropriate. Performance varies between interaction topics, improves with Persian-language prompts, and exhibits gender-based asymmetries. We also show that responses rated "polite" by standard metrics often violate taarof norms, indicating the limitations of Western politeness frameworks. Through supervised fine-tuning and Direct Preference Optimization, we achieve 21.8% and 42.3% improvement in model alignment with cultural expectations. Our human study with 33 participants (11 native Persian, 11 heritage, and 11 non-Iranian speakers) forms baselines in varying degrees of familiarity with Persian norms. This work lays the foundation for developing diverse and culturally aware LLMs, enabling applications that better navigate complex social interactions.

Presentation
4th North American Conference on Iranian Linguistics
University of Toronto Mississauga
May 2025
AI-Enabled Narrative Analytics for Persian and Kurdish
Karine Megerdoomian and Emmanuel Garcia
Narratives are foundational to human expression across cultures. They are in the stories we tell, in folktales, news reports, memoirs, podcasts and visual media. The linguist Bill Labov describes narrative as "a recounting of things that have happened, involving a sequence of events meaningfully connected in a temporal and often causal relation, typically structured with a beginning, middle, and end". We used prompt engineering to build Large Language Model (LLM) systems that identify the structural elements of a narrative; in other words, the system automatically extracts from a text the information needed to answer who did what to whom, where, when, and why. We found that LLMs perform quite well on this task for Persian and Sorani Kurdish, especially in inferring implicit information and discontinuous elements, without requiring the integration of NLP pipeline components or structured resources such as WordNet or a Treebank. However, these systems are inconsistent (especially for Kurdish) and do not perform as well on complex analyses such as coreference resolution. For researchers working on endangered or minority languages, this finding suggests that narrative analysis may be feasible even where such NLP resources are scarce.