
What Are the Iranian Languages and Why They Matter for the Future of AI

The Iranian or Iranic languages stretch across a vast geography and an even vaster history. They appear in the poetry of Hafez, the epics of Ferdowsi, the stories of Kurdish dengbêj singers, the conversations of Tajik bazaars, the melodies of Luri lullabies, and the modern newsfeeds of millions. Yet despite this rich tapestry, the digital world often treats them as footnotes or, worse, as languages too complicated, too “low-resource,” or too culturally specific to matter for AI.


At Zoorna Institute, we argue the opposite: the Iranian (Iranic) languages are essential for building truly global, culturally intelligent AI. And to understand why, we must first understand the family itself.


The distribution of the languages of the Iranian linguistic family (courtesy of Maps on the Web)

A Family, Not a Single Language

When people hear “Iranian languages,” they often think only of Persian (Farsi). But Persian is just one member of a family that spans continents and centuries.


This linguistic family includes:

Western Iranian

  • Persian (Farsi, Dari, Tajik)

  • Kurdish (Sorani, Kurmanji)

  • Luri

  • Bakhtiari

  • Gilaki

  • Mazandarani

  • Baluchi (Western Iranian by classification, though spoken in the east of the region)

Eastern Iranian

  • Pashto

  • Ossetic

  • Wakhi

  • Shughni

  • Yaghnobi


These languages are not dialects of one another. They differ in:

  • phonology and rhythm

  • morphology and verb systems

  • pragmatics and politeness norms

  • vocabulary, idioms, and metaphor

  • historical scripts (Persian-Arabic, Cyrillic, Latin)

  • narrative traditions and cultural logic

They share deep connections, but each has its own linguistic features.


For a list of the Iranian languages, visit the Iranic family chart.


Living History in Linguistic Form

The Iranian languages are carriers of:

  • one of the oldest continuous written traditions in the Indo-European family

  • Silk Road literary exchange

  • epic storytelling and mysticism, as well as writings on medicine, science, and philosophy

  • layered politeness systems

  • tribal, regional, and diasporic identities

  • resilience in the face of colonial suppression, marginalization, and displacement


When AI systems fail to represent these languages, entire histories, archives, voices, and communities disappear from the digital sphere. This is not a technical gap; it is a cultural loss.


Why Iranian Languages Challenge AI

Modern large language models were trained mostly on English and a handful of other high-resource languages. Iranian languages contain features that models shaped by that training data handle poorly:


1. Rich Morphology

Sorani’s verb chains, Pashto’s inflection, and Luri’s clitics carry essential meaning that LLMs sometimes struggle to parse.

2. Pragmatic Subtlety

Politeness, hierarchy, and social distance shape every utterance. TaarofBench showed how dramatically AI misreads this.

3. Sparse Training Data

Even Persian, with tens of millions of speakers, is underrepresented; Kurdish, Luri, and Gilaki barely appear in mainstream datasets.

4. Script Diversity

The Iranic languages are written in several scripts, including Persian-Arabic, Cyrillic (e.g., Tajik), and Latin (e.g., Kurmanji Kurdish).

5. Cultural Narrative Forms

Indirectness, metaphor, and narrative softening are woven into daily discourse. LLMs often misinterpret them.
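
To make the script-diversity point (item 4 above) concrete, here is a minimal Python sketch of how a data pipeline might check which writing system a piece of text uses before sending it to script-specific processing. It is an illustration under stated assumptions, not a description of any Zoorna Institute tool: the function names are hypothetical, the sample greetings are chosen only as examples, and reading the script from the first word of each character's Unicode name is a rough heuristic rather than a proper script-property lookup.

```python
import unicodedata
from collections import Counter

# Assumption: the first word of a character's Unicode name
# (e.g. "ARABIC LETTER SEEN") is a good-enough script label here.
SCRIPTS = ("ARABIC", "CYRILLIC", "LATIN")


def script_counts(text: str) -> Counter:
    """Count alphabetic characters per script; anything else is OTHER."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        first_word = unicodedata.name(ch, "").split(" ", 1)[0]
        counts[first_word if first_word in SCRIPTS else "OTHER"] += 1
    return counts


def dominant_script(text: str) -> str:
    """Return the most frequent script label, or UNKNOWN if no letters."""
    counts = script_counts(text)
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Illustrative greetings in three scripts (hypothetical sample data).
    samples = {
        "Persian": "سلام",
        "Tajik": "салом",
        "Kurmanji": "silav",
    }
    for language, word in samples.items():
        print(language, dominant_script(word))
```

A check like this only identifies the script, not the language: Persian, Sorani, and Pashto all use Persian-Arabic letters, so telling them apart would need a separate language-identification step.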


The Missing Context: Language Policy and Digital Absence

For many communities across Iran and the broader region, the Iranian languages have lived more in homes, music, and memory than in official schools or publications. Limited institutional support for teaching, documenting, or standardizing these languages has made preservation challenging over generations. That absence carries into the digital world as well: without widespread educational use, formal corpora, or sustained media presence, many Iranian languages remain underrepresented in the datasets that modern AI systems rely on. This structural gap is one of the primary reasons AI struggles with these languages today.


Why This Matters for Scholars and Communities

Many Iranologists, linguists, and heritage speakers approach AI with understandable caution. Technologies built without cultural awareness can flatten nuance, misinterpret meaning, or reinforce stereotypes.


But responsible AI research can support Iranian-language communities in powerful ways:

  • Preserving endangered languages

Documenting low-resource languages before they fade among younger generations.

  • Supporting education for heritage learners

Tools like our Persian AI Tutor can help diaspora families maintain language ties.

  • Improving access to historical and cultural archives

AI-assisted search across manuscripts, oral histories, and textual corpora.

  • Empowering regional scholarship

Kurdish, Luri, Gilaki, or Tajik scholars deserve tools that work for their languages — not tools that treat them as afterthoughts.

  • Building a community of practice

Iranian languages have been understudied in NLP not because they lack value, but because they lack representation.

This is a moment of opportunity.


Zoorna Institute is building a research ecosystem dedicated to the languages of Iran and the Caucasus. This includes:

  • narrative analytics for Persian and Kurdish

  • Tajik–Farsi transliteration systems

  • cultural-pragmatic benchmarks like TaarofBench

  • open calls for datasets, corpora, and linguistic insight

  • the SilkRoadNLP 2026 workshop — the first of its kind


We believe AI should amplify linguistic knowledge, not replace it. And we know that any meaningful progress requires collaboration with those who have lived, studied, and loved these languages for decades.


Why Iranian Languages Matter for the Future of AI

As AI grows more global, superficial fluency is no longer enough. We need systems that:

  • understand cultural nuance

  • respect social norms

  • interpret context and hierarchy

  • navigate indirectness

  • learn from scholars and communities

  • preserve linguistic diversity rather than erase it


In other words:

To build global AI, we must build multilingual, culturally intelligent AI. And the Iranian languages are a vital part of that future.

 
 
 
