Complex Predicates in Persian and Beyond

A long-term research program on the structure, semantics, and computation of complex predicates.

Overview

Persian complex predicates (CPrs)—constructions combining light verbs with nominal, adjectival, or prepositional elements—represent one of the most intricate and debated phenomena in Iranian linguistics. They challenge traditional notions of wordhood, argument structure, and compositional meaning, blurring the boundaries between syntax and lexicon. This long-term initiative revisits the study of CPrs through both linguistic theory and computational modeling, aiming to create the most comprehensive survey and analytical framework for Persian and its sister Iranian languages.

Methodology

The program combines formal linguistic analysis with computational experimentation:
- Literature survey and synthesis of existing theoretical accounts (lexicalist, syntactic, and construction-based approaches).
- Corpus annotation of Persian CPrs with semantic roles, event structure, and argument composition.
- Computational modeling using machine learning and rule-based techniques for automatic detection and classification of CPrs.
- Cross-linguistic extension to other Iranian languages (Kurdish, Balochi, Gilaki, etc.) to identify shared morphosyntactic patterns and divergences.
- Collaborative research network bringing together linguists and computational modelers to build shared resources and standards.
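
To make the rule-based side of the detection step concrete, here is a minimal sketch in Python. It is not the project's implementation. It assumes the input is already lemmatized and POS-tagged, uses ASCII transliteration, and relies on a tiny hand-picked light verb list; the names `LIGHT_VERBS`, `Token`, `Candidate`, and `find_candidates` are hypothetical. Only nominal and adjectival preverbs are covered, since prepositional elements would need phrase chunking first.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small lexicon of light verb lemmas
# (ASCII transliteration); a real lexicon would be far larger.
LIGHT_VERBS = {"kardan", "zadan", "dadan", "gereftan", "shodan", "xordan"}

# Assumption: only nouns and adjectives are treated as preverbal elements here.
PREVERB_POS = {"NOUN", "ADJ"}


@dataclass(frozen=True)
class Token:
    lemma: str
    pos: str


@dataclass(frozen=True)
class Candidate:
    preverb: str
    light_verb: str
    preverb_index: int
    verb_index: int


def find_candidates(tokens, max_gap=2):
    """Pair each light verb with the nearest preceding noun or adjective.

    max_gap is the number of tokens allowed between preverb and verb,
    because the two parts of a CPr need not be adjacent.
    """
    candidates = []
    for v_idx, tok in enumerate(tokens):
        if tok.pos != "VERB" or tok.lemma not in LIGHT_VERBS:
            continue
        leftmost = max(0, v_idx - 1 - max_gap)
        # Scan leftward from the verb so the closest preverb wins.
        for p_idx in range(v_idx - 1, leftmost - 1, -1):
            if tokens[p_idx].pos in PREVERB_POS:
                candidates.append(
                    Candidate(tokens[p_idx].lemma, tok.lemma, p_idx, v_idx)
                )
                break
    return candidates


# "man ba u harf zadam" ('I talked with him/her'), given as lemmas.
sentence = [
    Token("man", "PRON"),
    Token("ba", "ADP"),
    Token("u", "PRON"),
    Token("harf", "NOUN"),
    Token("zadan", "VERB"),
]
candidates = find_candidates(sentence)
```

A matcher like this overgenerates by design: it also pairs a light verb with an ordinary direct object. Its output is best read as a candidate list for annotation or for a downstream classifier, not as a final decision.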

Preliminary Results

The project is in its foundational phase, focusing on literature review and corpus design. Initial analyses confirm the need for a unified theoretical typology that captures both syntactic flexibility and semantic regularities. In early computational tests, LLMs show promise in recognizing frequent CPrs but struggle with idiomatic or compositional variants, which underscores the challenge of mapping linguistic subtlety to model behavior.
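
One way to quantify the gap between frequent and idiomatic CPrs is to report recall separately for each category. The sketch below shows that breakdown in Python. The function name `recall_by_category`, the category labels, and the item ids are all hypothetical, and the toy data is invented purely to exercise the function; it is not project data or a project result.

```python
from collections import defaultdict


def recall_by_category(gold, predicted):
    """Compute recall per CPr category.

    gold: dict mapping item id -> category, one entry per true CPr.
    predicted: set of item ids that the system flagged as CPrs.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for item_id, category in gold.items():
        total[category] += 1
        if item_id in predicted:
            found[category] += 1
    return {category: found[category] / total[category] for category in total}


# Invented placeholder ids and labels, used only to exercise the function.
toy_gold = {"s1": "frequent", "s2": "frequent", "s3": "idiomatic", "s4": "idiomatic"}
toy_predicted = {"s1", "s2", "s3"}
toy_scores = recall_by_category(toy_gold, toy_predicted)
```

Reporting the categories separately keeps a strong score on frequent items from masking weak performance on rarer ones, which a single pooled recall figure would hide.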

Use Cases

This program will serve as:
- A reference database and survey for scholars working on Persian and Iranian morphosyntax.
- A computational toolkit for automatic detection and classification of complex predicates.
- A training and evaluation resource for improving LLM understanding of predicate structure and aspectual composition.
- The foundation for a future monograph or digital corpus advancing both linguistic theory and computational methods in Iranian linguistics.

Team

Karine Megerdoomian with Simin Karimi
