📊 Scientometrics: Mapping the Fields of NLP and Linguistics
Tracking the evolution, divergence, and intersections of linguistic and AI research communities.

Overview
As AI and Natural Language Processing (NLP) have advanced rapidly over the past two decades, their relationship to traditional linguistics has shifted. This project explores the intellectual landscape of both fields using scientometric techniques—examining how the two domains align, diverge, and influence one another.
By analyzing publications, citation patterns, co-authorship networks, and thematic clusters, we aim to understand not only where the fields are headed but also how inclusive and interdisciplinary they remain, especially with regard to low-resource languages and theoretical depth.
Methodology
We employ a range of bibliometric and network analysis techniques to capture evolving research trends:
- Corpus Construction: Collecting metadata and full-text data from the ACL Anthology, arXiv, the LINGUIST List, and other relevant databases
- Thematic Classification: Using topic modeling and keyword clustering to map shifts in focus (e.g., syntax vs. embeddings, morphology vs. transformers)
- Network Analysis: Constructing co-authorship and citation networks to identify research hubs, isolated subfields, and bridge authors
- Temporal Analysis: Tracking changes in methods, evaluation metrics, and theoretical references across time
- Equity & Representation: Annotating publications to assess how well underrepresented languages and regions are represented in academic discourse
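The network-analysis step above can be illustrated with a minimal, standard-library sketch. It assumes a hypothetical input format in which each paper is a `(paper_id, community, authors)` tuple. Here `community` is an illustrative label, such as `"nlp"` for ACL Anthology papers or `"ling"` for linguistics venues. The sketch approximates each notion as follows:
- hubs by degree (number of distinct co-authors);
- isolated subfields by connected components;
- bridge authors by the assumed rule "publishes in more than one community".

These are illustrative definitions, not necessarily the ones the project uses.

```python
from collections import deque
from itertools import combinations

def build_coauthor_graph(papers):
    """Undirected co-authorship graph as {author: set_of_coauthors}.

    Each paper is assumed to be a (paper_id, community, authors) tuple;
    this input format is hypothetical.
    """
    graph = {}
    for _pid, _community, authors in papers:
        for author in authors:
            graph.setdefault(author, set())  # keep single-author papers as nodes
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def hubs(graph, k=3):
    """Top-k authors by number of distinct co-authors (degree); ties broken by name."""
    return sorted(graph, key=lambda a: (-len(graph[a]), a))[:k]

def connected_components(graph):
    """Author groups with no co-authorship path between them (breadth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

def bridge_authors(papers):
    """Authors who publish in more than one community (an assumed definition)."""
    communities = {}
    for _pid, community, authors in papers:
        for author in authors:
            communities.setdefault(author, set()).add(community)
    return sorted(a for a, cs in communities.items() if len(cs) > 1)

# Toy data, invented for illustration only.
papers = [
    ("p1", "nlp", ["Ana", "Ben"]),
    ("p2", "nlp", ["Ben", "Chen"]),
    ("p3", "ling", ["Chen", "Dana"]),
    ("p4", "ling", ["Eli"]),
]
graph = build_coauthor_graph(papers)
print(hubs(graph, k=2))             # ['Ben', 'Chen']
print(connected_components(graph))  # one four-author group, plus Eli alone
print(bridge_authors(papers))       # ['Chen']
```

Degree is used here only because it is easy to check by hand. On a real co-authorship or citation graph, a library such as NetworkX provides betweenness centrality and community-detection algorithms that capture "bridging" more faithfully.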
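The thematic and temporal steps can be approximated, far more crudely than with a topic model, by tracking what fraction of each year's abstracts mention a keyword group. The sketch below swaps in this keyword-matching technique purely for illustration. The `THEMES` keyword sets and the `(year, abstract)` record format are hypothetical. A real pipeline would more likely derive themes from a topic model such as LDA.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword groups standing in for learned topics.
THEMES = {
    "syntax": {"syntax", "syntactic", "parsing", "treebank"},
    "neural": {"embedding", "embeddings", "transformer", "transformers"},
}

def tokenize(text):
    """Lower-cased alphabetic tokens, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def theme_share_by_year(records, themes=THEMES):
    """For each year, the fraction of papers whose abstract mentions each theme."""
    totals = Counter()
    hits = defaultdict(Counter)
    for year, abstract in records:
        totals[year] += 1
        tokens = tokenize(abstract)
        for theme, keywords in themes.items():
            if tokens & keywords:  # at least one keyword of this theme appears
                hits[year][theme] += 1
    return {
        year: {theme: hits[year][theme] / totals[year] for theme in themes}
        for year in sorted(totals)
    }

# Invented abstracts, for illustration only.
records = [
    (2015, "A syntactic parsing model with treebank features"),
    (2015, "Morphological analysis of Turkish"),
    (2020, "Transformer embeddings for parsing"),
    (2020, "Scaling transformers"),
]
print(theme_share_by_year(records))
# {2015: {'syntax': 0.5, 'neural': 0.0}, 2020: {'syntax': 0.5, 'neural': 1.0}}
```

Normalizing by the number of papers per year, rather than reporting raw counts, keeps the trend readable even though publication volume grows sharply over time.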
Preliminary Results
- The gap between theoretical linguistics and NLP has widened, especially since 2018 with the rise of large language models (LLMs)
- Most NLP publications cluster around a small number of languages (English, Chinese, Arabic), while low-resource regions are severely underrepresented
- Some cross-disciplinary collaborations exist, but they tend to be ad hoc rather than systematic
- Authors working on morphologically complex or minority languages tend to have lower citation counts and fewer institutional resources
- Formal semantics and syntax are decreasingly cited in top NLP venues, though pragmatics is resurging due to interest in alignment and hallucination
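A finding such as the language concentration reported above could be quantified from per-paper language annotations. The minimal sketch below assumes a hypothetical mapping from paper IDs to the set of languages an annotator recorded. It measures two things:
- each language's share of papers;
- the fraction of papers that involve at least one of the `k` most-studied languages.

The toy data are invented and do not reflect the project's results.

```python
from collections import Counter

def language_counts(annotations):
    """Number of papers studying each language (multilingual papers count once per language)."""
    return Counter(lang for langs in annotations.values() for lang in set(langs))

def language_shares(annotations):
    """Share of papers studying each language, most-studied first."""
    n_papers = len(annotations)
    if n_papers == 0:
        return {}
    return {lang: c / n_papers for lang, c in language_counts(annotations).most_common()}

def top_k_coverage(annotations, k=3):
    """Fraction of papers involving at least one of the k most-studied languages."""
    if not annotations:
        return 0.0
    top = {lang for lang, _ in language_counts(annotations).most_common(k)}
    covered = sum(1 for langs in annotations.values() if top & set(langs))
    return covered / len(annotations)

# Invented annotations, for illustration only.
annotations = {
    "p1": {"English"},
    "p2": {"English", "Chinese"},
    "p3": {"English"},
    "p4": {"Arabic"},
    "p5": {"Yoruba"},
}
print(language_shares(annotations)["English"])  # 0.6
print(top_k_coverage(annotations, k=1))         # 0.6
```

Ties at the k-th position are broken arbitrarily in this sketch, so a real analysis would need an explicit tie-breaking rule before comparing coverage across venues or years.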
Use Cases
- Researchers and funding agencies can use these insights to identify neglected topics or communities in need of support
- Educators can better design curricula that bridge NLP and linguistics
- Policy and DEI initiatives in research institutions can use the structural inequities identified here to set priorities
- Cross-disciplinary conferences can leverage this mapping to foster meaningful collaborations between AI and linguistic theory
Team
Ahmed Alkuraydis