Research | 2026-07-01

Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian

Originally published by Arxiv CS.AI

arXiv:2606.31718v1 Announce Type: cross Abstract: Relation extraction (RE) for low-resource languages is typically constrained by the lack of annotated corpora. We investigate the feasibility of cross-lingual RE for Romanian by combining automatic dataset translation with large language model (LLM)...

This new research from arXiv (2606.31718v1) tackles a persistent bottleneck in Natural Language Processing (NLP): the scarcity of annotated data for low-resource languages. By focusing on Romanian and using Relation Extraction (RE) as the testbed, the study systematically evaluates how Large Language Models (LLMs) perform when the training data must be translated from a high-resource language (likely English) rather than curated natively.

What Happened

The researchers investigated cross-lingual RE for Romanian by combining automatic dataset translation with LLMs. They benchmarked performance across three common deployment scenarios: zero-shot (no Romanian examples seen), few-shot (a handful of translated examples in the prompt), and fine-tuned (the full translated dataset used for training). The core method involves taking an existing RE dataset in a high-resource language (presumably English), machine-translating it into Romanian, and then using that synthetic corpus to train or prompt an LLM. The evaluation then measures how well the model extracts relations from authentic Romanian text.
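To make the difference between the zero-shot and few-shot settings concrete, here is a minimal Python sketch of how such prompts could be assembled. It is not the paper's prompt format, which the abstract excerpt does not show. The label set `RELATIONS`, the `Example` class, the helper names, and the Romanian sentences are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    sentence: str
    head: str
    tail: str
    relation: str


# Hypothetical label set; the paper's actual relation inventory is not given here.
RELATIONS = ["works_at", "born_in", "no_relation"]


def build_prompt(sentence, head, tail, demos=()):
    """Build a classification prompt: zero-shot if demos is empty, few-shot otherwise."""
    lines = [
        "Classify the relation between the two entities in the Romanian sentence.",
        "Answer with exactly one label from: " + ", ".join(RELATIONS) + ".",
        "",
    ]
    # Each demonstration is a (machine-translated) labeled example.
    for d in demos:
        lines += [
            f"Sentence: {d.sentence}",
            f"Head: {d.head}",
            f"Tail: {d.tail}",
            f"Relation: {d.relation}",
            "",
        ]
    # The query comes last, with the label left for the model to fill in.
    lines += [f"Sentence: {sentence}", f"Head: {head}", f"Tail: {tail}", "Relation:"]
    return "\n".join(lines)


def parse_label(completion):
    """Map a raw model completion to a known label, falling back to no_relation."""
    text = completion.strip().lower()
    first = text.split()[0] if text else ""
    first = first.strip(".,:;\"'")
    return first if first in RELATIONS else "no_relation"


# Illustrative demonstrations standing in for translated training examples.
demos = [
    Example("Maria Popescu lucrează la Universitatea din București.",
            "Maria Popescu", "Universitatea din București", "works_at"),
    Example("Ion Creangă s-a născut în Humulești.",
            "Ion Creangă", "Humulești", "born_in"),
]

query = ("Ana Ionescu lucrează la Banca Transilvania.", "Ana Ionescu", "Banca Transilvania")
zero_shot = build_prompt(*query)
few_shot = build_prompt(*query, demos=demos)
print(few_shot)
```

The only difference between the two settings is the `demos` argument, so the same query can be scored under both. The fine-tuned setting would instead use the translated examples as training data and is not covered by this sketch.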

Why It Matters

This work is significant for two reasons. First, it directly addresses the "annotation bottleneck." Manually labeling relation triples (e.g., Person-WorksAt-Organization) is expensive and requires expert linguists. For the vast majority of the world’s 7,000+ languages, such datasets simply do not exist. If automatic translation can serve as a viable proxy, it dramatically lowers the barrier to entry for building NLP systems in underserved languages.

Second, the study provides a practical stress test for LLM generalization. Romanian is a Romance language with rich morphology (noun cases, grammatical gender, verb conjugation) that differs structurally from English. A model that has only seen translated data and still performs well on native Romanian text is showing cross-lingual transfer, not just pattern matching on surface forms. The results will inform whether practitioners can trust LLMs to "transfer" knowledge across languages without extensive native fine-tuning.

Implications for AI Practitioners

For engineers building multilingual applications, this research offers a clear cost-benefit analysis. The zero-shot and few-shot results will indicate whether a simple prompt with a few translated examples suffices for production use, or if full fine-tuning on a translated corpus is necessary. If few-shot performance approaches fine-tuned performance, it suggests that practitioners can skip the expensive step of creating large translated datasets and instead rely on prompt engineering.

However, there is a critical caveat: translation quality. Automatic translation systems often introduce artifacts such as unnatural phrasing, loss of idiomatic meaning, or incorrect entity boundaries. A model trained on "translationese" may fail on real-world Romanian text that contains regional dialects, code-switching, or domain-specific jargon. Practitioners must validate that the synthetic data does not create a brittle model that only works on machine-translated inputs.
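One cheap sanity check for the entity-boundary problem is to confirm that each translated entity mention still appears verbatim in its translated sentence. The sketch below assumes a hypothetical record layout (dicts with `sentence`, `head`, and `tail` keys) and uses exact string matching, which is deliberately conservative. The Romanian examples are illustrative and not drawn from the paper.

```python
def find_span(sentence, mention):
    """Return (start, end) character offsets of the mention, or None if absent."""
    start = sentence.find(mention)
    return None if start < 0 else (start, start + len(mention))


def filter_translated(examples):
    """Split translated examples by whether both entity mentions survived translation."""
    kept, flagged = [], []
    for ex in examples:
        spans = [find_span(ex["sentence"], ex[key]) for key in ("head", "tail")]
        if all(span is not None for span in spans):
            kept.append(ex)
        else:
            flagged.append(ex)
    return kept, flagged


translated = [
    {   # Both mentions appear verbatim in the sentence.
        "sentence": "Maria Popescu lucrează la Universitatea din București.",
        "head": "Maria Popescu",
        "tail": "Universitatea din București",
    },
    {   # The sentence uses the genitive form "Universității", so the
        # separately translated nominative mention no longer matches.
        "sentence": "Maria Popescu este angajata Universității din București.",
        "head": "Maria Popescu",
        "tail": "Universitatea din București",
    },
]

kept, flagged = filter_translated(translated)
print(len(kept), "kept;", len(flagged), "flagged for review")
```

Flagged examples are not necessarily wrong. As the second record shows, Romanian case inflection alone can break an exact match, so these are candidates for manual review or fuzzy re-alignment rather than automatic deletion.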

Finally, this research reinforces the importance of looking beyond a single aggregate F1 score. A model might correctly extract relations from translated data but fail on native constructions (e.g., Romanian's suffixed definite articles or clitic pronouns). Any deployment should include a small, human-annotated test set of authentic Romanian text to catch such failures.
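A simple way to act on this is to score the same model separately on a translated test set and on a native one, then look at the gap. The sketch below computes micro-averaged F1 over (head, relation, tail) triples. The function name, the data layout, and the toy gold and predicted triples are assumptions for illustration only; the numbers are not results from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over triples.

    gold, pred: lists of sets of (head, relation, tail) tuples, one set per sentence.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this example (not from the paper).
gold_translated = [{("Ana Ionescu", "works_at", "Banca Transilvania")}]
pred_translated = [{("Ana Ionescu", "works_at", "Banca Transilvania")}]

gold_native = [
    {("Ion Creangă", "born_in", "Humulești")},
    {("Maria Popescu", "works_at", "Universitatea din București")},
]
pred_native = [
    {("Ion Creangă", "born_in", "Humulești")},
    {("Maria Popescu", "born_in", "Universitatea din București")},  # wrong relation
]

f1_translated = micro_f1(gold_translated, pred_translated)
f1_native = micro_f1(gold_native, pred_native)
print(f"translated F1: {f1_translated:.2f}")
print(f"native F1:     {f1_native:.2f}")
print(f"gap:           {f1_translated - f1_native:.2f}")
```

A large positive gap would suggest the model has adapted to translated inputs rather than to Romanian as it is actually written. Breaking the native score down by construction type would localize the failures further.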

Key Takeaways

Translation as a data augmentation strategy is a promising, cost-effective path for RE in low-resource languages, but its success depends heavily on the quality of the machine translation system used.
The gap between few-shot and fine-tuned performance will be the decisive factor for practitioners: a small gap favors prompt-based approaches, while a large gap justifies investment in full dataset translation.
Evaluation must use native, not translated, test data to avoid overestimating model capability on synthetic inputs.
Romanian’s morphological complexity makes it a strong benchmark; success here suggests the method may generalize to other moderately resourced Indo-European languages.
