Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study (2025)
Authors: Mihai Nadǎş, Laura Dioşan

Abstract: Automatic diacritic restoration is crucial for text processing in languages with rich diacritical marks, such as Romanian. This study evaluates the performance of several large language models (LLMs) in restoring diacritics in Romanian texts. Using a comprehensive corpus, we tested models including OpenAI’s GPT-3.5, GPT-4, GPT-4o, Google’s Gemini…