Synthetic Data Generation Using Large Language Models: Advances in Text and Code (2025)
IEEE Access
Authors: Mihai Nadǎş, Laura Dioşan, Andreea Tomescu
Abstract: This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment or even substitute for real-world datasets, particularly in scenarios where labeled data…