TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models (2025)
arXiv.org
Authors: Mihai Nadǎş, Laura Dioşan, Andrei Piscoran, Andreea Tomescu
Abstract: Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We close this gap with TF1-EN-3M, the first open dataset of three million English-language fables generated exclusively by…