GitHub metadata for 78518 projects
The CiteSeerX 4217 dataset for metadata extraction
Dataset + LyricWikia + MusicBrainz
Connecting the Dataset to LyricWikia (lyrics) and MusicBrainz (additional track information).
The nevelok LaTeX package
LaTeX package for automatic definite articles for Hungarian.
Toy datasets for semi-supervised learning
4 datasets
Porter stemming algorithm in Flex
The Porter stemming algorithm, implemented using Flex (Fast Lexical Analyzer).