Software

GitHub metadata for 78,518 projects
The CiteSeerX 4217 dataset for metadata extraction
Last.fm Dataset + LyricWikia + MusicBrainz
Connects the Last.fm Dataset to LyricWikia (lyrics) and MusicBrainz (additional track information).
The nevelok LaTeX package
LaTeX package that automatically generates definite articles for Hungarian.
Toy datasets for semi-supervised learning
Four datasets.
Porter stemming algorithm in Flex
The Porter stemming algorithm (http://tartarus.org/martin/PorterStemmer/) implemented using Flex (Fast Lexical Analyzer).