A Hybrid Approach for Scholarly Information Extraction

  • Zalan Bodo Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania
  • Lehel Csato Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania


Metadata extraction from documents forms an essential part of web or desktop search systems.
Similarly, digital libraries that index scholarly literature need to find and extract the title, the list of authors and other publication-related information from an article.
We present a hybrid approach for metadata extraction that combines classification and clustering to extract the desired information without the need for a conventional labeled training dataset.
An important asset of the proposed method is that the resulting clustering parameters can be reused in other problems, such as document layout analysis.


How to Cite
BODO, Zalan; CSATO, Lehel. A Hybrid Approach for Scholarly Information Extraction. Studia Universitatis Babeș-Bolyai Informatica, [S.l.], v. 62, n. 2, p. 5-16, dec. 2017. ISSN 2065-9601. Available at: <http://www.cs.ubbcluj.ro/~studia-i/journal/journal/article/view/10>. Date accessed: 29 nov. 2020. doi: https://doi.org/10.24193/subbi.2017.2.01.