{"id":1720,"date":"2019-10-01T13:18:00","date_gmt":"2019-10-01T10:18:00","guid":{"rendered":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/?p=1720"},"modified":"2022-01-10T00:25:14","modified_gmt":"2022-01-09T22:25:14","slug":"a-dataset-for-evaluating-query-suggestion-algorithms-in-information-retrieval","status":"publish","type":"post","link":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/a-dataset-for-evaluating-query-suggestion-algorithms-in-information-retrieval\/","title":{"rendered":"A Dataset for Evaluating Query Suggestion Algorithms in Information Retrieval"},"content":{"rendered":"<p>published in Proceedings of the 27<sup>th<\/sup> International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp. 1-6, DOI: <a href=\"https:\/\/doi.org\/10.23919\/SOFTCOM.2019.8903906\" target=\"_blank\" rel=\"noopener noreferrer\">10.23919\/SOFTCOM.2019.8903906<\/a>, September 19-21, 2019, Split, Croatia.<\/p>\n<p><strong>Cite as<\/strong><\/p>\n<pre class=\"nums:false wrap:on highlight:false\">I. B\u0103d\u0103r\u00eenz\u0103, A. Sterca and D. Bufnea, \"A Dataset for Evaluating Query Suggestion Algorithms in Information Retrieval\", 2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 2019, pp. 
1-6, doi: 10.23919\/SOFTCOM.2019.8903906<\/pre>\n<p><strong>Full paper<\/strong><\/p>\n<p><img decoding=\"async\" style=\"border: none; vertical-align: text-bottom;\" src=\"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-content\/uploads\/pdf.png\" alt=\"\" \/> <a href=\"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-content\/uploads\/A-Dataset-for-Evaluating-Query-Suggestion-Algorithms-in-Information-Retrieval.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">A Dataset for Evaluating Query Suggestion Algorithms in Information Retrieval<\/a><\/p>\n<p><strong>Authors<\/strong><\/p>\n<p>Ioan B\u0103d\u0103r\u00eenz\u0103, Adrian Sterca, Darius Bufnea<br \/>\nDepartment of Computer Science, Faculty of Mathematics and Computer Science, Babe\u0219-Bolyai University of Cluj-Napoca, Romania<\/p>\n<p><strong>Copyright<\/strong><\/p>\n<p>\u00a9 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting\/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.<\/p>\n<p><strong>Abstract<\/strong><\/p>\n<p>This paper presents a dataset that can be used for evaluating query suggestion algorithms in textual information retrieval. The dataset is public and offered free of charge to the information retrieval research community. The data was gathered in an experiment that lasted more than 2 months and in which 119 users, mainly faculty students, participated. The dataset contains web browsing history and query history (submitted to the Google search engine) from all these users. The data is indexed in a database and downloadable in a database dump format. 
The dataset is very useful for evaluating general query suggestion algorithms by themselves (in a standalone manner) or against Google&#8217;s MPC query suggestion algorithm. At the same time, the dataset supports building and testing personalized query suggestion algorithms that consider the user context\/profile when computing query suggestions.<\/p>\n<p><strong>Key words<\/strong><\/p>\n<p>dataset; query suggestion; information retrieval; search engine.<\/p>\n<p><strong>BibTeX bib file<\/strong><\/p>\n<p><img decoding=\"async\" style=\"border: none; vertical-align: text-bottom;\" src=\"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-content\/uploads\/bib.png\" alt=\"\" \/> <a href=\"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-content\/uploads\/bada2019.bib\" target=\"_blank\" rel=\"noopener noreferrer\">bada2019.bib<\/a><\/p>\n<pre class=\"lang:tex url:wp-content\/uploads\/bada2019.bib nums:false\"><\/pre>\n<p><strong>References<\/strong><\/p>\n<ol>\n<li>J.-R. Wen, J.-Y. Nie, H.-J. Zhang, <em>Clustering user queries of a search engine<\/em>, Proceedings of the 10<sup>th<\/sup> International Conference on World Wide Web ser. WWW &#8217;01, pp. 162-168, 2001.<\/li>\n<li>B. J. Jansen, A. Spink, T. Saracevic, <em>Real life real users and real needs: A study and analysis of user queries on the web<\/em>, Inf. Process. Manage., vol. 36, no. 2, pp. 207-227, Jan. 2000.<\/li>\n<li>H. Cui, J.-R. Wen, J.-Y. Nie, W.-Y. Ma, <em>Probabilistic query expansion using query logs<\/em>, Proceedings of the 11<sup>th<\/sup> International Conference on World Wide Web ser. WWW &#8217;02, pp. 325-332, 2002.<\/li>\n<li>M. Sanderson, <em>Ambiguous queries: Test collections need more sense<\/em>, Proceedings of the 31<sup>st<\/sup> Annual International ACM SIGIR Conference on Research and Development in Information Retrieval ser. SIGIR &#8217;08, pp. 499-506, 2008.<\/li>\n<li>L. Li, H. Deng, A. Dong, Y. Chang, H. Zha, R. 
Baeza-Yates, <em>Analyzing user&#8217;s sequential behavior in query auto-completion via markov processes<\/em>, Proceedings of the 38<sup>th<\/sup> International ACM SIGIR Conference on Research and Development in Information Retrieval ser. SIGIR &#8217;15, pp. 123-132, 2015.<\/li>\n<li>Z. Bar-Yossef, N. Kraus, <em>Context-sensitive query auto-completion<\/em>, Proceedings of the 20<sup>th<\/sup> International Conference on World Wide Web ser. WWW &#8217;11, pp. 107-116, 2011.<\/li>\n<li>C. Manning, P. Raghavan, and H. Schutze, <em>An introduction to Information Retrieval<\/em>, Cambridge University Press, 2009.<\/li>\n<li>D. D. Lewis, Y. Yang, T. G. Rose, F. Li, <em>Rcv1: A new benchmark collection for text categorization research<\/em>, J. Mach. Learn. Res., vol. 5, pp. 361-397, Dec. 2004.<\/li>\n<li>G. Pass, A. Chowdhury, C. Torgeson, <em>A picture of search<\/em>, Proceedings of the 1<sup>st<\/sup> International Conference on Scalable Information Systems ser. InfoScale &#8217;06, 2006.<\/li>\n<li>Text retrieval conference (trec) data, May 2019, [online] Available: <a href=\"https:\/\/trec.nist.gov\/data.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/trec.nist.gov\/data.html<\/a>.<\/li>\n<li>The clueweb09 dataset, May 2019, [online] Available: <a href=\"http:\/\/lemurproject.org\/clueweb09\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">http:\/\/lemurproject.org\/clueweb09\/<\/a>.<\/li>\n<li>The clueweb12 dataset, May 2019, [online] Available: <a href=\"http:\/\/lemurproject.org\/clueweb12\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">http:\/\/lemurproject.org\/clueweb12\/<\/a>.<\/li>\n<li>Kaggle: Your home for data science, May 2019, [online] Available: <a href=\"https:\/\/www.kaggle.com\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/www.kaggle.com\/<\/a>.<\/li>\n<li>Statcounter global stats: Browser market share worldwide, May 2019, [online] Available: <a 
href=\"http:\/\/gs.statcounter.com\/browser-market-share\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">http:\/\/gs.statcounter.com\/browser-market-share<\/a>.<\/li>\n<li>W3counter: Web browser market share trends, May 2019, [online] Available: <a href=\"https:\/\/www.w3counter.com\/trends\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/www.w3counter.com\/trends<\/a>.<\/li>\n<li>Statista: Search engine market share world-wide, May 2019, [online] Available: <a href=\"https:\/\/www.statista.com\/statistics\/216573\/worldwide-market-share-of-search-engines\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/www.statista.com\/statistics\/216573\/worldwide-market-share-of-search-engines<\/a>.<\/li>\n<li>Statcounter global stats: Desktop search engine market share worldwide, May 2019, [online] Available: <a href=\"http:\/\/gs.statcounter.com\/search-engine-market-share\/desktop\/worldwide\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">http:\/\/gs.statcounter.com\/search-engine-market-share\/desktop\/worldwide<\/a>.<\/li>\n<li>Search engine market share, May 2019, [online] Available: <a href=\"https:\/\/netmarketshare.com\/search-engine-market-share.aspx\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/netmarketshare.com\/search-engine-market-share.aspx<\/a>.<\/li>\n<li>V. Niculescu, D. Bufnea, A. Sterca, <em>MPI scaling up for powerlist based parallel programs<\/em>, Proceedings of 27<sup>th<\/sup> Euromicro International Conference on Parallel Distributed and Network-Based Processing (PDP 2019), pp. 199-204, February 13\u201315, 2019.<\/li>\n<li>Wordart, May 2019, [online] Available: <a href=\"https:\/\/wordart.com\/6i76j7rcw0bd\/word-art\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/wordart.com\/6i76j7rcw0bd\/word-art<\/a>.<\/li>\n<li>J.-Y. Jiang, Y.-Y. Ke, P.-Y. Chien, P.-J. 
Cheng, <em>Learning user reformulation behavior for query auto-completion<\/em>, Proceedings of the 37<sup>th<\/sup> International ACM SIGIR Conference on Research &amp; Development in Information Retrieval ser. SIGIR &#8217;14, pp. 445-454, 2014.<\/li>\n<li>I. B\u0103d\u0103r\u00eenz\u0103, A. Sterca, F. M. Boian, <em>Using the user&#8217;s recent browsing history for personalized query suggestions<\/em>, 2018 26<sup>th<\/sup> International Conference on Software Telecommunications and Computer Networks (SoftCOM), pp. 1-6, Sep. 2018.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>published in Proceedings of the 27th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp. 1-6, DOI: 10.23919\/SOFTCOM.2019.8903906, September 19-21, 2019, Split, Croatia. Cite as I. B\u0103d\u0103r\u00eenz\u0103, A. Sterca and D. Bufnea, &#8220;A Dataset for Evaluating Query Suggestion Algorithms&hellip; <a href=\"https:\/\/www.cs.ubbcluj.ro\/~bufny\/a-dataset-for-evaluating-query-suggestion-algorithms-in-information-retrieval\/\" class=\"more-link\">Continue Reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[110],"tags":[],"_links":{"self":[{"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/posts\/1720"}],"collection":[{"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/comments?post=1720"}],"version-history":[{"count":9,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/posts\/1720\/revisions"}],"predecessor-version":[{"id":2223,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/posts\/1720\/revisions\/2223"}],"wp:attachment":[{"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/media?parent=1720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/categories?post=1720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cs.ubbcluj.ro\/~bufny\/wp-json\/wp\/v2\/tags?post=1720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}