Galke, Lukas and Mai, Florian and Schelten, Alan and Brunsch, Dennis and Scherp, Ansgar (2017) Using Titles vs. Full-text as Source for Automated Semantic Document Annotation. In: Knowledge Capture Conference (K-CAP 2017), 2017-12-04 - 2017-12-06, Austin, TX, USA.
Abstract
We conduct the first systematic comparison of automated semantic annotation based on either the full text or only the title metadata of documents. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks, and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. On three of our four datasets, classification using only titles reaches over 90% of the quality obtained when using the full text.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Notes: Accepted as SHORT PAPER by K-CAP 2017, 9 pages, 1 figure, 3 tables |
Uncontrolled Keywords: | cs.DL; cs.CL |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 31 Jul 2019 11:54 |
Last Modified: | 18 Sep 2024 08:19 |
URI: | http://repository.essex.ac.uk/id/eprint/24366 |
Available files
Filename: 1705.05311v2.pdf