Research Repository

quanteda: An R package for the quantitative analysis of textual data

Benoit, Kenneth and Watanabe, Kohei and Wang, Haiyan and Nulty, Paul and Obeng, Adam and Müller, Stefan and Matsuo, Akitaka (2018) 'quanteda: An R package for the quantitative analysis of textual data.' Journal of Open Source Software, 3 (30). ISSN 2475-9066

Text
10.21105.joss.00774.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multi-threading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
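
To make the terms in the abstract concrete, the following is a minimal, language-agnostic sketch of three ideas it names: a sparse document-feature matrix, document similarity, and keywords-in-context. It is written in plain Python with the standard library only and is not quanteda's API; the function names (`tokenize`, `build_dfm`, `cosine`, `kwic`), the simple tokenization rule, and the three sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_dfm(docs):
    """Build a sparse document-feature matrix.

    Each document maps to a Counter that stores only the features that
    actually occur in it, so zero counts take no memory.
    """
    return {name: Counter(tokenize(text)) for name, text in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feat] for feat, count in a.items() if feat in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kwic(text, keyword, window=2):
    """Return (left context, keyword, right context) for each match of keyword."""
    tokens = tokenize(text)
    return [
        (tokens[max(0, i - window):i], tok, tokens[i + 1:i + 1 + window])
        for i, tok in enumerate(tokens)
        if tok == keyword
    ]


# Hypothetical sample texts, invented for this sketch.
docs = {
    "doc1": "Taxes fund public services.",
    "doc2": "Lower taxes, fewer public services.",
    "doc3": "The weather is mild today.",
}

dfm = build_dfm(docs)
print(dfm["doc1"])                        # feature counts for doc1
print(cosine(dfm["doc1"], dfm["doc2"]))   # documents sharing three features
print(cosine(dfm["doc1"], dfm["doc3"]))   # documents sharing no features
print(kwic(docs["doc2"], "public"))       # "public" with two tokens of context
```

Storing only non-zero counts is what makes the representation sparse: memory grows with the number of distinct features per document rather than with the size of the whole vocabulary. The package described here performs the equivalent steps in R, with C++ and multi-threading behind them.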

Item Type: Article
Divisions: Faculty of Social Sciences > Government, Department of
Depositing User: Elements
Date Deposited: 04 Jul 2019 10:55
Last Modified: 04 Jul 2019 10:55
URI: http://repository.essex.ac.uk/id/eprint/24916
