Benoit, Kenneth and Watanabe, Kohei and Wang, Haiyan and Nulty, Paul and Obeng, Adam and Müller, Stefan and Matsuo, Akitaka (2018) quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3 (30). p. 774. DOI https://doi.org/10.21105/joss.00774
Abstract
quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multi-threading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
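The abstract notes that quanteda compiles document-feature matrices using sparse operations. As an illustration of the general idea only, the Python sketch below builds a small document-feature matrix in compressed sparse row (CSR) form, storing only the nonzero counts. It does not reproduce quanteda's R API or its C++ implementation; the tokenizer and the functions `tokenize`, `build_dfm`, and `to_dense` are hypothetical helpers written for this example.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Hypothetical minimal tokenizer: split on whitespace, trim edge punctuation, lowercase.
    tokens = [t.strip(PUNCT).lower() for t in text.split()]
    return [t for t in tokens if t]

def build_dfm(texts):
    """Build a document-feature matrix in CSR form.

    Returns (indptr, indices, data, features): row i's nonzero entries are
    indices[indptr[i]:indptr[i+1]] (feature ids) with counts in the same slice of data.
    """
    vocab = {}                      # feature -> column id, assigned in first-seen order
    indptr, indices, data = [0], [], []
    for text in texts:
        counts = Counter(tokenize(text))
        row = sorted((vocab.setdefault(f, len(vocab)), n) for f, n in counts.items())
        for j, n in row:            # only nonzero counts are stored
            indices.append(j)
            data.append(n)
        indptr.append(len(indices))
    features = sorted(vocab, key=vocab.get)
    return indptr, indices, data, features

def to_dense(indptr, indices, data, n_features):
    # Expand the sparse rows into a dense list-of-lists for inspection.
    dense = []
    for i in range(len(indptr) - 1):
        row = [0] * n_features
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense

if __name__ == "__main__":
    indptr, indices, data, features = build_dfm(["The cat sat.", "The cat saw the dog."])
    print(features)
    print(to_dense(indptr, indices, data, len(features)))
```

For these two documents the vocabulary is `['the', 'cat', 'sat', 'saw', 'dog']`, and only 7 of the 10 matrix cells are stored. With realistic corpora, where each document uses a tiny fraction of the vocabulary, that saving in memory and computation is what makes a sparse representation worthwhile.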
Item Type: | Article |
---|---|
Divisions: | Faculty of Social Sciences > Government, Department of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 04 Jul 2019 10:55 |
Last Modified: | 06 Jan 2022 14:01 |
URI: | http://repository.essex.ac.uk/id/eprint/24916 |
Available files
Filename: 10.21105.joss.00774.pdf
Licence: Creative Commons: Attribution 3.0