Central and East European
Society for Phenomenology



A comparison of pre-processing techniques for Twitter sentiment analysis

Dimitrios Effrosynidis, Symeon Symeonidis, Avi Arampatzis

pp. 394-406

Abstract

Pre-processing is considered the first step in text classification, and choosing the right pre-processing techniques can improve classification effectiveness. We experimentally compare 15 commonly used pre-processing techniques on two Twitter datasets. We employ three machine learning algorithms, namely Linear SVC, Bernoulli Naïve Bayes, and Logistic Regression, and report the classification accuracy and the resulting number of features for each pre-processing technique. Finally, we categorize these techniques according to their performance. We find that techniques such as stemming, removing numbers, and replacing elongated words improve accuracy, while others, such as removing punctuation, do not.
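The abstract names several of the compared techniques without spelling out what they do. As a rough illustration only, the Python sketch below implements three of them (replacing elongated words, removing numbers, removing punctuation) using the standard library, and measures "the resulting number of features" as the number of distinct lowercase tokens. The function names, the regular expressions, and the rule that collapses any run of three or more identical letters to two are assumptions made for this example, not the authors' implementation; stemming and the three classifiers are left out.

```python
import re
import string
from collections.abc import Callable

# Hypothetical helpers illustrating three techniques named in the abstract.
# They are NOT the authors' code; the regexes are assumptions.

_ELONGATED = re.compile(r"([A-Za-z])\1{2,}")        # a letter repeated 3+ times
_NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")         # standalone numbers: 2, 2017, 3.5
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def replace_elongated(text: str) -> str:
    """Collapse each run of 3+ identical letters to 2 ("Goooood" -> "Good")."""
    return _ELONGATED.sub(r"\1\1", text)


def remove_numbers(text: str) -> str:
    """Drop standalone numeric tokens; mixed tokens such as "4u" are kept."""
    return _NUMBER.sub("", text)


def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation (this also strips '#', '@' and emoticons like ':)')."""
    return text.translate(_PUNCT_TABLE)


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def vocabulary_size(tweets: list[str], *steps: Callable[[str], str]) -> int:
    """Count distinct unigram features after applying the steps in order."""
    vocab: set[str] = set()
    for tweet in tweets:
        for step in steps:
            tweet = step(tweet)
        vocab.update(tokenize(tweet))
    return len(vocab)


if __name__ == "__main__":
    tweets = ["Goooood morning!!! 2 days left", "good morning :) 3 days left"]
    print(vocabulary_size(tweets))  # raw tokens
    print(vocabulary_size(tweets, replace_elongated, remove_numbers, remove_punctuation))
```

On these two toy tweets the raw vocabulary has 9 tokens, and applying the three steps reduces it to 4 (good, morning, days, left), because "Goooood" merges with "good", the numbers disappear, and "morning!!!" merges with "morning". The order of steps matters in general, and removing punctuation also discards emoticons such as ":)", which may carry sentiment; whether that explains the paper's finding is not something this sketch establishes.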

Publication details

Published in:

Kamps Jaap, Tsakonas Giannis, Manolopoulos Yannis, Iliadis Lazaros, Karydis Ioannis (eds.) (2017) Research and advanced technology for digital libraries: 21st international conference on theory and practice of digital libraries, TPDL 2017, Thessaloniki, Greece, September 18-21, 2017. Dordrecht, Springer.

Pages: 394-406

DOI: 10.1007/978-3-319-67008-9_31

Full citation:

Effrosynidis Dimitrios, Symeonidis Symeon, Arampatzis Avi (2017) „A comparison of pre-processing techniques for Twitter sentiment analysis“, In: J. Kamps, G. Tsakonas, Y. Manolopoulos, L. Iliadis & I. Karydis (eds.), Research and advanced technology for digital libraries, Dordrecht, Springer, 394–406.