Automatic classification of reviews based on machine learning


  • K. Ch. Koybagarov, Institute of Information and Computational Technologies of KS MES of the RK
  • M. Ye. Mansurova, Al-Farabi Kazakh National University
Keywords: tone classification, machine learning, support vector machines, logistic regression, naive Bayes classifier


There is currently strong interest in the automatic analysis of reviews written by Internet users on various topics. One of the main tasks in review analysis is the tone classification of texts. This article examines different approaches to three-class tone classification using machine learning methods, evaluated on three collections. The main objective of the work is to compare different approaches to text representation within the vector model, several machine learning methods, and various combinations of statistical and linguistic features. To build the tone classification model, the following set of statistical and linguistic features was identified: word vectors, N-grams, emoticons, counts of exclamation and question marks, parts of speech, collapsing long vowel repetitions into a single vowel, negations, and review length. The following machine learning methods were used: support vector machines, logistic regression, and the naive Bayes classifier. Computational experiments were conducted with different variants of word vector models, N-grams, and text description features. The experimental results allow us to make recommendations on selecting the most effective features for tone classification.
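
As an illustration of the feature set listed above, the following is a minimal sketch of how several of those features could be extracted from a single review. It is not the authors' implementation: the function names, the emoticon pattern, the negation word list, the English-only tokenizer, and the rule that three or more repeated vowels count as a "long repetition" are all assumptions made for this example. Part-of-speech features are omitted because they require an external tagger.

```python
import re
from collections import Counter

# Hypothetical patterns and word lists; the article does not specify them.
EMOTICON_RE = re.compile(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|]")
ELONGATED_VOWEL_RE = re.compile(r"([aeiouy])\1{2,}", re.IGNORECASE)
NEGATIONS = {"not", "no", "never"}
TOKEN_RE = re.compile(r"[a-z']+")  # assumes lowercased English text


def normalize_vowels(text):
    """Collapse runs of three or more identical vowels into one vowel."""
    return ELONGATED_VOWEL_RE.sub(r"\1", text)


def word_ngrams(tokens, n):
    """Return the word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(review, max_n=2):
    """Map a review to a sparse feature dictionary (feature name -> count)."""
    features = Counter()

    # Surface features are counted on the raw text, before tokenization.
    features["__emoticons__"] = len(EMOTICON_RE.findall(review))
    features["__exclamations__"] = review.count("!")
    features["__questions__"] = review.count("?")

    # Lexical features are computed on lowercased, vowel-normalized tokens.
    tokens = TOKEN_RE.findall(normalize_vowels(review.lower()))
    features["__length__"] = len(tokens)
    features["__negations__"] = sum(1 for t in tokens if t in NEGATIONS)

    # Word n-grams from unigrams up to max_n.
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))

    return dict(features)
```

Calling `extract_features("Sooooo good!!! Not bad :)")` returns a dictionary that mixes the counting features with unigram and bigram counts. Such dictionaries could then be converted to vectors (for instance with scikit-learn's `DictVectorizer`) and passed to a support vector machine, logistic regression, or naive Bayes classifier.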


Koybagarov, K. C., & Mansurova, M. Y. (2017). Automatic classification of reviews based on machine learning. Journal of Mathematics, Mechanics and Computer Science, 91(3), 66–74.