Automatic classification of reviews based on machine learning
Keywords:
tone classification, machine learning, support vector machines, logistic regression, naive Bayesian classifier
Abstract
There is currently strong interest in the automatic analysis of reviews that Internet users write on various topics. One of the main tasks in review analysis is the tone classification of texts. This article examines different approaches to three-class tone classification using machine learning methods, evaluated on three collections. The main objective of this work is to compare different text representations within the framework of the vector model, several machine learning methods, and various combinations of statistical and linguistic features. To build the tone classification model, the following set of statistical and linguistic features was identified: word vectors, N-grams, emoticons, counts of exclamation and question marks, parts of speech, replacement of long vowel repetitions with a single vowel, negations, and review length. The following machine learning methods were used: support vector machines, logistic regression, and the naive Bayesian classifier. Computational experiments were conducted with different variants of word vector models, N-grams, and text description features. The experimental results allow us to make recommendations on selecting the most effective features for tone classification.
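As an illustration of the text description features listed in the abstract, the following is a minimal sketch in Python, not the implementation used in this work. The emoticon and negation lexicons, the feature names, and the threshold of three repeated vowels are all assumptions made for the example. Part-of-speech features are omitted because they require an external tagger.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration; the paper does not list the ones it used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
NEGATIONS = {"not", "no", "never"}

# Assumption: a "long repetition" is a run of three or more identical vowels.
VOWEL_RUN = re.compile(r"([aeiou])\1{2,}", re.IGNORECASE)
TOKEN = re.compile(r"[a-z']+")


def collapse_vowels(text):
    """Replace a run of three or more identical vowels with a single vowel."""
    return VOWEL_RUN.sub(r"\1", text)


def word_ngrams(tokens, n):
    """Return all word n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(review, max_n=2):
    """Map a review to a sparse feature dictionary (feature name -> count)."""
    features = Counter()
    raw_tokens = review.split()

    # Emoticons and punctuation marks are counted on the raw text.
    features["emo_pos"] = sum(tok in POSITIVE_EMOTICONS for tok in raw_tokens)
    features["emo_neg"] = sum(tok in NEGATIVE_EMOTICONS for tok in raw_tokens)
    features["exclamations"] = review.count("!")
    features["questions"] = review.count("?")

    # Drop emoticons, lowercase, collapse vowel runs, then split into words.
    emoticons = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS
    text = " ".join(tok for tok in raw_tokens if tok not in emoticons)
    tokens = TOKEN.findall(collapse_vowels(text.lower()))

    features["length"] = len(tokens)
    features["negations"] = sum(tok in NEGATIONS for tok in tokens)

    # Word N-grams for N = 1 .. max_n.
    for n in range(1, max_n + 1):
        for gram in word_ngrams(tokens, n):
            features["ngram=" + gram] += 1
    return features


example = extract_features("Sooooo good!!! Not bad at all :)")
```

Feature dictionaries of this kind could then be converted to vectors (for instance with scikit-learn's `DictVectorizer`) and passed to a support vector machine, logistic regression, or naive Bayesian classifier. Which combination of features works best is the question the experiments address.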