Creating a model of semantic analysis of extremist texts in the Kazakh language
DOI: https://doi.org/10.26577/JMMCS2024121111
Keywords: internet extremism, machine learning, deep learning, social networks, neural networks
Abstract
Semantic analysis is increasingly used to examine Kazakh-language texts and opinions posted on social networks, mainly in order to identify suspicious or extremist content. This article studies how machine learning and deep learning techniques can be applied to detecting extremist content in text.
The study addresses several issues: oversampling and undersampling during feature processing, the fine distinction between extremist and neutral topics, and imbalanced classification. Building on these, it develops a deep learning model for text classification. It also applies several machine learning models to detect extremist content in text and compares them to determine the most effective approach, taking into account oversampling and undersampling as remedies for class imbalance.
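As a minimal sketch of the two resampling strategies named above, the code below implements plain random oversampling (duplicating minority-class samples) and random undersampling (discarding majority-class samples) using only the Python standard library. The function names, the `seed` parameter, and the choice of *random* resampling rather than a synthetic method such as SMOTE are assumptions for illustration; the abstract does not say which variant the authors used.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(texts, labels):
    """Group texts into lists keyed by their class label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper (random resampling is an assumption): duplicate
    randomly chosen samples of each smaller class until every class is as
    large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Keep every original sample, then add random duplicates up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Hypothetical helper (random resampling is an assumption): randomly drop
    samples of each larger class until every class is as small as the
    smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample without replacement, so no text appears twice.
        for text in rng.sample(items, target):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts = [f"neutral_{i}" for i in range(8)] + ["extremist_0", "extremist_1"]
    labels = ["neutral"] * 8 + ["extremist"] * 2
    print(Counter(random_oversample(texts, labels)[1]))   # both classes: 8
    print(Counter(random_undersample(texts, labels)[1]))  # both classes: 2
```

Either function should be applied to the training split only; resampling before the train/test split would let duplicated texts leak into evaluation and inflate the scores.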
The work is divided into two subtasks: building a machine learning model that detects extremist content in text, and building a deep learning model that accounts for the specific characteristics of the Kazakh language and of the available dataset.
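The abstract does not specify how the characteristics of Kazakh enter the model. One common approach, assumed here purely for illustration, relies on the fact that Kazakh is agglutinative: suffixes attach to a stem (кітап "book", кітаптар "books"), so character n-grams let inflected forms share features even when the whole words differ. The hypothetical `tokenize`, `char_ngrams`, and `featurize` functions below sketch such preprocessing; the n-gram range of 3 to 5 is an arbitrary assumption, not a value from the study.

```python
import re
from collections import Counter

# Matches runs of letters only (no digits or underscores). Python's re is
# Unicode-aware for str patterns, so Kazakh letters such as ә, қ, ң, ө, ұ, ү, і
# are kept alongside Russian Cyrillic and Latin letters.
LETTERS_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Hypothetical tokenizer: lowercase, then keep alphabetic runs."""
    return LETTERS_RE.findall(text.lower())


def char_ngrams(token, n_min=3, n_max=5):
    """Character n-grams of a token padded with boundary markers '<' and '>'."""
    padded = f"<{token}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def featurize(text, n_min=3, n_max=5):
    """Bag of character n-gram counts for one document (assumed feature scheme)."""
    counts = Counter()
    for token in tokenize(text):
        counts.update(char_ngrams(token, n_min, n_max))
    return counts
```

With n = 3, кітап and кітаптар share the n-grams `<кі`, `кіт`, `іта`, and `тап`, whereas a word-level vocabulary would treat them as two unrelated tokens. The same counts could feed either a classical classifier or the embedding layer of a neural model.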
The study also examines feature processing in detail and compares the results of several machine learning algorithms for classifying religious extremism, each using a different combination of features. The algorithms compared are decision trees, random forests, support vector machines, k-nearest neighbors, logistic regression, and naive Bayes.
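Of the six methods listed, naive Bayes is the simplest to write out in full. The sketch below is a textbook multinomial naive Bayes with add-alpha (Laplace) smoothing over token lists, together with macro-averaged F1. Macro F1 is assumed here as a comparison metric because it weights a rare extremist class as heavily as a large neutral class; the abstract does not state which metric or implementation the authors used. In practice a library implementation such as scikit-learn's `MultinomialNB` and `f1_score(..., average="macro")` would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Textbook multinomial naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class names.
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(labels)) for c in self.classes}
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            # Smoothed P(token | class) = (count + alpha) / (total + alpha * |V|).
            denom = sum(token_counts[c].values()) + self.alpha * v
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / denom)
                for tok in self.vocab
            }
        return self

    def predict_one(self, doc):
        # Sum log-probabilities; tokens unseen during training are skipped.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][tok] for tok in doc if tok in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed evaluation metric)."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Comparing the six algorithms then amounts to fitting each on the same training features, predicting on the same held-out texts, and ranking them by a metric such as `macro_f1`, repeated for each feature combination.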
This research contributes to text mining, artificial intelligence, and machine learning by offering practical recommendations for processing and classifying texts related to religious extremism. It also highlights the current relevance of semantic analysis of extremist texts written in Kazakh.