IDENTIFYING LEXICAL COMPATIBILITIES OF WORDS BY VECTORS OF SPECIALIZED WORDS

Abstract

In the court system, a secretary fills in protocols. Mistakes in a protocol can lead to misunderstandings between people, so it is important that protocols are written properly. In the current work, lexical compatibilities of words were computed in order to identify mistakes. For this purpose the Skip-gram model was applied. In the Skip-gram model, words are represented by vectors. Words with similar meanings and lexically compatible words should have vectors pointing in approximately the same direction. Therefore, to calculate the lexical compatibility of two words, the cosine of the angle between the two corresponding vectors was computed. For highly lexically compatible words the cosine should be approximately equal to 1; for lexically incompatible words it should be approximately -1. To test their system, the authors used the text of an article of the Constitution of the Republic of Kazakhstan. Specifically, words unrelated to the meaning of the article were inserted, and the system had to identify the inserted words. The system showed high accuracy for some words but low accuracy for others. In the authors' opinion, this happened because even though the inserted words were unrelated in meaning, they could still be lexically compatible with their neighbors. For example, the word "computer" can be used in other contexts with the Kazakh word бұрынғы (old). This research was carried out within the framework of the Ministry of Education and Science of the Republic of Kazakhstan grant project "Developing and implementing the innovative competency-based model of multilingual IT specialist in the course of national education system modernization".
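The cosine measure described in the abstract can be illustrated with a minimal Python sketch. The function name `cosine_similarity` and the toy two-dimensional vectors are assumptions made for illustration only; they are not taken from the paper, and real Skip-gram vectors would come from a trained model with many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not real Skip-gram embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # close to 1
opposite_direction = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # close to -1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # 0

print(same_direction, opposite_direction, orthogonal)
```

Under this measure, a pair of word vectors scoring near 1 would be read as highly compatible and a pair scoring near -1 as incompatible, which matches the interpretation given in the abstract.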
Published
2020-09-30
How to Cite
BAIMURATOV, O. A.; AYAZBAYEV, D. A. IDENTIFYING LEXICAL COMPATIBILITIES OF WORDS BY VECTORS OF SPECIALIZED WORDS. Journal of Mathematics, Mechanics and Computer Science, [S.l.], v. 107, n. 3, p. 67-73, sep. 2020. ISSN 2617-4871. Available at: <https://bm.kaznu.kz/index.php/kaznu/article/view/785>. Date accessed: 22 oct. 2020.