Identifying lexical compatibilities of words by vectors of specialized words

Authors

DOI:

https://doi.org/10.26577/JMMCS.2020.v107.i3.07

Keywords:

word vectors, Skip-gram model, lexical compatibility of words

Abstract

In the court system, a secretary fills in protocols. Mistakes in a protocol can lead to misunderstanding between people, so it is important that protocols are written properly. In the current work, the lexical compatibility of words was computed in order to identify such mistakes. The Skip-gram model was applied for this purpose. In the Skip-gram model, words are represented by vectors. The vectors of words with similar meaning, and of lexically compatible words, should point in approximately the same direction. Therefore, to calculate the lexical compatibility of two words, the cosine of the angle between the two corresponding vectors was computed. For highly lexically compatible words the cosine should be approximately equal to 1; for lexically incompatible words it should be approximately -1. To test their system, the authors used the text of an article of the Constitution of the Republic of Kazakhstan. Specifically, words unrelated to the meaning of the article were inserted into the text, and the system had to identify the inserted words. The system showed high accuracy for some words and low accuracy for others. In the authors' opinion, this happened because the inserted words, although unrelated in meaning, could still be lexically compatible with their neighbors. For example, the word computer can be used in other contexts with the Kazakh word бұрынғы (old). This research was carried out within the framework of the Ministry of Education and Science of the Republic of Kazakhstan grant project "Developing and implementing the innovative competency-based model of multilingual IT specialist in the course of national education system modernization".
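The cosine-based compatibility check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors below are hypothetical toy values rather than trained Skip-gram embeddings, and the function names `cosine` and `neighbor_compatibility`, as well as the rule of averaging over the immediate left and right neighbors, are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def neighbor_compatibility(vectors, tokens, i):
    """Mean cosine between token i and its immediate neighbors.

    Assumption: only the left and right neighbors are considered;
    the original work may use a different context window.
    """
    scores = [
        cosine(vectors[tokens[i]], vectors[tokens[j]])
        for j in (i - 1, i + 1)
        if 0 <= j < len(tokens)
    ]
    return sum(scores) / len(scores)

# Hypothetical toy vectors; real vectors would come from a trained
# Skip-gram model and have many more dimensions.
toy_vectors = {
    "judge": [0.95, 0.15],
    "secretary": [0.8, 0.3],
    "protocol": [0.9, 0.2],
    "court": [1.0, 0.1],
    "banana": [-0.7, 0.6],  # stands in for an inserted, unrelated word
}

sentence = ["judge", "secretary", "banana", "protocol", "court"]
scores = [neighbor_compatibility(toy_vectors, sentence, i)
          for i in range(len(sentence))]

# The token with the lowest mean cosine is the candidate inserted word.
suspect = sentence[min(range(len(sentence)), key=scores.__getitem__)]
print(suspect)
```

With these toy values the inserted word receives the lowest score. As the abstract notes, the check can fail when an unrelated word happens to be lexically compatible with its neighbors, since its cosine values are then high despite the mismatch in meaning.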

Published

2020-09-30