Graphematic analysis of Kazakh language text

  • A. Sharipbay, L.N. Gumilyov Eurasian National University
  • R. Niyazova, L.N. Gumilyov Eurasian National University
  • R. Turebayeva, L.N. Gumilyov Eurasian National University
  • B. Razakhova, L.N. Gumilyov Eurasian National University
  • A. Zulkhazhav, L.N. Gumilyov Eurasian National University
  • G. Yelibayeva, L.N. Gumilyov Eurasian National University

Abstract

This paper considers graphematic analysis of Kazakh-language text, one of the main stages of automatic text processing, and shows where it fits within the overall automatic analysis of text. It describes the classes of graphematic descriptors, including main and alternative descriptors, and sets out the tasks that graphematic analysis must solve, the key one being the correct identification of word and sentence boundaries. The paper presents an algorithm for dividing text into sentences and describes a graphematic analyzer for the Kazakh language. It gives examples of auxiliary primitives, basic graphematic descriptors, and macrosyntactic descriptors, together with notes on abbreviations, enumerations, definitions, and fragments. All algorithms described in this work were implemented in Python.
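
To illustrate the kind of processing the abstract refers to, the following is a minimal Python sketch of descriptor-tagged tokenization followed by sentence splitting. It is not the authors' algorithm: the descriptor names (WORD, NUMBER, SPACE, PUNCT), the function names, the small abbreviation list, and the sample sentences are all assumptions made for illustration.

```python
import re

# Hypothetical descriptor labels; the paper's own descriptor set is not
# reproduced here. Each alternative tags one class of token.
TOKEN_RE = re.compile(
    r"(?P<WORD>[^\W\d_]+)|(?P<NUMBER>\d+)|(?P<SPACE>\s+)|(?P<PUNCT>.)",
    re.DOTALL,
)

# Illustrative abbreviation stems (an assumption, not taken from the paper),
# e.g. "ж." for "жыл" (year) and the parts of "т.б.".
ABBREVIATIONS = {"ж", "т", "б"}


def tokenize(text):
    """Return (descriptor, lexeme) pairs covering every character of text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text at '.', '!' or '?' unless the period follows an abbreviation."""
    sentences, current = [], []
    prev_word = None  # last WORD lexeme if it immediately precedes the token
    for kind, lexeme in tokenize(text):
        current.append(lexeme)
        if kind == "PUNCT" and lexeme in ".!?":
            is_abbrev = (
                lexeme == "."
                and prev_word is not None
                and prev_word.lower() in abbreviations
            )
            if not is_abbrev:
                chunk = "".join(current).strip()
                if chunk == lexeme and sentences:
                    # Runs such as "?!" or "..." stay with the sentence.
                    sentences[-1] += lexeme
                else:
                    sentences.append(chunk)
                current = []
        if kind != "SPACE":
            prev_word = lexeme if kind == "WORD" else None
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Ол 1997 ж. келді. Бұл қала әдемі!"))
```

A known limitation of this sketch is that an abbreviation standing at the very end of a sentence suppresses the boundary; a fuller analyzer would also inspect the following token, for example its capitalization.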

Published
2019-10-28
How to Cite
SHARIPBAY, A. et al. Graphematic analysis of Kazakh language text. Journal of Mathematics, Mechanics and Computer Science, [S.l.], v. 103, n. 3, p. 90-102, oct. 2019. ISSN 2617-4871. Available at: <https://bm.kaznu.kz/index.php/kaznu/article/view/645>. Date accessed: 22 oct. 2020. doi: https://doi.org/10.26577/JMMCS-2019-3-28.
Keywords
graphematic analyzer, graphematic descriptors, automatic text processing, grapheme, graphematic analysis