Graphematic analysis of Kazakh language text
DOI:
https://doi.org/10.26577/JMMCS-2019-3-28
Keywords:
graphematic analyzer, graphematic descriptors, automatic text processing, grapheme, graphematic analysis
Abstract
This paper considers graphematic analysis of Kazakh-language text, one of the
main stages of automatic text processing, and shows where graphematic analysis
sits within the overall automatic analysis of a text. The key task of
graphematic analysis is the correct detection of word and sentence boundaries.
The paper sets out the tasks that graphematic analysis should solve, describes
the classes of graphematic descriptors, including main and alternative
descriptors, and considers the descriptors related to macro-syntactic analysis.
It provides an algorithm for dividing a text into sentences and describes the
graphematic analyzer for the Kazakh language. Examples of auxiliary primitives,
basic graphematic descriptors, and macro-syntactic descriptors are given,
together with notes on abbreviations, enumerations, definitions, and fragments.
All algorithms described in this work were implemented in Python.
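
The sentence-division task named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `split_sentences`, the small abbreviation set, and the rule "a token ending in terminal punctuation closes a sentence unless it is a known abbreviation or the next token does not start with an uppercase letter" are all assumptions made here for illustration.

```python
# Illustrative abbreviation set (an assumption, not taken from the paper).
# A real analyzer would use a much larger, curated list.
ABBREVIATIONS = {"ж.", "т.б.", "т.с.с."}

TERMINALS = (".", "!", "?")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences using a simple boundary heuristic."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        # Only tokens ending in terminal punctuation can close a sentence.
        if not tok.endswith(TERMINALS):
            continue
        # A known abbreviation such as "ж." does not close a sentence.
        if tok.lower() in abbreviations:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Close the sentence at end of text or before an uppercase word.
        if nxt is None or nxt[0].isupper():
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing fragment without terminal punctuation.
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    for s in split_sentences("Абай 1845 ж. туған. Ол ақын болды."):
        print(s)
```

The heuristic is deliberately narrow: it does not handle quotation marks, sentences that begin with a digit, or abbreviations that happen to end a sentence, which is why a fuller analyzer would rely on descriptors rather than on punctuation alone.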
