Morphological parsing of Kazakh texts with deep learning approaches
DOI: https://doi.org/10.26577/JMMCS2024-v124-i4-a4
Keywords: Kazakh language, morphological analysis, RNN, Transformer
Abstract
Morphological analysis is a crucial task in Natural Language Processing (NLP) that greatly contributes to enhancing the performance of large language models (LLMs). Although NLP technologies have seen rapid advancements in recent years, the creation of efficient morphological analysis algorithms for morphologically complex languages, such as Kazakh, continues to be a significant challenge. This research focuses on designing a morphological analysis algorithm for the Kazakh language, specifically optimized for integration with LLMs. The study addresses the following tasks: data corpus collection and processing, selection and adaptation of suitable algorithms, and model training and evaluation. This paper delivers a detailed exploration of using deep learning models for the morphological analysis of the Kazakh language, specifically highlighting Recurrent Neural Network (RNN) and Transformer models. Because Kazakh is an agglutinative language, where word formation is achieved by attaching multiple suffixes and prefixes, morphological analysis poses unique challenges for computational models. The performance of RNNs, including LSTM and GRU variants, is evaluated in comparison with Transformer models, focusing on their capability to analyze the complex morphology of Kazakh. The results outline the benefits and limitations of each approach for processing agglutinative languages, indicating that RNNs are often more effective for Kazakh morphological analysis, whereas Transformer models may require additional fine-tuning to achieve optimal results with such languages.
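To make the kind of model comparison described above concrete, the following is a minimal sketch (not the authors' implementation) of a bidirectional LSTM sequence tagger in PyTorch, the RNN family evaluated in the paper. The vocabulary size, tag set size, and dimensions are illustrative placeholders; a real setup would encode Kazakh text and gold morphological tags from the collected corpus.

```python
import torch
import torch.nn as nn

class LSTMMorphTagger(nn.Module):
    """Illustrative tagger mapping each input position to a morphological tag."""
    def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # A bidirectional LSTM reads the sequence in both directions, which helps
        # capture the suffix chains characteristic of agglutinative morphology.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq_len, emb_dim)
        hidden_states, _ = self.lstm(embedded)     # (batch, seq_len, 2*hidden_dim)
        return self.classifier(hidden_states)      # (batch, seq_len, tagset_size)

# Toy usage with random indices standing in for encoded Kazakh tokens.
model = LSTMMorphTagger(vocab_size=100, tagset_size=20)
dummy_batch = torch.randint(1, 100, (2, 12))       # 2 sequences of length 12
tag_logits = model(dummy_batch)
print(tag_logits.shape)                            # torch.Size([2, 12, 20])
```

A Transformer baseline of the kind compared in the paper would replace the recurrent layer with self-attention (e.g. `nn.TransformerEncoder`) over the same embedded inputs, keeping the per-position classifier unchanged.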