Morphological parsing of Kazakh texts with deep learning approaches
DOI: https://doi.org/10.26577/JMMCS2024-v124-i4-a4
Keywords: Kazakh language, morphological analysis, RNN, Transformer
Abstract
Morphological analysis is a crucial task in Natural Language Processing (NLP) that greatly contributes to enhancing the performance of large language models (LLMs). Although NLP technologies have seen rapid advancements in recent years, the creation of efficient morphological analysis algorithms for morphologically complex languages, such as Kazakh, continues to be a significant challenge. This research focuses on designing a morphological analysis algorithm for the Kazakh language, specifically optimized for integration with LLMs. The study addresses the following tasks: data corpus collection and processing, selection and adaptation of suitable algorithms, and model training and evaluation. This paper delivers a detailed exploration of using deep learning models for the morphological analysis of the Kazakh language, specifically highlighting Recurrent Neural Network (RNN) and Transformer models. Because Kazakh is an agglutinative language, where word formation is achieved by attaching multiple suffixes and prefixes, morphological analysis poses unique challenges for computational models. The performance of RNNs, including LSTM and GRU variants, is evaluated in comparison with Transformer models, focusing on their capability to analyze the complex morphology of Kazakh. The results outline the benefits and limitations of each approach for processing agglutinative languages, indicating that RNNs are often more effective for Kazakh morphological analysis, whereas Transformer models may require additional fine-tuning to achieve optimal results with such languages.
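To make the kind of model comparison described above concrete, the following is a minimal sketch (not the authors' implementation) of a bidirectional LSTM sequence tagger in PyTorch, the RNN family evaluated in the paper. The vocabulary size, tag set size, and dimensions are illustrative placeholders; a real setup would encode Kazakh text and gold morphological tags from the collected corpus.

```python
import torch
import torch.nn as nn

class LSTMMorphTagger(nn.Module):
    """Illustrative tagger mapping each input position to a morphological tag."""
    def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # A bidirectional LSTM reads the sequence in both directions, which helps
        # capture the suffix chains characteristic of agglutinative morphology.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq_len, emb_dim)
        hidden_states, _ = self.lstm(embedded)     # (batch, seq_len, 2*hidden_dim)
        return self.classifier(hidden_states)      # (batch, seq_len, tagset_size)

# Toy usage with random indices standing in for encoded Kazakh tokens.
model = LSTMMorphTagger(vocab_size=100, tagset_size=20)
dummy_batch = torch.randint(1, 100, (2, 12))       # 2 sequences of length 12
tag_logits = model(dummy_batch)
print(tag_logits.shape)                            # torch.Size([2, 12, 20])
```

A Transformer baseline of the kind compared in the paper would replace the recurrent layer with self-attention (e.g. `nn.TransformerEncoder`) over the same embedded inputs, keeping the per-position classifier unchanged.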