Named entity recognition for the Kazakh language

Z. M. Kozhirbayev; Z. A. Yessenbayev

doi:10.26577/JMMCS.2020.v107.i3.06

Authors

Z. M. Kozhirbayev Private Institution "National Laboratory Astana", Nur-Sultan, Kazakhstan http://orcid.org/0000-0003-4235-9049
Z. A. Yessenbayev Private Institution "National Laboratory Astana", Nur-Sultan, Kazakhstan http://orcid.org/0000-0002-6322-3848

Keywords:

named entity recognition, conditional random field, long short-term memory, word embeddings

Abstract

Named entity recognition (NER) is one of the important tasks of natural language processing (NLP). It identifies mentions of real-world entities in a sentence, such as geographical locations, personal names, and organizations. Existing approaches to NER rely on manually created grammar rules, on statistical models such as machine learning, or on hybrid methods. The aim of this work is to experiment with statistical and machine learning methods and to examine how they cope with Kazakh, an agglutinative language. This paper presents named entity recognition based on a conditional random field (CRF), a statistical machine learning method. We also use a hybrid approach that combines a bidirectional long short-term memory (LSTM) neural network with a CRF model, which is a modern approach to named entity recognition. The model selected by cross-validated randomized search achieves an F1 score of 0.95. The hybrid LSTM-CRF model achieves an F1 score of 0.88. These results are promising, and, unlike the CRF model, the hybrid model requires no task-specific feature design.
For the experiments, a corpus (kazNER) was created for the NER task, with labels for personal names, locations, organizations, and other entities. The corpus consists of 29,629 sentences, each containing at least one proper noun; apart from the entity labels, the sentences carry only part-of-speech tags.
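
To make the contrast between the two models concrete, the following minimal Python sketch shows the kind of hand-crafted token features a CRF tagger typically consumes, and which an LSTM-CRF learns from word embeddings instead. The feature set, the function names `token_features` and `sentence_features`, and the sample sentence are illustrative assumptions and are not taken from the paper. Suffix n-grams are included because, in an agglutinative language, case and possessive endings attach to the end of a word.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a hypothetical feature dict for the token at position i.

    This is an illustrative feature set, not the one used by the authors.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": word.lower(),
        "is_title": word.istitle(),   # capitalised words are often proper nouns
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
        # Suffix n-grams: a cheap proxy for agglutinative endings.
        "suffix2": word[-2:].lower(),
        "suffix3": word[-3:].lower(),
        "suffix4": word[-4:].lower(),
        # Leading characters as a rough stand-in for the stem.
        "prefix4": word[:4].lower(),
    }
    # Context features from the previous token, or a sentence-start marker.
    if i > 0:
        prev = tokens[i - 1]
        feats["prev_lower"] = prev.lower()
        feats["prev_is_title"] = prev.istitle()
    else:
        feats["BOS"] = True
    # Context features from the next token, or a sentence-end marker.
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        feats["next_lower"] = nxt.lower()
        feats["next_is_title"] = nxt.istitle()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical example sentence, not drawn from the kazNER corpus.
    sentence = ["Абай", "Алматыда", "тұрады"]
    for token, feats in zip(sentence, sentence_features(sentence)):
        print(token, feats["suffix3"], feats["is_title"])
```

Per-token dictionaries of this shape are a common input format for CRF toolkits. Which toolkit and which features the authors actually used is not stated in the abstract.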