Development of an unified meta language of the turkic languages morphology

  • A. Sharipbay Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University
  • A. Gatiatullin Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan
  • B. Yergesh Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University
  • D. Kazhymukhan Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University

Abstract

The sharp increase in the amount of natural-language information on the Internet and social networks has made research and development in computational linguistics highly relevant. Computational linguistics is a relatively new scientific field within computer science, and it includes Natural Language Processing (NLP). Creating a unified metalanguage for the Turkic languages (UniTurk) is an important task for processing these languages. A unified metalanguage system will make it possible to unify tags, make them easier to understand, and use common software, as well as to conduct comparative linguistic-statistical studies across the Turkic languages. This article presents some of the results of an international project to create a multilingual ontology and a unified metalanguage of Turkic morphology. The morphological rules of five Turkic languages (Kazakh, Kyrgyz, Tatar, Turkish, and Uzbek) are formalized using ontological models. The results can be used in NLP applications such as corpus tagging, knowledge extraction, information retrieval, and machine translation.

Author Biographies

A. Sharipbay, Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University
Doctor of Technical Sciences, professor of the Department of Informatics and Information Security, director of the Scientific-Research Institute "Artificial intelligence"
A. Gatiatullin, Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan
Candidate of Technical Sciences, head of the Department of Intelligent Information Systems of the Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan
B. Yergesh, Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University
Senior lecturer of the Department of Informatics and Information Security, researcher of the Scientific-Research Institute "Artificial intelligence"
D. Kazhymukhan, Scientific-Research Institute "Artificial intelligence", L. N. Gumilyov Eurasian National University
Master's student in the specialty 5M060100 Computer Science

Published
2019-01-24
How to Cite
SHARIPBAY, A. et al. Development of an unified meta language of the turkic languages morphology. Journal of Mathematics, Mechanics and Computer Science, [S.l.], v. 100, n. 4, p. 78-87, jan. 2019. ISSN 1563-0277. Available at: <http://bm.kaznu.kz/index.php/kaznu/article/view/557>. Date accessed: 23 may 2019.
Keywords: morphology, Turkic languages, metalanguage, thesaurus