Development of an unified meta language of the turkic languages morphology
DOI:
https://doi.org/10.26577/JMMCS-2018-4-557
Keywords:
morphology, Turkic languages, metalanguage, thesaurus
Abstract
Currently, owing to the sharp increase in the amount of natural-language information on the Internet and social networks, research and development in computational linguistics is becoming extremely relevant. Computational linguistics is a relatively new scientific field and a part of computer science; it includes Natural Language Processing (NLP). Creating a unified metalanguage for Turkic languages (UniTurk) is an important task for processing these languages. A unified metalanguage will make it possible to unify tags, make them easier to understand, share common software, and conduct linguistic-statistical comparative studies across the Turkic languages. This article presents some results of an international project to create a multilingual ontology and a unified metalanguage for the morphology of the Turkic languages. The morphological rules of the Kazakh, Kyrgyz, Tatar, Turkish, and Uzbek languages are formalized using ontological models. The results can be used in NLP applications such as corpus tagging, knowledge extraction, information retrieval, and machine translation.