IMPLEMENTATION OF A PARALLEL ALGORITHM TO EXTRACT N-GRAM FROM TEXT IN A FUNCTIONAL LANGUAGE

Abstract

This paper discusses the implementation of a parallel algorithm for extracting N-grams from semi-structured text in the functional language of the LuNA fragmented programming system. N-gram extraction is a common natural language processing (NLP) task. Alternative implementations of the parallel algorithm based on MPJ Express, Apache Spark, and Apache Hadoop were analyzed. Based on this analysis, the LuNA system was chosen because it can automatically tune the algorithm to a specific computer system: the algorithm is represented as a set of sequential, information-dependent tasks that are dynamically distributed among processors and processor cores. The paper describes the implementation scheme of the N-gram extraction algorithm using fragmented programming technology, including the division into data fragments and computation fragments. Testing was conducted on different numbers of processors, extracting word-level N-grams. During token extraction, all stop words, which were specified in advance in a separate text storage, were removed. The tests showed good efficiency of the proposed approach to implementing algorithms with the LuNA system.
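The fragment-based scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not LuNA code and not the authors' implementation. It assumes that each line of the input is an independent record, so no N-gram crosses a fragment boundary, and it uses an inline stop-word set in place of the separate text storage. All names (`STOP_WORDS`, `tokenize`, `fragment_ngrams`, `split_into_fragments`, `extract_ngrams`) are hypothetical. A thread pool stands in for the runtime that distributes computation fragments. It shows the structure only and does not reproduce the dynamic distribution or the performance of the LuNA system.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stop list; the paper keeps stop words in a separate text storage.
STOP_WORDS = frozenset({"the", "a", "of", "in", "to"})


def tokenize(line):
    """Lowercase a line, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", line.lower())
    return [w for w in words if w not in STOP_WORDS]


def fragment_ngrams(lines, n):
    """Computation fragment: count word N-grams in one data fragment."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        counts.update(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


def split_into_fragments(lines, num_fragments):
    """Split lines into at most num_fragments contiguous data fragments."""
    size = max(1, -(-len(lines) // num_fragments))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def extract_ngrams(lines, n=2, num_fragments=4, workers=4):
    """Process each data fragment independently, then merge the counts."""
    fragments = split_into_fragments(lines, num_fragments)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda frag: fragment_ngrams(frag, n), fragments)
        for partial in partials:
            total.update(partial)
    return total
```

For example, `extract_ngrams(["The cat sat in the hat", "A cat sat"], n=2, num_fragments=2)` counts the bigram `("cat", "sat")` twice and `("sat", "hat")` once, because stop words are removed before the N-grams are formed.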
Published
2020-09-30
How to Cite
DARIBAYEV, B. S.; LEBEDEV, D. V.; AKHMED-ZAKI, D. Zh. IMPLEMENTATION OF A PARALLEL ALGORITHM TO EXTRACT N-GRAM FROM TEXT IN A FUNCTIONAL LANGUAGE. Journal of Mathematics, Mechanics and Computer Science, [S.l.], v. 107, n. 3, p. 47-56, sep. 2020. ISSN 2617-4871. Available at: <https://bm.kaznu.kz/index.php/kaznu/article/view/783>. Date accessed: 22 oct. 2020.