Implementation of a Parallel Algorithm for Extracting N-grams from Text in a Functional Language

B. S. Daribayev; D. V. Lebedev; D. Zh. Akhmed-Zaki

doi:10.26577/JMMCS.2020.v107.i3.05

Authors

B. S. Daribayev, Al-Farabi Kazakh National University, University of International Business (UIB), Almaty, Kazakhstan; http://orcid.org/0000-0003-1313-9004
D. V. Lebedev, Al-Farabi Kazakh National University, Almaty, Kazakhstan; http://orcid.org/0000-0002-5186-6483
D. Zh. Akhmed-Zaki, University of International Business (UIB), Almaty, Kazakhstan; http://orcid.org/0000-0001-8100-8263

Keywords:

parallel algorithm, functional language, LuNA, N-gram, fragmented programming

Abstract

This paper discusses the implementation of a parallel algorithm for extracting N-grams from semi-structured text in the functional language of the LuNA fragmented programming system. N-gram extraction is a natural language processing (NLP) task. Existing implementations of the parallel algorithm based on MPJ Express, Apache Spark, and Apache Hadoop were analyzed. Based on this analysis, the LuNA system was chosen because it can automatically tune the algorithm to a specific computer system: the algorithm is modeled as a set of sequential, information-dependent tasks that are dynamically distributed among processors and processor cores. The paper describes how the algorithm is divided into data fragments and computational fragments and presents the resulting implementation scheme. Word-level N-gram extraction was tested on different numbers of processors. During token extraction, all stop words, which were defined in advance in a separate text store, were removed. Testing showed good efficiency of the proposed approach for implementing algorithms with the LuNA system.
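The abstract's core idea, word-level N-gram extraction over data fragments with stop-word removal, can be illustrated with a minimal Python sketch. This is not the authors' LuNA implementation: the stop-word set, the function names, and the choice to overlap fragments by n-1 tokens are all assumptions made for illustration, and the fragments are processed in a plain sequential loop where a fragmented programming system would distribute them among processors.

```python
from collections import Counter

# Hypothetical stop-word list; the paper keeps its list in a separate text store.
STOP_WORDS = {"the", "a", "an", "of", "in"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def ngrams(tokens, n):
    """Return all word-level n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def split_into_fragments(tokens, parts, n):
    """Split tokens into contiguous data fragments.

    Each fragment is extended by n-1 tokens (an assumption of this sketch)
    so that n-grams crossing a fragment boundary are counted exactly once.
    """
    size = max(1, -(-len(tokens) // parts))  # ceiling division
    return [tokens[start:start + size + n - 1]
            for start in range(0, len(tokens), size)]


def count_ngrams(text, n=2, parts=4):
    """Count n-grams fragment by fragment and merge the partial counts."""
    tokens = tokenize(text)
    total = Counter()
    for fragment in split_into_fragments(tokens, parts, n):
        # Each iteration is an independent computation on one data fragment.
        total.update(ngrams(fragment, n))
    return total
```

Because each fragment is counted independently and the partial counters are merged only at the end, the loop body is the unit of work that could be scheduled on separate processors or cores.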

Implementation of A Parallel Algorithm to Extract N-gram from Text in a Functional Language
