Efficiency comparison of parallel implementations Thomas algorithm: pipelined Thomas algorithm, parallel Thomas algorithm

Authors

  • D. Zh. Akhmed-Zaki al-Farabi Kazakh National University
  • D. V. Lebedev al-Farabi Kazakh National University
  • V. A. Perepelkin Институт вычислительной математики и математической геофизики СО РАН

Keywords:

MPI, Parallel program, Janenko method, Pipelined Thomas algorithm

Abstract

Study the problem of solutions a series of tridiagonal systems of equations Thomas algorithm for use in three-dimensional problems such as heat conduction problem for large mesh sizes (1000 3 and more). At Supercomputing performance growth requires the use of highly efficient parallel programs. The Thomas algorithm examine of because of that that he is a direct method, economical and easy to implement in the case of a sequential program, but it’s hard to parallelize because of information dependencies between the operations of the algorithm. This method is also used for solving the three-dimensional and two-dimensional problems, which gives rise to a series of Thomas algorithm. And an efficient algorithm parallelization Thomas algorithm will allow to solve such problems on supercomputers with a good performance. The paper summarizes the two algorithm parallelization Thomas algorithm, the example of the one-dimensional solutions of an elliptic equation with different sizes of servings, using MPI standard, as well as a comparison of their efficiency by using a series Thomas algorithm. The results of numerical experiments to study the optimum size of servings, to draw conclusions about the applicability of the studied algorithms for large three-dimensional problems on supercomputers containing tens of thousands of compute nodes and more.

References

[1] Samarskiy А.А. The theory of difference systems. - М.: Nauka, 1977. -656p.
[2] Davidson A., Zhang Y., D. Owens J. An auto-tuned method for solving large tridiagonal systems on the GPU. URL: http://idav.ucdavis.edu/func/return_pdf?pub_id=1052 (дата обращения: 07.09.2016).
[3] 3. Kim H.-S., Wu S., Chang L.-W., Hwu W.-M. A scalable tridiagonal solver for gpus. // In Parallel Processing (ICPP), 2011 International Conference on. P. 444 –453.
[4] Hockney R. W., Jesshope C. R. Parallel computers: architecture, programming and algorithm. // Hilger. Bristol. - 1986. - P. 274-280.
[5] NVIDIA Corporation. CUDA CUSPARSE Library. URL: https://developer.nvidia.com/cusparse (дата обращения: 07.09.2016).
[6] Sengupta S., Harris M., Zhang Y., Owens J. D. Scan primitives for GPU computing. URL: http://idav.ucdavis.edu/func/return_pdf?pub_id=915 (дата обращения: 07.09.2016).
[7] Goddeke D., Strzodka R. Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid // IEEE Transactions on Parallel and Distributed Systems. - 2011. - Vol. 22. - P. 22–32.
[8] Davidson A., Owens J.D. Register packing for cyclic reduction: A case study. URL: http://idav.ucdavis.edu/func/return_pdf?pub_id=1052 (дата обращения: 07.09.2016).
[9] Egloff D. High performance finite difference PDE solvers on GPUs. URL: http://www.researchgate.net/publication/228942149_High_performance_finite_difference_PDE_solvers_on_GPUs (дата обращения: 07.09.2016).
[10] Sakharnykh N. Efficient tridiagonal solvers for ADI methods and fluid simulation. // NVIDIA GPU Technology Conference. - 2010. - P.17-21
[11] Zhang Y., Cohen J., et al. Fast tridiagonal solvers on the GPU. URL: http://idav.ucdavis.edu/func/return_pdf?pub_id=978 (дата обращения: 07.09.2016).
[12] Stone H. S. An efficient parallel algorithm for the solution of a tridiagonal linear system of equations // Journal of the ACM. - 1973. - Vol. 20. - P. 27–38.
[13] Chang L.-W., Stratton J. A., Kim H.-S., Hwu W.-M. A scalable, numerically stable, highperformance tridiagonal solver using GPUs. High Performance Computing, Networking, Storage and Analysis (SC). // 2012 International Conference for. IEEE Computer Society Press. 2012. - P. 11.
[14] Polizzi E., Sameh A. A parallel hybrid banded system solver: The SPIKE algorithm // Parallel Computing. - 2006. - Vol. 32, No. 2. - P. 177–194.
[15] 15. Polizzi E., Sameh A. SPIKE: A parallel environment for solving banded linear systems // Computers and Fluids. - 2007. - Vol. 36, No. 1. - P. 113– 120.
[16] 15. Yanenko N.N., Konovalov A.N., Bugrov A.N., Shustov G.V On organization of parallel computations and parallelization of the tridiagonal matrix algorithm // Numerical Methods of Continuum Mechanics. - 1978. - Vol. 9. No7. - P. 139-146.
[17] Povitsky A. Parallelization of the pipelined Thomas algorithm. // ICASE Report No. 98-48. Hampton. NASA Langley Research Center. - 1998. - P. 22.
[18] Sapronov I.S., Bykov А.N. Pipelined Thomas algorithm // Atom. - 2009. - No 44. - P. 24-25.
[19] 19. Janenko N. N. The method of fractional steps for solving multidimensional problems of mathematical physics. - Novosibirsk: Nauka, 1967. - 197p.
[20] Akhmed-Zaki D., Lebedev D., Perepelkin V. Implementation of a Three-Phase Fluid Flow ("Oil-Water-Gas") Numerical Model in the LuNA Fragmented Programming System //Proc. of the 13th Conference on Parallel Computing Technologies, LNCS 9251. -2015. - P. 489-497.

Downloads

Published

2018-11-01