
Enhanced Parallel Shared-Memory Sparse Matrix–Vector Multiplication using Optimized CSB

Krishna Girish, G. Kharmega Sundararaj

Abstract


Sparse matrix–vector multiplication (SpMxV) is one of the most important computational kernels in scientific computing. The key algorithmic characteristic that prevents the SpMxV kernel from achieving high performance is its very low flop:byte ratio. Exploiting the tremendous potential of throughput-oriented processors for sparse operations requires exposing substantial fine-grained parallelism and imposing sufficient regularity on execution paths and memory access patterns. In this paper, a storage format for sparse matrices, called optimized compressed sparse blocks (CSB), is used to enhance the speed of parallel computation; it allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. The proposed system provides an optimization approach for enhanced computation. Our sparse matrix multiple-vector multiplication algorithm achieves high throughput on all platforms and is implemented using platform-neutral optimizations. The proposed storage format is an optimization that allows high-rate access and additional computational capability. Experimental results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth. We show that the use of enhanced CSB not only improves performance significantly but also reduces matrix storage.
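
To illustrate the idea (this is a minimal sketch, not the authors' implementation, which uses Cilk++ and finer within-block parallelism), the following C++/OpenMP code shows a simplified CSB layout and why the same structure serves both Ax and Aᵀx. The type CsbMatrix and the functions spmv_csb and spmv_t_csb are hypothetical names, and the sketch assumes n is a multiple of the block dimension β and parallelizes only across block rows or block columns.

```cpp
// Simplified CSB sketch: the matrix is partitioned into a dense grid of
// beta x beta blocks; nonzeros store (row, col) offsets *relative to their
// block*, so the offsets fit in small integers. Compile with -fopenmp.
#include <cstddef>
#include <cstdint>
#include <vector>

struct CsbMatrix {
    int n;     // matrix is n x n (assumed a multiple of beta here)
    int beta;  // block dimension; beta <= 65536 so offsets fit in 16 bits
    int nb;    // blocks per side: n / beta
    std::vector<std::size_t> blk_ptr;  // start of each block's nonzeros,
                                       // row-major over blocks, length nb*nb+1
    std::vector<uint16_t> lo_row;      // row offset within the block
    std::vector<uint16_t> lo_col;      // column offset within the block
    std::vector<double>   val;         // nonzero values
};

// y = A * x: each block row writes a disjoint slice of y, so the outer
// loop parallelizes with no races.
void spmv_csb(const CsbMatrix& A, const double* x, double* y) {
    #pragma omp parallel for schedule(dynamic)
    for (int bi = 0; bi < A.nb; ++bi) {
        double* yblk = y + static_cast<std::size_t>(bi) * A.beta;
        for (int i = 0; i < A.beta; ++i) yblk[i] = 0.0;
        for (int bj = 0; bj < A.nb; ++bj) {
            const double* xblk = x + static_cast<std::size_t>(bj) * A.beta;
            const std::size_t b = static_cast<std::size_t>(bi) * A.nb + bj;
            for (std::size_t k = A.blk_ptr[b]; k < A.blk_ptr[b + 1]; ++k)
                yblk[A.lo_row[k]] += A.val[k] * xblk[A.lo_col[k]];
        }
    }
}

// y = A^T * x: the *same* data structure is traversed by block column,
// with the row/column offsets swapped; again each task owns its y slice.
void spmv_t_csb(const CsbMatrix& A, const double* x, double* y) {
    #pragma omp parallel for schedule(dynamic)
    for (int bj = 0; bj < A.nb; ++bj) {
        double* yblk = y + static_cast<std::size_t>(bj) * A.beta;
        for (int i = 0; i < A.beta; ++i) yblk[i] = 0.0;
        for (int bi = 0; bi < A.nb; ++bi) {
            const double* xblk = x + static_cast<std::size_t>(bi) * A.beta;
            const std::size_t b = static_cast<std::size_t>(bi) * A.nb + bj;
            for (std::size_t k = A.blk_ptr[b]; k < A.blk_ptr[b + 1]; ++k)
                yblk[A.lo_col[k]] += A.val[k] * xblk[A.lo_row[k]];
        }
    }
}
```

Because the row and column offsets are relative to a β×β block, each fits in 16 bits when β ≤ 65536, whereas CSR stores a full-width column index per nonzero; this is one source of the storage reduction noted in the abstract, and the symmetry of the layout is what lets Ax and Aᵀx run equally well in parallel.
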


Keywords


Sparse Matrix Vector Multiplication, Compressed Sparse Block Formats, Optimizations


