This is an archival dump of old wiki content --- see scipy.org for current material

Introduction

What is a sparse matrix?

Where do sparse matrices arise?

Sparsity Patterns

  1. Diagonal
  2. Block
  3. Unstructured
  4. Sensitivity of pattern to ordering, and use of reordering for locality (e.g. direct solvers)

Sparse Formats

If you've handled sparse matrices before, then some of these formats will be familiar to you. If not, don't be overwhelmed by the abundance of sparse matrix formats!

Constructing Sparse Matrices

sparse from dense

sparse to dense

sparse to sparse conversions

constructing from scratch

implementation considerations

sparsity structure changes can be expensive

constructing from scratch faster, with coo_matrix

Here's the result

The coo_matrix method is clearly the fastest, beating the others by a factor of 500. Interestingly, constructing a matrix in COO first, and then converting to LIL or DOK is still faster than using those formats directly:
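As an illustration, here is a minimal sketch of the COO approach next to entry-by-entry LIL assignment (the matrix size, density, and variable names here are ours, not the original benchmark's):

from numpy import random
from scipy import sparse
from time import time

n, nnz = 1000, 5000
row = random.randint(0, n, nnz)      # random (row, col, value) triplets
col = random.randint(0, n, nnz)
val = random.rand(nnz)

t = time()
A = sparse.coo_matrix((val, (row, col)), shape=(n, n))   # one call; duplicate entries are summed on conversion
print "coo:", time() - t

t = time()
B = sparse.lil_matrix((n, n))
for i in xrange(nnz):                # entry-by-entry assignment is much slower
    B[row[i], col[i]] = val[i]
print "lil:", time() - t

C = A.tolil()                        # the trick mentioned above: build in COO, then convert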

construction utilities

Handling Sparse Matrices

Solving Sparse Linear Systems

There are two main classes of solvers for sparse linear systems: direct and iterative.

Direct methods can be thought of as an extension of dense matrix factorizations, such as the LU factorization, to sparse matrices. The goal of such algorithms is to produce factors (e.g. the matrices L and U) that are as sparse as possible. Sophisticated algorithms are used to reorder the rows and columns of the input matrix so that the fill-in of the sparse factors is minimized.
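For instance, here is a minimal sketch of a sparse direct solve (note that in current SciPy the direct solvers live in scipy.sparse.linalg rather than in the scipy.linsolve module used elsewhere on this page; the test matrix is our own invention):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

n = 1000
# a tridiagonal test matrix: 2 on the diagonal, -1 on the off-diagonals
data = np.array([-np.ones(n), 2*np.ones(n), -np.ones(n)])
A = sparse.spdiags(data, [-1, 0, 1], n, n).tocsc()   # splu expects CSC

lu = splu(A)            # sparse LU factorization (SuperLU), with fill-reducing reordering
b = np.ones(n)
x = lu.solve(b)         # the factors can be reused for many right-hand sides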

In contrast, iterative methods solve linear systems by iteratively improving an approximation to the solution. Since iterative methods do not require a matrix factorization to be computed, they are usually applied to large linear systems where the memory cost of computing such factors is prohibitive. On the other hand, many iterative methods only work for a specific type of matrix (e.g. symmetric matrices) and have several tuning parameters, so they can be somewhat more difficult to use.
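A minimal sketch along the same lines, using the conjugate gradient solver from the modern scipy.sparse.linalg (the system here is illustrative; CG requires a symmetric positive definite matrix):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import cg

n = 10000
# a symmetric positive definite tridiagonal system
data = np.array([-np.ones(n), 2*np.ones(n), -np.ones(n)])
A = sparse.spdiags(data, [-1, 0, 1], n, n).tocsr()
b = np.ones(n)

x, info = cg(A, b)      # info == 0 means the solver converged to the requested tolerance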

Direct Factorization Methods

Iterative Methods

Solving Sparse Eigenvalue Problems

Sparse Matrix File Input/Output

MatrixMarket

MATLAB

Additional Resources

Python

Books

Other Packages and Libraries

TODO: update and integrate this information where possible

For convenience, the lil_matrix class supports basic slicing and fancy indexing with a syntax similar to that of NumPy arrays, and is well suited to gradually filling a sparse matrix; the compressed formats, by contrast, are not suited to progressive filling.
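A small sketch of that style of use (the shapes and values are arbitrary):

from numpy import arange
from scipy import sparse

A = sparse.lil_matrix((100, 100))
A[0, :10] = arange(10)        # basic slicing, NumPy-style
A[1, [3, 5, 7]] = 1.0         # fancy indexing with a list of column indices
A[50, 50] = 9.0               # setting single entries is cheap in LIL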

To perform manipulations such as multiplication or inversion, first convert the matrix to either CSC or CSR format. These formats store the sparse matrix in arrays and allow faster computations than the list or dictionary-based formats. The lil_matrix format is row-based, so conversion to CSR is efficient, whereas conversion to CSC is less so.
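Continuing the sketch above, the usual pattern is to fill the matrix in LIL format and convert just before computing:

A = A.tocsr()                 # row-based LIL -> CSR is the efficient direction
y = A * arange(100)           # fast matrix-vector product in the compressed format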

Notes:

Examples (TODO: update)

Example 1

This is a very simple example illustrating basic usage of linsolve, sparse and some other useful things (use the latest version of linsolve.py). If you don't know how to get hold of the latest modules, spend some time in the Developer_Zone and see the SOURCE CODE section.

We're going to solve a trivial case of Ax = b, where A is a matrix, b is a vector (the right-hand side) and x holds the unknowns. In this example, A refers to the 'normal' (dense) matrix and Asp to the sparse representation of A; likewise, x is the 'normal' solution and xsp the solution obtained using the sparse method. (You'll most probably find it useful to use IPython when going through this example, especially if you're used to Matlab®. The normal Python prompt looks like >>>, while IPython's default prompt (which can be changed) looks like In [x]:, where x is a number.)

Import the necessary commands from the numpy and scipy modules:

from numpy import allclose, arange, eye, linalg, ones
from scipy import linsolve, sparse

It is of course also possible to make all the commands from the two modules available, but in that case it is a good idea to import them under short names, as follows (note that this is just one way of doing it):

import numpy as N
import scipy as S

If you're working in an IPython session, type S. and press the <TAB> key; you'll see all the available commands (attributes). The same goes for N.

Construct a 1000x1000 identity matrix as a lil_matrix. There are a couple of ways of doing this; one method is slow, the other is not. A slow method:

A = eye(1000)
Asp = sparse.lil_matrix(A)

A faster method:

Asp = sparse.lil_matrix((1000,1000))
Asp.setdiag(ones(1000))

Note that Asp is still in linked-list (lil_matrix) format - leave it like that for now.

Now create vector b; we choose the entries so that we can easily check the (trivial) solution:

b = arange(1,1001)

To see what b looks like, type b or print b at your prompt.

Now let's solve Ax = b, first using A...

x = linalg.solve(A,b)

... and now using Asp:

xsp = linsolve.spsolve(Asp,b)

Check the result: x and xsp should both be equal to b, as one expects. A convenient way of checking this is to make use of allclose(). In IPython you can type allclose? to get help on it; at the plain Python prompt you would type help(allclose) (this works for almost all commands, though some might not have a docstring). The input and resulting output of the command are shown here:

>>> allclose(x,b)
True
>>> allclose(b,xsp)
True

Let's have a look at the difference in solution time between the two methods. In an IPython session, simply type (or scroll back with your arrow keys and just insert the time in front):

time x = linalg.solve(A,b)
time xsp1 = linsolve.spsolve(Asp,b)

You should see a significant difference: the sparse solve should be roughly 5 to 6 times faster.

If you're working in the normal Python shell instead, do the following:

from time import time
t=time(); x = linalg.solve(A,b); time()-t
t=time(); xsp1 = linsolve.spsolve(Asp,b); time()-t

We saw that we can get the solution of Ax = b even when A is in LIL format. We'll now inspect the difference in execution speed for the following (assuming that you're using IPython, in a new session; the Out [x]: lines are not shown):

from numpy import allclose, arange, eye, linalg, random, ones
from scipy import linsolve, sparse
Asp = sparse.lil_matrix((50000,50000))
Asp.setdiag(ones(50000))
Asp[20,100:250] = 10*random.rand(150)
Asp[200:250,30] = 10*random.rand(50)
b = arange(0,50000)
time xsp1 = linsolve.spsolve(Asp,b)
time xsp2 = linsolve.spsolve(Asp.tocsc(),b)
time xsp3 = linsolve.spsolve(Asp.tocsr(),b)
allclose(xsp1,xsp2) # Should be True
allclose(xsp2,xsp3) # Should be True

Note: the line that returns xsp3 fails if you don't have a recent enough version of scipy.

The last solution (xsp3) should be the fastest, by a factor of roughly 1.5 to 2 compared to the other two solutions (xsp1 and xsp2). The benefit in speed compared to solving the system with 'normal' (dense) methods is obvious.

Example 2

Construct a 1000x1000 lil_matrix and add some values to it:

from scipy import sparse, linsolve
from numpy import random, linalg
A = sparse.lil_matrix((1000, 1000))
A[0, :100] = random.rand(100)
A[1, 100:200] = A[0, :100]
A.setdiag(random.rand(1000))

Now convert it to CSR format and solve (A A^T) x = b for x:

A = A.tocsr()
b = random.rand(1000)
x = linsolve.spsolve(A * A.T, b)

Convert it to a dense matrix and solve, and check that the result is the same:

A_ = A.todense()
x_ = linalg.solve(A_ * A_.T, b)
err = linalg.norm(x-x_)

Now we can print the error norm with:

print "Norm error =", err

It should be small :)
