Contents

New SciPy Tutorial
Prerequisites
Older SciPy Tutorial

New SciPy Tutorial

[Under construction -- Please help!]

Prerequisites

Before reading this tutorial you should know a bit of Python. If this is not the case, or if you want to refresh your memory, take a look at the Python tutorial. In particular, you may wish to read up to section 6 (Modules).

You also need to have some software installed on your computer. You need at least

Python
and NumPy (It is useful to read the Tentative_NumPy_Tutorial)
and SciPy,

but there are some other tools that may be useful to you:

ipython is an enhanced Interactive Python shell which is very convenient for exploring SciPy's features.
matplotlib will enable you to plot graphics.

Basic matrix operations

For the basic matrix operations you can make a matrix like an array in the NumPy (see the Tentative_NumPy_Tutorial):

from numpy import matrix
from scipy.linalg import inv, det, eig

A=matrix([[1,1,1],[4,4,3],[7,8,5]]) # 3 lines 3 rows
b = matrix([1,2,1]).transpose()     # 3 lines 1 rows.

print det(A)     # We can check, whether the matrix is regular
print inv(A)*b   # Now we can print the solution of the Ax=b linear equation system.
print eig(A)

When we use numpy.array or numpy.matrix there is a difference. A*x will be in the latter case matrix product, not elementwise product as with array.

We have an other possibility to solve the equation system with scipy.linalg.solve function (see Example 1).

Sparse matrices

The scipy.sparse module provides data structures for 2D sparse matrices. There are five available sparse matrix types:

csc_matrix: Compressed Sparse Column
csr_matrix: Compressed Sparse Row format
lil_matrix: List of Lists format
dok_matrix: Dictionary of Keys format
coo_matrix: Coordinate format

To construct a matrix efficiently, using either lil_matrix (recommended) or dok_matrix is suggested, for convenience; the lil_matrix class supports basic slicing and fancy indexing with a similar syntax to NumPy arrays, and is adapted to gradually fill your sparse matrix, while compressed formats are not suited to a progressive filling.

To perform manipulations such as multiplication or inversion, first convert the matrix to either CSC or CSR format. These formats store the sparse matrix in arrays and allow faster computations than the list or dictionary-based formats. The lil_matrix format is row-based, so conversion to CSR is efficient, whereas conversion to CSC is less so.

Notes:

scipy.sparse currently only handles float numbers (both 32 and 64bits, both real and complex).
CSC format is also called Harwell-Boeing format, and used in other libraries or software (Matlab for example).

Example 1

This is a very simple example illustrating basic usage of linsolve, sparse and some other useful things (use the latest version of linsolve.py). If you don't know how to get hold of the latest modules, spend some time in the Developer_Zone and see the SOURCE CODE section.

We're going to solve a trivial case of Ax = b, where A is a matrix, b a vector (the RHS) and x the unknowns. In this example A refers to the 'normal' matrix, and Asp to the sparse representation of A, x for the 'normal' solution and xsp for the solution arising from using the sparse method. (You'll most probably find it useful to use IPython when going through this example, especially if you're used to Matlab®. The normal Python prompt looks like this >>>, IPython's default prompt (one can change it) looks like this In [x]:, where the x is a number).

Import the necessary commands from the numpy and scipy modules:

from numpy import allclose, arange, eye, linalg, ones
from scipy import linsolve, sparse

It is of course also possible to import all the commands from the two modules, but in that case it might not be a bad idea to do it as follows (note that this is just one way of doing it):

import numpy as N
import scipy as S

If you're working in an IPython session, type S. and press the <TAB> button, you'll see all the available commands (attributes). The same goes for N.

Construct an identity 1000x1000 lil_matrix. There is a couple of ways of doing this, one method is slow, the other not. A slow method:

A = eye(1000)
Asp = sparse.lil_matrix(A)

A faster method:

Asp = sparse.lil_matrix((1000,1000))
Asp.setdiag(ones(1000))

Note that Asp is still in LInked List (lil_matrix) format - leave it like that for now.

Now create vector b; we choose the entries so that we can easily check the (trivial) solution:

b = arange(1,1001)

To see what b looks like, type b or print b at your prompt.

Now let's solve Ax = b, first using A...

x = linalg.solve(A,b)

... and now using Asp:

xsp = linsolve.spsolve(Asp,b)

Check the result: x and xsp should both be equal to b, as one expects. A convenient way of checking this is to make use of allclose(). In IPython you can type allclose? to get help on it, and at the Python prompt one will type help(allclose) (this is the case for almost all commands, but some might not have a Docstring). The input and resulting output from the command is shown here:

>>> allclose(x,b)
True
>>> allclose(b,xsp)
True

Let's have a look at the difference in solution time between the two methods. In an IPython session, simply type (or scroll back with your arrow keys and just insert the time in front):

time x = linalg.solve(A,b)
Itime xsp1 = linsolve.spsolve(Asp,b)

You should see a significant difference (roughly 5 to 6 times faster).

If you're working in the normal Python shell instead, do the following:

from time import time
t=time(); x = linalg.solve(A,b); time()-t
t=time(); xsp1 = linsolve.spsolve(Asp,b); time()-t

We saw that we can get the solution to Ax = b even if A is in LIL format. We'll now inspect the difference in execution speed for the following (assuming that you're using IPython, Out [x]:'s not shown, and in a new session):

from numpy import allclose, arange, eye, linalg, random, ones
from scipy import linsolve, sparse
Asp = sparse.lil_matrix((50000,50000))
Asp.setdiag(ones(50000))
Asp[20,100:250] = 10*random.rand(150)
Asp[200:250,30] = 10*random.rand(50)
b = arange(0,50000)
time xsp1 = linsolve.spsolve(Asp,b)
time xsp2 = linsolve.spsolve(Asp.tocsc(),b)
time xsp3 = linsolve.spsolve(Asp.tocsr(),b)
allclose(xsp1,xsp2) # Should be True
allclose(xsp2,xsp3) # Should be True

Note: the line that returns xsp3 fails if you don't have a recent enough version of scipy.

The time for the last solution (xsp3) should be the fastest by a factor of roughly 1,5 to 2 times compared to the other two solutions (xsp1 and xsp2). The benefit in speed when compared to using 'normal' methods for solving sparse systems is obvious.

Example 2

Construct a 1000x1000 lil_matrix and add some values to it:

from scipy import sparse, linsolve
from numpy import random, linalg
A = sparse.lil_matrix((1000, 1000))
A[0, :100] = random.rand(100)
A[1, 100:200] = A[0, :100]
A.setdiag(random.rand(1000))

Now convert it to CSR format and solve (A A^T) x = b for x:

A = A.tocsr()
b = random.rand(1000)
x = linsolve.spsolve(A * A.T, b)

The .T bit of A.T above computes the transpose of A, i.e., it's a shortcut for A.transpose(). It will also work for 'normal' dense (numpy) matrices and currently works for arrays in the latest SVN versions of NumPy. Also take note of the differences between matrices and arrays when working with them. If B is a matrix and C an array, both with the same size and entries, then B*B is not the same as C*C. See NumPy_for_Matlab_Users for more details.

Convert it to a dense matrix and solve, and check that the result is the same:

A_ = A.todense()
x_ = linalg.solve(A_ * A_.T, b)
err = linalg.norm(x-x_)

Now we can print the error norm with:

print "Norm error =", err

It should be small

Numerical integration

Suppose we have a function, f(x), whose integral we want to evaluate. Often it is necessary to do this numerically. SciPy provides tools to do this, in the scipy.integrate module:

>>> from scipy.integrate import quad
>>> quad(lambda x: x**2, 0, 1)
(0.33333333333333331, 3.7007434154171879e-15)

The correct answer is of course 1/3. quad returns a tuple consisting of its result and its estimate of the error in its result. In this case we can see that its estimate is pretty good.

For a slightly more difficult example, let's integrate a function (related to) the blackbody spectrum:

>>> quad(lambda x: x**3/(exp(x)-1), 0.1, 10)
(6.431600896801668, 6.1280827858943929e-09)

Here we see that its error estimate is worse.

If we want to carry the integration to infinity, we can use the variable "inf" to indicate this:

>>> from scipy.integrate import inf
>>> quad(lambda x: x**3/(exp(x)-1), 0.1, inf)
(6.4936184022866668, 2.3462101939733651e-09)

The fact that the integrand can't be evaluated at zero doesn't seem to be a problem either:

>>> quad(lambda x: x**3/(exp(x)-1), 0, inf)
(6.4939394022668298, 2.6284709859300214e-09)

Under the hood, quad uses a sophisticated method ("Clenshaw-Curtis with Chebyshev moments") from the FORTRAN package QUADPACK which attempts to efficiently compute which regions contain complicated behaviour and spends more effort evaluating the function there. It can be tricked:

>>> quad(lambda t: 1e5*((1e5*t)**3/(exp(1e5*t)-1)), 0, inf)
(6.043084404980975e-84, 1.1947568723196904e-83)

Here all I have done is a change of variable t=x*1e-5, so the value of the integral should not have changed at all, but unfortunately quad fails to find the peak of the function, so it gives a wildly wrong answer with a wildly wrong error estimate.

Integrating ODEs

Scipy includes several robust ODE solvers. Let's take the simplest possible example and integrate the equation y'=y:

>>> from scipy.integrate import odeint
>>> odeint(lambda y, t: y, 1, [0, 1])
array([[ 1.        ],
       [ 2.71828193]])

Here we provided odeint with a function that computes y' given y and t - in this case we used an anonymous python function, but any function will do. We also provided it with an initial value of y (1) and the initial and final values of t (0 and 1). We then obtained the result evaluated at those times (1 and e, as we would expect). To get more values:

>>> import numpy as N
>>> odeint(lambda y, t: y, 1, N.linspace(0,1,10))
array([[ 1.        ],
       [ 1.11751906],
       [ 1.24884886],
       [ 1.39561243],
       [ 1.55962349],
       [ 1.742909  ],
       [ 1.94773405],
       [ 2.17663003],
       [ 2.43242551],
       [ 2.7182819 ]])

What about second-order differential equations? Well, the standard technique is to rewrite these as coupled first-order differential equations. Let's try integrating y=-y. We begin by introducing a variable yp, so that yp = y'. Then yp' = y = -y.

>>> def deriv(Y, t):
...    y, yp = Y
...    return yp, -y
...
>>> odeint(deriv, [1,0], [0,N.pi,2*N.pi])
array([[  1.00000000e+00,   0.00000000e+00],
       [ -9.99999975e-01,  -1.26160610e-07],
       [  1.00000003e+00,   1.47819759e-07]])

Here we see that the solution is behaving like cos(t), as we would have hoped. There is a certain amount of error; the error tolerance is one of the many things that can be adjusted with extra parameters to odeint.

Older SciPy Tutorial

Seems like people are planning to put a new online tutorial here.

Until they do, you can use this link, which is a direct conversion of Travis Oliphant's original LyX document (usually available in pdf format) to html.

This html tutorial was created using a simple htlatex command in the tutorial source directory of the "old scipy" branch - available here

The resulting files are available as one archived file: scipy_tutorial.tar.gz.

The original document was written by TravisOliphant. I think it's copyrighted to Enthought under the following license.

If you think it's useful, and wish to host it elsewhere, please do, because the current place is temporary and I don't intend to keep it forever.

-- AmitAronovitch 2006-02-17 22:58:57

SciPy: Additional_Documentation/New_SciPy_Tutorial (last edited 2015-10-24 17:48:25 by anonymous)