Inline Weave With Basic Array Conversion (no Blitz)
Python and Numpy are designed to express general statements that work transparently on many sizes of incoming data. Using inline Weave with Blitz conversion can dramatically speed up many numerical operations (eg, addition of a series of arrays) because in some ways it bypasses generality. How can you speed up your algorithms with inline C code while maintaining generality? One tool provided by Numpy is the iterator. Because an iterator keeps track of memory indexing for you, its operation is very analogous to the concept of iteration in Python itself. You can write loop in C that simply says "take the next element in a serial object--the PyArrayObject--and operate on it, until there are no more elements."
This is a very simple example of multi dimensional iterators, and their power to "broadcast" arrays of compatible shapes. It shows that the very same code that is entirely ignorant of dimensionality can achieve completely different computations based on the rules of broadcasting. I have assumed in this case that a has at least as many dimensions as b. It is important to know that the weave array conversion of a gives you access in C++ to:
py_a -- PyObject *
a_array -- PyArrayObject *
a -- (c-type *) py_array->data
1 import numpy as npy
2 from scipy.weave import inline
3
4 def multi_iter_example():
5 a = npy.ones((4,4), npy.float64)
6 # for the sake of driving home the "dynamic code" approach...
7 dtype2ctype = {
8 npy.dtype(npy.float64): 'double',
9 npy.dtype(npy.float32): 'float',
10 npy.dtype(npy.int32): 'int',
11 npy.dtype(npy.int16): 'short',
12 }
13 dt = dtype2ctype.get(a.dtype)
14
15 # this code does a = a*b inplace, broadcasting b to fit the shape of a
16 code = \
17 """
18 %s *p1, *p2;
19 PyObject *itr;
20 itr = PyArray_MultiIterNew(2, a_array, b_array);
21 while(PyArray_MultiIter_NOTDONE(itr)) {
22 p1 = (%s *) PyArray_MultiIter_DATA(itr, 0);
23 p2 = (%s *) PyArray_MultiIter_DATA(itr, 1);
24 *p1 = (*p1) * (*p2);
25 PyArray_MultiIter_NEXT(itr);
26 }
27 """ % (dt, dt, dt)
28
29 b = npy.arange(4, dtype=a.dtype)
30 print '\n A B '
31 print a, b
32 # this reshaping is redundant, it would be the default broadcast
33 b.shape = (1,4)
34 inline(code, ['a', 'b'])
35 print "\ninline version of a*b[None,:],"
36 print a
37 a = npy.ones((4,4), npy.float64)
38 b = npy.arange(4, dtype=a.dtype)
39 b.shape = (4,1)
40 inline(code, ['a', 'b'])
41 print "\ninline version of a*b[:,None],"
42 print a
There are two other iterator applications in iterators_example.py and iterators.py.
Deeper into the "inline" method
The docstring for inline is enormous, and indicates that all kinds of compiling options are supported when integrating your inline code. I've taken advantage of this to make some specialized FFTW calls a lot more simple, and in only a few additional lines add support for inline FFTs. In this example, I read in a file of pure C code and use it as support_code in my inline statement. I also use a tool from Numpy's distutils to locate my FFTW libraries and headers.
1 import numpy as N
2 from scipy.weave import inline
3 from os.path import join, split
4 from numpy.distutils.system_info import get_info
5
6 fft1_code = \
7 """
8 char *i, *o;
9 i = (char *) a;
10 o = inplace ? i : (char *) b;
11 if(isfloat) {
12 cfft1d(reinterpret_cast<fftwf_complex*>(i),
13 reinterpret_cast<fftwf_complex*>(o),
14 xdim, len_array, direction, shift);
15 } else {
16 zfft1d(reinterpret_cast<fftw_complex*>(i),
17 reinterpret_cast<fftw_complex*>(o),
18 xdim, len_array, direction, shift);
19 }
20 """
21 extra_code = open(join(split(__file__)[0],'src/cmplx_fft.c')).read()
22 fftw_info = get_info('fftw3')
23
24 def fft1(a, shift=True, inplace=False):
25 if inplace:
26 _fft1_work(a, -1, shift, inplace)
27 else:
28 return _fft1_work(a, -1, shift, inplace)
29
30 def ifft1(a, shift=True, inplace=False):
31 if inplace:
32 _fft1_work(a, +1, shift, inplace)
33 else:
34 return _fft1_work(a, +1, shift, inplace)
35
36 def _fft1_work(a, direction, shift, inplace):
37 # to get correct C-code, b always must be an array (but if it's
38 # not being used, it can be trivially small)
39 b = N.empty_like(a) if not inplace else N.array([1j], a.dtype)
40 inplace = 1 if inplace else 0
41 shift = 1 if shift else 0
42 isfloat = 1 if a.dtype.itemsize==8 else 0
43 len_array = N.product(a.shape)
44 xdim = a.shape[-1]
45 inline(fft1_code, ['a', 'b', 'isfloat', 'inplace',
46 'len_array', 'xdim', 'direction', 'shift'],
47 support_code=extra_code,
48 headers=['<fftw3.h>'],
49 libraries=['fftw3', 'fftw3f'],
50 include_dirs=fftw_info['include_dirs'],
51 library_dirs=fftw_info['library_dirs'],
52 compiler='gcc')
53 if not inplace:
54 return b
This code is available in fftmod.tar.gz.