Using f2py with gfortran + OpenMP
As you have realized by now, f2py turns wrapping a fortran code into python a manageable task. Here, I will try to describe how you can wrap a fortran95 module that also uses OpenMP. OpenMP is an open standard for parallelizing code on multi-processor/multi-core machines with relatively little effort. It has the big advantage that you can basically take a sequential program and turn it parallel without changing any of the code. Typically, you simply write a few compiler directives in front of your do loops, and you are good to go. I think the example will speak for itself - I tried to make it as simple as possible.
My approach is as follows: 1) Compile a pure fortran module, 2) Write a wrapper fortran/f2py module which will be used from python, and 3) Compile and link the wrapper to the pure fortran module. This approach avoids changing the fortran code, and lets you re-use the fortran code in a pure fortran program later.
The (pure) fortran95 module I wish to wrap contains the following code:
! fortranmodule.f95 module fortranmodule use omp_lib contains subroutine test(x, y) real(8), dimension(:), intent(in) :: x real(8), dimension(:), intent(out) :: y ! Code begins integer :: i, n integer :: num_threads n = size(x, 1) !$omp parallel do private(i) firstprivate(n) shared(x, y) do i = 1, n if (i == 1) then ! The if clause can be removed for serious use. ! It is here for debugging only. num_threads = OMP_get_num_threads() print *, 'num_threads running:', num_threads end if y(i) = sin(x(i)) + cos(x(i) + exp(x(i))) + log(x(i)) end do !$omp end parallel do end subroutine test end module fortranmodule
The wrapper module looks like this:
module w_fortranmodule use fortranmodule, only: test implicit none contains subroutine w_test(x, y, ysize) real(8), intent(in), dimension(ysize) :: x real(8), intent(out), dimension(ysize) :: y integer :: ysize !f2py intent(hide) :: ysize=shape(x, 0) !f2py depend(ysize) y call test(x, y) end subroutine w_test end module w_fortranmodule
I use a setup.py script, it is rather simple:
import os from numpy.distutils.misc_util import Configuration from numpy.distutils.core import setup def configuration(parent_package='',top_path=None): config = Configuration(None,parent_package,top_path) libraries = [ #'gomp', #'blas', ] config.add_extension('wrapper', ['fortranmodule.f95','wrapper.f95'], libraries = libraries, f2py_options = [], define_macros = [('F2PY_REPORT_ON_ARRAY_COPY','1')], # this is the flag gfortran needs to process OpenMP directives extra_compile_args = ['-fopenmp'], extra_link_args = [], ) return config if __name__ == "__main__": setup(configuration=configuration)
I like to have a Makefile to remember the terminal commands and flags for later use.
.PHONY: all clean test all: env F90FLAGS=-fopenmp python setup.py build_ext --inplace --fcompiler=gnu95 test: all @echo Running with 1 thread... env OMP_NUM_THREADS=1 python script.py @echo Running with 2 thread... env OMP_NUM_THREADS=2 python script.py @echo Running with 4 thread... env OMP_NUM_THREADS=4 python script.py @echo Running with 8 thread... env OMP_NUM_THREADS=8 python script.py clean: rm -rf build rm -rf wrapper.so
All you need to do now, is basically to run "make test". Hopefully, your output should look something like mine:
Running with 1 thread... env OMP_NUM_THREADS=1 python script.py num_threads running: 1 Time elapsed: 2.17034s Running with 2 thread... env OMP_NUM_THREADS=2 python script.py num_threads running: 2 Time elapsed: 1.13914s Running with 4 thread... env OMP_NUM_THREADS=4 python script.py num_threads running: 4 Time elapsed: 0.60494s Running with 8 thread... env OMP_NUM_THREADS=8 python script.py num_threads running: 8 Time elapsed: 0.64650s
These test results were run on a 4-core Fedora 12 PC; as we can see, we get a speedup of about 3.5 for 4 threads. Using more threads than cores is of no help, as expected.
My test program script.py reads as follows:
import time import numpy from wrapper import w_fortranmodule N = 10**7 def main(): xs = numpy.linspace(1, 10, N) tstart = time.time() ys = w_fortranmodule.w_test(xs) dt = time.time() - tstart print 'Time elapsed: %.5fs' % dt if __name__ == '__main__': main()
Tested OS-es and gfortran versions: Mac OS X 10.6 with gfortran 4.4.4. Fedora 12 with gfortran 4.4.4. (Suggestions and/or test results from other OS-es, compilers, and versions are of course welcome!)