Using f2py with gfortran + OpenMP

As you have realized by now, f2py turns wrapping a fortran code into python a manageable task. Here, I will try to describe how you can wrap a fortran95 module that also uses OpenMP. OpenMP is an open standard for parallelizing code on multi-processor/multi-core machines with relatively little effort. It has the big advantage that you can basically take a sequential program and turn it parallel without changing any of the code. Typically, you simply write a few compiler directives in front of your do loops, and you are good to go. I think the example will speak for itself - I tried to make it as simple as possible.

My approach is as follows: 1) Compile a pure fortran module, 2) Write a wrapper fortran/f2py module which will be used from python, and 3) Compile and link the wrapper to the pure fortran module. This approach avoids changing the fortran code, and lets you re-use the fortran code in a pure fortran program later.

The (pure) fortran95 module I wish to wrap contains the following code:

! fortranmodule.f95 
module fortranmodule
use omp_lib

contains
    subroutine test(x, y)
        real(8), dimension(:), intent(in)  :: x
        real(8), dimension(:), intent(out) :: y
        ! Code begins
        integer :: i, n
        integer :: num_threads
        n = size(x, 1)

        !$omp parallel do private(i) firstprivate(n) shared(x, y)
        do i = 1, n
            if (i == 1) then
                ! The if clause can be removed for serious use.
                ! It is here for debugging only.
                num_threads = OMP_get_num_threads()
                print *, 'num_threads running:', num_threads
            end if
            y(i) = sin(x(i)) + cos(x(i) + exp(x(i))) + log(x(i))
        end do
        !$omp end parallel do
    end subroutine test
end module fortranmodule

The wrapper module looks like this:

module w_fortranmodule
use fortranmodule, only: test
implicit none

contains
    subroutine w_test(x, y, ysize)
        real(8), intent(in), dimension(ysize)   :: x
        real(8), intent(out), dimension(ysize)  :: y
        integer                                 :: ysize
        !f2py intent(hide) :: ysize=shape(x, 0)
        !f2py depend(ysize) y
        call test(x, y)
    end subroutine w_test
end module w_fortranmodule

I use a setup.py script, it is rather simple:

import os
from numpy.distutils.misc_util import Configuration
from numpy.distutils.core import setup

def configuration(parent_package='',top_path=None):
    config = Configuration(None,parent_package,top_path)
    libraries = [
            #'gomp',
            #'blas',
            ]
    config.add_extension('wrapper',
                         ['fortranmodule.f95','wrapper.f95'],
                         libraries = libraries,
                         f2py_options = [],
                         define_macros = [('F2PY_REPORT_ON_ARRAY_COPY','1')],
                         # this is the flag gfortran needs to process OpenMP directives
                         extra_compile_args = ['-fopenmp'],
                         extra_link_args = [],
                         )
    return config

if __name__ == "__main__":
    setup(configuration=configuration)

I like to have a Makefile to remember the terminal commands and flags for later use.

.PHONY: all clean test

all:
        env F90FLAGS=-fopenmp python setup.py build_ext --inplace --fcompiler=gnu95

test: all
        @echo Running with 1 thread...
        env OMP_NUM_THREADS=1 python script.py
        @echo Running with 2 thread...
        env OMP_NUM_THREADS=2 python script.py
        @echo Running with 4 thread...
        env OMP_NUM_THREADS=4 python script.py
        @echo Running with 8 thread...
        env OMP_NUM_THREADS=8 python script.py

clean:
        rm -rf build
        rm -rf wrapper.so

All you need to do now, is basically to run "make test". Hopefully, your output should look something like mine:

Running with 1 thread...
env OMP_NUM_THREADS=1 python script.py
 num_threads running:           1
Time elapsed: 2.17034s
Running with 2 thread...
env OMP_NUM_THREADS=2 python script.py
 num_threads running:           2
Time elapsed: 1.13914s
Running with 4 thread...
env OMP_NUM_THREADS=4 python script.py
 num_threads running:           4
Time elapsed: 0.60494s
Running with 8 thread...
env OMP_NUM_THREADS=8 python script.py
 num_threads running:           8
Time elapsed: 0.64650s

These test results were run on a 4-core Fedora 12 PC; as we can see, we get a speedup of about 3.5 for 4 threads. Using more threads than cores is of no help, as expected.

My test program script.py reads as follows:

import time
import numpy
from wrapper import w_fortranmodule

N = 10**7
def main():
    xs = numpy.linspace(1, 10, N)
    tstart = time.time()
    ys = w_fortranmodule.w_test(xs)
    dt = time.time() - tstart
    print 'Time elapsed: %.5fs' % dt

if __name__ == '__main__':
    main()

Tested OS-es and gfortran versions: Mac OS X 10.6 with gfortran 4.4.4. Fedora 12 with gfortran 4.4.4. (Suggestions and/or test results from other OS-es, compilers, and versions are of course welcome!)

SciPy: Dynetrekk/f2py_OpenMP_draft (last edited 2015-10-24 17:48:24 by anonymous)