This is an archival dump of old wiki content --- see scipy.org for current material

Overview

An easy way to add attributes or user-defined methods to an array is to define a subclass of ndarray. Typically, your new class needs to define at least a __new__ method and an __array_finalize__ method.

The __new__ method

The __new__ method is the class constructor. A call to __new__(cls, ...) creates a new instance of the class cls, but doesn't initialize it. According to the Python documentation:

If __new__() returns an instance of cls, then the new instance's __init__() method will be invoked like "__init__(self[, ...])", where self is the new instance and the remaining arguments are the same as were passed to __new__().
If __new__() does not return an instance of cls, then the new instance's __init__() method will not be invoked.
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation.

It turns out that even if ndarrays are not immutable stricto sensu, they share some characteristics with immutable types: they use a definite portion of memory, allocated at their creation, and this portion of memory cannot be drastically altered. For example, you can replace one element or a group of elements with others of the same type, but you cannot extend a standard ndarray as you could a list, nor can you mix types (except in particular cases, but that's another story). In other words, from a Python point of view, an ndarray is completely built by its __new__ method, and there is nothing left for __init__ to do. The docstring of the ndarray.__new__ method is quite clear:

No __init__ method is needed because the array is fully initialized after the __new__ method.

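However, keep in mind that __new__ only runs when an array is built explicitly through the class constructor. Arrays of our subclass can also be created without it, for example by taking a view, by slicing, or as the output of a ufunc, and any attribute set in __new__ will be missing from those arrays. If we want every instance to carry its own attributes, we still need some specific initialization. We cannot use the __init__ method, as it won't be called in those cases either. That's where __array_finalize__ comes into play.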
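The difference between the constructor path and the other creation paths can be checked with a small sketch. The class Traced below is hypothetical and only logs which special methods run: calling the class runs __new__ and then __init__ (with __array_finalize__ triggered in between by the view call inside __new__), while slicing an instance runs __array_finalize__ alone. It uses the modern import name np instead of N.

```python
import numpy as np

calls = []  # names of the special methods that ran, in order

class Traced(np.ndarray):
    """Hypothetical subclass that only records which hooks are called."""

    def __new__(cls, data):
        calls.append("__new__")
        # view() creates the subclass instance and triggers __array_finalize__
        return np.asarray(data).view(cls)

    def __init__(self, data):
        # Called by Python after __new__, but only on the constructor path.
        calls.append("__init__")

    def __array_finalize__(self, obj):
        calls.append("__array_finalize__")

a = Traced([1, 2, 3])
construct_calls = list(calls)
print(construct_calls)  # ['__new__', '__array_finalize__', '__init__']

calls.clear()
b = a[1:]               # slicing creates a new Traced without the constructor
slice_calls = list(calls)
print(slice_calls)      # ['__array_finalize__']
```

Only __array_finalize__ sees both kinds of creation, which is why per-instance defaults belong there.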

Some words of caution

The definition of default values for subclass attributes (as opposed to instance attributes) in the __new__ method is strongly discouraged for several reasons:

The __array_finalize__ method

According to the NumPy documentation, the __array_finalize__(self, obj) method

is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the (big)ndarray. It can be used to change attributes of self after construction (so as to ensure a 2-d matrix for example), or to update meta-information from the “parent.” Subclasses inherit a default implementation of this method that does nothing.

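In other words, __array_finalize__ is called each time a new instance of the subclass is created: when the constructor builds the array (our __new__ below does so through the view method), when an existing array is viewed as the subclass, when an instance is sliced, and when a method or a ufunc returns a new array derived from an instance.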

Note that if a method returns nothing, then __array_finalize__ won't be called.

__array_finalize__ is thus where we need to make sure that every instance, however it was created, gets its own attributes, typically by taking them from obj when obj has them. More generally, it is where we must define the default attributes of the subclass.
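What obj actually is depends on how the new instance was made. The sketch below assumes NumPy's documented behavior that obj is None when ndarray.__new__ is called explicitly; Tagged is a hypothetical class that only records what it receives.

```python
import numpy as np

seen = []  # what __array_finalize__ received as obj, one entry per call

class Tagged(np.ndarray):
    """Hypothetical subclass that records the kind of obj it is finalized from."""

    def __array_finalize__(self, obj):
        if obj is None:
            seen.append("None")              # explicit constructor call
        elif type(obj) is np.ndarray:
            seen.append("ndarray")           # view casting from a plain array
        else:
            seen.append(type(obj).__name__)  # new array derived from an instance

t = Tagged(shape=(3,))           # 1. explicit ndarray constructor
v = np.arange(3).view(Tagged)    # 2. view casting
s = v[1:]                        # 3. slicing an existing Tagged
print(seen)                      # ['None', 'ndarray', 'Tagged']
```

Because obj can be None or a plain ndarray, a getattr call with a default (as in the example that follows) is a convenient way to read attributes from it.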

Example

The short example below subclasses ndarray to add an extra info tag, which can be used to store some meta-information. To illustrate when __new__ is called, it prints a message.

    import numpy as N

    class InfoArray(N.ndarray):

        def __new__(subtype, data, info=None, dtype=None, copy=False):
            print("__new__ received %s" % type(data))
            # Make sure we are working with an array, and copy the data if requested.
            # (N.array(..., copy=False) raises an error in NumPy >= 2.0 when a copy
            # cannot be avoided, so N.asarray is used for the no-copy case.)
            if copy:
                subarr = N.array(data, dtype=dtype, copy=True)
            else:
                subarr = N.asarray(data, dtype=dtype)

            # Transform 'subarr' from an ndarray to our new subclass.
            subarr = subarr.view(subtype)

            # Use the specified 'info' parameter if given
            if info is not None:
                subarr.info = info
            # Otherwise, use data's info attribute if it exists
            elif hasattr(data, 'info'):
                subarr.info = data.info

            # Finally, we must return the newly created object:
            return subarr

        def __array_finalize__(self, obj):
            # We use getattr to set a default if 'obj' doesn't have the 'info' attribute.
            # dict() makes a shallow copy, so a new array never shares its parent's dict.
            self.info = dict(getattr(obj, 'info', {}))
            # We could have checked first whether self.info was already defined:
            #if not hasattr(self, 'info'):
            #    self.info = dict(getattr(obj, 'info', {}))

        def __repr__(self):
            desc = """\
    array(data=
      %(data)s,
          tag=%(tag)s)"""
            return desc % {'data': str(self), 'tag': self.info}

Such a class can be used like this:

    >>> x = InfoArray(N.arange(10), info={'name':'x'})
    __new__ received <class 'numpy.ndarray'>
    >>> x
    array(data=
      [0 1 2 3 4 5 6 7 8 9],
          tag={'name': 'x'})
    >>> y = InfoArray(N.arange(10), info={'name':'y'})
    __new__ received <class 'numpy.ndarray'>
    >>> assert (x.info['name']=='x')
    >>> assert (y.info['name']=='y')
    # Now, let's try the view method...
    >>> z = N.arange(10).view(InfoArray)
    >>> assert (isinstance(z, InfoArray))
    >>> assert (z.info == {})
    # Explanation:
    # We created an InfoArray from a standard ndarray with the 'view' method.
    # As an ndarray does not have an 'info' attribute, the default {} is used.
    >>> z = x.view(InfoArray)
    >>> assert (z.info == x.info)
    # Here, we created a new InfoArray from an existing one with the 'view' method.
    # Therefore, the 'info' attribute is propagated to the view.
    #
    # Now, let's apply a numpy function to an InfoArray: the result is viewed as an
    # InfoArray internally, and it inherits the 'info' attribute of the original InfoArray.
    >>> z = N.sqrt(x)
    >>> z
    array(data=
      [ 0.          1.          1.41421356  1.73205081  2.          2.23606798
      2.44948974  2.64575131  2.82842712  3.        ],
          tag={'name': 'x'})

Note that this InfoArray class is fairly basic and can lead to surprises. For example, we haven't defined how the metadata should be updated when combining two InfoArrays. By default, the result uses the info tag of the first operand:

    >>> (x + y).info['name'] == 'x'
    True
    >>> (y + x).info['name'] == 'y'
    True

If we want to change this behavior, we need to override the __add__ method (or whichever method is relevant).
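One way to do so is sketched below with a hypothetical MergedInfoArray class: its __add__ merges the two info dictionaries, letting the right operand win on conflicting keys. The merge rule is an arbitrary choice made for illustration.

```python
import numpy as np

class MergedInfoArray(np.ndarray):
    """Hypothetical InfoArray variant whose sum merges both info tags."""

    def __new__(cls, data, info=None):
        obj = np.asarray(data).view(cls)
        obj.info = dict(info or {})
        return obj

    def __array_finalize__(self, obj):
        # Copy, so that derived arrays never share their parent's dict.
        self.info = dict(getattr(obj, "info", {}))

    def __add__(self, other):
        result = super().__add__(other)
        # Skip NotImplemented or 0-d scalar results, which carry no info.
        if isinstance(result, MergedInfoArray):
            merged = dict(self.info)
            merged.update(getattr(other, "info", {}))  # right operand wins
            result.info = merged
        return result

a = MergedInfoArray([1, 2], info={"name": "a", "unit": "m"})
b = MergedInfoArray([3, 4], info={"name": "b"})
c = a + b
print(c.info)  # {'name': 'b', 'unit': 'm'}
```

Only the a + b syntax goes through __add__: 1 + a falls back to __radd__, and np.add(a, b) bypasses both. That is one reason to put such logic in __array_wrap__ instead.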

The __array_wrap__ method

When a NumPy ufunc is called on an instance of an ndarray subclass, the __array_wrap__ method of that instance is called to transform the result into a new instance of the subclass. By default, __array_wrap__ creates this new instance through the usual machinery, so __array_finalize__ is called and the attributes are inherited.

By defining a specific __array_wrap__ method for our subclass, we can tweak the output. The __array_wrap__ method requires one argument, the array output by the ufunc, and accepts an optional context parameter. When a ufunc calls __array_wrap__, context is a 3-element tuple: (the ufunc itself, the arguments of the ufunc, the index of the output). Since NumPy 2.0, a third argument, return_scalar, is also passed.
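The content of context can be inspected directly. ContextProbe below is a hypothetical class that stores the last context it received on a class attribute (for illustration only) and otherwise behaves like the default.

```python
import numpy as np

class ContextProbe(np.ndarray):
    """Hypothetical subclass that keeps the context passed to __array_wrap__."""

    last_context = None  # class-level slot, for illustration only

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # return_scalar is part of the NumPy >= 2.0 signature; ignored here.
        type(self).last_context = context
        return obj.view(type(self))

x = np.arange(4.0).view(ContextProbe)
y = np.sqrt(x)
ufunc, args, out_index = ContextProbe.last_context
print(ufunc is np.sqrt, out_index)  # True 0
```

The first element is the ufunc object itself, so printing it gives <ufunc 'sqrt'>, which is what the tracking example that follows relies on.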

For example, let's add the following method to our InfoArray class, so that each result keeps track of the ufuncs that produced it:

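    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # To be added to the InfoArray class.
        # return_scalar is passed by NumPy >= 2.0; it is accepted but ignored here.
        result = obj.view(type(self))
        # Work on a copy, so that the input's info dict is never modified.
        info = dict(result.info)
        info.update(self.info)
        if context is not None:
            # context[0] is the ufunc object, e.g. <ufunc 'sqrt'>.
            modif = info.get('modif', 'modified by')
            modif += ": %s" % context[0]
            info['modif'] = modif
        result.info = info
        return result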

Using the same example as earlier:

    >>> z = N.sqrt(x)
    >>> z
    array(data=
      [ 0.          1.          1.41421356  1.73205081  2.          2.23606798
      2.44948974  2.64575131  2.82842712  3.        ],
          tag={'modif': "modified by: <ufunc 'sqrt'>", 'name': 'x'})
    # Let's add 1 in place
    >>> z += 1
    >>> z
    array(data=
      [ 1.          2.          2.41421356  2.73205081  3.          3.23606798
      3.44948974  3.64575131  3.82842712  4.        ],
          tag={'modif': "modified by: <ufunc 'sqrt'>: <ufunc 'add'>", 'name': 'x'})

A more realistic example of an __array_wrap__ method is available in the maskedarray package (now numpy.ma). There, the mask of the output is modified to take the domain of the ufunc into account.
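A much simplified sketch in the same spirit follows. It is not numpy.ma's actual code. The hypothetical DomainArray class flags, in an invalid attribute, the elements of a ufunc result whose input lay outside the ufunc's domain. The DOMAINS table and the attribute name are assumptions, and only the unary ufuncs np.sqrt and np.log are covered.

```python
import numpy as np

class DomainArray(np.ndarray):
    """Hypothetical subclass flagging ufunc outputs computed outside the domain."""

    # Assumed domain table: ufunc -> test that is True where the input is invalid.
    DOMAINS = {np.sqrt: lambda a: a < 0, np.log: lambda a: a <= 0}

    def __array_finalize__(self, obj):
        # No flags by default; __array_wrap__ fills them in for ufunc results.
        self.invalid = None

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        result = obj.view(type(self))
        if context is not None:
            ufunc, args, _ = context
            test = self.DOMAINS.get(ufunc)
            if test is not None:
                # Evaluate the domain test on a plain-array copy of the input.
                result.invalid = test(np.asarray(args[0]))
        return result

x = np.array([4.0, -1.0, 9.0]).view(DomainArray)
with np.errstate(invalid="ignore"):  # silence the warning for sqrt(-1)
    r = np.sqrt(x)
print(r.invalid)  # [False  True False]
```

Here the flags live only on ufunc results; views and slices reset them to None. A real implementation such as numpy.ma has to propagate its mask through those paths as well.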

SciPy: Subclasses (last edited 2015-10-24 17:48:23 by anonymous)