This is an archival dump of old wiki content --- see for current material

Array Creation

There are 5 general mechanisms for creating arrays:

  1. Conversion from other Python structures (e.g., lists, tuples)
  2. Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
  3. Reading arrays from disk, either from standard or custom formats
  4. Creating arrays from raw bytes through the use of strings or buffers
  5. Use of special library functions (e.g., random)

This section will not cover means of replicating, joining, or otherwise expanding or mutating existing arrays. Nor will it cover creating object arrays or record arrays. Both of those are covered in their own sections.

Converting Python array-like objects to numpy arrays


In general, numerical data arranged in an array-like structure in Python can be converted to arrays through the use of the array function. The most obvious examples are lists and tuples. If you provide a list or tuple consisting only of numbers to array() it will create a numpy array of the appropriate type and dimension. Typically, the type of the array will be that of the most general numerical type found in the list or tuple. For example, if the list contains both integers and floats, the array will be of float type. Likewise, if a tuple contains both floats and complex numbers, the resulting numpy array will be of type complex. In both these cases the type will be the default double-precision one (float64 and complex128). Simple lists or tuples of numbers will produce a one-dimensional array with a size corresponding to the input list or tuple.

Multidimensional arrays can likewise be created. If nested lists are provided, so long as all the nested lists (or tuples) are consistent in size (e.g. a list of 10 lists, each containing 20 numbers), a multiply-dimensioned numpy array will be created, e.g., 10x20 array for the specific case mentioned. The dtype keyword can be used to coerce the type of the array to something other than the standard default. There are also other options to array that permit creating arrays with special characteristics. Read section xxx for details on these special characteristics. These are not normally needed for ordinary use.
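
As a minimal sketch of the conversion rules described above (assuming numpy is imported under the conventional alias np; the variable names are illustrative only):

```python
import numpy as np

# Mixed integers and floats are promoted to float64.
a = np.array([1, 2, 3.5])
print(a.dtype)   # float64

# Mixed floats and complex numbers are promoted to complex128.
b = np.array((1.0, 2 + 3j))
print(b.dtype)   # complex128

# A list of 10 lists, each holding 20 numbers, becomes a 10x20 array.
nested = [[0] * 20 for _ in range(10)]
c = np.array(nested)
print(c.shape)   # (10, 20)

# The dtype keyword overrides the inferred type.
d = np.array([1, 2, 3], dtype=np.float32)
print(d.dtype)   # float32
```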

The array() function can take arguments other than lists and tuples. If the object satisfies the "sequence" protocol, i.e., it can be indexed like lists or tuples, it can also be converted into a numpy array. If you pass a numpy array as an argument to array(), you get a copy of it by default; the related asarray() function instead returns the argument itself when no conversion is needed. Either one can therefore be applied generically to the arguments of a function that works with arrays or things that map to arrays. Finally, some objects may support the array protocol and allow conversion to arrays this way. A simple way to find out whether an object can be converted to a numpy array using array() is to try it interactively and see if it works! (The Python Way).


>>> from numpy import array, zeros, arange, indices  # names used in this section's examples
>>> x = array([2,3,1,0])
>>> x
array([2, 3, 1, 0])
>>> x = array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, and types
>>> x
array([[ 1.+0.j,  2.+0.j],
       [ 0.+0.j,  0.+0.j],
       [ 1.+1.j,  3.+0.j]])
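
A quick way to check how array() treats an input that is already a numpy array is to try it, as suggested above. The sketch below assumes a current numpy release, where array() copies by default and asarray() passes an existing array through unchanged; the np alias and variable names are illustrative.

```python
import numpy as np

x = np.array([2, 3, 1, 0])

y = np.array(x)     # default copy=True: a new array with its own data
z = np.asarray(x)   # no conversion needed: returns x itself

y[0] = 99
print(x[0])         # 2, x is unaffected by the change to y
print(z is x)       # True
```

For a function that only reads its input, asarray() avoids an unnecessary copy; array() is the safer choice when the function modifies the result in place.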

Intrinsic numpy array creation

Numpy has built-in functions for creating arrays from scratch:

zeros(shape) will create an array filled with 0 values with the specified shape. The default dtype is float64.

>>> zeros((2, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

ones(shape) will create an array filled with 1 values. It is identical to zeros in all other respects.
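
A short sketch of ones(), together with the dtype keyword that both ones() and zeros() accept (the np alias and variable names are assumptions for illustration):

```python
import numpy as np

o = np.ones((2, 3))              # 2x3 array of 1.0 values
print(o.dtype)                   # float64

i = np.ones((2, 3), dtype=int)   # same shape, integer ones
z = np.zeros(4, dtype=complex)   # one-dimensional array of complex zeros
print(z.dtype)                   # complex128
```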

arange() will create arrays with regularly incrementing values. Check the docstring for complete information on the various ways it can be used. A few examples will be given here:

>>> arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> arange(2, 10, dtype=float)
array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

>>> arange(2, 3, 0.1)
array([ 2. ,  2.1,  2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9])

Note that the last usage has a subtlety: with a non-integer step, floating-point rounding can change the number of elements returned. The arange docstring describes this and suggests linspace for such cases.
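
One way to sidestep the float-step issue is to specify the number of points rather than the step. The sketch below uses numpy's linspace function for this; the np alias and variable names are illustrative.

```python
import numpy as np

# arange with a float step: the element count comes from a floating-point
# division, so it can be off by one for some start/stop/step combinations.
a = np.arange(2, 3, 0.1)

# linspace takes the number of points directly; endpoint=False excludes
# the stop value, matching arange's half-open interval.
b = np.linspace(2, 3, 10, endpoint=False)
print(len(b))   # 10
```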

indices() will create a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each representing variation in that dimension. An example illustrates this much better than a verbal description:

>>> indices((3,3))
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

This is particularly useful for evaluating functions of multiple dimensions on a regular grid.
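
As an illustration of grid evaluation, the sketch below evaluates a made-up function f(i, j) = i**2 + j on a 3x3 grid. The function and variable names are hypothetical.

```python
import numpy as np

# Unpack the stacked result into one index array per dimension.
rows, cols = np.indices((3, 3))

# Arithmetic on the index arrays evaluates f at every grid point at once.
f = rows**2 + cols
print(f)
# [[0 1 2]
#  [1 2 3]
#  [4 5 6]]
```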

Reading arrays from disk

This is presumably the most common case of large array creation. The details, of course, depend greatly on the format of data on disk and so this section can only give general pointers on how to handle various formats.

Standard binary formats

Various fields have standard formats for array data. The following lists the ones with known Python libraries to read them and return numpy arrays (there may be others for which it is possible to read and convert to numpy arrays, so check the last section as well).

Format    Package
HDF5      PyTables
Others    xxx

Some formats cannot be read directly into numpy arrays but are not hard to convert. Image formats such as jpg and png are an example: a library like PIL can read and write them, and the result can then be converted to an array.

Common ascii formats

Comma Separated Value files (CSV) are widely used (and an export and import option for programs like Excel). There are a number of ways of reading these files in Python. The most convenient ways of reading these are found in pylab (part of matplotlib) in the xxx function. (list alternatives xxx)

More generic ascii files can be read using the io package in scipy. xxx a few more details needed...
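
The specific functions referred to above are left unnamed in the source. As one possible alternative, numpy's own loadtxt function can parse simple delimited text. The sketch below reads from an in-memory string standing in for a file on disk; the column names and values are made up for illustration.

```python
import io
import numpy as np

# Stand-in for a CSV file on disk (hypothetical contents).
csv_text = io.StringIO("x,y\n1.0,2.0\n3.0,4.5\n")

# delimiter="," splits fields on commas; skiprows=1 drops the header line.
data = np.loadtxt(csv_text, delimiter=",", skiprows=1)
print(data.shape)   # (2, 2)
```

With a real file, a path string can be passed in place of the StringIO object.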

Custom binary formats

There are a variety of approaches one can use. If the file has a relatively simple format, one can write a simple I/O library and use the numpy fromfile() function and .tofile() method to read and write numpy arrays directly (mind your byteorder though!). If a good C or C++ library exists that reads the data, one can wrap that library with a variety of techniques (see xxx), though that is certainly much more work and requires significantly more advanced knowledge to interface with C or C++.
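
A minimal round trip with tofile() and fromfile() might look like the sketch below. The file name and the choice of little-endian float64 are assumptions for illustration; a real format would dictate its own layout.

```python
import os
import tempfile
import numpy as np

# "<f8" means little-endian float64. Stating the byte order in the dtype
# keeps the file readable on machines with a different native order.
data = np.arange(6, dtype="<f8")

path = os.path.join(tempfile.mkdtemp(), "example.bin")  # hypothetical file name
data.tofile(path)   # raw bytes only: no header, shape, or dtype is stored

# The reader must supply the dtype and restore the shape itself.
restored = np.fromfile(path, dtype="<f8").reshape(2, 3)
print(restored.shape)   # (2, 3)
```

Because the file carries no metadata, the dtype and shape have to be agreed on separately, which is the main reason to wrap these calls in a small I/O library.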

Use of special libraries

There are libraries that can be used to generate arrays for special purposes, and it isn't possible to enumerate all of them. The most common are the many array generation functions in random, which produce arrays of random values, and some utility functions that generate special matrices (e.g. diagonal, see xxx).
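
A brief sketch of both kinds of function follows. It assumes a numpy version that provides random.default_rng (1.17 or later); the seed, shapes, and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Random arrays from a seeded generator (the seed only makes runs repeatable).
rng = np.random.default_rng(0)
u = rng.random((2, 3))      # uniform samples in [0, 1)
n = rng.normal(size=4)      # standard normal samples

# Special matrices.
d = np.diag([1, 2, 3])      # 3x3 matrix with 1, 2, 3 on the diagonal
e = np.eye(3)               # 3x3 identity matrix
print(d)
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]
```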

SciPy: Developer_Zone/numpy.doc_Module/ArrayCreation (last edited 2015-10-24 17:48:24 by anonymous)