Code and Documentation Style Guide - The Missing Bits#

This is a collection of coding and documentation guidelines for SciPy that are not explicitly stated in the existing guidelines and standards.

Some of these are trivial, and might not seem worth discussing, but in many cases, the issue has come up in a pull request review in either the SciPy or NumPy repositories. If a style issue is important enough that a reviewer will require a change before merging, then it is important enough to be documented, at least for cases where the issue can be resolved with a simple rule.

Coding Style and Guidelines#

Note that docstrings, although stored as Unicode, should generally be made up of ASCII characters. The following code block from the file tools/check_unicode.py tells the linter which additional characters are allowed:

latin1_letters = set(chr(cp) for cp in range(192, 256))
greek_letters = set('αβγδεζηθικλμνξoπρστυϕχψω' + 'ΓΔΘΛΞΠΣϒΦΨΩ')
box_drawing_chars = set(chr(cp) for cp in range(0x2500, 0x2580))
extra_symbols = set('®ő∫≠≥≤±∞²³·→√')
allowed = latin1_letters | greek_letters | box_drawing_chars | extra_symbols
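As a sketch of how such an allow-list can be applied (the `disallowed_chars` helper below is hypothetical, not part of check_unicode.py):

```python
# Build the same allow-list as in tools/check_unicode.py.
latin1_letters = set(chr(cp) for cp in range(192, 256))
greek_letters = set('αβγδεζηθικλμνξoπρστυϕχψω' + 'ΓΔΘΛΞΠΣϒΦΨΩ')
box_drawing_chars = set(chr(cp) for cp in range(0x2500, 0x2580))
extra_symbols = set('®ő∫≠≥≤±∞²³·→√')
allowed = latin1_letters | greek_letters | box_drawing_chars | extra_symbols


def disallowed_chars(text):
    """Return the set of characters in *text* that are neither ASCII
    nor on the allow-list (hypothetical helper for illustration)."""
    return {c for c in text if ord(c) > 127 and c not in allowed}


print(disallowed_chars('tolerance ± 1e-8'))  # ± is allowed -> set()
print(disallowed_chars('em dash \u2014'))    # U+2014 is not allowed
```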

Required keyword names#

For new functions or methods with more than a few arguments, all parameters after the first few “obvious” ones should require the use of the keyword when given. This is implemented by including * at the appropriate point in the signature.

For example, a function foo that operates on a single array but that has several optional parameters (say method, flag, rtol and atol) would be defined as:

def foo(x, *, method='basic', flag=False, rtol=1.5e-8, atol=1e-12):
    ...

To call foo, all parameters other than x must be given with an explicit keyword, e.g. foo(arr, rtol=1e-12, method='better').

This forces callers to give explicit keyword parameters (which most users would probably do anyway even without the use of *), and it means additional parameters can be added to the function anywhere after the *; new parameters do not have to be added after the existing parameters.
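A quick sketch (reusing the signature above) of what a caller sees when trying to pass one of the keyword-only parameters positionally:

```python
def foo(x, *, method='basic', flag=False, rtol=1.5e-8, atol=1e-12):
    return (x, method, flag, rtol, atol)


# Keyword use works:
foo([1.0, 2.0], rtol=1e-12, method='better')

# Positional use of parameters after * does not:
try:
    foo([1.0, 2.0], 'better')
except TypeError as exc:
    print(exc)  # foo() takes 1 positional argument but 2 were given
```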

Return Objects#

For new functions or methods that return two or more conceptually distinct elements, return the elements in an object type that is not iterable. In particular, do not return a tuple, namedtuple, or a “bunch” produced by scipy._lib._bunch.make_tuple_bunch, the latter being reserved for adding new attributes to iterables returned by existing functions. Instead, use an existing return class (e.g. OptimizeResult) or a new, custom return class.

This practice of returning non-iterable objects forces callers to be more explicit about the element of the returned object that they wish to access, and it makes it easier to extend the function or method in a backward compatible way.

If the return class is simple and not public (i.e. not importable from a public module), it may be documented like:

Returns
-------
res : MyResultObject
    An object with attributes:

    attribute1 : ndarray
        Customized description of attribute 1.
    attribute2 : ndarray
        Customized description of attribute 2.

“MyResultObject” above does not link to external documentation because the class is simple enough that all of its attributes can be fully documented immediately below its name.

Some return classes are sufficiently complex to deserve their own rendered documentation. This is fairly standard if the return class is public, but return classes should only be public if 1) they are intended to be imported by end users and 2) they have been approved by the forum. For complex, private return classes, please see how binomtest summarizes BinomTestResult and links to its documentation, and note that BinomTestResult cannot be imported from stats.

Depending on the complexity of “MyResultObject”, a normal class or a dataclass can be used. When using dataclasses, do not use dataclasses.make_dataclass; instead, write a proper class declaration. This allows autocompletion to list all the attributes of the result object and improves static analysis. Finally, hide any private attributes from the repr:

from dataclasses import dataclass, field

import numpy as np


@dataclass
class MyResultObject:
    statistic: np.ndarray
    pvalue: np.ndarray
    confidence_interval: ConfidenceInterval
    _rho: np.ndarray = field(repr=False)
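A self-contained sketch of this pattern (ConfidenceInterval below is a stand-in defined only for illustration) shows that attribute access works, the private attribute stays out of the repr, and tuple-unpacking fails as intended:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ConfidenceInterval:
    # Stand-in nested result type, defined here for illustration only.
    low: float
    high: float


@dataclass
class MyResultObject:
    statistic: np.ndarray
    pvalue: np.ndarray
    confidence_interval: ConfidenceInterval
    _rho: np.ndarray = field(repr=False)  # hidden from the repr


res = MyResultObject(statistic=np.float64(3.2),
                     pvalue=np.float64(0.001),
                     confidence_interval=ConfidenceInterval(2.1, 4.3),
                     _rho=np.float64(0.5))

print(res.pvalue)             # explicit attribute access
print('_rho' in repr(res))    # False: private attribute is hidden

try:
    statistic, pvalue = res   # the result object is not iterable
except TypeError as exc:
    print(exc)
```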

Test functions from numpy.testing#

In new code, don’t use assert_almost_equal, assert_approx_equal or assert_array_almost_equal. This is from the docstrings of these functions:

It is recommended to use one of `assert_allclose`,
`assert_array_almost_equal_nulp` or `assert_array_max_ulp`
instead of this function for more consistent floating point
comparisons.

For more information about writing unit tests, see the NumPy Testing Guidelines.
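For example, a comparison that might once have been written with the discouraged assert_almost_equal can instead use assert_allclose with an explicit relative tolerance:

```python
import numpy as np
from numpy.testing import assert_allclose

computed = np.array([0.1 + 0.2, 1.0 / 3.0])
expected = np.array([0.3, 0.3333333333333333])

# Discouraged: assert_almost_equal(computed, expected, decimal=7)
# Preferred: an explicit relative tolerance.
assert_allclose(computed, expected, rtol=1e-13)
```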

Testing expected exceptions/warnings#

When writing a new test that a function call raises an exception or emits a warning, the preferred style is to use pytest.raises/pytest.warns as a context manager, with the code that is supposed to raise the exception in the code block defined by the context manager. The match keyword argument is given with enough of the expected message attached to the exception/warning to distinguish it from other exceptions/warnings of the same class. Do not use np.testing.assert_raises or np.testing.assert_warns, as they do not support a match parameter.

For example, the function scipy.stats.zmap is supposed to raise a ValueError if the input contains nan and nan_policy is "raise". A test for this is:

import numpy as np
import pytest

from scipy import stats

scores = np.array([1, 2, 3])
compare = np.array([-8, -3, 2, 7, 12, np.nan])
with pytest.raises(ValueError, match='input contains nan'):
    stats.zmap(scores, compare, nan_policy='raise')

The match argument ensures that the test doesn’t pass by raising a ValueError that is not related to the input containing nan.
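The same pattern applies to warnings. A sketch with a hypothetical warning-emitting function (careless_sum is invented for illustration, not part of SciPy):

```python
import warnings

import pytest


def careless_sum(x):
    """Hypothetical function that emits a DeprecationWarning."""
    warnings.warn("input will be cast to float in the future",
                  DeprecationWarning)
    return sum(x)


# The match argument distinguishes this warning from other
# DeprecationWarnings the call might emit.
with pytest.warns(DeprecationWarning, match='cast to float'):
    careless_sum([1, 2, 3])
```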