scipy.stats.boxcox_normmax

scipy.stats.boxcox_normmax(x, brack=None, method='pearsonr', optimizer=None, *, ymax=BIG_FLOAT)

Compute the optimal Box-Cox transform parameter for the input data.
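For reference, the transform and the log-likelihood that this function works with can be written out directly. The sketch below assumes the formulas documented for scipy.stats.boxcox (y = (x**lmbda - 1) / lmbda, or log(x) when lmbda == 0) and scipy.stats.boxcox_llf (llf = (lmbda - 1) * sum(log(x)) - N/2 * log(sum((y - mean(y))**2) / N)). It reimplements both with hypothetical helper names (`boxcox_manual`, `boxcox_llf_manual`) and compares them with SciPy on a seeded sample.

```python
import numpy as np
from scipy import stats


def boxcox_manual(x, lmbda):
    """Hypothetical helper: Box-Cox transform of positive data x."""
    x = np.asarray(x, dtype=float)
    if lmbda == 0:
        return np.log(x)
    return (x**lmbda - 1) / lmbda


def boxcox_llf_manual(lmbda, x):
    """Hypothetical helper: Box-Cox log-likelihood in the documented form."""
    x = np.asarray(x, dtype=float)
    y = boxcox_manual(x, lmbda)
    n = x.size
    # np.var uses ddof=0, i.e. sum((y - mean(y))**2) / N
    return (lmbda - 1) * np.sum(np.log(x)) - n / 2 * np.log(np.var(y))


rng = np.random.default_rng(0)
x = rng.lognormal(size=50)  # positive, right-skewed sample

for lmbda in (-1.0, 0.0, 0.5, 2.0):
    same_y = np.allclose(boxcox_manual(x, lmbda), stats.boxcox(x, lmbda))
    same_llf = np.isclose(boxcox_llf_manual(lmbda, x), stats.boxcox_llf(lmbda, x))
    print(lmbda, same_y, same_llf)
```

SciPy's own implementation may compute the variance term differently for numerical stability, so the comparison uses np.isclose rather than exact equality.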

Parameters:
x : array_like

Input array. All entries must be positive, finite, real numbers.

brack : 2-tuple, optional, default (-2.0, 2.0)

The starting interval for a downhill bracket search used by the default solver, optimize.brent. In most cases this choice is not critical; the final result is allowed to lie outside this bracket. If optimizer is passed, brack must be None.
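A small sketch of how brack behaves, assuming seeded lognormal data so the numbers are reproducible. Because the bracket only seeds a downhill search, two different starting intervals should normally converge to essentially the same optimum. Combining brack with a custom optimizer is rejected; the sketch assumes this surfaces as a ValueError, and the helper `bounded` is hypothetical.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
x = rng.lognormal(size=100)  # right-skewed, positive data

# Default bracket (-2.0, 2.0) versus a narrower starting interval.
lmb_default = stats.boxcox_normmax(x)
lmb_narrow = stats.boxcox_normmax(x, brack=(-0.5, 0.5))
print(lmb_default, lmb_narrow)  # expected to agree closely


def bounded(fun):
    # Hypothetical custom optimizer; never actually called below.
    return optimize.minimize_scalar(fun, bounds=(-2, 2), method="bounded")


# brack only applies to the default optimizer, so passing both is an error.
rejected = False
try:
    stats.boxcox_normmax(x, brack=(-0.5, 0.5), optimizer=bounded)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```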

method : str, optional

The method used to determine the optimal transform parameter (the lmbda parameter of boxcox). Options are:

‘pearsonr’ (default)

Maximizes the Pearson correlation coefficient between the sorted values of y = boxcox(x) and the values expected for y if y were normally distributed (the normal quantiles used in a probability plot).

‘mle’

Maximizes the log-likelihood boxcox_llf. This is the method used in boxcox.

‘all’

Use all available optimization methods and return all results. Useful for comparing methods.
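To make the two criteria concrete, the sketch below checks each optimum against its own objective on seeded data (an assumption for reproducibility). The 'pearsonr' result should give a normal probability-plot correlation, as reported by scipy.stats.probplot, at least as high as nearby values of lmbda, and the 'mle' result should give a boxcox_llf value at least as high as nearby values. It also checks that method='all' returns both results in the order pearsonr, mle, which is consistent with the example output on this page. The helper `ppcc` and the step size are hypothetical choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = stats.loggamma.rvs(5, size=30, random_state=rng) + 5

lmb_r = stats.boxcox_normmax(x, method='pearsonr')
lmb_mle = stats.boxcox_normmax(x, method='mle')
both = stats.boxcox_normmax(x, method='all')


def ppcc(lmbda):
    # Correlation coefficient of the normal probability plot of boxcox(x, lmbda).
    _, (_, _, r) = stats.probplot(stats.boxcox(x, lmbda), dist='norm')
    return r


step = 0.1
r_ok = ppcc(lmb_r) >= max(ppcc(lmb_r - step), ppcc(lmb_r + step))
llf_ok = stats.boxcox_llf(lmb_mle, x) >= max(stats.boxcox_llf(lmb_mle - step, x),
                                            stats.boxcox_llf(lmb_mle + step, x))
all_ok = np.allclose(both, [lmb_r, lmb_mle])
print(r_ok, llf_ok, all_ok)
```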

optimizer : callable, optional

optimizer is a callable that accepts one argument:

fun : callable

The objective function to be minimized. fun accepts one argument, the Box-Cox transform parameter lmbda, and returns the value of the function (e.g., the negative log-likelihood) at the provided argument. The job of optimizer is to find the value of lmbda that minimizes fun.

and returns an object, such as an instance of scipy.optimize.OptimizeResult, which holds the optimal value of lmbda in an attribute x.

See the example below or the documentation of scipy.optimize.minimize_scalar for more information.
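The optimizer contract described above only requires a returned object with an attribute x. As a sketch of that, the hypothetical `grid_optimizer` below performs a brute-force grid search and returns a types.SimpleNamespace rather than an OptimizeResult; the grid range [-3, 3] and spacing 0.01 are arbitrary assumptions. On well-behaved data, where the objective has a single minimum inside the grid, its answer should land within one grid step of the default Brent result.

```python
import types

import numpy as np
from scipy import stats


def grid_optimizer(fun):
    """Hypothetical optimizer: grid search over lmbda in [-3, 3]."""
    grid = np.linspace(-3, 3, 601)  # spacing 0.01
    values = [fun(lmbda) for lmbda in grid]
    # Any object exposing the optimum as attribute `x` satisfies the contract.
    return types.SimpleNamespace(x=grid[int(np.argmin(values))])


rng = np.random.default_rng(3)
x = rng.lognormal(size=100)  # the optimum is expected near lmbda = 0

lmb_grid = stats.boxcox_normmax(x, optimizer=grid_optimizer)
lmb_brent = stats.boxcox_normmax(x)
print(lmb_grid, lmb_brent)
```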

ymax : float, optional

The unconstrained optimal transform parameter may cause Box-Cox transformed data to have extreme magnitude or even overflow. This parameter constrains MLE optimization such that the magnitude of the transformed x does not exceed ymax. The default is the maximum value of the input dtype. If set to infinity, boxcox_normmax returns the unconstrained optimal lambda. Ignored when method='pearsonr'.

Returns:
maxlog : float or ndarray

The optimal transform parameter. For method='all', an array with one result per method is returned instead of a scalar.

Examples

>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt

We can generate some data and determine the optimal lmbda in various ways. Because rng is not seeded, the exact values printed below will differ from run to run:

>>> rng = np.random.default_rng()
>>> x = stats.loggamma.rvs(5, size=30, random_state=rng) + 5
>>> y, lmax_mle = stats.boxcox(x)
>>> lmax_pearsonr = stats.boxcox_normmax(x)
>>> lmax_mle
2.217563431465757
>>> lmax_pearsonr
2.238318660200961
>>> stats.boxcox_normmax(x, method='all')
array([2.23831866, 2.21756343])
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> prob = stats.boxcox_normplot(x, -10, 10, plot=ax)
>>> ax.axvline(lmax_mle, color='r')
>>> ax.axvline(lmax_pearsonr, color='g', ls='--')
>>> plt.show()
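[Figure: Box-Cox normality plot of x for lmbda from -10 to 10, with a solid red vertical line at lmax_mle and a dashed green vertical line at lmax_pearsonr.]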

Alternatively, we can define our own optimizer function. Suppose we are only interested in values of lmbda on the interval [6, 7], want to use scipy.optimize.minimize_scalar with method='bounded', and want tighter tolerances than the defaults when minimizing the objective function (here the default 'pearsonr' objective, since method is not specified). To do this, we define a function that accepts the positional argument fun and uses scipy.optimize.minimize_scalar to minimize fun subject to the provided bounds and tolerances:

>>> from scipy import optimize
>>> options = {'xatol': 1e-12}  # absolute tolerance on `x`
>>> def optimizer(fun):
...     return optimize.minimize_scalar(fun, bounds=(6, 7),
...                                     method="bounded", options=options)
>>> stats.boxcox_normmax(x, optimizer=optimizer)
6.000...
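Because method defaults to 'pearsonr', the call above minimizes the correlation-based objective. A sketch of the same bounded search applied to the log-likelihood simply adds method='mle'; the seeded data are an assumption for reproducibility. The bounded solver keeps both results inside [6, 7], and when the unconstrained optimum lies below 6 (as in the example above), both are expected to sit near the lower edge.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
x = stats.loggamma.rvs(5, size=30, random_state=rng) + 5

options = {'xatol': 1e-12}  # absolute tolerance on `x`


def optimizer(fun):
    # Bounded scalar minimization restricted to lmbda in [6, 7].
    return optimize.minimize_scalar(fun, bounds=(6, 7),
                                    method="bounded", options=options)


lmb_r = stats.boxcox_normmax(x, optimizer=optimizer)  # default 'pearsonr'
lmb_mle = stats.boxcox_normmax(x, method='mle', optimizer=optimizer)
print(lmb_r, lmb_mle)
```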