- class scipy.stats.gaussian_kde(dataset, bw_method=None, weights=None)[source]#
Representation of a kernel-density estimate using Gaussian kernels.
Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way.
gaussian_kdeworks for both uni-variate and multi-variate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed.
Datapoints to estimate from. In case of univariate data this is a 1-D array, otherwise a 2-D array with shape (# of dims, # of data).
- bw_methodstr, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be ‘scott’, ‘silverman’, a scalar constant or a callable. If a scalar, this will be used directly as kde.factor. If a callable, it should take a
gaussian_kdeinstance as only parameter and return a scalar. If None (default), ‘scott’ is used. See Notes for more details.
- weightsarray_like, optional
weights of datapoints. This must be the same shape as dataset. If None (default), the samples are assumed to be equally weighted
Bandwidth selection strongly influences the estimate obtained from the KDE (much more so than the actual shape of the kernel). Bandwidth selection can be done by a “rule of thumb”, by cross-validation, by “plug-in methods” or by other means; see ,  for reviews.
gaussian_kdeuses a rule of thumb, the default is Scott’s Rule.
Scott’s Rule , implemented as
nthe number of data points and
dthe number of dimensions. In the case of unequally weighted points,
neffthe effective number of datapoints. Silverman’s Rule , implemented as
(n * (d + 2) / 4.)**(-1. / (d + 4)).
or in the case of unequally weighted points:
(neff * (d + 2) / 4.)**(-1. / (d + 4)).
Good general descriptions of kernel density estimation can be found in  and , the mathematics for this multi-dimensional implementation can be found in .
With a set of weighted samples, the effective number of datapoints
neffis defined by:
neff = sum(weights)^2 / sum(weights^2)
as detailed in .
gaussian_kdedoes not currently support data that lies in a lower-dimensional subspace of the space in which it is expressed. For such data, consider performing principle component analysis / dimensionality reduction and using
gaussian_kdewith the transformed data.
D.W. Scott, “Multivariate Density Estimation: Theory, Practice, and Visualization”, John Wiley & Sons, New York, Chicester, 1992. (1,2)
B.W. Silverman, “Density Estimation for Statistics and Data Analysis”, Vol. 26, Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1986.
B.A. Turlach, “Bandwidth Selection in Kernel Density Estimation: A Review”, CORE and Institut de Statistique, Vol. 19, pp. 1-33, 1993.
D.M. Bashtannyk and R.J. Hyndman, “Bandwidth selection for kernel conditional density estimation”, Computational Statistics & Data Analysis, Vol. 36, pp. 279-298, 2001.
Gray P. G., 1969, Journal of the Royal Statistical Society. Series A (General), 132, 272
Generate some random two-dimensional data:
>>> import numpy as np >>> from scipy import stats >>> def measure(n): ... "Measurement model, return two coupled measurements." ... m1 = np.random.normal(size=n) ... m2 = np.random.normal(scale=0.5, size=n) ... return m1+m2, m1-m2
>>> m1, m2 = measure(2000) >>> xmin = m1.min() >>> xmax = m1.max() >>> ymin = m2.min() >>> ymax = m2.max()
Perform a kernel density estimate on the data:
>>> X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j] >>> positions = np.vstack([X.ravel(), Y.ravel()]) >>> values = np.vstack([m1, m2]) >>> kernel = stats.gaussian_kde(values) >>> Z = np.reshape(kernel(positions).T, X.shape)
Plot the results:
>>> import matplotlib.pyplot as plt >>> fig, ax = plt.subplots() >>> ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r, ... extent=[xmin, xmax, ymin, ymax]) >>> ax.plot(m1, m2, 'k.', markersize=2) >>> ax.set_xlim([xmin, xmax]) >>> ax.set_ylim([ymin, ymax]) >>> plt.show()
The dataset with which
Number of dimensions.
Number of datapoints.
Effective number of datapoints.
New in version 1.2.0.
The bandwidth factor, obtained from kde.covariance_factor. The square of kde.factor multiplies the covariance matrix of the data in the kde estimation.
The covariance matrix of dataset, scaled by the calculated bandwidth (kde.factor).
The inverse of covariance.
Evaluate the estimated pdf on a set of points.
Evaluate the estimated pdf on a set of points.
Multiply estimated density by a multivariate Gaussian and integrate over the whole space.
Computes the integral of a 1D pdf between two bounds.
integrate_box(low_bounds, high_bounds[, maxpts])
Computes the integral of a pdf over a rectangular interval.
Computes the integral of the product of this kernel density estimate with another.
Evaluate the estimated pdf on a provided set of points.
Evaluate the log of the estimated pdf on a provided set of points.
Randomly sample a dataset from the estimated pdf.
Compute the estimator bandwidth with given method.
Computes the coefficient (kde.factor) that multiplies the data covariance matrix to obtain the kernel covariance matrix.