scipy.stats.entropy
- scipy.stats.entropy(pk, qk=None, base=None, axis=0)
Calculate the Shannon entropy/relative entropy of given distribution(s).

If only probabilities pk are given, the Shannon entropy is calculated as
H = -sum(pk * log(pk)).

If qk is not None, then compute the relative entropy
D = sum(pk * log(pk / qk)).

This quantity is also known as the Kullback-Leibler divergence.

This routine will normalize pk and qk if they don't sum to 1.
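As a quick sketch of the normalization behavior (the counts below are illustrative, not from the documentation), passing raw counts gives the same result as passing explicit probabilities:

```python
import numpy as np
from scipy.stats import entropy

counts = np.array([5, 5])      # unnormalized: sums to 10, not 1
probs = counts / counts.sum()  # explicit probabilities [0.5, 0.5]

# entropy() normalizes internally, so both calls agree.
print(entropy(counts, base=2))  # 1.0
print(entropy(probs, base=2))   # 1.0
```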
- Parameters:
  - pk : array_like
    Defines the (discrete) distribution. Along each axis-slice of pk, element i is the (possibly unnormalized) probability of event i.
  - qk : array_like, optional
    Sequence against which the relative entropy is computed. Should be in the same format as pk.
  - base : float, optional
    The logarithmic base to use, defaults to e (natural logarithm).
  - axis : int, optional
    The axis along which the entropy is calculated. Default is 0.
- Returns:
  - S : {float, array_like}
    The calculated entropy.
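To illustrate the axis parameter (a sketch; the two distributions reuse the fair-coin and biased-coin values from the Examples below), each axis-slice of a 2-D input is treated as a separate distribution:

```python
import numpy as np
from scipy.stats import entropy

# Two distributions stacked as columns: a fair coin and a biased coin.
pk = np.array([[1/2, 9/10],
               [1/2, 1/10]])

# axis=0 (the default) computes one entropy per column.
print(entropy(pk, base=2, axis=0))  # [1.0, 0.46899559...]

# axis=1 would instead treat each row as a distribution.
```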
Notes

Informally, the Shannon entropy quantifies the expected uncertainty inherent in the possible outcomes of a discrete random variable. For example, if messages consisting of sequences of symbols from a set are to be encoded and transmitted over a noiseless channel, then the Shannon entropy H(pk) gives a tight lower bound for the average number of units of information needed per symbol if the symbols occur with frequencies governed by the discrete distribution pk [1]. The choice of base determines the choice of units; e.g., e for nats, 2 for bits, etc.

The relative entropy, D(pk|qk), quantifies the increase in the average number of units of information needed per symbol if the encoding is optimized for the probability distribution qk instead of the true distribution pk. Informally, the relative entropy quantifies the expected excess in surprise experienced if one believes the true distribution is qk when it is actually pk.

A related quantity, the cross entropy CE(pk, qk), satisfies the equation CE(pk, qk) = H(pk) + D(pk|qk) and can also be calculated with the formula CE = -sum(pk * log(qk)). It gives the average number of units of information needed per symbol if an encoding is optimized for the probability distribution qk when the true distribution is pk. It is not computed directly by entropy, but it can be computed using two calls to the function (see Examples).

See [2] for more information.
References
[1] Shannon, C.E. (1948), A Mathematical Theory of Communication. Bell System Technical Journal, 27: 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
[2] Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, USA.
Examples
The outcome of a fair coin is the most uncertain:
>>> import numpy as np
>>> from scipy.stats import entropy
>>> base = 2  # work in units of bits
>>> pk = np.array([1/2, 1/2])  # fair coin
>>> H = entropy(pk, base=base)
>>> H
1.0
>>> H == -np.sum(pk * np.log(pk)) / np.log(base)
True
The outcome of a biased coin is less uncertain:
>>> qk = np.array([9/10, 1/10])  # biased coin
>>> entropy(qk, base=base)
0.46899559358928117
The relative entropy between the fair coin and biased coin is calculated as:
>>> D = entropy(pk, qk, base=base)
>>> D
0.7369655941662062
>>> D == np.sum(pk * np.log(pk/qk)) / np.log(base)
True
The cross entropy can be calculated as the sum of the entropy and relative entropy:
>>> CE = entropy(pk, base=base) + entropy(pk, qk, base=base)
>>> CE
1.736965594166206
>>> CE == -np.sum(pk * np.log(qk)) / np.log(base)
True