Continuous Statistical Distributions#

Overview#

Every distribution has a location parameter (L) and a scale parameter (S), along with any shape parameters it needs; the names of the shape parameters vary by distribution. Each distribution is given in its standard form, where L=0.0 and S=1.0. The nonstandard forms of the various functions can be obtained from the standard forms using the following transformations (note that U is a standard uniform random variate).

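| Function Name | Standard Function | Transformation |
| --- | --- | --- |
| Cumulative Distribution Function (CDF) | $F(x)$ | $F(x;L,S) = F\left(\frac{x-L}{S}\right)$ |
| Probability Density Function (PDF) | $f(x) = F'(x)$ | $f(x;L,S) = \frac{1}{S} f\left(\frac{x-L}{S}\right)$ |
| Percent Point Function (PPF) | $G(q) = F^{-1}(q)$ | $G(q;L,S) = L + S\,G(q)$ |
| Probability Sparsity Function (PSF) | $g(q) = G'(q)$ | $g(q;L,S) = S\,g(q)$ |
| Hazard Function (HF) | $h_a(x) = \frac{f(x)}{1-F(x)}$ | $h_a(x;L,S) = \frac{1}{S} h_a\left(\frac{x-L}{S}\right)$ |
| Cumulative Hazard Function (CHF) | $H_a(x) = \log\frac{1}{1-F(x)}$ | $H_a(x;L,S) = H_a\left(\frac{x-L}{S}\right)$ |
| Survival Function (SF) | $S(x) = 1 - F(x)$ | $S(x;L,S) = S\left(\frac{x-L}{S}\right)$ |
| Inverse Survival Function (ISF) | $Z(\alpha) = S^{-1}(\alpha) = G(1-\alpha)$ | $Z(\alpha;L,S) = L + S\,Z(\alpha)$ |
| Moment Generating Function (MGF) | $M_Y(t) = E\left[e^{Yt}\right]$ | $M_X(t) = e^{Lt} M_Y(St)$ |
| Random Variates | $Y = G(U)$ | $X = L + SY$ |
| (Differential) Entropy | $h[Y] = -\int f(y) \log f(y)\,dy$ | $h[X] = h[Y] + \log S$ |
| (Non-central) Moments | $\mu'_n = E\left[Y^n\right]$ | $E\left[X^n\right] = L^n \sum_{k=0}^{n} \binom{n}{k} \left(\frac{S}{L}\right)^k \mu'_k$ |
| Central Moments | $\mu_n = E\left[(Y-\mu)^n\right]$ | $E\left[(X-\mu_X)^n\right] = S^n \mu_n$ |
| mean (mode, median), var | $\mu,\ \mu_2$ | $L + S\mu,\ S^2\mu_2$ |
| skewness | $\gamma_1 = \frac{\mu_3}{(\mu_2)^{3/2}}$ | $\gamma_1$ |
| kurtosis | $\gamma_2 = \frac{\mu_4}{(\mu_2)^2} - 3$ | $\gamma_2$ |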
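The location and scale transformations can be checked numerically. The sketch below is only an illustration: the gamma distribution and the values of `a`, `L`, and `S` are assumptions chosen for the example, not values from the text. It compares a frozen standard-form distribution with its shifted and scaled counterpart, then draws random variates as X = L + S G(U).

```python
import numpy as np
from scipy import stats

# Illustrative values, not taken from the text: gamma shape a, location L, scale S.
a, L, S = 3.0, 2.0, 1.5
standard = stats.gamma(a)                  # standard form: L = 0, S = 1
general = stats.gamma(a, loc=L, scale=S)   # shifted and scaled form

x = np.linspace(2.5, 12.0, 5)   # points inside the support x > L
q = np.linspace(0.1, 0.9, 5)    # probabilities in (0, 1)
z = (x - L) / S                 # standardized argument

# Each entry compares the general form with the transformed standard form.
checks = {
    "cdf": np.allclose(general.cdf(x), standard.cdf(z)),
    "pdf": np.allclose(general.pdf(x), standard.pdf(z) / S),
    "sf": np.allclose(general.sf(x), standard.sf(z)),
    "ppf": np.allclose(general.ppf(q), L + S * standard.ppf(q)),
    "isf": np.allclose(general.isf(q), L + S * standard.isf(q)),
    "entropy": np.isclose(general.entropy(), standard.entropy() + np.log(S)),
    "mean": np.isclose(general.mean(), L + S * standard.mean()),
    "var": np.isclose(general.var(), S**2 * standard.var()),
}
print(checks)

# Random variates by inversion: Y = G(U), X = L + S * Y.
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
samples = L + S * standard.ppf(u)
print(samples.mean(), general.mean())
```

The sample mean of the variates should land close to the mean of the general form, though it will not match exactly because it is a random estimate.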

Moments#

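Non-central moments are defined using the PDF:

$$\mu'_n = \int_{-\infty}^{\infty} x^n f(x)\,dx.$$

Note that these can always be computed using the PPF. Substitute $x = G(q)$ in the above equation to get

$$\mu'_n = \int_0^1 G^n(q)\,dq,$$

which may be easier to compute numerically. Note that $q = F(x)$, so that $dq = f(x)\,dx$. Central moments are computed similarly, with $\mu = \mu'_1$:

$$\mu_n = \int_{-\infty}^{\infty} (x-\mu)^n f(x)\,dx = \int_0^1 \left(G(q)-\mu\right)^n dq = \sum_{k=0}^{n} \binom{n}{k} (-\mu)^k \mu'_{n-k}.$$

In particular,

$$\mu_3 = \mu'_3 - 3\mu\mu'_2 + 2\mu^3 = \mu'_3 - 3\mu\mu_2 - \mu^3$$

$$\mu_4 = \mu'_4 - 4\mu\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4 = \mu'_4 - 4\mu\mu_3 - 6\mu^2\mu_2 - \mu^4$$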
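As a rough illustration of the PPF route, the following sketch integrates powers of G(q) over [0, 1] and builds central moments from the binomial sum. The beta distribution and its shape values are assumptions made for this example; a bounded support keeps G(q) finite at both endpoints, which makes the quadrature better behaved.

```python
from math import comb

from scipy import integrate, stats

# Illustrative choice, not from the text: a beta distribution in standard form.
dist = stats.beta(2.0, 5.0)

def noncentral_moment(n):
    """n-th non-central moment: integral of G(q)**n over q in [0, 1]."""
    value, _ = integrate.quad(lambda q: dist.ppf(q) ** n, 0.0, 1.0)
    return value

def central_moment(n):
    """n-th central moment from the binomial sum over non-central moments."""
    mu = noncentral_moment(1)
    return sum(
        comb(n, k) * (-mu) ** k * noncentral_moment(n - k)
        for k in range(n + 1)
    )

# Compare with the values SciPy reports for the same distribution.
print(noncentral_moment(3), dist.moment(3))
print(central_moment(2), dist.var())
```

For distributions with unbounded support, G(q) diverges as q approaches 0 or 1, so integrating the PDF form directly may be the more robust choice.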

Skewness is defined as

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}},$$

while (Fisher) kurtosis is

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3,$$

so that a normal distribution has a kurtosis of zero.
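The definitions of skewness and Fisher kurtosis can be tried out directly. This sketch assumes a gamma distribution with shape 3 purely as an example; it computes central moments by integrating against the PDF and then forms the two ratios.

```python
import numpy as np
from scipy import integrate, stats

# Illustrative choice, not from the text: gamma distribution with shape 3.
dist = stats.gamma(3.0)
mu = dist.mean()

def central_moment(n):
    """n-th central moment: integral of (x - mu)**n * f(x) over the support."""
    value, _ = integrate.quad(lambda x: (x - mu) ** n * dist.pdf(x), 0.0, np.inf)
    return value

mu2, mu3, mu4 = (central_moment(n) for n in (2, 3, 4))
skewness = mu3 / mu2**1.5        # gamma_1
kurtosis = mu4 / mu2**2 - 3.0    # gamma_2 (Fisher definition)

# SciPy's own values for comparison, plus the normal distribution's kurtosis.
ref_skew, ref_kurt = dist.stats(moments="sk")
normal_kurtosis = stats.norm.stats(moments="k")
print(skewness, float(ref_skew))
print(kurtosis, float(ref_kurt))
print(float(normal_kurtosis))
```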

Median and mode#

The median, $m_n$, is defined as the point at which half of the probability mass lies on one side and half on the other. In other words, $F(m_n) = \frac{1}{2}$, so that

$$m_n = G\left(\frac{1}{2}\right).$$

In addition, the mode, $m_d$, is defined as the value at which the probability density function reaches its peak:

$$m_d = \arg\max_x f(x).$$
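A small sketch of both definitions, assuming a gamma distribution with shape 3 as the example (its mode is known to be at shape minus one, which gives something to compare against). The search interval passed to the optimizer is also an assumption and would need adjusting for other distributions.

```python
from scipy import optimize, stats

# Illustrative choice, not from the text: gamma distribution with shape 3.
dist = stats.gamma(3.0)

# Median: the PPF evaluated at q = 1/2.
median = dist.ppf(0.5)

# Mode: maximize the PDF by minimizing its negative on an assumed interval.
result = optimize.minimize_scalar(
    lambda x: -dist.pdf(x), bounds=(0.0, 10.0), method="bounded"
)
mode = result.x

print(median, dist.cdf(median))
print(mode)
```

A bounded scalar search like this only finds one local maximum, so it is suitable for unimodal densities.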

Fitting data#

A common way to fit a distribution to data is to maximize the likelihood function. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the negative log-likelihood function will always be available for minimizing.

If $f(x;\theta)$ is the PDF of a random variable, where $\theta$ is a vector of parameters (e.g., $L$ and $S$), then for a collection of $N$ independent samples from this distribution, the joint distribution of the random vector $\mathbf{x}$ is

$$f(\mathbf{x};\theta) = \prod_{i=1}^{N} f(x_i;\theta).$$

The maximum likelihood estimate of the parameters $\theta$ is the set of parameter values that maximizes this function with $\mathbf{x}$ fixed and given by the data:

$$\theta_{es} = \arg\max_{\theta} f(\mathbf{x};\theta) = \arg\min_{\theta} l_{\mathbf{x}}(\theta),$$

where

$$l_{\mathbf{x}}(\theta) = -\sum_{i=1}^{N} \log f(x_i;\theta) = -N\,\overline{\log f(x_i;\theta)}.$$

Note that if $\theta$ includes only shape parameters, the location and scale parameters can be fit by replacing $x_i$ with $(x_i - L)/S$ in the log-likelihood function, adding $N\log S$, and minimizing; thus

$$l_{\mathbf{x}}(L,S;\theta) = N\log S - \sum_{i=1}^{N} \log f\left(\frac{x_i - L}{S};\theta\right) = N\log S + l_{\frac{\mathbf{x}-L}{S}}(\theta).$$
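The location-scale form of the negative log-likelihood can be minimized directly. The sketch below is one possible setup under stated assumptions: a Student t distribution whose shape parameter `df` is held fixed, synthetic data with hypothetical true values `true_L` and `true_S`, a log-parameterized scale to keep S positive, and a Nelder-Mead search. None of these choices come from the text.

```python
import numpy as np
from scipy import optimize, stats

# Synthetic data; df, true_L, and true_S are assumptions made for this example.
df, true_L, true_S = 5.0, 2.0, 1.5
rng = np.random.default_rng(0)
x = stats.t.rvs(df, loc=true_L, scale=true_S, size=2000, random_state=rng)

def neg_log_likelihood(params, data, shape):
    """N*log(S) minus the sum of standard-form log-densities at (x - L)/S."""
    L, log_S = params
    S = np.exp(log_S)              # optimizing log(S) keeps the scale positive
    z = (data - L) / S
    return data.size * np.log(S) - np.sum(stats.t.logpdf(z, shape))

# Crude starting values taken from the sample itself.
start = [np.median(x), np.log(np.std(x))]
result = optimize.minimize(
    neg_log_likelihood, start, args=(x, df), method="Nelder-Mead"
)
L_hat, S_hat = result.x[0], np.exp(result.x[1])
print(L_hat, S_hat)
```

Calling `stats.t.fit(x, f0=df)` performs a comparable fit with the shape held fixed and should give similar estimates, although its optimizer and starting values differ.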

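If desired, sample estimates for $L$ and $S$ (not necessarily maximum likelihood estimates) can be obtained from sample estimates of the mean and variance using

$$\hat{S} = \sqrt{\frac{\hat{\mu}_2}{\mu_2}}$$

$$\hat{L} = \hat{\mu} - \hat{S}\mu$$

where $\mu$ and $\mu_2$ are assumed known as the mean and variance of the untransformed distribution (when $L=0$ and $S=1$) and

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{\mathbf{x}}$$

$$\hat{\mu}_2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \hat{\mu}\right)^2 = \frac{N}{N-1}\,\overline{\left(\mathbf{x} - \bar{\mathbf{x}}\right)^2}$$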
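These moment-based estimates take only a few lines to compute. The example below assumes a gamma distribution with a known shape and synthetic data generated with hypothetical true values `true_L` and `true_S`; it is meant as a sketch of the formulas rather than a recommended estimator.

```python
import numpy as np
from scipy import stats

# Assumed setup for illustration: known gamma shape, hypothetical true L and S.
a, true_L, true_S = 3.0, 2.0, 1.5
standard = stats.gamma(a)
mu, mu2 = standard.mean(), standard.var()   # moments of the standard form

rng = np.random.default_rng(0)
x = stats.gamma.rvs(a, loc=true_L, scale=true_S, size=5000, random_state=rng)

mu_hat = np.mean(x)            # sample mean
mu2_hat = np.var(x, ddof=1)    # sample variance with the 1/(N-1) factor

S_hat = np.sqrt(mu2_hat / mu2)
L_hat = mu_hat - S_hat * mu
print(L_hat, S_hat)
```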

Standard notation for mean#

We will use

$$\overline{y(\mathbf{x})} = \frac{1}{N}\sum_{i=1}^{N} y(x_i)$$

where $N$ should be clear from context as the number of samples $x_i$.

References#

In the tutorials several special functions appear repeatedly and are listed here.

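| Symbol | Description | Definition |
| --- | --- | --- |
| $\gamma(s,x)$ | lower incomplete Gamma function | $\int_0^x t^{s-1} e^{-t}\,dt$ |
| $\Gamma(s,x)$ | upper incomplete Gamma function | $\int_x^\infty t^{s-1} e^{-t}\,dt$ |
| $B(x;a,b)$ | incomplete Beta function | $\int_0^x t^{a-1}(1-t)^{b-1}\,dt$ |
| $I(x;a,b)$ | regularized incomplete Beta function | $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^x t^{a-1}(1-t)^{b-1}\,dt$ |
| $\phi(x)$ | PDF for normal distribution | $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ |
| $\Phi(x)$ | CDF for normal distribution | $\int_{-\infty}^{x} \phi(t)\,dt = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)$ |
| $\psi(z)$ | digamma function | $\frac{d}{dz}\log\left(\Gamma(z)\right)$ |
| $\psi_n(z)$ | polygamma function | $\frac{d^{n+1}}{dz^{n+1}}\log\left(\Gamma(z)\right)$ |
| $I_\nu(y)$ | modified Bessel function of the first kind | |
| $\mathrm{Ei}(z)$ | exponential integral | $-\int_{-z}^{\infty} \frac{e^{-t}}{t}\,dt$ |
| $\zeta(n)$ | Riemann zeta function | $\sum_{k=1}^{\infty} \frac{1}{k^n}$ |
| $\zeta(n,z)$ | Hurwitz zeta function | $\sum_{k=0}^{\infty} \frac{1}{(k+z)^n}$ |
| ${}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;z)$ | Hypergeometric function | $\sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n}\frac{z^n}{n!}$ |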
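Most of these functions have counterparts in `scipy.special`, with a few differences in convention worth keeping in mind. The sketch below evaluates several of them at arbitrary example arguments (the numeric inputs are assumptions). Note in particular that `gammainc`, `gammaincc`, and `betainc` return the regularized forms, so the unregularized integrals in the table are recovered by multiplying by `gamma(s)` or `beta(a, b)`.

```python
import numpy as np
from scipy import special

# Arbitrary example arguments, chosen only for illustration.
s, x = 2.5, 1.2
a, b, t = 2.0, 3.0, 0.4

# SciPy's incomplete gamma functions are regularized; rescale by Gamma(s).
lower_gamma = special.gammainc(s, x) * special.gamma(s)    # gamma(s, x)
upper_gamma = special.gammaincc(s, x) * special.gamma(s)   # Gamma(s, x)

# betainc is the regularized I(t; a, b); rescale by B(a, b) for B(t; a, b).
reg_beta = special.betainc(a, b, t)
inc_beta = reg_beta * special.beta(a, b)

# Normal CDF, directly and through the error function.
Phi = special.ndtr(x)
Phi_erf = 0.5 + 0.5 * special.erf(x / np.sqrt(2.0))

# polygamma(0, z) coincides with the digamma function.
digamma = special.digamma(s)
polygamma0 = special.polygamma(0, s)

# Riemann zeta and Hurwitz zeta (the latter with second argument z = 1).
riemann = special.zeta(2.0)
hurwitz = special.zeta(2.0, 1.0)

# Other table entries: modified Bessel I_0, Ei, and the Gauss case 2F1.
bessel_i0 = special.iv(0, 0.0)
ei_one = special.expi(1.0)
hyp = special.hyp2f1(1.0, 1.0, 2.0, 0.5)   # equals -log(1 - z)/z at z = 0.5

print(lower_gamma + upper_gamma, special.gamma(s))
print(Phi, Phi_erf)
print(riemann, np.pi**2 / 6)
```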

Continuous Distributions in scipy.stats#