Kendall’s tau test

Kendall’s tau is a measure of the correspondence between two rankings.

Consider the following data from [1], which studied the relationship between free proline (an amino acid) and total collagen (a protein often found in connective tissue) in unhealthy human livers.

The x and y arrays below record measurements of the two compounds. The observations are paired: each free proline measurement was taken from the same liver as the total collagen measurement at the same index.

import numpy as np
# total collagen (mg/g dry weight of liver)
x = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
# free proline (μ mole/g dry weight of liver)
y = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])

These data were analyzed in [2] using Spearman’s correlation coefficient, a statistic similar to Kendall’s tau in that it is also sensitive to ordinal correlation between the samples. Let’s perform an analogous study using Kendall’s tau.

from scipy import stats
res = stats.kendalltau(x, y)
res.statistic
np.float64(0.5499999999999999)

The value of this statistic tends to be high (close to 1) for samples with a strongly positive ordinal correlation, low (close to -1) for samples with a strongly negative ordinal correlation, and small in magnitude (close to zero) for samples with weak ordinal correlation.
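A minimal from-scratch sketch of the statistic may make this concrete. It assumes the standard tau-b definition, which we understand to be the default (variant='b') of scipy.stats.kendalltau: tau_b = (C - D) / sqrt((n0 - n1)(n0 - n2)). Here C and D count concordant and discordant pairs, n0 = n(n - 1)/2 is the total number of pairs, and n1 and n2 count the pairs tied in x and in y. The helper kendall_tau_b is hypothetical and not part of SciPy. On the liver data it should give 11/20 = 0.55 (15 concordant pairs, 4 discordant, one pair tied in each array). It gives exactly 1 or -1 when y is a strictly increasing or strictly decreasing function of x.

```python
import numpy as np

def kendall_tau_b(x, y):
    """Hypothetical helper: Kendall's tau-b by explicit pair counting, O(n**2)."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    i, j = np.triu_indices(n, k=1)   # all index pairs (i, j) with i < j
    sx = np.sign(x[j] - x[i])        # +1, -1, or 0 (pair tied in x)
    sy = np.sign(y[j] - y[i])        # +1, -1, or 0 (pair tied in y)
    s = np.sum(sx * sy)              # concordant minus discordant pairs
    n0 = n * (n - 1) / 2             # total number of pairs
    n1 = np.sum(sx == 0)             # pairs tied in x
    n2 = np.sum(sy == 0)             # pairs tied in y
    return s / np.sqrt((n0 - n1) * (n0 - n2))

x = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
y = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])

tau_data = kendall_tau_b(x, y)   # (15 - 4) / sqrt(20 * 20) = 11/20
tau_same = kendall_tau_b(x, 2 * x)   # identical ordering
tau_reversed = kendall_tau_b(x, -x)  # reversed ordering
print(tau_data, tau_same, tau_reversed)
```

Counting every pair is only meant to illustrate the definition. For large samples, a sorting-based O(n log n) algorithm is preferable.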

The test is performed by comparing the observed value of the statistic against the null distribution: the distribution of statistic values derived under the null hypothesis that total collagen and free proline measurements are independent.

For this test, the null distribution for large samples without ties is approximated as the normal distribution with mean zero and variance (2*(2*n + 5))/(9*n*(n - 1)), where n = len(x).
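For untied data this variance is exact at every sample size, not only asymptotically, and a brute-force check confirms it. For n = 7 (the size of the liver data), the formula gives 2(2*7 + 5)/(9*7*6) = 38/378 ≈ 0.1005. The sketch below assumes untied samples. It enumerates all 7! = 5040 equally likely pairings of the ranks 0, ..., 6 and computes tau for each one. Without ties, tau is simply (C - D)/(n(n - 1)/2). It then compares the variance of these values with the formula.

```python
import itertools
import numpy as np

n = 7
i, j = np.triu_indices(n, k=1)     # all index pairs (i, j) with i < j
n_pairs = n * (n - 1) // 2

# Take the x-ranks to be 0..n-1 in order, so x[j] > x[i] for every pair.
# Each permutation of the y-ranks is one equally likely pairing under independence.
taus = []
for perm in itertools.permutations(range(n)):
    perm = np.asarray(perm)
    s = np.sum(np.sign(perm[j] - perm[i]))   # concordant minus discordant pairs
    taus.append(s / n_pairs)                 # no ties, so tau-a equals tau-b
taus = np.asarray(taus)

var_formula = (2*(2*n + 5))/(9*n*(n - 1))
print(taus.mean(), taus.var(), var_formula)
```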

import matplotlib.pyplot as plt
n = len(x)  # len(x) == len(y)
var = (2*(2*n + 5))/(9*n*(n - 1))
dist = stats.norm(scale=np.sqrt(var))
z_vals = np.linspace(-1.25, 1.25, 100)
pdf = dist.pdf(z_vals)
fig, ax = plt.subplots(figsize=(8, 5))

def plot(ax):  # we'll reuse this
    ax.plot(z_vals, pdf)
    ax.set_title("Kendall Tau Test Null Distribution")
    ax.set_xlabel("statistic")
    ax.set_ylabel("probability density")

plot(ax)
plt.show()
[Figure: "Kendall Tau Test Null Distribution", probability density versus statistic]

The comparison is quantified by the p-value: the proportion of values in the null distribution as extreme or more extreme than the observed value of the statistic. In a two-sided test in which the statistic is positive, elements of the null distribution greater than the observed statistic and elements of the null distribution less than the negative of the observed statistic are both considered “more extreme”.

fig, ax = plt.subplots(figsize=(8, 5))
plot(ax)
pvalue = dist.cdf(-res.statistic) + dist.sf(res.statistic)
annotation = (f'p-value={pvalue:.4f}\n(shaded area)')
props = dict(facecolor='black', width=1, headwidth=5, headlength=8)
_ = ax.annotate(annotation, (0.65, 0.15), (0.8, 0.3), arrowprops=props)
i = z_vals >= res.statistic
ax.fill_between(z_vals[i], y1=0, y2=pdf[i], color='C0')
i = z_vals <= -res.statistic
ax.fill_between(z_vals[i], y1=0, y2=pdf[i], color='C0')
ax.set_xlim(-1.25, 1.25)
ax.set_ylim(0, 0.5)
plt.show()
[Figure: the null distribution with both tails beyond the observed statistic shaded and annotated with the p-value]
res.pvalue
np.float64(0.09108705741631495)

Note that there is slight disagreement between the shaded area of the curve and the p-value returned by scipy.stats.kendalltau. This is because our data have ties, and we have neglected a tie correction to the null distribution variance that scipy.stats.kendalltau performs. For samples without ties, the shaded area of our plot would match exactly the p-value returned by scipy.stats.kendalltau with method='asymptotic'. By default, however, small samples without ties are given an exact p-value instead.
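The tie correction can be sketched by hand. This sketch assumes the tie-adjusted null variance of S = C - D (the numerator of tau) commonly attributed to Kendall. The terms depend on the sizes t and u of the groups of tied values in x and y: [n(n-1)(2n+5) - Σt(t-1)(2t+5) - Σu(u-1)(2u+5)]/18 + Σt(t-1)(t-2)·Σu(u-1)(u-2)/[9n(n-1)(n-2)] + Σt(t-1)·Σu(u-1)/[2n(n-1)]. Dividing by the squared tau-b denominator converts this to a variance for tau-b.

This is a reconstruction, not SciPy's source code, and the helpers tie_group_sizes, s_null_variance and two_sided_normal_p are hypothetical. With the uncorrected variance, the sketch should reproduce the shaded area (about 0.083). With the correction, it should give a p-value close to the 0.0911 reported by scipy.stats.kendalltau.

```python
import math
import numpy as np

def tie_group_sizes(a):
    """Sizes of the groups of tied values in `a` (singletons dropped)."""
    _, counts = np.unique(a, return_counts=True)
    return counts[counts > 1]

def s_null_variance(x, y):
    """Assumed tie-corrected null variance of S = C - D (hypothetical helper)."""
    n = len(x)
    t = tie_group_sizes(x)
    u = tie_group_sizes(y)
    v = (n*(n - 1)*(2*n + 5)
         - np.sum(t*(t - 1)*(2*t + 5))
         - np.sum(u*(u - 1)*(2*u + 5))) / 18
    v += (np.sum(t*(t - 1)*(t - 2)) * np.sum(u*(u - 1)*(u - 2))
          / (9*n*(n - 1)*(n - 2)))
    v += np.sum(t*(t - 1)) * np.sum(u*(u - 1)) / (2*n*(n - 1))
    return v

def two_sided_normal_p(stat, var):
    """P(|Z| >= |stat| / sd) for a zero-mean normal, via erfc(|z| / sqrt(2))."""
    return math.erfc(abs(stat) / math.sqrt(var) / math.sqrt(2))

x = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
y = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])
n = len(x)

i, j = np.triu_indices(n, k=1)             # all index pairs (i, j) with i < j
sx = np.sign(x[j] - x[i])
sy = np.sign(y[j] - y[i])
s = np.sum(sx * sy)                        # C - D
n0 = n*(n - 1)/2                           # total number of pairs
denom = math.sqrt((n0 - np.sum(sx == 0)) * (n0 - np.sum(sy == 0)))
tau_b = s / denom                          # 0.55 for these data

# Null variance of tau_b, without and with the tie correction.
var_plain = (2*(2*n + 5))/(9*n*(n - 1))    # the formula used for the plot
var_ties = s_null_variance(x, y) / denom**2

p_plain = two_sided_normal_p(tau_b, var_plain)  # the shaded area
p_ties = two_sided_normal_p(tau_b, var_ties)    # tie-corrected p-value
print(p_plain, p_ties)
```

The correction widens the null distribution slightly here, which moves the p-value from about 0.083 to about 0.091.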

If the p-value is “small” (that is, if there is a low probability of sampling data from independent distributions that produces such an extreme value of the statistic), this may be taken as evidence against the null hypothesis in favor of the alternative: the distributions of total collagen and free proline are not independent. Note that:

  • A large p-value is not evidence for the null hypothesis; the test can only provide evidence against it.

  • The threshold for values that will be considered “small” is a choice that should be made before the data is analyzed [3] with consideration of the risks of both false positives (incorrectly rejecting the null hypothesis) and false negatives (failure to reject a false null hypothesis).

  • Small p-values are not evidence for a large effect; rather, they can only provide evidence for a “significant” effect, meaning that results this extreme are unlikely to have occurred under the null hypothesis.

For samples of moderate size without ties, scipy.stats.kendalltau can compute the p-value exactly. However, in the presence of ties, scipy.stats.kendalltau resorts to an asymptotic approximation. Nonetheless, we can use a permutation test to compute the null distribution exactly: under the null hypothesis that total collagen and free proline are independent, each of the free proline measurements was equally likely to have been observed with any of the total collagen measurements. Therefore, we can form an exact null distribution by calculating the statistic under each possible pairing of elements between x and y.
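Writing the permutation test out by hand makes the "every possible pairing" idea explicit. This is a minimal sketch that assumes the two-sided p-value is defined as we understand scipy.stats.permutation_test to define it for an exact test: twice the smaller of the two one-sided tail proportions, capped at 1. The tau-b denominator depends only on the tie counts, which no permutation changes, so the sketch compares the integer-valued numerator S = C - D rather than tau itself. This avoids floating-point ties. With only 7! = 5040 pairings, every one can be enumerated. The helper s_statistic is hypothetical.

```python
import itertools
import numpy as np

x = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
y = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])
n = len(x)
i, j = np.triu_indices(n, k=1)     # all index pairs (i, j) with i < j
sy = np.sign(y[j] - y[i])          # y is never permuted, so precompute its signs

def s_statistic(x_perm):
    """Concordant minus discordant pairs (numerator of tau-b)."""
    return np.sum(np.sign(x_perm[j] - x_perm[i]) * sy)

s_obs = s_statistic(x)
# Pair the x values with y in every possible order: 5040 equally likely pairings.
null = np.array([s_statistic(x[list(p)])
                 for p in itertools.permutations(range(n))])

p_greater = np.mean(null >= s_obs)   # upper-tail proportion
p_less = np.mean(null <= s_obs)      # lower-tail proportion
pvalue = min(1.0, 2 * min(p_greater, p_less))
print(s_obs, len(null), pvalue)
```

Comparing S instead of tau-b gives the same ordering of the null distribution because the denominator is a positive constant.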

def statistic(x):  # explore all possible pairings by permuting `x`
    return stats.kendalltau(x, y).statistic  # ignore pvalue
ref = stats.permutation_test((x,), statistic,
                             permutation_type='pairings')
fig, ax = plt.subplots(figsize=(8, 5))
plot(ax)
bins = np.linspace(-1.25, 1.25, 25)
ax.hist(ref.null_distribution, bins=bins, density=True)
ax.legend(['asymptotic approximation\n(many observations)',
           'exact null distribution'])
plot(ax)
plt.show()
[Figure: the asymptotic approximation overlaid on a histogram of the exact null distribution]
ref.pvalue
np.float64(0.12222222222222222)

Note that there is substantial disagreement between the exact p-value calculated here (about 0.122) and the asymptotic approximation returned by scipy.stats.kendalltau above (about 0.091). For small samples with ties, consider performing a permutation test for more accurate results.

References