A very gentle introduction to e-values

Preliminaries

This blog post is intended to provide some intuition behind e-values, a framework for statistical hypothesis testing that has gained traction in the statistics and ML communities over the last 5 years or so. The goal of this post is to briefly communicate what e-values are and how they’re connected to testing. I assume familiarity with statistics and the basic concepts of hypothesis testing, so be sure to brush up on those before reading. I will also use Markov’s inequality, which states that for any non-negative random variable $Z$ and any $z>0$, \(P(Z \geq z) \leq \frac{\mathbb E[Z]}{z}.\)
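
If you want to see the inequality in action, here is a tiny simulation sketch in Python (the exponential distribution, sample size, and thresholds are arbitrary choices for illustration, not part of the result):

```python
# Numerical sanity check of Markov's inequality: for non-negative Z and z > 0,
# P(Z >= z) <= E[Z] / z.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.exponential(scale=2.0, size=100_000)  # non-negative, E[Z] = 2

for z in [1.0, 4.0, 10.0]:
    empirical = np.mean(Z >= z)   # empirical tail probability
    bound = Z.mean() / z          # Markov bound
    print(f"z={z:4.1f}  P(Z>=z) ~ {empirical:.4f}  Markov bound {bound:.4f}")
```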

Introduction

The null hypothesis is some state of nature that you are interested in testing. For example, in a t-test for a regression coefficient, the null hypothesis is $H_0 : \beta_j = 0$, meaning that the $j$th variable has no impact on the mean of the outcome. We then construct a test statistic, in this case $Z_j = \hat \beta_j/(\hat \sigma \sqrt{v_j})$, where $v_j$ is the $j$th entry along the diagonal of $(X^T X)^{-1}$. The idea behind this test statistic is that, if $H_0$ were true, its magnitude would tend to be small. More formally, under $H_0$, $Z_j \sim t_{n-p-1}$, or a t-distribution with $n-p-1$ degrees of freedom, which is a symmetric distribution about zero. If $T_j=|Z_j|$ is large, then this is evidence against the null, particularly for large $n$.

In practice, we reject the null when $T_j$ is greater than or equal to $k$, the $1-\alpha/2$ quantile of a $t_{n-p-1}$ distribution, where $\alpha \in (0,1)$. Why this quantile specifically? Because we know the distribution of $Z_j$ under the null exactly, and so the test $\phi = \textbf{1}(T_j \geq k)$ is a level-$\alpha$ test, i.e. $\mathbb E^0[\phi] = \alpha$, where that expectation is taken with respect to the distribution of the data when $\beta_j=0$. In general, a level-$\alpha$ test $\phi$ satisfies $\mathbb E^P[\phi] \leq \alpha$ for all distributions $P$ covered by the null hypothesis.
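
To make this concrete, here is a minimal sketch of the test on simulated data (the sample size, design, and true coefficients are my own illustrative choices):

```python
# Sketch of the t-test for a single regression coefficient on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p covariates
beta = np.array([1.0, 0.5, 0.0, 0.0])                       # last two coefficients are truly zero
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
df = n - p - 1                                  # degrees of freedom
sigma_hat = np.sqrt(resid @ resid / df)

j = 2                                           # test H_0: beta_j = 0 (true value is 0 here)
v_j = XtX_inv[j, j]
Z_j = beta_hat[j] / (sigma_hat * np.sqrt(v_j))
k = stats.t.ppf(1 - 0.05 / 2, df)               # 1 - alpha/2 quantile, alpha = 0.05
print(f"T_j = {abs(Z_j):.3f}, cutoff k = {k:.3f}, reject: {abs(Z_j) >= k}")
```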

For many models, we may not know the distribution of the test statistic when $H_0$ is true. In that case, how can we design a level-$\alpha$ test (without relying on an asymptotic approximation)?

E-Variables

More generally, say that the data are $Y=(Y_1, \dots, Y_n)$, and I have access to some nonnegative test statistic $E=E(Y)$. The notation here reflects that $E(Y)$ is a function of $Y$, and all probability statements below are expressed in terms of the distribution of $Y$. Say that I have a simple null hypothesis $H_0$, which just means that a single distribution $P$ for $Y$ corresponds to $H_0$; let $p$ denote its pdf/pmf.

Now, unlike the regression context, say that I do not know the distribution of $E$ when $Y \sim P$. However, I do know the following: \(\mathbb E^P[E(Y)] = \int E(y) p(y) \textrm{d} y \leq 1.\) If the above holds, then we call $E(Y)$ an e-variable, and $E(y)$, the value that $E(Y)$ takes after observing the data, is an e-value. Now, consider the test $\phi = \textbf{1}(E(Y) \geq 1/\alpha)$. Like our earlier example, this test rejects $H_0$ when a test statistic, $E(Y)$, is large, but the cut-off is not a quantile of the null distribution; instead, it is a universal cut-off that does not depend on $P$.

But is $\phi$ a level-$\alpha$ test? The answer is yes, and this is because of Markov’s inequality. Since $\mathbb E^P[E(Y)] \leq 1$, \(\mathbb E^P[\phi] = P(E(Y) \geq 1/\alpha) \leq \frac{\mathbb E^P[E(Y)]}{1/\alpha} \leq \alpha \cdot 1 = \alpha.\) So, a level-$\alpha$ test can always be performed if we have access to an e-variable. The only knowledge we need of $E(Y)$ when $Y \sim P$ is that its expected value is no more than $1$. If $H_0$ were composite (i.e. multiple distributions for $Y$ would correspond to that state of nature), then we need $\mathbb E^P[E(Y)] \leq 1$ for all distributions $P$ in $H_0$.
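
To see this guarantee in action, here is a small simulation sketch. The particular e-variable, $E(Y) = \exp(\lambda Y - \lambda^2/2)$ with $Y \sim N(0,1)$ under the null, is purely an illustrative choice of mine; it has expectation exactly 1 under the null:

```python
# Check that the e-value test phi = 1(E >= 1/alpha) is level-alpha, using only
# the fact that E has expectation at most 1 under the null.
import numpy as np

rng = np.random.default_rng(2)
alpha, lam = 0.05, 1.0
n_sims = 200_000

Y = rng.normal(size=n_sims)              # data drawn under the null N(0, 1)
E = np.exp(lam * Y - lam**2 / 2)         # e-variable with mean 1 under the null
type_I_error = np.mean(E >= 1 / alpha)   # rejection rate of the e-value test

print(f"mean of E under the null: {E.mean():.3f} (close to 1)")
print(f"type I error: {type_I_error:.4f} (at most alpha = {alpha})")
```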

Likelihood Ratios

A key example of an e-variable is a likelihood ratio. For now, let’s assume that $Y \in \mathbb R^n$ and $p(y) >0$ for every $y \in \mathbb R^n$. Returning to the simple hypothesis example from above, let $Q$ be any distribution with density $q$ (which also has full support on $\mathbb R^n$). A likelihood ratio is simply \(E(Y) = \frac{q(Y)}{p(Y)}.\) Then $E(Y)$ is an e-variable, since \(\mathbb E^P[E(Y)] = \int \frac{q(y)}{p(y)} p(y) \textrm{d} y = \int q(y) \textrm{d} y = 1,\) and so $\phi = \textbf{1}(E(Y) \geq 1/\alpha)$ is a level-$\alpha$ test. The choice of $Q$ depends on the alternative hypothesis $H_1$ (and is related to the e-power of $E(Y)$), but note that the e-variable property holds for any $Q$ with full support.
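
Here is a short sketch computing a likelihood-ratio e-value; the Gaussian null and alternative (iid $N(0,1)$ versus iid $N(1,1)$) are illustrative choices of mine, not anything canonical:

```python
# Likelihood-ratio e-value for a simple null: E(y) = q(y) / p(y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n = 0.05, 30
y = rng.normal(loc=1.0, scale=1.0, size=n)   # data actually drawn from Q

log_q = stats.norm.logpdf(y, loc=1.0).sum()  # log q(y), alternative N(1, 1)
log_p = stats.norm.logpdf(y, loc=0.0).sum()  # log p(y), null N(0, 1)
e_value = np.exp(log_q - log_p)              # E(y) = q(y) / p(y)

print(f"e-value: {e_value:.2f}, reject H_0: {e_value >= 1 / alpha}")
```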

What do we do if $H_0$ includes more than one distribution? Suppose now that we model $Y$ with a parametric family, say $\{P_\theta: \theta \in \Theta_0\}$ for some parameter space $\Theta_0$. Let $\hat \theta_0$ be the maximum likelihood estimator for $\theta$ over $\Theta_0$; that $\hat \theta_0$ maximizes the likelihood over the null is paramount for validity. Next, let \(E(Y) = \frac{q(Y)}{p_{\hat \theta_0}(Y)}.\) The denominator is now the maximized likelihood over $\Theta_0$. It turns out that $E(Y)$ is also an e-variable. This is because, for any $\theta \in \Theta_0$, $p_\theta(y) \leq p_{\hat \theta_0}(y)$. In turn, we have that \(E(Y) = \frac{q(Y)}{p_{\hat \theta_0}(Y)} \leq \frac{q(Y)}{p_{\theta}(Y)},\) implying \(\mathbb E^{P_\theta}[E(Y)] \leq \int \frac{q(y)}{p_{\theta}(y)} p_\theta(y) \textrm{d}y = \int q(y) \textrm{d} y = 1.\) Hence, if we can maximize the likelihood over the null, then we can specify an e-variable via a likelihood ratio.
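
And a sketch of the composite-null version, where the denominator is the likelihood maximized over the null. The one-sided Gaussian family $\{N(\theta, 1): \theta \leq 0\}$ and the alternative $N(1,1)$ are again just illustrative choices, picked so the constrained MLE has a closed form:

```python
# Maximized-likelihood e-value for a composite null {N(theta, 1): theta <= 0}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, n = 0.05, 30
y = rng.normal(loc=0.5, scale=1.0, size=n)            # data from outside the null

theta_hat_0 = min(y.mean(), 0.0)                      # MLE over the null Theta_0
log_q = stats.norm.logpdf(y, loc=1.0).sum()           # log q(y), alternative N(1, 1)
log_p0 = stats.norm.logpdf(y, loc=theta_hat_0).sum()  # maximized null log-likelihood
e_value = np.exp(log_q - log_p0)

print(f"e-value: {e_value:.2f}, reject H_0: {e_value >= 1 / alpha}")
```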

Further Reading

There is a wide (and rapidly expanding) literature on e-values, and this blog post only covers the basic properties and simple examples. I would recommend starting with Hypothesis Testing with E-Values and following the references therein, as well as Universal Inference.