<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://adombowsky.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://adombowsky.github.io/" rel="alternate" type="text/html" /><updated>2026-04-10T15:12:31-07:00</updated><id>https://adombowsky.github.io/feed.xml</id><title type="html">Alexander Dombowsky</title><subtitle>personal description</subtitle><author><name>Alexander Dombowsky, PhD</name><email>alexander.dombowsky@gladstone.ucsf.edu</email></author><entry><title type="html">A very gentle introduction to e-values</title><link href="https://adombowsky.github.io/posts/2026/01/evaluesintro/" rel="alternate" type="text/html" title="A very gentle introduction to e-values" /><published>2026-01-24T00:00:00-08:00</published><updated>2026-01-24T00:00:00-08:00</updated><id>https://adombowsky.github.io/posts/2026/01/evalues</id><content type="html" xml:base="https://adombowsky.github.io/posts/2026/01/evaluesintro/"><![CDATA[<h1 id="preliminaries">Preliminaries</h1>
<p>This blog post is intended to provide some intuition behind e-values, a framework for hypothesis testing that has gained traction in the statistics and ML community over the last few years. The goal of this post is to briefly communicate <em>what</em> e-values are and how they’re <em>connected</em> to testing. I assume familiarity with statistics and the basic concepts of hypothesis testing, so be sure to brush up on those before reading. I will also use <em>Markov’s inequality</em>, which states that for any non-negative random variable $Z$ and $z&gt;0$,</p>

\[P(Z \geq z) \leq \frac{\mathbb E[Z]}{z}.\]
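<p>As a quick sanity check, here is a minimal simulation (my own construction, using an Exponential(1) variable as an arbitrary non-negative choice) comparing both sides of Markov’s inequality:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.exponential(scale=1.0, size=100_000)  # non-negative, with E[Z] = 1

for z in [1.0, 2.0, 5.0]:
    empirical = np.mean(Z >= z)   # Monte Carlo estimate of P(Z >= z)
    bound = Z.mean() / z          # Markov's bound, E[Z]/z
    print(f"z = {z}: P(Z >= z) ~ {empirical:.4f} <= {bound:.4f}")
```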

<h1 id="introduction">Introduction</h1>
<p>The null hypothesis is some state of nature that you are interested in testing. For example, in a t-test for a regression coefficient, the null hypothesis is $H_0 : \beta_j = 0$, meaning that the $j$th covariate has no effect on the mean of the outcome. We then construct a test statistic, in this case $Z_j = \hat \beta_j/(\hat \sigma \sqrt{v_j})$, where $v_j$ is the $j$th diagonal entry of $(X^T X)^{-1}$. The idea behind this test statistic is that, if $H_0$ were true, its magnitude would tend to be small. More formally, under $H_0$, $Z_j \sim t_{n-p-1}$, a t-distribution with $n-p-1$ degrees of freedom, which is symmetric about zero. If $T_j=|Z_j|$ is large, this is evidence against the null.</p>

<p>In practice, we reject the null when $T_j$ is greater than or equal to $k$, the $1-\alpha/2$ quantile of a $t_{n-p-1}$ distribution, where $\alpha \in (0,1)$. Why this quantile specifically? Because we know the distribution of $Z_j$ under the null <em>exactly</em>, the test $\phi = \textbf{1}(T_j \geq k)$ is a <em>level-$\alpha$ test</em>, i.e. $\mathbb E^0[\phi] = \alpha$, where the expectation is taken with respect to the distribution of the data when $\beta_j=0$. In general, a level-$\alpha$ test $\phi$ satisfies $\mathbb E^P[\phi] \leq \alpha$ for all distributions $P$ covered by the null hypothesis.</p>
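<p>For concreteness, here is a minimal sketch of this decision rule on simulated data (the design matrix, coefficients, and seed are all my own illustrative choices):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p covariates
beta = np.array([1.0, 0.5, 0.0, -0.3])                      # beta_2 = 0, so H_0 holds for j = 2
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p - 1))            # residual standard error

j = 2
Z_j = beta_hat[j] / (sigma_hat * np.sqrt(XtX_inv[j, j]))
alpha = 0.05
k = stats.t.ppf(1 - alpha / 2, df=n - p - 1)                # the 1 - alpha/2 quantile
print(f"|Z_j| = {abs(Z_j):.3f}, cutoff k = {k:.3f}, reject: {abs(Z_j) >= k}")
```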

<p>For many models, we may not know the distribution of the test statistic when $H_0$ is true. In that case, how can we design a level-$\alpha$ test (without relying on an asymptotic approximation)?</p>

<h1 id="e-variables">E-Variables</h1>
<p>More generally, say that the data are $Y=(Y_1, \dots, Y_n)$, and I have access to some nonnegative test statistic $E=E(Y)$. The notation here reflects that $E(Y)$ is a function of $Y$, and all probability statements below are with respect to the distribution of $Y$. Say that I have a <em>simple</em> null hypothesis $H_0$, meaning that a single distribution $P$ for $Y$, with pdf/pmf $p$, corresponds to $H_0$.</p>

<p>Now, unlike the regression context, say that I do <em>not</em> know the distribution of $E$ if $Y \sim P$. However, I <em>do</em> know the following:</p>

\[\mathbb E^P[E(Y)] = \int E(y) p(y) \textrm{d} y \leq 1.\]

<p>If the above holds, then we call $E(Y)$ an <em>e-variable</em>, and $E(y)$, the value that $E(Y)$ takes after observing the data, is an <em>e-value</em>. Now, consider the test $\phi = \textbf{1}(E(Y) \geq 1/\alpha)$. Like our earlier example, this test rejects $H_0$ when a test statistic, $E(Y)$, is large; but the cut-off is not some quantile. Instead, it is a universal cut-off that does not depend on $P$.</p>

<p>But is $\phi$ a level-$\alpha$ test? The answer is yes, and this is because of Markov’s inequality. Since $\mathbb E^P[E(Y)] \leq 1$,</p>

\[\mathbb E^P[\phi] = P(E(Y) \geq 1/\alpha) \leq \frac{\mathbb E^P[E(Y)]}{1/\alpha} \leq \alpha \cdot 1 = \alpha.\]

<p>So, a level-$\alpha$ test can always be performed if we have access to an e-variable. The only knowledge we need of $E(Y)$ when $Y \sim P$ is that its expected value is no more than $1$. If $H_0$ were <em>composite</em> (i.e. multiple distributions for $Y$ would correspond to that state of nature), then we need $\mathbb E^P[E(Y)] \leq 1$ for all distributions $P$ in $H_0$.</p>
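<p>To see the guarantee in action, here is a small simulation under an assumed $N(0,1)$ null. The statistic $E(Y) = \exp(\lambda Y - \lambda^2/2)$ is my own choice; its null expectation is exactly $1$, and it anticipates the likelihood ratios of a later section:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, lam, reps = 0.05, 1.0, 200_000

# Under H_0: Y ~ N(0,1). Then E(Y) = exp(lam * Y - lam**2 / 2) satisfies
# E^P[E(Y)] = 1, so it is an e-variable; reject when E(Y) >= 1/alpha.
Y = rng.normal(size=reps)
E = np.exp(lam * Y - lam**2 / 2)
print(f"mean e-value: {E.mean():.3f} (should be near 1)")
print(f"empirical type-I error: {np.mean(E >= 1 / alpha):.5f} (guaranteed <= {alpha})")
```

<p>Note how conservative the rule can be: Markov’s inequality only upper-bounds the type-I error, and here the empirical rejection rate lands far below $\alpha$.</p>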

<h1 id="relationship-to-p-values">Relationship to P-Values</h1>

<p>Markov’s inequality gives us a way to create a level-$\alpha$ test. There may be something about the decision rule that looks familiar. For simplicity, assume that $E(Y)&gt;0$ almost surely for any $P$ in the null hypothesis. If we reject when $E(Y) \geq 1/\alpha$, this is equivalent to rejecting when $1/E(Y) \leq \alpha$. This is the <em>same decision rule</em> as for a p-value: we reject the null hypothesis when a p-value is less than or equal to $\alpha$. We can go one step further: this shows that $1/E(Y)$ is a p-variable, i.e. the random variable whose realization is a p-value (here, $1/E(y)$ is the p-value). Formally,</p>

<p>\(P(1/E(Y) \leq \alpha) = P(E(Y) \geq 1/\alpha) \leq \alpha.\)</p>
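<p>In code, the conversion is a one-liner (a sketch; capping at $1$ just keeps the output in $[0,1]$ and preserves validity):</p>

```python
def e_to_p(e: float) -> float:
    """Convert an e-value into a (conservative) p-value via p = min(1, 1/e)."""
    return min(1.0, 1.0 / e)

print(e_to_p(40.0))  # 0.025: rejecting when E(Y) >= 1/alpha matches p <= alpha
```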

<h1 id="likelihood-ratios">Likelihood Ratios</h1>

<p>A key example of an e-variable is a likelihood ratio. For now, let’s assume that $Y \in \mathbb R^n$ and $p(y) &gt;0$ for any $y \in \mathbb R^n$. Returning to the simple hypothesis example from above, let $Q$ be any distribution with density $q$ (which also has full support on $\mathbb R^n$). A likelihood ratio is simply</p>

\[E(Y) = \frac{q(Y)}{p(Y)}.\]

<p>Then $E(Y)$ is an e-variable, since</p>

\[\mathbb E^P[E(Y)] = \int \frac{q(y)}{p(y)} p(y) \textrm{d} y = \int q(y) \textrm{d} y = 1,\]

<p>and so $\phi = \textbf{1}(E(Y) \geq 1/\alpha)$ is a level-$\alpha$ test. The choice of $Q$ depends on the alternative hypothesis $H_1$ (and is related to the <em>e-power</em> of $E(Y)$), but note that the e-variable property holds for any $Q$ with full support.</p>
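<p>Here is a minimal sketch, assuming a $N(0,1)$ null $P$ and taking $Q = N(1,1)$ (both choices are mine, for illustration); working on the log scale avoids numerical underflow in the joint densities:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(loc=0.0, scale=1.0, size=50)  # data generated under the null P = N(0,1)

# Joint log-densities of the iid sample under Q = N(1,1) and P = N(0,1).
log_q = stats.norm.logpdf(y, loc=1.0).sum()
log_p = stats.norm.logpdf(y, loc=0.0).sum()
e_value = np.exp(log_q - log_p)              # likelihood ratio q(y)/p(y)

alpha = 0.05
print(f"e-value = {e_value:.4g}, reject H_0: {e_value >= 1 / alpha}")
```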

<p>What do we do if $H_0$ includes more than one distribution? Suppose now that we model $Y$ with a parametric family, say $\{P_\theta: \theta \in \Theta_0\}$ for some parameter space $\Theta_0$, and let $\hat \theta_0$ be the maximum likelihood estimator for $\theta$ over $\Theta_0$. That $\hat \theta_0$ maximizes the likelihood <em>over the null</em> is paramount for validity. Next, let</p>

\[E(Y) = \frac{q(Y)}{p_{\hat \theta_0}(Y)}.\]

<p>The denominator is now the maximized likelihood over $\Theta_0$. It turns out that $E(Y)$ is also an e-variable. This is because, for any $\theta \in \Theta_0$, $p_\theta(y) \leq p_{\hat \theta_0}(y)$. In turn, we have that</p>

\[E(Y) = \frac{q(Y)}{p_{\hat \theta_0}(Y)} \leq \frac{q(Y)}{p_{\theta}(Y)},\]

<p>implying,</p>

\[\mathbb E^{P_\theta}[E(Y)] \leq \int \frac{q(y)}{p_{\theta}(y)} p_\theta(y) \textrm{d}y = \int q(y) \textrm{d} y = 1.\]

<p>Hence, if we can maximize the likelihood over the null, then we can specify an e-variable via a likelihood ratio.</p>
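<p>A sketch of this recipe under assumptions of my own choosing: the null is $\{N(\theta,1) : \theta \leq 0\}$ with iid observations, $Q = N(1,1)$, and the MLE over the null is the sample mean clipped at zero:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.normal(loc=-0.2, scale=1.0, size=50)  # data from one null distribution

# Null: {N(theta, 1) : theta <= 0}. The Gaussian likelihood is unimodal in
# theta with peak at the sample mean, so the MLE over the null clips it at 0.
theta_hat0 = min(y.mean(), 0.0)

# E(Y) = q(Y) / p_{theta_hat0}(Y), with Q = N(1,1) an arbitrary alternative.
log_e = (stats.norm.logpdf(y, loc=1.0).sum()
         - stats.norm.logpdf(y, loc=theta_hat0).sum())
e_value = np.exp(log_e)

alpha = 0.05
print(f"e-value = {e_value:.4g}, reject H_0: {e_value >= 1 / alpha}")
```

<p>This is the flavor of construction behind the Universal Inference paper linked below; there, $Q$ is itself fit on a held-out split of the data.</p>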

<h1 id="combining-e-variables">Combining E-Variables</h1>

<p>If we observe an iid sample $Y_1, \dots, Y_n$ and compute e-variables $E(Y_1), \dots, E(Y_n)$, it is simple to combine these e-variables into a single e-variable. If $\mathcal P$ is the set of distributions covered by the null hypothesis, then by definition $\mathbb E^P[E(Y_i)] \leq 1$ for any $P \in \mathcal P$. It is useful to combine a list of e-variables into a single e-variable, say $E_n(Y_1, \dots, Y_n)$, because we can construct a level-$\alpha$ test for the null by rejecting when $E_n(Y_1, \dots, Y_n) \geq 1/\alpha$. First, the average of the e-variables is an e-variable:</p>

\[\mathbb E^P \bigg [ \frac{1}{n} \sum_{i=1}^n E(Y_i) \bigg] = \frac{1}{n} \sum_{i=1}^n \mathbb E^P[E(Y_i)] \leq \frac{1}{n} \sum_{i=1}^n 1 = \frac{n}{n} = 1.\]

<p>In fact, by the same derivation, any convex combination of e-variables is also an e-variable, and we don’t technically need $Y_1, \dots, Y_n$ to be independent for this to hold. If the null hypothesis also assumes that $Y_1, \dots, Y_n$ are independent, then the product is an e-variable:</p>

\[\mathbb E^P \bigg [ \prod_{i=1}^n E(Y_i) \bigg]  = \prod_{i=1}^n \mathbb E^P[E(Y_i)] \leq \prod_{i=1}^n 1 = 1.\]

<p>More generally, $\prod_{i=1}^n ( 1 - \lambda_i + \lambda_i E(Y_i) )$ is an e-variable, where $\lambda_i \in (0,1)$ are constants: each factor is itself a nonnegative e-variable, since its null expectation is at most $1 - \lambda_i + \lambda_i = 1$. In summary, we can always combine e-variables using convex combinations and, if we also assume independence, weighted products.</p>
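<p>Putting the three combination rules side by side (a sketch that reuses the $N(1,1)$-versus-$N(0,1)$ likelihood ratio from above as the per-observation e-variable):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(size=20)  # iid sample under H_0: N(0,1)

# Per-observation e-variables: likelihood ratios of N(1,1) against N(0,1).
E_i = np.exp(stats.norm.logpdf(y, loc=1.0) - stats.norm.logpdf(y, loc=0.0))

avg_e = E_i.mean()                        # valid even without independence
prod_e = np.exp(np.log(E_i).sum())        # requires independence under the null

lam = 0.5                                 # one fixed choice of lambda_i in (0,1)
bet_e = np.prod(1 - lam + lam * E_i)      # also requires independence

print(f"average: {avg_e:.4g}, product: {prod_e:.4g}, weighted product: {bet_e:.4g}")
```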

<h1 id="further-reading">Further Reading</h1>
<p>There is a wide (and rapidly expanding) body of work on e-values, and this blog post only covers the basics. Many of the topics I touched on have a broader mathematical framework, and there are several fascinating properties of e-values I have not discussed above, such as their usage in post-hoc inference and sequential testing. I would recommend starting with <a href="https://arxiv.org/abs/2410.23614">Hypothesis Testing with E-Values</a> and following the references, as well as <a href="https://www.pnas.org/doi/abs/10.1073/pnas.1922664117">Universal Inference</a>.</p>