Challenge #1: An Identity for (Posterior) Marginal pdfs
Hello everyone! I’ve been planning to start a blog for a long time, but of course could never find the time. So I’ve opted to kick off with a small challenge.
A while back, when Pat Laub and I were finishing our PhDs, we wrote a paper along with Zdravko Botev (now published in Mathematics and Computers in Simulation) on the Monte Carlo estimation of the density of the sum of random variables. We came up with new estimators and control variates for the problem that worked pretty well. While the topic was fairly niche, the underlying idea was small but neat: the pdf of a sum is the derivative of its cdf, so we tackled the problem using derivative estimation (and a special change of variables to make it amenable to such techniques).
Somewhere along the line, I figured out you could use the same approach to estimate the marginal pdfs of an unnormalised probability density pointwise, and derived this interesting identity for the pointwise value of a marginal pdf:
\[p_{x_i}(s) = s^{-1}\mathbb{E}\left[X_i \cdot \left[\frac{\partial \log p(\mathbf{x})}{\partial x_i}\right]\Bigg|_{\mathbf{x}=\mathbf{X}}\mathbb{I}\{X_i <s\}\right] + s^{-1}\mathbb{P}(X_i < s), \]
where the expectation is taken with respect to the joint distribution of $\mathbf{X}$. An important thing to note is that the left-hand side is the properly normalised marginal pdf evaluated at $s$, whereas the right-hand side does not require the joint pdf to be normalised (due to the log-gradient).
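To make this concrete, here is a minimal Monte Carlo sketch of the identity. It is not from the paper: the correlated bivariate Gaussian target, the sample size, and all variable names are my own illustrative choices, and the joint is treated as if only its unnormalised log-density (and hence its score) were available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated bivariate Gaussian; the marginal of X_1 is exactly N(0, 1).
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])
n = 100_000
X = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)

# Score of the (possibly unnormalised) joint log-density:
# log p~(x) = -0.5 * x^T Sigma^{-1} x, so grad log p~(x) = -Sigma^{-1} x,
# and the normalising constant never enters.
prec = np.linalg.inv(cov)
score = -X @ prec  # row j holds grad log p~ evaluated at X[j]

def marginal_pdf_estimate(s, i=0):
    """Monte Carlo estimate of p_{X_i}(s) via the identity above (requires s != 0)."""
    ind = X[:, i] < s
    term1 = np.mean(X[:, i] * score[:, i] * ind) / s
    term2 = np.mean(ind) / s
    return term1 + term2

for s in [-1.0, 0.5, 1.5]:
    exact = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)
    print(f"s = {s:+.1f}: estimate = {marginal_pdf_estimate(s):.4f}, exact = {exact:.4f}")
```

Since the identity involves $s^{-1}$, it only applies for $s \neq 0$; in practice one would also want the control variates from the paper (or simply more samples) to tame the variance.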
Don’t stare at the identity too long, or you may convince yourself that there is some profound meaning to it! On that note, the identity and the control variates we derived in the paper reveal a strange connection to the Stein Operator acting on a certain function (in fact, using a different change of variables that isn’t mentioned in the paper yields yet another valid identity that is also related to the Stein Operator, coincidence?), so perhaps there is some deep connection.
Anyways, using a small example we showed that you can use a Monte Carlo estimate of the identity above to obtain better estimates of the marginal pdfs than those of a kernel density estimator, which was kind of cool but not all that useful. Since then, I’ve been plagued by the feeling that there should be some other (less trivial) use of this identity.
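For a rough comparison of the same flavour (continuing the bivariate Gaussian sketch above, which is my own illustration and not the example from the paper), a kernel density estimate of the marginal can be computed from the very same samples; `scipy.stats.gaussian_kde` with its default bandwidth is just one convenient choice:

```python
from scipy.stats import gaussian_kde

# Kernel density estimate of the first marginal from the same draws.
kde = gaussian_kde(X[:, 0])
for s in [-1.0, 0.5, 1.5]:
    exact = np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)
    print(f"s = {s:+.1f}: KDE = {kde(s)[0]:.4f}, exact = {exact:.4f}")
```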
My only useful idea was the following: Noting that $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$, where $\tilde{p}$ is the unnormalised density and $Z$ is the normalising constant, we have the identity that \[Z = \tilde{p}(\mathbf{x})/p(\mathbf{x}) = \tilde{p}(\mathbf{x})/p(x_1)p(x_2|x_1)\cdots p(x_d | x_{d-1},\ldots, x_1),\]and so one could estimate the normalising constant by estimating $d$ individual marginal pdfs pointwise. This is essentially a variant of Chib's method, where we use an estimator based on the identity above as opposed to a Conditional Monte Carlo estimator. Unfortunately, it didn't work that well. So…
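To spell out what I mean (a sketch of the scheme, not a derivation from the paper), in the bivariate case $d = 2$ this reads
\[Z = \frac{\tilde{p}(x_1^*, x_2^*)}{p_{X_1}(x_1^*)\, p_{X_2 \mid X_1}(x_2^* \mid x_1^*)},\]
where $(x_1^*, x_2^*)$ is any convenient evaluation point, $p_{X_1}(x_1^*)$ is estimated via the identity using samples from the joint, and $p_{X_2 \mid X_1}(x_2^* \mid x_1^*)$ is estimated by applying the same identity to the unnormalised conditional $\tilde{p}(x_1^*, \cdot)$ using samples from $X_2 \mid X_1 = x_1^*$.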
The Challenge: Come up with a use for the identity above (or estimator thereof)! If needed, feel free to also assume that it is possible to obtain an unbiased estimator even when using MCMC to get the samples (it is, but that is one for another time).
Edit: The initial post omitted the indicator function from the identity! It is now corrected and matches the identity in the paper.
Additional Comment: Applying the same derivation to the cdf written as one minus the tail probability $\mathbb{P}(X_i > s)$ yields the similar, yet distinct, identity
\[p_{x_i}(s) = -s^{-1}\mathbb{E}\left[X_i \cdot \left[\frac{\partial \log p(\mathbf{x})}{\partial x_i}\right]\Bigg|_{\mathbf{x}=\mathbf{X}}\mathbb{I}\{X_i >s\}\right] - s^{-1}\mathbb{P}(X_i > s). \]
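As a quick numerical sanity check (again reusing the bivariate Gaussian sketch from above, which is my own illustration), this tail version can be estimated in exactly the same way and should agree with the earlier estimator up to Monte Carlo error:

```python
def marginal_pdf_estimate_tail(s, i=0):
    """Monte Carlo estimate of p_{X_i}(s) via the tail-cdf identity (requires s != 0)."""
    ind = X[:, i] > s          # indicator of the upper tail
    term1 = -np.mean(X[:, i] * score[:, i] * ind) / s
    term2 = -np.mean(ind) / s
    return term1 + term2

# Both identities target the same marginal pdf, so the two estimates should agree
# (up to Monte Carlo error) with each other and with the exact N(0, 1) density.
print(marginal_pdf_estimate_tail(1.5), marginal_pdf_estimate(1.5))
```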