A Lambda-Calculus Foundation for Universal Probabilistic Programming

Authors: Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, Marcin Szymczak

We develop the operational semantics of a probabilistic lambda- calculus with continuous distributions, as a foundation for universal probabilistic programming languages such as Church, Anglican, and Venture. Our first contribution is to adapt the classic operational semantics of lambda-calculus to the continuous case, via creating a measure space on terms and defining step-indexed approximations. We prove equivalence of big-step and small-step formulations of this distributional semantics. Our second contribution is to formalize the implementation technique of trace MCMC for our calculus and to show correctness. A key step is defining a sampling semantics of a term as a function from a trace of random samples to a value, and showing that the distribution induced by integrating over all traces equals the distributional semantics. Another step is defining sufficient conditions for the distribution induced by trace MCMC to converge to the distributional semantics. To the best of our knowledge, this is the first rigorous correctness proof for trace MCMC for a higher-order functional language.

Extended abstract

Draft paper

PPS slides

This entry was posted in Uncategorized. Bookmark the permalink.

7 Responses to A Lambda-Calculus Foundation for Universal Probabilistic Programming

  1. Jeremy Gibbons says:

    Can you upload the abstract you submitted for review, so that we can read it en route to Florida?

  2. Marcin Szymczak says:

    I have uploaded our extended abstract and added a link in the post above.

  3. Marcin Szymczak says:

    I have added a link to an up-to-date draft of our paper

  4. I hope I can ask a quick question without having read your paper: Do you assume that each of your primitive distributions has bounded density?

    As we discussed during discussion, I’m also interested in details of the difficulties you encountered when trying to use Tierney 1998’s general formulation of the acceptance “ratio”.

    • Hi Ken,
      The distributional semantics doers not assume existence of a density.
      The trace semantics does, but only requires that it integrates to <1 (and yields a kernel, which may imply additional requirements).

    • We use Tierney94 for its strong convergence results. Perhaps they can be ported to the more general setting of Tierney98, but we have not investigated this.

  5. I am also curious for a comparison between the ways functions are treated here and treated at http://pps2016.soic.indiana.edu/2015/12/11/semantics-of-higher-order-probabilistic-programs/#comment-22

    Maybe one way to make the comparison is to answer Mitch’s challenge at http://pps2016.soic.indiana.edu/2016/01/07/semantics-challenge/

Leave a Reply