We define a core calculus for investigating reasoning principles of probabilistic programming languages. Using a variation of higher-order abstract syntax (HOAS), a technique common in the implementation of domain-specific languages, the calculus captures the semantics of a stochastic language with observation while remaining agnostic to the details of its deterministic portions. Because the semantics does not depend on the non-stochastic portions of the language, it supports the discovery of general reasoning principles for the principled manipulation of probabilistic program fragments by programmers, compilers, and analysis tools. This generality lets us reason about probabilistic program fragments without resorting to the underlying measure theory in every instance: instead, we reason in terms of the core calculus, in a way that we believe applies to a variety of surface languages.

Theo Giannakopoulos (BAE Systems) and Mitchell Wand & Andrew Cobb (Northeastern University)

**Q&A**:

- Q: Why develop two core calculi?

A: Using a calculus that does not support recursion makes the initial discovery process easier, since it avoids the technical overhead of dealing with recursion. As long as we can make progress within the non-recursive calculus, we prefer to do so, especially since our results do not appear to rely on the absence of recursion. However, most PPLs support recursion and looping constructs, so we will eventually need to work in a calculus that includes them.

- Q: Why use HOAS trees to express the uniform dist-equivalence property?

A: The property can be expressed independently of the surface language, e.g. by working directly in Panangaden’s category of Markov kernels. However, having a surface language is convenient for expressing properties and provides a framework for later introducing additional constructs.

- Q: Why can’t […] be expressed in terms of […], while […] and […] can?

A: […] is a way to express a “measure literal” in the language. In particular, […] is the embedding of the zero measure and […] is the embedding of the indicator function for 42. On the other hand, […] is an operator which combines two measures, and so is not a measure itself.
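The distinction can be sketched concretely. The following Python fragment models finite measures naively as dicts from outcomes to weights; the names `zero`, `dirac`, and `plus` are hypothetical stand-ins for the constructs whose symbols were lost above, not the calculus’s actual syntax. The point is that the first two are measures (values), while the third is a function *on* measures.

```python
def zero():
    """The zero measure: assigns weight 0 to everything."""
    return {}

def dirac(x):
    """The embedding of the indicator function for x: all mass on x."""
    return {x: 1.0}

def plus(m1, m2):
    """An operator combining two measures pointwise. It maps a *pair* of
    measures to a measure, so it is not itself a measure literal."""
    out = dict(m1)
    for k, w in m2.items():
        out[k] = out.get(k, 0.0) + w
    return out

# zero() and dirac(42) are measures; plus is not -- it only yields a
# measure once applied to two measure arguments:
m = plus(dirac(42), zero())   # equals dirac(42); zero is a unit for plus
```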

This is very interesting to me – I hadn’t seen your example from Fig 1 before. The conclusion has to be that “equivalent up to a normalization factor” does not mean “equivalent”. And indeed that seems to be a reasonable position to take. Seen from a different perspective, proper distributions (which sum to unity) represent terms in a theory involving just a weighted choice operator, whereas subdistributions (which may sum to anything in the unit interval) represent terms in an extended theory that also involves conditioning; one would not expect a term in the extended theory that makes use of conditioning to be equivalent to a term in the simpler theory that cannot.
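The failure of “equivalent up to a normalization factor” to be a genuine equivalence can be seen in a small example (a Python sketch of my own, modeling subdistributions as dicts from outcomes to weights, not the example from Fig 1): two subdistributions that normalize to the same thing stop agreeing once they are placed in a larger context, because their total masses weight the context differently.

```python
def normalize(d):
    """Rescale a subdistribution to a proper distribution (total mass 1)."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def mix(d1, d2, p=0.5):
    """Weighted choice: p * d1 + (1 - p) * d2, pointwise on outcomes."""
    out = {}
    for k, v in d1.items():
        out[k] = out.get(k, 0.0) + p * v
    for k, v in d2.items():
        out[k] = out.get(k, 0.0) + (1 - p) * v
    return out

# Two subdistributions equivalent up to a normalization factor:
d1 = {"H": 0.5}    # total mass 0.5
d2 = {"H": 0.25}   # total mass 0.25; normalize(d1) == normalize(d2)

# Placed in the same context (an even mixture with a point mass on "T"),
# they differ even after normalizing the results:
ctx = {"T": 1.0}
normalize(mix(d1, ctx))   # ≈ {"H": 1/3, "T": 2/3}
normalize(mix(d2, ctx))   # ≈ {"H": 0.2, "T": 0.8}
```

This is exactly the congruence failure: normalization-equivalence is not preserved by the choice operator, so it cannot serve as program equivalence.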

Whereas proper distributions form a monad, the “Giry monad” (actually due to Lawvere), I believe that subdistributions form a *graded monad*. A graded monad over a monoid (M,1,*) is built not on a single functor T, but on a family of functors T_m indexed by m in M. The unit (“return” in Haskell) has the unit index, and multiplication (“join” in Haskell) multiplies indices:

```
eta_A      : A -> T_1 A
mu_A^(m,n) : T_m (T_n A) -> T_(m*n) A
```

The laws of a graded monad are the same as those for an ordinary monad; it’s just the types that are more refined. There’s a paper on using graded monads for effect calculi by Dominic Orchard and Nobuko Yoshida at POPL (talk at 4.30pm on Thursday). I’ve been using them for some recent work on database queries. For subdistributions, I believe that we want the monoid of probabilities with multiplication, where the index captures the total weight of the subdistribution.
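The weight-grading claim can be checked on a toy model (my own Python sketch, not code from the paper or from Orchard and Yoshida’s work): take T_m A to be subdistributions of total mass m, with `eta` a point mass of grade 1 and `mu` the usual flattening. A nested subdistribution in T_m (T_n A), modeled here as a list of (inner subdistribution, outer weight) pairs where the outer weights sum to m and each inner has mass n, flattens to mass m * n, matching the typing of mu above.

```python
def mass(d):
    """The grade of a subdistribution: its total weight."""
    return sum(d.values())

def eta(x):
    """Unit: a point mass, with grade 1."""
    return {x: 1.0}

def mu(layers):
    """Join: flatten an element of T_m (T_n A), represented as a list of
    (inner_subdistribution, outer_weight) pairs. The grading assumes all
    inner subdistributions share the same mass n."""
    out = {}
    for inner, w in layers:
        for k, v in inner.items():
            out[k] = out.get(k, 0.0) + w * v
    return out

# m = 0.5 (outer weights 0.2 + 0.3), n = 0.4 (each inner has mass 0.4):
inner1 = {"a": 0.1, "b": 0.3}            # mass 0.4
inner2 = {"a": 0.4}                      # mass 0.4
layers = [(inner1, 0.2), (inner2, 0.3)]  # outer mass 0.5
flat = mu(layers)                        # mass(flat) == 0.5 * 0.4 == 0.2
```

Note the caveat built into the grading: the index n is only well-defined because T_m (T_n A) requires every inner subdistribution to have the same mass; the type discipline, not the code, enforces that.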