Structural operational semantics for probabilistic programming languages typically characterize each elementary step in an execution terms of

- The resulting value and/or changes to the environment.
- Effects on the probability of the execution.

In many probabilistic programming languages, the elementary steps in an execution can be separated into 3 categories

- Deterministic steps, which have no effect on the probability of the execution.
- Stochastic steps, which may have an effect the probability of the execution, but can typically also be executed naively to sample from the prior.
- Conditioning steps, which have no effect on the program state other than altering its probability under the target density.

In a probabilistic program system in which the back end implements Monte Carlo (i.e. sampling-based) inference methods, steps of type 1 execute in the same manner regardless of the chosen inference method, whereas steps of the second and third type require a method-specific implementation. In this setting it makes sense to separate “inference” semantics, that is to say the operational semantics of the method-specific steps, from the “language” semantics, which is to say the operational semantics associated with the deterministic steps in the computation.

In this abstract describe a family of “black box” methods for parameter learning, which are related to both black box variational inference [1,2] and classic policy gradient methods such as REINFORCE [3] under an appropriate planning as inference implementation [4]. We abstract away from the details of the language by defining an interface between a deterministic computation *P* which performs all steps of type 1 and relegates to an inference back end *B *when encountering steps of type 2 and 3. This makes it possible to reason about the inference semantics for these algorithms, which are based on importance sampling semantics, from the operational semantics of the modeling language.

We refer to a longer paper [5] on Arxiv for a more detailed study of the use of this class of methods in the context of upper-level policy search problems, in which the probabilistic program defines a stochastic simulator for both the agent and the problem domain.

[1] Wingate, David, and Theo Weber. 2013. “Automated Variational Inference in Probabilistic Programming.” arXiv Preprint arXiv:1301.1299: 1–7. http://arxiv.org/abs/1301.1299.

[2] Ranganath, Rajesh, Sean Gerrish, and David M Blei. 2014. “Black Box Variational Inference.” In *Artificial Intelligence and Statistics*.

[3] Williams, Ronald J. 1992. “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.” Machine Learning 8 (3-4): 229–256. doi:10.1007/BF00992696.

[4] Toussaint, Marc, Stefan Harmeling, and Amos Storkey. 2006. “Probabilistic Inference for Solving (PO)MDPs.” Neural Computation 31 (December): 357–373. http://eprints.pascal-network.org/archive/00003924/.

[5] van de Meent, Jan-Willem, David Tolpin, Brooks Paige, and Frank Wood. 2015. “Black-Box Policy Search with Probabilistic Programs.” arXiv: 1507.04635. http://arxiv.org/abs/1507.04635.