Deep Probabilistic Models: Tutorial 1

Tutorial 1: Flow-Based Models and GANs

Welcome to the first tutorial for this week! This tutorial is a brief computational exploration of flow-based models and GANs within PyTorch.

The goal of this tutorial is simple: for you to play around with flows and GANs on some simple examples (the full Iris data and a swiss roll dataset).

This notebook has also been written in a manner that will serve as a nice reference implementation for you in the future.

N.B. Please be sure to run each code cell as you progress through the notebook.

Flow-Based Models

Guided Learning

The following code cell will import the famous "Iris" data.

Flows typically fit more quickly if the data is first standardized (subtract the sample mean and divide by the sample standard deviation for each column), so we do that below using a StandardScaler from sklearn.
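A rough sketch of what that cell might look like (the variable names here, such as X and scaler, are placeholders rather than the notebook's exact ones):

```python
import torch
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the full four-dimensional Iris data (150 samples x 4 features).
iris = load_iris()
X_raw = iris.data

# Standardize each column: subtract the sample mean and divide by the
# sample standard deviation.
scaler = StandardScaler()
X = torch.tensor(scaler.fit_transform(X_raw), dtype=torch.float32)
```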

Run the code below to visualize the data with four different bivariate plots (visualizing two dimensions at a time). You will notice that the data has an interesting shape and thus will require a flexible model!
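A minimal plotting sketch along these lines (using matplotlib; the particular feature pairs chosen here are an assumption, not necessarily the ones in the notebook's cell):

```python
import matplotlib.pyplot as plt

# Four bivariate views of the standardized 4D data.
pairs = [(0, 1), (0, 2), (1, 3), (2, 3)]
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (i, j) in zip(axes, pairs):
    ax.scatter(X[:, i], X[:, j], s=10)
    ax.set_xlabel(iris.feature_names[i])
    ax.set_ylabel(iris.feature_names[j])
plt.tight_layout()
plt.show()
```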

Next, we will fit a flow-based model to the four-dimensional data (note that the figure in the lectures was only fit to the 2D data of petal length vs. petal width, so we are doing something a little different from that).

The code below creates a 4D base distribution, as well as a single autoregressive transform with each of the four splines having 8 bins.
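One way to set this up, assuming Pyro's transforms module is being used (the names distZ, T and Xdist match those referenced later in the notebook, but the exact construction here is a sketch):

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as transforms

# 4D standard normal base distribution, Z ~ N(0, I).
distZ = dist.Normal(torch.zeros(4), torch.ones(4)).to_event(1)

# A single autoregressive transform built from rational splines,
# with 8 bins per dimension.
T = transforms.spline_autoregressive(input_dim=4, count_bins=8)

# The learned distribution over X is the pushforward of distZ through T.
Xdist = dist.TransformedDistribution(distZ, [T])
```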

The code below computes the total number of parameters used...
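Since the transform is an nn.Module-style object (at least in the Pyro setup sketched above), the count can be tallied like this:

```python
# Total number of trainable parameters in the transform.
n_params = sum(p.numel() for p in T.parameters())
print(f"Number of parameters: {n_params}")
```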

That is a lot of parameters for fitting a 4D density (by contrast, you can fit a mixture of two Gaussians with only 16!), but such is the nature of using neural networks.

The train_flow_model function below takes five inputs:

The code below will train the flow we set up for Xdist previously. It may take a minute or so depending on the number of steps and subsample batch size.
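The exact signature of train_flow_model is not reproduced here, but in spirit it runs a loop like the following (a sketch assuming the Xdist, T and X objects from the earlier cells; all hyperparameter values are placeholders):

```python
# Minimal maximum-likelihood training loop for the flow.
optimizer = torch.optim.Adam(T.parameters(), lr=1e-3)
losses = []
for step in range(2000):
    idx = torch.randint(0, X.shape[0], (64,))   # random mini-batch (subsample)
    loss = -Xdist.log_prob(X[idx]).mean()       # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    Xdist.clear_cache()   # clear cached transform values between steps
    losses.append(loss.item())
```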

Plotting the (estimated) loss over time suggests that training has converged.

Pro Tip: If training is not performing well, or is unstable, try reducing the learning rate and/or increasing the number of subsamples used at each iteration. The former is typically required for deeper flows (i.e., those with more transforms).

Transformation Objects

Prior to looking at our fit, we will talk about how to use the transformation object we created that defines our distributional family.

To create new data (or assess how well the original data gets transformed back to a normal), we wish to send samples through the learned $T$ and its inverse.

Recall that we created a transformation object T above.

Transformation objects have both a forward and inverse method defined on them. The following methods are worth noting:

  1. T(Z): returns $T(Z)$
  2. T.inv(X): returns $T^{-1}(X)$

The above is demonstrated below by generating a sample $Z \sim {\cal N}(0,{\rm I})$, computing $X = T(Z)$, and then computing $Z = T^{-1}(X)$. Note that putting the sample through the transform and then the resulting sample through the inverse transform correctly yields the original sample back.
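A sketch of that round trip, assuming the T and distZ objects defined earlier:

```python
# Draw a single sample from the base distribution.
z = distZ.sample()          # shape: (4,)

# Push it through the learned transform, then invert.
x = T(z)                    # X = T(Z)
z_back = T.inv(x)           # Z = T^{-1}(X)

# Should print True (up to numerical error).
print(torch.allclose(z, z_back, atol=1e-5))
```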

The above also works with an arbitrary number of samples. The .sample method of the base distribution just needs to be passed the requested number of samples inside [ ]. Calling distZ.sample() is the same as distZ.sample([1]). Below, we generate 3 samples.
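For example (shapes assume the 4D base distribution sketched above):

```python
Z3 = distZ.sample([3])   # three base samples, shape (3, 4)
X3 = T(Z3)               # transformed samples, shape (3, 4)
```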

Below, we sample 500 times from the learned distribution, and plot the observations in bivariate plots with the original data.
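A sketch of that sampling and overlay, assuming the Xdist object and the standardized data X from above:

```python
with torch.no_grad():
    X_gen = Xdist.sample([500])   # 500 draws from the learned distribution

pairs = [(0, 1), (0, 2), (1, 3), (2, 3)]
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (i, j) in zip(axes, pairs):
    ax.scatter(X[:, i], X[:, j], s=10, label="data")
    ax.scatter(X_gen[:, i], X_gen[:, j], s=10, alpha=0.3, label="flow samples")
    ax.legend()
plt.tight_layout()
plt.show()
```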

Note that, as is the case with minimizing $KL(p||q)$, the learned distribution is very conservative where it places mass.

If the model has fit the training data well, we should expect all the bivariate plots of the inverse transformed data to look like samples from a $N(\mathbf{0}, I)$ distribution. Below we plot the inverse transformed dataset, along with samples from a $N(\mathbf{0}, I)$ for comparison.
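A sketch of that check (again assuming T, distZ and the standardized data X; only the first two dimensions are plotted here for brevity):

```python
with torch.no_grad():
    Z_data = T.inv(X)                    # inverse-transformed training data
    Z_ref = distZ.sample([X.shape[0]])   # reference N(0, I) samples

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].scatter(Z_data[:, 0], Z_data[:, 1], s=10)
axes[0].set_title("inverse-transformed data")
axes[1].scatter(Z_ref[:, 0], Z_ref[:, 1], s=10)
axes[1].set_title("N(0, I) samples")
plt.tight_layout()
plt.show()
```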

Multiple Transformations

For the next example, we will use multiple iterations of Real NVP with reverse permutation operations to fit data of a very challenging shape.

In this section, you will explore using flows of simpler transformations. The code below only does one Real NVP transformation, so it will leave half of the variables unchanged.
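In Pyro, for instance, a single Real NVP step is an affine coupling layer; a minimal sketch for 2D data (the hidden-layer sizes are placeholders) might look like:

```python
# One Real NVP (affine coupling) transform on 2D data: the first coordinate
# is left unchanged and parameterizes the affine transform of the second.
base2d = dist.Normal(torch.zeros(2), torch.ones(2)).to_event(1)
nvp = transforms.affine_coupling(input_dim=2, hidden_dims=[64, 64])
flow2d = dist.TransformedDistribution(base2d, [nvp])
```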

Note that training using Real NVP without splines is very fast!

Run the cell below to plot the result. Recall that the variable on the $x$-axis is marginally ${\cal N}(0,1)$ as we only used one layer of Real NVP!

Flow-Based Models: Exercise (Practical)

Modify the Real NVP flow code above and train it so you obtain a good fit to the points.

You may wish to change:

Flow-Based Models: Exercise (Analytical)

Johnson's SU-distribution is a four-parameter one-dimensional distribution arising from the transformation of a standard normal:

$$X = \mu + \sigma \sinh\left(\frac{Z - \gamma}{\delta} \right)$$

where $\mu \in \mathbb{R}$, $\gamma \in \mathbb{R}$, $\sigma > 0$, $\delta >0$, and $Z \sim {\cal N}(0,1)$.

Using that $\sinh^{-1}(x) = \log\left(x + \sqrt{x^2+1}\right)$, derive the probability density function of $X$ using the change of variables theorem.

Generative Adversarial Networks

In this part of the tutorial, we will try to fit data from the swiss roll dataset using a GAN. As this dataset seems to follow the manifold hypothesis very well (the data is a small perturbation from a 2D manifold in this case), we will try to fit it using a GAN whose latent (noise) dimension is only two (2). We plot the same dataset viewed from two different angles.
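A sketch of the setup: the swiss roll can be generated with sklearn's make_swiss_roll, and the generator/discriminator architectures below are placeholders rather than the tutorial's exact networks.

```python
import torch
import torch.nn as nn
from sklearn.datasets import make_swiss_roll

# 3D swiss roll data: points lie close to a 2D manifold embedded in 3D.
X_roll, _ = make_swiss_roll(n_samples=2000, noise=0.5)
X_roll = torch.tensor(X_roll, dtype=torch.float32)

# Generator: maps 2D latent noise to 3D space.
generator = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)

# Discriminator: scores 3D points as real vs. generated.
discriminator = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Fake samples are obtained by pushing 2D noise through the generator.
z = torch.randn(5, 2)
x_fake = generator(z)   # shape (5, 3)
```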