Teaching - Sampling: From MCMC to Generative Modeling

Please fill in the following questionnaire:

Course

Dates: 26th, 27th (morning); 2nd, 16th, 23rd, 30th (afternoon). There will be at least two practical sessions (on the 2nd, devoted to Langevin Monte Carlo, and on the 30th); bring your computers.

The slides are still under construction and may change by the time I teach the course; Part III in particular is a work in progress. Part I is ~30 slides, Part II ~80 slides, Part III ~30 slides. Part II is long because we will spend time on important notions such as gradient flows, the score, and continuity equations, which are crucial for normalizing flows and diffusion models, for instance.

  • Part I Motivation, Divergences between probability distributions, Reminders on basics from MCMC
  • Part II (Sampling as Optimization of the KL): gradient flows in finite and infinite dimensions, the continuity equation, Langevin Monte Carlo
    + stochastic Langevin dynamics, based on these notes (a minimal Langevin Monte Carlo sketch is given right after this list).
  • Part III (Generative Modeling): Normalizing Flows (NF), Diffusion Models (DM); if time allows, briefly GANs and VAEs.
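
To give a concrete preview of Part II, here is a minimal sketch of the unadjusted Langevin algorithm (ULA) in Python. The Gaussian target, step size, and iteration count below are illustrative choices of mine, not the settings used in the course notebooks.

import numpy as np

def ula(grad_log_pi, x0, step=1e-2, n_steps=5000, rng=None):
    # ULA iterates x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2 * step) * xi_k,
    # with xi_k ~ N(0, I). There is no accept/reject step, hence a small
    # discretization bias that shrinks with the step size.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Example: standard 2D Gaussian target, for which grad log pi(x) = -x.
samples = ula(lambda x: -x, x0=np.zeros(2))

Adding a Metropolis accept/reject step on top of this loop gives MALA, which removes the discretization bias at the cost of occasional rejections.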

Additional Resources:

(Practical) Resources.

We won't have time to go through everything, of course, but here are several practical Python notebooks that we may work through together or that you can explore on your own; they can even serve as a basis for a project if combined with novel ideas or datasets.
  • Langevin Monte Carlo / Metropolis–Hastings / Hamiltonian Monte Carlo
    • Stochastic Langevin with a linear model on simulated data: link (the solution will be posted after the practical session: here)
    • Bayesian Logistic Regression on simulated data, link
    • Metropolis–Hastings, link (see also the sketch at the end of this resources list)
    • MALA vs HMC, link
  • Normalizing Flows
    • Normalizing Flow in 2D on a Mixture of Gaussians, link - check the README to run it on Google Colab.
    • To go further on Normalizing Flows (on images), link
  • Deep Generative Models
    • GANs and VAEs, Deep Learning Indaba 2019, link
    • Denoising Diffusion, Deep Learning Indaba 2022, link
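
As a taste of what the Metropolis–Hastings notebook covers, here is a minimal random-walk Metropolis–Hastings sketch; the bimodal 1D target and proposal scale below are illustrative assumptions, not the notebook's actual setup.

import numpy as np

def metropolis_hastings(log_pi, x0, prop_std=0.5, n_steps=10000, rng=None):
    # Random-walk Metropolis-Hastings with a symmetric Gaussian proposal,
    # so the acceptance probability reduces to min(1, pi(y) / pi(x)).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    log_p = log_pi(x)
    chain = np.empty((n_steps,) + x.shape)
    n_accept = 0
    for k in range(n_steps):
        y = x + prop_std * rng.standard_normal(x.shape)
        log_p_y = log_pi(y)
        if np.log(rng.random()) < log_p_y - log_p:  # accept/reject in log space
            x, log_p = y, log_p_y
            n_accept += 1
        chain[k] = x
    return chain, n_accept / n_steps

# Example: bimodal target pi(x) proportional to exp(-(x^2 - 4)^2 / 4).
chain, acc_rate = metropolis_hastings(lambda x: -(x**2 - 4.0)**2 / 4.0,
                                      x0=np.array(0.0))

A common rule of thumb is to tune prop_std so that the acceptance rate sits roughly between 20% and 40% for random-walk proposals.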

Project.

The course project will give the students a chance to explore MCMC and/or generative modeling in greater detail. Course projects will be done in groups of up to 3 students and can fall into one or more of the following categories:
  • Application of MCMC or deep generative models on a novel task/dataset.
  • Algorithmic improvements to the evaluation, learning, and/or inference of deep generative models.
  • Theoretical analysis of any aspect of existing deep generative models.
The report must include theoretical, methodological, and experimental considerations; you will be evaluated on these three aspects. Please indicate the contribution of each team member in the report. We do not expect you to reproduce the high-dimensional experiments presented in some of these papers, as we are aware of compute limitations.

The deadline for submission is the 26th of April. You will be asked to present your results during an oral defence (e.g. with slides and/or a notebook). Please send your
* pdf
* Colab link with your experiments/code
to anna.korba@ensae.fr.

Some examples:
  • Implement in Python Langevin Monte Carlo with birth-death dynamics, as in this paper or that paper. There is no need to fully understand the paper, but you should at least follow the pseudocode and reproduce the experiments. Can you identify settings where birth-death dynamics improve over standard LMC? (A rough sketch is given after this list.)
  • Study Bayesian logistic regression on real data (see here for an example of what can be done), on a classification dataset (see the UCI repository, or examples such as the ones in Section 5 of this paper).
  • Investigate the performance of Bayesian Neural Networks (see a PyTorch implementation here) on different datasets (e.g. Fashion-MNIST, or other classification datasets such as the ones in Section 5 of this paper).
  • Investigate the data mollification effect as in this paper to compare different generative models on a set of experiments.
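
To help you get started on the first project example, here is a rough sketch of what a particle implementation of Langevin Monte Carlo with birth-death jumps can look like. The Gaussian kernel density estimator, bandwidth, jump rule, and the double-well target below are my own illustrative assumptions; check them against the pseudocode of the paper you choose.

import numpy as np
from scipy.special import logsumexp

def birth_death_langevin(V, grad_V, n_particles=200, dim=2, step=1e-2,
                         n_iters=2000, kde_bw=0.3, rng=None):
    # ULA moves plus birth-death jumps on a cloud of interacting particles.
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n_particles, dim))
    for _ in range(n_iters):
        # 1) Langevin step on every particle (target density proportional to exp(-V)).
        X = X - step * grad_V(X) + np.sqrt(2.0 * step) * rng.standard_normal(X.shape)
        # 2) Estimate the log density of the cloud with a Gaussian KDE.
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        log_kde = (logsumexp(-sq_dists / (2.0 * kde_bw ** 2), axis=1)
                   - np.log(n_particles) - dim * np.log(kde_bw * np.sqrt(2.0 * np.pi)))
        # 3) Birth-death rates beta_i = log rho(x_i) + V(x_i), centered.
        beta = log_kde + V(X)
        beta = beta - beta.mean()
        # 4) Jump with probability 1 - exp(-|beta_i| * step): kill a particle in an
        #    over-represented region (beta_i > 0) or duplicate one in an
        #    under-represented region (beta_i < 0).
        for i in range(n_particles):
            if rng.random() < 1.0 - np.exp(-abs(beta[i]) * step):
                j = rng.integers(n_particles)
                if beta[i] > 0:
                    X[i] = X[j]  # death: replace particle i by a random particle
                else:
                    X[j] = X[i]  # birth: copy particle i onto a random slot
    return X

# Illustrative 1D double-well target: V(x) = (x^2 - 1)^2.
V = lambda X: (X[:, 0] ** 2 - 1.0) ** 2
grad_V = lambda X: 4.0 * X * (X ** 2 - 1.0)
X = birth_death_langevin(V, grad_V, dim=1)

Intuitively, the jumps kill particles sitting in over-represented regions and duplicate particles in under-represented ones, which can transfer mass between modes faster than the diffusive Langevin dynamics alone; comparing the two on a multimodal target is a natural experiment for this project.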