NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal

### Reviewer 1

In recent years, much effort has been devoted to studying the discretization of the Langevin diffusion, known as Langevin Monte Carlo (LMC), for sampling from a distribution p. Previous work on LMC has mainly focused on log-concave densities, establishing precise upper bounds on the number of iterations sufficient for the algorithm to be close to p (in total variation or Wasserstein distance). This article proposes to extend these polynomial bounds to mixtures of strongly log-concave densities. For that purpose, the authors combine the Langevin diffusion with simulated tempering, which enables the chain to mix between the modes of the distribution. The article has two main contributions. First, it suggests a new algorithm (simulated tempering combined with the Langevin diffusion) to sample from a mixture of strongly log-concave densities. Second, it provides polynomial bounds on the running time of the algorithm.

I found the idea of this paper interesting, but I found the paper very hard to read and understand. The authors should make a real effort to improve the presentation of their methods and the proofs of their theoretical results. I did not understand the algorithm on first reading. The authors do not provide any numerical illustration of their algorithm; I think a simple example would be welcome.

Other comments

1. Line 44, typos: "a log-concave distribution", "its density function".
2. Line 47: what is a posterior distribution of the means for a mixture of Gaussians?
3. Line 51, "As worst-case results are prohibited by hardness results": please provide references.
4. Line 73, typo: "The corresponding distribution $e^{-f(x)}$ are strongly log-concave distributions". Do you mean $e^{-f_0(x)}$?
5. Theorem 1.2: please define the running time.
6. Line 151, "higher temperature distribution": is $\beta \propto 1/T$, where $T$ is the temperature? Please move the clarification "Using terminology from statistical physics, $\beta = 1/\tau$ is the inverse temperature" (line 173) before this first statement on line 151.
7. Line 158, "with stationary distribution.": is an expression missing, e.g. $e^{-\beta_i f}$?
8. Line 167, typo: "is to preserve".
9. Proposition 2.1: the discretization of the Langevin dynamics yields a non-reversible Markov chain whose invariant distribution is close to, but different from, the target distribution p.
10. Line 195, Algorithm 1: in $x \leftarrow x - \eta_0 \beta_i \nabla f(x) + \dots$, there is confusion with $n \leftarrow n + 1$ and the subsequent indices. Algorithm 2: what is $\delta$?
11. Line 200, typo: "simulated".
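
To make concrete the kind of simple numerical example requested above, here is a minimal sketch of simulated tempering combined with an unadjusted Langevin step on a one-dimensional mixture of two Gaussians. It is not the authors' algorithm. The target, the temperature ladder `betas`, the step size `eta`, the discrete swap probability `swap_prob`, and the use of grid quadrature in `log_partition` to obtain the level weights are all assumptions made for illustration only.

```python
import numpy as np

MU = 4.0  # hypothetical target: equal-weight mixture of N(-MU, 1) and N(MU, 1)

def f(x):
    """Negative log-density of the mixture, up to an additive constant."""
    a = -0.5 * (x + MU) ** 2
    b = -0.5 * (x - MU) ** 2
    return -(np.logaddexp(a, b) + np.log(0.5))

def grad_f(x):
    """Gradient of f: responsibility-weighted average of the two pulls."""
    a = -0.5 * (x + MU) ** 2
    b = -0.5 * (x - MU) ** 2
    w = np.exp(a - np.logaddexp(a, b))  # responsibility of the left mode
    return w * (x + MU) + (1.0 - w) * (x - MU)

def log_partition(beta, grid=np.linspace(-30.0, 30.0, 20001)):
    """log of the integral of exp(-beta * f) by a Riemann sum.

    Assumption: 1D quadrature is only feasible in this toy setting.
    """
    dx = grid[1] - grid[0]
    return np.log(np.sum(np.exp(-beta * f(grid))) * dx)

def simulated_tempering_lmc(betas, n_steps, eta=0.05, swap_prob=0.1, seed=0):
    """Return the x-samples recorded while the chain sits at the last beta."""
    rng = np.random.default_rng(seed)
    log_z = np.array([log_partition(b) for b in betas])
    x, i = -MU, len(betas) - 1  # start in the left mode at the target level
    samples = []
    for _ in range(n_steps):
        if rng.random() < swap_prob:
            # Propose a neighbouring level; accept with the Metropolis ratio
            # for the joint density exp(-beta_i f(x)) / Z_i.
            j = i + (1 if rng.random() < 0.5 else -1)
            if 0 <= j < len(betas):
                log_ratio = -(betas[j] - betas[i]) * f(x) - (log_z[j] - log_z[i])
                if np.log(rng.random()) < log_ratio:
                    i = j
        else:
            # Unadjusted Langevin step at inverse temperature beta_i.
            noise = rng.standard_normal()
            x = x - eta * betas[i] * grad_f(x) + np.sqrt(2.0 * eta) * noise
        if i == len(betas) - 1:
            samples.append(x)
    return np.array(samples)

if __name__ == "__main__":
    draws = simulated_tempering_lmc(np.geomspace(0.05, 1.0, 6), n_steps=100_000)
    print("fraction of samples in the right mode:", np.mean(draws > 0))
```

Plain LMC at $\beta = 1$ started at $-4$ would rarely cross the barrier at $x = 0$ with this step size; a fraction in the right mode that is well away from zero would indicate that the tempering moves are doing the mode switching.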