This paper builds upon the prototype-driven text generation approach of Guu et al. (2018). Two major changes are made: first, modeling a sparse distribution over prototypes with a Dirichlet prior over a multinomial, and second, actually learning this sparse distribution. At training time, the authors use amortized variational inference, further approximating the gradients with REINFORCE to cope with the large number of prototypes. At inference time, they can keep fewer training examples in memory by retaining only those whose posterior probability exceeds a threshold; this reduces both the memory required to store training examples and the time spent retrieving them.

All four referees support accepting this paper, although two of them consider it borderline. The main weakness pointed out is that the proposed method is very specific to Guu et al. (2018). However, the reviewers agree this is a fairly nice contribution, and the author response and the discussion clarified some concerns (mainly the point about memory efficiency raised by one of the reviewers). I agree this is a nice, focused contribution, hence I recommend acceptance.
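The memory-saving step the summary describes can be illustrated with a minimal sketch: given a (sparse) posterior over prototypes, keep only the entries above a threshold. This is not the paper's code; the array sizes, the Dirichlet concentration, and the threshold value are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over 1000 prototype (training) examples; a small
# Dirichlet concentration produces the kind of sparse distribution the
# paper's prior encourages (illustrative, not the paper's model).
posterior = rng.dirichlet(np.full(1000, 0.01))

# Keep only prototypes whose posterior probability exceeds a threshold
# (threshold value is an assumption).
threshold = 1e-3
kept = np.flatnonzero(posterior > threshold)

print(f"kept {kept.size} of {posterior.size} prototypes")
```

Only the retained indices (and their associated training examples) need to stay in memory at inference time, which is the source of the memory and retrieval-time savings noted above.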