The paper introduces an improvement to the quadratic control variate of Miller et al. for reducing the variance of reparameterization gradients. The key idea is to use a parameterized quadratic approximation to the model and learn its parameters using a double-descent scheme. This is shown to perform better than the Taylor-expansion-based approach of Miller et al., with the important advantage of reducing the gradient variance for not only the mean but also the scale parameters. The reviewers had some concerns about the novelty and the empirical evaluation of the method, but these have been convincingly addressed in the author response. The authors are urged to incorporate the suggestions made by the reviewers to improve the paper further.