Reviews: Accurate Uncertainty Estimation and Decomposition in Ensemble Learning

Reviewer 1

The authors should discuss connections to the density regression approach in Section 5 of http://www3.stat.sinica.edu.tw/ss_newpaper/SS-2018-0231_na.pdf. The cited paper operates on the conditional density instead of the cdf by expressing f(y | x, \mu) = \phi(\gamma(y); \mu(x), \sigma^2) \gamma'(y), where \gamma is assumed to be a diffeomorphism. I think the two models are inherently very similar, as the cdf transform that the authors propose can be linked to \gamma. I do think there are certain advantages to operating on the density rather than on the cdf, the primary one being computational convenience. The constrained Gaussian process that the authors propose seems complicated to implement; the HMC approach that the authors allude to is not automated and requires the step sizes to be chosen carefully. Moreover, no comparisons are provided with fully flexible classical methods. The authors should at least consider some of the off-the-shelf conditional density estimators, e.g., the np package, as a point of comparison. Also, flexibility in the mean function is not discussed: does the approach have a natural extension to moderately high-dimensional predictors?
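The link between the density form and a cdf transform noted above can be checked numerically. The following is a minimal sketch, not the implementation of either paper: it assumes a single fixed predictor value (so \mu(x) is just a number), picks \gamma(y) = sinh(y) purely as an illustrative diffeomorphism, and uses hypothetical function names throughout. Under those assumptions, the density \phi(\gamma(y); \mu, \sigma^2) \gamma'(y) should integrate to one, and its running integral should match the cdf \Phi((\gamma(y) - \mu) / \sigma).

```python
import math


def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z, mu, sigma):
    """Gaussian cdf with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def gamma(y):
    # Assumption: an illustrative diffeomorphism of the real line,
    # not the transform used in either paper.
    return math.sinh(y)


def gamma_prime(y):
    # Derivative of gamma; strictly positive, so gamma is increasing.
    return math.cosh(y)


def density(y, mu, sigma):
    # Change of variables: f(y) = phi(gamma(y); mu, sigma^2) * gamma'(y).
    return normal_pdf(gamma(y), mu, sigma) * gamma_prime(y)


def cdf(y, mu, sigma):
    # Implied cdf transform: F(y) = Phi((gamma(y) - mu) / sigma).
    return normal_cdf(gamma(y), mu, sigma)


def integrate(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Hypothetical values of mu(x) and sigma at one fixed x.
mu, sigma = 0.3, 0.8

# Total probability mass of the transformed density.
total_mass = integrate(lambda y: density(y, mu, sigma), -6.0, 6.0)

# Mass below y = 0.5, to compare against the cdf transform at 0.5.
partial_mass = integrate(lambda y: density(y, mu, sigma), -6.0, 0.5)
cdf_at_half = cdf(0.5, mu, sigma)
```

Comparing `partial_mass` with `cdf_at_half` illustrates why the two parameterizations carry the same information: differentiating the cdf transform recovers the density, with \gamma'(y) appearing as the Jacobian term.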

Reviewer 2

The motivation for this work is that an ensemble in which the individual models are given, and only their weights are estimated, may not fit the data well. The contributions of this paper can be used to improve an ensemble model and to assess the shortcomings of the ensemble. Outside the data, the model reduces to the original ensemble, while close to the data the bias term and the correction to the model's noise distribution may significantly affect the output. The elements of the solution have existed before, and their application to improving ensemble models seems to be the main novelty of the article. The topic is well motivated and could be of interest to people trying to apply and understand ensemble models in practice. The theoretical derivations are interesting and establish a solid foundation for the choices made, although they are somewhat straightforward. A strength of the article is that it is well written, and the mathematical details are presented clearly and accurately (including in the Supplement). A weakness in presentation is that the quality of the figures is clearly insufficient (legends, labels, etc. are very difficult, if not impossible, to read in many cases).

---AFTER REBUTTAL AND DISCUSSION---

I thank the authors for the detailed rebuttal. To summarize, the contribution of the article is to show how to make an ensemble model more flexible by introducing a residual process for bias correction and a transformation of the noise distribution. Furthermore, the impact of different sources of uncertainty can be estimated, and this is likely to be useful when interpreting the ensemble's predictions in practice (as demonstrated in the Application section). This is a sensible and interesting contribution, and potentially useful to the community, although somewhat straightforward. The novelty of the work and its relation to the Dasgupta paper is a concern: a transformation of the noise distribution seems to be presented in both. The rebuttal argues that the present work has benefits in terms of interpretability and accuracy of the ensemble weights, but this would require more treatment (than is possible in a rebuttal) to be fully convincing. The two works appear to have been done independently of each other. A strength of the paper is the clarity of the mathematical detail. On the other hand, the clarity of the figures is not sufficient for NeurIPS, and I hope the authors invest the effort to improve the figures if the paper is accepted. Acknowledging the concerns about presentation and novelty, I still think the paper is acceptable but borderline, and I keep my original score of 6.

Reviewer 3

### 1) Content: ###
The paper proposes a Bayesian Nonparametric Ensemble (BNE) approach that augments an existing (deterministic) ensemble with nonparametric models to account for different sources of uncertainty in the data. The augmented model is specifically designed to allow for the disambiguation between aleatoric and epistemic uncertainty as well as between different sources of epistemic uncertainty (parametric vs. structural). The model design is well motivated and described in great detail, with additional details and derivations included in the appendix. Additionally, the authors provide a proof that the BNE posterior distribution will concentrate around the true data-generating distribution as the number of training samples increases. The paper also includes an illustrative toy example and an empirical comparison between BNE and related models.

### 2) Empirical evaluation: ###
The empirical evaluation is insightful, and the plots are clear and complement the main text well. However, the empirical evaluation is also very limited in its scope. I am convinced that BNE outperforms the three other methods on the toy dataset as well as on the PM2.5 exposure dataset, but note that BNE is by construction more expressive than BAE. Furthermore, I note that the RMSE results included in Appendix H2 show that the leave-one-out (LOO) RMSEs of BNE and BME are within one another's standard deviations. The empirical comparison could be made more convincing (see under improvements below). That being said, the empirical evaluation presented in the paper is limited but, within its scope, thorough and insightful, and I believe it is borderline sufficient. I would urge the authors to extend the empirical comparisons before publication in line with the suggested improvements below.

### 3) Presentation ###
The paper is superbly written. The presentation is clean and the writing is extremely clear. The submission is accompanied by a detailed appendix with easy-to-follow derivations and additional experimental results. The plots could be improved (see under improvements below).

Paper ID:	4806