Paper ID: 6304

Title: Bayesian Optimization with Unknown Search Space

Applying Bayesian optimization to expensive black-box problems requires specifying the bounds of the search space. However, when tackling a completely new problem, there is no prior knowledge guaranteeing that the specified search space contains the global optimum. The paper proposes an approach to deal with this situation: the user first specifies an initial search space; the bounds of the search space then expand automatically as the iterations proceed; finally, the algorithm returns a solution achieving \epsilon-accuracy. The key question is how to expand the search space. The approach uses the upper confidence bound (UCB) as the acquisition function and selects an expanded search space containing at least one point whose acquisition function value is within \epsilon of the acquisition function maximum. Because the gap between the best selected point and the global optimum of the current search space can be bounded, the approach can return a solution achieving \epsilon-accuracy. I have checked all the proofs and believe most of them are correct. However, I also found some places that are unclear and may be incorrect.

Questions:
1. In the proof of Theorem 4.1, \epsilon is set less than \|\sqrt{\beta_t}\theta - \min_{x \in R^d} a_{UCB}(x; D_{t-1})\|. What if \|\sqrt{\beta_t}\theta - \min_{x \in R^d} a_{UCB}(x; D_{t-1})\| = 0?
2. \gamma is the minimum value among a set of numbers above eq. (14), but is a specific value in eq. (14). How do you determine its value in eq. (14)?
3. Why do the second and third inequalities hold in eq. (15)?
4. How can eq. (15) hold with probability at least 1-\delta? \max_{x \in D_t} a_{LCB} is used to replace \max_{x \in D_t} f(x) in the second inequality. Did you account for the probability that \max_{x \in D_t} a_{LCB} <= \max_{x \in D_t} f(x) holds?

Question about experiments:
1. a_k and b_k in \beta_t are unknown. What are their values in your experimental settings?

Minor comments:
1. The posterior mean in eq. (1) is not correct: "y" -> "y-m"?
2. Proposition 5.1 is proposed in the paper, but it is presented as a lemma in the supplementary.
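To make the expansion idea concrete, below is a toy 1-D sketch of a GP-UCB loop with a growing search box. This is my own illustration, not the authors' algorithm: the kernel, the expansion trigger (expand whenever the UCB maximiser over a margin-enlarged candidate grid falls outside the current box, rather than the paper's \epsilon-based rule), and all parameter values are assumptions made for the example.

```python
import numpy as np

def rbf(a, b, ell=0.5):
    # Squared-exponential kernel matrix for 1-D input arrays a, b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(xq, X, y, noise=1e-4):
    # Zero-mean GP posterior mean/std at query points xq given data (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expanding_ucb_bo(f, lo, hi, iters=20, beta=4.0, margin=1.0):
    """UCB loop over a candidate grid that extends `margin` beyond the
    current box; the box expands to include the UCB maximiser whenever
    it falls outside (a simplified stand-in for the paper's rule)."""
    X = np.array([(lo + hi) / 2.0])
    y = np.array([f(X[0])])
    for _ in range(iters):
        grid = np.linspace(lo - margin, hi + margin, 400)
        mu, sd = gp_posterior(grid, X, y)
        x_next = grid[np.argmax(mu + np.sqrt(beta) * sd)]
        lo, hi = min(lo, x_next), max(hi, x_next)  # expansion step
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Optimum at x = 2 lies outside the initial box [-1, 0].
best_x, best_y = expanding_ucb_bo(lambda x: 4.0 - (x - 2.0) ** 2, -1.0, 0.0)
```

With the initial box [-1, 0] the best attainable value inside it is f(0) = 0, so any returned value well above 0 demonstrates that the expansion mechanism, not the initial bound, located the good region.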

This paper proposes an algorithm that expands the search space for Bayesian optimization in order to find an optimum with certain guarantees. The paper is complete in the sense that it provides theoretical claims supporting the evidence, a proposed algorithm, and some experiments. Although this is nice, it is yet another extension of UCB (there are many, since UCB is pleasant to work with from the theoretical point of view), and the task of expanding the search space has been treated in several NIPS papers this year with other strategies that I believe are more practical and general.

Related to: the UCB acquisition function, \epsilon-accuracy.

Strengths: provides theoretical and empirical content.

Weaknesses: not the most practical approach to perform this task. It is a combination of well-known techniques. Some clarity and organization issues.

Does this submission add value to the NIPS community?: Although other approaches perform similar tasks, this is another view that might be taken into account. Perhaps it is not the most practical one, though.

Quality: Is this submission technically sound?: Not entirely, but what it exposes is sound, in the sense that I have not read about \epsilon-accuracy in BO before. Are claims well supported by theoretical analysis or experimental results?: Yes they are; it is a strength of the paper. Is this a complete piece of work or work in progress?: I would say that it is a complete piece of work. Are the authors careful and honest about evaluating both the strengths and weaknesses of their work?: Yes, I believe so.

Clarity: Is the submission clearly written?: I have some suggestions that, in my humble opinion, should be addressed. Is it well organized?: Yes, it is. Does it adequately inform the reader?: I believe so.

Originality: Are the tasks or methods new?: They already exist in the literature. Is the work a novel combination of well-known techniques?: Yes, a combination of well-known techniques. Is it clear how this work differs from previous contributions?: Yes, but not clearly enough to add real value in practice. Is related work adequately cited?: Yes, with one exception.

Significance: Are the results important?: I would not bet that this approach is going to be used massively in practice. Are others likely to use the ideas or build on them?: I do not know; maybe, but I like other expansion approaches proposed to NIPS this year more. Does the submission address a difficult task in a better way than previous work?: Previous work, maybe; concurrent work, no. Does it advance the state of the art in a demonstrable way?: Yes; they provide experiments and theoretical content. Does it provide unique data, unique conclusions about existing data, or a unique theoretical or experimental approach?: Yes, their work is genuine.

Arguments for acceptance: it is a complete paper with empirical and theoretical content.

Arguments against acceptance: other search-space expansion approaches proposed to NIPS this year are more general and practical; maybe there is no room at NIPS for all of them. NIPS being the best ML conference, the quality of the paper may be borderline; if this were another conference, I would recommend the paper for acceptance without hesitation.

Typos and suggestions:
-> I would not put \epsilon-accuracy in the abstract; it is too technical.
-> The third sentence of the abstract is not syntactically correct: "our method can find a point whose function value within \epsilon of the objective..."
-> The second sentence of the introduction is also not syntactically correct.
-> I miss a formal definition of the problem in the introduction.
-> Global \epsilon-accuracy must be explained, formally introduced, or cited.
-> The sections of the paper should be introduced at the end of Section 1.
-> The problem setup assumes maximization.
More detailed comments and suggestions: I would like to congratulate the authors on the paper and recommend it for acceptance with a weak accept (6). I think it is a good paper, but NIPS has very high standards, and it is a pity to say that the other papers I have reviewed that address this same task are more practical and pragmatic: I would use them in practice, and I am afraid I would not use this one. This is why I score this paper a 6.

The current version provides limited explanation and analysis of its empirical results and feels somewhat preliminary. The assumptions and implications of the bounds are not discussed. The theoretical choice of \beta_t in Theorem 5.1 is typically overly conservative, and thus a practical schedule is used instead. This is common to GP-UCB-based algorithms and should be criticized more, because the theoretical analysis and the practical usage differ; in particular, selecting the schedule of \beta_t is the more important choice. My main concern is the experimental results. Only low-dimensional real problems (d = 2-3) are analyzed, with a small number of iterations. Optimization should become inefficient when the space is expanded, since extending the search space seems to require more optimization trials. In a higher-dimensional case (d = 10) with more iterations, e.g., 100 iterations, existing methods may outperform the proposed method. Using a warped Gaussian process in the input space is an alternative way to reshape the search space: the input space remains bounded but is warped by bijective transformations (Snoek et al., "Input Warping for Bayesian Optimization of Non-Stationary Functions").
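To make the conservativeness of the theoretical schedule concrete, the following small computation compares the finite-domain GP-UCB schedule of Srinivas et al. (2010), \beta_t = 2 log(|D| t^2 \pi^2 / (6\delta)), against a slowly growing heuristic of the kind often used in experiments. The domain size, \delta, and the heuristic's constants (0.2 d log(2t) with d = 2) are my own assumptions for illustration, not values from the paper.

```python
import math

def beta_theoretical(t, domain_size=10_000, delta=0.1):
    # Finite-domain GP-UCB schedule (Srinivas et al., 2010):
    # beta_t = 2 * log(|D| * t^2 * pi^2 / (6 * delta)).
    return 2.0 * math.log(domain_size * t * t * math.pi ** 2 / (6.0 * delta))

def beta_practical(t, d=2):
    # An assumed practical heuristic: a small, slowly growing schedule.
    return 0.2 * d * math.log(2.0 * t)

for t in (1, 10, 100):
    print(t, round(beta_theoretical(t), 1), round(beta_practical(t), 2))
```

The theoretical value is already about 24 at t = 1 and exceeds 40 by t = 100, i.e., more than an order of magnitude larger than the heuristic, which is why the exploration behaviour analyzed in theory and the behaviour observed in practice can differ substantially.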