
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
=== Summary ===
In this paper, the authors propose to bound geometric quantities that appear in statistical error bounds for structured estimation problems with atomic norms.
It was previously shown that the estimation error of Dantzig-like estimators, with random Gaussian design, can be bounded using three key quantities: the Gaussian width of the unit ball of the regularizer, the Gaussian width of the spherical cap of the tangent cone, and the restricted norm compatibility constant. The goal of this article is to provide easy-to-compute bounds for these three quantities in the case of sign-invariant atomic norms.
In section 3, the authors introduce upper bounds on the three geometric quantities, which are the main contribution of the paper. The Gaussian width of the unit ball can be bounded by decomposing the set of atoms into a union of simpler sets, for which the Gaussian widths are easy to compute. This result already appeared in [14]. The Gaussian width of the spherical cap of the tangent cone can be bounded using a single subgradient of the regularizer, taken at the true parameter vector w^*. Finally, the restricted norm compatibility constant can be bounded using two other "norm compatibility constants" which are simpler to compute (because the sets on which they are defined are simpler). The authors show, on two examples (the L1 and OWL norms), how these results can easily be applied.
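For intuition on the first quantity (an illustrative sketch, not code from the paper): for a finite, sign-invariant atom set, the Gaussian width w(A) = E[sup_{a in A} <g, a>] can be estimated by Monte Carlo. The example below does this for the L1-ball atoms {+/- e_1, ..., +/- e_p}, where the supremum reduces to max_i |g_i| and the width is known to grow like sqrt(2 log p); the set sizes and sample count are arbitrary choices for illustration.

```python
import numpy as np

# Monte Carlo estimate of the Gaussian width w(A) = E[sup_{a in A} <g, a>]
# for the atom set of the L1 unit ball, A = {+/- e_1, ..., +/- e_p}.
# For this A the supremum over atoms is simply max_i |g_i|.
rng = np.random.default_rng(0)

def l1_atom_width(p, n_samples=4000):
    g = rng.standard_normal((n_samples, p))
    return float(np.abs(g).max(axis=1).mean())

widths = {p: l1_atom_width(p) for p in (10, 100, 1000)}
for p, w in widths.items():
    print(f"p={p:4d}  estimated width={w:.3f}  sqrt(2 log p)={np.sqrt(2 * np.log(p)):.3f}")
```

The estimates track the classical sqrt(2 log p) rate, matching the kind of dimension-logarithmic bound the paper's decomposition technique is designed to recover.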
In section 4, the authors provide lower bounds for the last two quantities. The lower bounds do not match the upper bounds, but they show that the upper bounds are relatively tight, and thus useful.
In section 5, the bounds obtained in this paper are applied to the k-support norm, improving previously known results.
=== Significance ===
I believe this paper is addressing an important problem, providing interesting and useful new tools for studying sparse estimation problems with atomic norms. The paper is clearly written and well organized. The use of examples helps the reader understand the results and how to apply them in practice. Finally, the bounds derived in the paper allow the authors to obtain new results for the k-support norm, showing the relevance of the results.
Q2: Please summarize your review in 1-2 sentences
I believe this is a well-written and interesting paper, providing easy-to-compute bounds for studying the statistical error of Dantzig-like estimators with atomic norms.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper studies atomic norms used in structured sparse regression. For a given atomic set A (satisfying a sign-change invariance assumption), the paper bounds the Gaussian width of the unit norm ball, the Gaussian width of the tangent cone, and the L2 compatibility number, quantities which are sufficient to characterize the rate of convergence of the estimation problem under certain settings.
The new technique is general. When applied to the L1 norm, both unweighted and weighted, it is shown to match bounds derived from existing techniques. The authors also apply the new technique to k-support norms and derive novel bounds; the new bounds have the interesting implication that k should be underspecified in practice.
The unit norm ball analysis builds on a lemma from [14] and does not seem to be a significantly novel contribution. The tangent cone and compatibility analyses look non-trivial. They build upon a simple but powerful observation that the tangent cone of an atomic norm can be upper-bounded by the tangent cone of a weighted L1 norm (Lemma 3).
My one suggestion is that the authors perform numerical simulations with k-support norms and test whether the bound matches the actual behavior. It would be interesting to see whether over-specifying k would hurt recovery.
I find the paper interesting and the proofs relatively easy to follow. The paper is well-organized, and the results are clearly stated and explained.
[14] An inequality with application to structured sparsity and multitask dictionary learning.
Q2: Please summarize your review in 1-2 sentences
This paper gives a new method of bounding the Gaussian width of various sets important for deriving the rates of consistency of predictors with structured sparsity penalties. The new method seems innovative and gives a novel convergence rate for regression with k-support norms.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper is about sample complexity and error bounds for Dantzig-type estimators in the context of structured regression. Central ingredients of such bounds are geometric properties like the Gaussian width of the norm ball and similar related quantities. For the class of atomic norms that are invariant under sign changes, novel bounds on these geometric properties are provided, which can be easily computed in practice. Further, the tightness of the bounds is analyzed by showing that these upper bounds are close to the corresponding lower bounds.
The main focus of this work concerns a topic which is definitely relevant and interesting to the Machine Learning community. The paper is written in a clear and transparent way. In my opinion, the bounds obtained are of both theoretical and practical interest. However, I have to admit that I did not fully understand some technical details, like the proof of Theorem 7.
Q2: Please summarize your review in 1-2 sentences
An interesting theory paper about a highly relevant topic. Well written.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors consider sparse linear model estimation with atomic norms and provide some bounds on the Gaussian widths arising in their analysis. As special cases, the authors derive bounds for the L1 and ordered weighted L1 norms and provide some analysis of the recently proposed k-support norm.
The atomic norm is attractive for its generality in capturing a broad class of linear inverse problems [Chandrasekaran et al, 2012 FOCM]. The present paper, however, focuses only on sparse linear models, and the results, except for the recently proposed k-support norm, are already available. The proof techniques for calculating Gaussian widths are also fairly standard, although the dual norm calculation for the k-support norm analysis is new.
Is this technique generalizable to other instances of atomic norm analysis, e.g., nuclear norm penalized estimation of low-rank matrices?
The paper is well-written, and the proofs are clear and easy to follow.
Q2: Please summarize your review in 1-2 sentences
The paper is well-written and there is no gap in the mathematical proofs. However, the scope of the problem is narrow: instead of providing an analysis for general atomic norms [Chandrasekaran et al, 2012 FOCM], unifying a variety of linear inverse problems, the authors only consider sparse linear regression, for which the key results are already available.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
Assigned Reviewer 1: We thank the reviewer for the encouraging comments. First, we agree that the unit norm ball analysis is mainly built on [14], but we demonstrate that the application of the inequality needs care: e.g., a naive application to OWL will result in an O(\sqrt{p}) bound, since there are 3^p - 1 atoms, while applying the inequality hierarchically yields the desired sharp bound. Second, the k-support norm analysis actually explains prior experimental results, e.g., Figure 2(c) in [10], where the L2 error starts increasing when k crosses the true sparsity. We will make this aspect clearer.
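To make the naive count concrete, assume the standard finite-class width bound (used here only as a stand-in for the inequality from [14]) for a finite atom set $A$ with $\sup_{a \in A} \|a\|_2 \le R$:

```latex
w(\mathrm{conv}\,A) \;=\; w(A) \;\le\; R\,\sqrt{2\log|A|}.
```

Plugging in $|A| = 3^p - 1$ directly gives $R\sqrt{2\log(3^p - 1)} \le R\sqrt{2p\log 3} = O(\sqrt{p})$; grouping the atoms by support size and applying the inequality within each group avoids this exponential atom count.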
Assigned Reviewer 2: We thank the reviewer for the encouraging comments. We will highlight the main contributions in more detail in an extended version of the work; thanks for the suggestion.
Assigned Reviewer 5: We thank the reviewer for the encouraging comments. It is true that [Chandrasekaran et al, 2012] covers both vector and matrix scenarios. However, the Gaussian width of the tangent cone is there calculated individually for certain norms, most of which are decomposable. Our work establishes such a general bound for any atomic norm as long as the norm is sign-invariant, which is true for commonly used norms. We also provide the restricted norm compatibility analysis, which is not needed for the estimator in [Chandrasekaran et al, 2012]. Regarding (structured) low-rank matrix estimation, our results can be extended to unitarily invariant matrix norms (see "The Convex Analysis of Unitarily Invariant Matrix Functions" by A. S. Lewis, 1995), which already include the nuclear, Frobenius, and spectral norms, and the spectral k-support norm [15]. Some technical details will differ slightly from the vector setting, e.g., the notion of sign-change invariance. We can add a remark and the reference to Lewis '95 to highlight the generality of our results. For other statistical models beyond the linear one, e.g., generalized linear models, the error bounds also involve these geometric measures (see [3]), so our results apply.
Assigned Reviewer 6: We thank the reviewer for the comments. We want to clarify the key differences between our paper and existing work such as [10]. The existing work primarily focuses on deriving error bounds such as equation (5) using geometric measures (see [3, 9, 10]). However, there is no general and easy way of bounding these geometric measures. Although some results are available for specific decomposable norms such as L1, results are unknown for general non-decomposable norms. Our work introduces a general approach to bounding the geometric measures, applicable to any atomic norm as long as it is sign-invariant, which is true for commonly used norms. To the best of our knowledge, our main idea of constructing a superset cone induced by a weighted L1 semi-norm is new. It differs from the techniques used in [10], which yield a weaker bound for the k-support norm. For the k-support norm, we also provide a new algorithm to compute the polar operator, which can be useful in the evaluation of the k-support norm, or in certain optimization frameworks that require solving the polar operator (see [24]). Moreover, the geometric measures we analyze also appear in the error bounds of other statistical estimation problems (see [3]), e.g., generalized linear models, so our results apply there as well.
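(As a hedged aside for readers: the sketch below is not the paper's algorithm, only the closed form implied by the atomic set. Since the k-support norm is the gauge of conv{a : ||a||_0 <= k, ||a||_2 <= 1}, its dual norm is the L2 norm of the k largest-magnitude entries of z, and a maximizer of <u, z> over the unit ball is the corresponding normalized top-k atom.)

```python
import numpy as np

def ksupport_dual_norm(z, k):
    # Dual of the k-support norm: L2 norm of the k largest-magnitude
    # entries of z (maximize <a, z> over atoms {a : ||a||_0 <= k, ||a||_2 <= 1}).
    top = np.sort(np.abs(z))[-k:]
    return float(np.linalg.norm(top))

def ksupport_polar(z, k):
    # A maximizer of <u, z> over the k-support unit ball (assumes z != 0):
    # keep the k largest-magnitude entries of z, normalize to unit L2 norm.
    u = np.zeros_like(z, dtype=float)
    idx = np.argsort(np.abs(z))[-k:]
    u[idx] = z[idx]
    return u / np.linalg.norm(u)

z = np.array([3.0, 0.0, 4.0, 1.0])
print(ksupport_dual_norm(z, k=2))  # -> 5.0, the L2 norm of {4, 3}
print(ksupport_polar(z, k=2))      # supported on those two entries
```

By construction, <ksupport_polar(z, k), z> equals ksupport_dual_norm(z, k), and for k = p the dual norm reduces to the plain L2 norm.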
Assigned Reviewer 7: We thank the reviewer for the encouraging comments. We agree that some of the proofs were rather brief (due to the page limit) and that we skipped some relevant results from [9] that we used in the proofs. We plan to prepare an extended version with self-contained proofs and sufficient detail on the relevant background, e.g., the notions of tangent/normal cones and statistical dimension.
Assigned Reviewer 8: We thank the reviewer for the encouraging comments. The reviewer has summarized the work very clearly as part of the review, and we appreciate this.

