NeurIPS 2020

Neural Star Domain as Primitive Representation


Review 1

Summary and Contributions: This work introduces a new representation for 3D shapes based on primitives defined on a star domain. Each primitive is represented by a unified explicit-implicit surface representation, allowing the overall shape to be represented as a simple union of primitives while guaranteeing fast surface inference.

Strengths:
- The unified explicit-implicit surface representation improves both reconstruction accuracy and surface extraction performance.
- With respect to previous attempts to learn shapes as collections of primitives, the proposed method can handle non-convex components while remaining accurate.

Weaknesses:
- I found parts of the proposed method hard to understand (see detailed comments below).
- How does the proposed method deal with complicated topologies? All the visualisations provided in the paper contain genus-0 shapes. In the supplementary (Fig. 3), results with different genus are provided; however, it seems like the proposed parameterisation struggles to model high-frequency components. I would have expected a "localised" approach to be better suited for such high-frequency shape details.

The rebuttal addresses my concerns, so I have updated my recommendation accordingly.

Correctness: Yes

Clarity: I found the paper very hard to follow, and it contains some typos.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: Section 3.1: From this section, it is unclear whether we are dealing only with a separate occupancy function for each point P_i, whether there is a global function O (it does not seem so from the rest of the paper), and what P_i represents. On line 94: "... surface points \hat{P} which can be decomposed into a collection of N primitives." Can the authors explain the problem setting more clearly?

Typos: line 2: "Accurate"; line 95: "i-th"; line 139: "Neural".

--------------------------

The rebuttal addresses most of my concerns, hence I have upgraded my initial recommendation.


Review 2

Summary and Contributions: This paper tackles the task of representing an input shape as a composition of primitives and introduces a novel primitive representation (neural star domain [NSD]) towards this goal. An NSD primitive is instantiated via a (predicted) distance to the surface in each direction, i.e., each direction u is associated with a scalar r(u) that determines the surface position in that direction. Given an input image, this work presents an approach to predict the shape and position of a fixed number of NSD primitives such that their composition is similar to the underlying 3D shape.

Strengths:
- I really like some aspects of the proposed primitive representation. In particular, I am very excited about the fact that both implicit and explicit computations are simple, i.e., one can both a) analytically compute the occupancy of a query point w.r.t. a given primitive, and b) analytically (and differentiably!) sample points on the primitive surface (see the sketch after this list). While similar properties hold for simpler shape primitives, e.g., cuboids/superquadrics, it is really nice to see this behavior in a more expressive representation. Just to clarify why I find this interesting, I would note that alternate representations, e.g., CVXNet, SIFs, DSIFs, make computation a) simple, but not b), i.e., sampling surface points.
- The empirical results and ablations are convincing. On a note related to the above, Table 3 demonstrates that both the implicit and explicit aspects of the primitive representation are useful for designing the loss used for training. Similarly, the ablations in Table 4 are helpful and emphasize the efficiency of the approach. Finally, the empirical results show that the proposed approach represents the underlying shapes well in comparison to alternate primitives.
- From a technical perspective, the idea of predicting the coefficients of the spherical harmonic functions is interesting, elegant, and novel (in this context). That said, I am curious whether an implicit network that, conditioned on a latent variable capturing the primitive shape, maps an input direction to a radius would be better.
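To make properties a) and b) concrete, here is a minimal sketch (my own toy code, not the authors' implementation; the degree-0/1 basis, coefficients, and center are made-up values) of how a star-domain primitive with a spherical-harmonics radius function supports both computations:

```python
# Sketch: analytic occupancy and analytic surface sampling for one
# star-domain primitive with a truncated real spherical-harmonics radius.
import numpy as np

def real_sh_basis(u):
    """Real spherical harmonics of degree 0 and 1 at unit direction u."""
    x, y, z = u
    return np.array([
        0.5 * np.sqrt(1.0 / np.pi),        # Y_0^0
        np.sqrt(3.0 / (4.0 * np.pi)) * y,  # Y_1^{-1}
        np.sqrt(3.0 / (4.0 * np.pi)) * z,  # Y_1^0
        np.sqrt(3.0 / (4.0 * np.pi)) * x,  # Y_1^1
    ])

coeffs = np.array([2.0, 0.3, -0.2, 0.5])  # hypothetical predicted SH coefficients
center = np.array([0.0, 0.0, 0.0])        # hypothetical predicted primitive center

def radius(u):
    """Explicit radius r(u) along unit direction u."""
    return coeffs @ real_sh_basis(u)

def occupancy(p):
    """a) Analytic occupancy: p is inside iff its distance to the center
    does not exceed the radius along its own direction."""
    d = p - center
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return 1.0  # the center itself is inside
    return float(dist <= radius(d / dist))

def surface_point(u):
    """b) Analytic surface sampling: any unit direction u maps to a surface
    point; this map is differentiable w.r.t. the predicted coefficients."""
    return center + radius(u) * u

print(occupancy(np.array([0.5, 0.0, 0.0])))      # 1.0: inside this primitive
print(surface_point(np.array([0.0, 0.0, 1.0])))  # surface sample along +z
```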

Weaknesses:
- While I really like the NSD representation, I am not convinced they are 'parsimonious' in a meaningful way, e.g., an entire car 3D shape is an NSD, and all convex shapes are NSDs. Essentially, I am concerned that the space of 'primitives' is too powerful to ensure that its elements are simple. For example, both primitives shown in Fig. 2 are actually very complex! While this is not an issue if our only goal is to represent shapes accurately, the resulting parsing may be undesirable for other applications, e.g., shape editing.
- I find the 'composite indicator function' definition (Eq. 5) quite ad hoc and actually counter-intuitive. In particular, \hat{O}_i is itself the output of a sigmoid (say \hat{O}_i = sigmoid(V_i)). Under the current definition, where \hat{O} = sigmoid(sum_i \hat{O}_i), for \hat{O} to be close to 1 the point x would need to be inside many primitives, not just 1; this is undesirable and encourages overlapping primitives (as is apparent from the visualizations). The sketch after this list illustrates the concern numerically. Instead of summing \hat{O}_i, why not simply use \hat{O} = sigmoid(sum_i V_i), so that if the point x is clearly inside even one primitive, \hat{O} would be close to 1?
- I assume the sentiment Eq. 6 is trying to convey is that if a point sampled from primitive i's surface lies in the interior of any other primitive (say primitive j), then it should be discarded. However, the equation as written does not do this. Instead, it says: considering all j one at a time, if a point sampled on i's surface is in the interior of this specific primitive j (not 'any' primitive j), then discard it. As the equation is currently written, a point on primitive i will be discarded only if it is inside ALL other primitives, not just any other primitive; I would guess that this is not what the paper intends and would encourage double-checking this.
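For the Eq. 5 point, a quick numeric check (my own toy numbers and notation, not the paper's code) shows how far the composite occupancy stays from the ideal 0/1 values unless many primitives overlap:

```python
# Toy check of \hat{O} = sigmoid(sum_i \hat{O}_i) with idealized
# per-primitive occupancies \hat{O}_i in {0, 1}; N is made up.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

N = 10  # hypothetical number of primitives
for k in [0, 1, 2, 4, 8]:  # number of primitives containing the query point
    # per-primitive occupancies: ~1 for the k containing primitives, ~0 otherwise
    O_i = np.array([1.0] * k + [0.0] * (N - k))
    print(k, round(sigmoid(O_i.sum()), 3))
# k=0 -> 0.5   (not close to 0, although the point is outside every primitive)
# k=1 -> 0.731 (not close to 1, although the point is inside a primitive)
# k=4 -> 0.982 (only many overlapping primitives push the composite toward 1)
```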

Correctness: The technical details and empirical setup seem correct.

Clarity: The paper is generally well written and easy to follow.

Relation to Prior Work: I think the relation to prior work is presented accurately.

Reproducibility: Yes

Additional Feedback: I really like the proposed primitive representation and feel the approach is technically interesting and novel. In particular, I am excited about both the implicit and explicit computations being feasible for this primitive representation. While I do feel the proposed primitives are not really simple, nor the obtained decompositions semantically very meaningful/useful, I think the technical merits of the work outweigh these concerns, and I'd recommend accepting in the hope that this or similar representations can be more broadly used.


Review 3

Summary and Contributions: This paper proposes to represent 3D shapes as a union of star-shaped primitives (i.e., primitive shapes in which any point can be reached from some "center" via a straight line without leaving the shape). These star-shaped neural primitives provide a good approximation to implicit functions, perhaps with slightly higher accuracy than existing solutions (e.g., convexes from BSP-Net).
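For concreteness, the standard definition being referenced (my own formalization, not quoted from the paper):

```latex
% A set S \subset \mathbb{R}^3 is star-shaped if some center c \in S sees
% every point of S along a straight segment:
\exists\, c \in S \ \text{such that}\ \forall x \in S,\ \forall t \in [0,1]:\ (1-t)\,c + t\,x \in S.
% For such a (compact) set, the boundary admits an explicit radial
% parameterization over unit directions u on the sphere S^2:
\partial S = \{\, c + r(u)\, u \;:\; u \in S^2 \,\},\qquad r(u) = \sup\{\, t \ge 0 : c + t\,u \in S \,\}.
```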

Strengths: This paper offers a novel shape representation for implicit functions that (a) provides a more accurate representation of the signal, (b) makes it easier to reconstruct the 3D surface.

Weaknesses: The paper mostly builds on existing ideas (i.e., OccupancyNet, BSP-Net), but I think it still makes a sufficient contribution to be an interesting work for the NeurIPS audience.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: OK

Reproducibility: Yes

Additional Feedback: