NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
The paper appears to be rather original in terms of task and architecture, although conventional in terms of machine learning theory. The proposed architecture appears to work reasonably well, and a few contrasting conditions are explored with varying performance. The main comparison is between the so-called Bat-G network, which integrates spatial and temporal features late, and a so-called SAE network, which integrates them earlier and works less well. The effect of temporal versus spatial features is also compared. However, the system contains a great many design decisions that are not explicitly tested. No conventional sonar/ultrasound methods are compared against, so it is difficult to determine how this work compares to prior methods of d al significance of the scores is rather unclear. For example, if the voxels at the visible surface were all correct, it seems the system could still achieve a bad score by misclassifying other voxels that are not visible. The writing is at times a bit ungrammatical (e.g., the figure 5 caption has "the objects have vertexes" instead of "vertices"). A glaring mistake is having figure 1 appear before the abstract. It is not possible to fully understand the architecture of the SAE network without seeing the supplementary material.
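The concern about non-visible voxels can be illustrated with a small sketch. The review does not name the scoring metric, so this assumes an intersection-over-union (IoU) style voxel score; the cube target, the choice of visible face, and the helper `voxel_iou` are all hypothetical and are not taken from the paper.

```python
def voxel_iou(predicted, target):
    """Intersection-over-union of two sets of occupied voxel coordinates."""
    predicted, target = set(predicted), set(target)
    union = predicted | target
    # Two empty grids agree perfectly by convention.
    return len(predicted & target) / len(union) if union else 1.0


# Hypothetical ground truth: a solid 4x4x4 cube of occupied voxels.
cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}

# Assumption: only the face at x == 0 is visible to the sensor.
visible = {v for v in cube if v[0] == 0}

# A prediction that gets every visible voxel right but none of the hidden ones.
prediction = set(visible)

iou_all = voxel_iou(prediction, cube)                   # scored over the whole volume
iou_visible = voxel_iou(prediction & visible, visible)  # scored over the visible face only

print(iou_all, iou_visible)
```

Under these assumptions the whole-volume score is low even though the visible surface is reconstructed exactly, which is the ambiguity the reviewer points to.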
Reviewer 2
The proposed approach emits ultrasonic pulses and records the reflected echoes with multiple microphones. The audio input from the microphones is converted to spectrograms using the short-time Fourier transform. The spectrograms are given as input to an encoder-decoder neural network architecture, which is inspired by the bat's auditory system and is trained to output a volumetric voxel representation of the scanned objects. The paper also presents a dataset in which different objects are recorded with a custom-made echo scanner that can be rotated around the objects. The shape (and pose) of the scanned objects is known during data acquisition and can be used as ground truth for training the network. I think that this paper addresses an interesting problem area which has apparently not been much covered in machine learning previously. The proposed approach seems novel and contains lots of innovative design choices, from the experimental measurement setup to the computational signal processing architecture. The obtained results look good. Since I am not an expert in the field of ultrasound or bats, I am a bit cautious about giving a strong recommendation.
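The spectrogram step summarized above can be sketched roughly as follows. This is a minimal illustration of a Hann-windowed short-time Fourier transform applied per microphone channel, not the authors' preprocessing: the sample rate, frame length, hop size, number of microphones, and the function names `stft_magnitude` and `echoes_to_spectrograms` are all assumptions, and a synthetic tone stands in for a recorded echo.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One FFT per frame; transpose to (frequency bins, time frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def echoes_to_spectrograms(recordings, frame_len=256, hop=128):
    """Stack one spectrogram per microphone: (n_mics, freq_bins, n_frames)."""
    return np.stack([stft_magnitude(ch, frame_len, hop) for ch in recordings])


# Synthetic stand-in for recorded echoes (all values hypothetical).
fs = 192_000                                # assumed sample rate in Hz
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 30_000 * t)       # 30 kHz ultrasonic tone
recordings = np.stack([tone, 0.5 * tone, 0.25 * tone, 0.1 * tone])  # 4 assumed mics

specs = echoes_to_spectrograms(recordings)
peak_bin = int(specs[0].mean(axis=1).argmax())
print(specs.shape, peak_bin * fs / 256)     # array shape and peak frequency in Hz
```

A stacked array of this kind is the sort of multi-channel input an encoder-decoder network could consume; how the paper actually arranges the channels is not stated in this review.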
Reviewer 3
Originality: I think the primary originality of the paper is limited to the engineering set-up specific to obtaining the dataset. It appears quite comprehensive and addresses a number of different geometric patterns that could prove to be a challenge for ultrasound-based image reconstruction. The biomimetic portion of the network also comes across as novel, although the specific impact of that portion of the network architecture is not clear. The rest of the paper applies standard supervised learning techniques to a labeled dataset and is not novel. Quality: This is a reasonably well written paper that attempts to solve a real-world problem. However, although the theme of the paper is a bat-inspired network, the paper fails to address the importance of the biomimetic part of the design. Figure 6 appears to report ablation studies that judge the importance of the spectral and temporal cues. The authors claim that the comparison with SAE provides justification for the use of the biomimetic path. However, it would have been a more compelling argument if the performance of Bat-G had been reported with the biomimetic connection removed. Clarity: The paper is well motivated and the experimental set-up is clearly described. The paper does a poorer job of explaining the choice of baselines that were studied, e.g., why SAE? Is it SotA on ultrasound image reconstruction? In the absence of such explanations, it is hard to judge the impact of the paper. Significance: I think the core significance of the paper comes from the large ultrasound dataset. The significance of the methods described in the paper is less clear. Post-rebuttal comments: I have increased my score based on the rebuttal.