Reviews: Knowledge Extraction with No Observable Data

This paper proposes to solve the novel task of knowledge distillation in the absence of training data. Reviewer 1 considered the paper a novel application and combination of GANs, knowledge distillation and model compression, and found the approach technically sound and sensible. They had some issues with how the target class is sampled and suggested a more consistent approach. They also raised concerns about the absence of baselines for SVD-compressed networks with and without fine-tuning. Overall, they felt that the work was significant, but they were concerned about the scale of the experiments, noting that the most complex model/dataset combination was ResNet14 on FashionMNIST and SVHN. Reviewer 2 found the problem interesting and challenging and, like Reviewer 1, saw some weaknesses in the experiments. Reviewer 2 also wanted to see more motivation for the problem. Reviewer 3 thought the problem was novel and relevant, but judged its significance to be moderate because related architectures already exist. Both Reviewers 1 and 3 commented that the paper was clear and would be easy to re-implement.

In their response, the authors addressed the request from Reviewers 1 and 3 to clarify the use of Tucker-2 decomposition for initialization and presented a further analysis of how label vectors are sampled. They also ran additional experiments on CIFAR10 and CIFAR100, reporting that it is challenging to achieve good performance there and leaving this as an open problem. The reviewers felt that the author response improved clarity but still left some questions open, particularly regarding the gains on unstructured data (underscored by the poor performance on CIFAR). The reviewers and the AC see no issue with reporting negative results and agree on acceptance, viewing the paper as a stepping stone toward more advanced methods. We all recommend that the authors moderate the paper's claims.

Paper ID:	1557
Title:	Knowledge Extraction with No Observable Data