The paper proposes task-oriented feature distillation, which introduces an additional distillation objective with a task loss from intermediate layers of the network. Although the idea is simple, it is reasonable and well motivated. The experimental results show improved classification performance over the baselines. On the negative side, the novelty of the method seems incremental. That said, the reviewers were mostly satisfied with the rebuttal and converged in favor of accepting the paper.