This paper aims to find optimal representations for supervised learning from the perspective of the information bottleneck (IB), and proposes an extension of IB, the decodable information bottleneck (DIB), which takes the predictive family into account when learning representations. The original IB seeks representations that retain sufficient information about the label while carrying minimal information about the input features. The authors argue that DIB produces optimal representations, in the sense of achieving the best expected test performance, and, moreover, that DIB is easier to estimate than IB. Extensive experimental results are presented to support the theoretical claims. According to the reviewers, the idea proposed in the paper is well motivated and interesting, and the empirical evaluation is convincing and thorough. All reviewers also agree that the authors properly addressed the concerns raised in the original reviews.