Gal Chechik, Naftali Tishby
The problem of extracting the relevant aspects of data, in the face of multiple conflicting structures, is inherent to the modeling of complex data. Extracting structure in one random variable that is relevant for another variable has recently been addressed principally via the information bottleneck method. However, such auxiliary variables often contain more information than is actually required, due to structures that are irrelevant for the task. In many other cases it is in fact easier to specify what is irrelevant than what is relevant for the task at hand. Identifying the relevant structures can thus be considerably improved by also minimizing the information about another, irrelevant, variable. In this paper we give a general formulation of this problem and derive its formal, as well as algorithmic, solution. Its operation is demonstrated in a synthetic example and in two real-world problems in the context of text categorization and face images. While the original information bottleneck problem is related to rate distortion theory, with the distortion measure replaced by the relevant information, extracting relevant features while removing irrelevant ones is related to rate distortion with side information.