Noam Slonim, Naftali Tishby
We introduce a novel distributional clustering algorithm that max(cid:173) imizes the mutual information per cluster between data and giv(cid:173) en categories. This algorithm can be considered as a bottom up hard version of the recently introduced "Information Bottleneck Method". The algorithm is compared with the top-down soft ver(cid:173) sion of the information bottleneck method and a relationship be(cid:173) tween the hard and soft results is established. We demonstrate the algorithm on the 20 Newsgroups data set. For a subset of two news(cid:173) groups we achieve compression by 3 orders of magnitudes loosing only 10% of the original mutual information.