Learning Monotonic Transformations for Classification

Advances in Neural Information Processing Systems, pp. 681-688

Andrew G. Howard (ahoward@cs.columbia.edu) and Tony Jebara (jebara@cs.columbia.edu)
Department of Computer Science, Columbia University, New York, NY 10027

Abstract

A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains, such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier, without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations is demonstrated on synthetic problems, text data and image data.

1 Introduction

Many fields have developed heuristic methods for preprocessing data to improve performance. This often takes the form of applying a monotonic transformation prior to using a classification algorithm. For example, when the bag-of-words representation is used in document classification, it is common to take the square root of the term frequency [6, 5]. Monotonic transforms are also used when classifying image histograms. In [3], transformations of the form x^a where 0 <= a <= 1 are demonstrated to improve performance. When classifying genes from various microarray experiments, it is common to take the logarithm of the gene expression ratio [2]. Monotonic transformations can also capture crucial properties of the data such as threshold and saturation effects.

In this paper, we propose to simultaneously learn a hyperplane classifier and a monotonic transformation. The solution produced by our algorithm is a piecewise linear monotonic function together with a maximum-margin hyperplane classifier similar to a support vector machine (SVM) [4]. By allowing for a richer class of transforms learned at training time (as opposed to a rule of thumb applied during preprocessing), we improve classification accuracy. The learned transform is specifically tuned to the classification task. The main contributions of this paper are: a novel framework for estimating a monotonic transformation and a hyperplane classifier simultaneously at training time; an efficient method for finding a locally optimal solution to the problem; and a convex relaxation to find a globally optimal approximate solution.

[Figure 1: Monotonic transform applied to each dimension followed by a hyperplane classifier.]

The paper is organized as follows. In section 2, we present our formulation for learning a
piecewise linear monotonic function and a hyperplane. We show how to learn this combined model through an iterative coordinate-ascent optimization, using interleaved quadratic and linear programs to find a locally optimal solution. In section 3, we derive a convex relaxation based on Lasserre's method [8]. In section 4, synthetic experiments as well as document and image classification problems demonstrate the diverse utility of our method. We conclude with a discussion and future work.

2 Learning Monotonic Transformations

For an unknown distribution P(x, y) over inputs x ∈
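As background, the fixed rules of thumb from the introduction and the kind of piecewise linear monotonic transform the method learns can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the truncated-ramp parameterization, the function name, and the knot and slope values below are hypothetical choices for demonstration, not values produced by the paper's algorithm.

```python
import numpy as np

# Fixed monotonic preprocessing rules mentioned in the introduction:
tf = np.array([0.0, 1.0, 4.0, 9.0])            # e.g. term frequencies
sqrt_tf = np.sqrt(tf)                          # square root of term frequency
powered = np.power(tf, 0.5)                    # x**a with 0 <= a <= 1
log_ratio = np.log(np.array([0.5, 1.0, 2.0]))  # log of expression ratios

# The paper instead learns a monotonic piecewise linear transform.  One
# standard parameterization (assumed here) is a nonnegative combination of
# truncated-ramp basis functions anchored at fixed knot locations:
def piecewise_linear_monotonic(x, knots, slopes):
    """phi(x) = sum_k slopes[k] * max(x - knots[k], 0).
    Nonnegative slopes guarantee phi is nondecreasing."""
    assert np.all(slopes >= 0.0), "nonnegative slopes ensure monotonicity"
    ramps = np.maximum(x[..., None] - knots, 0.0)  # shape (..., K)
    return ramps @ slopes

knots = np.array([0.0, 1.0, 2.0])
slopes = np.array([1.0, 0.5, 0.25])  # hypothetical weights, not learned ones
x = np.linspace(0.0, 3.0, 13)
z = piecewise_linear_monotonic(x, knots, slopes)
assert np.all(np.diff(z) >= 0.0)     # the transform preserves ordering
```

In the paper's formulation the slope weights and the hyperplane are optimized jointly at training time (via alternating quadratic and linear programs, or the semidefinite relaxation); here they are fixed purely for illustration.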