{"title": "Lattice Regression", "book": "Advances in Neural Information Processing Systems", "page_first": 594, "page_last": 602, "abstract": "We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient hardware implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. Experiments show that lattice regression can reduce mean test error compared to Gaussian process regression for digital color management of printers, an application for which linearly interpolating a look-up table (LUT) is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice, particularly when the density of training samples is low.", "full_text": "Lattice Regression\n\nEric K. Garcia\nDepartment of Electrical Engineering\nUniversity of Washington\nSeattle, WA 98195\ngarciaer@ee.washington.edu\n\nMaya R. Gupta\nDepartment of Electrical Engineering\nUniversity of Washington\nSeattle, WA 98195\ngupta@ee.washington.edu\n\nAbstract\n\nWe present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. 
Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. Surprisingly, in some cases the proposed method, although motivated by computational efficiency, performs better than directly applying GPR with no lattice at all.\n\n1 Introduction\n\nIn high-throughput regression problems, the cost of evaluating test samples is just as important as the accuracy of the regression, and most non-parametric regression techniques do not produce models that admit efficient implementation, particularly in hardware. For example, kernel-based methods such as Gaussian process regression [1] and support vector regression require kernel computations between each test sample and a subset of training examples, and local smoothing techniques such as weighted nearest neighbors [2] require a search for the nearest neighbors.\n\nFor functions with a known and bounded domain, a standard efficient approach to regression is to store a regular lattice of function values spanning the domain, then interpolate each test sample from the lattice vertices that surround it. Evaluating the lattice is independent of the size of any original training set, but exponential in the dimension of the input space, making it best-suited to low-dimensional applications. 
In digital color management, where real-time performance often requires millions of evaluations every second, the interpolated look-up table (LUT) approach is the most popular implementation of the transformations needed to convert colors between devices, and has been standardized by the International Color Consortium (ICC) with a specification called an ICC profile [3].\n\nFor applications where one begins with training data and must learn the lattice, the standard approach is to first estimate a function that fits the training data, then evaluate the estimated function at the lattice points. However, this is suboptimal because the effect of interpolation from the lattice nodes is not considered when estimating the function. This raises the question: can we instead learn lattice outputs that accurately reproduce the training data upon interpolation?\n\nIterative post-processing solutions that update a given lattice to reduce the post-interpolation error have been proposed by researchers in geospatial analysis [4] and digital color management [5]. In this paper, we propose a solution, which we term lattice regression, that jointly estimates all of the lattice outputs by minimizing the regularized interpolation error on the training data. Experiments with randomly-generated functions, geospatial data, and two color management tasks show that lattice regression consistently reduces error over the standard approach of evaluating a fitted function at the lattice points, in some cases by as much as 25%. More surprisingly, the proposed method can perform better than evaluating test points by Gaussian process regression using no lattice at all.\n\n2 Lattice Regression\n\nThe motivation behind the proposed lattice regression is to jointly choose outputs for lattice nodes that interpolate the training data accurately. 
The key to this estimation is that the linear interpolation operation can be directly inverted to solve for the node outputs that minimize the squared error of the training data. However, unless there is ample training data, the solution will not necessarily be unique. Also, to decrease estimation variance it may be beneficial to avoid fitting the training data exactly. For these reasons, we add two forms of regularization to the minimization of the interpolation error. In total, the proposed form of lattice regression trades off three terms: empirical risk, Laplacian regularization, and a global bias. We detail these terms in the following subsections.\n\n2.1 Empirical Risk\n\nWe assume that our data is drawn from the bounded input space $D \\subset \\mathbb{R}^d$ and the output space $\\mathbb{R}^p$; collect the training inputs $x_i \\in D$ in the $d \\times n$ matrix $X = [x_1, \\ldots, x_n]$ and the training outputs $y_i \\in \\mathbb{R}^p$ in the $p \\times n$ matrix $Y = [y_1, \\ldots, y_n]$. Consider a lattice consisting of $m$ nodes where $m = \\prod_{j=1}^{d} m_j$ and $m_j$ is the number of nodes along dimension $j$. Each node consists of an input-output pair $(a_i \\in \\mathbb{R}^d, b_i \\in \\mathbb{R}^p)$ and the inputs $\\{a_i\\}$ form a grid that contains $D$ within its convex hull. Let $A$ be the $d \\times m$ matrix $A = [a_1, \\ldots, a_m]$ and $B$ be the $p \\times m$ matrix $B = [b_1, \\ldots, b_m]$.\n\nFor any $x \\in D$, there are $q = 2^d$ nodes in the lattice that form a cell (hyper-rectangle) containing $x$ from which an output will be interpolated; denote the indices of these nodes by $c_1(x), \\ldots, c_q(x)$. For our purposes, we restrict the interpolation to be a linear combination $\\{w_1(x), \\ldots, w_q(x)\\}$ of the surrounding node outputs $\\{b_{c_1(x)}, \\ldots, b_{c_q(x)}\\}$, i.e. $\\hat{f}(x) = \\sum_i w_i(x) b_{c_i(x)}$. There are many interpolation methods that correspond to distinct weightings (for instance, in three dimensions: trilinear, pyramidal, or tetrahedral interpolation [6]). 
Additionally, one might consider a higher-order interpolation technique such as tricubic interpolation, which expands the linear weighting to the nodes directly adjacent to this cell. In our experiments we investigate only the case of d-linear interpolation (e.g. bilinear/trilinear interpolation) because it is arguably the most popular variant of linear interpolation, can be implemented efficiently, and has the theoretical support of being the maximum entropy solution to the underdetermined linear interpolation equations [7].\n\nGiven the weights $\\{w_1(x), \\ldots, w_q(x)\\}$ corresponding to an interpolation of $x$, let $W(x)$ be the $m \\times 1$ sparse vector with $c_j(x)$th entry $w_j(x)$ for $j = 1, \\ldots, 2^d$ and zeros elsewhere. Further, for training inputs $\\{x_1, \\ldots, x_n\\}$, let $W$ be the $m \\times n$ matrix $W = [W(x_1), \\ldots, W(x_n)]$. The lattice outputs $B^*$ that minimize the total squared-$\\ell_2$ distortion between the lattice-interpolated training outputs $BW$ and the given training outputs $Y$ are\n\n$$B^* = \\arg\\min_{B} \\, \\mathrm{tr}\\left( (BW - Y)(BW - Y)^T \\right). \\qquad (1)$$\n\n2.2 Laplacian Regularization\n\nAlone, the empirical risk term is likely to pose an underdetermined problem and overfit to the training data. As a form of regularization, we propose to penalize the average squared difference of the output on adjacent lattice nodes using Laplacian regularization. This is a somewhat natural regularization of a function defined on a lattice, and its inclusion guarantees a unique solution to (1) for large enough values of the mixing parameter $\\alpha$.\n\nThe graph Laplacian [8] of the lattice is fully defined by the $m \\times m$ lattice adjacency matrix $E$ where $E_{ij} = 1$ for nodes directly adjacent to one another and 0 otherwise. Given $E$, a normalized version of the Laplacian can be defined as $L = 2(\\mathrm{diag}(\\mathbf{1}^T E) - E)/(\\mathbf{1}^T E \\mathbf{1})$, where $\\mathbf{1}$ is the $m \\times 1$ all-ones vector. 
The average squared difference between adjacent lattice outputs can be compactly represented as\n\n$$\\mathrm{tr}\\left( B L B^T \\right) = \\sum_{k=1}^{p} \\left( \\frac{1}{\\sum_{ij} E_{ij}} \\sum_{\\{i,j \\,\\mid\\, E_{ij} = 1\\}} (B_{ki} - B_{kj})^2 \\right).$$\n\nThus, inclusion of this term penalizes first-order differences of the function at the scale of the lattice.\n\n2.3 Global Bias\n\nAlone, the Laplacian regularization of Section 2.2 rewards smooth transitions between adjacent lattice outputs but only enforces regularity at the resolution of the nodes, and there is no incentive in either the empirical risk or Laplacian regularization term to extrapolate the estimated function beyond the boundary of the cells that contain training samples. When the training data samples do not span all of the grid cells, the lattice node outputs reconstruct a clipped function. In order to endow the algorithm with an improved ability to extrapolate and regularize towards trends in the data, we also include a global bias term in the lattice regression optimization. The global bias term penalizes the divergence of lattice node outputs from some global function $\\tilde{f} : \\mathbb{R}^d \\to \\mathbb{R}^p$ that approximates the training data; $\\tilde{f}$ can be learned using any regression technique.\n\nGiven $\\tilde{f}$, we bias the lattice regression nodes towards $\\tilde{f}$\u2019s predictions for the lattice nodes by minimizing the average squared deviation:\n\n$$\\frac{1}{m} \\mathrm{tr}\\left( (B - \\tilde{f}(A))(B - \\tilde{f}(A))^T \\right).$$\n\nWe hypothesized that the lattice regression performance would be better if $\\tilde{f}$ was itself a good regression of the training data. 
Surprisingly, experiments comparing an accurate $\\tilde{f}$, an inaccurate $\\tilde{f}$, and no bias at all showed little difference in most cases (see Section 3 for details).\n\n2.4 Lattice Regression Objective Function\n\nCombined, the empirical risk minimization, Laplacian regularization, and global bias form the proposed lattice regression objective. In order to adapt an appropriate mixture of these terms, the regularization parameters $\\alpha$ and $\\gamma$ trade off the first-order smoothness and the divergence from the bias function, relative to the empirical risk. The combined objective solves for the lattice node outputs\n\n$$B^* = \\arg\\min_{B} \\, \\mathrm{tr}\\left( \\frac{1}{n}(BW - Y)(BW - Y)^T + \\alpha B L B^T + \\frac{\\gamma}{m}(B - \\tilde{f}(A))(B - \\tilde{f}(A))^T \\right),$$\n\nwhich has the closed form solution\n\n$$B^* = \\left( \\frac{1}{n} Y W^T + \\frac{\\gamma}{m} \\tilde{f}(A) \\right) \\left( \\frac{1}{n} W W^T + \\alpha L + \\frac{\\gamma}{m} I \\right)^{-1}, \\qquad (2)$$\n\nwhere $I$ is the $m \\times m$ identity matrix.\n\nNote that this is a joint optimization over all lattice nodes simultaneously. Since the $m \\times m$ matrix that is inverted in (2) is sparse (it contains no more than $3^d$ nonzero entries per row; see footnote 2), (2) can be solved using sparse Cholesky factorization [9]. 
On a 64-bit 2.6 GHz processor using the MATLAB command mldivide, we found that we could compute solutions for lattices that contained on the order of $10^4$ nodes (a standard size for digital color management profiling [6]) in < 20 s using < 1 GB of memory but could not compute solutions for lattices that contained on the order of $10^5$ nodes.\n\nFootnote 2: For a given row, the only possible non-zero entries of $W W^T$ correspond to nodes that are adjacent in one or more dimensions and these non-zero entries overlap with those of $L$.\n\n3 Experiments\n\nThe effectiveness of the proposed method was analyzed with simulations on randomly-generated functions and tested on a real-data geospatial regression problem as well as two real-data color management tasks. For all experiments, we compared the proposed method to Gaussian process regression (GPR) applied directly to the final test points (no lattice), and to estimating test points by interpolating a lattice where the lattice nodes are learned by the same GPR. For the color management task, we also compared to a state-of-the-art regression method used previously for this application: local ridge regression using the enclosing k-NN neighborhood [10]. In all experiments we evaluated the performance of lattice regression using three different global biases: 1) an \u201caccurate\u201d bias $\\tilde{f}$ was learned by GPR on the training samples; 2) an \u201cinaccurate\u201d bias $\\tilde{f}$ was learned by a global d-linear interpolation (see footnote 3); and 3) the no bias case, where the $\\gamma$ term in (2) is fixed at zero.\n\nTo implement GPR, we used the MATLAB code provided by Rasmussen and Williams at http://www.GaussianProcess.org/gpml. The covariance function was set as the sum of a squared-exponential with an independent Gaussian noise contribution and all data were demeaned by the mean of the training outputs before applying GPR. 
The hyperparameters for GPR were set by maximizing the marginal likelihood of the training data (for details, see Rasmussen and Williams [1]). To mitigate the problem of choosing a poor local maximum, gradient descent was performed from 20 random starting log-hyperparameter values drawn uniformly from $[-10, 10]^3$ and the maximal solution was chosen. The parameters for all other algorithms were set by minimizing the 10-fold cross-validation error using the Nelder-Mead simplex method, bounded to values in the range [1e-3, 1e3]. The starting point for this search was set at the default parameter setting for each algorithm: $\\lambda = 1$ for local ridge regression (see footnote 4) and $\\alpha = 1$, $\\gamma = 1$ for lattice regression. Experiments on the simulated dataset comparing this approach to the standard cross-validation over a grid of values [1e-3, 1e-2, ..., 1e3] $\\times$ [1e-3, 1e-2, ..., 1e3] showed no difference in performance, and the former was nearly 50% faster.\n\n3.1 Simulated Data\n\nWe analyzed the proposed method with simulations on randomly-generated piecewise-polynomial functions $f : \\mathbb{R}^d \\to \\mathbb{R}$ formed from splines. These functions are smooth but have features that occur at different length-scales; two-dimensional examples are shown in Fig. 1. To construct each function, we first drew ten iid random points $\\{s_i\\}$ from the uniform distribution on $[0, 1]^d$, and ten iid random points $\\{t_i\\}$ from the uniform distribution on $[0, 1]$. Then for each of the $d$ dimensions we first fit a one-dimensional spline $\\tilde{g}_k : \\mathbb{R} \\to \\mathbb{R}$ to the pairs $\\{((s_i)_k, t_i)\\}$, where $(s_i)_k$ denotes the $k$th component of $s_i$. We then combined the $d$ one-dimensional splines to form the $d$-dimensional function $\\tilde{g}(x) = \\sum_{k=1}^{d} \\tilde{g}_k((x)_k)$, which was then scaled and shifted to have range spanning [0, 100]:\n\n$$f(x) = 100 \\left( \\frac{\\tilde{g}(x) - \\min_{z \\in [0,1]^d} \\tilde{g}(z)}{\\max_{z \\in [0,1]^d} \\tilde{g}(z)} \\right).$$\n\nFigure 1: Example random piecewise-polynomial functions created by the sum of one-dimensional splines fit to ten uniformly drawn points in each dimension.\n\nFootnote 3: We considered the very coarse $m = 2^d$ lattice formed by the corner vertices of the original lattice and solved (1) for this one-cell lattice, using the result to interpolate the full set of lattice nodes, forming $\\tilde{f}(A)$.\n\nFootnote 4: Note that no locality parameter is needed for this local ridge regression as the neighborhood size is automatically determined by enclosing k-NN [10].\n\nFor input dimensions $d \\in \\{2, 3\\}$, a set of 100 functions $\\{f_1, \\ldots, f_{100}\\}$ were randomly generated as described above and a set of $n \\in \\{50, 1000\\}$ randomly chosen training inputs $\\{x_1, \\ldots, x_n\\}$ were fit by each regression method. A set of $m = 10,000$ randomly chosen test inputs $\\{z_1, \\ldots, z_m\\}$ were used to evaluate the accuracy of each regression method in fitting these functions. For the $r$th randomly-generated function $f_r$, denote the estimate of the $j$th test sample by a regression method as $(\\hat{y}_j)_r$. For each of the 100 functions and each regression method we computed the root mean-squared errors (RMSE) where the mean is over the $m = 10,000$ test samples:\n\n$$e_r = \\left( \\frac{1}{m} \\sum_{j=1}^{m} \\left( f_r(z_j) - (\\hat{y}_j)_r \\right)^2 \\right)^{1/2}.$$\n\nThe mean and statistical significance (as judged by a one-sided Wilcoxon with p = 0.05) of $\\{e_r\\}$ for $r = 1, \\ldots, 100$ is shown in Fig. 
2 for lattice resolutions of 5, 9 and 17 nodes per dimension.\n\n[Figure 2 plots omitted. Panels: (a) d = 2, n = 50; (b) d = 2, n = 1000; (c) d = 3, n = 50; (d) d = 3, n = 1000. Legend: GPR direct, GPR lattice, LR GPR bias, LR d-linear bias, LR no bias. Each panel plots Error against Lattice Nodes Per Dimension and is headed by a ranking by statistical significance.]\n\nFigure 2: Shown is the average RMSE of the estimates given by each regression method on the simulated dataset. As summarized in the legend, shown is GPR applied directly to the test samples (dotted line) and the bars are (from left to right) GPR applied to the nodes of a lattice which is then used to interpolate the test samples, lattice regression with a GPR bias, lattice regression with a d-linear regression bias, and lattice regression with no bias. The statistical significance corresponding to each group is shown as a hierarchy above each plot: method A is shown as stacked above method B if A performed statistically significantly better than B.\n\nIn interpreting the results of Fig. 
2, it is important to note that the statistical significance test compares the ordering of relative errors between each pair of methods across the random functions. That is, it indicates whether one method consistently outperforms another in RMSE when fitting the randomly drawn functions.\n\nConsistently across the random functions, and in all 12 experiments, lattice regression with a GPR bias performs better than applying GPR to the nodes of the lattice. At coarser lattice resolutions, the choice of bias function does not appear to be as important: in 7 of the 12 experiments (all at the low end of grid resolution) lattice regression using no bias does as well as or better than that using a GPR bias.\n\nInterestingly, in 3 of the 12 experiments, lattice regression with a GPR bias achieves statistically significantly lower errors (albeit by a marginal average amount) than applying GPR directly to the random functions. This surprising behavior is also demonstrated on the real-world datasets in the following sections and is likely due to large extrapolations made by GPR; in contrast, interpolation from the lattice regularizes the estimate, which reduces the overall error in these cases.\n\n3.2 Geospatial Interpolation\n\nInterpolation from a lattice is a common representation for storing geospatial data (measurements tied to geographic coordinates) such as elevation, rainfall, forest cover, wind speed, etc. As a cursory investigation of the proposed technique in this domain, we tested it on the Spatial Interpolation Comparison 97 (SIC97) dataset [11] from the Journal of Geographic Information and Decision Analysis. This dataset is composed of 467 rainfall measurements made at distinct locations across Switzerland. 
Of these, 100 randomly chosen sites were designated as training samples to predict the rainfall at the remaining 367 sites. The RMSE of the predictions made by GPR and variants of the proposed method is presented in Fig. 3. Additionally, the statistical significance (as judged by a one-sided Wilcoxon with p = 0.05) of the differences in squared error on the 367 test samples was computed for each pair of techniques. In contrast to the previous section in which significance was computed on the RMSE across 100 randomly drawn functions, significance in this section indicates that one technique produced consistently lower squared error across the individual test samples.\n\n[Figure 3 plot omitted: RMSE against Lattice Nodes Per Dimension, headed by a ranking by statistical significance. Legend: GPR direct, GPR lattice, LR GPR bias, LR d-linear bias, LR no bias.]\n\nFigure 3: Shown is the RMSE of the estimates given by each method for the SIC97 test samples. The hierarchy of statistical significance is presented as in Fig. 2.\n\nCompared with GPR applied to a lattice, lattice regression with a GPR bias again produces a lower RMSE on all five lattice resolutions. However, for four of the five lattice resolutions, there is no performance improvement as judged by the statistical significance of the individual test errors. In comparing the effectiveness of the bias term, we see that on four of five lattice resolutions, using no bias and using the d-linear bias produce consistently lower errors than both the GPR bias and GPR applied to a lattice.\n\nAdditionally, for finer lattice resolutions ($\\geq 17$ nodes per dimension) lattice regression either outperforms or is not significantly worse than GPR applied directly to the test points. 
Inspection of the maximal errors confirms the behavior posited in the previous section: that interpolation from the lattice imposes a helpful regularization. The range of values produced by applying GPR directly lies within [1, 552] while those produced by lattice regression (regardless of bias) lie in the range [3, 521]; the actual values at the test samples lie in the range [0, 517].\n\n3.3 Color Management Experiments with Printers\n\nDigital color management allows for a consistent representation of color information among diverse digital imaging devices such as cameras, displays, and printers; it is a necessary part of many professional imaging workflows and popular among semi-professionals as well. An important component of any color management system is the characterization of the mapping between the native color space of a device (RGB for many digital displays and consumer printers), and a device-independent space such as CIE L*a*b* (abbreviated herein as Lab), in which distance approximates perceptual notions of color dissimilarity [12].\n\nFor nonlinear devices such as printers, the color mapping is commonly estimated empirically by printing a page of color patches for a set of input RGB values and measuring the printed colors with a spectrophotometer. From these training pairs of (Lab, RGB) colors, one estimates the inverse mapping $f : \\text{Lab} \\to \\text{RGB}$ that specifies what RGB inputs to send to the printer in order to reproduce a desired Lab color. See Fig. 4 for an illustration of a color-managed system. 
Estimating $f$ is challenging for a number of reasons: 1) $f$ is often highly nonlinear; 2) although it can be expected to be smooth over regions of the colorspace, it is affected by changes in the underlying printing mechanisms [13] that can introduce discontinuities; and 3) device instabilities and measurement error introduce noise into the training data. Furthermore, millions of pixels must be processed in approximately real-time for every image without adding undue costs for hardware, which explains the popularity of using a lattice representation for color management in both hardware and software imaging systems.\n\nFigure 4: A color-managed printer system. For evaluation, errors are measured between the desired $(L, a, b)$ and the resulting $(\\hat{L}, \\hat{a}, \\hat{b})$ for a given device characterization.\n\nThe proposed lattice regression was tested on an HP Photosmart D7260 inkjet printer and a Samsung CLP-300 laser printer. As a baseline, we compared to a state-of-the-art color regression technique used previously in this application [10]: local ridge regression (LRR) using the enclosing k-NN neighborhood. Training samples were created by printing the Gretag MacBeth TC9.18 RGB image, which has 918 color patches that span the RGB colorspace. We then measured the printed color patches with an X-Rite iSis spectrophotometer using D50 illuminant at a 2\u00b0 observer angle and UV filter. As shown in Fig. 4 and as is standard practice for this application, the data for each printer is first gray-balanced using 1D calibration look-up-tables (1D LUTs) for each color channel (see [10, 13] for details). We use the same 1D LUTs for all the methods compared in the experiment and these were learned for each printer using direct GPR on the training data.\n\nWe tested each method\u2019s accuracy on reproducing 918 new randomly-chosen in-gamut (see footnote 5) test Lab colors. The test errors for the regression methods on the two printers are reported in Tables 1 and 2. 
As is common in color management, we report $\\Delta E_{76}$ error, which is the Euclidean distance between the desired test Lab color and the Lab color that results from printing the estimated RGB output of the regression (see Fig. 4).\n\nFootnote 5: We drew 918 samples iid uniformly over the RGB cube, printed these, and measured the resulting Lab values; these Lab values were used as test samples. This is a standard approach to assuring that the test samples are Lab colors that are in the achievable color gamut of the printer [10].\n\nTable 1: Samsung CLP-300 laser printer (Euclidean Lab Error)\nMethod | Mean | Median | 95 %-ile | Max\nLocal Ridge Regression (to fit lattice nodes) | 4.59 | 4.10 | 9.80 | 14.59\nGPR (direct) | 4.54 | 4.22 | 9.33 | 17.36\nGPR (to fit lattice nodes) | 4.54 | 4.17 | 9.62 | 15.95\nLattice Regression (GPR bias) | 4.31 | 3.95 | 9.08 | 15.11\nLattice Regression (Trilinear bias) | 4.14 | 3.75 | 8.39 | 15.59\nLattice Regression (no bias) | 4.08 | 3.72 | 8.00 | 17.45\n\nTable 2: HP Photosmart D7260 inkjet printer (Euclidean Lab Error)\nMethod | Mean | Median | 95 %-ile | Max\nLocal Ridge Regression (to fit lattice nodes) | 3.34 | 2.84 | 7.70 | 14.77\nGPR (direct) | 2.79 | 2.45 | 6.36 | 11.08\nGPR (to fit lattice nodes) | 2.76 | 2.36 | 6.36 | 11.79\nLattice Regression (GPR bias) | 2.53 | 2.17 | 5.96 | 10.25\nLattice Regression (Trilinear bias) | 2.34 | 1.84 | 5.89 | 12.48\nLattice Regression (no bias) | 2.07 | 1.75 | 4.89 | 10.51\n\nThe bold face indicates that the individual errors are statistically significantly lower than the others as judged by a one-sided Wilcoxon significance test (p=0.05). Multiple bold lines indicate that there was no statistically significant difference in the bolded errors.\n\nFor both printers, the lattice regression methods performed best in terms of mean, median and 95 %-ile error. Additionally, according to a one-sided Wilcoxon test of statistical significance with p = 0.05, all of the lattice regressions (regardless of the choice of bias) were statistically significantly better than the other methods for both printers; on the Samsung, there was no significant difference between the choice of bias, and on the HP, using no bias produced consistently lower errors. These results are surprising for three reasons. First, the two printers have rather different nonlinearities because the underlying physical mechanisms differ substantially (one is a laser printer and the other is an inkjet printer), so it is a nod towards the generality of the lattice regression that it performs best in both cases. Second, the lattice is used for computational efficiency, and we were surprised to see it perform better than directly estimating the test samples using the function estimated with GPR directly (no lattice). Third, we hypothesized (incorrectly) that better performance would result from using the more accurate global bias term formed by GPR than using the very coarse fit provided by the global trilinear bias or no bias at all.\n\n4 Conclusions\n\nIn this paper we noted that low-dimensional functions can be efficiently implemented as interpolation from a regular lattice and we argued that an optimal approach to learning this structure from data should take into account the effect of this interpolation. We showed that, in fact, one can directly estimate the lattice nodes to minimize the empirical interpolated training error and added two regularization terms to attain smoothness and extrapolation. 
It should be noted that, in the experiments, extrapolation beyond the training data was not directly tested: test samples for the simulated and real-data experiments were drawn mainly from within the interior of the training data.\n\nReal-data experiments showed that mean error on a practical digital color management problem could be reduced by 25% using the proposed lattice regression, and that the improvement was statistically significant. Simulations also showed that lattice regression was statistically significantly better than the standard approach of first fitting a function then evaluating it at the lattice points. Surprisingly, although the lattice architecture is motivated by computational efficiency, both our simulated and real-data experiments showed that the proposed lattice regression can work better than state-of-the-art regression of test samples without a lattice.\n\nReferences\n[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2005.\n[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, New York, 2001.\n[3] D. Wallner, Building ICC Profiles - the Mechanics and Engineering, chapter 4: ICC Profile Processing Models, pp. 150\u2013167, International Color Consortium, 2000.\n[4] W. R. Tobler, \u201cLattice tuning,\u201d Geographical Analysis, vol. 11, no. 1, pp. 36\u201344, 1979.\n[5] R. Bala, \u201cIterative technique for refining color correction look-up tables,\u201d United States Patent 5,649,072, 1997.\n[6] R. Bala and R. V. Klassen, Digital Color Handbook, chapter 11: Efficient Color Transformation Implementation, CRC Press, 2003.\n[7] M. R. Gupta, R. M. Gray, and R. A. Olshen, \u201cNonparametric supervised learning by linear interpolation with maximum entropy,\u201d IEEE Trans. 
on Pattern Analysis and Machine Intelligence (PAMI), vol. 28, no. 5, pp. 766\u2013781, 2006.\n[8] F. Chung, Spectral Graph Theory, Number 92 in Regional Conference Series in Mathematics, American Mathematical Society, 1997.\n[9] T. A. Davis, Direct Methods for Sparse Linear Systems, SIAM, Philadelphia, September 2006.\n[10] M. R. Gupta, E. K. Garcia, and E. M. Chin, \u201cAdaptive local linear regression with application to printer color management,\u201d IEEE Trans. on Image Processing, vol. 17, no. 6, pp. 936\u2013945, 2008.\n[11] G. Dubois, \u201cSpatial interpolation comparison 1997: Foreword and introduction,\u201d Special Issue of the Journal of Geographic Information and Decision Analysis, vol. 2, pp. 1\u201310, 1998.\n[12] G. Sharma, Digital Color Handbook, chapter 1: Color Fundamentals for Digital Imaging, pp. 1\u2013114, CRC Press, 2003.\n[13] R. Bala, Digital Color Handbook, chapter 5: Device Characterization, pp. 269\u2013384, CRC Press, 2003.\n", "award": [], "sourceid": 894, "authors": [{"given_name": "Eric", "family_name": "Garcia", "institution": null}, {"given_name": "Maya", "family_name": "Gupta", "institution": null}]}