{"title": "Min-Max Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 5565, "page_last": 5573, "abstract": "We study the application of min-max propagation, a variation of belief propagation, for approximate min-max inference in factor graphs. We show that for \u201cany\u201d high-order function that can be minimized in O(\u03c9), the min-max message update can be obtained using an efficient O(K(\u03c9 + log(K)) procedure, where K is the number of variables. We demonstrate how this generic procedure, in combination with efficient updates for a family of high-order constraints, enables the application of min-max propagation to efficiently approximate the NP-hard problem of makespan minimization, which seeks to distribute a set of tasks on machines, such that the worst case load is minimized.", "full_text": "Min-Max Propagation\n\nChristopher Srinivasa\nUniversity of Toronto\n\nBorealis AI\n\nchristopher.srinivasa\n\n@gmail.com\n\nInmar Givoni\nUniversity of\n\nToronto\n\ninmar.givoni\n@gmail.com\n\nSiamak Ravanbakhsh\n\nUniversity of\n\nBritish\n\nColumbia\n\nBrendan J. Frey\n\nUniversity of Toronto\n\nVector Institute\nDeep Genomics\n\nsiamakx@cs.ubc.ca\n\nfrey@psi.toronto.edu\n\nAbstract\n\nWe study the application of min-max propagation, a variation of belief propagation,\nfor approximate min-max inference in factor graphs. We show that for \u201cany\u201d high-\norder function that can be minimized in O(\u03c9), the min-max message update can be\nobtained using an ef\ufb01cient O(K(\u03c9 + log(K)) procedure, where K is the number\nof variables. 
We demonstrate how this generic procedure, in combination with efficient updates for a family of high-order constraints, enables the application of min-max propagation to efficiently approximate the NP-hard problem of makespan minimization, which seeks to distribute a set of tasks on machines such that the worst-case load is minimized.

1 Introduction

Min-max is a common optimization problem that involves minimizing a function with respect to some variables X and maximizing it with respect to others Z: min_X max_Z f(X, Z). For example, f(X, Z) may be the cost or loss incurred by a system X under different operating conditions Z, in which case the goal is to select the system whose worst-case cost is lowest. In Section 2, we show that factor graphs present a desirable framework for solving min-max problems, and in Section 3 we review min-max propagation, a min-max based belief propagation algorithm.

Sum-product and min-sum inference using message passing has repeatedly produced groundbreaking results in various fields, from low-density parity-check codes in communication theory (Kschischang et al., 2001), to satisfiability in combinatorial optimization and latent-factor analysis in machine learning.

An important question is whether “min-max” propagation can also yield good approximate solutions when dealing with NP-hard problems. In this paper we answer this question in two parts.

I) Our main contribution is the introduction of an efficient min-max message passing procedure for a generic family of high-order factors in Section 4. This enables us to approach new problems through their factor graph formulation. Section 5.2 leverages our solution for high-order factors to efficiently approximate the problem of makespan minimization using min-max propagation. 
II) To better understand the pros and cons of min-max propagation, Section 5.1 compares it with the alternative approach of reducing min-max inference to a sequence of Constraint Satisfaction Problems (CSPs).

The feasibility of “exact” inference in a min-max semiring using the junction-tree method goes back to Aji and McEliece (2000). More recent work by Vinyals et al. (2013) presents the application of min-max junction-trees in a particular setting of the makespan problem. In this paper, we investigate the usefulness of min-max propagation in the loopy case and, more importantly, provide an efficient and generic algorithm to perform message passing with high-order factors.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

2 Min-Max Optimization on Factor Graphs

We are interested in factorizable min-max problems min_X max_Z f(X, Z), i.e., min-max problems that can be efficiently factored into a group of simpler functions. These have the following properties:

1. The cardinality of either X or Z (say Z) is linear in the available computing resources (e.g., Z is an indexing variable a whose cardinality is linear in the number of indices).
2. The other variable can be decomposed, so that X = (x_1, ..., x_N).
3. Given Z, the function f() depends on only a subset of the variables in X and/or exhibits a form which is easier to minimize individually than when combined into f(X, Z).

Using a ∈ F = {1, ..., F} to index the values of Z and X_∂a to denote the subset of variables that f() depends on when Z = a, the min-max problem can be formulated as

min_X max_a f_a(X_∂a).   (1)

In the following we use i, j ∈ N = {1, ..., N} to denote variable indices and a, b ∈ {1, ..., F} for factor indices. A Factor Graph (FG) is a bipartite graphical representation of the above factorization properties. In it, each function (i.e. 
factor f_a) is represented by a square node and each variable is represented by a circular node. Each factor node is connected via individual edges to the variables on which it depends. We use ∂i to denote the set of neighbouring factor indices for variable i, and similarly we use ∂a to denote the index set of variables connected to factor a.

This problem is related to the problems commonly analyzed using FGs (Bishop, 2006): the sum-product problem, Σ_X Π_a f_a(X_∂a); the min-sum problem, min_X Σ_a f_a(X_∂a); and the max-product problem, max_X Π_a f_a(X_∂a), in which case we would respectively take the product, sum, and product of the factors in the FG rather than their max.

When dealing with NP-hard problems, the FG contains one or more loops. While NP-hard problems have been represented and (approximately) solved directly using message passing on FGs in the sum-product, min-sum, and max-product cases, to our knowledge this has never been done in the min-max case.

3 Min-Max Propagation

An important question is how min-max can be computed on FGs. Consider the sum-product algorithm on FGs, which relies on the sum and product operations satisfying the distributive law a(b + c) = ab + ac (Aji and McEliece, 2000). Min and max operators also satisfy the distributive law: min(max(α, β), max(α, γ)) = max(α, min(β, γ)). Using the (min, max, ℝ) semiring, the belief propagation updates are as follows. Note that these updates are analogous to sum-product belief propagation updates, where sum is replaced by min and the product operation is replaced by max.

Variable-to-Factor Messages. The message sent from variable x_i to function f_a is

μ_ia(x_i) = max_{b ∈ ∂i\a} η_bi(x_i)   (2)

Figure 1: Variable-to-factor message.

where η_bi(x_i) is the message sent from function f_b to variable x_i (as shown in Fig. 
1) and ∂i \ a is the set of all neighbouring factors of variable i, with a removed.

Factor-to-Variable Messages. The message sent from function f_a to variable x_i is computed using

η_ai(x_i) = min_{X_∂a\i} max( f_a(X_∂a), max_{j ∈ ∂a\i} μ_ja(x_j) )   (3)

Figure 2: Factor-to-variable message.

Initialization Using the Identity. In the sum-product algorithm, messages are usually initialized using knowledge of the identity of the product operation. For example, if the FG is a tree with some node chosen as a root, messages can be passed from the leaves to the root and back to the leaves. The initial message sent from a variable that is a leaf involves taking the product of an empty set of incoming messages, and therefore the message is initialized to the identity of the group (ℝ⁺, ×), which is 1. In our case, we instead need the identity of the (ℝ, max) semi-group, i.e., the element 1_max satisfying max(1_max, x) = x ∀x ∈ ℝ, which is 1_max = −∞. Examining Eq. (3), we see that the message sent from a function that is a leaf involves maximizing over an empty set of incoming messages, so we can initialize the message sent from function f_a to variable x_i using η_ai(x_i) = min_{X_∂a\i} f_a(X_∂a).

Marginals. Min-max marginals, which involve “minimizing” over all variables except some x_i, can be computed by taking the max of all incoming messages at x_i, as in Fig. 3:

m(x_i) = min_{X_N\i} max_a f_a(X_∂a) = max_{b ∈ ∂i} η_bi(x_i)   (4)

The value of x_i that achieves the global solution is given by arg min_{x_i} m(x_i).

Figure 3: Marginals.

4 Efficient Update for High-Order Factors

When passing messages from factors to variables, we are interested in efficiently evaluating Eq. 
(3). In its original form, this computation is exponential in the number of neighbouring variables |∂a|. Since many interesting problems require high-order factors in their FG formulation, many have investigated efficient min-sum and sum-product message passing through special families of, often sparse, factors (e.g., Tarlow et al., 2010; Potetz and Lee, 2008).

For the time being, consider factors over binary variables x_i ∈ {0, 1} ∀i ∈ ∂a, and further assume that efficient minimization of the factor f_a is possible.

Assumption 1. The function f_a : X_∂a → ℝ can be minimized in time O(ω) with any subset B ⊂ ∂a of its variables fixed.

In the following we show how to calculate min-max factor-to-variable messages in O(K(ω + log(K))), where K = |∂a| − 1. In comparison to the limited settings in which high-order factors allow efficient min-sum and sum-product inference, we believe this result to be quite general.¹

The idea is to break the problem in half at each iteration. We show that for one of these halves, we can obtain the min-max value using a single evaluation of f_a. By reducing the size of the original problem in this way, we only need to choose the final min-max message value from a set of candidates that is at most linear in |∂a|.

Procedure. According to Eq. (3), in calculating the factor-to-variable message η_ai(x_i) for a fixed x_i = c_i, we are interested in efficiently solving the following optimization problem

min_{X_∂a\i} max( μ_1(x_1), μ_2(x_2), ..., μ_K(x_K), f(X_∂a\i, x_i = c_i) )   (5)

where, without loss of generality, we assume ∂a \ i = {1, ..., K}, and for better readability we drop the index a in factors (f_a), messages (μ_ka, η_ai), and elsewhere, when it is clear from the context.

There are 2^K configurations of X_∂a\i, one of which is the minimizing solution. 
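As a point of reference, the message value in Eq. (5) can be computed by brute force over all 2^K configurations. A minimal sketch in Python (the function and argument names are illustrative, not from the paper):

```python
from itertools import product

def factor_to_variable_bruteforce(f, msgs, i, x_i, domains):
    """Brute-force evaluation of Eqs. (3)/(5): minimize, over all
    configurations of the neighbours other than i, the max of the factor
    value and the incoming messages.  `f` maps an assignment dict to a
    real, `msgs[j][v]` is the incoming message mu_{j->a}(v), and
    `domains[j]` lists the values variable j can take."""
    others = [j for j in domains if j != i]
    best = float("inf")
    for values in product(*(domains[j] for j in others)):
        X = dict(zip(others, values))
        X[i] = x_i  # clamp the recipient variable
        candidate = max([f(X)] + [msgs[j][X[j]] for j in others])
        best = min(best, candidate)
    return best
```

This exponential baseline is exactly what the halving procedure described next avoids.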
We will divide this set in half in each iteration and save the minimum of one of the halves in a min-max candidate list C. The maximization part of the expression is equivalent to max( max(μ_1(x_1), μ_2(x_2), ..., μ_K(x_K)), f(X_∂a\i, x_i = c_i) ).

Let μ_j1(c_j1) be the largest incoming message value, obtained at some index j_1 for some value c_j1 ∈ {0, 1}; in other words, μ_j1(c_j1) = max(μ_1(0), μ_1(1), ..., μ_K(0), μ_K(1)). For future use, let j_2, ..., j_K be the indices of the next largest message values, up to the K largest ones, and let c_j2, ..., c_jK be their corresponding assignments. Note that messages of the same variable (e.g., μ_3(0) and μ_3(1)) could appear at different locations in this sorted list.

We then partition the set of all assignments to X_∂a\i into two sets of size 2^{K−1}, depending on the assignment to x_j1: 1) x_j1 = c_j1, or 2) x_j1 = 1 − c_j1. The minimization of Eq. (5) can likewise be divided into two minimizations, each having x_j1 set to a different value. For x_j1 = c_j1, the message value μ_j1(c_j1) dominates all other incoming messages, so Eq. (5) simplifies to

η^(j1) = max( μ_j1(c_j1), min_{X_∂a\{i,j1}} f(X_∂a\{i,j1}, x_i = c_i, x_j1 = c_j1) )   (6)

where we need to minimize f subject to fixed x_i and x_j1. We repeat the procedure above at most K times, for j_1, ..., j_m, ..., j_K, where at each iteration we obtain a candidate solution η^(jm) that we add to the candidate set C = {η^(j1), ..., η^(jK)}. The final solution is the smallest value in the candidate set, min C.

¹ Here we assume that the minimization problem on any particular factor can be solved in a fixed amount of time. In many applications, doing this might itself involve running another entire inference algorithm. However, note that our algorithm is agnostic to such choices for the optimization of individual factors.

Early Termination. 
If j_k = j_k′ for some 1 ≤ k′ < k ≤ K, it means that we have performed the minimization of Eq. (5) for both x_jk = 0 and x_jk = 1. This means that we can terminate the iterations and report the minimum of the current candidate set. Adding the cost of sorting, O(K log(K)), to the worst-case cost of the minimizations of f() in Eq. (6) gives a total cost of O(K(log(K) + ω)).

Arbitrary Discrete Variables. This algorithm is not limited to binary variables. The main differences in dealing with cardinality D > 2 are that we run the procedure for at most K(D − 1) iterations and that, for early termination, all of some variable's values should appear among the top K(D − 1) incoming message values.

For some factors, we can go further and calculate all factor-to-variable messages leaving f_a in time linear in |∂a|. The following section derives such an update rule for a type of factor that we use in the makespan application of Section 5.2.

4.1 Choose-One Constraint

If f_a(X_∂a) implements a constraint such that only a subset A_a ⊂ X_∂a of the possible configurations of X_∂a is allowed, then the message from function f_a to x_i simplifies to

η_ai(x′_i) = min_{X_∂a ∈ A_a | x_i = x′_i} max_{j ∈ ∂a\i} μ_ja(x_j)   (7)

In many applications, this can be further simplified by taking into account properties of the constraint. Here, we describe such a procedure for factors which enforce that exactly one of their binary variables be set to one and all others to zero: the constraint f(x_1, ..., x_K) = δ(Σ_k x_k, 1) for binary variables x_k ∈ {0, 1}, where δ(x, x′) evaluates to −∞ iff x = x′ and to ∞ otherwise.²

Using X\i = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_K) for X with x_i removed, Eq. 
(7) becomes

η_i(x_i) = min_{X\i : Σ_{k=1}^K x_k = 1} max_{k ≠ i} μ_k(x_k)
         = max_{k ≠ i} μ_k(0), if x_i = 1;
         = min_{X\i ∈ {(1,0,...,0), (0,1,...,0), ..., (0,0,...,1)}} max_{k ≠ i} μ_k(x_k), if x_i = 0.   (8)

A naive implementation of the above update is O(K²) for each x_i, or O(K³) for sending messages to all neighbouring x_i. However, further simplification is possible. Consider the calculation of max_{k ≠ i} μ_k(x_k) for X\i = (1, 0, ..., 0) and X\i = (0, 1, ..., 0). All but the first two entries of these two assignments are the same (all zero), so most of the comparisons made when computing max_{k ≠ i} μ_k(x_k) for the first assignment can be reused when computing it for the second. This extends to all K − 1 assignments (1, 0, ..., 0), ..., (0, 0, ..., 1), and also extends across the message updates for the different x_i's. After examining the shared terms in the maximizations, we see that all that is needed is

k_i^(1) = arg max_{k ≠ i} μ_k(0),    k_i^(2) = arg max_{k ≠ i, k ≠ k_i^(1)} μ_k(0),   (9)

the indices of the largest and second-largest values of μ_k(0) with i removed from consideration. Note that these can be computed for all neighbouring x_i in time linear in K, by finding the top three values of μ_k(0) and selecting two of them appropriately, depending on whether μ_i(0) is among the three values. 

² Similar to any other semiring, ±∞, as the identities of min and max, have a special role in defining constraints.
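The “top three” selection just described can be sketched as follows (an illustrative helper, assuming K ≥ 3 and 0-based indices): for each neighbour i, the largest and second-largest entries of μ_k(0) with k ≠ i are read off the global top three.

```python
import heapq

def top2_excluding_each(vals):
    """For each index i, return the indices of the largest and
    second-largest entries of `vals` with index i excluded.  Only the
    global top three entries are needed, so beyond finding them the cost
    per i is constant (requires len(vals) >= 3)."""
    top3 = heapq.nlargest(3, range(len(vals)), key=lambda k: vals[k])
    # if i is among the top three, the other two remain; otherwise the
    # global top two are unaffected by removing i
    return {i: tuple([k for k in top3 if k != i][:2])
            for i in range(len(vals))}
```

Here `vals[k]` plays the role of μ_k(0), and the returned pair for index i corresponds to (k_i^(1), k_i^(2)) in Eq. (9).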
Using this notation, the above update simplifies as follows:

η_i(x_i) = μ_{k_i^(1)}(0), if x_i = 1;
         = min( min_{k ≠ i, k ≠ k_i^(1)} max(μ_{k_i^(1)}(0), μ_k(1)), max(μ_{k_i^(1)}(1), μ_{k_i^(2)}(0)) ), if x_i = 0.   (10)

The term min_{k ≠ i, k ≠ k_i^(1)} max(μ_{k_i^(1)}(0), μ_k(1)) also need not be recomputed for every x_i, since its terms are shared. Define

s_i = arg min_{k ≠ i, k ≠ k_i^(1)} μ_k(1),   (11)

the index of the smallest value of μ_k(1) with i and k_i^(1) removed from consideration. This can be computed efficiently for all i, in time linear in K, by finding the smallest three values of μ_k(1) and selecting one of them appropriately, depending on whether μ_i(1) and/or μ_{k_i^(1)}(1) are among the three values. The resulting message update for the K-choose-1 constraint becomes

η_i(x_i) = μ_{k_i^(1)}(0), if x_i = 1;
         = min( max(μ_{k_i^(1)}(0), μ_{s_i}(1)), max(μ_{k_i^(1)}(1), μ_{k_i^(2)}(0)) ), if x_i = 0.   (12)

This shows that the messages to all neighbouring variables x_1, ..., x_K can be obtained in time linear in K. This type of constraint also has a tractable form in min-sum and sum-product inference, albeit of a different kind (e.g., see Gail et al., 1981; Gupta et al., 2007).

5 Experiments and Applications

In the first part of this section we compare min-max propagation with the only alternative min-max inference method over FGs, which relies on a sum-product reduction. In the second part, we formulate the real-world problem of makespan minimization as a min-max inference problem with high-order factors. 
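Before turning to experiments, the pieces of Section 4 (sorting the 2K incoming message values, halving, and early termination) can be put together in one sketch of the generic factor-to-variable update for binary neighbours. This is an illustration, not the authors' code; `minimize_f` is a hypothetical oracle standing in for Assumption 1.

```python
def minmax_factor_message(mu, minimize_f):
    """Sketch of the Section-4 update.  mu[k] = (mu_k(0), mu_k(1)) are the
    K incoming messages; minimize_f(fixed) returns the minimum of f with
    x_i clamped to its target value and the neighbours in `fixed`
    (a dict k -> value) clamped.  Returns eta_{a->i}(x_i)."""
    K = len(mu)
    # all 2K incoming message values, sorted largest first
    entries = sorted(((mu[k][c], k, c) for k in range(K) for c in (0, 1)),
                     reverse=True)
    fixed = {}        # variables already forced to their complement value
    candidates = []
    for val, k, c in entries:
        if k in fixed:
            # both values of x_k reached: the remaining region is covered
            # by one minimization, so we can terminate early
            candidates.append(max(val, minimize_f(dict(fixed))))
            break
        # branch x_k = c: `val` dominates all remaining messages there
        candidates.append(max(val, minimize_f({**fixed, k: c})))
        fixed[k] = 1 - c
    return min(candidates)
```

In the sketch, `fixed` records the variables forced to their complement by earlier iterations, so each call to `minimize_f` minimizes f over the remaining region only.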
In this application, the sum-product reduction is not tractable: to formulate the makespan problem using a FG we need high-order factors that do not allow an efficient (polynomial-time) sum-product message update. However, min-max propagation can be applied using the efficient updates of the previous section.

5.1 Sum-Product Reduction vs. Min-Max Propagation

Like all belief propagation algorithms, min-max propagation is exact when the FG is a tree. Our first point of interest, however, is how min-max propagation performs on loopy graphs. For this, we compare its performance against the sum-product (or CSP) reduction.

The sum-product reduction of Ravanbakhsh et al. (2014) seeks the min-max value using bisection search over all values in the range of all factors in the FG, i.e., Y = {f_a(X_∂a) ∀ a, X_∂a}. In each step of the search, a value y ∈ Y is used to reduce the min-max problem to a CSP. This CSP is satisfiable iff the min-max solution y* = min_X max_a f_a(X_∂a) is less than the current y. The complexity of this search procedure is O(log(|Y|) τ), where τ is the complexity of solving the CSP. Following that paper, we use Perturbed Belief Propagation (PBP) (Ravanbakhsh and Greiner, 2015) to solve the resulting CSPs.

Experimental Setup. Our setup is based on the following observations.

Observation 1. For any strictly monotonically increasing function g : ℝ → ℝ,

arg min_X max_a f_a(X_∂a) = arg min_X max_a g(f_a(X_∂a)),

that is, only the ordering of the factor values affects the min-max assignment. 
Using the same argument, the application of a monotonic g() does not inherently change the behaviour of min-max propagation either.

Figure 4: Min-max performance (mean min-max solution vs. connectivity) of the different methods on Erdos-Renyi random graphs. Top: N=10, Bottom: N=100; Left: D=4, Middle: D=6, Right: D=8.

Observation 2. Only the factor(s) which output the max value, i.e., the max factor(s), matter. For all other factors, the variables involved can be set in any way, as long as the factors' values remain smaller than or equal to that of the max factor.

This means that variables that do not appear in the max factor(s), which we call free variables, could potentially assume any value without affecting the min-max value. Free variables can be identified by their uniform min-max marginals. This also means that the min-max assignment is not unique. This phenomenon is unique to min-max inference and does not appear in its min-sum and sum-product counterparts.

We rely on this observation in designing benchmark random min-max inference problems: i) we use integers as the range of factor values; ii) by selecting all factor values from the same range, we can use the number of factors as a control parameter for the difficulty of the inference problem.

For N variables x_1, ..., x_N, where each x_i ∈ {1, ..., D}, we draw Erdos-Renyi graphs with edge probability p ∈ (0, 1] and treat each edge as a pairwise factor. Consider the factor f_a(x_i, x_j) = min(π(x_i), π′(x_j)), where π, π′ are permutations of {1, ..., D}. With D = 2, this definition of the factor f_a reduces to a 2-SAT factor. 
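A minimal sketch of drawing one such pairwise benchmark factor (the helper name and the 0-based variable encoding are our own convention, not the paper's):

```python
import random

def random_minmax_factor(D, rng):
    """One pairwise benchmark factor f_a(x_i, x_j) = min(pi(x_i), pi'(x_j)),
    where pi and pi' are random permutations of {1, ..., D}; variables take
    values in {0, ..., D-1} here."""
    pi = list(range(1, D + 1))
    pi2 = list(range(1, D + 1))
    rng.shuffle(pi)
    rng.shuffle(pi2)
    return lambda xi, xj: min(pi[xi], pi2[xj])
```

For any seed, the factor's values span 1 through D: picking the arguments that map to the two permutation maxima attains D, and any argument mapping to 1 attains 1.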
This setup for random min-max instances therefore generalizes different K-SAT settings: a min-max solution min_X max_a f_a(X_∂a) = 1 for D = 2 corresponds to a satisfying assignment. The same argument with K > 2 establishes the “NP-hardness” of min-max inference in factor graphs.

We test our setup on graphs with N ∈ {10, 100} variables and cardinality D ∈ {4, 6, 8}. For each choice of D and N, we run min-max propagation and the sum-product reduction for various connectivities of the Erdos-Renyi graph. Both methods use random sequential updates. For N = 10 we also report the exact min-max solutions.

Min-max propagation is run for a maximum of T = 1000 iterations or until convergence, whichever comes first. The number of iterations actually taken by min-max propagation is reported in the appendix. The PBP used in the sum-product reduction requires a fixed T; we report the results for T equal to the worst-case number of min-max convergence iterations (see appendix) and for T = 1000 iterations. Each setting is repeated 10 times for a random graph of a fixed connectivity value p ∈ (0, 1].

Decimation. To obtain a final min-max assignment we need to fix the free variables. For this we use a decimation scheme similar to what is used with min-sum inference or in finding a satisfying CSP assignment in sum-product. 
We consider three different decimation procedures:

Random: Randomly choose a variable and set it to the state with the minimum min-max marginal value.
Min-value: Fix the variable with the minimum min-max marginal value.
Max-support: Choose the variable whose min value occurs with the highest frequency.

Results. Fig. 4 compares the performance of the sum-product reduction, which relies on PBP, with min-max propagation and brute force. For min-max propagation we report the results for the three different decimation procedures. Each column uses a different variable cardinality D. While this changes the range of values in the factors, we observe a similar trend in the performance of the different methods. In the top row, we also report the exact min-max value. As expected, increasing the number of factors (connectivity) increases the min-max value. Overall, the sum-product reduction (although asymptotically more expensive) produces slightly better results. Also, the different decimation schemes do not significantly affect the results in these experiments.

5.2 Makespan Minimization

The objective in the makespan problem is to schedule a set of given jobs, each with a load, on machines which operate in parallel, such that the total load of the machine with the largest total load (i.e., the makespan) is minimized (Pinedo, 2012). This problem has a range of applications, for example in the energy sector, where the machines represent turbines and the jobs represent electrical power demands.

Figure 5: Makespan FG.

Given N distinct jobs N = {1, ..., n, ..., N} and M machines M = {1, ..., m, ..., M}, where p_nm represents the load that job n places on machine m, we use the binary assignment variable x_nm to indicate whether or not job n is assigned to machine m. The task is to find the set of assignments x_nm ∀ n ∈ N, ∀ m ∈ M which minimizes the cost function below, while satisfying the associated set of constraints:

min_X max_m Σ_{n=1}^N p_nm x_nm   s.t.   Σ_{m=1}^M x_nm = 1,  x_nm ∈ {0, 1}   ∀ n ∈ N, m ∈ M   (13)

Figure 6: Min-max ratio to a lower bound (lower is better) obtained by LPT, with its 4/3-approximation guarantee, versus min-max propagation using the different decimation procedures (random, max-support, min-value). N is the number of jobs and M is the number of machines. In this setting, all jobs have the same run-time across all machines.

The makespan minimization problem is NP-hard for M = 2 and strongly NP-hard for M > 2 (Garey and Johnson, 1979). Two well-known approximation algorithms are the 2-approximation greedy algorithm and the 4/3-approximation greedy algorithm. In the former, all machines are initialized as empty. We then select one job at random and assign it to the machine with the least total load given the current job assignments. We repeat this process until no jobs remain. This algorithm is guaranteed to give a schedule with a makespan no more than 2 times larger than that of the optimal schedule (Behera, 2012; Behera and Laha, 2012). The 4/3-approximation algorithm, a.k.a. 
the Longest Processing Time (LPT) algorithm, operates similarly to the 2-approximation algorithm, with the exception that at each iteration we always take the remaining job with the largest load, rather than selecting one of the remaining jobs at random (Graham, 1966).

[Figure 6 table: per-instance min-max ratios for LPT and the three decimation schemes; the numeric entries were garbled in extraction.]

FG Representation. Fig. 5 shows the FG with binary variables x_nm, where the factors are

f_m(x_1m, ..., x_Nm) = Σ_{n=1}^N p_nm x_nm   ∀ m;
g_n(x_n1, ..., x_nM) = 0 if Σ_{m=1}^M x_nm = 1, and ∞ otherwise,   ∀ n,

where f_m() computes the total load of machine m and g_n() enforces the constraint in Eq. (13). We see that the following min-max problem over this FG minimizes the makespan:

min_X max( max_m f_m(x_1m, ..., x_Nm), max_n g_n(x_n1, ..., x_nM) ).   (14)

Figure 7: Min-max ratio (LP relaxation to that) of min-max propagation versus the same ratio for the method of Vinyals et al. (2013) (higher is better). Modes 0, 1 and 2 correspond to uncorrelated, machine-correlated, and machine-task-correlated processing times, respectively.

Using the procedure of Section 4.1 for passing messages through the g constraints and the procedure of Section 4 for f, we can efficiently approximate the min-max solution of Eq. (14) by message passing. Note that the factor f() in the sum-product reduction of this FG has a non-trivial form that does not allow an efficient message update.

Results. 
In an initial set of experiments, we compare min-max propagation (with the different decimation procedures) with LPT on a set of benchmark experiments designed by Gupta and Ruiz-Torres (2001) for the identical-machine version of the problem, i.e., where a task has the same processing time on all machines.

Fig. 6 shows the scenario where min-max propagation performs best against the LPT algorithm. We see that this scenario involves large instances (from the additional results in the appendix, we see that our framework does not perform as well on small instances). From this table, we also see that max-support decimation almost always outperforms the other decimation schemes.

We then test min-max propagation with max-support decimation against a more difficult version of the problem: the unrelated-machine model, where each job has a different processing time on each machine. Specifically, we compare our method against that of Vinyals et al. (2013), which also uses the distributive law for min-max inference to solve a load-balancing problem. However, that paper studies a sparsified version of the unrelated-machines problem, where tasks are restricted to a subset of the machines (i.e., they have infinite processing time on the remaining machines). This restriction allows the decomposition of their loopy graph into an almost equivalent tree structure, something which cannot be done in the general setting. Nevertheless, we can still compare their results to what we can achieve using min-max propagation with infinite-time constraints.

We use the same problem setup with three different ways of generating the processing times (uncorrelated, machine correlated, and machine/task correlated) and compare our answers to IBM's CPLEX solver, exactly as the authors do in that paper (where a higher ratio is better). Fig. 7 shows a subset of the results. Here again, min-max propagation works best for large instances. 
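For reference, the LPT baseline used in these comparisons can be sketched for the identical-machine setting as a standard greedy with a heap (an illustration, not the authors' code):

```python
import heapq

def lpt_makespan(loads, M):
    """Longest Processing Time rule (4/3-approximation on identical
    machines): take the jobs in decreasing load order, always assigning
    the next job to the currently least-loaded machine; returns the
    resulting makespan."""
    machines = [0.0] * M        # current total load per machine
    heapq.heapify(machines)
    for p in sorted(loads, reverse=True):
        heapq.heappush(machines, heapq.heappop(machines) + p)
    return max(machines)
```

The heap keeps the least-loaded machine at the root, so each assignment costs O(log M) and the whole schedule costs O(N log N + N log M).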
Overall, despite the generality of our approach, the results are comparable.

Table for Figure 7 (ratios; higher is better):

Mode  N/M  (Vinyals et al., 2013)  Min-Max Prop
0     5    0.95 (0.01)             0.93 (0.03)
0     10   0.93 (0.01)             0.94 (0.01)
0     15   0.90 (0.01)             0.94 (0.00)
1     5    0.86 (0.07)             0.90 (0.01)
1     10   0.88 (0.00)             0.90 (0.00)
1     15   0.73 (0.03)             0.87 (0.01)
2     5    0.89 (0.01)             0.81 (0.01)
2     10   0.89 (0.01)             0.81 (0.01)
2     15   0.86 (0.01)             0.78 (0.01)

6 Conclusion

This paper demonstrates that FGs are well suited to modelling min-max optimization problems with factorization characteristics. To solve such problems, we introduced and evaluated min-max propagation, a variation of the well-known belief propagation algorithm. In particular, we introduced an efficient procedure for passing min-max messages through high-order factors that applies to a wide range of functions. This procedure equips min-max propagation with ammunition unavailable to min-sum and sum-product message passing, and it could enable its application to a wide range of problems. In this work we demonstrated how to leverage efficient min-max propagation in the presence of high-order factors, in approximating the NP-hard problem of makespan minimization. In the future, we plan to investigate the application of min-max propagation to a variety of combinatorial problems, known as bottleneck problems (Edmonds and Fulkerson, 1970), that can be naturally formulated as min-max inference problems over FGs.

References

S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325-343, 2000.

D. Behera. Complexity on parallel machine scheduling: A review. In S. Sathiyamoorthy, B. E. Caroline, and J. G. Jayanthi, editors, Emerging Trends in Science, Engineering and Technology, Lecture Notes in Mechanical Engineering, pages 373-381. Springer India, 2012.

D. K. Behera and D. Laha. Comparison of heuristics for identical parallel machine scheduling. Advanced Materials Research, 488:1708-1712, 2012.

C. M. Bishop. 
Pattern recognition and machine learning. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

J. Edmonds and D. R. Fulkerson. Bottleneck extrema. Journal of Combinatorial Theory, 8(3):299-306, 1970.

M. H. Gail, J. H. Lubin, and L. V. Rubinstein. Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika, pages 703-707, 1981.

M. R. Garey and D. S. Johnson. Computers and intractability, volume 174. Freeman, San Francisco, 1979.

R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45(9):1563-1581, 1966.

J. N. D. Gupta and A. J. Ruiz-Torres. A listfit heuristic for minimizing makespan on identical parallel machines. Production Planning & Control, 12(1):28-36, 2001.

R. Gupta, A. A. Diwan, and S. Sarawagi. Efficient inference with cardinality-based clique potentials. In Proceedings of the 24th International Conference on Machine Learning, pages 329-336. ACM, 2007.

F. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498-519, 2001.

M. Pinedo. Scheduling: theory, algorithms, and systems. Springer, 2012.

B. Potetz and T. S. Lee. Efficient belief propagation for higher-order cliques using linear constraint nodes. Computer Vision and Image Understanding, 112(1):39-54, 2008.

S. Ravanbakhsh and R. Greiner. Perturbed message passing for constraint satisfaction problems. Journal of Machine Learning Research, 16:1249-1274, 2015.

S. Ravanbakhsh, C. Srinivasa, B. Frey, and R. Greiner. Min-max problems on factor graphs. In Proceedings of the 31st International Conference on Machine Learning, ICML '14, 2014.

D. Tarlow, I. Givoni, and R. Zemel. 
HOP-MAP: Efficient message passing with high order potentials. Journal of Machine Learning Research - Proceedings Track, 9:812-819, 2010.

M. Vinyals, K. S. Macarthur, A. Farinelli, S. D. Ramchurn, and N. R. Jennings. A message-passing approach to decentralized parallel machine scheduling. The Computer Journal, 2013.
", "award": [], "sourceid": 2870, "authors": [{"given_name": "Christopher", "family_name": "Srinivasa", "institution": "University of Toronto/Borealis AI"}, {"given_name": "Inmar", "family_name": "Givoni", "institution": "University of Toronto"}, {"given_name": "Siamak", "family_name": "Ravanbakhsh", "institution": "CMU/UBC"}, {"given_name": "Brendan", "family_name": "Frey", "institution": "Deep Genomics, Vector Institute, Univ. Toronto"}]}