{"title": "Solving Decision Problems with Limited Information", "book": "Advances in Neural Information Processing Systems", "page_first": 603, "page_last": 611, "abstract": "We present a new algorithm for exactly solving decision-making problems represented as an influence diagram. We do not require the usual assumptions of no forgetting and regularity, which allows us to solve problems with limited information. The algorithm, which implements a sophisticated variable elimination procedure, is empirically shown to outperform a state-of-the-art algorithm in randomly generated problems of up to 150 variables and $10^{64}$ strategies.", "full_text": "Solving Decision Problems with Limited Information\n\nDenis D. Mau\u00b4a\n\nIDSIA\n\nManno, CH 6928\n\ndenis@idsia.ch\n\nCassio P. de Campos\n\nIDSIA\n\nManno, CH 6928\n\ncassio@idsia.ch\n\nAbstract\n\nWe present a new algorithm for exactly solving decision-making problems rep-\nresented as an in\ufb02uence diagram. We do not require the usual assumptions of\nno forgetting and regularity, which allows us to solve problems with limited in-\nformation. The algorithm, which implements a sophisticated variable elimination\nprocedure, is empirically shown to outperform a state-of-the-art algorithm in ran-\ndomly generated problems of up to 150 variables and 1064 strategies.\n\n1\n\nIntroduction\n\nIn many tasks, bounded resources and physical constraints force decisions to be made based on lim-\nited information [1, 2]. For instance, a policy for a partially observable Markov decision process\n(POMDP) might be forced to disregard part of the available information in order to meet computa-\ntional demands [3]. 
Cooperative multi-agent settings offer another such example: each agent might perceive only its surroundings and be unable to communicate with all other agents; hence, a policy specifying an agent's behavior must rely exclusively on local information [4]; it might be further constrained to a maximum size to be computationally tractable [5].

Influence diagrams [6] are representational devices for utility-based decision making under uncertainty. Many popular decision-making frameworks such as finite-horizon POMDPs can be cast as influence diagrams [7]. Traditionally, influence diagrams target problems involving a single, non-forgetful decision maker; this makes them ill-suited to represent decision making with limited information. Limited memory influence diagrams (LIMIDs) generalize influence diagrams to allow for (explicit representation of) bounded-memory policies and simultaneous decisions [1, 2]. More precisely, LIMIDs relax the regularity and no-forgetting assumptions of influence diagrams, namely, that there is a complete temporal ordering over the decisions, and that observations and decisions are permanently remembered.

Solving a LIMID refers to finding a combination of policies that maximizes expected utility. This task has been empirically and theoretically shown to be a very hard problem [8]. Under certain graph-structural conditions (which no forgetting and regularity imply), Lauritzen and Nilsson [2] show that LIMIDs can be solved by dynamic programming with complexity exponential in the treewidth of the graph. However, when these conditions are not met, their iterative algorithm might converge to a local optimum that is far from the global optimum.
Recently, de Campos and Ji [8] formulated the CR (Credal Reformulation) algorithm, which solves a LIMID by mapping it into a mixed integer programming problem; they show that CR is able to solve small problems exactly and obtain good approximations for medium-sized problems.

In this paper, we formally describe LIMIDs (Section 2) and show that policies can be partially ordered, and that the ordering can be extended monotonically, allowing for the generalized variable elimination procedure in Section 3. We show experimentally in Section 4 that the algorithm built on these ideas can enormously save computational resources, allowing many problems to be solved exactly. In fact, our algorithm is orders of magnitude faster than the CR algorithm on randomly generated diagrams containing up to 150 variables. Finally, we present our conclusions in Section 5.

2 Limited memory influence diagrams

In the LIMID formalism, the quantities and events of interest are represented by three distinct types of variables or nodes: chance variables (oval nodes) represent events over which the decision maker has no control, such as outcomes of tests or consequences of actions; decision variables (square nodes) represent the alternatives a decision maker might have; value variables (diamond-shaped nodes) represent additive parcels of the overall utility. Let U be the set of all variables relevant to a problem. Each variable X in U has an associated domain ΩX, which is the finite non-empty set of values or states X can assume. The empty domain Ω∅ ≜ {λ} contains a single element λ that is not in any other domain.
Decision and chance variables have domains different from the empty domain, whereas value variables are always associated to the empty domain. The domain Ωx of a set of variables x = {X1, . . . , Xn} ⊆ U is the Cartesian product ΩX1 × ··· × ΩXn of the variable domains. If x and y are sets of variables such that y ⊆ x ⊆ U, and x is an element of the domain Ωx, we write x↓y to denote the projection of x onto the smaller domain Ωy, that is, x↓y ∈ Ωy contains only the components of x that are compatible with the variables in y. By convention, x↓∅ ≜ λ. The cylindrical extension of y ∈ Ωy to Ωx is the set y↑x ≜ {x ∈ Ωx : x↓y = y}. Oftentimes, if clear from the context, we write X1 ··· Xn to denote the set {X1, . . . , Xn}, and X to denote {X}.

We notate pointwise comparison of functions implicitly. For example, if f and g are real-valued functions over a domain Ωx and k is a real number, we write f ≥ g and f = k meaning f(x) ≥ g(x) and f(x) = k, respectively, for all x ∈ Ωx. Any function over a domain containing a single element is identified with the real number it returns. If f and g are functions over domains Ωx and Ωy, respectively, their product fg is the function over Ωx∪y such that (fg)(w) = f(w↓x)g(w↓y) for all w. Sum of functions is defined analogously: (f + g)(w) = f(w↓x) + g(w↓y). If f is a function over Ωx, and y ⊆ U, the sum-marginal ∑_y f returns a function over Ω_{x\y} such that for any element w of its domain we have (∑_y f)(w) = ∑_{x ∈ w↑x} f(x). Notice that if y ∩ x = ∅, then ∑_y f = f.

Let C, D and V denote the sets of chance, decision and value variables, respectively, in U. A LIMID L is an annotated directed acyclic graph (DAG) over the set of variables U, where the nodes in V have no children. The precise meanings of the arcs in L vary according to the type of node to which they point.
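As an illustrative aside (our own Python sketch, not part of the paper), the notation above — projections x↓y, pointwise products and sums of functions, and the sum-marginal ∑_y f — can be made concrete by representing a function over Ωx as a table indexed by joint states:

```python
from itertools import product

# Illustrative sketch only; the representation and names are ours. A function
# over a domain Omega_x is a pair (scope, table): scope is a tuple of variable
# names, table maps joint states (value tuples aligned with scope) to reals.

def project(scope, state, subscope):
    # x↓y: keep only the components of `state` compatible with `subscope`
    index = {v: i for i, v in enumerate(scope)}
    return tuple(state[index[v]] for v in subscope)

def pointwise(f, g, domains, op):
    # (f op g)(w) = op(f(w↓x), g(w↓y)) for every w in Omega_{x ∪ y}
    (sx, tx), (sy, ty) = f, g
    scope = tuple(sorted(set(sx) | set(sy)))
    table = {w: op(tx[project(scope, w, sx)], ty[project(scope, w, sy)])
             for w in product(*(domains[v] for v in scope))}
    return scope, table

def sum_marginal(f, elim):
    # (∑_y f)(w): sum f over all joint states that project onto w
    sx, tx = f
    scope = tuple(v for v in sx if v not in elim)
    table = {}
    for x, value in tx.items():
        w = project(sx, x, scope)
        table[w] = table.get(w, 0.0) + value
    return scope, table

# Tiny check: multiply f over {A} with g over {B}, then sum out B.
domains = {"A": [0, 1], "B": [0, 1, 2]}
f = (("A",), {(0,): 0.2, (1,): 0.8})
g = (("B",), {(0,): 1.0, (1,): 2.0, (2,): 3.0})
fg = pointwise(f, g, domains, lambda a, b: a * b)
h = sum_marginal(fg, {"B"})
print(h[0], {k: round(v, 6) for k, v in h[1].items()})  # → ('A',) {(0,): 1.2, (1,): 4.8}
```

Summing out B recovers f scaled by ∑_b g(b) = 6, consistent with ∑_y (fg) = f ∑_y g when the scopes of f and g are disjoint.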
Arcs entering chance and value nodes denote stochastic and functional dependency, respectively; arcs entering decision nodes describe information awareness or relevance at the time the decision is made. If X is a node in L, we denote by paX the set of parents of X, that is, the set of nodes of L from which there is an arc pointing to X. Similarly, we let chX denote the set of children of X (i.e., nodes to which there is an arc from X), and faX ≜ paX ∪ {X} denote its family. Each chance variable C in C has an associated function p^{paC}_C specifying the probability Pr(C = x↓C | paC = x↓paC) of C assuming value x↓C ∈ ΩC given that the parents take on values x↓paC ∈ ΩpaC, for all x ∈ ΩfaC. We assume that the probabilities associated to any chance node respect the Markov condition, that is, that any variable X ∈ C is stochastically independent of its non-descendant non-parents given its parents. Each value variable V ∈ V is associated to a bounded real-valued utility function uV over ΩpaV, which quantifies the (additive) contribution of the states of its parents to the overall utility. Thus, the overall utility of a joint state x ∈ ΩC∪D is given by the sum of utility functions ∑_{V∈V} uV(x↓paV). For any decision variable D ∈ D, a policy δD specifies an action for each possible state configuration of its parents, that is, δD : ΩpaD → ΩD. If D has no parents, then δD is a function from the empty domain to ΩD, and therefore constitutes a choice of x ∈ ΩD. The set of all policies δD for a variable D is denoted by ΔD.

To illustrate the use of LIMIDs, consider the following example involving a memoryless robot in a 5-by-5 gridworld (Figure 1a).
The robot has 9 time steps to first reach a position sA of the grid, for which it receives 10 points, and then a position sB, for which it is rewarded with 20 points. If the positions are visited in the wrong order, or if a position is re-visited, no reward is given. At each step, the robot can perform actions move north, south, east or west, which cost 1 point and succeed with 0.9 probability, or do nothing, which incurs no cost and always succeeds. Finally, the robot can estimate its position in the grid by measuring the distance to each of the four walls. The estimated position is correct 70% of the time, wrong by one square 20% of the time, and by two squares 10% of the time. The LIMID in Figure 1b formally represents the environment and the robot behavior.

[Figure 1b depicts the time-indexed variables Ot, Dt, Ct, St, At, Bt and Rt connected across the time steps.]

Figure 1: (a) A robot R in a 5-by-5 gridworld with two goal-states. (b) The corresponding LIMID.

The action taken by the robot at time step t is represented by variable Dt (t = 1, . . . , 8). The costs associated to decisions are represented by variables Ct, which have associated functions uCt that return zero if Dt = nothing, and −1 otherwise. The variables St (t = 1, . . . , 9) represent the robot's actual position at time step t, while variables Ot denote its estimated position. The function p^{St−1 Dt}_{St} associated to St specifies the probabilities Pr(St = st | St−1 = st−1, Dt = dt) of transitioning to state St = st from a state St−1 = st−1 when the robot executes action Dt = dt. The function p^{St}_{Ot} is associated to Ot and quantifies the likelihood of estimating position Ot = ot when in position St = st.
We use binary variables At and Bt to denote whether positions sA and sB, respectively, have been visited by the robot before time step t. Hence, the function p^{At−1 St−1}_{At} associated to At equals one for At = y if St−1 = sA or At−1 = y, and zero otherwise. Likewise, the function p^{Bt−1 St−1}_{Bt} equals one for Bt = y only if either St−1 = sB or Bt−1 = y. The reward received by the robot in step t is represented by variable Rt. The utility function uRt associated to Rt equals 10 if st = sA and At = n and Bt = n, 20 if st = sB and At = y and Bt = n, and zero otherwise.

Let Δ ≜ ×_{D∈D} ΔD denote the space of possible combinations of policies. An element s = (δD)_{D∈D} ∈ Δ is said to be a strategy for L. Given a policy δD, let p^{paD}_D denote a function such that for each x ∈ ΩfaD it equals one if x↓D = δD(x↓paD) and zero otherwise. In other words, p^{paD}_D is a conditional probability table representing policy δD. There is a one-to-one correspondence between functions p^{paD}_D and policies δD ∈ ΔD, and specifying a policy δD is equivalent to specifying p^{paD}_D. We denote the set of all functions p^{paD}_D by PD. A strategy s induces a joint probability mass function over the variables in C ∪ D by

p_s ≜ ∏_{C∈C} p^{paC}_C ∏_{D∈D} p^{paD}_D,    (1)

and has an associated expected utility given by

E_s[L] ≜ ∑_{x∈Ω_{C∪D}} p_s(x) ∑_{V∈V} uV(x↓paV) = ∑_{C∪D} p_s ∑_{V∈V} uV.    (2)

The treewidth of a graph measures its resemblance to a tree and is given by the number of vertices in the largest clique of the corresponding triangulated moral graph minus one.
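To make equations (1) and (2) concrete, here is a brute-force sketch (our own toy Python example; the diagram, numbers and names are ours, not the authors') that evaluates E_s[L] by enumerating Ω_{C∪D}, and finds the MEU by enumerating all policies:

```python
from itertools import product

# Toy LIMID (our own, for illustration): one chance variable C, one decision D
# observing C, one value node with parents {C, D}. States are 0/1.
p_C = {0: 0.3, 1: 0.7}                 # Pr(C = c)
u = {(0, 0): 1.0, (0, 1): 0.0,          # u_V(c, d)
     (1, 0): 0.0, (1, 1): 2.0}

def expected_utility(policy):
    # Equation (2): E_s[L] = sum over joint states of p_s(x) * u_V(x), where
    # p_s includes the degenerate CPT p^{paD}_D putting mass 1 on d = policy[c].
    total = 0.0
    for c, d in product(p_C, (0, 1)):
        p_s = p_C[c] * (1.0 if d == policy[c] else 0.0)
        total += p_s * u[(c, d)]
    return total

# MEU by brute force over all |Omega_D|^|Omega_C| = 4 policies delta_D.
best = max(
    ({0: d0, 1: d1} for d0 in (0, 1) for d1 in (0, 1)),
    key=expected_utility,
)
print(best, round(expected_utility(best), 6))  # → {0: 0, 1: 1} 1.7
```

The optimal policy matches the decision to the observed state, giving 0.3·1 + 0.7·2 = 1.7; the brute-force enumeration is exactly the search space that the algorithm of Section 3 avoids.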
Given a LIMID L of treewidth ω, we can evaluate the expected utility of any strategy s in time and space at most exponential in ω. Hence, if ω is bounded by a constant, computing Es[L] takes polynomial time [9]. The primary task of a LIMID is to find an optimal strategy s∗ with maximal expected utility, that is, to find s∗ such that Es[L] ≤ Es∗[L] for all s ∈ Δ. The value Es∗[L] is called the maximum expected utility of L and is denoted by MEU[L]. In the LIMID of Figure 1, the goal is to find an optimal strategy s = (δD1, . . . , δD8), where the optimal policies δDt for t = 1, . . . , 8 prescribe an action in ΩDt = {north, south, west, east, nothing} for each possible estimated position in ΩOt.

For most real problems, enumerating all the strategies is prohibitively costly. In fact, computing the MEU is NP-hard even in bounded-treewidth diagrams [8]. It is well known that any LIMID L can be mapped into an equivalent LIMID L′ where all utilities take values in the real interval [0, 1] [10]. The mapping preserves optimality of strategies, that is, any optimal strategy for L′ is also an optimal strategy for L (and vice-versa). This allows us, in the rest of the paper, to focus on LIMIDs whose utilities are defined in [0, 1] with no loss of generality for the algorithm we devise.

3 A fast algorithm for solving LIMIDs exactly

The basic ingredients of our algorithmic framework for representing and handling information in LIMIDs are the so-called valuations, which encode information (probabilities, utilities and policies) about the elements of a domain. Each valuation is associated to a subset of the variables in U, called its scope.
More concretely, we define a valuation φ with scope x as a pair (p, u) of bounded nonnegative real-valued functions p and u over the domain Ωx; we refer to p and u as the probability and utility part, respectively, of φ. Often, we write φx to make explicit the scope x of a valuation φ. For any x ⊆ U, we denote the set of all possible valuations with scope x by Φx. The set of all possible valuations is given by Φ ≜ ⋃_{x⊆U} Φx. The set Φ is closed under the operations of combination and marginalization. Combination represents the aggregation of information and is defined as follows. If φ = (p, u) and ψ = (q, v) are valuations with scopes x and y, respectively, their combination φ ⊗ ψ is the valuation (pq, pv + qu) with scope x ∪ y. Marginalization, on the other hand, acts by coarsening information. If φ = (p, u) is a valuation with scope x, and y is a set of variables such that y ⊆ x, the marginal φ^{↓y} is the valuation (∑_{x\y} p, ∑_{x\y} u) with scope y. In this case, we say that z ≜ x \ y has been eliminated from φ, which we denote by φ^{−z}. The following result shows that our framework respects the necessary conditions for computing efficiently with valuations (in the sense of keeping the scope of valuations minimal during the variable elimination procedure).

Proposition 1. The system (Φ, U, ⊗, ↓) satisfies the following three axioms of a (weak) labeled valuation algebra [11, 12].
(A1) For any φ1, φ2, φ3 ∈ Φ we have that φ1 ⊗ φ2 = φ2 ⊗ φ1 and φ1 ⊗ (φ2 ⊗ φ3) = (φ1 ⊗ φ2) ⊗ φ3.
(A2) For any φz ∈ Φz and y ⊆ x ⊆ z we have that (φz^{↓x})^{↓y} = φz^{↓y}.
(A3) For any φx ∈ Φx, φy ∈ Φy and x ⊆ z ⊆ x ∪ y we have that (φx ⊗ φy)^{↓z} = φx ⊗ φy^{↓y∩z}.

Proof.
(A1) follows directly from commutativity, associativity and distributivity of product and sum of real-valued functions, and (A2) follows directly from commutativity of the sum-marginal operation. To show (A3), consider any two valuations (p, u) and (q, v) with scopes x and y, respectively, and a set z such that x ⊆ z ⊆ x ∪ y. By definition of combination and marginalization, we have that [(p, u) ⊗ (q, v)]^{↓z} = (∑_{x∪y\z} pq, ∑_{x∪y\z} (pv + qu)). Since x ∪ y \ z = y \ z, and p and u are functions over Ωx, it follows that (∑_{x∪y\z} pq, ∑_{x∪y\z} (pv + qu)) = (p ∑_{y\z} q, p ∑_{y\z} v + u ∑_{y\z} q), which equals (p, u) ⊗ (∑_{y\z} q, ∑_{y\z} v) = (p, u) ⊗ (q, v)^{↓y∩z}. Hence, [(p, u) ⊗ (q, v)]^{↓z} = (p, u) ⊗ (q, v)^{↓y∩z}.

The following lemma is a direct consequence of (A3) shown by [12], required to prove the correctness of our algorithm later on.

Lemma 2. If z ⊆ y and z ∩ x = ∅ then (φx ⊗ φy)^{−z} = φx ⊗ φy^{−z}.

The framework of valuations allows us to compute the expected utility of a given strategy efficiently:

Proposition 3. Given a LIMID L and a strategy s = (δD)_{D∈D}, let

φs ≜ [⊗_{C∈C} (p^{paC}_C, 0)] ⊗ [⊗_{D∈D} (p^{paD}_D, 0)] ⊗ [⊗_{V∈V} (1, uV)],    (3)

where, for each D, p^{paD}_D is the function in PD associated with policy δD. Then φs^{↓∅} = (1, Es[L]).

Proof. Let p and u denote the probability and utility part, respectively, of φs^{↓∅}. By definition of combination, we have that φs = (ps, ps ∑_{V∈V} uV), where ps = ∏_{X∈C∪D} p^{paX}_X as in (1). Since ps is a probability distribution over C ∪ D, it follows that p = ∑_{x∈Ω_{C∪D}} ps(x) = 1.
Finally, u = ∑_{C∪D} ps ∑_{V∈V} uV, which equals Es[L] by (2). Hence φs^{↓∅} = (1, Es[L]).

Input: elimination ordering B < C < A and strategy s = (δB, δC)
Initialization: φA = (pA, 0), φB = (p^A_B, 0), φC = (p^A_C, 0), φD = (1, uD), φE = (1, uE)
Propagation: ψ1 = (φB ⊗ φD)^{−B}, ψ2 = (φC ⊗ φE)^{−C}, ψ3 = (ψ1 ⊗ ψ2 ⊗ φA)^{−A}
Termination: return the utility part of φs^{↓∅} = ψ3

Figure 2: Computing the expected utility of a strategy by variable elimination in a small LIMID with nodes A, B, C, D and E.

Given any strategy s, we can use a variable elimination procedure to efficiently compute φs^{↓∅} and hence its expected utility in time polynomial in the largest domain of a variable but exponential in the width of the elimination ordering.¹ Figure 2 shows a variable elimination procedure used to compute the expected utility of a strategy of the simple LIMID on the left-hand side. However, computing the MEU in this way is infeasible for any reasonable diagram due to the large number of strategies that would need to be enumerated. For example, if the variables A, B and C in the LIMID in Figure 2 have each ten states, there are 10^10 · 10^10 = 10^20 possible strategies.

In order to avoid considering all possible strategies, we define a partial order (i.e., a reflexive, antisymmetric and transitive relation) over Φ as follows. For any two valuations φ = (p, u) and ψ = (q, v) in Φ, if φ and ψ have equal scope, p ≤ q and u ≤ v, then φ ≤ ψ holds. The following result shows that ≤ is monotonic with respect to combination and marginalization.

Proposition 4.
The system (Φ, U, ⊗, ↓, ≤) satisfies the following two additional axioms of an ordered valuation algebra [13].
(A4) If φx ≤ ψx and φy ≤ ψy, then (φx ⊗ φy) ≤ (ψx ⊗ ψy).
(A5) If φx ≤ ψx then φx^{↓y} ≤ ψx^{↓y}.

Proof. (A4). Consider two valuations (px, ux) and (qx, vx) with scope x such that (px, ux) ≤ (qx, vx), and two valuations (py, uy) and (qy, vy) with scope y satisfying (py, uy) ≤ (qy, vy). By definition of ≤, we have that px ≤ qx, ux ≤ vx, py ≤ qy and uy ≤ vy. Since all functions are nonnegative, it follows that pxpy ≤ qxqy, pxuy ≤ qxvy and pyux ≤ qyvx. Hence, (px, ux) ⊗ (py, uy) = (pxpy, pxuy + pyux) ≤ (qxqy, qxvy + qyvx) = (qx, vx) ⊗ (qy, vy). (A5). Let y be a subset of x. It follows from monotonicity of ≤ with respect to addition of real numbers that (px, ux)^{↓y} = (∑_{x\y} px, ∑_{x\y} ux) ≤ (∑_{x\y} qx, ∑_{x\y} vx) = (qx, vx)^{↓y}.

The monotonicity of ≤ allows us to detect suboptimal strategies during variable elimination. To illustrate this, consider the variable elimination scheme in Figure 2 for two different strategies s and s′, and let ψ^s_1, ψ^s_2, ψ^s_3 be the valuations produced in the propagation step for strategy s and ψ^{s′}_1, ψ^{s′}_2, ψ^{s′}_3 the valuations for s′. If ψ^s_1 ≤ ψ^{s′}_1 and ψ^s_2 ≤ ψ^{s′}_2 then Proposition 4 tells us that ψ^s_3 ≤ ψ^{s′}_3, which implies Es[L] ≤ Es′[L]. As a consequence, we can abort variable elimination for s after the second iteration. We can also exploit the redundancy between valuations produced during variable elimination for neighbor strategies.
For example, if s and s′ specify the same policy for B, then we know in advance that ψ^s_1 = ψ^{s′}_1, so that only one of them needs to be computed.

In order to facilitate the description of our algorithm, we define operations over sets of valuations. If Ψx is a set of valuations with scope x and Ψy is a set of valuations with scope y, the operation Ψx ⊗ Ψy ≜ {φx ⊗ φy : φx ∈ Ψx, φy ∈ Ψy} returns the set of combinations of a valuation in Ψx and a valuation in Ψy. For X ∈ x, the operation Ψx^{−X} ≜ {φx^{−X} : φx ∈ Ψx} eliminates variable X from all valuations in Ψx. Given a finite set of valuations Ψ ⊆ Φ, we say that a valuation φ ∈ Ψ is maximal if for all ψ ∈ Ψ such that φ ≤ ψ it holds that ψ ≤ φ. The operator prune returns the set prune(Ψ) of maximal valuations of Ψ (by pruning non-maximal valuations).

¹The width of an elimination ordering is the maximum cardinality of the scope of a valuation produced during variable elimination minus one.

We are now ready to describe the Multiple Policy Updating (MPU) algorithm, which solves arbitrary LIMIDs exactly. Consider a LIMID L and an elimination ordering X1 < ··· < Xn over the variables in C ∪ D. The elimination ordering can be selected using the standard methods for Bayesian networks [9]. Note that unlike standard algorithms for variable elimination in influence diagrams we allow any elimination ordering. The algorithm is initialized by generating one set of valuations for each variable X in U as follows.

Initialization: Let V0 be initially the empty set.

1. For each chance variable X ∈ C, add the singleton ΨX ≜ {(p^{paX}_X, 0)} to V0;
2.
For each decision variable X ∈ D, add the set of valuations ΨX ≜ {(p^{paX}_X, 0) : p^{paX}_X ∈ PX} to V0;
3. For each value variable X ∈ V, add the singleton ΨX ≜ {(1, uX)} to V0.

Once V0 has been initialized with a set of valuations for each variable in the diagram, we recursively eliminate a variable Xi in C ∪ D in the given ordering and remove any non-maximal valuation:

Propagation: For i = 1, . . . , n do:

1. Let Bi be the set of all sets of valuations in Vi−1 whose scope contains Xi;
2. Compute Ψi ≜ prune([⊗_{Ψ∈Bi} Ψ]^{−Xi});
3. Set Vi ≜ (Vi−1 ∪ {Ψi}) \ Bi.

Finally, the algorithm outputs the utility part of the single maximal valuation in the set ⊗_{Ψ∈Vn} Ψ:

Termination: Return the real number u such that (p, u) ∈ prune(⊗_{Ψ∈Vn} Ψ).

u is a real number because the valuations in ⊗_{Ψ∈Vn} Ψ have empty scope and thus both their probability and utility parts are identified with real numbers. The following result is a straightforward extension of [14, Lemma 1(iv)] that is needed to guarantee the correctness of discarding non-maximal valuations in the propagation step.

Lemma 5 (Distributivity of maximality). If Ψx and Ψy are two sets of ordered valuations and z ⊆ x then (i) prune(Ψx ⊗ prune(Ψy)) = prune(Ψx ⊗ Ψy) and (ii) prune(prune(Ψx)^{↓z}) = prune(Ψx^{↓z}).

The result shows that, like marginalization, the prune operation distributes over any factorization ⊗_{X∈U} ΨX. The following lemma shows that at any iteration i of the propagation step the combination of all sets in the current pool of sets Vi produces the set of maximal valuations of the initial factorization.

Lemma 6. For i ∈ {1, . . .
, n}, it follows that prune([⊗_{Ψ∈V0} Ψ]^{−X1···Xi}) = prune(⊗_{Ψ∈Vi} Ψ).

Proof. We show the result by induction on i. The basis is easily obtained by applying Lemmas 2 and 5 and the axioms of valuation algebra to prune([⊗_{Ψ∈V0} Ψ]^{−X1}) in order to obtain prune(⊗_{Ψ∈V1} Ψ). For the induction step, assume the result holds at i, that is, prune([⊗_{Ψ∈V0} Ψ]^{−X1···Xi}) = prune(⊗_{Ψ∈Vi} Ψ). By eliminating Xi+1 from both sides and then applying the prune operation we get prune([prune([⊗_{Ψ∈V0} Ψ]^{−X1···Xi})]^{−Xi+1}) = prune([prune(⊗_{Ψ∈Vi} Ψ)]^{−Xi+1}). By Lemma 5(ii) and (A2), we have that prune([⊗_{Ψ∈V0} Ψ]^{−{X1···Xi+1}}) = prune([⊗_{Ψ∈Vi} Ψ]^{−Xi+1}). It follows from (A1) and Lemma 2 that the right-hand part equals prune((⊗_{Ψ∈Vi\Bi+1} Ψ) ⊗ [(⊗_{Ψ∈Bi+1} Ψ)]^{−Xi+1}), which by Lemma 5(i) equals prune((⊗_{Ψ∈Vi\Bi+1} Ψ) ⊗ prune([(⊗_{Ψ∈Bi+1} Ψ)]^{−Xi+1})), which by definition of Vi+1 equals prune(⊗_{Ψ∈Vi+1} Ψ).

Let ΨL ≜ {φs : s ∈ Δ}, where φs is given by (3). According to Proposition 3, each element φs^{−X1···Xn} in ΨL^{−X1···Xn} is a valuation whose probability part is one and utility part equals Es[L]. Thus, the maximal expected utility MEU[L] is the utility part of the single valuation in prune(ΨL^{−X1···Xn}).
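As a minimal sketch of this pruning machinery (our own Python, not the authors' implementation; valuations are reduced here to empty-scope pairs of numbers, whereas in general the dominance test is componentwise on functions), the relation ≤ and the prune operator can be written as:

```python
# Illustrative sketch: a valuation is a (probability, utility) pair of numbers.
# phi ≤ psi iff p ≤ q and u ≤ v; prune keeps the maximal (undominated) elements.

def leq(phi, psi):
    return phi[0] <= psi[0] and phi[1] <= psi[1]

def prune(valuations):
    # keep phi unless some psi strictly dominates it (phi ≤ psi but not psi ≤ phi)
    return [phi for phi in valuations
            if not any(leq(phi, psi) and not leq(psi, phi) for psi in valuations)]

vals = [(1.0, 0.5), (0.8, 0.7), (0.9, 0.4), (0.7, 0.2)]
print(prune(vals))  # → [(1.0, 0.5), (0.8, 0.7)]
```

Here (0.9, 0.4) and (0.7, 0.2) are dominated and discarded, while the two incomparable valuations survive; this is the source of the savings observed in the experiments.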
It is not difficult to see that after the initialization step, the set V0 contains sets Ψ of valuations such that ⊗_{Ψ∈V0} Ψ = ΨL. Hence, Lemma 6 states that after the last iteration, MPU produces a set Vn of sets of valuations such that prune(⊗_{Ψ∈Vn} Ψ) = prune(ΨL^{−X1···Xn}), whose single element has utility part MEU[L]. This is precisely what the following theorem shows.

Theorem 7. Given a LIMID L, MPU outputs MEU[L].

Proof. The algorithm returns the utility part of a valuation (p, u) in prune(⊗_{Ψ∈Vn} Ψ), which, by Lemma 6 for i = n, equals prune([⊗_{Ψ∈V0} Ψ]^{↓∅}). By definition of V0, any valuation φ in ⊗_{Ψ∈V0} Ψ factorizes as in (3). Also, there is exactly one valuation φ ∈ ⊗_{Ψ∈V0} Ψ for each strategy in Δ. Hence, by Proposition 3, the set (⊗_{Ψ∈V0} Ψ)^{↓∅} contains a pair (1, Es[L]) for every strategy s inducing a distinct expected utility. Moreover, since functions with empty scope correspond to numbers, the relation ≤ specifies a total ordering over the valuations in (⊗_{Ψ∈V0} Ψ)^{↓∅}, which implies a single maximal element. Let s∗ be a strategy associated to (p, u). Since (p, u) ∈ prune([⊗_{Ψ∈V0} Ψ]^{↓∅}), it follows from maximality that Es∗[L] ≥ Es[L] for all s, and hence u = MEU[L].

The time complexity of the algorithm is given by the cost of creating the sets of valuations in the initialization step plus the overall cost of the combination and marginalization operations performed during the propagation step.
Regarding the initialization step, the loops for chance and value variables generate singletons, and thus take time linear in the input. For any decision variable D, let ρD ≜ |ΩD|^{|ΩpaD|} denote the number of policies in ΔD (which coincides with the number of functions in PD). There is exactly one valuation in the set ΨD in V0 for every policy in ΔD. Also, let ρ ≜ max_{D∈D} ρD be the cardinality of the largest policy set. Then the initialization loop for decision variables takes O(|D|ρ) time, which is exponential in the input (the sets of policies are not considered as an input of the problem). Let us analyze the propagation step. As with any variable elimination procedure, the running time of propagating (sets of) valuations is exponential in the width of the given ordering, which is in the best case given by the treewidth of the diagram. Consider the case of an ordering with bounded width ω and a bounded number of states per variable κ. Then the cost of each combination or marginalization is bounded by a constant, and the complexity depends only on the number of operations performed. Let ν denote the cardinality of the largest set Ψi, for i = 1, . . . , n. Computing Ψi requires at most ν^{|U|−1} operations of combination and ν operations of marginalization. In the worst case, ν is equal to ρ^{|D|} ≤ O(κ^{|D|κ^ω}), that is, all sets associated to decision variables have been combined without discarding any valuation. Hence, the worst-case complexity of the propagation step is exponential in the input, even if the ordering width and the number of states per variable are bounded. This is not surprising given that the problem is still NP-hard in these cases.
However, this is a very pessimistic scenario and, on average, the removal of non-maximal elements greatly reduces the complexity, as we show in the next section.

4 Experiments

We evaluate the performance of the algorithms on random LIMIDs generated in the following way. Each LIMID is parameterized by the number of decision nodes d, the number of chance nodes c, the maximum cardinality of the domain of a chance variable ωC, and the maximum cardinality of the domain of a decision variable ωD. We set the number of value nodes v to be d + 2. For each variable Xi, i = 1, . . . , c + d + v, we sample ΩXi to contain from 2 to 4 states. Then we repeatedly add an arc from a decision node with no children to a value node with no parents (so that each decision node has at least one value node as a child). This step guarantees that all decisions are relevant for the computation of the MEU. Finally, we repeatedly add an arc that neither makes the domain of a variable greater than the given bounds nor makes the treewidth more than 10, until no arcs can be added without exceeding the bounds.² Note that this generates diagrams where decision and chance variables have at most log2 ωD − 1 and log2 ωC − 1 parents, respectively. Once the graph structure is obtained, we specify the functions associated to value variables by randomly sampling numbers in [0, 1]. The probability mass functions associated to chance variables are randomly sampled from a uniform prior distribution.

²Checking the treewidth of a graph might be hard.
We instead use a greedy heuristic that resulted in diagrams whose treewidth ranged from 5 to 10.

Figure 3: Running time of MPU and CR on randomly generated LIMIDs.

We compare MPU against the CR algorithm of [8] on 2530 LIMIDs randomly generated by the described procedure with parameters 5 ≤ d ≤ 50, 8 ≤ c ≤ 50, 8 ≤ ω_D ≤ 64 and 16 ≤ ω_C ≤ 64. MPU was implemented in C++ and tested on the same computer as CR.3 A good succinct indicator of the hardness of solving a LIMID is the total number of strategies |∆|, which represents the size of the search space in a brute-force approach. |∆| can also be loosely interpreted as the total number of alternatives (over all decision variables) in the problem instance. Figure 3 depicts running time against number of strategies in a log-log scale for the two algorithms on the same test set of random diagrams. For each algorithm, only solved instances are shown, which covers approximately 96% of the cases for MPU and 68% for CR. A diagram is considered unsolved by an algorithm if the algorithm was not able to reach the exact solution within the limit of 12 hours. Since CR uses an integer program solver, it can output a feasible solution within any given time limit; we consider a diagram solved by CR only if the solution returned at the end of 12 hours is exact, that is, only if its upper and lower bound values match. We note that MPU solved all cases that CR solved (but not the opposite). From the plot, one can see that MPU is orders of magnitude faster than CR. Within the limit of 12 hours, MPU was able to solve diagrams containing up to 10^64 strategies, whereas CR solved diagrams with at most 10^25 strategies.
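MPU's advantage in these experiments stems from the early removal of non-maximal elements during propagation. As an illustration only (the actual dominance relation on set-valued valuations in MPU is defined over the valuation algebra and is more involved), a generic Pareto-pruning step over coordinate-wise comparable vectors can be sketched as:

```python
def dominates(a, b):
    # a dominates b if a is coordinate-wise >= b and strictly better somewhere.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def prune_non_maximal(valuations):
    # Discard every element dominated by some other element of the set;
    # only the Pareto-maximal valuations survive.
    return [v for v in valuations
            if not any(dominates(w, v) for w in valuations)]
```

Applied after each combination step, such pruning keeps the propagated sets far below the worst-case bound ν, consistent with the average-case speedups observed here.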
We remark that when CR was not able to solve a diagram, it almost always returned a solution that was not within 5% of the optimum. This implies that MPU would outperform CR even if the latter were allowed a small imprecision in its output.

5 Conclusion

LIMIDs are highly expressive models for utility-based decision making that subsume influence diagrams and finite-horizon (partially observable) Markov decision processes. Furthermore, they allow constraints on policies to be explicitly represented in a concise and intuitive graphical language. Unfortunately, solving LIMIDs is a very hard combinatorial optimization task. Nevertheless, we showed here that our MPU algorithm can solve a large number of randomly generated problems in reasonable time. The algorithm's efficiency is based on the early removal of suboptimal solutions, which drastically reduces the search space. An interesting extension is to improve MPU's running time at the expense of accuracy. This can be done by arbitrarily discarding valuations during the propagation step so as to bound the size of the propagated sets. Future work is necessary to validate the feasibility of this idea.

Acknowledgments

This work was partially supported by the Swiss NSF grant nr. 200020 134759 / 1, and by the Computational Life Sciences Project, Canton Ticino.

3We used the CR implementation available at http://www.idsia.ch/~cassio/id2mip/ and CPLEX [15] as the mixed integer programming solver. Our MPU implementation can be downloaded at http://www.idsia.ch/~cassio/mpu/

References

[1] N. L. Zhang, R. Qi, and D. Poole. A computational theory of decision networks. International Journal of Approximate Reasoning, 11(2):83–158, 1994.

[2] S. L. Lauritzen and D. Nilsson. Representing and solving decision problems with limited information. Management Science, 47:1235–1251, 2001.

[3] P. Poupart and C. Boutilier.
Bounded finite state controllers. In Advances in Neural Information Processing Systems 16 (NIPS), 2003.

[4] A. Detwarasiti and R. D. Shachter. Influence diagrams for team decision analysis. Decision Analysis, 2(4):207–228, 2005.

[5] C. Amato, D. S. Bernstein, and S. Zilberstein. Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems, 21(3):293–320, 2010.

[6] R. A. Howard and J. E. Matheson. Influence diagrams. In Readings on the Principles and Applications of Decision Analysis, pages 721–762. Strategic Decisions Group, 1984.

[7] J. A. Tatman and R. D. Shachter. Dynamic programming and influence diagrams. IEEE Transactions on Systems, Man and Cybernetics, 20(2):365–379, 1990.

[8] C. P. de Campos and Q. Ji. Strategy selection in influence diagrams using imprecise probabilities. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, pages 121–128, 2008.

[9] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[10] G. F. Cooper. A method for using belief networks as influence diagrams. In Fourth Workshop on Uncertainty in Artificial Intelligence, 1988.

[11] P. Shenoy and G. Shafer. Axioms for probability and belief-function propagation. In Proceedings of the Fourth Conference on Uncertainty in Artificial Intelligence, pages 169–198. Elsevier Science, 1988.

[12] J. Kohlas. Information Algebras: Generic Structures for Inference. Springer-Verlag, 2003.

[13] R. Haenni. Ordered valuation algebras: a generic framework for approximating inference. International Journal of Approximate Reasoning, 37(1):1–41, 2004.

[14] H. Fargier, E. Rollon, and N. Wilson. Enabling local computation for partially ordered preferences. Constraints, 15:516–539, 2010.

[15] Ilog Optimization.
CPLEX documentation. http://www.ilog.com, 1990.
", "award": [], "sourceid": 422, "authors": [{"given_name": "Denis", "family_name": "Maua", "institution": null}, {"given_name": "Cassio", "family_name": "Campos", "institution": null}]}