{"title": "Distributed Inference in Dynamical Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 433, "page_last": 440, "abstract": null, "full_text": "Distributed Inference in Dynamical Systems\n\nStanislav Funiak Carlos Guestrin Carnegie Mellon University\n\nMark Paskin Google\n\nRahul Sukthankar Intel Research\n\nAbstract\nWe present a robust distributed algorithm for approximate probabilistic inference in dynamical systems, such as sensor networks and teams of mobile robots. Using assumed density filtering, the network nodes maintain a tractable representation of the belief state in a distributed fashion. At each time step, the nodes coordinate to condition this distribution on the observations made throughout the network, and to advance this estimate to the next time step. In addition, we identify a significant challenge for probabilistic inference in dynamical systems: message losses or network partitions can cause nodes to have inconsistent beliefs about the current state of the system. We address this problem by developing distributed algorithms that guarantee that nodes will reach an informative consistent distribution when communication is re-established. We present a suite of experimental results on real-world sensor data for two real sensor network deployments: one with 25 cameras and another with 54 temperature sensors.\n\n1\n\nIntroduction\n\nLarge-scale networks of sensing devices have become increasingly pervasive, with applications ranging from sensor networks and mobile robot teams to emergency response systems. Often, nodes in these networks need to perform probabilistic dynamic inference to combine a sequence of local, noisy observations into a global, joint estimate of the system state. 
For example, robots in a team may combine local laser range scans, collected over time, to obtain a global map of the environment; nodes in a camera network may combine a set of image sequences to recognize moving objects in a heavily cluttered scene. A simple approach to probabilistic dynamic inference is to collect the data to a central location, where the processing is performed. Yet, collecting all the observations is often impractical in large networks, especially if the nodes have a limited supply of energy and communicate over a wireless network. Instead, the nodes need to collaborate, to solve the inference task in a distributed manner. Such distributed inference techniques are also necessary in online control applications, where nodes of the network need estimates of the state in order to make decisions. Probabilistic dynamic inference can often be efficiently solved when all the processing is performed centrally. For example, in linear systems with Gaussian noise, the inference tasks can be solved in a closed form with a Kalman Filter [3]; for large systems, assumed density filtering can often be used to approximate the filtered estimate with a tractable distribution (c.f., [2]). Unfortunately, distributed dynamic inference is substantially more challenging. Since the observations are distributed across the network, nodes must coordinate to incorporate each others' observations and propagate their estimates from one time step to the next. Online operation requires the algorithm to degrade gracefully when nodes run out of processing time before the observations propagate throughout the network. Furthermore, the algorithm needs to robustly address node failures and interference that may partition the communication network into several disconnected components. We present an efficient distributed algorithm for dynamic inference that works on a large family of processes modeled by dynamic Bayesian networks. 
In our algorithm, each node maintains a (possibly approximate) marginal distribution over a subset of state variables, conditioned on the measurements made by the nodes in the network. At each time step, the nodes condition on the observations, using a modification of the robust (static) distributed inference algorithm [7], and then advance their estimates to the next time step locally. The algorithm guarantees that, with sufficient communication at each time step, the nodes obtain the same solution as the corresponding centralized algorithm [2]. Before convergence, the algorithm introduces principled approximations in the form of independence assertions in the node estimates and in the transition model.\n\n\f\nIn the presence of unreliable communication or high latency, the nodes may not be able to condition their estimates on all the observations in the network, e.g., when interference causes a network partition, or when high latency prevents messages from reaching every node. Once the estimates are advanced to the next time step, it is difficult to condition on the observations made in the past [10]. Hence, the beliefs at the nodes may be conditioned on different evidence and no longer form a consistent global probability distribution over the state space. We show that such inconsistencies can lead to poor results when nodes attempt to combine their estimates. Nevertheless, it is often possible to use the inconsistent estimates to form an informative globally consistent distribution; we refer to this task as alignment. We propose an online algorithm, optimized conditional alignment (O C A), that obtains the global distribution as a product of conditionals from local estimates and optimizes over different orderings to select a global distribution of minimal entropy. 
We also propose an alternative, more global optimization approach that minimizes a KL divergence-based criterion and provides accurate solutions even when the communication network is highly fragmented. We present experimental results on real-world sensor data, covering sensor calibration [7] and distributed camera localization [5]. These results demonstrate the convergence properties of the algorithm, its robustness to message loss and network partitions, and the effectiveness of our method at recovering from inconsistencies. Distributed dynamic inference has received some attention in the literature. For example, particle filtering (PF) techniques have been applied to these settings: Zhao et al. [11] use (mostly) independent PFs to track moving objects, and Rosencrantz et al. [10] run PFs in parallel, sharing measurements as appropriate. Pfeffer and Tai [9] use loopy belief propagation to approximate the estimation step in a continuous-time Bayesian network. When compared to these techniques, our approach addresses several additional challenges: we do not assume point-to-point communication between nodes, we provide robustness guarantees to node failures and network partitions, and we identify and address the belief inconsistency problem that arises in distributed systems.\n\n2\n\nThe distributed dynamic inference problem\n\nFollowing [7], we assume a network model where each node can perform local computations and communicate with other nodes over some channel. The nodes of the network may change over time: existing nodes can fail, and new nodes may be introduced. We assume a message-level error model: messages are either received without error, or they are not received at all. The likelihood of successful transmissions (link qualities) are unknown and can change over time, and link qualities of several node pairs may be correlated. We model the system as a dynamic Bayesian network (D B N). A D B N consists of a set of state processes, X = {X1 , . . . 
, X_L} and a set of observed measurement processes Z = {Z_1, . . . , Z_K}; each measurement process Z_k corresponds to one of the sensors on one of the nodes. State processes are not associated with unique nodes. A D B N defines a joint probability model over steps 1 . . . T as

p(X^{(1:T)}, Z^{(1:T)}) = \underbrace{p(X^{(1)})}_{\text{initial prior}} \prod_{t=2}^{T} \underbrace{p(X^{(t)} \mid X^{(t-1)})}_{\text{transition model}} \prod_{t=1}^{T} \underbrace{p(Z^{(t)} \mid X^{(t)})}_{\text{measurement model}}.

The initial prior is given by a factorized probability model p(X^{(1)}) = \prod_h p(A_h^{(1)}), where each A_h \subseteq X is a subset of the state processes. The transition model factors as \prod_{i=1}^{L} p(X_i^{(t)} \mid Pa[X_i^{(t)}]), where Pa[X_i^{(t)}] are the parents of X_i^{(t)} in the previous time step. The measurement model factors as \prod_{k=1}^{K} p(Z_k^{(t)} \mid Pa[Z_k^{(t)}]), where Pa[Z_k^{(t)}] \subseteq X^{(t)} are the parents of Z_k^{(t)} in the current time step. In the distributed dynamic inference problem, each node n is associated with a set of processes Q_n \subseteq X; these are the processes about which node n wishes to reason. The nodes need to collaborate so that each node can obtain (an approximation to) the posterior distribution over Q_n^{(t)} given all measurements made in the network up to the current time step t: p(Q_n^{(t)} \mid z^{(1:t)}). We assume that node clocks are synchronized, so that transitions to the next time step are simultaneous.

3

Filtering in dynamical systems

The goal of (centralized) filtering is to compute the posterior distribution p(X^{(t)} \mid z^{(1:t)}) for t = 1, 2, . . . as the observations z^{(1)}, z^{(2)}, . . . arrive. The basic approach is to recursively compute p(X^{(t+1)} \mid z^{(1:t)}) from p(X^{(t)} \mid z^{(1:t-1)}) in three steps:
1. Estimation: p(X^{(t)} \mid z^{(1:t)}) \propto p(X^{(t)} \mid z^{(1:t-1)}) \, p(z^{(t)} \mid X^{(t)});
2. Prediction: p(X^{(t)}, X^{(t+1)} \mid z^{(1:t)}) = p(X^{(t)} \mid z^{(1:t)}) \, p(X^{(t+1)} \mid X^{(t)});
3.
Roll-up: p(X^{(t+1)} \mid z^{(1:t)}) = \int p(x^{(t)}, X^{(t+1)} \mid z^{(1:t)}) \, dx^{(t)}.

Exact filtering in D B Ns is usually expensive or intractable because the belief state rapidly loses all conditional independence structure. An effective approach, proposed by Boyen and Koller [2], hereby denoted \"B & K 9 8\", is to periodically project the exact posterior to a distribution that satisfies independence assertions encoded in a junction tree [3]. Given a junction tree T, with cliques {C_i} and separators {S_{i,j}}, the projection operation amounts to computing the clique marginals; hence the filtered distribution is approximated as

p(X^{(t)} \mid z^{(1:t-1)}) \approx \tilde{p}(X^{(t)} \mid z^{(1:t-1)}) = \frac{\prod_{i \in N_T} \tilde{p}(C_i^{(t)} \mid z^{(1:t-1)})}{\prod_{\{i,j\} \in E_T} \tilde{p}(S_{i,j}^{(t)} \mid z^{(1:t-1)})}, (1)

where N_T and E_T are the nodes and edges of T, respectively. With this representation, the estimation step is implemented by multiplying each observation likelihood p(z_k^{(t)} \mid Pa[Z_k^{(t)}]) into a clique marginal; the clique and separator potentials are then recomputed with message passing, so that the posterior distribution is once again written as a ratio of clique and separator marginals:

\tilde{p}(X^{(t)} \mid z^{(1:t)}) = \frac{\prod_{i \in N_T} \tilde{p}(C_i^{(t)} \mid z^{(1:t)})}{\prod_{\{i,j\} \in E_T} \tilde{p}(S_{i,j}^{(t)} \mid z^{(1:t)})}.

The prediction step is performed independently for each clique C_i: we multiply \tilde{p}(X^{(t)} \mid z^{(1:t)}) with the transition model p(X^{(t+1)} \mid Pa[X^{(t+1)}]) for each variable X^{(t+1)} \in C_i^{(t+1)} and, using variable elimination, compute the marginals over the clique at the next time step, p(C_i^{(t+1)} \mid z^{(1:t)}).

4

Approximate distributed filtering

In principle, the centralized filtering approach described in the previous section could be applied to a distributed system, e.g., by communicating the observations made in the network to a central location that performs all computations, and distributing the answer to every node in the network.
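For concreteness, the three centralized filtering steps of Section 3 have simple closed forms in the one-dimensional linear-Gaussian case (the Kalman filter). The sketch below is only an illustration of the recursion, not the paper's distributed algorithm; the model parameters A, Q, H, R and the observation sequence are made up.

```python
# Minimal sketch of the filtering recursion of Section 3 for a 1-D
# linear-Gaussian system; all parameters are illustrative, not from
# the paper.
A, Q = 1.0, 0.1   # transition: x^(t+1) = A x^(t) + N(0, Q)
H, R = 1.0, 0.5   # measurement: z^(t) = H x^(t) + N(0, R)

def filter_step(mu, var, z):
    # 1. Estimation: condition p(x^(t) | z^(1:t-1)) on z^(t).
    gain = var * H / (H * var * H + R)
    mu_post = mu + gain * (z - H * mu)
    var_post = (1.0 - gain * H) * var
    # 2.-3. Prediction and roll-up: multiply in the transition model
    # and integrate out x^(t), yielding p(x^(t+1) | z^(1:t)).
    return A * mu_post, A * var_post * A + Q

mu, var = 0.0, 1.0                    # prior p(x^(1))
for z in [0.9, 1.1, 1.0]:
    mu, var = filter_step(mu, var, z)
```

Each pass through `filter_step` consumes one observation and returns the one-step-ahead belief, mirroring the estimation / prediction / roll-up cycle.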
While conceptually simple, this approach has substantial drawbacks, including the high communication bandwidth, the introduction of a single point of failure to the system, and the fact that nodes do not have valid estimates when the network is partitioned. In this section, we present a distributed filtering algorithm where each node obtains an approximation to the posterior distribution over a subset of the state variables. Our estimation step builds on the robust distributed inference algorithm of Paskin et al. [7, 8], while the prediction, roll-up, and projection steps are performed locally at each node. 4.1 Estimation as robust distributed probabilistic inference In the distributed inference approach of Paskin et al. [8], the nodes collaborate so that each node n can obtain the posterior distribution over some set of variables Q_n given all measurements made throughout the network. In our setting, Q_n contains the variables in a subset L_n of the cliques used in our assumed density representation. In their architecture, nodes form a distributed data structure along a routing tree in the network, where each node in this tree is associated with a cluster of variables D_n that includes Q_n, as well as any other variables needed to preserve the flow of information between the nodes, a property equivalent to the running intersection property in junction trees [3]. We refer to this tree as the network junction tree, and, for clarity, we refer to the junction tree used for the assumed density as the external junction tree. Using this architecture, Paskin and Guestrin developed a robust distributed probabilistic inference algorithm, R D P I [7], for static inference settings, where nodes compute the posterior distribution p(Q_n \mid z) over Q_n given all measurements throughout the network z.
R D P I provides two crucial properties: convergence, if there are no network partitions, these distributed estimates converge to the true posteriors; and smooth degradation, even before convergence, the estimates provide a principled approximation to the true posterior (which introduces additional independence assertions). In R D P I, each node n maintains the current belief \beta_n of p(Q_n \mid z). Initially, node n knows only the marginals of the prior distribution {p(C_i) : i \in L_n} for a subset of cliques L_n in the external junction tree, and its local observation model p(z_n \mid Pa[Z_n]) for each of its sensors. We assume that Pa[Z_n] \subseteq C_i for some i \in L_n; thus, \beta_n is represented as a collection of priors over cliques of variables, and of observation likelihood functions over these variables. Messages are then sent between neighboring nodes, in an analogous fashion to the sum-product algorithm for junction trees [3]. However, messages in R D P I are always represented as a collection of priors {\pi_i(C_i)} over cliques of variables C_i, and of measurement likelihood functions {\lambda_i(C_i)} over these cliques. This decomposition into prior and likelihood factors is the key to the robustness properties of the algorithm [7]. With sufficient communication, \beta_n converges to p(Q_n \mid z). In our setting, at each time step t, each prior \pi_i(C_i^{(t)}) is initialized to p(C_i^{(t)} \mid z^{(1:t-1)}). The likelihood functions are similarly initialized to \lambda_i(C_i^{(t)}) = p(z_i^{(t)} \mid C_i^{(t)}) if some sensor makes an observation about these variables, or to 1 otherwise. Through message passing, \beta_n converges to \tilde{p}(Q_n^{(t)} \mid z^{(1:t)}). An important property of R D P I that will be useful in the remainder of the paper is: Property 1. Let \beta_n be the result computed by the R D P I algorithm at convergence at node n. Then the cliques in \beta_n form a subtree of an external junction tree that covers Q_n.
4.2 Prediction, roll-up and projection The previous section shows that the estimation step can be implemented in a distributed manner, using R D P I. At convergence, each node n obtains the calibrated marginals \tilde{p}(C_i^{(t)} \mid z^{(1:t)}), for i \in L_n. In order to advance to the next time step, each node must perform prediction and roll-up, obtaining the marginals \tilde{p}(C_i^{(t+1)} \mid z^{(1:t)}). Recall from Section 3 that, in order to compute a marginal \tilde{p}(C_i^{(t+1)} \mid z^{(1:t)}), this node needs \tilde{p}(X^{(t)} \mid z^{(1:t)}). Due to the conditional independencies encoded in \tilde{p}(X^{(t)} \mid z^{(1:t)}), it is sufficient to obtain a subtree of the external junction tree that covers the parents Pa[C_i^{(t+1)}] of all variables in the clique. The next time step marginal \tilde{p}(C_i^{(t+1)} \mid z^{(1:t)}) can then be computed by multiplying this subtree with the transition model p(X^{(t+1)} \mid Pa[X^{(t+1)}]) for each X^{(t+1)} \in C_i^{(t+1)} and eliminating all variables but C_i^{(t+1)} (recall that Pa[X^{(t+1)}] \subseteq X^{(t)}). This procedure suggests the following distributed implementation of prediction, roll-up, and projection: after completing the estimation step, each node selects a subtree of the (global) external junction tree that covers Pa[C_i^{(t+1)}] and collects the marginals of this tree from other nodes in the network. Unfortunately, it is unclear how to allocate the running time between estimation and collection of marginals in time-critical applications, when the estimation step may not run to completion. Instead, we propose a simple approach that performs both steps at once: run the distributed inference algorithm, described in the previous section, to obtain the posterior distribution over the parents of each clique maintained at the node. This task can be accomplished by ensuring that these parent variables are included in the query variables of node n: Pa[C_i^{(t+1)}] \subseteq Q_n, \forall i \in L_n.
When the estimation step cannot be run to convergence within the allotted time, the variables Scope[\beta_n] covered by the distribution \beta_n that node n obtains may not cover the entire parent set Pa[C_i^{(t+1)}]. In this case, multiplying in the standard transition model is equivalent to assuming a uniform prior for the missing variables, which can lead to very poor solutions in practice. When the transition model is learned from data, p(X^{(t+1)} \mid Pa[X^{(t+1)}]) is usually computed from the empirical distribution \hat{p}(X^{(t+1)}, Pa[X^{(t+1)}]), e.g., p_{MLE}(X^{(t+1)} \mid Pa[X^{(t+1)}]) = \hat{p}(X^{(t+1)}, Pa[X^{(t+1)}]) / \hat{p}(Pa[X^{(t+1)}]). Building on these empirical distributions, we can obtain an improved solution for the prediction and roll-up steps when we do not have a distribution over the entire parent set Pa[C_i^{(t+1)}]. Specifically, we obtain a valid approximate transition model \tilde{p}(X^{(t+1)} \mid W^{(t)}), where W^{(t)} = Scope[\beta_n] \cap Pa[X^{(t+1)}], online by simply marginalizing the empirical distribution \hat{p}(X^{(t+1)}, Pa[X^{(t+1)}]) down to \hat{p}(X^{(t+1)}, W^{(t)}). This procedure is equivalent to introducing an additional independence assertion to the model: at time step t + 1, X^{(t+1)} is independent of Pa[X^{(t+1)}] - W^{(t)}, given W^{(t)}. 4.3 Summary of the algorithm Our distributed approximate filtering algorithm can be summarized as follows:
Using the architecture in [8], construct a network junction tree s.t. the query variables Q_n at each node n cover \bigcup_{i \in L_n} C_i^{(t)} \cup \bigcup_{i \in L_n} Pa[C_i^{(t+1)}].
For t = 1, 2, . . ., at each node n:
1. run R D P I [7] until the end of step t, obtaining a (possibly approximate) belief \beta_n;
2. for each X^{(t+1)} \in C_i^{(t+1)}, i \in L_n, compute an approximate transition model \tilde{p}(X^{(t+1)} \mid W_X^{(t)}), where W_X^{(t)} = Scope[\beta_n] \cap Pa[X^{(t+1)}];
3. for each clique C_i^{(t+1)}, i \in L_n, compute the clique marginal \tilde{p}(C_i^{(t+1)} \mid z^{(1:t)}) from \beta_n and from each \tilde{p}(X^{(t+1)} \mid W_X^{(t)}), locally, using variable elimination.
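The marginalization trick of Section 4.2 can be illustrated on a tiny discrete example. The sketch below is hedged: the binary variables X', P1, P2 and the toy counts are made up for illustration; it only shows the mechanics of replacing p(X' | P1, P2) by p~(X' | W) when only W = {P1} is in scope.

```python
import numpy as np

# Sketch of the approximate transition model of Section 4.2: when the
# node's belief covers only W, a subset of the parents Pa[X^(t+1)], we
# marginalize the empirical joint down to (X^(t+1), W) and renormalize.
# Variables and counts below are illustrative.
counts = np.array([[[10., 2.], [3., 5.]],     # axes: (X', P1, P2)
                   [[1., 7.], [6., 8.]]])
p_hat = counts / counts.sum()                 # empirical p^(X', P1, P2)

# Full MLE transition model p(X' | P1, P2).
full_cpd = p_hat / p_hat.sum(axis=0, keepdims=True)

# Suppose only parent P1 is in scope (W = {P1}): marginalize P2 out of
# the empirical joint, then condition on W.
p_hat_w = p_hat.sum(axis=2)                                 # p^(X', P1)
approx_cpd = p_hat_w / p_hat_w.sum(axis=0, keepdims=True)   # p~(X' | P1)
```

Note that `approx_cpd` is exactly the conditional obtained by asserting that X' is independent of P2 given P1, as described in the text.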
Using the convergence properties of the R D P I algorithm, we prove that, given sufficient communication, our distributed algorithm obtains the same solution as the centralized B & K 9 8 algorithm: Theorem 1. For a set of nodes running our distributed filtering algorithm, if at each time step there is sufficient communication for the R D P I algorithm to converge, and the network is not partitioned, then, for each node n, for each clique i \in L_n, the distribution \tilde{p}(C_i^{(t)} \mid z^{(1:t-1)}) obtained by node n is equal to the distribution obtained by the B & K 9 8 algorithm with assumed density given by T.

Figure 1: Alignment results after partition (shown by vertical line). Circles represent 95% confidence intervals in the estimate of the camera location. (a) The exact solution, computed by the BK algorithm in the absence of partitions. (b) Solution obtained when aligning from node 1. (c) Solution obtained when aligning from node 4. (d) Solution obtained by joint optimized alignment.

5

Robust distributed filtering

In the previous section, we introduced an algorithm for distributed filtering with dynamic Bayesian networks that, with sufficient communication, converges to the centralized B & K 9 8 algorithm. In some settings, for example when interference causes a network partition, messages may not be propagated long enough to guarantee convergence before nodes must roll up to the next time step. Consider the example, illustrated in Figure 1, in which a network of cameras localizes itself by observing a moving object. Each camera i carries a clique marginal over the location of the object M^{(t)}, its own camera pose variable C^i, and the pose of one of its neighboring cameras: \pi_1(C^{1,2}, M^{(t)}), \pi_2(C^{2,3}, M^{(t)}), and \pi_3(C^{3,4}, M^{(t)}).
Suppose communication were interrupted due to a network partition: observations would not propagate, and the marginals carried by the nodes would no longer form a consistent distribution, in the sense that \pi_1, \pi_2, \pi_3 might not agree on their marginals, e.g., \pi_1(C^2, M^{(t)}) \neq \pi_2(C^2, M^{(t)}). The goal of alignment is to obtain a consistent distribution \tilde{p}(X^{(t)} \mid z^{(1:t-1)}) from marginals \pi_1, \pi_2, \pi_3 that is close to the true posterior p(X^{(t)} \mid z^{(1:t-1)}) (as measured, for example, by the root-mean-square error of the estimates). For simplicity of notation, we omit time indices t and conditioning on the past evidence z^{(1:t-1)} throughout this section. 5.1 Optimized conditional alignment One way to define a consistent distribution \tilde{p} is to start from a root node r, e.g., 1, and allow each clique marginal to decide the conditional density of C_i given its parent, e.g.,

\tilde{p}_1(C^{1:4}, M) = \pi_1(C^{1,2}, M) \, \pi_2(C^3 \mid C^2, M) \, \pi_3(C^4 \mid C^3, M).

This density \tilde{p}_1 forms a coherent distribution over C^{1:4}, M, and we say that \tilde{p}_1 is rooted at node 1. Thus, \pi_1 fully defines the marginal density over C^{1,2}, M, \pi_2 defines the conditional density of C^3 given C^2, M, and so on. If node 3 were the root, then node 1 would only contribute \pi_1(C^1 \mid C^2, M), and we would obtain a different approximate distribution. In general, given a collection of marginals \pi_i(C_i) over the cliques of a junction tree T, and a root node r \in N_T, the distribution obtained by conditional alignment from r can be written as

\tilde{p}_r(X) = \pi_r(C_r) \prod_{i \in N_T - \{r\}} \pi_i(C_i - S_{up(i),i} \mid S_{up(i),i}), (2)

where up(i) denotes the upstream neighbor of i on the (unique) path between r and i. The choice of the root r often crucially determines how well the aligned distribution \tilde{p}_r approximates the true prior. Suppose that, in the example in Figure 1, the nodes on the left side of the partition do not observe the person while the communication is interrupted, and the prior marginals \pi_1, \pi_2 are uncertain about M.
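For Gaussian marginals, the product of conditionals in Equation 2 has a closed form. The sketch below shows rooted alignment for a generic three-variable chain with cliques (A, B) and (B, C); the variable names, means, and covariances are illustrative and not the paper's camera example.

```python
import numpy as np

# Sketch of conditional alignment (Equation 2) for three scalar Gaussian
# variables A, B, C: clique 1 carries pi_1(A, B), clique 2 carries
# pi_2(B, C), and the two disagree about B. Rooting at clique 1 keeps
# pi_1(A, B) intact and multiplies in pi_2(C | B). Numbers are made up.
m1 = np.array([0.0, 1.0]); S1 = np.array([[1.0, 0.3], [0.3, 0.5]])  # pi_1(A,B)
m2 = np.array([1.5, 2.0]); S2 = np.array([[2.0, 0.8], [0.8, 1.0]])  # pi_2(B,C)

# Express pi_2(C | B) as a linear-Gaussian: C = a + k*B + N(0, v).
k = S2[0, 1] / S2[0, 0]
a = m2[1] - k * m2[0]
v = S2[1, 1] - k * S2[0, 1]

# Aligned joint over (A, B, C), rooted at clique 1.
mean = np.array([m1[0], m1[1], a + k * m1[1]])
cov = np.array([
    [S1[0, 0],     S1[0, 1],     k * S1[0, 1]],
    [S1[0, 1],     S1[1, 1],     k * S1[1, 1]],
    [k * S1[0, 1], k * S1[1, 1], v + k**2 * S1[1, 1]],
])
```

Rooting at clique 2 instead would preserve pi_2(B, C) and overwrite clique 1's belief about B, which is why the choice of root matters.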
If we were to align the distribution from \pi_2, multiplying \pi_3(C^4 \mid C^3, M) into the marginal \pi_2(C^{2,3}, M) would result in a distribution that is uncertain in both M and C^4 (Figure 1(b)), while a better choice of root could provide a much better estimate (Figure 1(c)). One possible metric to optimize when choosing the root r for the alignment is the entropy of the resulting distribution \tilde{p}_r. For example, the entropy of \tilde{p}_2 in the previous example can be written as

H_{\tilde{p}_2}(C^{1:4}, M) = H_{\pi_2}(C^{2,3}, M) + H_{\pi_3}(C^4 \mid C^3, M) + H_{\pi_1}(C^1 \mid C^2, M), (3)

where we use the fact that, for Gaussians, the conditional entropy of C^4 given C^3, M only depends on the conditional distribution \tilde{p}_2(C^4 \mid C^3, M) = \pi_3(C^4 \mid C^3, M). A naive algorithm for obtaining the best root would exploit this decomposition to compute the entropy of each \tilde{p}_r, and pick the root that leads to the lowest total entropy; the running time of this algorithm is O(|N_T|^2). We propose a dynamic programming approach that significantly reduces the running time. Comparing Equation 3 with the entropy of the distribution rooted at a neighboring node 3, we see that they share a common term H_{\pi_1}(C^1 \mid C^2, M), and

H_{\tilde{p}_3}(C^{1:4}, M) - H_{\tilde{p}_2}(C^{1:4}, M) = H_{\pi_3}(S_{2,3}) - H_{\pi_2}(S_{2,3}) \equiv \Delta_{2,3}.

If \Delta_{2,3} is positive, node 2 is a better root than 3; if \Delta_{2,3} is negative, we have the reverse situation. Thus, when comparing neighboring nodes as root candidates, the difference in entropy of the resulting distribution is simply the difference in entropy their local distributions assign to their separator. This property generalizes to the following dynamic programming algorithm that determines the root r with minimal H_{\tilde{p}_r}(X) in O(|N_T|) time: For any node i \in N_T, define the message from i to its neighbor j as

m_{i \to j} = \Delta_{i,j} if m_{k \to i} < 0 for all k \neq j; otherwise m_{i \to j} = \Delta_{i,j} + \max_{k \neq j} m_{k \to i},

where \Delta_{i,j} = H_{\pi_j}(S_{i,j}) - H_{\pi_i}(S_{i,j}), and k varies over the neighbors of i in T.
If \max_k m_{k \to i} < 0, then i is the optimal root; otherwise, up(i) = argmax_k m_{k \to i}. Intuitively, the message m_{i \to j} represents the loss (entropy) incurred with root node j, compared to the best root on i's side of the tree. Ties between nodes, if any, can be resolved using node IDs. 5.2 Distributed optimized conditional alignment In the absence of an additional procedure, R D P I can be viewed as performing conditional alignment. However, the alignment is applied to the local belief at each node, rather than the global distribution, and the nodes may not agree on the choice of the root r. Thus, the network is not guaranteed to reach a globally consistent, aligned distribution. In this section, we show that R D P I can be extended to incorporate the optimized conditional alignment (O C A) algorithm from the previous section. By Property 1, at convergence, the priors at each node form a subtree of an external junction tree for the assumed density. Conceptually, if we were to apply O C A to this subtree, the node would have an aligned distribution, but nodes may not be consistent with each other. Intuitively, this happens because the optimization messages m_{i \to j} were not propagated between different nodes. In R D P I, node n's belief \beta_n includes a collection of (potentially inconsistent) priors {\pi_i(C_i)}. In the standard sum-product inference algorithm, an inference message from node m to node n is computed by marginalizing out some variables from the factor \beta_m^{+n} = \beta_m \prod_{k \neq n} m_{k \to m} that combines the messages received from node m's other neighbors with node m's local belief. The inference message in R D P I involves a similar marginalization, which corresponds to pruning some cliques from \beta_m^{+n} [7]. When such pruning occurs, any likelihood information \lambda_i(C_i) associated with the pruned clique i is transferred to its neighbor j. Our distributed O C A algorithm piggy-backs on this pruning, computing an optimization message m_{i \to j}, which is stored in clique j.
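The root-selection dynamic program of Section 5.1 can be sketched directly from its message definition. In the example below the clique tree is a three-clique chain and the separator entropies are hypothetical numbers; clique 2 assigns low entropy to both separators, so it should win.

```python
# Sketch of the O C A dynamic program of Section 5.1 on a chain
# 1 - 2 - 3. H[(i, sep)] is the entropy clique i's marginal assigns to
# separator sep; the numbers are hypothetical.
H = {(1, (1, 2)): 1.0, (2, (1, 2)): 0.4,     # clique 2 is more certain
     (2, (2, 3)): 0.7, (3, (2, 3)): 0.9}
neighbors = {1: [2], 2: [1, 3], 3: [2]}

def delta(i, j):
    # Delta_{i,j} = H_{pi_j}(S_{i,j}) - H_{pi_i}(S_{i,j})
    sep = tuple(sorted((i, j)))
    return H[(j, sep)] - H[(i, sep)]

msgs = {}
def message(i, j):
    # m_{i->j}: entropy lost by rooting at j rather than at the best
    # root on i's side of the tree.
    if (i, j) not in msgs:
        incoming = [message(k, i) for k in neighbors[i] if k != j]
        d = delta(i, j)
        msgs[(i, j)] = d if all(m < 0 for m in incoming) else d + max(incoming)
    return msgs[(i, j)]

def is_optimal_root(i):
    return max(message(k, i) for k in neighbors[i]) < 0

roots = [i for i in neighbors if is_optimal_root(i)]
```

Each directed edge is computed once, so the total work is linear in the number of cliques, matching the O(|N_T|) claim.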
(To compute this message, cliques must also carry their original, unaligned priors.) At convergence, the nodes will not only have a subtree of an external tree, but also the incoming optimization messages that result from the pruning of all other cliques of the external tree. In order to determine the globally optimal root, each node (locally) selects a root for its subtree. If this root is one of the initial cliques associated with node n, then n, and in particular this clique, is the root of the conditional alignment. The alignment is propagated throughout the network. If the optimal root is determined to be a clique that came from a message received from a neighbor, then the neighbor (or another node upstream) is the root, and node n aligns itself with respect to the neighbor's message. With an additional tie-breaking rule that ensures that all the nodes make consistent choices about their subtrees [4], this procedure is equivalent to running the O C A algorithm centrally: Theorem 2. Given sufficient communication and in the absence of network partitions, nodes running distributed O C A reach a globally consistent belief based on conditional alignment, selecting the root clique that leads to the joint distribution of minimal entropy. In the presence of partitions, each partition will reach a consistent belief that minimizes the entropy within this partition. 5.3 Jointly optimized alignment While conceptually simple, there are situations where such a rooted alignment will not provide a good aligned distribution. For example, if, in the example in Figure 1, cameras 2 and 3 carry marginals \pi_2(C^{2,3}, M) and \pi_3(C^{2,3}, M), respectively, and both observe the person, node 2 will have a better estimate of C^2, while node 3's estimate of C^3 will be more accurate.
If either node is chosen as the root, the aligned distribution will have a worse estimate of the pose of one of the cameras, because performing rooted alignment from either direction effectively overwrites the marginal of the other node. In this example, rather than fixing a root, we want an aligned distribution that attempts to simultaneously optimize the distance to both \pi_2(C^{2,3}, M) and \pi_3(C^{2,3}, M).

Figure 2: (a) Testbed of 25 cameras used for the SLAT experiments. (b) Convergence results for individual cameras in one experiment. Horizontal lines indicate the corresponding centralized solution at the end of the experiment. (c) Convergence versus amount of communication for a temperature network of 54 real sensors.

We propose the following optimization problem that minimizes the sum of reverse KL divergences from the aligned distribution to the clique marginals \pi_i(C_i):

\tilde{p}(X) = argmin_{q(X),\, q \models T} \sum_{i \in N_T} D(q(C_i) \,\|\, \pi_i(C_i)),

where q \models T denotes the constraint that q factorizes according to the junction tree T. This method will often provide very good aligned distributions (e.g., Figure 1(d)). For Gaussian distributions, this optimization problem corresponds to

\min_{\mu_{C_i}, \Sigma_{C_i}} \sum_{i \in N_T} \left[ -\log |\Sigma_{C_i}| + \langle \Sigma_i^{-1}, \Sigma_{C_i} \rangle + (\mu_i - \mu_{C_i})^T \Sigma_i^{-1} (\mu_i - \mu_{C_i}) \right], subject to \Sigma_{C_i} \succeq 0, \forall i \in N_T, (4)

where \mu_{C_i}, \Sigma_{C_i} are the means and covariances of q over the variables C_i, and \mu_i, \Sigma_i are the means and covariances of the marginals \pi_i. The problem in Equation 4 consists of two independent convex optimization problems over the means and covariances of q, respectively.
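The mean part of Equation 4 is an ordinary weighted least-squares problem. The sketch below solves it centrally via the normal equations for a generic three-variable chain; the selector matrices, means, and covariances are illustrative, not the paper's setup.

```python
import numpy as np

# Sketch of the mean subproblem of Equation 4: the aligned means
# minimize a sum of Mahalanobis distances to the clique means, an
# unconstrained convex least-squares problem. All numbers are made up.
P1 = np.array([[1., 0., 0.], [0., 1., 0.]])      # clique 1 covers (A, B)
P2 = np.array([[0., 1., 0.], [0., 0., 1.]])      # clique 2 covers (B, C)
mu1, S1 = np.array([0.0, 1.0]), np.eye(2)
mu2, S2 = np.array([1.4, 2.0]), 0.5 * np.eye(2)  # disagrees with clique 1 on B

# Normal equations for min_x sum_i (P_i x - mu_i)^T S_i^{-1} (P_i x - mu_i).
A = sum(P.T @ np.linalg.inv(S) @ P for P, S in [(P1, S1), (P2, S2)])
b = sum(P.T @ np.linalg.inv(S) @ mu for P, S, mu in [(P1, S1, mu1), (P2, S2, mu2)])
x = np.linalg.solve(A, b)     # aligned means for (A, B, C)
```

The shared variable B lands at an inverse-covariance-weighted average of the two clique means, which is the behavior the jointly optimized alignment is after; the covariance subproblem carries the additional positive-semidefiniteness constraint.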
The former problem can be solved in a distributed manner using distributed linear regression [6], while the latter can be solved using a distributed version of an iterative method, such as conjugate gradient descent [1].

6

Experimental results

We evaluated our approach on two applications: a camera localization problem [5] (SLAT), in which a set of cameras simultaneously localizes itself by tracking a moving object, and a temperature monitoring application, analogous to the one presented in [7]. Figure 2(a) shows some of the 25 ceiling-mounted cameras used to collect the data in our camera experiments. We implemented our distributed algorithm in a network simulator that incorporates message loss and used data from these real sensors as our observations. Figure 2(b) shows the estimates obtained by three cameras in one of our experiments. Note that each camera converges to the estimate obtained by the centralized B & K 9 8 algorithm. In Figure 2(c), we evaluate the sensitivity of the algorithm to incomplete communication. We see that, with a modest number of rounds of communication performed in each time step, the algorithm obtains a high-quality solution and converges to the centralized solution. In the second set of experiments, we evaluate the alignment methods presented in Section 5. In Figure 3(a), the network is split into four components; in each component, the nodes communicate fully, and we evaluate the solution if the communication were to be restored after a given number of time steps. The vertical axis shows the RMS error of estimated camera locations at the end of the experiment. For the unaligned solution, the nodes may not agree on the estimated pose of a camera, so it is not clear which node's estimate should be used in the RMS computation; the plot shows an \"omniscient envelope\" of the RMS error, where, given the (unknown) true camera locations, we select the best and worst estimates available in the network for each camera's pose.
The results show that, in the absence of optimized alignment, inconsistencies can degrade the solution: observations collected after the communication is restored may not make up for the errors introduced by the partition. The third experiment evaluates the performance of the distributed algorithm in highly disconnected scenarios. Here, the sensor network is hierarchically partitioned into smaller disconnected components by selecting a random cut through the largest component. The communication is restored shortly before the end of the experiment. Figure 3(b) shows the importance of aligning from the correct node: the difference between the optimized root and an arbitrarily chosen root is significant, particularly as the network becomes increasingly fractured. In our experiments, large errors often resulted from the nodes having uncertain beliefs, which justifies our choice of objective function. We see that the jointly optimized alignment described in Section 5.3, min. KL, tends to provide the best aligned distribution, though it is often close to the optimized root, which is simpler to compute. Finally, Figure 3(c) shows the alignment results on the temperature monitoring application. Compared to SLAT, the effects of network partitions on the results for the temperature data are less severe. One contributing factor is that every node in a partition makes local temperature observations, and the approximate transition model for temperatures in each partition is quite accurate; hence, all the nodes continue to adjust their estimates meaningfully while the partition is in progress.

[Figure 3: three RMS-error plots, (a) camera localization, (b) camera localization, (c) temperature monitoring, comparing the unaligned solution (upper/lower bounds), fixed root, optimized root, and min. KL alignments.]

Figure 3: Comparison of the alignment methods. (a) RMS error vs. duration of the partition. For the unaligned solution, the plot shows bounds on the error: given the (unknown) camera locations, we select the best and worst estimates available in the network for each camera's pose. In the absence of optimized alignment, inconsistencies can degrade the quality of the solution. (b, c) RMS error vs. number of partitions. In camera localization (b), the difference between the optimized alignment and the alignment from an arbitrarily chosen fixed root is significant. For the temperature monitoring (c), the differences are less pronounced but follow the same trend.

7 Conclusions

This paper presents a new distributed approach to approximate dynamic filtering based on a distributed representation of the assumed density in the network. Distributed filtering is performed by first conditioning on the evidence using a robust distributed inference algorithm [7], and then advancing to the next time step locally. With sufficient communication in each time step, our distributed algorithm converges to the centralized B&K98 solution. In addition, we identify a significant challenge for probabilistic inference in dynamical systems: nodes can have inconsistent beliefs about the current state of the system, and an ineffective handling of this situation can lead to very poor estimates of the global state. We address this problem by developing a distributed algorithm that obtains an informative consistent distribution, optimizing over various choices of the root node, and an alternative joint optimization approach that minimizes a KL divergence-based criterion. We demonstrate the effectiveness of our approach on a suite of experimental results on real-world sensor data.

Acknowledgments

This research was supported by grants NSF-NeTS CNS-0625518 and NSF-ITR CNS-0428738. S.
Funiak was supported by the Intel Research Scholar Program; C. Guestrin was partially supported by an Alfred P. Sloan Fellowship.

References

[1] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[2] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. of UAI, 1998.
[3] R. Cowell, P. Dawid, S. Lauritzen, and D. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer, New York, NY, 1999.
[4] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar. Robust probabilistic filtering in distributed systems. Technical Report CMU-CALD-05-111, Carnegie Mellon University, 2005.
[5] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar. Distributed localization of networked cameras. In Proc. of IPSN, 2006.
[6] C. Guestrin, R. Thibaux, P. Bodik, M. A. Paskin, and S. Madden. Distributed regression: an efficient framework for modeling sensor network data. In Proc. of IPSN, 2004.
[7] M. A. Paskin and C. E. Guestrin. Robust probabilistic inference in distributed systems. In Proc. of UAI, 2004.
[8] M. A. Paskin, C. E. Guestrin, and J. McFadden. A robust architecture for inference in sensor networks. In Proc. of IPSN, 2005.
[9] A. Pfeffer and T. Tai. Asynchronous dynamic Bayesian networks. In Proc. of UAI, 2005.
[10] M. Rosencrantz, G. Gordon, and S. Thrun. Decentralized sensor fusion with distributed particle filters. In Proc. of UAI, 2003.
[11] F. Zhao, J. Liu, J. Liu, L. Guibas, and J. Reich. Collaborative signal and information processing: An information directed approach. Proceedings of the IEEE, 91(8):1199–1209, 2003.