{"title": "Distributed Probabilistic Learning for Camera Networks with Missing Data", "book": "Advances in Neural Information Processing Systems", "page_first": 2924, "page_last": 2932, "abstract": "Probabilistic approaches to computer vision typically assume a centralized setting, with the algorithm granted access to all observed data points. However, many problems in wide-area surveillance can benefit from distributed modeling, either because of physical or computational constraints. Most distributed models to date use algebraic approaches (such as distributed SVD) and as a result cannot explicitly deal with missing data. In this work we present an approach to estimation and learning of generative probabilistic models in a distributed context where certain sensor data can be missing. In particular, we show how traditional centralized models, such as probabilistic PCA and missing-data PPCA, can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion. Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods while being able to handle challenging situations such as missing or noisy observations.", "full_text": "Distributed Probabilistic Learning\n\nfor Camera Networks with Missing Data\n\nSejong Yoon\n\nDepartment of Computer Science\n\nRutgers University\n\nVladimir Pavlovic\n\nDepartment of Computer Science\n\nRutgers University\n\nsjyoon@cs.rutgers.edu\n\nvladimir@cs.rutgers.edu\n\nAbstract\n\nProbabilistic approaches to computer vision typically assume a centralized setting,\nwith the algorithm granted access to all observed data points. However, many\nproblems in wide-area surveillance can bene\ufb01t from distributed modeling, either\nbecause of physical or computational constraints. 
Most distributed models to date\nuse algebraic approaches (such as distributed SVD) and as a result cannot explic-\nitly deal with missing data. In this work we present an approach to estimation and\nlearning of generative probabilistic models in a distributed context where certain\nsensor data can be missing. In particular, we show how traditional centralized\nmodels, such as probabilistic PCA and missing-data PPCA, can be learned when\nthe data is distributed across a network of sensors. We demonstrate the utility\nof this approach on the problem of distributed af\ufb01ne structure from motion. Our\nexperiments suggest that the accuracy of the learned probabilistic structure and\nmotion models rivals that of traditional centralized factorization methods while\nbeing able to handle challenging situations such as missing or noisy observations.\n\n1\n\nIntroduction\n\nTraditional computer vision algorithms, particularly those that exploit various probabilistic and\nlearning-based approaches, are often formulated in centralized settings. A scene or an object is\nobserved by a single camera with all acquired information centrally processed and stored in a single\nknowledge base (e.g., a classi\ufb01cation model). Even if the problem setting relies on multiple cameras,\nas may be the case in multi-view or structure from motion (SfM) tasks, all collected information is\nstill processed and organized in a centralized fashion. Increasingly modern computational settings\nare becoming characterized by networks of peer-to-peer connected devices, with local data process-\ning abilities. Nevertheless, the overall goal of such distributed device (camera) networks may still\nbe to exchange information and form a consensus interpretation of the visual scene. 
For instance, even if a camera observes a limited set of object views, one would like its local computational model to reflect the general 3D appearance of the object visible by other cameras in the network.

A number of distributed algorithms have been proposed to address problems such as calibration, pose estimation, tracking, and object and activity recognition in large camera networks [1-3]. In order to deal with the high dimensionality of vision problems, distributed latent space search, such as decentralized variants of PCA, has been studied in [4, 5]. A more general framework using distributed least squares [6], based on distributed averaging of sensor fusions [7], was introduced for PCA, triangulation, pose estimation and SfM. Similar approaches have been extended to settings such as distributed object tracking and activity interpretation [8, 9]. Even though methods such as PCA or Kalman filtering have well-known probabilistic counterparts, the aforementioned approaches do not use a probabilistic formulation when dealing with the distributed setting.

One critical challenge in distributed data analysis is dealing with missing data. In camera networks, different nodes will only have access to a partial set of data features because of varying camera views or object movement. For instance, object points used for SfM may be visible only in some cameras and only in particular object poses. As a consequence, different nodes will be frequently exposed to missing data. However, most current distributed data analysis methods are algebraic in nature and cannot seamlessly handle such missing data.

In this work we propose a distributed consensus learning approach for parametric probabilistic models with latent variables that can effectively deal with missing data. We assume that each node in a network can observe only a fraction of the data (e.g., object views in camera networks). 
Furthermore, we assume that some of the data features may be missing across different nodes. The goal of the network of sensors is to learn a single consensus probabilistic model (e.g., of 3D object structure) without ever resorting to centralized data pooling and centralized computation. We will demonstrate that this task can be accomplished in a principled manner by local probabilistic models and in-network information sharing, implemented as recursive distributed probabilistic learning.

In particular, we focus on probabilistic PCA (PPCA) as a prototypical example and derive its distributed version, the D-PPCA. We then suggest how missing data can be handled in this setting using a missing-data PPCA and apply this model to solve the distributed SfM task in a camera network. Our model is inspired by the consensus-based distributed Expectation-Maximization (EM) algorithm for Gaussian mixtures [10], which we extend to deal with generalized linear Gaussian models [11]. Unlike other recently proposed decomposable Gaussian graphical models [4, 12], our model does not depend on any specific type of graph. Our network, of arbitrary topology, is assumed to be static with a single connected component. These assumptions are reasonably applicable to many real-world camera network settings.

In Section 2, we first explain the general distributed probabilistic model. Section 3 shows how D-PPCA can be formulated as a special case of the probabilistic framework and proposes the means for handling missing data. We then explain how D-PPCA can be modified for application to affine SfM. 
In Section 5, we report experimental results of our model using both synthetic and real data. Finally, we discuss our approach, including its limitations and possible solutions, in Section 6.

2 Distributed Probabilistic Model

We start our discussion by first considering a general parametric probabilistic model in a centralized setting and then show how to derive its distributed form.

2.1 Centralized Setting

Let X = {x_n | x_n \in R^D} be a set of iid multivariate data points with the corresponding latent variables Z = {z_n | z_n \in R^M}, n = 1...N. Our model is a joint density defined on (x_n, z_n) with a global parameter \theta,

(x_n, z_n) ~ p(x_n, z_n | \theta),   with   p(X, Z | \theta) = \prod_n p(x_n, z_n | \theta),

as depicted in Fig. 1a. In this general model, we can find an optimal global parameter \hat{\theta} (in a MAP sense) by applying standard EM learning. EM follows a recursive two-step procedure: (a) E-step, where the posterior density p(z_n | x_n, \theta) is estimated, and (b) M-step, the parametric optimization \hat{\theta} = arg max_\theta E_{Z|X}[ \log p(X, Z | \theta) ]. It is important to point out that each posterior density estimate at point n depends solely on the corresponding measurement x_n and does not depend on any other x_k, k \neq n. This means that even if we partition the independent measurements into arbitrary subsets, posterior density estimation is accomplished locally, within each subset. However, in the M-step all measurements X affect the choice of \hat{\theta}, because each term of the completed log-likelihood depends on the same \hat{\theta}. This is a typical characteristic of parametric models, where the optimal parameters depend on summary data statistics.

2.2 Distributed Setting

Let G = (V, E) be an undirected connected graph with vertices i, j \in V and edges e_{ij} = (i, j) \in E connecting the two vertices. 
Each node i is directly connected with its 1-hop neighbors in B_i = {j | e_{ij} \in E}. Suppose the set of data samples at node i is X_i = {x_{in} | n = 1, ..., N_i}, where x_{in} \in R^D is the n-th measurement vector and N_i is the number of samples collected at node i. Likewise, we define the latent variable set for node i as Z_i = {z_{in} | n = 1, ..., N_i}.

(a) Centralized  (b) Distributed  (c) Augmented

Figure 1: Centralized, distributed and augmented models for probabilistic PCA.

As observed previously, each posterior estimation is decentralized. Learning the model parameter would be decentralized if each node had its own independent parameter \theta_i. Still, the centralized model can be equivalently defined using the set of local parameters, with an additional constraint on their consensus, \theta_1 = \theta_2 = ... = \theta_{|V|}. This is illustrated in Fig. 1b, where the local node models are constrained using ties defined on the underlying graph. The simple consensus tying can be more conveniently defined using a set of auxiliary variables \rho_{ij}, one for each edge e_{ij} (Fig. 1c). This leads to the final distributed consensus learning formulation, similar to [10]:

\hat{\theta} = arg min_{ \{\theta_i : i \in V\} } - \log p(X | \theta, G)   s.t.   \theta_i = \rho_{ij}, \rho_{ij} = \theta_j, i \in V, j \in B_i,   (1)

where the latent variables Z have been marginalized out. This is a constrained optimization task that can be solved in a principled manner using the Alternating Direction Method of Multipliers (ADMM) [13-15]. 
ADMM iteratively, in a block-coordinate fashion, solves max_\lambda min_\theta L(·) on the augmented Lagrangian

L(\theta, \rho, \lambda) = - \log p(X | \theta_1, \theta_2, ..., \theta_{|V|}, G) + \sum_{i \in V} \sum_{j \in B_i} { \lambda_{ij1}^T (\theta_i - \rho_{ij}) + \lambda_{ij2}^T (\rho_{ij} - \theta_j) } + (\eta / 2) \sum_{i \in V} \sum_{j \in B_i} { ||\theta_i - \rho_{ij}||^2 + ||\rho_{ij} - \theta_j||^2 },   (2)

where \lambda_{ij1}, \lambda_{ij2}, i, j \in V, are the Lagrange multipliers, \eta is a positive scalar parameter and || · || is the induced norm. The last term (modulated by \eta) is not strictly necessary for consensus but introduces additional regularization. Further discussion of this term and the parameter can be found in [15] and [16]. The auxiliary \rho_{ij} play a critical decoupling role and separate the estimation of the local \theta_i during block-coordinate ascent/descent. This classic decomposition meta-algorithm (first introduced in the 1970s) can be used to devise a distributed counterpart of any centralized problem that attempts to maximize a global log-likelihood function over a connected network.

3 Distributed Probabilistic PCA (D-PPCA)

We now apply the general distributed probabilistic learning explained above to the specific case of distributed PPCA. The traditional centralized formulation of probabilistic PCA (PPCA) [17] assumes latent variables z_{in} ~ N(z_{in} | 0, I), with the generative relation

x_{in} = W_i z_{in} + \mu_i + \epsilon_i,   (3)

where \epsilon_i ~ N(\epsilon_i | 0, a_i^{-1} I) and a_i is the noise precision. Inference then yields

p(z_{in} | x_{in}) = N( z_{in} | L_i^{-1} W_i^T (x_{in} - \mu_i), a_i^{-1} L_i^{-1} ),   (4)

where L_i = W_i^T W_i + a_i^{-1} I. 
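As a concrete illustration of (4), the per-node E-step can be written in a few lines of numpy (a minimal sketch with our own function and variable names, not the authors' released code):

```python
import numpy as np

def ppca_posterior(X, W, mu, a):
    # PPCA E-step at one node, following Eq. (4):
    # L = W^T W + a^{-1} I, E[z] = L^{-1} W^T (x - mu), Cov[z] = a^{-1} L^{-1}.
    # X: (N, D) with one measurement x_in per row; W: (D, M); mu: (D,); a: precision.
    M = W.shape[1]
    L = W.T @ W + np.eye(M) / a
    L_inv = np.linalg.inv(L)
    Ez = (X - mu) @ W @ L_inv      # posterior means, one per row (L is symmetric)
    cov = L_inv / a                # posterior covariance, shared by all samples
    return Ez, cov
```

Note that, exactly as the text observes, this step touches only the node's own measurements.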
We can find the optimal parameters W_i, \mu_i, a_i by finding the maximum likelihood estimates of the marginal data likelihood, or by applying the EM algorithm to the expected complete-data log-likelihood with respect to the posterior density p(Z_i | X_i).

3.1 Distributed Formulation

The distributed algorithm developed in Section 2 can be directly applied to this PPCA model. The basic idea is to assign each subset of samples as evidence for the local generative model with parameters W_i, \mu_i, a_i^{-1}. The inference is accomplished locally in each node. The local parameter estimates are then computed using consensus updates that combine local summary data statistics with the information about the model conveyed through neighboring network nodes. Below, we outline the specific details of this approach.

Let \Theta_i = {W_i, \mu_i, a_i} be the set of parameters for each node i. The global constrained consensus optimization now becomes

min_{ \{W_i, \mu_i, a_i : i \in V\} } - F(\Theta_i)   s.t.   W_i = \rho_{ij}, \rho_{ij} = W_j;  \mu_i = \phi_{ij}, \phi_{ij} = \mu_j;  a_i = \psi_{ij}, \psi_{ij} = a_j;  i \in V, j \in B_i,   (5)

where F(\Theta_i) = \sum_{n=1}^{N_i} \log p(x_{in} | W_i, \mu_i, a_i^{-1}). 
The augmented Lagrangian is

L(\Phi_i) = - F(\Theta_i) + \sum_{i \in V} \sum_{j \in B_i} { \lambda_{ij1}^T (W_i - \rho_{ij}) + \lambda_{ij2}^T (\rho_{ij} - W_j) } + \sum_{i \in V} \sum_{j \in B_i} { \gamma_{ij1}^T (\mu_i - \phi_{ij}) + \gamma_{ij2}^T (\phi_{ij} - \mu_j) } + \sum_{i \in V} \sum_{j \in B_i} ( \beta_{ij1} (a_i - \psi_{ij}) + \beta_{ij2} (\psi_{ij} - a_j) ) + (\eta/2) \sum_{i \in V} \sum_{j \in B_i} ( ||W_i - \rho_{ij}||^2 + ||\rho_{ij} - W_j||^2 + ||\mu_i - \phi_{ij}||^2 + ||\phi_{ij} - \mu_j||^2 + (a_i - \psi_{ij})^2 + (\psi_{ij} - a_j)^2 ),   (6)

where \Phi_i = {W_i, \mu_i, a_i, \rho_{ij}, \phi_{ij}, \psi_{ij}; i \in V, j \in B_i} and {\lambda_{ijk}}, {\gamma_{ijk}}, {\beta_{ijk}} with k = 1, 2 are the Lagrange multipliers. The scalar value \eta gives us control over the convergence speed of the algorithm. With a reasonably large positive \eta, the overall optimization converges fairly quickly [10]. We explore the convergence behaviour with respect to various \eta in the synthetic data experiments. Just like in a standard EM approach, we minimize an upper bound of L(\Phi_i). 
Exploiting the posterior density in (4), we compute the expected mean and second moment of the latent variables in each node as

E[z_{in}] = L_i^{-1} W_i^T (x_{in} - \mu_i),   E[z_{in} z_{in}^T] = a_i^{-1} L_i^{-1} + E[z_{in}] E[z_{in}]^T.   (7)

Maximization of the completed-likelihood Lagrangian derived from (6) yields

W_i^{(t+1)} = ( a_i \sum_{n=1}^{N_i} (x_{in} - \mu_i) E[z_{in}]^T - 2 \lambda_i^{(t)} + \eta \sum_{j \in B_i} ( W_i^{(t)} + W_j^{(t)} ) ) · ( a_i \sum_{n=1}^{N_i} E[z_{in} z_{in}^T] + 2 \eta |B_i| I )^{-1},   (8)

\mu_i^{(t+1)} = ( a_i \sum_{n=1}^{N_i} ( x_{in} - W_i E[z_{in}] ) - 2 \gamma_i^{(t)} + \eta \sum_{j \in B_i} ( \mu_i^{(t)} + \mu_j^{(t)} ) ) · ( N_i a_i + 2 \eta |B_i| )^{-1},   (9)

\lambda_i^{(t+1)} = \lambda_i^{(t)} + (\eta/2) \sum_{j \in B_i} ( W_i^{(t+1)} - W_j^{(t+1)} ),   (10)

\gamma_i^{(t+1)} = \gamma_i^{(t)} + (\eta/2) \sum_{j \in B_i} ( \mu_i^{(t+1)} - \mu_j^{(t+1)} ),   (11)

\beta_i^{(t+1)} = \beta_i^{(t)} + (\eta/2) \sum_{j \in B_i} ( a_i^{(t+1)} - a_j^{(t+1)} ).   (12)

For a_i, we solve the quadratic equation

0 = 2 \eta |B_i| ( a_i^{(t+1)} )^2 + a_i^{(t+1)} ( 2 \beta_i^{(t)} - \eta \sum_{j \in B_i} ( a_i^{(t)} + a_j^{(t)} ) + (1/2) \sum_{n=1}^{N_i} { ||x_{in} - \mu_i||^2 - 2 E[z_{in}]^T W_i^T (x_{in} - \mu_i) + tr( E[z_{in} z_{in}^T] W_i^T W_i ) } ) - N_i D / 2.   (13)

The overall distributed EM algorithm for D-PPCA is summarized in Algorithm 1. 
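To make the consensus structure of these updates concrete, the following numpy sketch implements the mean update (9) and its multiplier update (11) for a single node (function and variable names are ours; a sketch of the update pattern, not the authors' implementation):

```python
import numpy as np

def update_mu(X_i, W_i, Ez_i, a_i, gamma_i, mu_i, mu_neighbors, eta):
    # Node-i consensus mean update, Eq. (9):
    # mu_i <- (a_i sum_n (x_in - W_i E[z_in]) - 2 gamma_i
    #          + eta sum_j (mu_i + mu_j)) / (N_i a_i + 2 eta |B_i|)
    N_i = X_i.shape[0]
    B_i = len(mu_neighbors)
    data_term = a_i * (X_i - Ez_i @ W_i.T).sum(axis=0)          # local summary
    consensus = eta * sum(mu_i + mu_j for mu_j in mu_neighbors)  # neighbor pull
    return (data_term - 2.0 * gamma_i + consensus) / (N_i * a_i + 2.0 * eta * B_i)

def update_gamma(gamma_i, mu_i_new, mu_neighbors_new, eta):
    # Dual (multiplier) update, Eq. (11), using freshly broadcast neighbor means.
    return gamma_i + 0.5 * eta * sum(mu_i_new - mu_j for mu_j in mu_neighbors_new)
```

The W_i and a_i updates (8) and (13) follow the same pattern: a local data summary combined with \eta-weighted terms that pull the node toward its neighbors.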
Detailed derivation can be found in the supplementary material.

Algorithm 1 Distributed Probabilistic PCA (D-PPCA)
Require: For every node i, initialize W_i^{(0)}, \mu_i^{(0)}, a_i^{(0)} randomly and set \lambda_i^{(0)} = 0, \gamma_i^{(0)} = 0, \beta_i^{(0)} = 0.
for t = 0, 1, 2, ... until convergence do
    for all i \in V do
        [E-step] Compute E[z_{in}] and E[z_{in} z_{in}^T] via (7).
        [M-step] Compute W_i^{(t+1)}, \mu_i^{(t+1)}, a_i^{(t+1)} via (8, 9, 13).
    end for
    for all i \in V do
        Broadcast W_i^{(t+1)}, \mu_i^{(t+1)}, and a_i^{(t+1)} to all neighbors j \in B_i.
    end for
    for all i \in V do
        Compute \lambda_i^{(t+1)}, \gamma_i^{(t+1)}, and \beta_i^{(t+1)} via (10-12).
    end for
end for

3.2 Missing Data D-PPCA

Traditional PPCA is an effective tool for dealing with data missing-at-random (MAR) in traditional PCA [18]. While more sophisticated methods, including variational approximations (cf. [18]), are possible, direct use of PPCA is often sufficient in practice. Hence, we adopt D-PPCA as a method to deal with missing data in a distributed consensus setting.

Generalization from D-PPCA to missing-data D-PPCA is straightforward and follows [18]. From the perspective of ADMM-based learning, the only modification comes in the form of adjusted terms for the local data summaries. For instance, in (9) the data summary term \sum_{n=1}^{N_i} ( x_{in} - W_i E[z_{in}] ) becomes, for each feature f,

\sum_{n \in O_{i,f}} ( x_{i,n,f} - w_{i,f}^T E[z_{in}] ),   (14)

where f = 1, ..., D is the feature index, O_{i,f} is the set of samples in node i that have feature f present, x_{i,n,f} is the value of the present feature, and w_{i,f}^T is the f-th row of the matrix W_i. Similar expressions can be derived for other local parameters. 
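As an illustration, the masked summary (14) can be computed with a boolean observation mask (a numpy sketch under our own naming; the mask convention is our assumption, not from the paper):

```python
import numpy as np

def masked_data_summary(X_i, W_i, Ez_i, observed):
    # Missing-data version of the summary sum_n (x_in - W_i E[z_in]):
    # for each feature f, sum only over samples n with f observed (Eq. 14).
    # X_i: (N_i, D), arbitrary values at missing entries; W_i: (D, M);
    # Ez_i: (N_i, M); observed: (N_i, D) boolean mask of present entries.
    residual = X_i - Ez_i @ W_i.T                 # x_{i,n,f} - w_{i,f}^T E[z_in]
    return np.where(observed, residual, 0.0).sum(axis=0)   # (D,)
```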
Note that (10-12) incur no changes.\n\n4 D-PPCA for Structure from Motion (SfM)\n\nIn this section, we consider a speci\ufb01c formulation of the modi\ufb01ed distributed probabilistic PCA\nfor application in af\ufb01ne SfM. In SfM, our goal is to estimate the 3D location of N points on a\nrigid object based on corresponding 2-D points observed from multiple cameras (or views). The\ndimension D of our measurement matrix is thus twice the number of frames each camera observed.\nA simple and effective way to solve this problem is the factorization method [19]. Given a 2D (image\ncoordinate) measurement matrix X, of size 2 \u00b7 #f rames \u00d7 #points, the matrix is factorized into a\n2 \u00b7 #f rames \u00d7 3 motion matrix M and the 3 \u00d7 #points 3D structure matrix S. In the centralized\nsetting this can be easily computed using SVD on X. Equivalently, the estimates of M and S can\nbe found using inference and learning in a centralized PPCA, where M is treated as the PPCA\nparameter and S is the latent structure. There we obtain additional estimates of the variance of\nstructure S, which are not immediately available from the factorization approach (although, they\ncan be found).\nHowever, the above de\ufb01ned (2 \u00b7 #f rames \u00d7 #points) data structure of X is not amenable to\ndistribution of different views (cameras, nodes), as considered in Section 3 of D-PPCA. Namely,\nD-PPCA assumes that the distribution is accomplished by splitting the data matrix X into sets of\nnon-overlapping columns, one for each node. Here, however, we seek to distribute the rows of\nmatrix X, i.e., a set of (subsequent) frames is to be assigned to each node/camera.\nHence, to apply the D-PPCA framework to SfM we need to swap the role of rows and columns,\ni.e., consider modeling of XT. This, subsequently, means that the 3D scene structure (which is to\nbe shared across all nodes in the network) will be treated as the D-PPCA parameter. 
The latent D-PPCA variables will model the unknown and uncertain motion of each camera (and/or of the object in its view).

Specifically, we will consider the model

X_i^T = W · Z_i + E_i,   (15)

where X_i^T is the matrix of image coordinates of all points in node (camera) i, of size #points × 2·#frames in node i, W is the #points × 3 3D structure matrix (the D-PPCA parameter) and Z_i is the 3 × 2·#frames motion matrix of node i.

One should note that we have implicitly assumed, in a standard D-PPCA manner, that each column of Z_i is iid and distributed as N(0, I). However, each pair of subsequent Z_i columns represents one 3 × 2 affine motion matrix. While those columns are not truly independent, our experiments (as demonstrated in Section 5) show that this assumption is not detrimental in practice. The remaining task is simply to follow the same process we used to derive D-PPCA.

Missing data in SfM will be handled using the formalism presented in Sec. 3.2. Strictly speaking, the model of data missing-at-random is not always applicable to SfM. The reason is that occlusions, the main source of missing data, cannot be treated as a random process. Instead, this setting corresponds to data missing-not-at-random (MNAR) [18]. If treated blindly, this may introduce bias into the estimated models. However, as we demonstrate in the experiments, this assumption does not adversely affect SfM when the number of missing points is within a reasonable range.

5 Experiments

In our experiments we first study the general convergence properties of the D-PPCA algorithm in a controlled synthetic setting. 
We then apply the D-PPCA to a set of SfM problems, both on synthetic and on real data.

5.1 Empirical Convergence Analysis

Using synthetic data generated from a Gaussian distribution, we observed that D-PPCA works well regardless of the number of network nodes, the topology, the choice of the parameter \eta, or even the presence of missing values in both the MAR and MNAR cases. Detailed results for the synthetic data are provided in the supplementary materials.

5.2 Affine Structure from Motion

We now show that the modified D-PPCA can be used as an effective framework for distributed affine SfM. We first show results in a controlled environment with synthetic data and then report results on data from real video sequences. We assume that correspondences across frames and cameras are known. For missing values in the MNAR case, we either used the actual occlusions to induce missing points or simulated consistently missing points over several frames.

5.2.1 Synthetic Data (Cube)

We first generated synthetic data with a rotating unit cube and 5 cameras facing the cube in a 3D space, similar to the synthetic experiments in [6]. The cube is centered at the origin of the space and rotates 30° counterclockwise. We extracted 8 cube points projected on each camera view every 6°, i.e. each camera observed 5 frames. Cameras are placed on a skewed plane, with elevation along the z-axis, as shown in Fig. 2a. For all synthetic and real SfM experiments, we picked \eta = 10 and initialized the W_i matrix with the feature point coordinates of the first frame visible in the i-th camera, plus some small noise. The convergence criterion for D-PPCA for SfM was set as 10^{-3} relative error.

To measure the performance, we computed the maximum subspace angle between the ground truth 3D coordinates and our estimated 3D structure matrix. For comparison, we conducted traditional SVD-based SfM on the same data. 
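For reference, the centralized SVD baseline of [19] used for comparison can be sketched as follows (a minimal numpy illustration assuming a row-centered measurement matrix; variable names are ours):

```python
import numpy as np

def svd_sfm(X):
    # Tomasi-Kanade-style affine factorization: X (2F x P, centered per row)
    # is approximated by a rank-3 product of motion M (2F x 3) and
    # structure S (3 x P).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # motion: scaled left vectors
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # structure: scaled right vectors
    return M, S
```

M @ S reproduces the best rank-3 approximation of X; motion and structure are recovered only up to an invertible 3 × 3 ambiguity, which metric-upgrade steps (not shown) resolve.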
In the noise-free case, D-PPCA for SfM always yielded the same performance as SVD-based SfM, with a subspace angle near 0°.

Figure 2: Rotating unit cube with multiple cameras. (a) Camera setting; (b) subspace angle vs. ground truth. Red circles are camera locations and blue arrows indicate each camera's facing direction. Green and red crosses in the right plot are outliers for centralized SVD-based SfM and D-PPCA for SfM, respectively.

We also tested D-PPCA for SfM in noisy and missing-value cases. First, we generated 20 independent samples of all 25 frames at 10 different noise levels. Then we ran D-PPCA 20 times on each of the independent samples and averaged the final structure estimates. As Fig. 2b shows, we found that D-PPCA for SfM is fairly robust to noise and tends to stabilize even as the noise level increases. The mean subspace angle tends to be slightly larger than that estimated by the centralized SVD SfM; however, both reside within overlapping confidence intervals. Considering MAR missing values, we obtained 1.66° for 20% missing points, averaged over 10 different missing-point samples. In the MNAR case, with actual occlusions considered, D-PPCA yielded a relatively larger 20° error. Intuitively, this is because the missing points in the scene are naturally not random. However, we argue that D-PPCA can still handle missing points given the evidence below.

5.2.2 Real Data

For the real-data experiments, we first applied D-PPCA for SfM on the Caltech 3D Objects on Turntable dataset [20]. The dataset provides various objects rotating on a turntable under different lighting conditions. The views of most objects were taken every 5°, which makes it challenging to extract feature points with correspondence across frames. Instead, we used a subset of the dataset which provides views taken every degree. This subset contains images of 5 objects. 
To simulate multiple cameras, we adopted a setting similar to that of [6]. We first extracted the first 30° of images of each object. We then used the KLT [21] implementation in Voodoo Camera Tracker1 to extract feature points with correspondence. Lastly, we sequentially and equally partitioned the 30 images into 5 nodes to simulate 5 cameras. Thus, each camera observes 6 frames. Table 1 shows the 5 objects and statistics of the feature points we extracted from them. We used \eta = 10 and a convergence criterion of 10^{-3}.

Due to the lack of ground truth 3D coordinates, we compared the subspace angles between the structure inferred using the traditional centralized SVD-based SfM and the D-PPCA-based SfM. Results are shown in Table 1 as the mean and variance over 20 independent runs. 10% MAR and MNAR results are also provided in the table.

Experimental results indicate the existence of differences between the reconstructions obtained by the centralized factorization approach and that of D-PPCA. However, the differences are small, depend on the object in question, and almost always include, within their confidence, the factorization result. Qualitative examination reveals no noticeable differences. Moreover, re-projecting back to the camera coordinate space resulted in close matching with the tracked feature points, as shown in the videos provided in the supplementary materials.

We also tested the utility of D-PPCA for SfM on the Hopkins155 dataset [22]. We adopted an experimental setting virtually identical to that in [6]. We collected 135 single-object sequences containing image coordinates of points, and we simulated a multi-camera setting by partitioning the frames sequentially and almost equally over 5 nodes, with the network connected in a ring topology. Again, we computed the maximum subspace angle between centralized SVD-based SfM and distributed D-PPCA for SfM. We chose the convergence criterion as 10^{-3}. 
1 http://www.digilab.uni-hannover.de/docs/manual.html

Table 1: Caltech 3D Objects on Turntable dataset statistics and quantitative results. Green dots indicate feature points tracked with correspondence across all 30 frames. All results use 20 independent initializations. MAR results provide variances over both the initializations and the missing-value settings.

Object                         BallSander  BoxStuff  Rooster  Standing  StorageBin
# Points                           62         67       189      310        102
# Frames                           30         30        30       30         30
Subspace angle b/w centralized SVD SfM and D-PPCA (degree):
  Mean                           1.4848     1.4397    1.4767   2.6221     0.4463
  Variance                       0.4159     0.4567    0.9448   1.6924     1.2002
Subspace angle b/w fully observable centralized PPCA SfM and D-PPCA with MAR (degree):
  Mean                           2.8358     6.2991    2.1556   7.6492     5.2506
  Var. (init)                    1.3591     4.3562    0.1351   6.6424     3.8810
  Var. (miss)                    0.0444     0.5729    0.0161   0.7603     0.1755
Subspace angle b/w fully observable centralized PPCA SfM and D-PPCA with MNAR (degree):
  Mean                           3.1405     6.4664    5.8027   9.2661     3.7965
  Variance                       0.0124     3.1955    2.4333   2.9720     0.0089

The average maximum subspace angle between D-PPCA for SfM and SVD-based SfM over all objects was 3.97° with variance 7.06. However, looking into the results more carefully, we found that even with a substantially larger subspace angle, the 3D structure estimates were similar to those of SVD-based SfM, differing only up to an orthogonal ambiguity. Moreover, more than 53% of all objects yielded a subspace angle below 1°, 77% of them below 5°, and more than 94% below 15°. With 10% MAR, we obtained a mean of 20.07° with variance 27.94°, with about 18% of the objects below 1°, 56% below 5°, and more than 70% below the mean. 
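The maximum subspace angle used throughout these comparisons can be computed, for instance, as follows (a numpy sketch with our own function name; the paper does not specify its exact implementation):

```python
import numpy as np

def max_subspace_angle_deg(A, B):
    # Largest principal angle between the column spaces of A and B.
    # Orthonormalize both bases; the cosines of the principal angles
    # are the singular values of Qa^T Qb, so the smallest singular
    # value gives the largest angle.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    cos_min = np.clip(s.min(), -1.0, 1.0)
    return np.degrees(np.arccos(cos_min))
```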
We could not perform MNAR experiments on Hopkins as the ground-truth occlusion information is not provided with the dataset.

6 Discussion and Future Work

In this work we introduced a general approach for learning parameters of traditional centralized probabilistic models, such as PPCA, in a distributed setting. Our synthetic data experiments showed that the proposed algorithm is robust to the choice of initial parameters and, more importantly, is not adversely affected by variations in network size, topology or missing values. In the SfM problems, the algorithm can be effectively used to distribute the computation of 3D structure and motion in camera networks, while retaining the probabilistic nature of the original model.

Despite its promising performance, D-PPCA for SfM exhibits some limitations. In particular, we assume the independence of the affine motion matrix parameters in (15). The assumption is clearly inconsistent with the modeling of motion on the SE(3) manifold. However, our experiments demonstrate that, in practice, this violation is not crucial. This shortcoming can be amended in one of several possible ways. One can relax the iid assumption on individual samples to one on subsequent columns (i.e., full 3 × 2 motion matrices). Our additional experiments, not reported here, indicate no discernible utility of this approach. A more principled approach would be to define priors for motion matrices compatible with SE(3), using e.g. [23]. While appealing, such priors would render the overall model non-linear and would require additional algorithmic considerations, perhaps in the spirit of [1].

Acknowledgments

This work was supported in part by the National Science Foundation under Grant No. IIS 0916812.

References

[1] Roberto Tron and Rene Vidal. Distributed Computer Vision Algorithms. IEEE Signal Processing Magazine, 28:32–45, 2011.

[2] A.Y. Yang, S. Maji, C.M. Christoudias, T. Darrell, J. 
Malik, and S.S. Sastry. Multiple-view Object Recognition in Band-limited Distributed Camera Networks. In Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2009), 2009.

[3] Richard J. Radke. A Survey of Distributed Computer Vision Algorithms. In Hideyuki Nakashima, Hamid Aghajan, and Juan Carlos Augusto, editors, Handbook of Ambient Intelligence and Smart Environments. Springer Science+Business Media, LLC, 2010.

[4] A. Wiesel and A.O. Hero. Decomposable Principal Component Analysis. IEEE Transactions on Signal Processing, 57(11):4369–4377, 2009.

[5] Sergio V. Macua, Pavle Belanovic, and Santiago Zazo. Consensus-based Distributed Principal Component Analysis in Wireless Sensor Networks. In 2010 IEEE Eleventh International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–5, June 2010.

[6] Roberto Tron and Rene Vidal. Distributed Computer Vision Algorithms Through Distributed Averaging. In IEEE Conference on Computer Vision and Pattern Recognition, pages 57–63, 2011.

[7] Lin Xiao, Stephen Boyd, and Sanjay Lall. A Scheme for Robust Distributed Sensor Fusion Based on Average Consensus. In International Conference on Information Processing in Sensor Networks, pages 63–70, April 2005.

[8] R. Olfati-Saber. Distributed Kalman Filtering for Sensor Networks. In 46th IEEE Conference on Decision and Control, pages 5492–5498, December 2007.

[9] Bi Song, A.T. Kamal, C. Soto, Chong Ding, J.A. Farrell, and A.K. Roy-Chowdhury. Tracking and Activity Recognition Through Consensus in Distributed Camera Networks. IEEE Transactions on Image Processing, 19(10):2564–2579, October 2010.

[10] P.A. Forero, A. Cano, and G.B. Giannakis. Distributed Clustering Using Wireless Sensor Networks. IEEE Journal of Selected Topics in Signal Processing, 5(4):707–724, August 2011.

[11] Sam Roweis and Zoubin Ghahramani.
A Unifying Review of Linear Gaussian Models. Neural Computation, 11:305–345, 1999.

[12] Ami Wiesel, Yonina C. Eldar, and Alfred O. Hero. Covariance Estimation in Decomposable Gaussian Graphical Models. IEEE Transactions on Signal Processing, 58(3):1482–1492, 2010.

[13] Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds. SIAM Journal on Numerical Analysis, 28:545–572, February 1991.

[14] Robert Michael Lewis and Virginia Torczon. A Globally Convergent Augmented Lagrangian Pattern Search Algorithm for Optimization with General Constraints and Simple Bounds. SIAM Journal on Optimization, 12:1075–1089, April 2002.

[15] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. In Michael Jordan, editor, Foundations and Trends in Machine Learning, volume 3, pages 1–122. Now Publishers, 2011.

[16] Pedro A. Forero, Alfonso Cano, and Georgios B. Giannakis. Consensus-Based Distributed Support Vector Machines. Journal of Machine Learning Research, 11:1663–1707, 2010.

[17] Michael E. Tipping and Christopher M. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 61:611–622, 1999.

[18] Alexander Ilin and Tapani Raiko. Practical Approaches to Principal Component Analysis in the Presence of Missing Values. Journal of Machine Learning Research, 11:1957–2000, 2010.

[19] Carlo Tomasi and Takeo Kanade. Shape and Motion from Image Streams under Orthography: a Factorization Method. International Journal of Computer Vision, 9:137–154, 1992. doi:10.1007/BF00129684.

[20] Pierre Moreels and Pietro Perona.
Evaluation of Features Detectors and Descriptors based on 3D Objects. International Journal of Computer Vision, 73(3):263–284, July 2007.

[21] Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Technical Report CMU-CS-91-132, Carnegie Mellon University, April 1991.

[22] Roberto Tron and Rene Vidal. A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. In IEEE International Conference on Computer Vision and Pattern Recognition, 2007.

[23] Yasuko Chikuse. Statistics on Special Manifolds, volume 174 of Lecture Notes in Statistics. Springer, 1st edition, February 2003.