Rate Distortion Codes in Sensor Networks: A System-level Analysis

Advances in Neural Information Processing Systems, pp. 931-938

Tatsuto Murayama and Peter Davis
NTT Communication Science Laboratories
Nippon Telegraph and Telephone Corporation
"Keihanna Science City", Kyoto 619-0237, Japan
{murayama,davis}@cslab.kecl.ntt.co.jp

Abstract

This paper provides a system-level analysis of a scalable distributed sensing model for networked sensors. In our system model, a data center acquires data from a set of L sensors which each independently encode their noisy observations of an original binary sequence and transmit their encoded data sequences to the data center at a limited combined rate R. Supposing that the sensors use independent LDGM rate distortion codes, we show that the system performance can be evaluated for any given finite R when the number of sensors L goes to infinity. The analysis shows how the optimal strategy for the distributed sensing problem changes at critical values of the data rate R or the noise level.

1 Introduction

Device and sensor networks are shaping many activities in our society. These networks are being deployed in a growing number of applications as diverse as agricultural management, industrial controls, crime watch, and military applications. Indeed, sensor networks can be considered a promising technology with a wide range of potential future markets [1]. Still, for all the promise, it is often difficult to integrate the individual components of a sensor network in a smart way. Although we see many breakthroughs in component devices, advanced software, and power management, system-level understanding of the emerging technology is still weak.
It requires a shift in our notion of "what to look for". It requires a study of collective behavior and resulting trade-offs. This is the issue that we address in this article. We demonstrate the usefulness of adopting new approaches by considering the following scenario.

Consider that a data center is interested in the data sequence {X(t)} (t = 1, 2, ...), which cannot be observed directly. Therefore, the data center deploys a set of L sensors which each independently encode their noisy observations of the sequence, {Yi(t)}, without sharing any information; i.e., the sensors are not permitted to communicate beforehand to decide what to send to the data center. The data center collects separate samples from all the L sensors and uses them to recover the original sequence. However, since {X(t)} is not the only pressing matter which the data center must consider, the combined data rate R at which the sensors can communicate with it is strictly limited. A formulation of decentralized communication with an estimation task, the "CEO problem", was first proposed by Berger and Zhang [2], providing a new theoretical framework for large scale sensing systems. In this outstanding work, some interesting properties of such systems were revealed. If the sensors were permitted to communicate on the basis of their pooled observations, then they would be able to smooth out their independent observation noises entirely as L goes to infinity. Therefore, the data center can achieve an arbitrary fidelity D(R), where D(·) denotes the distortion rate function of {X(t)}. In particular, the data center recovers almost complete information if R exceeds the entropy rate of {X(t)}.
However, if the sensors are not allowed to communicate with each other, there does not exist a finite value of R for which even infinitely many sensors can make D arbitrarily small [2].

In this paper, we introduce a new analytical model for a massive sensing system with a finite data rate R. More specifically, we assume that the sensors use LDGM codes for rate distortion coding, while the data center recovers the original sequence by using optimal "majority vote" estimation [3]. We consider the distributed sensing problem of deciding the optimal number of sensors L given the combined data rate R. Our asymptotic analysis successfully provides the performance of the whole sensing system when L goes to infinity, where the data rate for an individual sensor vanishes. Here, we exploit statistical methods which have recently been developed in the field of disordered statistical systems, in particular, spin glass theory. The paper is organized as follows. In Section 2, we introduce a system model for the sensor network. Section 3 summarizes the results of our approach, and the following section provides the outline of our analysis. Conclusions are given in the last section.

2 System Model

Let P(x) be a probability distribution common to {X(t)} ∈ X, and W(y|x) be a stochastic matrix defined on X × Y, where Y denotes the common alphabet of {Yi(t)}, with i = 1, ···, L and t ≥ 1. In the general setup, we assume that the instantaneous joint probability distribution takes the form

  Pr[x, y1, ···, yL] = P(x) ∏_{i=1}^{L} W(yi|x)

for the temporally memoryless source {X(t)}. Here, the random variables Yi(t) are conditionally independent when X(t) is given, and the conditional probabilities W[yi(t)|x(t)] are identical for all i and t.
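As a concrete illustration (ours, not from the paper), the observation model above can be sampled directly; `sample_observations` and the example distributions below are illustrative names, with the binary symmetric case anticipating the parameterization introduced next:

```python
import random

def sample_observations(P, W, L, rng=random):
    """Draw x ~ P, then L conditionally independent observations y_i ~ W(.|x).

    P is a dict mapping source symbols to probabilities; W maps each x to a
    dict giving the observation distribution W(.|x).
    """
    x = rng.choices(list(P), weights=list(P.values()))[0]
    ys = [rng.choices(list(W[x]), weights=list(W[x].values()))[0]
          for _ in range(L)]
    return x, ys

# Binary symmetric example: uniform source, observation noise p = 0.2.
p = 0.2
P = {0: 0.5, 1: 0.5}
W = {0: {0: 1 - p, 1: p}, 1: {1: 1 - p, 0: p}}
x, ys = sample_observations(P, W, L=5)
```

Because the y_i are drawn independently given x, each sensor sees its own noise realization, which is exactly the situation the "no pooling" restriction makes hard.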
In this paper, we impose binary assumptions on the problem; i.e., the data sequence {X(t)} and its noisy observations {Yi(t)} are all assumed to be binary sequences. Therefore, the stochastic matrix can be parameterized as

  W(y|x) = 1 − p,  if y = x
           p,      otherwise ,

where p ∈ [0, 1] represents the observation noise. Note also that the alphabets have been selected as X = Y. Furthermore, for simplicity, we also assume that P(x) = 1/2 always holds, implying that a purely random source is observed.

At the encoding stage, a sensor i encodes a block y_i = [yi(1), ···, yi(n)]^T of length n from the noisy observation {yi(t)} into a block z_i = [zi(1), ···, zi(m)]^T of length m defined on Z, where m < n is a known integer. Hereafter, we take the Boolean representation of the binary alphabet, X = {0, 1}, and therefore Y = Z = {0, 1} as well. Let ŷ_i be a reproduction sequence for the block. Then, making use of a Boolean matrix Ai of dimensionality n × m, we are to find an m bit codeword sequence z_i = [zi(1), ···, zi(m)]^T which satisfies

  ŷ_i = Ai z_i  (mod 2) ,                                            (1)

where the fidelity criterion

  D = (1/n) d_H(y_i, ŷ_i)                                            (2)

holds [4]. Here the Hamming distance d_H(·, ·) is used as the distortion measure. Note that we have applied modulo-2 arithmetic for the additive operation in (1). Let Ai be characterized by K ones per row and C per column. The finite, and usually small, numbers K and C define a particular LDGM code family. The data center then collects the L codeword sequences, z_1, ···, z_L. Since all the L codewords are of the same length m, the combined data rate will be R = L × m/n. Therefore, in our scenario, the data center deploys exchangeable sensors with fixed quality reproductions, ŷ_1, ···, ŷ_L.
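A minimal sketch of the map (1) and the distortion (2). The generator construction here (K ones per row at random positions) and the brute-force encoder are our own illustrative stand-ins: the codes in the text additionally fix C ones per column, and practical LDGM encoders use message passing rather than exhaustive search:

```python
import itertools
import random

def make_generator(n, m, K, rng=random):
    # Boolean n x m matrix with K ones per row, stored as positions of the
    # ones; column counts are only approximately balanced in this sketch.
    return [sorted(rng.sample(range(m), K)) for _ in range(n)]

def reproduce(A, z):
    # y_hat = A z (mod 2): each reproduced bit XORs the K codeword bits
    # selected by its row, as in equation (1).
    return [sum(z[j] for j in row) % 2 for row in A]

def encode(A, y, m):
    # Brute-force encoder stand-in: search all 2^m codewords for the one
    # whose reproduction minimizes the Hamming distortion (2).
    return min(itertools.product([0, 1], repeat=m),
               key=lambda z: sum(a != b for a, b in zip(reproduce(A, z), y)))

random.seed(1)
n, m, K = 12, 4, 2                      # per-sensor rate m/n = 1/3
A = make_generator(n, m, K)
y = [random.randint(0, 1) for _ in range(n)]
z = encode(A, y, m)
D = sum(a != b for a, b in zip(reproduce(A, list(z)), y)) / n
```

With L such sensors the combined rate is R = L × m/n, so for fixed R a larger L forces a smaller per-sensor rate m/n and hence a larger distortion D.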
Lastly, the tth symbol of the estimate, x̂ = [x̂(1), ···, x̂(n)]^T, is calculated by majority vote [3],

  x̂(t) = 0,  if ŷ1(t) + ··· + ŷL(t) ≤ L/2
         1,  otherwise .                                             (3)

Therefore, the overall performance of the system can be measured by the expected bit error frequency for decisions by the majority vote (3), Pe = Pr[x ≠ x̂].

In this paper, we consider two limit cases of decentralization levels: (1) the extreme situation of L → ∞, and (2) the case of L = R. The former case means that the data rate for an individual sensor vanishes, while the latter results in transmission without coding techniques. In general, it is difficult to determine which level is optimal for the estimation, i.e., which scenario results in the smaller value of Pe. Indeed, by using rate distortion codes, the data center could use as many sensors as possible for a given R. However, the quality of the individual reproductions would be less informative. The best choice seems to depend largely on R, as well as p.

3 Main Results

For simplicity, we consider the following two solvable cases: K = 2 for C ≥ K, and the optimal case of K → ∞. Let p be a given observation noise level, and R the finite real value of a given combined data rate.
Letting L → ∞, we find the expected bit error frequency to be

  Pe(p, R) = ∫_{−∞}^{−(1−2p) c_g √R} dr N(0, 1)                      (4)

with the constant value

  c_g = √(α/2) ⟨tanh² x⟩_π(x)   (K = 2)
        √(2 ln 2)               (K → ∞)                              (5)

where the rescaled variance σ² = α ⟨x̂²⟩_π̂(x̂) and the first step RSB enforcement

  ⟨tanh² x (1 + 2x csch x sech x)⟩_π(x) = 0

holds. Here N(X, Y) denotes the normal distribution with mean X and variance Y. The rescaled variance σ² and the scale invariant parameter α are determined numerically, where we use the following notations:

  ⟨ · ⟩_π(x) = ∫_{−∞}^{+∞} dx/√(2πσ²) exp[−x²/(2σ²)] ( · ) ,
  ⟨ · ⟩_π̂(x̂) = ∫_{−1}^{+1} dx̂/√(2πσ²) (1 − x̂²)^{−1} exp[−(tanh⁻¹ x̂)²/(2σ²)] ( · ) .

[Figure 1 here: curves of Pe^(dB)(p, R) against p ∈ [0, 0.5] for R = 1, 2, 10 (narrow band) and R = 100, 500, 1000 (broadband).]

Figure 1: Pe^(dB)(p, R) for K = 2.
(a) Narrow band; (b) Broadband.

Therefore, it is straightforward to evaluate (4) with (5) for given parameters p and R.

For a given finite value of R, we examine what happens to the quality of the estimate when the noise level p varies. Fig. 1 and Fig. 2 show the typical behavior of the bit error frequency Pe(p, R) in decibels (dB), where the reference level is chosen as

  Pe^(0)(p, R) = Σ_{l=0}^{(R−1)/2} C(R, l) (1−p)^l p^{R−l}                                              (R odd)
                 Σ_{l=0}^{R/2−1} C(R, l) (1−p)^l p^{R−l} + (1/2) C(R, R/2) (1−p)^{R/2} p^{R/2}          (R even)   (6)

for a given integer R, where C(R, l) denotes the binomial coefficient. The reference (6) gives Pe for the case of L = R, i.e., the case when the sensors are not allowed to compress their observations. Here, in decibels, we have

  Pe^(dB)(p, R) = 10 log [Pe(p, R) / Pe^(0)(p, R)] ,

where the log is to base 10. Note that the zero level in decibels occurs when the measured error frequency Pe(p, R) is equal to the reference level. Therefore, it is also possible to have negative levels, which mean an expected bit error frequency much smaller than the reference level. In the case of a small combined data rate R, the narrow band case, the numerical results in Fig. 1 (a) and Fig. 2 (a) show that the quality of the estimate is sensitive to the parity of the integer R. In particular, the R = 2 case has the lowest threshold level, pc = 0.0921 for Fig. 1 (a) and pc = 0.082 for Fig. 2 (a) respectively, beyond which the L → ∞ scenario outperforms the L = R scenario, while the R = 1 case does not have such a threshold. In contrast, if the bandwidth is wide enough, the difference of the expected bit error probabilities in decibels, Pe^(dB)(p, R), proves to have similar qualitative characteristics, as shown in Fig. 1 (b) and Fig. 2 (b).
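Both curves behind these comparisons are cheap to evaluate. The sketch below is ours: `pe_coded` implements the Gaussian tail (4) using the K → ∞ constant c_g = √(2 ln 2) from (5) as an assumption (the K = 2 case additionally needs the numerically determined α and σ), `pe_uncoded` implements the reference (6), and `pe_db` the decibel measure:

```python
import math

def pe_coded(p, R, cg=math.sqrt(2 * math.log(2))):
    # Equation (4): N(0,1) tail up to -(1-2p) cg sqrt(R), via erfc.
    # cg defaults to the K -> infinity value sqrt(2 ln 2) (our assumption).
    t = -(1 - 2 * p) * cg * math.sqrt(R)
    return 0.5 * math.erfc(-t / math.sqrt(2))

def pe_uncoded(p, R):
    # Equation (6): majority vote over R raw sensor readings (L = R),
    # with ties broken by a fair coin when R is even.
    c = math.comb
    if R % 2:  # R odd
        return sum(c(R, l) * (1 - p) ** l * p ** (R - l)
                   for l in range((R - 1) // 2 + 1))
    s = sum(c(R, l) * (1 - p) ** l * p ** (R - l) for l in range(R // 2))
    return s + 0.5 * c(R, R // 2) * ((1 - p) * p) ** (R // 2)

def pe_db(p, R):
    # Decibel comparison of the two scenarios; negative means the coded,
    # L -> infinity scenario beats the uncoded L = R reference.
    return 10 * math.log10(pe_coded(p, R) / pe_uncoded(p, R))
```

Scanning `pe_db(p, R)` over p for fixed R reproduces the kind of threshold behavior described above: the sign of the decibel curve flips where one scenario overtakes the other.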
Moreover, our preliminary experiments for larger systems also indicate that the threshold pc seems to converge to the values 0.165 and 0.146, respectively, as L goes to infinity; we are currently working on the theoretical derivation.

4 Outline of Derivation

Since the predetermined matrices A_1, ···, A_L are selected randomly, it is quite natural to say that the instantaneous series, defined by ŷ(t) = [ŷ1(t), ···, ŷL(t)]^T, can be modeled using Bernoulli trials.

[Figure 2 here: curves of Pe^(dB)(p, R) against p ∈ [0, 0.5] for R = 1, 2, 10 (narrow band) and R = 100, 500, 1000 (broadband).]

Figure 2: Pe^(dB)(p, R) for K → ∞. (a) Narrow band; (b) Broadband.

Here, the reproduction problem reduces to a channel model, where the stochastic matrix is defined as

  W(ŷ|x) = q,      if ŷ = x
           1 − q,  otherwise ,                                       (7)

where q denotes the quality of the reproductions, i.e., Pr[x ≠ ŷi] = 1 − q for i = 1, ···, L. Taking the channel model (7) for the reproduction problem to be valid, the expected bit error frequency can be well captured by using the cumulative probability distributions

  Pe = Pr[x ≠ x̂] = B((L−1)/2 : L, q)                          if L is odd
                   B(L/2 − 1 : L, q) + (1/2) b(L/2 : L, q)    otherwise      (8)

with

  B(l' : L, q) = Σ_{l=0}^{l'} b(l : L, q) ,   b(l : L, q) = C(L, l) q^l (1−q)^{L−l} ,

where the integer l is the total number of non-flipped elements in ŷ(t), and the second term (1/2) b(L/2 : L, q) represents random guessing with l = L/2.
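Equation (8) is direct to evaluate for finite L. The following sketch (function names are ours) expresses the majority-vote error as a binomial tail in the reproduction quality q, including the random-guess tie term for even L:

```python
import math

def b(l, L, q):
    # b(l : L, q): probability that exactly l of the L reproductions
    # agree with x, each independently with probability q.
    return math.comb(L, l) * q ** l * (1 - q) ** (L - l)

def B(l_max, L, q):
    # Cumulative distribution B(l_max : L, q) of the agreeing count.
    return sum(b(l, L, q) for l in range(l_max + 1))

def pe_majority(L, q):
    # Equation (8): error of the majority vote (3) over L reproductions,
    # with ties (l = L/2, even L) resolved by random guessing.
    if L % 2:  # L odd
        return B((L - 1) // 2, L, q)
    return B(L // 2 - 1, L, q) + 0.5 * b(L // 2, L, q)
```

For any fixed q > 1/2 the tail vanishes as L grows, which is why, for fixed R, the trade-off hinges on how fast q degrades toward 1/2 as the per-sensor rate shrinks.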
Note that the reproduction quality q can be obtained by the simple algebra q = pD + (1 − p)(1 − D), where D is the distortion with respect to coding.

Since the error probability (8) is given as a function of q, we first derive an analytical solution for the quality q in the limit L → ∞, keeping R finite. In this approach, we apply the method of statistical mechanics to evaluate the typical performance of the codes [4]. As a first step, we translate the Boolean alphabet Z = {0, 1} to the "Ising" one, S = {+1, −1}. Consequently, we need to translate the additive operations, such as zi(s) + zi(s') (mod 2), into their multiplicative representations, σi(s) × σi(s') ∈ S for s, s' = 1, ···, m. Similarly, we translate the Boolean yi(t)s into the Ising Ji(t)s. For simplicity, we omit the subscript i, which labels the L agents, in the rest of this section. Following the prescription of Sourlas [5], we examine the Gibbs-Boltzmann distribution

  Pr[σ] = exp[−βH(σ|J)] / Z(J)   with   Z(J) = Σ_σ exp[−βH(σ|J)] ,   (9)

where the Hamiltonian of the Ising system is defined as

  H(σ|J) = − Σ_{s1<···<sK} A_{s1...sK} J[t(s1, ..., sK)] σ(s1) ··· σ(sK) .   (10)
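The relation for q is plain to check: a reproduced bit agrees with x exactly when neither the observation noise nor the distortion flips it, or when both do. A quick sketch (ours) with a Monte Carlo cross-check, treating the distortion as an independent flip in line with the channel abstraction (7):

```python
import random

def repro_quality(p, D):
    # q = Pr[x = y_hat]: agree iff both stages flip (p * D) or neither
    # flips ((1 - p)(1 - D)), i.e. q = pD + (1 - p)(1 - D).
    return p * D + (1 - p) * (1 - D)

# Monte Carlo cross-check of the two-stage flip channel.
random.seed(2)
p, D, N = 0.1, 0.2, 200_000
agree = sum((random.random() < p) == (random.random() < D) for _ in range(N))
```

Plugging this q into the finite-L error formula (8) closes the loop: the whole system analysis reduces to computing the distortion D achieved by the codes, which is what the statistical mechanics below delivers.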