{"title": "A Gaussian Tree Approximation for Integer Least-Squares", "book": "Advances in Neural Information Processing Systems", "page_first": 638, "page_last": 645, "abstract": "This paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.", "full_text": "A Gaussian Tree Approximation for Integer Least-Squares\n\nJacob Goldberger\nSchool of Engineering\nBar-Ilan University\ngoldbej@eng.biu.ac.il\n\nAmir Leshem\nSchool of Engineering\nBar-Ilan University\nleshema@eng.biu.ac.il\n\nAbstract\n\nThis paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. 
The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.\n\n1 Introduction\n\nFinding the linear least squares fit to data is a well-known problem, with applications in almost every field of science. When there are no restrictions on the variables, the problem has a closed-form solution. In many cases, a priori knowledge of the values of the variables is available. One example is the existence of priors, which leads to Bayesian estimators. Another example of great interest in many applications is when the variables are constrained to a discrete finite set. This problem has many diverse applications, such as decoding of multi-input multi-output (MIMO) digital communication systems, GPS system ambiguity resolution [15] and many lattice problems in computer science, such as finding the closest vector in a lattice to a given point in R\u207f [1], and vector subset sum problems, which have applications in cryptography [11]. In contrast to the continuous linear least squares problem, this problem is known to be NP-hard.\n\nThis paper concentrates on the MIMO application. It should be noted, however, that the proposed method is general and can be applied to any integer linear least-squares problem. A multiple-input multiple-output (MIMO) system is a communication system with n transmit antennas and m receive antennas. The tap gain from transmit antenna i to receive antenna j is denoted by Hij. In each use of the MIMO channel a vector x = (x1, ..., xn)\u22a4 is independently selected from a finite set of points A according to the data to be transmitted, so that x \u2208 A\u207f. A standard example of a finite set A in MIMO communication is A = {\u22121, 1} or, more generally, A = {\u00b11, \u00b13, ..., \u00b1(2k + 1)}. 
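As a concrete illustration of the finite-set constraint described above, here is a minimal Python sketch (not from the paper; the function names are ours) that builds the symbol set A = {\u00b11, \u00b13, ..., \u00b1(2k + 1)} and draws a transmit vector uniformly from A\u207f:

```python
import random

def pam_constellation(k):
    # Finite symbol set A = {+-1, +-3, ..., +-(2k + 1)} from the text.
    odds = list(range(1, 2 * k + 2, 2))
    return sorted([-a for a in odds] + odds)

def random_transmit_vector(A, n, rng=random):
    # Each channel use independently selects x uniformly from A^n.
    return [rng.choice(A) for _ in range(n)]

A = pam_constellation(1)          # k = 1 gives {-3, -1, 1, 3}
x = random_transmit_vector(A, 4)  # one transmitted vector, n = 4
```

Note that exhaustive ML detection would have to score all |A|\u207f such vectors, which is what makes the problem hard as n grows.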
The received vector y is given by:\n\ny = Hx + \u01eb    (1)\n\nThe vector \u01eb is additive noise whose components are assumed to be zero-mean, statistically independent Gaussians with a known covariance \u03c3\u00b2I. The m \u00d7 n matrix H is assumed to be known. (In the MIMO application we further assume that H comprises iid elements drawn from a normal distribution of unit variance.) The MIMO detection problem consists of finding the unknown transmitted vector x given H and y. The task, therefore, boils down to solving a linear system in which the unknowns are constrained to a discrete finite set. Since the noise \u01eb is assumed to be additive Gaussian, the optimal maximum likelihood (ML) solution is:\n\n\u02c6x = arg min_{x \u2208 A\u207f} ||Hx \u2212 y||\u00b2    (2)\n\nHowever, going over all the |A|\u207f vectors is infeasible when either n or |A| is large.\n\nA simple sub-optimal solution is based on a linear decision that ignores the finite-set constraint:\n\nz = (H\u22a4H)\u207b\u00b9H\u22a4y    (3)\n\nand then, neglecting the correlation between the symbols, finding the closest point in A for each symbol independently:\n\n\u02c6xi = arg min_{a \u2208 A} |zi \u2212 a|    (4)\n\nThis scheme performs poorly due to its inability to handle ill-conditioned realizations of the matrix H. Somewhat better performance can be obtained by using a minimum mean square error (MMSE) Bayesian estimate of the continuous linear system. Let e be the variance of a uniform distribution over the members of A. We can partially incorporate the information that x \u2208 A\u207f by using the prior Gaussian distribution x \u223c N(0, eI). The MMSE estimate becomes:\n\nE(x|y) = (H\u22a4H + (\u03c3\u00b2/e)I)\u207b\u00b9H\u22a4y    (5)\n\nand then the finite-set solution is obtained by finding the closest lattice point in each component independently. 
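The two linear detectors above can be sketched in a few lines of NumPy (our own illustration; `zero_forcing_detect` and `mmse_detect` are hypothetical names, not from the paper):

```python
import numpy as np

def zero_forcing_detect(H, y, A):
    # Eqs. (3)-(4): unconstrained least-squares solve, then per-symbol
    # rounding of each z_i to the closest point of the finite set A.
    z = np.linalg.solve(H.T @ H, H.T @ y)
    A = np.asarray(A, dtype=float)
    return A[np.argmin(np.abs(z[:, None] - A[None, :]), axis=1)]

def mmse_detect(H, y, A, sigma2):
    # Eq. (5): regularized solve under the prior x ~ N(0, e I),
    # followed by the same independent per-symbol rounding.
    A = np.asarray(A, dtype=float)
    e = A.var()              # variance of a uniform distribution over A
    n = H.shape[1]
    xhat = np.linalg.solve(H.T @ H + (sigma2 / e) * np.eye(n), H.T @ y)
    return A[np.argmin(np.abs(xhat[:, None] - A[None, :]), axis=1)]
```

On a well-conditioned channel with little noise both detectors recover x; the MMSE regularization mainly helps when H\u22a4H is nearly singular, which is exactly where the plain linear decision breaks down.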
A vast improvement over the linear approaches described above can be achieved by using sequential decoding:\n\n\u2022 Apply the MMSE estimator (5) and choose the most reliable symbol, i.e. the symbol that corresponds to the column with the minimal norm of the matrix:\n\n(H\u22a4H + (\u03c3\u00b2/e)I)\u207b\u00b9H\u22a4\n\n\u2022 Make a discrete symbol decision for the most reliable symbol \u02c6xi. Eliminate the detected symbol: \u2211_{j\u2260i} hjxj = y \u2212 hi\u02c6xi (hj is the j-th column of H) to obtain a new, smaller linear system. Go to the first step to detect the next symbol.\n\nThis algorithm, known as MMSE-SIC [5], has the best performance in this family of linear-based algorithms, but the price is higher complexity. These linear-type algorithms can also easily provide probabilistic (soft-decision) estimates for each symbol. However, there is still a significant gap between the detection performance of the MMSE-SIC algorithm and the performance of the optimal ML detector.\n\nMany alternative structures have been proposed to approach ML detection performance. For example, the sphere decoding algorithm (an efficient way to go over all the possible solutions) [7], approaches using the sequential Monte Carlo framework [3] and methods based on semidefinite relaxation [17] have been implemented. Although the detection schemes listed above reduce computational complexity compared to the exhaustive search for the ML solution, sphere decoding is still exponential in the average case [9] and the semidefinite relaxation is a high-degree polynomial. Thus, there is still a need for low-complexity detection algorithms that can achieve good performance.\n\nThis study attempts to solve the integer least-squares problem using the Belief Propagation (BP) paradigm. It is well-known (see e.g. 
[14]) that a straightforward application of the BP algorithm to the MIMO detection problem yields very poor results, since there are a large number of short cycles in the underlying factor graph. In this study we introduce a novel approach to utilizing the BP paradigm for MIMO detection. The proposed variant of the BP algorithm is both computationally efficient and achieves near-optimal results.\n\n2 The Loopy Belief Propagation Approach\n\nGiven the constrained linear system y = Hx + \u01eb and a uniform prior distribution on x, the posterior probability function of the discrete random vector x given y is:\n\np(x|y) \u221d exp(\u2212(1/(2\u03c3\u00b2))||Hx \u2212 y||\u00b2) ,  x \u2208 A\u207f    (6)\n\nThe notation \u221d stands for equality up to a normalization constant. Observing that ||Hx \u2212 y||\u00b2 is a quadratic expression, it can be easily verified that p(x|y) factorizes into a product of two- and single-variable potentials:\n\np(x1, .., xn|y) \u221d \u220f_i \u03c8i(xi) \u220f_{i<j} \u03c8ij(xi, xj)
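To make this factorization concrete, here is a small sketch (ours, in Python/NumPy; `pairwise_potentials` is a hypothetical helper) that tabulates the potentials over a finite set A by expanding ||Hx \u2212 y||\u00b2 with S = H\u22a4H and b = H\u22a4y:

```python
import numpy as np

def pairwise_potentials(H, y, sigma2, A):
    # Expanding ||Hx - y||^2 = sum_i (S_ii x_i^2 - 2 b_i x_i)
    #                        + 2 sum_{i<j} S_ij x_i x_j + const,
    # with S = H^T H and b = H^T y, yields the single- and two-variable
    # potentials of the factorization (up to one global constant).
    A = np.asarray(A, dtype=float)
    S, b = H.T @ H, H.T @ y
    n = H.shape[1]
    psi_i = [np.exp(-(S[i, i] * A**2 - 2 * b[i] * A) / (2 * sigma2))
             for i in range(n)]
    psi_ij = {(i, j): np.exp(-S[i, j] * np.outer(A, A) / sigma2)
              for i in range(n) for j in range(i + 1, n)}
    return psi_i, psi_ij
```

Multiplying the tabulated potentials over any assignment x reproduces exp(\u2212||Hx \u2212 y||\u00b2/(2\u03c3\u00b2)) up to a single normalization constant, which is exactly the factorized form of p(x|y) that BP operates on. Note that every pair (i, j) gets a potential, which is why the factor graph is a complete graph.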