{"title": "Efficient Minimax Signal Detection on Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 2708, "page_last": 2716, "abstract": "Several problems such as network intrusion, community detection, and disease outbreak can be described by observations attributed to nodes or edges of a graph. In these applications presence of intrusion, community or disease outbreak is characterized by novel observations on some unknown connected subgraph. These problems can be formulated in terms of optimization of suitable objectives on connected subgraphs, a problem which is generally computationally difficult. We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We prove, by means of a novel Euclidean embedding argument, that our tests are minimax optimal for exponential family of distributions on 1-D and 2-D lattices. We show that internal conductance of the connected subgraph family plays a fundamental role in characterizing detectability.", "full_text": "Ef\ufb01cient Minimax Signal Detection on Graphs\n\nDivision of Systems Engineering\n\nDepartment of Electrical and Computer Engineering\n\nVenkatesh Saligrama\n\nBoston University\nBoston, MA 02215\n\nsrv@bu.edu\n\nJing Qian\n\nBoston University\n\nBrookline, MA 02446\n\njingq@bu.edu\n\nAbstract\n\nSeveral problems such as network intrusion, community detection, and disease\noutbreak can be described by observations attributed to nodes or edges of a graph.\nIn these applications presence of intrusion, community or disease outbreak is char-\nacterized by novel observations on some unknown connected subgraph. These\nproblems can be formulated in terms of optimization of suitable objectives on\nconnected subgraphs, a problem which is generally computationally dif\ufb01cult. 
We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We prove, by means of a novel Euclidean embedding argument, that our tests are minimax optimal for exponential family of distributions on 1-D and 2-D lattices. We show that internal conductance of the connected subgraph family plays a fundamental role in characterizing detectability.

1 Introduction

Signals associated with nodes or edges of a graph arise in a number of applications including sensor network intrusion, disease outbreak detection and virus detection in communication networks. Many problems in these applications can be framed from the perspective of hypothesis testing between a null and an alternative hypothesis. Observations under the null and the alternative follow different distributions. The alternative is actually composite and identified by sub-collections of connected subgraphs.

To motivate the setup consider the disease outbreak problem described in [1]. Nodes there are associated with counties and observations associated with each county correspond to reported cases of a disease. Under the null distribution, observations at each county are assumed to be Poisson distributed and independent across different counties. Under the alternative there is a contiguous sub-collection of counties (a connected subgraph) that each experience elevated case counts on average relative to their normal levels but are otherwise assumed to be independent. The eventual shape of the sub-collection of contiguous counties is highly unpredictable due to uncontrollable factors.

In this paper we develop a novel approach for signal detection on graphs that is both statistically effective and computationally efficient. 
Our approach is based on optimizing an objective function subject to subgraph connectivity constraints, and is related to generalized likelihood ratio tests (GLRT). GLRTs maximize likelihood functions over combinatorially many connected subgraphs, which is computationally intractable. Statistically, on the other hand, GLRTs have been shown to be asymptotically minimax optimal for the exponential class of distributions on lattice graphs and trees [2], which motivates our approach. We deal with combinatorial connectivity constraints by obtaining a novel characterization of connected subgraphs in terms of convex linear matrix inequalities (LMIs). In addition we show how our LMI constraints naturally incorporate other features such as shape and size. We show that the resulting tests are essentially minimax optimal for the exponential family of distributions on 1-D and 2-D lattices. Conductance of the subgraph, a parameter in our LMI constraint, plays a central role in characterizing detectability.

Related Work: The literature on signal detection on graphs can be organized into parametric and non-parametric methods, which can be further sub-divided into computational and statistical analysis themes. Parametric methods originated in the scan statistics literature [3] with more recent work including that of [4, 5, 6, 1, 7, 8] focusing on graphs. Much of this literature develops scanning methods that optimize over rectangles, circles or neighborhood balls [5, 6] across different regions of the graph. However, the drawbacks of simple shapes and the need for non-parametric methods to improve detection power are well recognized. This has led to new approaches such as simulated annealing [5, 4], which however lack statistical analysis. More recent work in the ML literature [9] describes a semi-definite programming algorithm for non-parametric shape detection, which is similar to our work here. 
However, unlike ours, their method requires a heuristic rounding step, which does not lend itself to statistical analysis. In this context a number of recent papers have focused on statistical analysis [10, 2, 11, 12] with non-parametric shapes. They derive fundamental bounds for signal detection for the elevated-means testing problem in the Gaussian setting on special graphs such as trees and lattices. In this setting, under the null hypothesis the observations are assumed to be independent identically distributed (IID) standard normal random variables. Under the alternative the Gaussian random variables are assumed to be standard normal except on some connected subgraph where the mean μ is elevated. They show that GLRT achieves “near”-minimax optimality in a number of interesting scenarios. While this work is interesting, the suggested algorithms are computationally intractable. To the best of our knowledge only [13, 14] explores a computationally tractable approach that also provides statistical guarantees. Nevertheless, this line of work does not explicitly deal with connected subgraphs (complex shapes) but with more general clusters. These are graph partitions with small out-degree. Although this appears to be a natural relaxation of connected subgraphs/complex shapes, it turns out to be quite loose¹ and leads to a substantial gap in statistical effectiveness for our problem. In contrast we develop a new method for signal detection of complex shapes that is not only statistically effective but also computationally efficient.

2 Problem Formulation

Let G = (V, E) denote an undirected unweighted graph with |V| = n nodes and |E| = m edges. Associated with each node, v ∈ V, are observations xv ∈ R^p. We assume observations are distributed P0 under the null hypothesis. 
The alternative is composite and the observed distribution, PS, is parameterized by S ⊆ V belonging to a class of subsets Λ ⊆ S, where S is the superset. We denote by SK ⊆ S the collection of size-K subsets. ES = {(u, v) ∈ E : u ∈ S, v ∈ S} denotes the induced edge set on S. We let xS denote the collection of random variables on the subset S ⊆ V. Sc denotes the nodes V − S. Our goal is to design a decision rule, π, that maps observations xn = (xv)_{v∈V} to {0, 1} with zero denoting the null hypothesis and one denoting the alternative. We formulate risk following the lines of [12] and combine Type I and Type II errors:

R(π) = P0(π(xn) = 1) + max_{S∈Λ} PS(π(xn) = 0)   (1)

Definition 1 (δ-Separable). We say that the composite hypothesis problem is δ-separable if there exists a test π such that R(π) ≤ δ.

We next describe asymptotic notions of detectability and separability. These notions require us to consider large-graph limits. To this end we index a sequence of graphs Gn = (Vn, En) with n → ∞ and an associated sequence of tests πn.

Definition 2 (Separability). We say that the composite hypothesis problem is asymptotically δ-separable if there is some sequence of tests, πn, such that R(πn) ≤ δ for sufficiently large n. It is said to be asymptotically separable if R(πn) → 0. The composite hypothesis problem is said to be asymptotically inseparable if no such test exists.

Sometimes, additional granular measures of performance are useful to determine the asymptotic behavior of Type I and Type II errors. 
This motivates the following definition:

¹A connected subgraph on a 2-D lattice of size K has out-degree at least Ω(√K), while the set of subgraphs with out-degree Ω(√K) includes disjoint unions of Ω(√K/4) nodes. So statistical requirements with out-degree constraints can be no better than those for arbitrary K-sets.

Definition 3 (δ-Detectability). We say that the composite hypothesis testing problem is δ-detectable if there is a sequence of tests, πn, such that

sup_{S∈Λ} PS(πn(xn) = 0) → 0 as n → ∞,  lim sup_n P0(πn(xn) = 1) ≤ δ

In general δ-detectability does not imply separability. For instance, consider x ∼H0 N(0, σ²) and x ∼H1 N(μ, σ²/n). It is δ-detectable for μ/σ ≥ 2√(log(1/δ)) but not separable.

Generalized Likelihood Ratio Test (GLRT) is often used as a statistical test for composite hypothesis testing. Suppose φ0(xn) and φS(xn) are probability density functions associated with P0 and PS respectively. The GLRT test thresholds the “best-case” likelihood ratio, namely,

GLRT: ℓmax(xn) = max_{S∈Λ} ℓS(xn) ≷_{H0}^{H1} η,  ℓS(x) = log φS(xn)/φ0(xn)   (2)

Local Behavior: Without additional structure, the likelihood ratio ℓS(x) for a fixed S ∈ Λ is a function of observations across all nodes. Many applications exhibit local behavior, namely, the observations under the two hypotheses behave distinctly only on some small subset of nodes (as in disease outbreaks). This justifies introducing local statistical models in the following section.

Combinatorial: The class Λ is combinatorial, such as collections of connected subgraphs, and GLRT is not generally computationally tractable. 
On the other hand GLRT is minimax optimal for special classes of distributions and graphs, which motivates the development of tractable algorithms.

2.1 Statistical Models & Subgraph Classes

The foregoing discussion motivates introducing local models, which we present next. Then, informed by existing results on separability, we categorize subgraph classes by shape, size and connectivity.

2.1.1 Local Statistical Models

Signal in Noise Models arise in sensor network (SNET) intrusion [7, 15] and disease outbreak detection [1]. They are modeled with Gaussian (SNET) and Poisson (disease outbreak) distributions.

H0 : xv = wv;  H1 : xv = μ αuv 1S(v) + wv, for some S ∈ Λ, u ∈ S   (3)

For the Gaussian case we model μ as a constant, wv as IID standard normal variables, and αuv as the propagation loss from source node u ∈ S to node v. In disease outbreak detection μ = 1, αuv ∼ Pois(λNv) and wv ∼ Pois(Nv) are independent Poisson random variables, and Nv is the population of county v. In these cases ℓS(x) takes the following local form, where Zv is a normalizing constant:

ℓS(x) = ℓS(xS) ∝ Σ_{v∈V} (Ψv(xv) − log(Zv)) 1S(v)   (4)

We characterize μ0, λ0 as the minimum values that ensure separability for the different models:

μ0 = inf{μ ∈ R+ | ∃πn, lim_{n→∞} R(πn) = 0},  λ0 = inf{λ ∈ R+ | ∃πn, lim_{n→∞} R(πn) = 0}   (5)

Correlated Models arise in textured object detection [16] and protein subnetwork detection [17]. For instance, consider a common random signal z on S, which results in uniform correlation ρ > 0 on S:

H0 : xv = wv;  H1 : xv = (√(ρ(1 − ρ)^{−1})) z 1S(v) + wv, for some S ∈ Λ   (6)

where z, wv are standard IID normal random variables. Again we obtain ℓS(x) = ℓS(xS). 
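For intuition, the GLRT scan of Eq. (2) and the risk of Eq. (1) can be estimated by Monte Carlo for the elevated-mean Gaussian model on a line graph, where the connected size-K subgraphs are exactly the intervals. A minimal numpy sketch (the graph size, signal strength, threshold, and planted locations below are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, mu = 100, 10, 1.5
# On a line graph the connected size-K subgraphs are exactly the intervals.
intervals = [list(range(s, s + K)) for s in range(n - K + 1)]

def glrt_stat(x):
    # For elevated means, l_S(x) is monotone in sum(x_S), so the GLRT
    # scan of Eq. (2) reduces to the best interval sum.
    return max(x[S].sum() for S in intervals)

eta = np.sqrt(2 * K * np.log(n))  # crude threshold (illustrative choice)
type1 = np.mean([glrt_stat(rng.standard_normal(n)) > eta for _ in range(200)])
# Eq. (1) takes a worst case over S; probe a few planted intervals.
type2 = 0.0
for s in (0, 45, 90):
    miss = 0
    for _ in range(200):
        x = rng.standard_normal(n)
        x[s:s + K] += mu  # plant an elevated-mean connected set
        miss += glrt_stat(x) <= eta
    type2 = max(type2, miss / 200)
print("empirical risk R(pi) ~", type1 + type2)
```

Even this simplest setting shows the scan's risk driven by the worst-case planted set, which is the quantity Definitions 1–3 formalize.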
These examples motivate the following general setup for local behavior:

Definition 4. The distributions P0 and PS are said to exhibit local structure if they satisfy:
(1) Markovianity: The null distribution P0 satisfies the properties of a Markov Random Field (MRF). Under the distribution PS the observations xS are conditionally independent of x_{S1^c} when conditioned on the annulus S1 ∩ Sc, where S1 = {v ∈ V | d(v, w) ≤ 1, w ∈ S} is the 1-neighborhood of S.
(2) Mask: The marginal distributions of the observations under P0 and PS on nodes in Sc are identical: P0(xSc ∈ A) = PS(xSc ∈ A), ∀ A ∈ A, the σ-algebra of measurable sets.

Lemma 1 ([7]). Under conditions (1) and (2) it follows that ℓS(x) = ℓS(xS1).

2.1.2 Structured Subgraphs

Existing works [10, 2, 12] point to the important role of size, shape and connectivity in determining detectability. For concreteness we consider the signal in noise model for the Gaussian distribution and tabulate upper bounds from existing results for μ0 (Eq. 5). The lower bounds are messier and differ by logarithmic factors but this suffices for our discussion here. The table reveals several important points. Larger sets are easier to detect – μ0 decreases with size; connected K-sets are easier to detect relative to arbitrary K-sets; for 2-D lattices “thick” connected shapes are easier to detect than “thin” sets (paths); finally, detectability on complete graphs is equivalent to that for arbitrary K-sets, i.e., shape does not matter. Intuitively, these tradeoffs make sense. For a constant μ, the “signal-to-noise” ratio increases with size. Combinatorially, there are fewer connected K-sets than arbitrary K-sets; fewer connected balls than connected paths; and fewer connected sets in 2-D lattices than in dense graphs. 
These results point to the need for characterizing the signal detection problem in terms of connectivity, size, shape and the properties of the ambient graph.

            Arbitrary K-Set  K-Connected Ball  K-Connected Path
Line Graph  ω(√(2 log n))    ω(√(2 log(n)/K))  ω(√(2 log(n)/K))
2-D Lattice ω(√(2 log n))    ω(√(2 log(n)/K))  ω(1)
Complete    ω(√(2 log n))    ω(√(2 log n))     ω(√(2 log n))

We also observe that the table is somewhat incomplete. While balls can be viewed as thick shapes and paths as thin shapes, there is a plethora of intermediate shapes. A similar issue arises for sparse vs. dense graphs. We introduce general definitions to categorize shape and graph structures below.

Definition 5 (Internal Conductance, a.k.a. Cut Ratio). Let H = (S, FS) denote a subgraph of G = (V, E) where S ⊆ V, FS ⊆ ES, written as H ⊆ G. Define the internal conductance of H as:

φ(H) = min_{A⊂S} |δS(A)| / min{|A|, |S − A|};  δS(A) = {(u, v) ∈ FS | u ∈ A, v ∈ S − A}   (7)

Clearly φ(H) = 0 if H is not connected. The internal conductance of a collection of subgraphs, Σ, is defined as the smallest internal conductance:

φ(Σ) = min_{H∈Σ} φ(H)

For future reference we denote the collection of connected subgraphs by C and by Ca,Φ the sub-collection containing node a ∈ V with minimal internal conductance Φ:

C = {H ⊆ G : φ(H) > 0},  Ca,Φ = {H = (S, FS) ⊆ G : a ∈ S, φ(H) ≥ Φ}   (8)

In 2-D lattices, for example, φ(BK) ≈ Ω(1/√K) for connected K-balls BK or other thick shapes of size K. 
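The cut ratio in Definition 5 is easy to compute by brute force on small subgraphs, which is enough to see how it separates thick from thin shapes. A minimal sketch (the two example subgraphs are hypothetical, not from the paper):

```python
from itertools import combinations

def internal_conductance(nodes, edges):
    """Brute-force phi(H) = min over cuts A of |delta_S(A)| / min(|A|, |S - A|)."""
    nodes = list(nodes)
    best = float("inf")
    for r in range(1, len(nodes)):          # all nonempty proper subsets A of S
        for A in combinations(nodes, r):
            A = set(A)
            cut = sum(1 for (u, v) in edges if (u in A) != (v in A))
            best = min(best, cut / min(len(A), len(nodes) - len(A)))
    return best

# A thin 4-node "snake" (path) vs. a thick 2x2 lattice block (a 4-cycle).
phi_path = internal_conductance([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
phi_block = internal_conductance([0, 1, 2, 3], [(0, 1), (1, 3), (3, 2), (2, 0)])
print(phi_path, phi_block)  # 0.5 1.0
```

The thin path has the smaller internal conductance, matching the Ω(1/√K) vs. Ω(1/K) distinction drawn in the text.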
\u03c6(C\u2229SK) \u2248 \u03a9(1/K) due to \u201csnake\u201d-like thin shapes. Thus internal conductance explicitly\naccounts for shape of the sets.\n\n\u221a\n\n3 Convex Programming\n\nWe develop a convex optimization framework for generating test statistics for local statistical mod-\nels described in Section 2.1. Our approach relaxes the combinatorial constraints and the functional\nobjectives of the GLRT problem of Eq.(2). In the following section we develop a new characteriza-\ntion based on linear matrix inequalities that accounts for size, shape and connectivity of subgraphs.\nFor future reference we denote A \u25e6 B \u0394= [AijBij]i,j.\nOur \ufb01rst step is to embed subgraphs, H of G, into matrices. A binary symmetric incidence matrix,\nA, is associated with an undirected graph G = (V, E), and encodes edge relationships. Formally, the\nedge set E is the support of A, namely, E = Supp(A). For subgraph correspondences we consider\nsymmetric matrices, M, with components taking values in the unit interval, [0, 1].\n\nM = {M \u2208 [0, 1]n\u00d7n | Muv \u2264 Muu, M Symmetric}\n\n4\n\n\fDe\ufb01nition 6. M \u2208 M is said to correspond to a subgraph H = (S, FS), written as H (cid:2) M, if\n\nS = Supp{Diag(M )}, FS = Supp(A \u25e6 M )\n\nThe role of M \u2208 M is to ensure that if u (cid:16)\u2208 S we want the corresponding edges Muv = 0. Note\nthat A \u25e6 M in Defn. 6 removes the spurious edges Muv (cid:16)= 0 for (u, v) /\u2208 ES.\nOur second step is to characterize connected subgraphs as convex subsets of M. Now a subgraph\nH = (S, FS) is a connected subgraph if for every u, v \u2208 S, there is a path consisting only of edges\nin FS going from u to v. This implies that for two subgraphs H1, H2 and corresponding matrices\nM1 and M2, their convex combination M\u03b7 = \u03b7M1 + (1 \u2212 \u03b7)M2, \u03b7 \u2208 (0, 1) naturally corresponds\nto H = H1 \u222a H2 in the sense of Defn 6. 
On the other hand if H1 ∩ H2 = ∅ then H is disconnected, and so is the subgraph corresponding to Mη. This motivates our convex characterization with a common “anchor” node. To this end we consider the following collection of matrices:

M*a = {M ∈ M | Maa = 1, Mvv ≤ Mav}

Note that M*a includes star graphs induced on subsets S = Supp(Diag(M)) with anchor node a. We now make use of well-known properties [18] of the Laplacian of a graph to characterize connectivity. The unnormalized Laplacian matrix of an undirected graph G with incidence matrix A is L(A) = diag(A1n) − A, where 1n is the all-one vector.

Lemma 2. Graph G is connected if and only if the number of zero eigenvalues of L(A) is one.

Unfortunately, we cannot directly use this fact on the subgraph A ∘ M because there are many zero eigenvalues: the complement of Supp(Diag(M)) is by definition zero. We employ linear matrix inequalities (LMI) to deal with this issue. The condition [19] F(x) = F0 + F1x1 + ··· + Fpxp ⪰ 0 with symmetric matrices Fj is called a linear matrix inequality in xj ∈ R with respect to the positive semi-definite cone represented by ⪰. Note that the Laplacian of the subgraph L(A ∘ M) is a linear matrix function of M. We denote a collection of subgraphs as follows:

C_LMI(a, γ) Δ= {H ↔ M | M ∈ M*a, L(A ∘ M) − γL(M) ⪰ 0}   (9)

Theorem 3. The class C_LMI(a, γ) is connected for γ > 0. Furthermore, every connected subgraph can be characterized in this way for some a ∈ V and γ > 0, namely, C = ∪_{a∈V, γ>0} C_LMI(a, γ).

Proof Sketch. M ∈ C_LMI(a, γ) implies M is connected. By definition of M*a there must be a star graph that is a subgraph on Supp(Diag(M)). This means that L(M) (hence L(A ∘ M)) can only have one zero eigenvalue on Supp(Diag(M)). 
We can now invoke Lemma 2 on Supp(Diag(M)). The other direction is based on hyperplane separation of convex sets.

Note that Ca,γ is convex but C is not. This necessitates the anchor. In practice this means that we have to search for connected sets with different anchors. This is similar to scan statistics, the difference being that we can now optimize over arbitrary shapes. We next get a handle on γ.

γ encodes Shape: We will relate γ to the internal conductance of the class C. This provides us with a tool to choose γ to reflect the type of connected sets that we expect under our alternative hypothesis. In particular, thick sets correspond to relatively large γ and thin sets to small γ. In general, for graphs of fixed size the minimum internal conductance over all connected shapes is strictly positive, and we can set γ to be this value if we do not know the shape a priori.

Theorem 4. In a 2-D lattice, it follows that Ca,Φ ⊆ C_LMI(a, γ), where γ = Θ(Φ²/log(1/Φ)).

LMI-Test: We are now ready to present our test statistics. We replace indicator variables with the corresponding matrix components in Eq. 4, i.e., 1S(v) → Mvv, 1S(u)1S(v) → Muv, and obtain:

Elevated Mean: ℓM(x) = Σ_{v∈V} (Ψv(xv) − log(Zv)) Mvv
Correlated Gaussian: ℓM(x) ∝ Σ_{(u,v)∈E} Ψ(xu, xv) Muv − Σ_{v∈V} Mvv log(1 − ρ)   (10)

LMIT_{a,γ}: ℓ_{a,γ}(x) = max_{M∈C_LMI(a,γ)} ℓM(x) ≷_{H0}^{H1} η   (11)

This test explicitly makes use of the fact that the alternative hypothesis is anchored at a and that the internal conductance parameter γ is known. 
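The connectivity LMI of Eq. (9) can be checked numerically for a fixed 0/1 matrix M. The sketch below uses a toy 6-node line graph (the value γ = 0.1 is an illustrative choice, not from the paper): an M supported on a connected interval satisfies L(A∘M) − γL(M) ⪰ 0, while an M supported on two disconnected pieces has a strictly negative eigenvalue, so no γ > 0 can make it feasible, consistent with Theorem 3.

```python
import numpy as np

def laplacian(W):
    # Unnormalized Laplacian L(W) = diag(W 1) - W; diagonal entries of W cancel.
    return np.diag(W.sum(axis=1)) - W

n = 6
A = np.zeros((n, n))                     # incidence matrix of a 6-node line graph
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

def lmi_min_eig(S, a, gamma):
    # 0/1 matrix M in M*_a supported on S: M_uv = 1 for all u, v in S.
    M = np.zeros((n, n))
    for u in S:
        for v in S:
            M[u, v] = 1.0
    assert M[a, a] == 1.0                # anchor condition M_aa = 1
    lmi = laplacian(A * M) - gamma * laplacian(M)   # A * M is the Hadamard product
    return np.linalg.eigvalsh(lmi).min()

gamma = 0.1
connected = lmi_min_eig({0, 1, 2, 3}, 0, gamma)     # interval: LMI feasible
disconnected = lmi_min_eig({0, 1, 4, 5}, 0, gamma)  # two pieces: infeasible
print(connected, disconnected)
```

For the interval the smallest eigenvalue is numerically zero (the constant-on-S direction), while for the disconnected support the difference of indicators of the two pieces lies in the kernel of L(A∘M) but not of L(M), driving the form negative.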
We will refine this test to deal with the completely agnostic case in the following section.

4 Analysis

In this section we analyze LMIT_{a,γ} and the agnostic LMI tests for the Elevated Mean problem for the exponential family of distributions on 2-D lattices. For concreteness we focus on Gaussian & Poisson models and derive lower and upper bounds for μ0 (see Eq. 5). Our main result states that to guarantee separability, μ0 ≈ Ω(1/(KΦ)), where Φ is the internal conductance of the family Ca,Φ of connected subgraphs, K is the size of the subgraphs in the family, and a is some node that is common to all the subgraphs. The reason for our focus on the homogeneous Gaussian/Poisson setting is that we can extend current lower bounds in the literature to our more general setting and demonstrate that they match the bounds obtained from our LMIT analysis. We comment on how our LMIT analysis extends to other general structures and models later.

The proof for the LMIT analysis involves two steps (see Supplementary):

1. Lower Bound: Under H1 we show that the ground truth is a feasible solution. This allows us to lower bound the objective value, ℓ_{a,γ}(x), of Eq. 11.

2. Upper Bound: Under H0 we consider the dual problem. By weak duality it follows that any feasible solution of the dual is an upper bound for ℓ_{a,γ}(x). A dual feasible solution is then constructed through a novel Euclidean embedding argument.

We then compare the upper and lower bounds to obtain the critical value μ0.

We analyze both non-agnostic and agnostic LMI tests for the homogeneous version of the Gaussian and Poisson models of Eq. 3 for both finite and asymptotic 2-D lattice graphs. For the finite case the family of subgraphs in Eq. 3 is assumed to belong to the connected family of sets, Ca,Φ ∩ SK, containing a fixed common node a ∈ V and of size K. 
For the asymptotic case we let the size of the graph approach infinity (n → ∞). For this case we consider a sequence of connected families of sets Cn_{a,Φn} ∩ S_{Kn} on graphs Gn = (Vn, En) with some fixed anchor node a ∈ Vn. We then describe results for agnostic LMI tests, i.e., tests lacking knowledge of the conductance Φ and the anchor node a.

Poisson Model: In Eq. 3 we let the population Nv be identically equal to one across counties. We present LMI tests that are agnostic to shape and anchor nodes:

LMITA: ℓ(x) = max_{a∈V, γ≥Φ²min} √γ ℓ_{a,γ}(x) ≷_{H0}^{H1} 0   (12)

where Φmin denotes the minimum possible conductance of a connected subgraph of size K, which is 2/K.

Theorem 5. The LMIT_{a,γ} test achieves δ-separability for λ = Ω(log(K)/(KΦ)), and the agnostic test LMITA for λ = Ω(log K √(log n)).

Next we consider the asymptotic case and characterize tight bounds for separability.

Theorem 6. The two hypotheses H0 and H1 are asymptotically inseparable if λnΦnKn log(Kn) → 0. They are asymptotically separable with LMIT_{a,γ} for λnKnΦn/log(Kn) → ∞. The agnostic LMITA achieves asymptotic separability when λn/(log(Kn)√(log n)) → ∞.

Gaussian Model: We next consider agnostic tests for the Gaussian model of Eq. 3 with no propagation loss, i.e., αuv = 1.

Theorem 7. The two hypotheses H0 and H1 for the Gaussian model are asymptotically inseparable if μnΦnKn log(Kn) → 0, are separable with LMIT_{a,γ} if μnKnΦn/log(Kn) → ∞, and are separable with LMITA if μn/(log(Kn)√(log n)) → ∞.

Our inseparability bound matches existing results on 2-D lattices & line graphs by plugging in appropriate values of Φ for the cases considered in [2, 12]. 
The lower bound is obtained by specializing to a collection of “non-decreasing band” subgraphs. Yet LMIT_{a,γ} and LMITA are able to achieve the lower bound within a logarithmic factor. Furthermore, our analysis extends beyond the Poisson & Gaussian models and applies to general graph structures and models. The main reason is that our LMIT analysis is fairly general and provides an observation-dependent bound through convex duality. We briefly describe it here. Consider functions ℓS(x) that are positive, separable and bounded, for simplicity. By establishing primal feasibility, i.e., that the subgraph S ∈ C_LMI(a, γ) for a suitably chosen γ, we can obtain a lower bound for the alternative hypothesis H1 and show that E_{H1}[max_{M∈C_LMI(a,γ)} ℓM(x)] ≥ E_{H1}[Σ_{v∈S} ℓS(xv)]. On the other hand, for the null hypothesis we can show that E_{H0}[max_{M∈C_LMI(a,γ)} ℓM(x)] ≤ E_{H0}[Σ_{v∈B(a,Θ(√γ))} ℓS(xv)]. Here E_{H1} and E_{H0} denote expectations with respect to the alternative and null hypotheses, and B(a, Θ(√γ)) is a ball-like thick shape centered at a ∈ V with radius Θ(√γ). Our result then follows by invoking standard concentration inequalities. 

Figure 1: Various shapes of ground-truth anomalous clusters on a fixed 15×10 lattice. Anomalous cluster size is fixed at 17 nodes. (a) shows a thick cluster with a large internal conductance. (b) shows a relatively thinner shape. (c) shows a snake-like shape which has the smallest internal conductance. (d) shows the same shape as (b), with the background lattice more densely connected.
We can extend our analysis to the non-separable case, such as correlated models, because of the linear objective form in Eq. 10.

5 Experiments

We present several experiments to highlight key properties of LMIT and to compare LMIT against other state-of-the-art parametric and non-parametric tests on synthetic and real-world data. We have shown that agnostic LMIT is near minimax optimal in terms of asymptotic separability. However, separability is an asymptotic notion and only characterizes the special case of zero false alarms (FA) and missed detections (MD), which is often impractical. It is unclear how LMIT behaves on finite-size graphs when FAs and MDs are prevalent. In this context incorporating priors could indeed be important. Our goal is to highlight how a shape prior (in terms of thick, thin, or arbitrary shapes) can be incorporated in LMIT using the parameter γ to obtain better AUC performance on finite-size graphs. Another goal is to demonstrate how LMIT behaves with denser graph structures.

From the practical perspective, our main step is to solve the following SDP problem:

max_M Σ_i yi Mii  s.t. M ∈ C_LMI(a, γ), tr(M) ≤ K

We use standard SDP solvers, which can scale up to n ∼ 1500 nodes for sparse graphs like lattices and n ∼ 300 nodes for dense graphs with m = Θ(n²) edges.

To understand the impact of shape we consider the test LMIT_{a,γ} for the Gaussian model and manually vary γ. On a 15×10 lattice we fix the size (17 nodes) and the signal strength μ√|S| = 3, and consider three different shapes (see Fig. 1) for the alternative hypothesis. 
For each shape we synthetically simulate 100 null and 100 alternative hypothesis draws and plot the AUC performance of LMIT as a function of γ. We observe that the optimum value of AUC is achieved at large γ for thick shapes and at small γ for thin shapes, confirming our intuition that γ is a good surrogate for shape. In addition we notice that thick shapes have superior AUC performance relative to thin shapes, again confirming the intuition from our analysis.

To understand the impact of dense graph structures we consider the performance of LMIT as the neighborhood size varies. On the lattice of the previous experiment we vary the neighborhood by connecting each node to its 1-hop, 2-hop, and 3-hop neighbors to realize denser structures with each node having 4, 8 and 12 neighbors respectively. Note that all the different graphs have the same vertex set. This is convenient because we can hold the shape under the alternative fixed for the different graphs. As before we generate 100 alternative hypothesis draws using the thin set of the previous experiment with the same mean μ, along with 100 nulls. The AUC curves for the different graphs highlight the fact that higher density leads to degradation in performance, as our intuition with complete graphs suggests. 
[Figure 2 about here: (a) AUC with various shapes; (b) AUC with different graph structures]
Figure 2: (a) demonstrates AUC performance with fixed lattice structure, signal strength μ and size (17 nodes), but different shapes of the ground-truth cluster, as shown in Fig. 1. (b) demonstrates AUC performance with fixed signal strength μ, size (17 nodes) and shape (Fig. 1(b)), but different lattice structures.
We also see that as density increases a larger γ achieves better performance, confirming our intuition that as density increases the internal conductance of the shape increases.
We next compare LMIT against existing state-of-the-art approaches on a 300-node lattice, a 200-node random geometric graph (RGG), and a real-world county map graph (129 nodes) (see Fig. 3, 4). We incorporate shape priors by setting γ (internal conductance) to correspond to thin sets. While this implies some prior knowledge, we note that this is not necessarily the optimal value of γ, and we remain agnostic to the actual ground-truth shape (see Fig. 3, 4). For the lattice and the RGG we use the elevated-mean Gaussian model. Following [1] we adopt an elevated-rate independent Poisson model for the county map graph. Here N_i is the population of county i.
Under the null, the number of cases at county i follows a Poisson distribution with rate N_iλ_0; under the alternative the rate is N_iλ_1 within some connected subgraph. We assume λ_1 > λ_0 and apply a weighted version of LMIT of Eq. 12, which arises on account of differences in population. We compare LMIT against several other tests, including simulated annealing (SA) [4], the rectangle test (Rect), the nearest-ball test (NB), and two naive tests: the maximum test (MaxT) and the average test (AvgT). SA is a non-parametric test that works by heuristically adding/removing nodes toward a better normalized GLRT objective while maintaining connectivity. Rect and NB are parametric methods, with Rect scanning rectangles on the lattice and NB scanning nearest-neighbor balls around different nodes for more general graphs (RGG and county map graph). MaxT and AvgT are often used for comparison purposes: MaxT thresholds the maximum observed value, while AvgT thresholds the average value.
We observe that MaxT and AvgT uniformly perform poorly. This makes sense: it is well known that MaxT works well only for alternatives of small size, while AvgT works well for relatively large alternatives [11]. The parametric methods (Rect/NB) perform poorly because the shape of the ground truth under the alternative cannot be well approximated by rectangles or nearest-neighbor balls. The performance of SA requires more explanation. One issue could be that SA does not explicitly incorporate shape and directly searches for the best GLRT solution. We have noticed that this tends to amplify the objective value under the null hypothesis because SA exhibits poor "regularization" of the shape. On the other hand, LMIT provides some regularization toward thin shapes and does not admit arbitrary connected sets.

Table 1: AUC performance of various algorithms on a 300-node lattice, a 200-node RGG, and the county map graph.
On all three graphs LMIT significantly outperforms the other tests consistently for all SNR levels.

              lattice (μ√|S|/σ)        RGG (μ√|S|/σ)          map (λ1/λ0)
SNR           1.5    2      3          1.5    2      3        1.1    1.3    1.5
LMIT          0.728  0.780  0.882      0.642  0.723  0.816    0.606  0.842  0.948
SA            0.672  0.741  0.827      0.627  0.677  0.756    0.556  0.744  0.854
Rect(NB)      0.581  0.637  0.748      0.584  0.632  0.701    0.514  0.686  0.791
MaxT          0.531  0.547  0.587      0.529  0.562  0.624    0.525  0.559  0.543
AvgT          0.565  0.614  0.705      0.545  0.623  0.690    0.536  0.706  0.747

References
[1] G. P. Patil and C. Taillie. Geographic and network surveillance via scan statistics for critical area detection. In Statistical Science, volume 18(4), pages 457-465, 2003.
[2] E. Arias-Castro, E. J. Candes, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. In The Annals of Statistics, volume 36(4), pages 1726-1757, 2008.
[3] J. Glaz, J. Naus, and S. Wallenstein. Scan Statistics. Springer, New York, 2001.
[4] L. Duczmal and R. Assuncao. A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. In Computational Statistics and Data Analysis, volume 45, pages 269-286, 2004.
[5] M. Kulldorff, L. Huang, L. Pickle, and L. Duczmal. An elliptic spatial scan statistic. In Statistics in Medicine, volume 25, 2006.
[6] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park. Scan statistics on enron graphs. In Computational and Mathematical Organization Theory, 2006.
[7] V. Saligrama and M. Zhao. Local anomaly detection. In Artificial Intelligence and Statistics, volume 22, 2012.
[8] V. Saligrama and Z. Chen. Video anomaly detection based on local statistical aggregates. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2112-2119, 2012.
[9] J. Qian and V. Saligrama.
Connected sub-graph detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
[10] E. Arias-Castro, D. Donoho, and X. Huo. Near-optimal detection of geometric objects by fast multiscale methods. In IEEE Transactions on Information Theory, volume 51(7), pages 2402-2425, 2005.
[11] Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi. On combinatorial testing problems. In The Annals of Statistics, volume 38(5), pages 3063-3092, 2010.
[12] E. Arias-Castro, E. J. Candes, and A. Durand. Detection of an anomalous cluster in a network. In The Annals of Statistics, volume 39(1), pages 278-304, 2011.
[13] J. Sharpnack, A. Rinaldo, and A. Singh. Changepoint detection over graphs with the spectral scan statistic. In International Conference on Artificial Intelligence and Statistics, 2013.
[14] J. Sharpnack, A. Krishnamurthy, and A. Singh. Near-optimal anomaly detection in graphs using Lovasz extended scan statistic. In Neural Information Processing Systems, 2013.
[15] E. B. Ermis and V. Saligrama. Distributed detection in sensor networks with limited range multimodal sensors. IEEE Transactions on Signal Processing, 58(2):843-858, 2010.
[16] G. R. Cross and A. K. Jain. Markov random field texture models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 5, pages 25-39, 1983.
[17] M. Bailly-Bechet, C. Borgs, A. Braunstein, J. T. Chayes, A. Dagkessamanskaia, J. Francois, and R. Zecchina. Finding undetected protein associations in cell signaling by belief propagation. In Proceedings of the National Academy of Sciences (PNAS), volume 108, pages 882-887, 2011.
[18] F. Chung. Spectral graph theory. American Mathematical Society, 1996.
[19] S. Boyd and L. Vandenberghe. Convex Optimization.
Cambridge University Press, 2004.