{"title": "Factorizing Multivariate Function Classes", "book": "Advances in Neural Information Processing Systems", "page_first": 563, "page_last": 569, "abstract": "", "full_text": "Factorizing Multivariate Function Classes \n\nJuan K. Lin* \n\nDepartment of Physics \nUniversity of Chicago \nChicago, IL 60637 \n\nAbstract \n\nThe mathematical framework for factorizing equivalence classes of multivariate functions is formulated in this paper. Independent component analysis is shown to be a special case of this decomposition. Using only the local geometric structure of a class representative, we derive an analytic solution for the factorization. We demonstrate the factorization solution with numerical experiments and present a preliminary tie to decorrelation. \n\n1 FORMALISM \n\nIn independent component analysis (ICA), the goal is to find an unknown linear coordinate system where the joint distribution function admits a factorization into the product of one-dimensional functions. However, this decomposition is only rarely possible. To formalize the notion of multivariate function factorization, we begin by defining an equivalence relation. \n\nDefinition. We say that two functions f, g : R^n -> R are equivalent if there exist A, b and c such that f(x) = c g(Ax + b), where A is a non-singular matrix and c ≠ 0. Thus, the equivalence class of a function consists of all invertible linear transformations of it. To avoid confusion, equivalence classes will be denoted in upper case, and class representatives in lower case. We now define the product of two equivalence classes. Consider representatives b : R^n -> R and c : R^m -> R of corresponding equivalence classes B and C. Let x_1 ∈ R^n, x_2 ∈ R^m, and x = (x_1, x_2). From the scalar product of the two functions, define the function a : R^{n+m} -> R by a(x) = b(x_1) c(x_2). 
* Current address: E25-201, MIT, Cambridge, MA 02139. Email: jklin@ai.mit.edu \n\nLet the product of B and C be the equivalence class A with representative a(x). This product is independent of the choice of representatives of B and C, and hence is a well-defined operation on equivalence classes. We proceed to define the notion of an irreducible class. \n\nDefinition. Denote the equivalence class of constants by I. We say that A is irreducible if A = BC implies either B = A, C = I, or B = I, C = A. \n\nFrom the way products of equivalence classes are defined, we know that all equivalence classes of one-dimensional functions are irreducible. Our formulation of the factorization of multivariate function classes is now complete. Given a multivariate function, we seek a factorization of the equivalence class of the given representative into a product of irreducibles. Intuitively, in the context of joint distribution functions, the irreducible classes constitute the underlying sources. This factorization generalizes independent component analysis to allow for higher-dimensional \"vector\" sources. Consequently, this decomposition is well-defined for all multivariate function classes. We now present a local geometric approach to accomplishing this factorization. \n\n2 LOCAL GEOMETRIC INFORMATION \n\nGiven that the joint distribution factorizes into a product in the \"source\" coordinate system, what information can be extracted locally from the joint distribution in a \"mixed\" coordinate frame? We assume that the relevant multivariate function is twice differentiable in the region of interest, and denote by H^f, the Hessian of f, the matrix with elements H^f_ij = ∂_i ∂_j f, where ∂_k = ∂/∂s_k. \n\nProposition: H^f is block diagonal everywhere, ∂_i ∂_j f |_{s_0} = 0 for all points s_0 and all i ≤ k, j > k, if and only if f is separable into a sum f(s_1, ..., s_n) = g(s_1, ..., s_k) + h(s_{k+1}, ...
, s_n) for some functions g and h. \n\nProof - Sufficiency: Given f(s_1, ..., s_n) = g(s_1, ..., s_k) + h(s_{k+1}, ..., s_n), \n\n∂²f/∂s_i∂s_j = ∂/∂s_i [∂h(s_{k+1}, ..., s_n)/∂s_j] = 0 \n\neverywhere for all i ≤ k, j > k. \n\nNecessity: From H^f_{1n} = 0, we can decompose f into \n\nf(s_1, s_2, ..., s_n) = g(s_1, ..., s_{n-1}) + h(s_2, ..., s_n), \n\nfor some functions g and h. Continuing by imposing the constraints H^f_{1j} = 0 for all j > k, we find \n\nf(s_1, s_2, ..., s_n) = g(s_1, ..., s_k) + h(s_2, ..., s_n). \n\nCombining with H^f_{2j} = 0 for all j > k yields \n\nf(s_1, s_2, ..., s_n) = g(s_1, ..., s_k) + h(s_3, ..., s_n). \n\nFinally, inducting on i, from the constraints H^f_{ij} = 0 for all i ≤ k and j > k we arrive at the desired functional form \n\nf(s_1, s_2, ..., s_n) = g(s_1, ..., s_k) + h(s_{k+1}, ..., s_n). \n\nMore explicitly, a twice-differentiable function satisfies the set of coupled partial differential equations represented by the block diagonal structure of H if and only if it admits the corresponding separation-of-variables decomposition. By letting log p = f, the additive decomposition of f translates to a product decomposition of p. The more general decomposition into an arbitrary number of factors is obtained by iterative application of the above proposition. The special case of independent component analysis corresponds to a strict diagonalization of H. Thus, in the context of smooth joint distribution functions, pairwise conditional independence is necessary and sufficient for statistical independence. \n\nTo use this information in a transformed \"mixture\" frame, we must understand how the matrix H^{log p} transforms. From the relation between the mixture and source coordinate systems given by x = As, we have ∂/∂s_i = A_ji ∂/∂x_j, where we use Einstein's convention of summation over repeated indices. 
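The proposition above can be checked numerically: for an additively separable function, a finite-difference Hessian should be block diagonal. The sketch below is illustrative only (the test function and evaluation point are invented), assuming NumPy is available.

```python
import numpy as np

def hessian_fd(f, s, eps=1e-3):
    """Central finite-difference estimate of the Hessian of f at point s."""
    n = len(s)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * eps
            ej = np.eye(n)[j] * eps
            H[i, j] = (f(s + ei + ej) - f(s + ei - ej)
                       - f(s - ei + ej) + f(s - ei - ej)) / (4 * eps * eps)
    return H

# Separable test function f(s) = g(s1, s2) + h(s3): the Hessian should be
# block diagonal with blocks {s1, s2} and {s3}.
f = lambda s: np.sin(s[0] * s[1]) + s[2] ** 3

H = hessian_fd(f, np.array([0.3, 0.5, 0.8]))
print(np.round(H, 6))  # H[0,2] and H[1,2] vanish; H[0,1] does not
```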
From the relation between the joint distributions in the mixture and source frames, p_s(s) = |A| p_x(x), direct differentiation gives \n\n∂² log p_s(s) / ∂s_i ∂s_l = A_ji A_kl ∂² log p_x(x) / ∂x_j ∂x_k. \n\nLetting H_ij = ∂² log p_s(s) / ∂s_i ∂s_j and H̄_ij = ∂² log p_x(x) / ∂x_i ∂x_j, in matrix notation we have H = A^T H̄ A. In other words, H is a second rank (symmetric) covariant tensor. The joint distribution admits a product decomposition in the source frame if and only if H, and hence A^T H̄ A, has the corresponding block diagonal structure. Thus multivariate function class factorization is solved by joint block diagonalization of symmetric matrices, with constraints on A of the form A_ji H̄_jk A_kl = 0. \n\nBecause the Hessian is symmetric, its diagonalization involves only (n choose 2) constraints. Consequently, in the independent component analysis case where the joint distribution function admits a factorization into one-dimensional functions, if the mixing transformation is orthogonal, the independent component coordinate system will lie along the eigenvector directions of H̄. Generally, however, n(n-1) independent constraints, corresponding to information from the Hessian at two points, are needed to determine the n arbitrary coordinate directions. \n\n3 NUMERICAL EXPERIMENTS \n\nIn the simplest attack on the factorization problem, we solve the constraint equations from two points simultaneously. The analytic solution is demonstrated in two dimensions. Without loss of generality, the mixing matrix A is taken to be of the form \n\nA = ( 1 x ; y 1 ). \n\nThe constraints from the two points are: ax + b(xy + 1) + cy = 0, and a'x + b'(xy + 1) + c'y = 0, where H̄_11 = a, H̄_21 = H̄_12 = b and H̄_22 = c at the first point, and the primed coefficients denote the values at the second point. 
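Before solving the constraints in closed form, the setup can be exercised numerically. In the sketch below (all Hessian values invented), diagonal source-frame Hessians at two points are mapped to the mixture frame via H_x = A^{-T} H_s A^{-1}; eliminating y through y = -(ax + b)/(bx + c) reduces the pair of constraints to the quadratic (a'b - ab')x² + (a'c - ac')x + (b'c - bc') = 0, whose two roots exhibit the (x, y) -> (1/y, 1/x) symmetry of the equations.

```python
import numpy as np

# True mixing in the normalized form A = (1 x; y 1), with invented x, y.
x0, y0 = 0.7, -0.4
A = np.array([[1.0, x0], [y0, 1.0]])
Ainv = np.linalg.inv(A)

def mixture_hessian(d1, d2):
    # For a product density the source-frame Hessian of log p is diagonal;
    # H_s = A^T H_x A implies H_x = A^{-T} H_s A^{-1}.
    return Ainv.T @ np.diag([d1, d2]) @ Ainv

# Coefficients a = H11, b = H12, c = H22 at two points (invented values).
H1, H2 = mixture_hessian(-1.3, -2.1), mixture_hessian(-0.6, -3.9)
a, b, c = H1[0, 0], H1[0, 1], H1[1, 1]
ap, bp, cp = H2[0, 0], H2[0, 1], H2[1, 1]

# Quadratic in x after eliminating y via y = -(a x + b) / (b x + c).
qa, qb, qc = ap * b - a * bp, ap * c - a * cp, bp * c - b * cp
disc = np.sqrt(qb ** 2 - 4 * qa * qc)
roots = sorted([(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)])

for x in roots:
    y = -(a * x + b) / (b * x + c)
    # Each root must also satisfy the constraint from the second point.
    print(x, y, abs(ap * x + bp * (x * y + 1) + cp * y) < 1e-9)
```

The two roots recover (x0, y0) and the symmetric solution (1/y0, 1/x0).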
\n\nSolving the simultaneous quadratic equations, we find \n\nx = [a'c - ac' ± √((a'c - ac')² - 4(a'b - ab')(b'c - bc'))] / (2(ab' - a'b)), \n\ny = [a'c - ac' ± √((a'c - ac')² - 4(a'b - ab')(b'c - bc'))] / (2(bc' - b'c)). \n\nThe ± double roots are indicative of the (x, y) -> (1/y, 1/x) symmetry in the equations, and together give only two distinct orientation solutions. These independent component orientation solutions are given by θ1 = tan⁻¹(1/x) and θ2 = tan⁻¹(y). \n\n3.1 Natural Audio Sources \n\nTo demonstrate the analytic factorization solution, we present some proof-of-concept numerics. Generality is pursued over optimization concerns. First, we perform the standard separation of two linearly mixed natural audio sources. The input dataset consists of 32000 unordered datapoints, since no use will be made of the temporal information. The process for obtaining estimates of the Hessian matrix H̄ is as follows. A histogram of the input distribution was first acquired and smoothed by a low-pass Gaussian mask in spatial-frequency space. The elements of H̄ were then obtained via convolution with a discrete approximation of the derivative operator. The width of the Gaussian mask and the support of the derivative operator were chosen to reduce sensitivity to low spatial-frequency uncertainty. It should be noted that the analytic factorization solution makes no assumptions about the mixing transformation; consequently, a blind determination of the smoothing length scale is not possible because of the multiplicative degree of freedom in each source. \n\nBecause of the need to take the logarithm of p before differentiation, or equivalently to divide by p afterwards, we set a threshold and only extracted information from points where the number of counts was greater than threshold. 
This is justified from a counting uncertainty perspective, and also from the understanding that regions with vanishing probability measure contain no information. \n\nWith our sample of 32000 datapoints, we considered only the bin-points with a corresponding bin count greater than 30. From the 394 bin locations that satisfied this constraint, the solutions (θ1, θ2) for all (394 choose 2) = (394·393/2) pairs of the corresponding factorization equations are plotted in Fig. 1. A histogram of these solutions is shown in Fig. 2. The two peaks in the solution histogram correspond to orientations that differ from the two actual independent component orientations by 0.008 and 0.013 radians. The signal-to-mixture ratios of the two outputs generated from the solution are 158 and 49. \n\n3.2 Effect of Noise \n\nBecause the solution is analytic, uncertainty in the sampling just propagates through to the solution, giving rise to a finite width in the solution's distribution. We investigated the effect of noise and counting uncertainty by performing numerics starting from analytic forms for the source distributions. The joint distribution in the source frame was taken to be: \n\nNormalization is irrelevant since a function's decomposition into product form is preserved under scalar multiplication. This is also reflected in the equivalence between H^{log p} and H^{log cp} for c an arbitrary positive constant. The joint distribution in the mixture frame was obtained from the relation p_x(x) = |A|⁻¹ p_s(s). To simulate \n\nFigure 1: Scatterplot of the independent component orientation solutions. All unordered solution pairs (θ1, θ2) are plotted. The solutions are taken in the range from -π/2 to π/2. 
\n\nFigure 2: Histogram of the orientation solutions plotted in the previous figure. The range is still taken from -π/2 to π/2, with the histogram wrapped around to ease the circular identification. The mixing matrix used was: a_11 = 0.0514, a_21 = 0.779, a_12 = 0.930, a_22 = -0.579, giving independent component orientations at -0.557 and 1.505 radians. Gaussian fits to the centers of the two solution peaks give -0.570 ± 0.066 and 1.513 ± 0.077 radians for the two orientations. \n\nsampling, p_x(x) was multiplied by the number of samples M, onto which was added Gaussian-distributed noise with amplitude given by (M p_x(x))^{1/2}. This reflects the fact that counting uncertainty scales as the square root of the number of counts. The result was rounded to the nearest integer, with all negative count values set to zero. The subsequent processing coincided with that for the natural audio sources. From the source distribution equation above, the minimum number of expected counts is M, and the maximum is 9M. The results in Figures 3 and 4 show that, as expected, increasing the number of samplings decreases the widths of the solution peaks. By fitting Gaussians to the two peaks, we find that the uncertainty (peak widths) in the independent component orientations changes from 0.06 to 0.1 radians as the sampling is decreased from M = 20 to M = 2. So even with few samplings, a relatively accurate determination of the independent component coordinate system can be made. \n\nFigure 3: Histogram of the independent component orientation solutions for four different samplings. Solutions were generated from 20000 randomly chosen pairs of positions. 
The curves, from darkest to lightest, correspond to solutions for the noiseless, M = 20, 11, and 2 simulations. The noiseless solution histogram curve extends to a height of approximately 15000 counts, and is accurate to the width of the bin. The slight scatter is due to discretization noise. Spikes at θ = 0 and -π/2 correspond to pairs of positions which contain no information. \n\nFigure 4: The centers and widths of the solution peaks as a function of the minimum expected number of counts M. From the source distribution, the maximum expected number of counts is 9M. Information was only extracted from regions with more than 2M counts. The actual independent component orientations as determined from the mixing matrix A are shown by the two dashed lines. The solutions are very accurate even for small samplings. \n\n4 RELATION TO DECORRELATION \n\nIdeally, if a mixed tensor (one transforming as J = A⁻¹ J̄ A) with the full degrees of freedom can be found which is diagonal if and only if the joint distribution appears in product form, then the independent component coordinate directions will coincide with those of the tensor's eigenvectors. However, the preceding analysis shows that a maximum of n(n-1)/2 constraints contains all the information that exists locally. This, however, provides a nice connection with decorrelation. 
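The eigenvector picture can be made concrete in the orthogonal-mixing special case noted in Section 2, where the covariant Hessian alone already determines the source directions. A minimal sketch (the diagonal source-frame Hessian values are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Diagonal source-frame Hessian of log p at one point; distinct entries
# make the eigenvectors unique up to sign.
D = np.diag([-2.0, -0.5, -3.5])

# Orthogonal mixing x = A s, so A^{-1} = A^T and the covariant rule
# H_s = A^T H_x A gives H_x = A D A^T.
A, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Hx = A @ D @ A.T

# Eigenvectors of the symmetric mixture-frame Hessian recover the columns
# of A -- the independent component directions -- up to sign and ordering.
w, V = np.linalg.eigh(Hx)
overlap = np.abs(V.T @ A)
print(np.round(overlap, 6))  # a permutation matrix of |cosines|
```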
\n\nStarting with the characteristic function of log p(x), φ(k) = ∫ e^{ik·x} log p(x) dx, the off-diagonal terms of H^{log p} are given by \n\nH^{log p}_ij = -(2π)^{-n} ∫ k_i k_j φ(k) e^{-ik·x} dk, \n\nwhich can loosely be seen as the second order cross-moments in φ(k). Thus diagonalization of H^{log p} roughly translates into decorrelation in φ(k). It should be noted that φ(k) is not a proper distribution function. In fact, it is a complex-valued function with φ(k) = φ*(-k). Consequently, the summation in the above equation is not an expectation value, and needs to be interpreted as a superposition of plane waves with specified wavelengths, amplitudes and phases. \n\n5 DISCUSSION \n\nThe introduced functional decomposition defines a generalization of independent component analysis which is valid for all multivariate functions. A rigorous notion of the decomposition of a multivariate function into a set of lower-dimensional factors is presented. With only the assumption of local twice differentiability, we derive an analytic solution for this factorization [1]. A new algorithm is presented which, in contrast to iterative non-local parametric density estimation ICA algorithms [2, 3, 4], performs the decomposition analytically using local geometric information. The analytic nature of this approach allows for a proper treatment of source separation in the presence of uncertainty, while the local nature allows for a local determination of the source coordinate system. This leaves open the possibility of describing a position-dependent independent component coordinate system with local linear coordinate patches. \n\nThe presented class factorization formalism removes the decomposition assumptions needed for independent component analysis, and reinforces the well known fact that sources are recoverable only up to linear transformation. 
By modifying the equivalence class relation, a rich underlying algebraic structure with both multiplication and addition can be constructed. Also, it is clear that the matrix of second derivatives reveals an even more general combinatorial undirected graphical structure of the multivariate function. These topics, as well as uniqueness issues of the factorization, will be addressed elsewhere [5]. \n\nThe author is grateful to Jack Cowan, David Grier and Robert Wald for many invaluable discussions. \n\nReferences \n\n[1] J. K. Lin, Local Independent Component Analysis, Ph.D. thesis, University of Chicago, 1997. \n\n[2] A. J. Bell and T. J. Sejnowski, Neural Computation 7, 1129 (1995). \n\n[3] S. Amari, A. Cichocki, and H. Yang, in Advances in Neural Information Processing Systems 8, edited by D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (MIT Press, Cambridge, MA, 1996), pp. 757-763. \n\n[4] B. A. Pearlmutter and L. Parra, in Advances in Neural Information Processing Systems 9, edited by M. C. Mozer, M. I. Jordan, and T. Petsche (MIT Press, Cambridge, MA, 1997), pp. 613-619. \n\n[5] J. K. Lin, Graphical Structure of Multivariate Functions, in preparation. \n", "award": [], "sourceid": 1446, "authors": [{"given_name": "Juan", "family_name": "Lin", "institution": null}]}