Sparse Representation for Signal Classification

Ke Huang and Selin Aviyente
Department of Electrical and Computer Engineering
Michigan State University, East Lansing, MI 48824
{kehuang, aviyente}@egr.msu.edu

Advances in Neural Information Processing Systems, pp. 609-616

Abstract

In this paper, application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) for signal classification is discussed. Searching for the sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function with two terms: one measuring the signal reconstruction error and the other measuring the sparsity. This objective function works well in applications where signals need to be reconstructed, such as coding and denoising. Discriminative methods, such as linear discriminant analysis (LDA), are better suited for classification tasks; however, they are usually sensitive to signal corruption because they lack the properties needed for signal reconstruction. In this paper, we present a theoretical framework for signal classification with sparse representation. The approach combines the discrimination power of discriminative methods with the reconstruction property and sparsity of the sparse representation, which makes it possible to deal with signal corruption: noise, missing data and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals.
The theoretical results are demonstrated on signal classification tasks, showing that the proposed approach outperforms both standard discriminative methods and the standard sparse representation when the signals are corrupted.

1 Introduction

Sparse representations of signals have received a great deal of attention in recent years. The problem solved by sparse representation is to find the most compact representation of a signal as a linear combination of atoms in an overcomplete dictionary. Recent developments in multi-scale and multi-orientation representations of signals, such as the wavelet, ridgelet, curvelet and contourlet transforms, are an important incentive for research on sparse representation. Compared to methods based on orthonormal transforms or direct time-domain processing, sparse representation usually offers better performance through its capacity for efficient signal modelling. Research has focused on three aspects of sparse representation: pursuit methods for solving the optimization problem, such as matching pursuit [1], orthogonal matching pursuit [2], basis pursuit [3] and LARS/homotopy methods [4]; design of the dictionary, such as the K-SVD method [5]; and applications of sparse representation to tasks such as signal separation, denoising, coding and image inpainting [6, 7, 8, 9, 10]. For instance, in [6], sparse representation is used for image separation; the overcomplete dictionary is generated by combining multiple standard transforms, including the curvelet, ridgelet and discrete cosine transforms. In [7], application of sparse representation to blind source separation is discussed and experimental results on EEG data analysis are demonstrated. In [8], a sparse image coding method with the wavelet transform is presented.
In [9], sparse representation with an adaptive dictionary is shown to achieve state-of-the-art performance in image denoising. The widely used shrinkage method for image denoising is shown to be the first iteration of basis pursuit solving the sparse representation problem [10].

In the standard framework of sparse representation, the objective is to reduce the signal reconstruction error with as few atoms as possible. Discriminative analysis methods, such as LDA, are more suitable for classification tasks, but they are usually sensitive to signal corruption because they lack the properties needed for signal reconstruction. In this paper, we propose sparse representation for signal classification (SRSC), which modifies the standard sparse representation framework for classification. We first show that replacing the reconstruction error with discrimination power in the objective function of the sparse representation is more suitable for classification tasks. When the signal is corrupted, discriminative methods may fail because they retain too little information to deal with noise, missing data and outliers. To address this robustness problem, SRSC combines discrimination power, signal reconstruction and sparsity in the objective function for classification. With the theoretical framework of SRSC, our objective is to achieve a sparse and robust representation of corrupted signals for effective classification.

The rest of this paper is organized as follows. Section 2 reviews the problem formulation and solution for the standard sparse representation. Section 3 discusses the motivations for proposing SRSC by analyzing reconstructive and discriminative methods for signal classification.
The formulation and solution of SRSC are presented in Section 4. Experimental results with synthetic and real data are shown in Section 5, and Section 6 concludes the paper with a summary of the proposed work and discussions.

2 Sparse Representation of Signals

The problem of finding the sparse representation of a signal in a given overcomplete dictionary can be formulated as follows. Given an $N \times M$ matrix $A$ containing the elements of an overcomplete dictionary in its columns, with $M > N$ and usually $M \gg N$, and a signal $y \in \mathbb{R}^N$, the problem of sparse representation is to find an $M \times 1$ coefficient vector $x$ such that $y = Ax$ and $\|x\|_0$ is minimized, i.e.,

$$x = \arg\min_{x'} \|x'\|_0 \quad \text{s.t.} \quad y = Ax', \tag{1}$$

where $\|x\|_0$ is the $\ell_0$ norm, equal to the number of non-zero components of $x$. Finding the solution to equation (1) is NP-hard due to its combinatorial nature. Suboptimal solutions can be found by iterative methods such as matching pursuit and orthogonal matching pursuit. An approximate solution is obtained by replacing the $\ell_0$ norm in equation (1) with the $\ell_1$ norm:

$$x = \arg\min_{x'} \|x'\|_1 \quad \text{s.t.} \quad y = Ax', \tag{2}$$

where $\|x\|_1$ is the $\ell_1$ norm. In [11], it is proved that if a certain sparsity condition is satisfied, i.e., the solution is sparse enough, the solution of equation (1) is equivalent to that of equation (2), which can be efficiently solved by basis pursuit using linear programming.
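As an illustration of this linear-programming route, here is a minimal basis pursuit sketch using the standard split $x = u - v$ with $u, v \ge 0$; the toy dictionary and signal are assumptions introduced only for this example:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 s.t. y = Ax as a linear program.

    Splitting x = u - v with u, v >= 0 gives ||x||_1 = sum(u + v),
    so the problem becomes a standard-form LP.
    """
    N, M = A.shape
    c = np.ones(2 * M)                           # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])                    # equality constraint A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * M))
    return res.x[:M] - res.x[M:]

# toy 2x4 overcomplete dictionary and a 1-sparse signal built from its third atom
A = np.array([[1.0, 0.0, 1.0, 0.5],
              [0.0, 1.0, 1.0, 0.5]])
y = A @ np.array([0.0, 0.0, 2.0, 0.0])
x = basis_pursuit(A, y)                          # recovers the sparse coefficients
```

Because the generating coefficient vector is sparse enough here, the $\ell_1$ program recovers it exactly, matching the equivalence result of [11].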
A generalized version of equation (2), which allows for a certain degree of noise, is to find the $x$ that minimizes the objective function

$$J_1(x; \lambda) = \|y - Ax\|_2^2 + \lambda \|x\|_1, \tag{3}$$

where $\lambda > 0$ is a scalar regularization parameter that balances the tradeoff between reconstruction error and sparsity. In [12], a Bayesian approach is proposed for learning the optimal value of $\lambda$. Besides the intuitive interpretation of obtaining a sparse factorization that minimizes the signal reconstruction error, the problem in equation (3) has an equivalent interpretation in the framework of Bayesian decision theory [13]. The signal $y$ is assumed to be generated by the model

$$y = Ax + \varepsilon, \tag{4}$$

where $\varepsilon$ is white Gaussian noise. Moreover, the prior distribution of $x$ is assumed to be super-Gaussian:

$$p(x) \sim \exp\Big(-\lambda \sum_{i=1}^{M} |x_i|^p\Big), \tag{5}$$

where $p \in [0, 1]$. This prior has been shown to encourage sparsity in many situations, due to its heavy tails and sharp peak. Given this prior, the maximum a posteriori (MAP) estimate of $x$ is

$$x_{MAP} = \arg\max_x p(x|y) = \arg\min_x \big[-\log p(y|x) - \log p(x)\big] = \arg\min_x \Big(\|y - Ax\|_2^2 + \lambda \sum_{i=1}^{M} |x_i|^p\Big). \tag{6}$$

When $p = 0$, equation (6) is equivalent to the generalized form of equation (1); when $p = 1$, it is equivalent to equation (2).

3 Reconstruction and Discrimination

Sparse representation works well in applications where the original signal $y$ needs to be reconstructed as accurately as possible, such as denoising, image inpainting and coding.
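To make this reconstructive baseline concrete, here is a minimal iterative shrinkage-thresholding (ISTA) sketch for minimizing $J_1$ in equation (3); the toy dictionary, regularization value and iteration count are assumptions, not the paper's setup:

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Minimize J1(x) = ||y - Ax||_2^2 + lam * ||x||_1 by
    iterative shrinkage-thresholding (ISTA)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - (2.0 / L) * (A.T @ (A @ x - y))      # gradient step on the quadratic term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-thresholding step
    return x

# toy dictionary: two unit atoms plus their sum; y is explained by the third atom
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([2.0, 2.0])
x = ista(A, y, lam=0.1)
```

The soft-thresholding step is the proximal operator of the $\ell_1$ penalty; with this data the iterate concentrates essentially all of its weight on the third atom, illustrating how minimizing $J_1$ trades a small reconstruction error for a sparse coefficient vector.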
However, for applications like signal classification, it is more important that the representation be discriminative for the given signal classes than that it have a small reconstruction error.

The difference between reconstruction and discrimination has been widely investigated in the literature. Typical reconstructive methods, such as principal component analysis (PCA) and independent component analysis (ICA), aim at a representation that enables sufficient reconstruction and are thus able to deal with signal corruption, i.e., noise, missing data and outliers. Discriminative methods, such as LDA [14], instead generate a signal representation that maximizes the separation between the distributions of signals from different classes. While both families have broad applications in classification, discriminative methods have often outperformed reconstructive methods for the classification task [15, 16]. However, this comparison assumes that the signals being classified are ideal, i.e., noiseless, complete (without missing data) and free of outliers. When this assumption does not hold, classification may suffer from the non-robust nature of discriminative methods, which retain insufficient information to deal with signal corruption. Specifically, the representation provided by a discriminative method for optimal classification does not necessarily contain sufficient information for signal reconstruction, which is necessary for removing noise, recovering missing data and detecting outliers. This performance degradation of discriminative methods on corrupted signals is evident in the examples shown in [17]. Reconstructive methods, on the other hand, have successfully addressed these problems. In [9], sparse representation is shown to achieve state-of-the-art performance in image denoising.
In [18], missing pixels in images are successfully recovered by an inpainting method based on sparse representation. In [17, 19], PCA with subsampling effectively detects and excludes outliers before the subsequent LDA analysis.

These examples motivate the design of a new signal representation that combines the advantages of reconstructive and discriminative methods to address robust classification of corrupted signals. The proposed method should generate a representation that contains discriminative information for classification as well as the information needed for signal reconstruction, and the representation should preferably be sparse. Because of its evident reconstructive properties [9, 18], the availability of efficient pursuit methods and the sparsity of the representation, we choose sparse representation as the basic framework for SRSC and incorporate a measure of discrimination power into the objective function. The sparse representation obtained by SRSC therefore contains both the information needed for reconstruction and discriminative information for classification, which enables reasonable classification performance on corrupted signals. The three objectives (sparsity, reconstruction and discrimination) may not always be consistent, so weighting factors are introduced to adjust the tradeoff among them, as with the weighting factor $\lambda$ in equation (3). It should be noted that the aim of SRSC is not to improve on standard discriminative methods like LDA in the case of ideal signals, but to achieve comparable classification results when the signals are corrupted. A recent work [17] that aims at robust classification shares some common ideas with the proposed SRSC.
In [17], PCA with subsampling as proposed in [19] is applied to detect and exclude outliers in images, and the remaining pixels are used for the LDA computation.

4 Sparse Representation for Signal Classification

In this section, the SRSC problem is formulated mathematically and a pursuit method is proposed to optimize the objective function. We first replace the term measuring reconstruction error with a term measuring discrimination power, to show the different effects of reconstruction and discrimination. We then incorporate the measure of discrimination power into the framework of standard sparse representation to address the problem of classifying corrupted signals. Fisher's discrimination criterion [14], as used in LDA, is applied to quantify discrimination power; other well-known discrimination criteria could easily be substituted.

4.1 Problem Formulation

Given $y = Ax$ as discussed in Section 2, we view $x$ as the feature extracted from the signal $y$ for classification. The extracted feature should be as discriminative as possible between the different signal classes. Suppose we have a set of $K$ signals in a signal matrix $Y = [y_1, y_2, \ldots, y_K]$ with the corresponding representations in the overcomplete dictionary $X = [x_1, x_2, \ldots, x_K]$, of which $K_i$ samples are in class $C_i$, for $1 \le i \le C$. The mean $m_i$ and variance $s_i^2$ of class $C_i$ are computed in the feature space as

$$m_i = \frac{1}{K_i} \sum_{x \in C_i} x, \qquad s_i^2 = \sum_{x \in C_i} \|x - m_i\|_2^2. \tag{7}$$

The mean of all samples is defined as $m = \frac{1}{K} \sum_{i=1}^{K} x_i$. Finally, Fisher's discrimination power can be defined as

$$F(X) = \frac{S_B}{S_W} = \frac{\Big\| \sum_{i=1}^{C} K_i (m_i - m)(m_i - m)^T \Big\|_2^2}{\sum_{i=1}^{C} s_i^2}. \tag{8}$$

The difference between the sample means, $S_B = \big\| \sum_{i=1}^{C} K_i (m_i - m)(m_i - m)^T \big\|_2^2$, can be interpreted as the inter-class distance, and the sum of variances, $S_W = \sum_{i=1}^{C} s_i^2$, can similarly be interpreted as the inner-class scatter. Fisher's criterion is motivated by the intuition that discrimination power is maximized when the distributions of different classes are as far apart as possible while samples from the same class stay as close together as possible.

Replacing the reconstruction error with the discrimination power, the objective function that focuses only on classification can be written as

$$J_2(X; \lambda) = F(X) - \lambda \sum_{i=1}^{K} \|x_i\|_0, \tag{9}$$

where $\lambda$ is a positive scalar weighting factor chosen to adjust the tradeoff between discrimination power and sparsity. Maximizing $J_2(X; \lambda)$ generates a sparse representation with good discrimination power. When discrimination power, reconstruction error and sparsity are combined, the objective function becomes

$$J_3(X; \lambda_1, \lambda_2) = F(X) - \lambda_1 \sum_{i=1}^{K} \|x_i\|_0 - \lambda_2 \sum_{i=1}^{K} \|y_i - Ax_i\|_2^2, \tag{10}$$

where $\lambda_1$ and $\lambda_2$ are positive scalar weighting factors chosen to adjust the tradeoff between discrimination power, sparsity and reconstruction error. Maximizing $J_3(X; \lambda_1, \lambda_2)$ ensures that a representation with discrimination power, reconstruction property and sparsity is extracted for robust classification of corrupted signals.
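As an illustration, the quantities in equations (8) through (10) can be evaluated with a short numpy sketch; the column-wise layout of $X$, the label vector and the tiny two-class example are assumptions made only for this demonstration, and the matrix norm in (8) is read as the squared spectral norm:

```python
import numpy as np

def fisher_power(X, labels):
    """Fisher discrimination power F(X) = S_B / S_W of equation (8),
    with the coefficient vectors stored as the columns of X."""
    m = X.mean(axis=1)                                # overall mean
    S_B_mat = np.zeros((X.shape[0], X.shape[0]))
    S_W = 0.0
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1)                          # class mean m_i, eq. (7)
        S_B_mat += Xc.shape[1] * np.outer(mc - m, mc - m)
        S_W += np.sum((Xc - mc[:, None]) ** 2)        # class scatter s_i^2, eq. (7)
    S_B = np.linalg.norm(S_B_mat, 2) ** 2             # squared spectral norm, eq. (8)
    return S_B / S_W

def J3(X, Y, A, labels, lam1, lam2):
    """Objective of equation (10): discrimination power minus the
    sparsity and reconstruction penalties."""
    sparsity = np.count_nonzero(X)                    # sum of ||x_i||_0
    recon = np.sum((Y - A @ X) ** 2)                  # sum of ||y_i - A x_i||_2^2
    return fisher_power(X, labels) - lam1 * sparsity - lam2 * recon

# two well-separated classes in a one-dimensional feature space
X = np.array([[0.0, 2.0, 10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
F = fisher_power(X, labels)
```

A pursuit method can then compare candidate atoms simply by which choice yields the larger value of `J3`.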
When the signals are corrupted, the two terms $\sum_{i=1}^{K} \|x_i\|_0$ and $\sum_{i=1}^{K} \|y_i - Ax_i\|_2^2$ robustly recover the signal structure, as in [9, 18]. The inclusion of the term $F(X)$, on the other hand, requires that the obtained representation contain discriminative information for classification. In the following discussion, we refer to the solution of the objective function $J_3(X; \lambda_1, \lambda_2)$ as the features for the proposed SRSC.

4.2 Problem Solution

Both the objective function $J_2(X; \lambda)$ defined in equation (9) and the objective function $J_3(X; \lambda_1, \lambda_2)$ defined in equation (10) have forms similar to the objective function of the standard sparse representation, $J_1(x; \lambda)$ in equation (3). The key difference is that evaluating $F(X)$ in $J_2$ and $J_3$ involves not just a single sample, as in $J_1(x; \lambda)$, but all samples. Therefore, not all pursuit methods applicable to the standard sparse representation, such as basis pursuit and LARS/homotopy methods, can be directly applied to optimize $J_2$ and $J_3$. However, the iterative optimization used in matching pursuit and orthogonal matching pursuit provides a direct template for optimizing $J_2$ and $J_3$. In this paper, we propose an algorithm similar to orthogonal matching pursuit and inspired by the simultaneous sparse approximation algorithms described in [20, 21]. Taking the optimization of $J_3(X; \lambda_1, \lambda_2)$ as an example, the pursuit algorithm can be summarized as follows:

1. Initialize the residue matrix $R_0 = Y$ and the iteration counter $t = 0$.
2. Choose the atom from the dictionary $A$ that maximizes the objective function:

$$g = \arg\max_{g \in A} J_3(g^T R_t; \lambda_1, \lambda_2). \tag{11}$$

3. Determine the orthogonal projection matrix $P_t$ onto the span of the chosen atoms, and compute the new residue:

$$R_t = Y - P_t Y. \tag{12}$$

4. Increment $t$ and return to Step 2 until $t$ reaches a pre-determined number of iterations.

The pursuit algorithm for optimizing $J_2(X; \lambda)$ follows the same steps. A detailed analysis of this pursuit algorithm can be found in [20, 21].

5 Experiments

Two sets of experiments are conducted. In Section 5.1, synthetic signals are generated to show the difference between the features extracted by $J_1(X; \lambda)$ and $J_2(X; \lambda)$, which reflects the contrast between the reconstruction and discrimination properties. In Section 5.2, classification on real data is shown; random noise and occlusion are added to the original signals to test the robustness of SRSC.

5.1 Synthetic Example

Two simple signal classes, $f_1(t)$ and $f_2(t)$, are constructed from the Fourier basis to illustrate the difference between reconstructive and discriminative methods:

$$f_1(t) = g_1 \cos t + h_1 \sin t, \tag{13}$$

$$f_2(t) = g_2 \cos t + h_2 \sin t. \tag{14}$$

[Figure 1: Distributions of the projections of signals from the two classes (coefficient amplitude vs. sample index) onto the first atom selected by $J_1(X; \lambda)$ (left) and by $J_2(X; \lambda)$ (right).]

The scalar $g_1$ is uniformly distributed in the interval $[0, 5]$, and the scalar $g_2$ is uniformly distributed in the interval $[5, 10]$. The scalars $h_1$ and $h_2$ are uniformly distributed in the interval $[10, 20]$.
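Under these coefficient distributions, the two classes can be sampled directly; in the following sketch the sampling grid, random seed and matrix layout are assumptions (only the 100 samples per class come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 64)    # assumed sampling grid for the signals
n = 100                                   # samples per class, as in the text

# coefficients: g1 ~ U[0,5], g2 ~ U[5,10], h1, h2 ~ U[10,20]
g1, g2 = rng.uniform(0, 5, n), rng.uniform(5, 10, n)
h1, h2 = rng.uniform(10, 20, n), rng.uniform(10, 20, n)

f1 = g1[:, None] * np.cos(t) + h1[:, None] * np.sin(t)   # class 1 signals (rows)
f2 = g2[:, None] * np.cos(t) + h2[:, None] * np.sin(t)   # class 2 signals (rows)
```

Projecting the rows onto $\cos t$ recovers $g_1$ and $g_2$, whose ranges do not overlap, so the cosine atom separates the classes; the sine coefficients $h_1$ and $h_2$ share the same range and carry no discriminative information despite holding most of the energy.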
Therefore, most of the energy of the signals is carried by the sine component, while most of the discrimination power is in the cosine component: the signal component with the most energy is not necessarily the component with the most discrimination power. Constructing the dictionary as $\{\sin t, \cos t\}$, optimizing the objective function $J_1(X; \lambda)$ with the pursuit method described in Section 4.2 selects $\sin t$ as the first atom, whereas optimizing $J_2(X; \lambda)$ selects $\cos t$ as the first atom. In the simulation, 100 samples are generated for each class and the pursuit algorithm stops after the first run. The projections of the signals from both classes onto the first atom selected by $J_1(X; \lambda)$ and by $J_2(X; \lambda)$ are shown in Fig. 1. The difference shown in the figure has a direct impact on classification.

5.2 Real Example

Classification with $J_1$, $J_2$ and $J_3$ (SRSC) is also conducted on the USPS handwritten digit database [22]. The database contains 8-bit grayscale images of "0" through "9" of size $16 \times 16$, with 1100 examples of each digit. Following the conclusions of [23], 10-fold stratified cross validation is adopted. Classification is conducted with the decomposition coefficients ($X$ in equation (10)) as features and a support vector machine (SVM) as the classifier. In this implementation, the overcomplete dictionary is a combination of the Haar wavelet basis and a Gabor basis: the Haar basis is good at modelling discontinuities in a signal, while the Gabor basis is good at modelling continuous signal components.

In this experiment, noise and occlusion are added to the signals to test the robustness of SRSC. First, white Gaussian noise of increasing energy, and thus decreasing signal-to-noise ratio (SNR), is added to each image.
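Generating noise at a prescribed SNR amounts to scaling the noise variance against the empirical signal power; a minimal sketch (the function name and the seeded generator are assumptions):

```python
import numpy as np

def add_noise_at_snr(img, snr_db, rng=None):
    """Add white Gaussian noise scaled so that the corrupted image has
    the requested signal-to-noise ratio in dB."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(img ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))   # SNR = 10*log10(Ps/Pn)
    return img + rng.normal(0.0, np.sqrt(noise_power), img.shape)

# e.g. corrupt a flat 16x16 test image down to 5 dB
noisy = add_noise_at_snr(np.ones((16, 16)), 5.0)
```

Repeating this per image at 20, 15, 10 and 5 dB reproduces the kind of corruption sweep used in the experiment.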
Table 1 summarizes the classification error rates obtained at different SNRs. Second, black squares of different sizes are overlaid on each image at random locations to generate occlusion (missing data). For the image size of $16 \times 16$, black squares of size $3 \times 3$, $5 \times 5$, $7 \times 7$, $9 \times 9$ and $11 \times 11$ are overlaid. Table 2 summarizes the classification error rates obtained with occlusion.

Table 1: Classification error rates with different levels of white Gaussian noise

                        Noiseless   20dB     15dB     10dB     5dB
J1 (Reconstruction)     0.0855      0.0975   0.1375   0.1895   0.2310
J2 (Discrimination)     0.0605      0.0816   0.1475   0.2065   0.2785
J3 (SRSC)               0.0727      0.0803   0.1025   0.1490   0.2060

Table 2: Classification error rates with different sizes of occlusion

                        No occlusion   3x3      5x5      7x7      9x9      11x11
J1 (Reconstruction)     0.0855         0.0930   0.1270   0.1605   0.2020   0.2990
J2 (Discrimination)     0.0605         0.0720   0.1095   0.1805   0.2405   0.3305
J3 (SRSC)               0.0727         0.0775   0.1135   0.1465   0.1815   0.2590

The results in Tables 1 and 2 show that when the signals are ideal (noiseless and without missing data) or nearly ideal, $J_2(X; \lambda)$ is the best criterion for classification. This is consistent with the known conclusion that discriminative methods outperform reconstructive methods in classification. However, as the noise increases or more data is missing (larger areas of occlusion), the accuracy based on $J_2(X; \lambda)$ degrades faster than the accuracy based on $J_1(X; \lambda)$. This indicates that the signal structures recovered by the standard sparse representation are more robust to noise and occlusion and thus suffer less performance degradation.
SRSC, in turn, achieves a lower error rate than either criterion alone by combining the reconstruction property with the discrimination power when the signals are noisy or occluded.

6 Discussions

In summary, sparse representation for signal classification (SRSC) is proposed. SRSC is motivated by ongoing research on sparse representation in signal processing, and it incorporates reconstruction properties, discrimination power and sparsity for robust classification. In the current implementation of SRSC, the weighting factors are set empirically to optimize performance; approaches for determining optimal values of the weighting factors are under investigation, following methods similar to those introduced in [12].

It is interesting to compare SRSC with the relevance vector machine (RVM) [24]. RVM has shown performance comparable to the widely used support vector machine (SVM), but with a substantially smaller number of relevance/support vectors. Both SRSC and RVM take sparsity and reconstruction error into consideration: in SRSC, the two terms are explicitly included in the objective function, while in RVM they are included in the Bayesian formulation. In RVM, the "dictionary" used for signal representation is the collection of values of the kernel function. SRSC, on the other hand, is rooted in the standard sparse representation and in recent developments in harmonic analysis, such as the curvelet, bandlet and contourlet transforms, which show excellent properties for signal modelling. It would be interesting to see how RVM performs with the kernel functions replaced by these harmonic transforms. Another difference between SRSC and RVM is how discrimination power is incorporated. RVM is by nature a function regression method; when used for classification, it simply changes the target function value to class membership.
For SRSC, the discrimination power is explicitly incorporated through a measure based on Fisher's discrimination criterion. The adjustment of the weighting factors in SRSC (equation (10)) gives the algorithm some flexibility when facing various noise levels in the signals. A thorough and systematic study of the connections and differences between SRSC and RVM would be an interesting topic for future research.

References

[1] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. on Signal Processing, vol. 41, pp. 3397-3415, 1993.
[2] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in 27th Annual Asilomar Conference on Signals, Systems, and Computers, 1993.
[3] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Scientific Computing, vol. 20, no. 1, pp. 33-61, 1999.
[4] I. Drori and D. Donoho, "Solution of L1 minimization problems by LARS/Homotopy methods," in ICASSP, 2006, vol. 3, pp. 636-639.
[5] M. Aharon, M. Elad, and A. Bruckstein, "The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation," IEEE Trans. on Signal Processing, to appear.
[6] J. Starck, M. Elad, and D. Donoho, "Image decomposition via the combination of sparse representation and a variational approach," IEEE Trans. on Image Processing, vol. 14, no. 10, pp. 1570-1582, 2005.
[7] Y. Li, A. Cichocki, and S. Amari, "Analysis of sparse representation and blind source separation," Neural Computation, vol. 16, no. 6, pp. 1193-1234, 2004.
[8] B. Olshausen, P. Sallee, and M. Lewicki, "Learning sparse image codes using a wavelet pyramid architecture," in NIPS, 2001, pp. 887-893.
[9] M. Elad and M. Aharon, "Image denoising via learned dictionaries and sparse representation," in CVPR, 2006.
[10] M. Elad, B. Matalon, and M. Zibulevsky, "Image denoising with shrinkage and redundant representation," in CVPR, 2006.
[11] D. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. on Information Theory, vol. 47, no. 7, pp. 2845-2862, 2001.
[12] Y. Lin and D. Lee, "Bayesian L1-norm sparse learning," in ICASSP, 2006, vol. 5, pp. 605-608.
[13] D. Wipf and B. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. on Signal Processing, vol. 52, no. 8, pp. 2153-2164, 2004.
[14] R. Duda, P. Hart, and D. Stork, Pattern Classification (2nd ed.), Wiley-Interscience, 2000.
[15] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[16] A. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
[17] S. Fidler, D. Skocaj, and A. Leonardis, "Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 337-350, 2006.
[18] M. Elad, J. Starck, P. Querre, and D. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA)," Journal on Applied and Computational Harmonic Analysis, vol. 19, pp. 340-358, 2005.
[19] A. Leonardis and H. Bischof, "Robust recognition using eigenimages," Computer Vision and Image Understanding, vol. 78, pp. 99-118, 2000.
[20] J. Tropp, A. Gilbert, and M. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Processing, special issue on sparse approximations in signal and image processing, vol. 86, no. 4, pp. 572-588, 2006.
[21] J. Tropp, A. Gilbert, and M. Strauss, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Processing, special issue on sparse approximations in signal and image processing, vol. 86, no. 4, pp. 589-602, 2006.
[22] USPS Handwritten Digit Database, available at: http://www.cs.toronto.edu/~roweis/data.html.
[23] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in IJCAI, 1995, pp. 1137-1145.
[24] M. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, vol. 1, pp. 211-244, 2001.