Pedro Moreno, Purdy Ho, Nuno Vasconcelos
Over the last years signiﬁcant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher Kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space and discriminant classiﬁers such as SVM’s. In this paper we sug- gest an alternative procedure to the Fisher kernel for systematically ﬁnd- ing kernel functions that naturally handle variable length sequence data in multimedia domains. In particular for domains such as speech and images we explore the use of kernel functions that take full advantage of well known probabilistic models such as Gaussian Mixtures and sin- gle full covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models. In effect our approach combines the best of both generative and discrim- inative methods and replaces the standard SVM kernels. We perform experiments on speaker identiﬁcation/veriﬁcation and image classiﬁca- tion tasks and show that these new kernels have the best performance in speaker veriﬁcation and mostly outperform the Fisher kernel based SVM’s and the generative classiﬁers in speaker identiﬁcation and image classiﬁcation.