Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees

Rieck, Konrad; Laskov, Pavel; Sonnenburg, Sören

Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees

Konrad Rieck, Pavel Laskov, Sören Sonnenburg

Advances in Neural Information Processing Systems 19 (NIPS 2006)

Abstract

We propose a generic algorithm for computation of similarity measures for se- quential data. The algorithm uses generalized sufﬁx trees for efﬁcient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subse- quences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefﬁ- cients for sequences as alternatives to classical kernel functions.

Abstract

Name Change Policy