Rapid Distance-Based Outlier Detection via Sampling

Sugiyama, Mahito; Borgwardt, Karsten

Advances in Neural Information Processing Systems 26 (NIPS 2013)

Abstract

Distance-based approaches to outlier detection are popular in data mining because they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search.
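
The abstract does not spell out the sampling scheme itself. The following is a minimal sketch, assuming that the sampling-based method scores each point by its Euclidean distance to the nearest member of a single small random sample drawn once from the data. This is an illustrative reading, not a statement of the paper's exact algorithm. For contrast, it also shows a k-nearest-neighbor baseline that scores each point by its distance to the k-th nearest neighbor over the full dataset. The function names `sampling_outlier_scores` and `knn_outlier_scores` and the `sample_size` and `seed` parameters are hypothetical.

```python
import math
import random


def euclidean(a, b):
    # Euclidean distance between two equal-length coordinate sequences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sampling_outlier_scores(data, sample_size, seed=0):
    """Hypothetical sketch: score = distance to nearest point in one random sample.

    Assumption: a single sample is drawn once and shared by all points;
    a point that happens to be in the sample does not count itself.
    """
    if not 2 <= sample_size <= len(data):
        raise ValueError("sample_size must be between 2 and len(data)")
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(data)), sample_size)
    scores = []
    for i, x in enumerate(data):
        # Only O(sample_size) distance computations per point.
        dists = [euclidean(x, data[j]) for j in sample_idx if j != i]
        scores.append(min(dists))
    return scores


def knn_outlier_scores(data, k):
    """Baseline: score = distance to the k-th nearest neighbor in the full data."""
    scores = []
    for i, x in enumerate(data):
        # O(n) distance computations per point, O(n^2) overall.
        dists = sorted(euclidean(x, y) for j, y in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    s = sampling_outlier_scores(data, sample_size=3, seed=1)
    n = knn_outlier_scores(data, k=2)
    # Index of the highest-scoring point under each scheme.
    print(max(range(len(data)), key=s.__getitem__))
    print(max(range(len(data)), key=n.__getitem__))
```

In this toy dataset, both schemes rank the isolated point `(5.0, 5.0)` as the top outlier. The sampling version computes only `sample_size` distances per point, while the brute-force kNN baseline computes `n - 1` distances per point. That gap is the source of the efficiency difference the abstract refers to when the sample is small relative to the data.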