Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Daniel Neill, Andrew Moore
Given an N(cid:2)N grid of squares, where each square has a count and an un- derlying population, our goal is to ﬁnd the square region with the highest density, and to calculate its signiﬁcance by randomization. Any density measure D, dependent on the total count and total population of a re- gion, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff’s spatial scan statistic DK to ﬁnd the most signiﬁcant spatial disease cluster. A naive approach to ﬁnding the maximum density region requires O(N 3) time, and is generally computationally infeasible. We present a novel algorithm which partitions the grid into overlapping regions, bounds the maximum score of subregions contained in each region, and prunes re- gions which cannot contain the maximum density region. For sufﬁciently dense regions, this method ﬁnds the maximum density region in optimal O(N2) time, in practice resulting in signiﬁcant (10-200x) speedups.