Ping Li, Gennady Samorodnitsk, John Hopcroft
The method of Cauchy random projections is popular for computing the $l_1$ distance in high dimension. In this paper, we propose to use only the signs of the projected data and show that the probability of collision (i.e., when the two signs differ) can be accurately approximated as a function of the chi-square ($\chi^2$) similarity, which is a popular measure for nonnegative data (e.g., when features are generated from histograms as common in text and vision applications). Our experiments confirm that this method of sign Cauchy random projections is promising for large-scale learning applications. Furthermore, we extend the idea to sign $\alpha$-stable random projections and derive a bound of the collision probability.