Distributed estimation of the inverse Hessian by determinantal averaging

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)


Authors

Michal Derezinski, Michael W. Mahoney

Abstract

In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum.
An example of this occurs in distributed Newton's method, where we wish to compute (or implicitly work with) the inverse Hessian multiplied by the gradient.
In this case, locally computed estimates are biased, and so taking a uniform average will not recover the correct solution.
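In symbols (using notation we introduce here: $\hat{\mathbf H}_t$ for the local Hessian estimate on the $t$-th of $m$ partitions, $\mathbf H$ for the full Hessian, and $\mathbf g$ for the gradient), the bias can be written as

$$
\frac{1}{m}\sum_{t=1}^{m} \hat{\mathbf H}_t^{-1}\,\mathbf g
\;\neq\;
\Big(\frac{1}{m}\sum_{t=1}^{m} \hat{\mathbf H}_t\Big)^{-1}\mathbf g,
\qquad\text{and in general}\qquad
\mathbb E\big[\hat{\mathbf H}_t^{-1}\big]\neq\big(\mathbb E[\hat{\mathbf H}_t]\big)^{-1},
$$

so a uniform average of the local Newton steps $\hat{\mathbf H}_t^{-1}\mathbf g$ does not recover the exact step $\mathbf H^{-1}\mathbf g$, even as $m$ grows.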
To address this, we propose determinantal averaging, a new approach for correcting the inversion bias. The approach reweights the local estimates of the Newton step proportionally to the determinant of the local Hessian estimate, and then averages them together to obtain an improved global estimate. This method provides the first known distributed Newton step that is asymptotically consistent, i.e., it recovers the exact step in the limit as the number of distributed partitions grows to infinity.
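A minimal NumPy sketch of the determinant weighting described above (the toy data, the unregularized least-squares Hessian, and all variable names are our own illustration, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 40000, 8, 400                    # data points, dimension, number of partitions
X = rng.standard_normal((n, d))            # toy design matrix
g = rng.standard_normal(d)                 # stand-in for the gradient

H = X.T @ X / n                            # global Hessian of the toy least-squares problem
exact_step = np.linalg.solve(H, g)         # exact Newton step H^{-1} g

steps, dets = [], []
for idx in np.array_split(rng.permutation(n), m):
    H_t = X[idx].T @ X[idx] / len(idx)     # local Hessian estimate on one partition
    steps.append(np.linalg.solve(H_t, g))  # local Newton step
    dets.append(np.linalg.det(H_t))        # weight: determinant of the local Hessian
steps, dets = np.array(steps), np.array(dets)

uniform_avg       = steps.mean(axis=0)
determinantal_avg = (dets[:, None] * steps).sum(axis=0) / dets.sum()

print("uniform averaging error:      ", np.linalg.norm(uniform_avg - exact_step))
print("determinantal averaging error:", np.linalg.norm(determinantal_avg - exact_step))
```

With many partitions, the determinant-weighted average should track the exact step more closely than the uniform average, reflecting the consistency claim above.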
To show this, we develop new expectation identities and moment bounds for the determinant and adjugate of a random matrix.
Determinantal averaging can be applied not only to Newton's method, but to computing any quantity that is a linear transformation of a matrix inverse, e.g., the trace of the inverse covariance matrix, which is used in data uncertainty quantification.
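For instance, in notation of our own ($\hat{\boldsymbol\Sigma}_t$ a local estimate of the covariance matrix $\boldsymbol\Sigma$ from the $t$-th partition), the same determinant weighting yields a bias-corrected estimate of the trace of the inverse covariance:

$$
\operatorname{tr}\big(\boldsymbol\Sigma^{-1}\big)\;\approx\;
\frac{\sum_{t=1}^{m}\det\!\big(\hat{\boldsymbol\Sigma}_t\big)\,\operatorname{tr}\!\big(\hat{\boldsymbol\Sigma}_t^{-1}\big)}{\sum_{t=1}^{m}\det\!\big(\hat{\boldsymbol\Sigma}_t\big)}.
$$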