Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The GP-based method for modelling non-stationary data is interesting. Combining a global GP with local GPs through a learned sparse multinomial logit (softmax) seems to work well in practice. The authors added further experiments in the rebuttal that addressed some of the reviewers' concerns. I suggest that the related work section discuss the connection with Mixture of Gaussian Process Experts in more depth (see, e.g., Rasmussen and Ghahramani 2002; Meeds and Osindero).