Kukjin Kang, Jong-Hoon Oh
We study the generalization capability of the mixture of experts learning from examples generated by another network with the same architecture. When the number of examples is smaller than a critical value, the network shows a symmetric phase where the role of the experts is not specialized. Upon crossing the critical point, the system undergoes a continuous phase transition to a symmetry-breaking phase where the gating network partitions the input space effectively and each expert is assigned to an appropriate subspace. We also find that the mixture of experts with multiple levels of hierarchy shows multiple phase transitions.
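To make the teacher-student setting concrete, the following is a minimal sketch of a teacher mixture of experts generating labeled examples. It is an illustration under stated assumptions, not the model analyzed here: the two experts and the gate are taken to be simple perceptrons, the gate is hard (it selects exactly one expert per input), and the names `moe_output` and `make_examples` are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(z):
    """Sign function with the convention sign(0) = +1."""
    return 1 if z >= 0 else -1


def moe_output(x, experts, gate):
    """Output of a hard-gated mixture of two perceptron experts.

    Assumption: the gate is itself a perceptron that splits the input
    space with a hyperplane and routes x to expert 0 or expert 1.
    """
    k = 0 if dot(gate, x) >= 0 else 1
    return sign(dot(experts[k], x))


def random_vector(n, rng):
    """Vector with independent standard Gaussian components."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def make_examples(n_inputs, n_examples, rng):
    """Draw a random teacher and label random inputs with it."""
    teacher_experts = [random_vector(n_inputs, rng) for _ in range(2)]
    teacher_gate = random_vector(n_inputs, rng)
    data = []
    for _ in range(n_examples):
        x = random_vector(n_inputs, rng)
        data.append((x, moe_output(x, teacher_experts, teacher_gate)))
    return teacher_experts, teacher_gate, data
```

In this sketch, a student whose two experts carry identical weights produces the same output for any gate, which corresponds to the unspecialized, symmetric situation described above; the gate only matters once the experts differ.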