Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
Jefferson Coelho, R. Sitaraman, Roderic Grupen
This paper describes a policy iteration algorithm for optimizing the performance of a harmonic function-based controller with respect to a user-defined index. Value functions are represented as potential distributions over the problem domain, and control policies are represented as gradient fields over the same domain. All intermediate policies are intrinsically safe, i.e., collisions are not promoted during the adaptation process. The algorithm can be implemented efficiently on parallel SIMD architectures. One potential application, travel distance minimization, illustrates its usefulness.
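
To make the representation concrete, the following is a minimal sketch of a harmonic function-based controller on a small grid. It is not the paper's policy iteration algorithm: it only illustrates how a potential distribution can be computed by relaxation and how a control policy can be read off as descent on that potential. The grid layout, the boundary values (obstacles held at 1, goal held at 0), the Jacobi update, and the names `relax_harmonic` and `follow_gradient` are all assumptions made for illustration.

```python
# Illustrative sketch only; grid, boundary values and function names are
# assumptions, not taken from the paper.

GRID = [
    "#######",
    "#..#..#",
    "#..#..#",
    "#.....#",
    "#..#.G#",
    "#######",
]  # '#': obstacle, '.': free space, 'G': goal; the border must be all '#'


def relax_harmonic(grid, iters=2000):
    """Approximate a harmonic potential by Jacobi relaxation.

    Obstacles are held at potential 1.0 and the goal at 0.0 (Dirichlet
    boundary conditions). Each free cell is repeatedly replaced by the
    average of its four neighbours, the discrete form of Laplace's equation.
    """
    rows, cols = len(grid), len(grid[0])
    phi = [[0.0 if grid[r][c] == "G" else 1.0 for c in range(cols)]
           for r in range(rows)]
    for _ in range(iters):
        new = [row[:] for row in phi]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == ".":
                    new[r][c] = 0.25 * (phi[r - 1][c] + phi[r + 1][c]
                                        + phi[r][c - 1] + phi[r][c + 1])
        phi = new
    return phi


def follow_gradient(grid, phi, start, max_steps=200):
    """Step to the lowest-potential neighbour until the goal is reached.

    This is a discrete stand-in for following the negative gradient field.
    """
    path = [start]
    r, c = start
    for _ in range(max_steps):
        if grid[r][c] == "G":
            break
        neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        r, c = min(neighbours, key=lambda p: phi[p[0]][p[1]])
        path.append((r, c))
    return path


phi = relax_harmonic(GRID)
path = follow_gradient(GRID, phi, start=(1, 1))
print(path)
```

Because each free cell converges to the average of its neighbours, the potential has no local minima in free space, and obstacle cells always sit at a higher potential than any free cell connected to the goal. Descent on the potential therefore never steps into an obstacle, which is one way to read the abstract's claim that policies of this form are intrinsically safe. Each cell update depends only on its immediate neighbours, so the relaxation maps naturally onto parallel hardware.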