Guy Shani, Ronen Brafman
Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing, i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent's sensors are noisy, i.e., when sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile Suffix Memory (USM), finite-size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile Suffix Memory (NUSM), based on USM, that uses a weighted classification of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.
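To make the two difficulties concrete, the following minimal sketch separates perceptual aliasing (distinct states that yield the same observation) from sensor noise (the same state yielding different observations). It is illustrative only and is not part of USM or NUSM; the toy states, the observation names, and the `observe` function with its `noise` parameter are all assumptions introduced for this example.

```python
import random

# Hypothetical toy domain (an assumption, not from the paper):
# two corridor states look identical to the agent's sensors.
TRUE_OBSERVATION = {
    "left_corridor": "wall_both_sides",
    "right_corridor": "wall_both_sides",  # aliased with left_corridor
    "junction": "open_ahead",
}
ALL_OBSERVATIONS = sorted(set(TRUE_OBSERVATION.values()))


def observe(state, noise=0.0, rng=random):
    """Return a sensor reading for `state`.

    With probability `noise` the correct reading is replaced by a
    different observation, modelling a noisy sensor.
    """
    correct = TRUE_OBSERVATION[state]
    if rng.random() < noise:
        wrong = [o for o in ALL_OBSERVATIONS if o != correct]
        return rng.choice(wrong)
    return correct


# Perceptual aliasing: different states, same observation (no noise needed).
aliased = observe("left_corridor") == observe("right_corridor")

# Sensor noise: the same state can produce more than one observation.
rng = random.Random(0)
readings = {observe("junction", noise=0.3, rng=rng) for _ in range(200)}
```

Under these assumptions, aliasing alone can in principle be resolved by remembering enough history, whereas noise means that two histories from the same underlying situation need not match exactly, which is the case the abstract argues existing methods handle poorly.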