Inverse Reinforcement Learning with Suboptimal Experts
Inverse Reinforcement Learning (IRL) is defined as follows: given 1) measurements of an agent's behavior over time in a variety of circumstances, 2) if needed, measurements of the sensory inputs to that agent, and 3) if available, a model of the environment, determine the reward function being optimized. This definition says nothing about how well the experts behave, or whether they know the exact model of the environment when they act. A general solution to the problem should therefore make no assumption about whether the expert trajectories are optimal. However, existing work in this field either explicitly assumes that all expert trajectories are optimal, or relies on algorithms that tend to perform poorly on suboptimal expert trajectories. We propose a new algorithm that is much more resilient to suboptimal trajectories.
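The following toy sketch shows why a strict optimality assumption is brittle. It is a minimal example, and several of its parts are assumptions made only for illustration: the one-step problem with three actions, the candidate reward vector, the demonstrations, and the function names are all hypothetical. It compares how likely a set of demonstrations is under two expert models. The first is a strictly optimal expert. The second is a Boltzmann-rational (noisy) expert, which is a common modeling choice in the IRL literature and is not the algorithm proposed here. A single suboptimal action drives the likelihood under the optimal-expert model to zero. The noisy model still assigns the demonstrations positive probability.

```python
import math

# Hypothetical one-step problem: a single state and three actions.
# rewards[a] is a candidate reward for action a. With one step, Q(s, a) = rewards[a].

def optimal_expert_likelihood(rewards, actions):
    """Likelihood of the demonstrated actions if the expert is assumed strictly optimal.

    Argmax actions get probability 1 / (number of tied argmax actions).
    Every other action gets probability 0.
    """
    best = max(rewards)
    n_best = sum(1 for r in rewards if r == best)
    p = 1.0
    for a in actions:
        p *= (1.0 / n_best) if rewards[a] == best else 0.0
    return p

def boltzmann_likelihood(rewards, actions, beta=1.0):
    """Likelihood under a Boltzmann-rational expert: P(a) is proportional to exp(beta * rewards[a])."""
    z = sum(math.exp(beta * r) for r in rewards)  # normalizing constant over all actions
    p = 1.0
    for a in actions:
        p *= math.exp(beta * rewards[a]) / z
    return p

rewards = [1.0, 0.5, 0.0]  # hypothetical candidate reward: action 0 is best
demos = [0, 0, 1, 0]       # mostly optimal demonstrations, with one suboptimal choice (action 1)

print(optimal_expert_likelihood(rewards, demos))  # prints 0.0
print(boltzmann_likelihood(rewards, demos) > 0)   # prints True
```

Under the strict-optimality model, this candidate reward is ruled out entirely by one deviation, even though it explains every other demonstration. That is the kind of sensitivity to suboptimal trajectories the abstract describes.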