Published in

arXiv, 2022

DOI: 10.48550/arxiv.2210.11825

Springer, Lecture Notes in Computer Science, pp. 320–332, 2023

DOI: 10.1007/978-3-031-37616-0_27

Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents

Distributing this paper is prohibited by the publisher

Full text: Unavailable

Preprint: policy unknown
Postprint: policy unknown
Published version: policy unknown

Abstract

Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations that analyze specific decisions of the agents and global explanations that convey the general strategy of the agents. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and their respective benefits. Our results show significant benefits for both methods. In general, we found that the local reward decomposition was more useful for identifying the agents' priorities. However, when there was only a minor difference between the agents' preferences, the global information provided by HIGHLIGHTS additionally improved participants' understanding.
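
To illustrate the local explanation idea mentioned in the abstract: reward decomposition assumes the reward splits into named components and learns one value estimate per component, so a specific decision can be attributed to those components. The sketch below is not taken from the paper; the tabular setting, the component names, and the hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (assumed, not the paper's implementation): tabular Q-learning
# with a decomposed reward. The per-component Q-values of the chosen action are
# what a reward-decomposition explanation would show to the user.
import numpy as np

class DecomposedQAgent:
    def __init__(self, n_states, n_actions, components, alpha=0.1, gamma=0.99):
        self.components = components              # e.g. ["progress", "safety"]
        self.alpha, self.gamma = alpha, gamma
        # One Q-table per reward component.
        self.q = {c: np.zeros((n_states, n_actions)) for c in components}

    def total_q(self, s):
        # The agent acts on the sum of the component Q-values.
        return sum(self.q[c][s] for c in self.components)

    def act(self, s):
        return int(np.argmax(self.total_q(s)))

    def update(self, s, a, rewards, s_next):
        # `rewards` maps each component name to its scalar reward at this step.
        a_next = self.act(s_next)  # greedy successor action on the summed Q
        for c in self.components:
            target = rewards[c] + self.gamma * self.q[c][s_next, a_next]
            self.q[c][s, a] += self.alpha * (target - self.q[c][s, a])

    def explain(self, s):
        # Local explanation: per-component Q-values of the action actually taken.
        a = self.act(s)
        return {c: float(self.q[c][s, a]) for c in self.components}
```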