A Consolidated Actor-Critic Model with Function Approximation for High-Dimensional POMDPs

Christopher A Niedzwiedz, Itamar Elhanany, Scott Livingston, ZhenZhen Liu

Practical problems in artificial intelligence often involve large state and/or action spaces in which the agent has access to only partial information. In high-dimensional settings, function approximators such as neural networks are commonly used to overcome the limitations of traditional tabular schemes. In reinforcement learning, the actor-critic architecture has received much attention in recent years. In this architecture, an actor network maps states to actions, and a critic network produces a value function approximation for a given state-action pair. This framework involves training two separate networks. The critic must effectively converge before the actor can produce a suitable policy, and both networks must learn to model the same environment, which duplicates effort. This paper presents a novel approach that consolidates the actor and critic into a single network providing the functionality of both. We demonstrate the proposed architecture on a partially observable maze learning problem.
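The abstract does not say how the consolidated network is structured or trained. Purely as a rough illustration of the general idea of one network serving both roles, the Python sketch below assumes a shared tanh hidden layer that feeds an actor head (softmax action probabilities) and a critic head (one Q(s, a) estimate per action). The class name, layer sizes, learning rates, the update rules (a SARSA-style TD step for the critic and a Q-weighted policy-gradient step for the actor), and the choice to train the shared layer only through the critic's TD error are all hypothetical assumptions, not the authors' method.

```python
import math
import random


class ConsolidatedActorCritic:
    """Hypothetical sketch: one network whose shared tanh hidden layer feeds
    an actor head (softmax policy) and a critic head (Q(s, a) per action)."""

    def __init__(self, n_obs, n_hidden, n_actions, alpha=0.1, gamma=0.9, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W = init(n_hidden, n_obs)      # shared hidden layer
        self.A = init(n_actions, n_hidden)  # actor head (action preferences)
        self.C = init(n_actions, n_hidden)  # critic head (Q-value per action)
        self.alpha, self.gamma = alpha, gamma

    def forward(self, obs):
        h = [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in self.W]
        prefs = [sum(a * x for a, x in zip(row, h)) for row in self.A]
        m = max(prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        probs = [e / total for e in exps]
        q = [sum(c * x for c, x in zip(row, h)) for row in self.C]
        return h, probs, q

    def act(self, obs, rng):
        # Sample an action from the actor head's softmax policy.
        _, probs, _ = self.forward(obs)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, obs, action, reward, next_obs=None, next_action=None):
        h, probs, q = self.forward(obs)
        target = reward
        if next_obs is not None:  # SARSA-style bootstrap; None marks a terminal step
            target += self.gamma * self.forward(next_obs)[2][next_action]
        delta = target - q[action]  # TD error of the critic head

        # Shared layer: gradient of Q(s, a) w.r.t. W, using the critic head's
        # weights before they change. Assumption: only the critic trains W.
        for j, row in enumerate(self.W):
            g = delta * self.C[action][j] * (1.0 - h[j] ** 2)
            for i, o in enumerate(obs):
                row[i] += self.alpha * g * o

        # Critic head: semi-gradient TD step on Q(s, a).
        for j, x in enumerate(h):
            self.C[action][j] += self.alpha * delta * x

        # Actor head: grad of log softmax, weighted by the critic's Q(s, a).
        for b, row in enumerate(self.A):
            coeff = (1.0 if b == action else 0.0) - probs[b]
            for j, x in enumerate(h):
                row[j] += self.alpha * q[action] * coeff * x
        return delta


# Toy usage: a single observation where action 1 pays 1 and action 0 pays 0,
# each tried as a terminal one-step episode.
net = ConsolidatedActorCritic(n_obs=2, n_hidden=4, n_actions=2)
obs = [1.0, 0.5]
for _ in range(2000):
    for a in (0, 1):
        net.update(obs, a, reward=float(a))
_, probs, q = net.forward(obs)
```

In this toy setting, the critic head should learn to rank action 1 above action 0, and the actor head, which reads the same hidden features, should shift probability toward action 1. Sharing the hidden layer is what lets a single set of features serve both heads, which is the kind of duplicated environment modeling the abstract aims to avoid.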

Subjects: 12.1 Reinforcement Learning; 14. Neural Networks

Submitted: May 5, 2008

This page is copyrighted by AAAI. All rights reserved.