Hierarchical POMDP Controller Optimization by Likelihood Maximization

Marc Toussaint, Laurent Charlin, Pascal Poupart

To reduce the difficulty of planning in partially observable domains, several researchers have proposed decomposing the task into smaller tasks arranged hierarchically. Several approaches have shown the benefits of planning with a hierarchy that is given a priori. Charlin et al. (2006) recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. (2006) developed a method to solve planning problems by maximum-likelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum-likelihood approach. Our technique first transforms the problem into a dynamic Bayesian network, through which a hierarchical structure can naturally be discovered while the policy is optimized. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization.

Subjects: 1.11 Planning; 3.4 Probabilistic Reasoning

Submitted: May 5, 2008
