Exploiting Structure in Policy Construction

Craig Boutilier, Richard Dearden and Moise’s Goldszmidt

Markov decision processes (MDPs) have recently been applied to the problem ofmodeling decision-theoretic planning. While such traditional methods for solving MDPs are often practical for small states spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies in reflected in a temporal Bayesian network representation ofMDPs. The principles behind SPI can be applied any structured representation of stochastic actions, and can be used in conjunction with recent approximation methods.

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.