Thursday, April 22, 2010
Sunday, April 11, 2010
Right now, I'm trying to formally model OCCAM's behavior as a Markov Decision Process (MDP). Doing so opens up a large variety of standard tools for testing MDPs, such as the various MDP toolboxes for MATLAB, and will let me evaluate the suitability of different types of MDP solvers (Q-learning, value iteration, policy iteration, etc.) for OCCAM's controller.
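To give a feel for what one of these solvers does, here's a minimal value iteration sketch in Python rather than MATLAB. The two-state, two-action MDP below is a made-up toy, not OCCAM's actual model; the transition and reward numbers are purely illustrative.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, eps=1e-6):
    """Value iteration for an MDP with |S| states and |A| actions.
    P: (A, S, S) transition probabilities; R: (A, S) expected rewards."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < eps:
            return V_new, Q.argmax(axis=0)  # optimal values and greedy policy
        V = V_new

# Hypothetical two-state, two-action MDP for illustration only
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)
```

Policy iteration and Q-learning solve the same fixed-point problem; value iteration is just the simplest to show in a few lines.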
INRA's MDP Toolbox is what I'm using at the moment, and it seems to be working out pretty well. One interesting observation: describing the problem as an MDP is very memory intensive, but solving the resulting MDP is not particularly CPU intensive. Currently, MATLAB is using about 3.2GB of memory (note that I *am* using sparse matrices to reduce memory consumption), yet the toolbox can find a policy within a few seconds. I'll have to see whether this interesting property holds when I start working on problems that truly require a long/infinite-horizon solution.
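The memory-heavy/CPU-light split makes sense once you look at the numbers: the transition matrices dominate memory, while each solver sweep is just a handful of sparse matrix-vector products. Here's a sketch using scipy.sparse (an assumed stand-in for MATLAB's sparse matrices) on a hypothetical 10,000-state chain MDP, again not OCCAM's real model:

```python
import numpy as np
from scipy import sparse

# Hypothetical chain MDP: action 0 moves right w.p. 0.9 (else stays),
# action 1 always stays. Dense storage would need S*S*8 bytes per
# action (~800MB here); sparse storage needs only ~2 nonzeros per row.
S = 10_000
rows = np.arange(S)
nxt = np.minimum(rows + 1, S - 1)
P0 = (sparse.csr_matrix((np.full(S, 0.9), (rows, nxt)), shape=(S, S))
      + sparse.csr_matrix((np.full(S, 0.1), (rows, rows)), shape=(S, S)))
P1 = sparse.identity(S, format="csr")

R = np.zeros((2, S))
R[0, -1] = 1.0  # reward for taking action 0 in the terminal state
gamma, V = 0.95, np.zeros(S)
for _ in range(500):
    # Each sweep is two sparse matrix-vector products: cheap on the CPU
    Q = np.vstack([R[0] + gamma * P0.dot(V), R[1] + gamma * P1.dot(V)])
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new
```

The sparse matrices hold the entire problem description, while each iteration touches only the nonzero entries, which is consistent with a big memory footprint but a solve time of seconds.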