This distribution contains code to accompany the paper "The Fixed
Points of Off-Policy TD", as well as an expanded description of the
optimization procedure.  The counterexample in Sections 3 and 4 can
be generated by the counter_example.m script, and the figures in
Section 5 can be generated by the experiments.m file.  The
optimization procedure is described in the optimization.pdf file.

The code requires the Matlab Lightspeed toolbox and the minFunc.m
routine:
http://research.microsoft.com/en-us/um/people/minka/software/lightspeed/
http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
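
As a rough setup sketch, the two dependencies can be put on the
Matlab path before running the scripts.  The directory names below
are hypothetical placeholders, and the sketch assumes Matlab is
started in the directory containing this distribution.

```matlab
% Hypothetical install locations (assumptions); replace them with the
% directories where Lightspeed and minFunc were actually unpacked.
lightspeed_dir = '/path/to/lightspeed';
minfunc_dir    = '/path/to/minFunc';

% Add both packages, including their subdirectories, to the path.
addpath(genpath(lightspeed_dir));
addpath(genpath(minfunc_dir));

% exist returns 0 when the name cannot be found on the path.
if ~exist('minFunc', 'file')
    error('minFunc was not found on the Matlab path.');
end

% Run the two entry points named above.
counter_example;   % counterexample from Sections 3 and 4
experiments;       % figures from Section 5
```

Either package may need additional installation or compilation
steps; see the pages linked above.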

The optimize_dist.m and optimize_dist2.m files are the main routines.
Both minimize the KL divergence between an initial distribution and
an optimized distribution, subject to the constraint that F(d) >= 0.
The optimize_dist.m routine computes the updates without ever forming
the F_i matrices, while the optimize_dist2.m routine forms these
matrices explicitly.  The latter therefore has higher complexity in
theory, though it may be more efficient in practice, especially for
small numbers of basis functions.
