Supplementary material:

- nips18-supp.pdf: additional material and experimental results.

- *.gif: GIF animations showing the sparse oblique tree resulting from the
  TAO algorithm for increasing sparsity (decreasing C value), initialized
  from the CART tree (oblique or axis-aligned, with a depth as indicated in
  the filename), for the MNIST dataset of handwritten digit images. There are
  two types of animations: one shows the tree pictorially, and the other shows
  it as a binary heap (the filename includes "_heap"). We describe both below
  in detail.

  All these animations can be viewed in a web browser or with specialized
  GIF image viewers. For example, in Linux:
    gifview animation.gif
    animate -delay 100 -resize 1200x900 animation.gif
    smplayer animation.gif


TREE ANIMATIONS

- Example: depth08_init:obl.gif.
  As the name indicates, this corresponds to using as initial tree an oblique
  tree (trained with CART) of depth 8.
- Each frame shows:
  . The value of C for which the tree was computed, except in the first
    frame, which shows the initial tree obtained with CART (oblique or
    axis-aligned).
  . The resulting tree and its parameters at the end of the TAO optimization
    (for the corresponding C value).
  . The tree's training and test error, percentage of nonzero parameters,
    and number of splits (internal nodes).
  . The two plots at the top left show the curves over C of the training and
    test error (left plot) and the curves over C of the %nonzeros and #splits
    (right plot). The horizontal dashed lines mark the values for the
    initial CART tree.
    The vertical moving bar indicates the C value for the frame (i.e., the
    current tree).
- For each internal node of the tree we plot:
  . Title line: the index in the heap of the node (see binary heap below) and,
    in parentheses, the bias of its linear decision boundary.
  . Plot: the weight vector of its linear decision boundary as a 28x28 image
    (red: negative, blue: positive, white: zero).
- For each leaf node of the tree we plot:
  . Title line: the index in the heap and, in parentheses, the number of
    training points that reach the leaf and (boldface) the digit class it
    predicts.
  . Plot: the mean of the training points that reach the leaf.
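For concreteness, a point is classified by following the linear decision boundaries from the root to a leaf. The sketch below (not the authors' code) routes an input through an oblique tree stored with breadth-first heap indices; the node format and the convention that w.x + b >= 0 sends the point to the right child are assumptions for illustration.

```python
import numpy as np

def route(x, nodes, leaves, i=1):
    """Route input x down an oblique tree stored as a 1-based binary heap.

    nodes:  dict heap_index -> (w, b) for internal nodes (linear boundary);
    leaves: dict heap_index -> predicted class.
    Assumed convention: go to the right child if w.x + b >= 0.
    """
    while i in nodes:
        w, b = nodes[i]
        i = 2 * i + 1 if w @ x + b >= 0 else 2 * i  # children of i: 2i, 2i+1
    return leaves[i]

# Toy example: a depth-1 tree over 2D inputs.
nodes = {1: (np.array([1.0, -1.0]), 0.0)}
leaves = {2: "class A", 3: "class B"}
print(route(np.array([0.2, 0.9]), nodes, leaves))  # w.x + b < 0 -> "class A"
```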

The sharp changes in tree topology are caused by postprocessing the tree at
the end of the TAO optimization (for each C value) to eliminate dead branches
and pure subtrees; this makes the tree progressively smaller as C decreases.
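This postprocessing can be sketched as a bottom-up pass over the tree. The node representation below is hypothetical (not the authors' code), and the dead-branch test (a child leaf reached by zero training points) is a simplification of the general case:

```python
def prune(node):
    """Bottom-up pass: drop dead branches, collapse pure subtrees.

    Assumed node format (hypothetical): a leaf is ("leaf", klass, npoints),
    an internal node is ("split", left_child, right_child).
    """
    if node[0] == "leaf":
        return node
    _, left, right = node
    left, right = prune(left), prune(right)
    # Dead branch: no training points reach one child -> replace the split
    # by the other child.
    if left[0] == "leaf" and left[2] == 0:
        return right
    if right[0] == "leaf" and right[2] == 0:
        return left
    # Pure subtree: both children are leaves predicting the same class ->
    # collapse into a single leaf.
    if left[0] == "leaf" and right[0] == "leaf" and left[1] == right[1]:
        return ("leaf", left[1], left[2] + right[2])
    return ("split", left, right)

# Example: the right branch is dead; the surviving subtree is pure.
tree = ("split",
        ("split", ("leaf", 7, 40), ("leaf", 7, 60)),
        ("leaf", 3, 0))
print(prune(tree))  # the whole tree collapses to a single leaf for class 7
```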


BINARY HEAP ANIMATIONS

- Example: depth08_init:obl_heap.gif. The filename includes "_heap".

- We visualize the tree as the array of a binary heap, where the tree nodes
  are indexed in breadth-first order assuming a complete binary tree (missing
  nodes at a given level are not shown, but they still count for indexing
  purposes). See chapter 6 "Heapsort" in Cormen et al., "Introduction to
  Algorithms", 3rd ed., MIT Press, 2009.
- For each internal node of the tree, we plot:
  . Title line: the index in the heap (breadth-first index) of the node and,
    in parentheses, the number of training points that reach it.
  . Above-left plot: its weight vector as a 28x28 image.
  . Above-right plot: the mean of the training points that reach it.
  . Below plot: the class label histogram of the training points that reach it
    (one bar per digit class 1-10, where class 10 is the digit zero, each bar
    in a different color).
- For each leaf node, we plot:
  . Title line: the index in the heap and, in parentheses, the number of
    training points that reach the leaf and (boldface) the digit class it
    predicts.
  . Above plot: the mean of the training points that reach it.
  . Below plot: the class label histogram of the training points that reach it.

The heap plot shows the tree without postprocessing, i.e., including dead
branches (whose nodes appear empty).
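With this 1-based breadth-first indexing, the index arithmetic is the standard binary-heap one from Cormen et al.; a minimal sketch:

```python
def parent(i):
    """Parent of node i in a 1-based binary heap (root is i = 1)."""
    return i // 2

def left(i):
    """Left child of node i."""
    return 2 * i

def right(i):
    """Right child of node i."""
    return 2 * i + 1

def depth(i):
    """Depth of node i (root is at depth 0): floor(log2(i))."""
    return i.bit_length() - 1

# A complete tree of depth 2 occupies indices 1..7. A missing node still
# "reserves" its index, so the indices of the remaining nodes are unchanged.
print(left(1), right(1))   # children of the root: 2 3
print(parent(7), depth(7)) # node 7: parent 3, depth 2
```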

