Matt's Maniacal Musings

Fixed Image List Algorithm (FILA): A backtrack algorithm for solving tiling problems

matt — Fri, 14 Sep 2018 00:50:28 +0000

I've designed a new backtrack algorithm for solving tiling problems that I call the fixed image list algorithm (FILA). The algorithm is flexible in that it supports various ordering heuristics.¹ By using a heuristic that always selects the first open cell, FILA behaves like Fletcher and de Bruijn's algorithms.²,³ If you use a heuristic that always picks the cell that has the fewest fit options, it behaves like my most constrained hole (MCH) algorithm.⁴ A key distinguishing feature of FILA is that ordering heuristics return neither a target cell to fill, nor a target piece to place, but rather return a set of image lists that should be tried (where an image is defined as a particular translation of a particular rotation of a puzzle piece). The returned set contains one list of images for each uniquely shaped puzzle piece. Although the ordering heuristics that target cells are best suited to FILA, the interface does allow heuristics to target pieces; this is a subject for additional research.

This interface allows the heuristic to select and return a precalculated set of image lists that is customized in three different ways to radically reduce the size of the lists by eliminating most images that cannot possibly fit. First, because different lists are calculated for each cell, only images bounded by the puzzle walls are included in the lists. Second, some heuristics (like that used by Fletcher's algorithm) guarantee that cells are filled in a particular order. For such fixed selection order heuristics, FILA identifies this order during initialization and, through a procedure I call priority occupancy filtering (POF), only includes images in a list for a cell that do not conflict with cells that must already be filled. Third, a technique I call neighbor occupancy filtering (NOF) (which is similar to a technique Gerard Putter described to me in a 2011 e-mail conversation) precalculates a different set of image lists for each possible occupancy state of the adjacent neighbors of each target cell. For a 3D polycube puzzle, there are up to six adjacent neighbors (in the $-x$, $-y$, $-z$, $+x$, $+y$, and $+z$ directions), and so up to 64 different sets of image lists are precalculated for each puzzle cell. Later, when a cell is selected by the heuristic, the current occupancy state of those adjacent neighbors is determined, and the set of image lists corresponding to that compound state is returned, guaranteeing no image conflicts with those neighboring cells. In this way, the number of images that must be tried by FILA at each recursive step is radically reduced, improving algorithm efficiency relative to other algorithms that make no such optimization, but without the expense of continuous list maintenance as is required by Donald Knuth's DLX algorithm.

Version 2.0 of my polycube puzzle solver only includes the DLX and FILA algorithms, but the retired algorithms (de Bruijn, EMCH, and MCH) can all be recreated with FILA by using the f (first), e (estimate), and s (size) heuristics respectively. In addition, all of the other implemented heuristics, previously only available to DLX, may now also be used with FILA. Despite the additional abstraction, the new FILA algorithm (even with the new NOF optimization disabled) has improved puzzle solve times (I've seen from 10% to 35%). Enabling NOF (by simply adding -n to the command line) consistently provides additional incremental performance gains (I've seen from 5% to 27%). Because the performance gains afforded by NOF are not attributable to changes in the search tree, but rather are limited to the efficient elimination of many images that don't fit at each branch, these performance improvement percentages should not compound as puzzle size increases, but should rather be largely independent of puzzle size.

Although the examples shown here, and the polycube puzzle solver software, are limited to 3D puzzles on a cubic lattice, FILA and all of its supporting components have no such constraints and can be used to solve tiling problems in any number of dimensions on any lattice.

December 3, 2018 Edit: I changed the title of this blog entry and edited the above introduction to make it clear that FILA was not limited to 3D puzzles on a cubic lattice.

Background

I was recently invited to work on a paper on polyomino puzzle parity with Professor Marcus Garvie of the University of Guelph, which got me back to the subject of polyomino and polycube puzzle solving. During my research for that paper, I read Fletcher's 1965 publication on solving the 10x6 pentomino problem shown in Figure 1, and also spent an evening making my best-effort translation (via Google Translate) of de Bruijn's 1971/72 paper on solving the same puzzle.

Figure 1. Both Fletcher and de Bruijn published papers describing a nearly identical technique for finding all 2339 solutions to the problem of placing the twelve pentominoes into a 10x6 box. One solution is shown here.

Although some 50 years old, Fletcher and de Bruijn's algorithms (which are extremely similar) are still widely regarded as the fastest for many tiling problems. These algorithms define a fixed list of the 63 possible orientations (rotations and reflections) of the 12 pentominoes. At each recursive step, the algorithms target an unfilled cell (a hole), and attempt to fill that cell by translating each of the 63 piece images to the absolute position of the target, and then attempting to place each of those translated images to cover the target. Here an image is defined as a particular translation of a particular rotation of a particular puzzle piece. (This is a slightly looser definition than I used in my previous blog article⁴ in that here I allow an image to fall partially outside the boundary of the puzzle box as required by Fletcher and de Bruijn's algorithms. I'll use the term bounded image to refer to images contained wholly inside the puzzle box.) If you were to simultaneously watch animations of the Fletcher and de Bruijn algorithms placing and removing pieces from the puzzle as they search for solutions, you would find them to be almost identical. In particular, the search trees the two algorithms explore are identical. (I.e., the sets of partial assemblies the two algorithms produce are identical.) Only the order in which the branches of that tree are explored differs.

Both Fletcher and de Bruijn modeled the 10x6 puzzle box so the shorter sides (dimension 6) are on the left and right, and the longer sides (dimension 10) are top-and-bottom. To eliminate rotationally redundant solutions, both Fletcher and de Bruijn started by placing the X piece in one of 7 locations in the upper left quadrant of the box. Then, starting with the top-left cell, the algorithms scan down searching for an unfilled puzzle cell. When the bottom edge of the puzzle box is reached, the scan continues at the top of the next column to the right. When an unfilled cell is found (the target), the algorithms try all 63 possible orientations of the 12 pentomino pieces to fill that target. For each such orientation one could try to place each of the five constituent cubes of the pentomino into the target, but because all cells to the left and directly above that cell are guaranteed to be previously filled, there is only one cell of each oriented piece that can possibly fit the puzzle. (For a more complete explanation, see Figure 2 of my previous blog article.⁴) So each algorithm need only try 63 images to fill the target. For each successfully placed image, the algorithm recurses, scanning for the next unfilled cell in order. When the list of images is exhausted at any particular target, the algorithms backtrack to the previous target and continue processing where they left off, considering each of the remaining images in the list at that cell.

The order in which the 63 images are considered at each cell does differ, but other than that the algorithms are identical if your view of them is limited to the animations. Internally, however, their designs differ significantly in how they process the 63 images to quickly sift through those that do not fit and identify the ones that do. The two approaches each have certain advantages and disadvantages in efficiency. It was these differences that got me thinking about how to better optimize this aspect of backtrack algorithms that rely on fixed image lists, which ultimately led to my design of FILA.

During any particular recursive step, most of the images in the list of 63 images will be found to be unusable either because the image corresponds to a piece that is already used, or because it conflicts with pieces already placed on the board, or because it intersects the boundary of the puzzle box itself. Donald Knuth's dancing links (DLX) algorithm,¹ in contrast, takes the different approach of maintaining dynamic image lists for each puzzle cell and for each puzzle shape, that are continuously pruned and restored during algorithm execution. At any recursive step, for each unfilled puzzle cell a perfect image list is available that includes all images that cover that cell, but only those images that still fit in the puzzle without conflict, and only images for pieces that are still available. Likewise, there is a perfect image list available for each remaining piece that includes all images of that piece that still fit in the puzzle without conflict. But for many problems the time to keep these lists updated is more than the time saved by not having to sift through images that are either no longer available or no longer fit.

Still, many other tiling problems, due to their curious geometries, are not efficiently solved with the simple Fletcher algorithm that always fills the puzzle from left-to-right. DLX's abstract data model easily supports any conceivable ordering heuristic, allowing it to better solve these problems. Further, this same abstract data model allows it to be used for a wide variety of problem domains that go beyond tiling problems in $\mathbb{Z}^n$ space. Recognizing the performance advantages of fixed (precalculated) image lists, I designed FILA to use them, but also to have the flexibility to be easily used with a variety of ordering heuristics.

In this article, I'll start by showing how Fletcher and de Bruijn sifted through the list of 63 pentomino images. I'll then detail FILA, and show how its use of cell-specific image lists generated with POF and NOF image filtering attempts to capture the best aspects of both approaches and improve upon them.

Fletcher's Image Sifting Technique

Fletcher's approach to checking for image fits is interesting. Instead of checking the 63 images for conflicts independently, he explores the cells around the target cell following a predefined tree structure. The distance from the root to each leaf of this tree is length 5. There are 63 leaves on the tree, each corresponding to one of the 63 pentomino orientations. If either a filled cell or a puzzle wall is encountered while following a branch, then exploration of that branch is terminated and all images subtending that branch are efficiently skipped. For example, the first cell of the tree that is checked is directly below the target cell. If that cell is filled, then 29 images from the list of 63 are skipped and the cell to the immediate right of the target cell is then tested to see if it is filled. Each time a leaf is reached, the pentomino corresponding to that leaf is checked for availability (it may have already been used). If available, the pentomino is placed and the algorithm recurses.

I've mapped the tree Fletcher designed to an animated player shown in Figure 2. The black area to the left represents cells that were found to be previously filled, and the red cell (shown at step 0) is the first empty cell found during the search for an empty cell. In the original 10x6 pentomino problem, the tree cannot possibly be fully traversed because either the top or bottom of the puzzle would interfere, so I've increased the vertical dimension of the puzzle area so that the entire search tree can be examined. Each time a leaf of the tree is reached, I display an image counter in the top-right area of the puzzle so you can more easily keep track of where you are in the image list.

Figure 2. Animation of Fletcher's tree-based search of the cells neighboring a target cell to determine which of the 63 different rotations and reflections of the 12 pentominoes can be placed.

The entire tree can be explored with only 90 memory accesses, but in practice, far fewer steps are typically needed for the 6x10 pentomino problem as either the puzzle boundary or previously placed pieces will interrupt many branch explorations. Still, there are some aspects of this approach that are undesirable. First, Fletcher doesn't check to see whether a piece is even available until a leaf of the search tree is reached. 83% of the pieces placed during algorithm execution on the 10x6 pentomino puzzle are for the last 4 pieces of the puzzle, so a significant amount of time is spent checking to see if pieces that are no longer available will fit. Second, note that despite the tree structure, the occupancy state of many cells is checked multiple times. For example, you can see from the movie player that the cell just to the right of the target cell is checked in steps 5, 9, 20, and 44. If that cell is filled, those four checks respectively eliminate 1, 4, 10, and 34 images. It would be nice if only one check of that cell were made and, if it were found to be occupied, all 49 of these conflicting images were eliminated at once. Unfortunately, due to the diversity of the pentomino shapes, there is no way to construct a simple exploration tree that avoids revisiting cells.

I'll make one other observation which is important to understanding the effectiveness of NOF filtering (discussed later). Because the puzzle is filled from left to right, cells further to the right of the root cell are decreasingly likely to be found occupied. Also, note that occupancies detected further from the root eliminate fewer images of the tree. So the cells nearest the root are among the most likely to be occupied, and also eliminate the most images when they are occupied.

De Bruijn's Image Sifting Technique

Like Fletcher, de Bruijn's software started by placing the X piece in one of 7 board positions. Then when trying to fill a targeted cell, de Bruijn simply linearly iterated over the remaining 62 images. But where Fletcher checked for piece availability last, de Bruijn made this check first. Here is an excerpt of his program that I attempted to translate to English:

        refillingAttempt:   if warehouse[pieceNum[slice]] = 0 then
                                 goto nextSlice;
                            for i := 1 step 1 until 4 do
                                if occupied[cell + relpos[slice, i]] = 1 then
                                    goto nextSlice;
                            warehouse[pieceNum[slice]] := 0;
                            occupied[cell] := 1;
                            for i := 1 step 1 until 4 do
                                occupied[cell + relpos[slice, i]] := 1;

                             .
                             .  //  recurse, or produce solution if this was the last piece
                             .

        nextSlice:           slice := slice + 1;
                             if slice <= 63 then
                                 goto refillingAttempt;

As you can see, he kept the definitions for all 63 images in a two-dimensional array: relpos[slice, i], where slice = 1 to 63 was what I'd call an image number, and i = 1 to 4 identifies the four cells of the pentomino shape (other than the cell occupying the target cell, which is known to be open). The value of each array entry is an integer that specifies the relative position of a constituent cell of the image (slice). This number could be added to the integer cell location of the target to give the integer cell location of the $i$-th cell of the slice. He also had an array pieceNum defined so that pieceNum[slice] mapped the image number slice back to its prototype polyomino number (1 to 12). He had a boolean array called warehouse which tracked the availability of the 12 polyominoes. There was also a boolean array called occupied which tracked whether each puzzle cell was occupied or not.
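To make the control flow of the excerpt concrete, here is a minimal Python sketch of the same linear sift. It is not de Bruijn's program: the function name `fitting_slices`, the 0-based indexing, and the toy data (a one-dimensional strip with a domino and a straight tromino) are assumptions made purely for illustration. Only the order of the two tests, availability first and occupancy second, is taken from the excerpt.

```python
def fitting_slices(cell, relpos, piece_num, warehouse, occupied):
    """Return the slice (image) numbers that can be placed at `cell`."""
    fits = []
    for slice_no, offsets in enumerate(relpos):
        # Availability first: skip every image of a piece already used,
        # without touching the board at all.
        if not warehouse[piece_num[slice_no]]:
            continue
        # Only then test the remaining cells of the image for conflicts.
        if any(occupied[cell + d] for d in offsets):
            continue
        fits.append(slice_no)
    return fits


# Toy data (invented): three open cells followed by two wall cells.
relpos = [[1], [1, 2]]      # slice 0: domino, slice 1: straight tromino
piece_num = [0, 1]          # slice number -> piece number
occupied = [0, 0, 0, 1, 1]  # wall cells are simply marked occupied

print(fitting_slices(0, relpos, piece_num, [True, True], occupied))   # [0, 1]
print(fitting_slices(1, relpos, piece_num, [True, True], occupied))   # [0]
print(fitting_slices(0, relpos, piece_num, [False, True], occupied))  # [1]
```

Marking wall cells as occupied is one simple way to let the same test reject images that leave the board; whether de Bruijn did exactly this is not something the excerpt shows.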

This linear iteration over 62 images seems far less efficient than Fletcher's tree-like search over the grid space, but this approach does have the one advantage of not checking cell occupancy states for images of puzzle pieces that have already been placed. Because most of the search is done when only a few pieces remain, most images in the list of 63 are skipped without checking board availability at all. Running the algorithm on the 6x10 pentomino problem, I found that on average only 3.65 pieces must be considered at each recursive step. Because each piece has on average 63 / 12 = 5.25 unique images, at each recursive step the algorithm only checks cell availability for about 3.65 x 5.25 = 19.2 images.

The original motivation for my design of FILA was to try to somehow take advantage of Fletcher's approach of quickly eliminating many potential images from an image list due to a single puzzle cell being occupied, while simultaneously somehow quickly skipping over images for pieces that are no longer available (in the spirit of de Bruijn's algorithm).

Fixed Image List Algorithm (FILA)

We'll start by looking at the pseudo code for the main backtrack processing of FILA, which will reveal its recursive nature and the abstract interface to the ordering heuristic. The ordering heuristic is where the new and interesting stuff happens, and is explained over a few sections wherein the workings of NOF and POF are explained.

solveFila

Assume you have a puzzle with $P$ polycube pieces that are to be used to fill some puzzle region $R$. We will not require that each piece have a unique shape, so let $\mathbb{Q}$ be the set of shapes unique under rotation from which the $P$ pieces are chosen. $\mathbb{Q}$ is a minimal set in that every shape in $\mathbb{Q}$ must be used to form at least one of the $P$ pieces. Let $Q$ be the number of unique shapes: $Q = \vert \mathbb{Q} \vert$. Identify each shape in $\mathbb{Q}$ with a number $s = 1, 2, 3, \ldots, Q$. The algorithm solveFila maintains an ordered set $S$ (e.g., an array) holding these shape numbers. Initialize $S$ by arbitrarily loading these numbers in order: $S_1 = 1, S_2 = 2, \ldots S_Q = Q$. Define $N_s$ to be the number of pieces having shape $s$. Initially $N_s > 0$ for all $s$, but each time a piece of shape $s$ is placed, the value of $N_s$ will be decremented, and when a piece of shape $s$ is removed, the value of $N_s$ will be incremented. So during algorithm execution, $N_s$ represents the number of pieces of shape $s$ that have yet to be placed. Let $V$ (volume) be the number of cells in the puzzle region $R$ that must be filled, and denote the cells themselves $c_0, c_1, c_2, \ldots c_{V-1}$. Although it's probably inappropriate for pseudo code, we'll assume that the occupancy state of the puzzle is modeled as a bitfield $o$, where bit $v$ of $o$ is one if and only if $c_v$ is occupied. The list $O$ is used as a stack of images currently placed in the puzzle and is used only for producing output when solutions are found.

solveFila invokes the function selectFila which returns a set $I$ of lists of (references to) images to be attempted to be placed in the puzzle. There is one image list $I_s \in I$ for each shape $s$. In general all lists in $I$ could contain images, but only the images in lists $I_s$ for which at least one piece of shape $s$ remains to be placed should be attempted to be placed in the puzzle. All image list sets are precalculated, but many such sets exist. The process by which an image list set is chosen by selectFila, and the exact content of each set are detailed in subsequent sections. Each image $i$ in $I_s$ has a layout field $L[i]$ which is itself a bitfield. Bit $v$ of $L[i]$ is set if and only if image $i$ occupies cell $c_v$.

solveFila takes three arguments: $p$ is the number of remaining puzzle pieces; $q$ is the number of remaining shapes; and $o$ is the current occupancy state of the puzzle region $R$. So to start things off, you invoke solveFila with parameters $p=P$, $q=Q$, and $o = 0$. Below, I use the notation $x \land y$ to represent the bit-wise and of bit fields $x$ and $y$, and $x \lor y$ to represent the bit-wise or of $x$ and $y$.

 1. solveFila$(p, q, o)$
 2.     If $p = 0$ process the solution and return.
 3.     Set $I \leftarrow$ selectFila$(p, q, o)$.
 4.     For each $j \leftarrow 1, 2, 3, \ldots q$,
 5.        set $s \leftarrow S_j$;
 6.        set $N_s \leftarrow N_s - 1$;
 7.        if $N_s = 0$,
 8.            swap$(S_j, S_q)$,
 9.            set $q \leftarrow q - 1$;
10.        for each $i$ in $I_s$,
11.            if $o \land L[i] = 0$,
12.                set $O_p \leftarrow i$;
13.                solveFila$(p-1, q, o \lor L[i])$;
14.        if $N_s = 0$,
15.            set $q \leftarrow q + 1$;
16.            swap$(S_j, S_q)$;
17.        set $N_s \leftarrow N_s + 1$.

So unlike Fletcher and de Bruijn's algorithms, FILA keeps track of which shapes still have unused pieces, and only considers placing images of those shapes. This information is maintained by lines 4-9 of the algorithm, and perhaps deserves some explanation. Line 4 iterates over the numbers $j$ from 1 to the number of remaining shapes $q$. Note that $j$ is not a shape number, but just a sequence number. The numbers of the available shapes are stored in the ordered list $S$ which serves as a warehouse of available shape numbers. The available shape numbers are kept in the first $q$ positions of $S$, so $s = S_j$ is an available shape number for all iterated $j$ values. While shape $s$ is under consideration (starting at line 5), the number of copies of that shape, $N_s$, is decremented (line 6). If that counter hits zero (line 7), then no more copies of that shape are available. In that case, the values of $S_j$ and $S_q$ are swapped (line 8) so that shape number $s$ is listed as the last available shape in $S$. Then the number of available shapes $q$ is decremented (line 9), so that subsequent recursive calls to solveFila will no longer see shape $s$ in the now smaller window into the warehouse $S$. After all images of shape $s$ have been tested for fit, and a recursive search for solutions has been performed for each image that does fit (lines 10-13), the shape bookkeeping operations (performed in lines 6-9) are undone (lines 14-17) to restore shape-related data to its previous state, and the next sequence number $j$ is processed (starting again at line 4).

Puzzle boundary filtering, and POF and NOF filtering (explained in the next sections), can be so effective that it is not uncommon for an image list $I_s$ to be empty. For this reason, a small overall performance benefit can be had by inserting a check immediately after line 5 to see if $I_s$ is empty, and if so, skip immediately to the next $j$ value, bypassing the shape bookkeeping updates, the pointless loop over the empty image list, and the subsequent undo of the shape bookkeeping.
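The pseudo code for solveFila translates almost line for line into a runnable form. The following is a minimal Python sketch under stated assumptions, not the solver's implementation: the function name `solve_all`, the 0-based indexing of $S$, and the stand-in selection step (which simply picks the lowest open cell and returns every bounded image covering it, with no POF or NOF filtering) are all illustrative choices. It also assumes the total volume of the pieces equals $V$, so an open cell always exists while pieces remain.

```python
def solve_all(V, layouts, counts):
    """Enumerate all tilings of V cells.

    layouts[s] -- layout bitmasks (L[i]) of the bounded images of shape s
    counts[s]  -- number of pieces having shape s
    """
    Q = len(layouts)
    P = sum(counts)
    S = list(range(Q))     # warehouse: the first q entries are available shapes
    N = list(counts)       # N[s]: copies of shape s not yet placed
    O = [None] * (P + 1)   # stack of placed images, indexed by p
    # Stand-in for the precalculated image list sets: one set per cell.
    lists = [[[m for m in layouts[s] if (m >> v) & 1] for s in range(Q)]
             for v in range(V)]
    solutions = []

    def select(p, q, o):
        holes = ~o & ((1 << V) - 1)
        v = (holes & -holes).bit_length() - 1   # lowest open cell
        return lists[v]

    def solve(p, q, o):
        if p == 0:
            solutions.append(O[1:])             # copy the placed images
            return
        I = select(p, q, o)
        for j in range(q):                      # range fixed before q changes
            s = S[j]
            if not I[s]:
                continue                        # optional empty-list shortcut
            N[s] -= 1
            exhausted = N[s] == 0
            if exhausted:                       # hide shape s from recursion
                S[j], S[q - 1] = S[q - 1], S[j]
                q -= 1
            for i in I[s]:
                if (o & i) == 0:
                    O[p] = i
                    solve(p - 1, q, o | i)
            if exhausted:                       # undo the bookkeeping
                q += 1
                S[j], S[q - 1] = S[q - 1], S[j]
            N[s] += 1

    solve(P, Q, 0)
    return solutions


# 2x2 board, cells numbered v = 2*x + y; one domino and two monominoes.
domino = [0b0011, 0b1100, 0b0101, 0b1010]
monomino = [0b0001, 0b0010, 0b0100, 0b1000]
tilings = solve_all(4, [domino, monomino], [1, 2])
print(len(tilings))   # 4: one tiling per domino placement
```

Because the target cell is fixed at every step, each tiling is reached by exactly one path through the search tree, so identical pieces (the two monominoes here) do not produce duplicate solutions.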

selectFila

The function selectFila returns the set of image lists $I$ that should be tried by solveFila. The implementation will vary depending on the desired behavior of the ordering heuristic. I will give here three example implementations.

selectFila for Fixed-Order Heuristics

The first works well for any fixed-order ordering heuristic, wherein the heuristic keeps an array, $C$, of (references to) all the puzzle cells in a particular (fixed) order, and always picks the first unoccupied cell from this list as the fill target. Through an appropriate ordering of the cells in $C$, any fixed order heuristic can be realized. For example, by ordering the cells so as to minimize coordinates in $x$, $y$, $z$ priority order, Fletcher's left-to-right heuristic is produced. By sorting cells to maximize the quantity $x^2+y^2+z^2$, cells are filled radially from the outside towards the puzzle center. Assume the index into $C$ is zero-based: $C_0$, $C_1$, $\ldots$ $C_{V-1}$. Let each cell $c_v$ have a bit field $B[c_v]$ with only bit $v$ set, so that $c_v$ is occupied if and only if $B[c_v] \land o \ne 0$. As a performance optimization, this implementation of selectFila maintains a stack $M$, of the indices into $C$ of previously selected cells so that subsequent calls to selectFila don't have to start searching from the beginning of $C$ for the next unoccupied cell. $M_{P+1}$ is initialized to -1 to ensure the first invocation of selectFila starts its search for an empty cell at position $0$ in $C$.

selectFila is invoked with the same three arguments as solveFila: the number of remaining pieces $p$, the number of remaining shapes $q$, and the current puzzle occupancy state $o$. The function getImageListSet returns the appropriate set of image lists $I$ for the selected fill target $C_m$ and is detailed in the next section.

selectFila$(p, q, o)$
{
    Set $m \leftarrow M_{p+1} + 1$.
    While $o \land B[C_m] \ne 0$,
        set $m \leftarrow m + 1$.
    Set $M_p \leftarrow m$.
    Return getImageListSet$(C_m, o)$;
}
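As a rough illustration of how the stack $M$ lets each call resume the scan where its parent left off, here is a small Python sketch. The factory function `make_select_fixed_order`, the representation of $C$ as a list of cell numbers (so that $B[C_m]$ becomes a shift of $o$), and the stub passed in place of getImageListSet are assumptions for the example only. Like the pseudo code, it assumes at least one open cell remains.

```python
def make_select_fixed_order(C, P, get_image_list_set):
    """Build a selectFila for the fixed cell order C (a list of cell numbers)."""
    M = [0] * (P + 2)
    M[P + 1] = -1          # so the first call starts scanning at C[0]

    def select(p, q, o):
        m = M[p + 1] + 1   # resume just past the parent's target
        while (o >> C[m]) & 1:
            m += 1
        M[p] = m           # remember our own target for child calls
        return get_image_list_set(C[m], o)

    return select


# Five cells in a row, filled from both ends toward the middle.
C = [0, 4, 1, 3, 2]
select = make_select_fixed_order(C, P=3, get_image_list_set=lambda c, o: c)

print(select(3, 0, 0b00000))   # 0: the first cell in C is open
print(select(2, 0, 0b10001))   # 1: cells 0 and 4 are filled
print(select(1, 0, 0b11011))   # 2: only the middle cell is left
print(select(2, 0, 0b00001))   # 4: after backtracking, the scan restarts at C[1]
```

The last call shows why $M$ is indexed by the number of remaining pieces: a call at depth $p$ always reads the entry written by its parent at depth $p+1$, so backtracking needs no explicit cleanup.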

selectFila for F Heuristic

For our second example, first recall that the puzzle cells $c$ are themselves numbered, $c_0$, $c_1$, $\ldots$, $c_{V-1}$. This numbering defines their bit position in the occupancy state variable $o$. If this ordering happens to be that of a desirable fixed order heuristic, you can use that natural order directly with no need for the list $C$. In my solver, I number my cells according to their numerical coordinate positions with the $x$ coordinate taking precedence over $y$, and $y$ taking precedence over $z$. But this ordering is exactly the left-to-right fill order used by Fletcher and de Bruijn's algorithms. I call this heuristic that just picks the first open cell the "F" heuristic. (Or you can think about the F standing for Fletcher if you want.) The F heuristic in my solver overrides the default selectFila implementation used by all other fixed-order heuristics with a simpler (and faster) implementation that takes advantage of the natural cell ordering. Abstractly, it looks like this:

selectFila$(p, q, o)$
{
    Set $v \leftarrow $lowestSetBit$(\lnot o)$.
    Return getImageListSet$(c_v, o)$.
}

The operation $\lnot o$ is the binary negation of $o$ (to produce the bitfield representing the holes in the puzzle), and lowestSetBit$(x)$ returns the number of the lowest bit in $x$ that's set (an operation most modern processors implement in silicon).
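In a language with arbitrary-precision integers the negation needs a little care. The Python sketch below is one way to write the two operations; the helper names `lowest_set_bit` and `first_open_cell` and the explicit mask to $V$ bits are assumptions of this example (Python's `~o` would otherwise set infinitely many high bits).

```python
def lowest_set_bit(x):
    """Number of the lowest set bit of x, or -1 if x is zero."""
    return (x & -x).bit_length() - 1


def first_open_cell(o, V):
    """Target cell of the F heuristic for occupancy state o over V cells."""
    holes = ~o & ((1 << V) - 1)   # keep only the V bits that are real cells
    return lowest_set_bit(holes)


print(lowest_set_bit(0b101000))       # 3
print(first_open_cell(0b001111, 6))   # 4
print(first_open_cell(0, 6))          # 0
```

The expression `x & -x` isolates the lowest set bit using two's-complement arithmetic, which Python integers emulate for negative values.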

selectFila for E Heuristic

As a third example, consider a heuristic that picks a cell estimated to be hardest to fill by first identifying all cells that have a minimum number of open neighbor cells, and then picking the cell among that set at which a minimum number of images fit (by explicitly counting the number of fits). So it acts sort of like a poor man's S heuristic most often used by DLX. Give each cell $c$ an additional field $N[c]$ which is a bit field with up to six bits set that identify the occupancy bits of the adjacent neighbors of $c$ in the six ordinal directions: +x, +y, +z, -x, -y, and -z. If one or more of these six neighbors are non-existent (because $c$ is at the perimeter of the puzzle, and/or because the puzzle is only two-dimensional), then $N[c]$ will have fewer than six bits set. Then the number of open neighbor cells of $c$ may be found by counting the number of bits set in the quantity $N[c] \land \lnot o$. The algorithm below starts by loading the cells with a minimum number of open neighbors into the set $C$, then iterating over all cells in $C$ and using a fit counting helper function to find a cell for which a minimum number of images fit.

selectFila$(p, q, o)$
    Set $h \leftarrow \lnot o$.
    Set $n_{min} \leftarrow \infty$.
    Set $C \leftarrow \emptyset$.
    For each bit number $v$ set in $h$,
        set $n \leftarrow $ countBits$(N[c_v] \land h)$;
        if $n \le n_{min}$,
            if $n < n_{min}$,
                set $n_{min} \leftarrow n$;
                set $C \leftarrow \emptyset$;
            add $c_v \rightarrow C$.
    Set $f_{min} \leftarrow \infty$.
    For each $c$ in $C$,
        set $I \leftarrow $getImageListSet$(c, o)$;
        set $f \leftarrow $countFits$(I, q, o, f_{min})$;
        if $f < f_{min}$,
            set $f_{min} \leftarrow f$,
            set $I_{min} \leftarrow I$.
    Return $I_{min}$.

countFits$(I, q, o, f_{max})$
    Set $f \leftarrow 0$.
    For each $j \leftarrow 1, 2, 3, \ldots q$,
        set $s \leftarrow S_j$;
        for each $i$ in $I_s$,
            if $o \land L[i] = 0$,
                set $f \leftarrow f + 1$;
                if $f \ge f_{max}$,
                    return $f$.
    Return $f$.

countBits$(x)$ returns the number of bits set in bit field $x$ (another operation that most modern processors implement in silicon). Also note that countFits only counts image fits up to a supplied maximum. (Since we are only looking for the cell with the minimum fits, counting beyond the minimum found so far is unnecessary.)
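The two-stage selection can be sketched in Python as follows. This is an illustrative rendering of the pseudo code, not the solver's code: the function names, the explicit passing of $V$, $N$, and $S$ as arguments, and the toy one-dimensional puzzle used to exercise it are all assumptions. Images are represented directly by their layout bitmasks.

```python
def count_bits(x):
    return bin(x).count("1")


def count_fits(I, S, q, o, f_max):
    """Count fitting images of available shapes, stopping early at f_max."""
    f = 0
    for j in range(q):
        for layout in I[S[j]]:
            if (o & layout) == 0:
                f += 1
                if f >= f_max:
                    return f
    return f


def select_e(q, o, V, N, S, get_image_list_set):
    h = ~o & ((1 << V) - 1)                # bit field of holes
    n_min, C = float("inf"), []
    for v in range(V):                     # stage 1: fewest open neighbors
        if (h >> v) & 1:
            n = count_bits(N[v] & h)
            if n < n_min:
                n_min, C = n, []
            if n == n_min:
                C.append(v)
    f_min, I_min = float("inf"), None
    for c in C:                            # stage 2: fewest fitting images
        I = get_image_list_set(c, o)
        f = count_fits(I, S, q, o, f_min)
        if f < f_min:
            f_min, I_min = f, I
    return I_min


# Toy puzzle: four cells in a row, one shape (a domino).
V = 4
N = [0b0010, 0b0101, 0b1010, 0b0100]       # neighbor masks per cell
dominoes = [0b0011, 0b0110, 0b1100]
lists = [[[m for m in dominoes if (m >> v) & 1]] for v in range(V)]
get = lambda c, o: lists[c]

print(select_e(1, 0b0000, V, N, [0], get))  # [[3]]: end cell 0, one fit
print(select_e(1, 0b0100, V, N, [0], get))  # [[12]]: isolated cell 3, no fit
```

In the second call the returned list contains only an image that conflicts with the occupied cell, so a solver using it would find nothing to place and backtrack at once, which is exactly the early detection of a dead end that this heuristic is after.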

getImageListSet

The function getImageListSet(c, o) returns an image list set $I$ (i.e., a set of lists of images). List $I_s$ in $I$ is a list of all images of shape $s$ that cover $c$ with the following two restrictions:

1. No image in $I_s$ will conflict with an occupied adjacent neighbor cell of $c$ (where an adjacent neighbor is any of the neighbors in the six ordinal directions from $c$ that share a common side with $c$).
2. If the ordering heuristic follows a fixed cell selection order (according to some predetermined prioritization among the puzzle cells), then no image in $I_s$ will conflict with any puzzle cell which must previously have been filled due to this prioritized selection order.

The first property is guaranteed by NOF. The second is guaranteed by POF.

Each ordering heuristic holds a two dimensional array, $A$, of image list sets. Each entry $A_{c,z}$ is an image list set composed specifically for cell $c$ and for the occupancy state of adjacent neighbors encoded in the index variable $z$. The number $z$ is called an image list set index (ILSI). Without explaining how an ILSI is calculated, getImageListSet looks (roughly) like this:

getImageListSet$(c, o)$
    Set $z \leftarrow$ getIlsi$(c, o)$.
    Return $A_{c,z}$.

So you simply calculate an ILSI $z$ for cell $c$, and then return the $z$-th image list set for cell $c$ from the matrix $A$. I'm glossing over one detail here: each cell can use a different (optimized) getIlsi function. An updated (real) version of getImageListSet is given below after I've explained how the ILSI $z$ is calculated and by implication, how the associated image list set is defined.

Neighbor Occupancy Filtering (NOF) and Image List Set Indices (ILSI)

As a group the six adjacent neighbors of some cell $c$ can take on $2^6 = 64$ different compound states, and each of these states will have associated with it a different image list set. (Note that some neighbor occupancy states for some cells cannot be entered, and the associated image list sets for these states need not be populated.) We want to extract from the puzzle occupancy state $o$ just the six bits that represent the occupancy state of the six adjacent neighbors of $c$. Then we'll take those bits and repack them into a new bit field that is just 6 bits long. This six-bit bit field is our ILSI.

The ILSI is constructed as follows. The highest order bit of an ILSI (bit 5) is always loaded with the bit representing the occupancy state of the adjacent neighbor in the $-x$ direction relative to cell $c$. Similarly bits 4, 3, 2, 1, and 0 respectively are loaded with the bit representing the occupancy state of the adjacent neighbors in the $-y$, $-z$, $+x$, $+y$, and $+z$ directions. These assignments of neighbors to bit-positions in the ILSI are arbitrary, but must be consistent. If one or more of the six neighbors of $c$ are outside the puzzle bounds (and therefore are not represented by any bit in $o$), then a 1 is loaded into the corresponding bits in the ILSI (so that a zero consistently identifies an open neighbor).

This resulting six-bit bit field is then interpreted as an integer between 0 and 63, which is in turn used as the second index into the matrix $A$ to retrieve an image list set that contains all puzzle images that fill cell $c$, but do not conflict with the occupied neighbors of $c$ and (if POF filtering is possible) do not conflict with cells that must have been filled prior to the selection of $c$ as a target. This process of identifying the occupancy states of neighbors and then returning an image list set from which all images that conflict with occupied neighbors have been filtered is what I mean by NOF.

Figure 3 graphically depicts this process for a 10x6 pentomino puzzle that is in the process of being solved using the F heuristic. The cells are numbered from 0 to 59. Each cell number identifies the position of the bit in the occupancy state $o$ that indicates whether the cell is filled. Knowing that the F heuristic picks open cells in order, we know that cell 0 was targeted first, then cell 3, then cell 6, and then cell 10. The next hole is cell 14, which is our current target. The adjacent neighbors of cell 14 are cells 8, 13, 15 and 20. These bits are extracted from $o$, and loaded into their pre-assigned bit positions in the ILSI. Since this is a two-dimensional puzzle, the bits in the ILSI corresponding to the neighbors in the $-z$ and $+z$ directions are each loaded with a 1. The resulting ILSI bit field has the value $111101$, which has a decimal value of 61, so the 61st image list set for cell 14 will be returned.

Bit Numbers	59-54	53-48	47-42	41-36	35-30	29-24	23-18	17-12	11-6	5-0
Occupancy State	000000	000000	000000	000000	000000	000000	100111	110011	111111	111111

Bit Number	5	4	3	2	1	0
Neighbor Direction	$-x$	$-y$	$-z$	$+x$	$+y$	$+z$
Neighbor Location	$8$	$13$	$-$	$20$	$15$	$-$
ILSI	1	1	1	1	0	1

Figure 3. Given target cell 14 for the shown partial pentomino puzzle assembly, occupancy bits corresponding to the neighbors of cell 14 are loaded into a 6 bit field to produce an ILSI with value 61.

To do this algorithmically, we'll start by defining some additional fields for each cell $c$. Recall that $B[c]$ is a bit field with a single bit set in the same position as $c$'s occupancy bit in $o$. Define $N_{x^-}[c]$ to be the bit field $B$ of the neighbor adjacent to $c$ in the $-x$ direction, or a zeroed bit field if no such neighbor exists. Similarly define $N_{y^-}[c]$, $N_{z^-}[c]$, $N_{x^+}[c]$, $N_{y^+}[c]$, and $N_{z^+}[c]$ to be the $B$ field of the adjacent neighbors of $c$ in the $-y$, $-z$, $+x$, $+y$, and $+z$ directions respectively. With these definitions, we can now write getIlsi(c, o). Although there are far more concise ways to write getIlsi, I have found an obnoxious set of nested if statements six levels deep with hard-coded integer return values to be far faster than several other approaches I've tried. Here is a portion of one way to implement getIlsi that is quite fast:

getIlsi$(c, o)$
    Set $h \leftarrow \lnot o$.
    If $N_{x^-}[c] \land h \ne 0$,
        if $N_{y^-}[c] \land h \ne 0$,
            if $N_{z^-}[c] \land h \ne 0$,
                if $N_{x^+}[c] \land h \ne 0$,
                    if $N_{y^+}[c] \land h \ne 0$,
                        if $N_{z^+}[c] \land h \ne 0$,
                            return 0;
                        else
                            return 1;
                    else
                        if $N_{z^+}[c] \land h \ne 0$,
                            return 2;
                        else
                            return 3;
                else
                    if $N_{y^+}[c] \land h \ne 0$,
                        if $N_{z^+}[c] \land h \ne 0$,
                            return 4;
                        else
                            return 5;
                    else
                        if $N_{z^+}[c] \land h \ne 0$,
                            return 6;
                        else
                            return 7;
            else

                $\ldots$

                        else
                            return 63.

Notice that the first thing I do in this version of getIlsi is to negate the occupancy state to produce a new bit field $h$ that contains a 1 for each hole in the puzzle. This is done because an operation like $N_{z^+}[c] \land o$ will produce a zero result if the neighbor is either empty or non-existent. This behavior is not well suited for producing an ILSI since we want bit positions corresponding to non-existent neighbors to be loaded with a 1 and bit positions corresponding to empty neighbors to be loaded with a 0. To avoid this ambiguity, getIlsi(c, o) instead works with the puzzle holes $h$. Then $N_{z^+}[c] \land h$ will produce a non-zero result if and only if the neighbor exists and is unoccupied.
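To see the ambiguity concretely, here is a small Python sketch that builds the $N$ fields for the 10x6 layout and compares a hole-based getIlsi against a naive one that tests $o$ directly. The loop form is one of the "more concise" variants, not the nested-if version above, and the layout (cell = 6x + y) and all names are assumptions made for illustration.

```python
WIDTH, HEIGHT = 10, 6          # assumed 2D layout, cell = 6 * x + y
CELLS = WIDTH * HEIGHT
FULL = (1 << CELLS) - 1        # emulates a fixed-width NOT on Python ints


def neighbor_masks(c):
    """N fields of cell c in ILSI bit order (-x, -y, -z, +x, +y, +z).
    Each entry is the B field of the neighbor, or 0 if there is none."""
    x, y = divmod(c, HEIGHT)
    return [
        1 << (c - HEIGHT) if x > 0 else 0,
        1 << (c - 1) if y > 0 else 0,
        0,
        1 << (c + HEIGHT) if x < WIDTH - 1 else 0,
        1 << (c + 1) if y < HEIGHT - 1 else 0,
        0,
    ]


def get_ilsi(c, o):
    """Hole-based version: a bit is 0 only if the neighbor exists and is open."""
    h = ~o & FULL
    z = 0
    for n in neighbor_masks(c):
        z = (z << 1) | (0 if n & h else 1)
    return z


def get_ilsi_naive(c, o):
    """Tests N & o directly, so missing neighbors wrongly read as open."""
    z = 0
    for n in neighbor_masks(c):
        z = (z << 1) | (1 if n & o else 0)
    return z


# Corner cell 0 of an empty puzzle: four of its six neighbors do not exist.
print(get_ilsi(0, 0), get_ilsi_naive(0, 0))   # 57 0
```

The naive version returns 0 for the corner cell, which would select an image list set that assumes all six neighbors are open.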

So the above implementation of getIlsi works fine, but the approach raises the question, “Why are we even checking the occupancy states of neighbors that don't exist?” For fixed-order heuristics, there will also be neighbors that are guaranteed to be occupied. Checking the occupancy states of those neighbors is equally wasteful.

In for a penny, in for a pound! My solver actually has not 1, but 64 different getIlsi methods (which I wrote with a little code generator I hacked out). They are named getIlsi00, getIlsi01, $\ldots$, getIlsi63. The number in the name of each function, when interpreted as an ILSI, conveys the ILSI bits that the function assumes are known to be set. The function does not check the occupancy of the neighbors that correspond to those bit positions and simply returns an ILSI with those same bits always set. These functions can be accessed through an array $G$ indexed by ILSI, so that, for example, $G_{34} =$ getIlsi34. Then, as part of initialization, each heuristic, for each puzzle cell $c$, determines the ILSI bits which must be set for the cell and composes an ILSI mask $m_c$ with these bits set and all other bits clear. Then a list $Z$ is defined for the heuristic which associates each cell with its appropriate getIlsiXX method: $Z_c = G_{m_c}$. We can now update our previous implementation of getImageListSet to use an optimized getIlsi method:

getImageListSet$(c, o)$
    Set $z \leftarrow Z_c(c, o)$.
    return $A_{c, z}$

So instead of invoking a general-purpose getIlsi method, the particular getIlsiXX method best suited for cell $c$ (referenced through the list entry $Z_c$) is invoked.

For example, for the problem depicted in Figure 3, cell 0 is associated with getIlsi57 since only the two neighbors in the +x and +y directions (which correspond to bits 2 and 1 of the ILSI, respectively) can possibly be open. That method's implementation looks like this:

getIlsi57$(c, o)$
    If $N_{x^+}[c] \land o \ne 0$,
        if $N_{y^+}[c] \land o \ne 0$,
            return 63;
        else
            return 61;
    else
        if $N_{y^+}[c] \land o \ne 0$,
            return 59;
        else
            return 57.

Because these specialized functions never even look at non-existent neighbors, there's no need to negate the occupancy state as was done for the general-purpose getIlsi. Since the F heuristic is being used to solve that puzzle (which guarantees that cells are targeted in their numbered order), it is always the case that cells to the left and below the target are filled. So all cells in this puzzle except those on the top row and the right-most column would be associated with getIlsi57.

Cells 5, 11, 17, …, 53, which can only possibly have an open neighbor in the +x direction, are bound to getIlsi59:

getIlsi59$(c, o)$
    If $N_{x^+}[c] \land o \ne 0$,
            return 63;
    else
            return 59.

Cells 54, 55, 56, 57 and 58 can only have an open neighbor in the +y direction and are bound to getIlsi61:

getIlsi61$(c, o)$
    If $N_{y^+}[c] \land o \ne 0$,
            return 63;
    else
            return 61.

And cell 59, which can have no open neighbors, is bound to getIlsi63:

getIlsi63$(c, o)$
    Return 63.
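The binding of cells to specialized functions can be sketched in Python as follows. This is an illustration only: closures stand in for the 64 generated functions (so none of the speed benefit carries over), the layout cell = 6x + y is assumed, and names such as make_get_ilsi and first_hole_mask are hypothetical, not taken from polycube.

```python
WIDTH, HEIGHT = 10, 6                 # assumed layout: cell = 6 * x + y
CELLS = WIDTH * HEIGHT


def neighbor_masks(c):
    """N fields of cell c in ILSI bit order (-x, -y, -z, +x, +y, +z);
    0 stands for a neighbor that does not exist."""
    x, y = divmod(c, HEIGHT)
    return [
        1 << (c - HEIGHT) if x > 0 else 0,
        1 << (c - 1) if y > 0 else 0,
        0,
        1 << (c + HEIGHT) if x < WIDTH - 1 else 0,
        1 << (c + 1) if y < HEIGHT - 1 else 0,
        0,
    ]


N = [neighbor_masks(c) for c in range(CELLS)]


def make_get_ilsi(mask):
    """Build getIlsiXX for XX = mask. Bits set in mask are taken as known
    and are never tested; only the remaining neighbors are checked."""
    to_check = [i for i in range(6) if not (mask >> (5 - i)) & 1]

    def get_ilsi_xx(c, o):
        z = mask
        for i in to_check:
            if N[c][i] & o:            # neighbor exists and is occupied
                z |= 1 << (5 - i)
        return z

    return get_ilsi_xx


G = [make_get_ilsi(m) for m in range(64)]


def first_hole_mask(c):
    """ILSI mask m_c for a lowest-numbered-hole order: a bit is known to be
    set when the neighbor is missing or has a smaller cell number than c."""
    m = 0
    for n in N[c]:
        known = n == 0 or n < (1 << c)
        m = (m << 1) | int(known)
    return m


Z = [G[first_hole_mask(c)] for c in range(CELLS)]

filled = list(range(14)) + list(range(16, 21)) + [23]    # Figure 3 state
o = sum(1 << cell for cell in filled)
print([first_hole_mask(c) for c in (0, 14, 5, 54, 59)])  # [57, 57, 59, 61, 63]
print(Z[14](14, o))                                      # 61
```

With an array A of image list sets in hand, getImageListSet(c, o) would then reduce to A[c][Z[c](c, o)].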

I know this all seems daft (see my web site name), but in my testing, these specialized getIlsiXX methods improved overall run times for some 2D puzzles by about 10% compared to a getIlsi method written in a single line that just checks all 6 neighbors and sums (or binary or's) the corresponding bit values.

So that pretty much completes the algorithm description, but I still haven't detailed exactly what I mean by priority occupancy filtering (POF). POF doesn't affect the solver algorithm at all, but it does affect how the image list sets are composed as explained in the next section.

Priority Occupancy Filtering (POF)

Consider again the partially solved puzzle shown in Figure 3. Recall that the ILSI for target cell 14 is 61. So image list set $A_{14,61}$ would be returned from getImageListSet(c, o). Exactly what images are in that set? If only NOF is applied, the answer is any bounded image that covers cell 14 but does not cover cells 8, 13, or 20 as shown in Figure 4. For ordering heuristics that target cells in an unpredictable order (like heuristics e and s in my solver), this is a complete definition of $A$.

Figure 4. For ILSI 61 of cell 14 (as per the assembly of Figure 3), NOF ensures that images loaded to image list set $A_{14,61}$ do cover cell 14, but do not cover the blackened cells 8, 13, or 20.

But because in this example we're using the F heuristic (which always targets cells in their numbered order), we also know that in order for cell 14 to be targeted, all cells with a smaller number must also be occupied as shown in Figure 5:

Figure 5. If a fixed order heuristic ensures that the blackened cells (0-13) are filled before cell 14 is targeted, then POF ensures images loaded to all image list sets $A_{14,z}$ do cover cell 14, but do not cover the blackened cells.

So we can also filter from list $A_{14,61}$ all images that conflict with the black cells above. This is priority occupancy filtering: excluding from all lists $A_{c,z}$ (for all $0 \le z \le 63$) any image that conflicts with cells that must have been filled before $c$ is selected as a target by a fixed-order heuristic. Understand that in this example, the cells that must be previously filled are those cells with a lower number than the target, but that's only because this example uses the F heuristic, which targets the lowest numbered hole. In general the order in which cells are filled by a fixed-order heuristic can vary, but POF will work with any fixed-order heuristic to eliminate all images that conflict with any set of cells that must have been previously filled by that heuristic.

Combining the occupancies in Figures 4 and 5, produces the occupancy map shown in Figure 6:

Figure 6. Combining the constraints imposed by POF and NOF filtering depicted in Figures 4 and 5 yields this combined occupancy map for target cell 14 with ILSI 61. POF and NOF filtering ensure that all images loaded to $A_{14,61}$ do cover cell 14 but do not cover any blackened cell.

And so through the combined application of NOF and POF filtering, image list set $A_{14,61}$ is loaded only with those images that cover cell 14 but avoid all of the black cells in Figure 6.
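As a rough illustration of how NOF and POF combine when a list is composed, the following Python sketch decides whether a given image belongs in $A_{c,z}$. It is not the loadImages() code referenced below: it assumes the layout cell = 6x + y, represents an image as a bit mask of the cells it covers, and hard-codes the lowest-numbered-hole order for POF. All names are hypothetical.

```python
WIDTH, HEIGHT = 10, 6   # assumed layout: cell = 6 * x + y


def neighbor_cells(c):
    """Neighbors of c in ILSI bit order (-x, -y, -z, +x, +y, +z); None if absent."""
    x, y = divmod(c, HEIGHT)
    return [
        c - HEIGHT if x > 0 else None,
        c - 1 if y > 0 else None,
        None,
        c + HEIGHT if x < WIDTH - 1 else None,
        c + 1 if y < HEIGHT - 1 else None,
        None,
    ]


def nof_blocked(c, z):
    """Cells ruled out by NOF: in-bounds neighbors whose ILSI bit is 1."""
    blocked = 0
    for i, n in enumerate(neighbor_cells(c)):
        if n is not None and (z >> (5 - i)) & 1:
            blocked |= 1 << n
    return blocked


def pof_blocked_first_hole(c):
    """Cells ruled out by POF under a lowest-numbered-hole order: all cells below c."""
    return (1 << c) - 1


def keep(image, c, z, use_pof=True):
    """True if the image covers c and avoids every blocked cell."""
    blocked = nof_blocked(c, z)
    if use_pof:
        blocked |= pof_blocked_first_hole(c)
    return bool(image & (1 << c)) and not (image & blocked)


def cells(*nums):
    return sum(1 << n for n in nums)


fits_both = cells(14, 15, 21, 22, 27)   # avoids every blackened cell
hits_nof = cells(14, 20, 26, 32, 38)    # covers occupied neighbor 20
hits_pof = cells(3, 9, 14, 15, 16)      # covers cells 3 and 9, which precede 14

print(keep(fits_both, 14, 61))                # True
print(keep(hits_nof, 14, 61, use_pof=False))  # False
print(keep(hits_pof, 14, 61, use_pof=False))  # True  (NOF alone cannot see cell 9)
print(keep(hits_pof, 14, 61))                 # False (POF removes it)
```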

If you want to know more about how to algorithmically set up these image list sets, take a look at my source code for OrderingHeuristicStore::loadImages(), OrderingHeuristic::initNeighborOccupancy(), OrderingEntity::loadImages(), and OrderingEntity::initPriorityOccupancy().

FILA Performance

We'll start by taking a macro view of the algorithms, comparing the overall performance characteristics of FILA both with and without NOF enabled, relative to (and in coordination with) other good puzzle solving tactics. Then we'll take a micro view to better understand the effects of NOF filtering on a per-target-cell basis. Finally I'll make some brief statements comparing the performance of this new polycube version 2.0 to the previously available polycube version 1.2.1.

Macro FILA Performance Characteristics

Figure 7 shows four puzzles used to analyze the performance of polycube 2.0 and FILA. Table 1 shows the results of several test cases run on each of these puzzles. Each series of tests starts with straight DLX using Knuth's S heuristic (which picks the cell or piece target that has the fewest fit options), and with the -r option enabled to eliminate rotationally redundant solutions. (Some of these puzzles take an annoyingly long time to run without that optimization. And who wants rotationally redundant solutions anyway?) Each successive test in a test group adds one additional feature or optimization so you can see the incremental effect of each. The key below the table explains everything. A discussion of the test case results follows.

Test Puzzle P

Test Puzzle OP

Test Puzzle TC

Test Puzzle PT

Figure 7. The four puzzles used to analyze the performance of polycube 2.0 and FILA.

Table 1. Test Cases
Test Case	Command Line	Fits	$\Delta$ %	No-Fits	$\Delta$ %	Run Time (hh:mm:ss)	$\Delta$ %	Solutions
P-1	./polycube -i -q -r -- def/pentominoes_10x6.txt	892,247	-	0	-	00:00:02.082	-	2339
P-2	./polycube -i -q -r -V -- def/pentominoes_10x6.txt	768,356	-13.9%	0	0.0%	00:00:01.754	-15.8%	2339
P-3	./polycube -i -q -r -V -of=11 -- def/pentominoes_10x6.txt	1,000,250	+30.2%	0	0.0%	00:00:02.050	+16.9%	2339
P-4	./polycube -i -q -r -V -of=11 -f11 -- def/pentominoes_10x6.txt	2,091,215	+109.1%	13,106,789	+$\infty$%	00:00:00.168	-91.8%	2339
P-5	./polycube -i -q -r -V -of=11 -f11 -n -- def/pentominoes_10x6.txt	2,091,215	0.0%	4,682,886	-64.3%	00:00:00.157	-6.3%	2339

OP-1	./polycube128 -i -q -r -- def/pentominoes_1s_18x5.txt	1,816,931,170	-	0	-	01:14:18.016	-	686,628
OP-2	./polycube128 -i -q -r -V -- def/pentominoes_1s_18x5.txt	1,771,195,065	-2.5%	0	0.0%	01:11:25.145	-3.9%	686,628
OP-3	./polycube128 -i -q -r -V -of=17 -f17 -- def/pentominoes_1s_18x5.txt	13,151,493,569	+642.5%	83,733,447,441	+$\infty$%	00:23:07.672	-67.6%	686,628
OP-4	./polycube128 -i -q -r -V -of=17 -f17 -n -- def/pentominoes_1s_18x5.txt	13,151,493,569	0.0%	25,422,589,384	-69.6%	00:19:12.259	-17.0%	686,628

TC-1	./polycube -i -q -rL -- def/tetriscube.txt	30,255,329	-	0	-	00:01:52.951	-	9839
TC-2	./polycube -i -q -rL -f11 -oe=11 -- def/tetriscube.txt	48,705,459	+61.0%	1,093,916,558	+$\infty$%	00:00:22.206	-80.3%	9839
TC-3	./polycube -i -q -rL -f11 -oe=11:f=3 -- def/tetriscube.txt	80,346,268	+65.0%	1,526,897,959	+39.6%	00:00:19.945	-10.2%	9839
TC-4	./polycube -i -q -rL -f11 -oe=11:f=3 -n -- def/tetriscube.txt	80,346,268	0.0%	393,143,352	-74.3%	00:00:17.007	-14.7%	9839

PT-1	./polycube -i -q -r -- def/PT12.txt	207,341,751	-	0	-	00:10:45.529	-	51,184
PT-2	./polycube -i -q -r -V13 -- def/PT12.txt	78,145,746	-62.3%	0	0.0%	00:03:02.362	-71.7%	51,184
PT-3	./polycube -i -q -r -V13 -f13 -oe=13 -- def/PT12.txt	153,069,413	+95.9%	1,094,305,862	+$\infty$%	00:01:10.218	-61.5%	51,184
PT-4	./polycube -i -q -r -V13 -f13 -oe=13:f=3 -- def/PT12.txt	185,469,244	+21.2%	1,203,943,050	+10.0%	00:01:07.707	-3.6%	51,184
PT-5	./polycube -i -q -r -V13 -f13 -oe=13:f=3 -n -- def/PT12.txt	185,469,244	0.0%	291,191,337	-75.8%	00:01:00.625	-10.5%	51,184

KEY
Test Case	P	Pentomino 10x6	All test cases were run on an Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz running Ubuntu Linux using only one thread on one processor.
	OP	One-Sided Pentomino 18x5
	TC	Tetris Cube
	PT	Tetromino+Pentomino 13x13 Diamond
Command Line	This is the command line you can use to reproduce the test. Two different builds of polycube were used: `polycube` uses a 64 bit occupancy bitfield. `polycube128` was built with the preprocessor definition `-DGRIDBITFIELD_SIZE=128` to produce a 128 bit occupancy field which slows FILA, but allows it to be activated earlier in the puzzle search process. The command line options used are summarized below. Additional details of these and other command line options can be found by running polycube with the --help option, or reading README.txt.
	-i	info: Turns on informational output including statistics and performance measurements.
	-q	quiet: Turns off solution output (so as not to impact performance measurements).
	-r	redundancyFilter: Attempts to eliminate rotationally redundant solutions by constraining the position and/or rotation of one uniquely shaped puzzle piece. If no argument is given, a piece is chosen for you; or you can supply the name of a piece to attempt to pick a better piece yourself.
	-V	volumeFilter: With no arguments (as used here), before the search for puzzle solutions begins, every bounded image is considered to see if placing it will partition the puzzle region into two or more isolated subregions with at least one of those subregions having a volume that cannot be matched by any subset of the remaining pieces. Each such image found is filtered out (removed).
	-f	fila: Activates FILA every time N pieces remain to be placed. All solves start with DLX. Each time the number of remaining pieces hits the number N, a FILA data model of the remaining open space and remaining pieces (as modeled by the DLX matrix) is constructed, DLX is deactivated, and FILA is activated. When FILA has completed exploration of this sub-puzzle, DLX continues where it left off.
	-o	order: Sets a colon separated list of ordering heuristic configurations, H. For example, e=11:f=5 activates the estimated-most-constrained-hole heuristic when 11 pieces are left; and activates the first-hole heuristic when 5 pieces are left.
	-n	nof: Enables neighbor occupancy filtering (NOF).
No Fits	The number of times an algorithm attempts to place a piece in the puzzle only to find it doesn't fit.
Fits	The number of times an algorithm successfully places a piece.
Run Time	The run time of the program in hours minutes and seconds. This is the total program run time including program load, puzzle parsing, puzzle and solver initialization, the solve itself, and all cleanup time. This detail is not really important since in all cases the solve took more than 99% of the run time.
Solutions	The total number of rotationally unique solutions found.
$\Delta$ %	The incremental percent change of the statistic to the left from the previous row to the current row.
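For readers who want to check the $\Delta$ % columns themselves, here is a minimal Python sketch of the calculation. The helper names are made up for illustration; the only inputs are numbers copied from Table 1.

```python
import math


def seconds(hhmmss):
    """Convert a run time such as '01:14:18.016' to seconds."""
    h, m, s = hhmmss.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def delta_pct(prev, cur):
    """Incremental percent change from the previous row to the current row."""
    if prev == 0:
        return math.inf if cur > 0 else 0.0   # the table's +infinity and 0.0% cases
    return 100.0 * (cur - prev) / prev


# Rows P-1 -> P-2 of Table 1: fits, then run time.
print(f"{delta_pct(892_247, 768_356):+.1f}%")                                   # -13.9%
print(f"{delta_pct(seconds('00:00:02.082'), seconds('00:00:01.754')):+.1f}%")   # -15.8%
```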

Test Case P: Pentominoes in a 10x6 Rectangle

The first set of test cases (P) operates on the 10x6 pentomino puzzle. Test case P-1 uses DLX with the (default) S heuristic enabled and the rotational redundancy filter enabled. Of the $63 \times 60 = 3780$ piece images that could possibly be placed anywhere in the puzzle, only 2056 are bounded to the 10x6 puzzle region, and so the DLX matrix begins with 2056 rows. The rotational redundancy filter selects a uniquely shaped piece (if available) to rotationally and/or translationally constrain to prevent rotationally redundant solutions from ever being discovered and (as a beneficial side effect) to significantly reduce program run times. In this case it chooses to constrain piece X. Originally, piece X has 32 images that fit in the puzzle. After constraint, only 8 images positioned in the lower-left quadrant of the puzzle remain (reducing the total number of rows in the matrix to 2032). This is identical to how Fletcher and de Bruijn started their algorithms, save that they placed the X piece in the top-left quadrant, and excluded the image of the X piece jammed into the corner (which obviously can lead to no solutions). Because the X piece now has so few fit options it becomes the first target of DLX, so the algorithm begins by placing one of these 8 images as the first step, again just like Fletcher and de Bruijn. With this configuration, DLX finds all 2339 solutions in 2.082 seconds.
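The image counts quoted above can be re-derived independently. The Python sketch below is not part of polycube; it assumes the usual F, I, L, N, P, T, U, V, W, X, Y, Z pentomino shapes (written out by hand here), allows pieces to be flipped, and counts the translations of each orientation whose bounding box fits inside the 10x6 rectangle.

```python
# Hand-written cell lists for the twelve pentominoes (an assumption of this sketch).
PENTOMINOES = {
    "F": [(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)],
    "I": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    "L": [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)],
    "N": [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3)],
    "P": [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2)],
    "T": [(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)],
    "U": [(0, 0), (2, 0), (0, 1), (1, 1), (2, 1)],
    "V": [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)],
    "W": [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)],
    "X": [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)],
    "Y": [(0, 0), (0, 1), (0, 2), (0, 3), (1, 1)],
    "Z": [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)],
}


def normalize(cells):
    """Translate a shape so its minimum x and y are both 0."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def orientations(cells):
    """All distinct rotations and reflections of a planar piece."""
    result = set()
    for start in (cells, [(-x, y) for x, y in cells]):
        current = list(start)
        for _ in range(4):
            current = [(-y, x) for x, y in current]   # rotate 90 degrees
            result.add(normalize(current))
    return result


def bounded_images(shape, width, height):
    """Number of translations of a normalized shape that stay inside the box."""
    w = 1 + max(x for x, _ in shape)
    h = 1 + max(y for _, y in shape)
    return max(0, width - w + 1) * max(0, height - h + 1)


WIDTH, HEIGHT = 10, 6
all_orients = {name: orientations(c) for name, c in PENTOMINOES.items()}
n_orient = sum(len(shapes) for shapes in all_orients.values())
n_bounded = sum(bounded_images(s, WIDTH, HEIGHT)
                for shapes in all_orients.values() for s in shapes)
x_images = sum(bounded_images(s, WIDTH, HEIGHT) for s in all_orients["X"])

print(n_orient, n_orient * WIDTH * HEIGHT, n_bounded, x_images)  # 63 3780 2056 32
```

Constraining X from 32 images to 8 removes 24 rows, which gives the 2056 - 24 = 2032 figure quoted above.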

Test case P-2 adds a one-time application of the volume constraint filter to all images as a preliminary step of solver processing. This filter examines the placement of each image in the DLX matrix (one at a time) to determine if it results in a partitioning of the puzzle into two or more subregions where at least one subregion has a volume that cannot possibly be equaled by any combination of the remaining pieces. If so, that image is discarded from the DLX matrix. For this puzzle, this eliminates 125 of the 2032 bounded images or about 6.2%. Among these is one of the remaining images of the X piece that was jammed into the lower-left corner of the puzzle, reducing the number of images of piece X to 7 (which completes the replication of the starting conditions used by Fletcher and de Bruijn). This filtering took 1.3 msec of processing, but reduced the total run time by 328 msec, or by 15.8%: a good investment.
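The idea behind the volume check can be sketched in a few lines of Python. This is a simplified stand-in, not polycube's implementation: it flood-fills the open regions left by a single image on an otherwise empty 10x6 board and tests each region's size against the subset sums of the remaining piece sizes. The function names and the (x, y) cell representation are assumptions.

```python
WIDTH, HEIGHT = 10, 6


def regions(occupied):
    """Sizes of the 4-connected open regions, found by flood fill."""
    seen = set(occupied)
    sizes = []
    for start in ((x, y) for x in range(WIDTH) for y in range(HEIGHT)):
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            x, y = stack.pop()
            size += 1
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < WIDTH and 0 <= ny < HEIGHT and (nx, ny) not in seen:
                    seen.add((nx, ny))
                    stack.append((nx, ny))
        sizes.append(size)
    return sizes


def reachable_volumes(piece_sizes):
    """Every total volume that some subset of the pieces can fill."""
    sums = {0}
    for s in piece_sizes:
        sums |= {t + s for t in sums}
    return sums


def fails_volume_check(image, remaining_piece_sizes):
    """True if placing the image leaves a region no subset of pieces can fill."""
    ok = reachable_volumes(remaining_piece_sizes)
    return any(size not in ok for size in regions(image))


remaining = [5] * 11                                      # eleven pentominoes after X
x_in_corner = {(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)}    # isolates cell (0, 0)
x_shifted = {(2, 0), (1, 1), (2, 1), (3, 1), (2, 2)}      # leaves one open region

print(sorted(regions(x_in_corner)))                # [1, 54]
print(fails_volume_check(x_in_corner, remaining))  # True
print(fails_volume_check(x_shifted, remaining))    # False
```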

Test case P-3 disables the default DLX S heuristic (which always picks a column from the DLX matrix with a minimum number of entries), and enables the F heuristic. My F heuristic when applied to DLX is identical to Fletcher's F heuristic except that it (like all of my DLX heuristics) has an overriding behavior of always selecting a DLX matrix column with zero or one 1s over any other target normally selected by the heuristic. This increased the run time back up to 2.050 seconds. This was a bad idea for this puzzle: sometimes DLX performance can be improved with an ordered fill (like that enforced by the F heuristic), but not for such a small puzzle. The motivation for this test case was to allow an apples-to-apples comparison between DLX and FILA with test case P-4.

Test case P-4 enables FILA each time 11 pieces are remaining. (So I still use DLX to first place the X piece in one of 7 positions in the lower left quadrant, but then FILA is used to place the remaining 11 pieces. Currently, no available FILA heuristic can select a piece as a target — only cells are selected, so I still always use DLX to place at least one piece when using the rotational redundancy filter.) FILA runs about 12 times faster than DLX using the F heuristic finding all 2339 solutions in just 0.168 seconds. This is despite the fact that I let DLX cheat by picking columns of size 0 (which leads to an immediate backtrack) or size 1 (where there is but one fill choice) over the first cell normally picked by the F heuristic.

In test case P-5, I enabled NOF filtering (POF cannot be turned off). First notice that the number of images that failed to fit in the puzzle was reduced by almost two-thirds (64.3%), from 13.1 million down to just 4.7 million. The elimination of these 8.4 million useless fit checks saved an additional 11 msec of processing time bringing the total run time down by 6.3% to just 0.157 seconds.

Test Case OP: One-sided Pentominoes in an 18x5 Rectangle

The second set of test cases (OP) examines the problem of placing the 18 one-sided pentominoes in an 18x5 box as shown in Figure 7. The one-sided pentominoes are the pentominoes that are distinct under rotation in the plane, with mirror images counted as different pieces. For these tests I compiled the solver so that FILA uses a 128-bit occupancy bit field. This is obviously a little slower, but does allow FILA to be used for the entire puzzle. (With the default 64 bit occupancy bit field, only the last 12 pieces could be placed with FILA. This is probably all you really need since for these types of puzzles, the vast majority of the work is typically done placing the last several pieces, and so it only really matters that FILA be active for these last pieces. But for these test cases, to keep things clear and simple, I wanted to show how FILA performs on the whole puzzle.)

Test case OP-1 again uses DLX with the rotational redundancy filter to find all 686,628 solutions in 1 hour 14 minutes 18 seconds.

In test case OP-2, the volume filter is added to reduce the run time by 3.9% down to 1 hour 11 minutes 25 seconds.

In test case OP-3, FILA was activated using the F heuristic when 17 pieces remain, reducing the run time by 68% down to just 23 minutes 8 seconds.

In test case OP-4, NOF was enabled which reduced the number of non-fitting images by 70% and the overall run time by an additional 17% down to 19 minutes 12 seconds.

Test Case TC: Tetris Cube

The third set of test cases examines the Tetris Cube puzzle. This puzzle has 12 oddly shaped pieces that must be placed in a 4x4x4 box as shown in Figure 8.

Figure 8. The twelve pieces of the Tetris Cube puzzle (left) shown (almost) assembled in a 4x4x4 box (right).

Test case TC-1 starts like other test cases with DLX, the default S heuristic, and the redundancy filter enabled. Notice that in the TC test group, I supplied the argument L to the -r option. This deserves some explanation. The redundancy filter with no argument given picks a uniquely shaped piece that does the best job of eliminating rotationally redundant solutions by rotationally and/or translationally constraining that piece. If multiple pieces are equally effective in this regard, then it picks the piece among those candidates that has the minimum number of constrained images. I no longer think this is the best rule to use as a tie-breaker. I now believe that instead picking a piece that is large and/or complicated may be a better choice. This makes intuitive sense: it's easier to place the large hard pieces first, and then fill in the smaller more flexible pieces around the big complicated piece, than to place the little easy pieces first and then hope you happen to form a void that the large complicated piece happens to fit nicely into. For example, people who have spent significant time playing with pentomino puzzles know that piece X is difficult to place, and hence it makes a good choice for constraint. I'm not sure how to gauge what makes a piece 'complicated', and so I have not yet tried to modify polycube's selection criteria. In any case, my auto-selection routine does not always make the best choice for the piece to rotationally and/or translationally constrain to eliminate rotationally redundant solutions. Stephan Westen discovered that piece L in the tetris cube is a much better choice, so in this example I'm passing piece L as the argument to the -r option to force the redundancy filter to constrain that piece. Under this configuration, the solver found all 9839 solutions in about 1 minute 53 seconds. Pieces were placed in the box (and subsequently removed) around 30 million times.

Test case TC-2 enables FILA after DLX places the first piece, and also switches from the S heuristic to my E heuristic at the same time (as described above). Notice that the number of fits increases by about 60% to 49 million indicating the E heuristic does not do as good a job of picking the minimum fit target. The number of no-fit images found by searchFila also increases from 0 to over a billion, and this does not count the vastly larger number of fit checks performed by the e-heuristic itself. But despite these extensive activities and degraded ability to pick the minimum fit target, use of FILA with the E heuristic yields a better than 5-fold increase in solver performance, reducing the total run time by 80.3% down to just 22.2 seconds.

Test case TC-3 switches from the E heuristic to the much lighter weight F heuristic when only 3 pieces remain. All the time spent counting neighbor holes at each remaining cell, and then doing fit-counts for those candidate cells with a minimum number of open neighbors just can't pay off when there are so few pieces left. You are better off just placing pieces as fast as you can with the light weight F heuristic. This again increases the number of image fits by 65% to 80 million, and the number of no-fit images to 1.5 billion, but actually reduces the run time by another 10.2% down to 19.9 seconds.

Test case TC-4 enables NOF filtering, which reduced no fit images by 74% from 1.5 billion down to just 393 million. Note that NOF not only reduces the number of fit checks made by searchFila, but also the fit-checks made by the E heuristic. This efficiency in image processing reduced run time by 14.7% to just 17.0 seconds.

Test Case PT: Pentominoes + Tetrominoes in a 13x13 Diamond

Test case PT examines the problem of placing the 12 pentominoes and 5 tetrominoes into a diamond shaped puzzle measuring 13 squares wide and 13 high as shown in Figure 7. Five squares are eliminated from the center to achieve the correct volume.
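The volume arithmetic behind the five removed squares is easy to check. The Python sketch below assumes the diamond is the set of unit squares with |x| + |y| <= 6 (an assumption about the exact outline in Figure 7, chosen because it is 13 squares wide and 13 high).

```python
# Assumed outline: all unit squares with |x| + |y| <= 6 (13 wide, 13 high).
diamond = [(x, y) for x in range(-6, 7) for y in range(-6, 7)
           if abs(x) + abs(y) <= 6]

piece_volume = 12 * 5 + 5 * 4        # 12 pentominoes + 5 tetrominoes
print(len(diamond), piece_volume, len(diamond) - piece_volume)  # 85 80 5
```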

Test case PT-1 uses DLX, with the S heuristic, and the redundancy filter enabled. All 51,184 unique solutions are discovered in about 10 minutes 46 seconds.

Test case PT-2 enables the volume filter, but instead of only applying the filter once at the beginning, the volume filter is reapplied after every piece placement until fewer than 13 pieces remain to be placed. This technique is particularly effective on this puzzle because the jagged puzzle edges and the central island make it susceptible to partition. This produced a better than 3-fold improvement in solver speed, reducing the total run time by 72% down to just 3 minutes 2 seconds.

Test case PT-3 enabled FILA and the estimate heuristic when 13 pieces remain, reducing the run time by another 62% down to just 1 minute 10 seconds.

Test case PT-4 uses the lighter weight F heuristic for the last 3 piece placements giving an additional small performance improvement of 3.6%, reducing the total run time by another 2 seconds down to 1 minute 8 seconds.

Test case PT-5 enabled NOF. No fit images were reduced by 76% (the largest percent reduction seen over all test cases examined in this document), and run time was reduced by another 10.5%, down to 1 minute 1 second.

Micro FILA Performance Characteristics

Let's focus on just test case P-4 where FILA was used with the F heuristic without NOF enabled to solve the pentominoes 10x6 problem. The informational output from that run includes the following:

# Number of placement attempts when N pieces were left to be placed:
ATTEMPTS[ 1]=        301677
ATTEMPTS[ 2]=       3478035
ATTEMPTS[ 3]=       5722296
ATTEMPTS[ 4]=       3665538
ATTEMPTS[ 5]=       1284992
ATTEMPTS[ 6]=        386776
ATTEMPTS[ 7]=        200366
ATTEMPTS[ 8]=        126819
ATTEMPTS[ 9]=         28279
ATTEMPTS[10]=          3088
ATTEMPTS[11]=           131
ATTEMPTS[12]=             7

# Number of fits when N pieces were left to be placed:
FITS[ 1]=              2339
FITS[ 2]=            302256
FITS[ 3]=            760374
FITS[ 4]=            617667
FITS[ 5]=            272072
FITS[ 6]=             82406
FITS[ 7]=             26950
FITS[ 8]=             17275
FITS[ 9]=              7994
FITS[10]=              1744
FITS[11]=               131
FITS[12]=                 7

This output gives, as a function of the remaining number of pieces $p$, the number of times the solver tried to place a piece in the puzzle (ATTEMPTS), and how many times it actually succeeded in placing a piece in the puzzle (FITS). We can learn a lot from this information through some simple calculations. This program output is transcribed to the second and third columns of Table 2. Subtracting fits from attempts gives the no-fits information in the fourth column. Note that each time a piece is successfully placed in the puzzle when, say, 5 pieces were left, the result is a recursive invocation of solveFila with the number of remaining pieces $p$ reduced by 1 to 4. So dividing the total number of attempts when 4 pieces were left (3,665,538) by the total number of fits when 5 pieces were left (272,072) gives the average number of piece fitting attempts per cell (or piece) targeted by a single recursive call to solveFila when $p = 4$: (3,665,538 / 272,072 = 13.473). Similarly, dividing the total number of fits or no-fits when $p$ pieces are left by the total number of fits when $p+1$ pieces are left yields the number of times a piece fit or (respectively) didn't fit per target when $p$ pieces were left. This information is tabulated in the last three columns of Table 2 as attempts-per-target, fits-per-target, and no-fits-per-target.

$p$	Attempts Total	Fits Total	No-Fits Total	Attempts per Target	Fits per Target	No-Fits per Target
1	301,677	2339	299,338	0.998	0.008	0.990
2	3,478,035	302,256	3,175,779	4.574	0.398	4.177
3	5,722,296	760,374	4,961,922	9.264	1.231	8.033
4	3,665,538	617,667	3,047,871	13.473	2.270	11.202
5	1,284,992	272,072	1,012,920	15.593	3.302	12.292
6	386,776	82,406	304,370	14.352	3.058	11.294
7	200,366	26,950	173,416	11.599	1.560	10.039
8	126,819	17,275	109,544	15.864	2.161	13.703
9	28,279	7,994	20,285	16.215	4.584	11.631
10	3,088	1,744	1,344	23.573	13.313	10.260
11	131	131	0	18.714	18.714	0.000
12	7	7	0	7.000	7.000	0.000

Table 2. Piece placement attempt, fit, and no-fit statistics for test case P-4.

$p$	Attempts Total	Fits Total	No-Fits Total	Attempts per Target	Fits per Target	No-Fits per Target
1	78,883	2,339	76,544	0.261	0.008	0.253
2	1,346,664	302,256	1,044,408	1.771	0.398	1.374
3	2,551,252	760,374	1,790,878	4.130	1.231	2.899
4	1,775,873	617,667	1,158,206	6.527	2.270	4.257
5	660,506	272,072	388,434	8.015	3.302	4.714
6	196,017	82,406	113,611	7.273	3.058	4.216
7	80,724	26,950	53,774	4.673	1.560	3.113
8	62,568	17,275	45,293	7.827	2.161	5.666
9	19,118	7,994	11,124	10.962	4.584	6.378
10	2,358	1,744	614	18.000	13.313	4.687
11	131	131	0	18.714	18.714	0.000
12	7	7	0	7.000	7.000	0.000

Table 3. Piece placement attempt, fit, and no-fit statistics for test case P-5.

Table 3 gives the same information as Table 2 but for test case P-5, where NOF was enabled. Compare tables 2 and 3 to verify that NOF doesn't affect fits at all; it only reduces the number of images that don't fit in the puzzle that must be processed by each recursive invocation of solveFila. Comparing the last column of Table 2 with the last column of Table 3 shows the degree to which NOF filtering reduces the number of no-fit images for each invocation of solveFila. As can be determined from the fits-total column, over 93% of solveFila invocations are for $p$ values from 1 to 5. In this range, when NOF is enabled, the average number of images that must be considered per invocation is (in the worst case, at $p = 5$) only about 8. Of these 8, fewer than 5 are no-fit images. So of the original 63 images that de Bruijn examined at every recursive step of his algorithm, FILA with NOF only has to look at 8, and almost half of these do actually fit. Why so few images?

Many images are eliminated on a cell-by-cell basis due to puzzle bounds considerations: of the $63 \times 60 = 3780$ images that one could try to place at the 60 puzzle cells, only 2056 are bounded by the puzzle walls. The rotational redundancy filter reduces the number of allowed placements of the X piece by 24 (from 32 down to 8). The volume filter eliminates another 125 images (including one of the remaining X images), reducing the total number of images to $2056-24-125=1907$. Recall that DLX is used to place the X piece in one of 7 starting locations in the lower left quadrant. Each such placement eliminates a large number of images (primarily from the left side of the puzzle) from the DLX matrix. For example, the last such placement (when the X piece is placed very close to the center of the puzzle) leaves the matrix with only 1101 images. These 1101 images are then used to populate the matrix of image list sets $A$ used by FILA. POF filtering would nominally keep the total number of images in each image list set to 63, but all of the other reductions to this point make these lists on average much smaller. For the case where only 1101 DLX images remain after placing the X piece for the last time, POF reduces the average number of images per image list set over the remaining 55 holes to just $1101 / 55 \approx 20.0$. Individual image list sets deviate from this average of 20: those for cells near the X piece and near the right border wall will have fewer images, while those for cells just to the right of the X will have significantly more. When work is most intense (when 4 pieces are left), about an additional 8/12 (67%) of these images are discounted simply because 8 pieces are unavailable (and so their images are never attempted). Finally, NOF filtering reduces the no-fit images in the image list sets by (on average) another 64.3% (as seen from test case P-5 in Table 1) to produce the overall sizes seen here in Table 3.
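The 3780 and 2056 figures can be checked independently. The sketch below is my own illustration, not polycube code. It enumerates the 63 fixed (rotated and reflected) pentomino shapes and counts their in-bounds translations in a 10x6 box. It assumes each image is rooted at exactly one of its cells, so summing the per-cell bounded images over all 60 cells is the same as counting in-bounds placements.

```python
def normalize(cells):
    """Translate a set of (x, y) cells so its bounding box starts at (0, 0)."""
    min_x = min(x for x, y in cells)
    min_y = min(y for x, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def fixed_polyominoes(n):
    """All n-cell polyominoes, treating rotations and reflections as distinct."""
    shapes = {frozenset({(0, 0)})}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (x + dx, y + dy)
                    if cell not in shape:
                        grown.add(normalize(shape | {cell}))
        shapes = grown
    return shapes


def count_bounded_images(shapes, width, height):
    """Number of translations of all shapes that lie entirely inside the box."""
    total = 0
    for shape in shapes:
        w = max(x for x, y in shape) + 1
        h = max(y for x, y in shape) + 1
        total += max(0, width - w + 1) * max(0, height - h + 1)
    return total


pentominoes = fixed_polyominoes(5)
unbounded = len(pentominoes) * 10 * 6
bounded = count_bounded_images(pentominoes, 10, 6)
print(len(pentominoes), unbounded, bounded)
```

The ratio `bounded / unbounded` is about 0.544, which is where the 45.6% average reduction from bounds filtering comes from.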

After all this reduction, remember that each of the few remaining no-fit images that still has to be considered is ruled out by a single machine instruction that performs a bitwise AND of the puzzle occupancy state with the layout bit mask of the image (line 11 of solveFila). I am skeptical, therefore, that more specialized NOF image lists based on the occupancy states of additional neighbors near the target cell could eliminate enough no-fit images that the savings in image fit-check processing time would outweigh the increased processing time needed to calculate the more detailed ILSI. And this does not even consider the increased initialization time needed to generate the larger number of image list sets in $A$ (which grows by a factor of 2 for each additional neighbor considered).
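To make the cost of a fit check concrete, here is a minimal sketch of the bit-mask test using Python integers as bit fields. The cell numbering (`y * WIDTH + x`) and all names are assumptions made for illustration; polycube's actual data layout is not shown in this post.

```python
WIDTH, HEIGHT = 10, 6  # the 10x6 pentomino box


def mask_of(cells):
    """Pack a collection of (x, y) cells into one integer, one bit per cell."""
    mask = 0
    for x, y in cells:
        mask |= 1 << (y * WIDTH + x)
    return mask


def fits(occupancy, image_mask):
    """An image fits when it shares no set bit with the occupied cells."""
    return (occupancy & image_mask) == 0


def nof_sets_per_cell(neighbor_count):
    """Image list sets needed per cell when tracking this many neighbors."""
    return 1 << neighbor_count


# X pentomino centered at (4, 2), then two horizontal I pentominoes.
x_piece = mask_of([(4, 1), (3, 2), (4, 2), (5, 2), (4, 3)])
i_through_x = mask_of([(2, 2), (3, 2), (4, 2), (5, 2), (6, 2)])
i_top_row = mask_of([(0, 5), (1, 5), (2, 5), (3, 5), (4, 5)])

occupancy = x_piece
print(fits(occupancy, i_through_x))  # overlaps the X piece
print(fits(occupancy, i_top_row))    # clear of the X piece
occupancy |= i_top_row               # placing an image is a single OR
```

`nof_sets_per_cell` shows the doubling noted above: 6 tracked neighbors need 64 sets per cell, 7 need 128, and so on.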

Tables 4 and 5 provide the same piece-by-piece and per-target statistics for test cases TC-3 and TC-4 of the Tetris Cube. I won't bore you with as much detail here, but I wanted to impress upon you the usefulness of NOF when used in combination with the E heuristic and in higher dimensional puzzles (3D instead of 2D). Compare, for example, the number of no-fits-per-target when 4 pieces remain from tables 4 and 5: by enabling NOF, the number of no-fit images is reduced from 27.7 all the way down to 3.2, a reduction factor of 8.6. There are two reasons for this large reduction. First, the E heuristic is not an ordered heuristic, so no POF filtering is possible. Whereas for the pentominoes puzzle there are only 63 images to choose from to populate the image list sets, the lack of POF filtering and the increased rotational freedom result in 1416 Tetris Cube piece images that could possibly populate each image list set at a cell. Again, puzzle boundary considerations, the rotational redundancy filter, and the placement of the first piece by DLX will drastically reduce the number of available images by the time FILA is actually activated, but we are still left with much larger image list sets. Second, because the E heuristic always targets a cell with a maximum number of occupied (or non-existent) neighbors, it naturally targets cells that produce ILSI with many bits set, for which NOF filtering is most effective. Because no POF filtering is possible, all filtering is due to NOF, which makes NOF all the more useful for heuristics that don't follow a fixed targeting order.

$p$	Attempts Total	Fits Total	No-Fits Total	Attempts per Target	Fits per Target	No-Fits per Target
1	28,301,608	9,839	28,291,769	4.036	0.001	4.034
2	366,812,658	7,012,542	359,800,116	11.227	0.215	11.012
3	312,127,901	32,673,158	279,454,743	19.410	2.032	17.378
4	430,889,742	16,080,756	414,808,986	28.735	1.072	27.663
5	323,871,972	14,995,303	308,876,669	45.549	2.109	43.440
6	117,222,724	7,110,405	110,112,319	57.760	3.504	54.257
7	24,545,944	2,029,464	22,516,480	64.332	5.319	59.013
8	3,200,630	381,549	2,819,081	65.828	7.847	57.981
9	256,931	48,621	208,310	59.337	11.229	48.109
10	13,816	4,330	9,486	47.806	14.983	32.824
11	289	289	0	24.083	24.083	0.000
12	12	12	0	12.000	12.000	0.000

Table 4. Piece placement attempt, fit, and no-fit statistics for test case TC-3.

$p$	Attempts Total	Fits Total	No-Fits Total	Attempts per Target	Fits per Target	No-Fits per Target
1	6,100,719	9,839	6,090,880	0.870	0.001	0.869
2	141,992,570	7,012,542	134,980,028	4.346	0.215	4.131
3	181,405,280	32,673,158	148,732,122	11.281	2.032	9.249
4	64,378,296	16,080,756	48,297,540	4.293	1.072	3.221
5	52,535,933	14,995,303	37,540,630	7.389	2.109	5.280
6	21,216,801	7,110,405	14,106,396	10.454	3.504	6.951
7	5,013,804	2,029,464	2,984,340	13.141	5.319	7.822
8	763,187	381,549	381,638	15.697	7.847	7.849
9	77,260	48,621	28,639	17.843	11.229	6.614
10	5,469	4,330	1,139	18.924	14.983	3.941
11	289	289	0	24.083	24.083	0.000
12	12	12	0	12.000	12.000	0.000

Table 5. Piece placement attempt, fit, and no-fit statistics for test case TC-4.

Performance Comparison of polycube 2.0 and polycube 1.2.1

Although NOF does seem to consistently provide a significant performance improvement, other software implementation changes provided even greater performance benefits. Most significantly, the old EMCH algorithm counted open neighbors one neighbor at a time. FILA's new E heuristic counts the neighbor holes at each cell using either a hardware bit population count instruction (if available) or table-based lookups, which is far faster. Similarly, my old variation of the de Bruijn algorithm iterated over the heads of the DLX matrix to find unoccupied cells. It did remember where it last left off (so it wasn't starting from the beginning with each request), but FILA's new F heuristic more efficiently iterates over the occupancy bit field looking for zeroes, again using hardware instructions (if available) or table lookups. These and other small optimizations together with NOF have improved the solve times for some puzzles by almost a factor of two. For example, the best solve time I can produce with polycube version 1.2.1 for the Tetris Cube is 31.6 seconds. Version 2.0 solves the same puzzle on the same machine in just 17.0 seconds (as noted above), a reduction in run time of 46%. Performance for 2D puzzles (for which the E heuristic is not typically the most useful) has improved by a lesser, but still significant, amount. For example, the best solve times for the pentominoes 10x6 puzzle have improved on my machine from about 0.212 seconds down to 0.156 seconds, a 26% improvement.
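The two bit-field operations mentioned here, counting set bits and finding the first zero bit, can be sketched as follows. This is a Python stand-in written for illustration only; the function names are hypothetical and polycube's C++ implementation may look quite different.

```python
# Lookup table: number of 1 bits in every possible byte value.
BYTE_POPCOUNT = [bin(i).count("1") for i in range(256)]


def popcount(bits):
    """Count set bits one byte at a time using the lookup table."""
    count = 0
    while bits:
        count += BYTE_POPCOUNT[bits & 0xFF]
        bits >>= 8
    return count


def first_zero_bit(occupancy):
    """Index of the lowest clear bit, i.e. the first unoccupied cell."""
    lowest_zero = (occupancy + 1) & ~occupancy
    return lowest_zero.bit_length() - 1


def open_neighbor_count(occupancy, neighbor_mask):
    """Number of unoccupied cells among those selected by neighbor_mask."""
    return popcount(neighbor_mask & ~occupancy)
```

The table-driven `popcount` stands in for a hardware population count, and `first_zero_bit` plays the role of a count-trailing-zeros instruction applied to the inverted occupancy field.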

Future Work

FILA Ordering Heuristics that Target Pieces

There's nothing in the FILA-heuristic interface that precludes a heuristic from targeting a piece (rather than a cell) and returning an image list set that lists only images for a single target piece. Unfortunately, if a heuristic is targeting pieces, then it cannot be a fixed-order heuristic, which means you can't use POF to gradually reduce the size of the image list sets that target pieces as the puzzle is filled in. And NOF filtering alone would be almost completely useless for reducing the size of image list sets that target pieces. So I don't immediately see a way to, for example, extend the estimate heuristic to efficiently identify pieces that have few fits and/or identify a precalculated image list targeting such a piece that is well filtered to the current puzzle state. Still, there may be times that targeting pieces could be useful, even if it requires much work to identify the target (e.g., fit-counting across all images of a piece), and even if the returned image list set is not well filtered (e.g., just return all images of the piece with no filtering at all). Such an approach might still perform favorably compared to DLX within some limited range of puzzle sizes.

Also, for the special case where the puzzle solve is just getting started ($p = P$), every image in every image list set is guaranteed to fit. So the S and E heuristics could easily be modified to notice that $p = P$ and, instead of counting fits or neighbor holes, just look at the list sizes. Image list sets that target pieces could be included specifically for this situation, so that FILA could target, for example, a piece that's been highly constrained to eliminate rotationally redundant solutions. This would enable FILA to fully emulate de Bruijn and Fletcher for the 10x6 pentomino problem by identifying that the X piece should be placed first. It is not clear to me, however, that there would be any advantage to this approach over my current approach of always using DLX to place the first piece of a puzzle (other than the elimination of DLX itself, which is obviously no small simplification). In fact, it is my expectation that such an approach would be inferior to the current approach of using the combination of algorithms: in the 10x6 pentomino problem, placing the X piece smack in the middle of the puzzle causes DLX to eliminate many images. Using this reduced image set as a feed for the initialization of the fixed image list sets used by the FILA F heuristic is highly advantageous, and this is not a behavior that a pure FILA approach to the problem can readily replicate.

For now, I leave the subject of defining FILA ordering heuristics that can target pieces as a problem for future investigation.

Code Generation

I have not definitively answered the question of whether FILA is faster than Fletcher's algorithm for the 10x6 pentomino problem. I have not even taken the time to translate Fletcher's original program to a modern programming language. But even if I did, a comparison between that program and polycube wouldn't really be a fair comparison of Fletcher's algorithm and FILA: Fletcher's program is hard coded to solve the 10x6 pentomino puzzle, which has several advantages:

much indirection and array indexing can be eliminated;
for-loops can be unrolled;
simplifications are made possible due to all pieces being the same size; and
simplifications are made possible due to all pieces having a different shape.

Because polycube is a general puzzle solver, it is necessarily more cumbersome. As a result, Fletcher's program is not only simpler, but also smaller, which allows for better CPU caching. So if someone were to compare a direct translation of Fletcher's published software to polycube 2.0 and report that Fletcher's software is faster, I would be neither surprised nor deterred.

To make a fairer comparison, I could add some generalization of Fletcher's algorithm to polycube. This would not only require finding an algorithm to efficiently assign images to a search tree, it would also require (I think) a new data model, since Fletcher's algorithm requires checking the occupancy of cells just outside the puzzle boundary, something polycube doesn't currently allow. I guess I'm not interested in such an endeavor, especially since the results would not necessarily be definitive.

Alternatively, one could write a code generator that takes a puzzle as input and outputs a FILA solver program that's hard-coded and highly optimized to the particulars of the input puzzle. Such a program generated for the 10x6 pentomino problem could, I think, then be fairly compared to Fletcher's original hard-coded program for the same puzzle (with only those minimal modifications needed to translate it to a modern programming language). I suspect such an optimized FILA solver could be made to run far faster than Fletcher's original program. The more I think about this, the less hard it seems like it would be to do. Maybe I'll try it some time — not to prove FILA faster than Fletcher at pentominoes 10x6 (something I already believe to be true), but rather to be able to solve harder puzzles more efficiently.

Applications to Other Geometries

polycube 2.0 is restricted to puzzles whose cells fall on the integer lattice points of a two or three dimensional Cartesian coordinate system, but there is nothing about the FILA algorithm or POF or NOF filtering that is limited in this way. (The calculation of the ILSI given here does talk about the nearest neighbors in the 6 ordinal directions, but in general a neighbor can be in any direction, and the mapping of those neighbors to ILSI bits is arbitrary.) Other puzzle geometries, like polyiamonds, polyhexes, and polysticks could also be solved with a FILA software application that was suitably abstracted to service those geometries.

Software Download

This software is protected by the GNU General Public License (GPL) Version 3. See the README.txt file included in the zip file download for more information.

WINDOWS 64 bit: polycube_win_64_2.0.2.zip

Contents:

LICENSE.txt (GNU General Public License Version 3)
README.txt (Copyright, build and run instructions)
RELEASE_NOTES.txt (Summary of changes for each release)
polycube.exe (polycube solver executable for Windows 64 bit processors)
Several sample puzzle definition files including all puzzles used in my web docs, and others.
polycube 2.0 C++ source code.
A small subset of boost c++ library source (only those packages used by polycube).
double precision SIMD oriented Fast Mersenne Twister (dSFMT) source code (for random number generation).

LINUX / UNIX: polycube_2.0.2.tgz

Contents: same as for Windows, but no executable is provided, and all text files are carriage return stripped.

The source is about 16,000 lines of C++ code, with dependencies on two other libraries (boost and the Mersenne Twister random number generator), which are also included in the download. The executable file polycube.exe is a Windows console application (sorry, no GUIs folks). For maximum platform compatibility, the provided executable has NOT been compiled to use the g++ builtin bit-field operations __builtin_popcount() or __builtin_ctz(). If you make the effort to compile for your own hardware (see README.txt), you should see a moderate performance improvement to FILA. I've seen 8% to 15% depending on the puzzle and the heuristics used.

Conclusions

FILA is a fast, flexible, recursive backtracking algorithm that uses precalculated (fixed) lists of images that are pre-filtered to exclude images incompatible with the cell's location, incompatible with cells that must have been previously filled by a heuristic (POF), or incompatible with occupancy states of the nearest neighbors of the targeted cell (NOF).

Fletcher and de Bruijn used a fixed list of 63 pentomino images that were considered for placement at each targeted cell in the 10x6 pentomino problem, but many of these images collide with puzzle walls. By using a separate list of images at each cell, images that lie partially outside the puzzle bounds can be eliminated. For this puzzle, this reduces the number of images in each list by on average 45.6%.

Fletcher and de Bruijn recognized that by filling a puzzle from left to right using a strict cell selection order, the number of images that had to be considered at each cell was greatly reduced (80%). POF filtering generalizes this technique to any heuristic that targets cells in a predetermined order, eliminating all images that conflict with cells that must have been filled prior to the targeted cell.
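The 80% figure can be illustrated with a small sketch of POF for one target cell. This is my own illustration, not polycube code, and it ignores puzzle walls. It assumes a fill order that proceeds left to right by column and bottom to top within a column, so a cell precedes the target when its $(x, y)$ pair sorts lower.

```python
def normalize(cells):
    """Translate a set of (x, y) cells so its bounding box starts at (0, 0)."""
    min_x = min(x for x, y in cells)
    min_y = min(y for x, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def fixed_polyominoes(n):
    """All n-cell polyominoes, treating rotations and reflections as distinct."""
    shapes = {frozenset({(0, 0)})}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (x + dx, y + dy)
                    if cell not in shape:
                        grown.add(normalize(shape | {cell}))
        shapes = grown
    return shapes


def images_covering_target(shapes):
    """Every translation of every shape that covers the target cell (0, 0)."""
    images = []
    for shape in shapes:
        for cx, cy in shape:
            images.append(frozenset((x - cx, y - cy) for x, y in shape))
    return images


def pof_filter(images):
    """Drop images that use any cell preceding the target in the fill order."""
    return [image for image in images if min(image) == (0, 0)]


candidates = images_covering_target(fixed_polyominoes(5))
survivors = pof_filter(candidates)
print(len(candidates), len(survivors))
```

Each of the 63 shapes covers the target in 5 different ways (315 images in all), but only one of those translations avoids every earlier cell, leaving 63 images: a reduction of 80%.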

Instead of considering all images in a set one-at-a-time, Fletcher walked the cells near a fill target to eliminate images in groups. Instead of walking the whole tree (sections of which often correspond to regions outside the puzzle boundary, or to pieces that are not even available), NOF focuses on just the most important nearest neighbor cells, aggregating their occupancy states into a small index number used to select a set of images built specifically for that compound neighbor occupancy state. This approach eliminates on average an additional two-thirds to three-fourths of the images that don't fit the puzzle.
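A minimal sketch of the NOF idea follows: precompute one image list per neighbor occupancy state, then select a list at solve time by index. All names are hypothetical, images are plain sets of cells rather than bit masks, and the mapping of neighbors to index bits (bit $i$ for `neighbors[i]`) is an arbitrary assumption.

```python
def build_nof_lists(images, neighbors):
    """For each neighbor occupancy state, keep the images that avoid every
    neighbor marked occupied in that state."""
    lists = []
    for state in range(1 << len(neighbors)):
        occupied = {cell for bit, cell in enumerate(neighbors)
                    if (state >> bit) & 1}
        lists.append([image for image in images if not (image & occupied)])
    return lists


def neighbor_state_index(occupied_cells, neighbors):
    """Aggregate the occupancy of the neighbors into a small index number."""
    index = 0
    for bit, cell in enumerate(neighbors):
        if cell in occupied_cells:
            index |= 1 << bit
    return index


# Toy 2D example: target cell (0, 0) with two tracked neighbors.
neighbors = [(1, 0), (0, 1)]
images = [
    frozenset({(0, 0)}),          # monomino
    frozenset({(0, 0), (1, 0)}),  # horizontal domino
    frozenset({(0, 0), (0, 1)}),  # vertical domino
]
nof_lists = build_nof_lists(images, neighbors)

occupied_now = {(1, 0), (5, 5)}
index = neighbor_state_index(occupied_now, neighbors)
print(index, [sorted(image) for image in nof_lists[index]])
```

All of the filtering work happens once in `build_nof_lists`; during the solve only the index computation and a list lookup remain.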

The combination of these three strategies eliminates the vast majority of images that don't fit the target. For example, for the solver configuration that produced the fastest solve times for the Tetris Cube, the F heuristic (Fletcher's heuristic) was used when 3 pieces were left to be placed. The average number of images that had to be considered by a single recursive invocation of the algorithm at that stage was just 11.3, and the number of these images that didn't fit was just 9.2. This is as compared to the 1,416 unique Tetris Cube piece images that would populate these lists if no filtering were used at all. NOF filtering is particularly useful for unordered heuristics (where POF filtering is not possible). For the same Tetris Cube solver configuration, the E heuristic was used when 4 pieces were left to be placed. The average number of images that had to be considered at this stage was only 4.3, with only 3.2 of those images not fitting. So through these simple techniques the ~~lion's~~ whale's share of images that don't fit are eliminated from the algorithm at the cost of checking the occupancy states of at most a few cells. (This is as compared to DLX's higher-cost approach of dynamically maintaining perfect image lists with every piece placement or removal.) The net effect is faster solve times. The NOF feature alone improved solve times 6% to 17% for the puzzles examined here (though I have seen as high as 27% in other puzzles).

References

D. Knuth. Dancing Links. In J. Davies, B. Roscoe, and J. Woodcock, editors, Millennial Perspectives in Computer Science, proceedings of the 1999 Oxford-Microsoft Symposium in honour of Professor Sir Antony Hoare, Cornerstones of Computing, page 432. Palgrave Macmillan, 2000.
J.G. Fletcher. A program to solve the pentomino problem by the recursive use of macros. Communications of the ACM, 8(10):621–623, 1965.
N.G. de Bruijn. Programmeren van de pentomino puzzle. Euclides 47 (1971/72), 90-104.
M. Busche. Solving Polyomino and Polycube Puzzles; Algorithms, Software, and Solutions. Matt's Maniacal Musings, 2011.

Optimal Play of the Farkle Dice Game

matt — Fri, 28 Jul 2017 04:52:48 +0000

I sent a link to my previous blog post on the optimal play of Farkle to Professor Todd Neller, at Gettysburg College. (I thought he might be interested in it since it was largely based on his previous analysis of the simpler dice game Pig.) We ended up talking and decided to write a paper together on optimal Farkle play. Todd presented our paper at The 15th Advances in Computer Games Conference (ACG 2017), Leiden, Netherlands, July 4, 2017. Our paper was voted second place in the best paper competition.

The paper focuses on a more minimalist rule set (whereas my previous blog post solved for Facebook Farkle rules). The optimization equations are much simplified by using a pair of self-referential equations describing pre-roll and post-roll game states. The paper also includes a comparison of optimal play vs max-expected-score play, a mechanism allowing a human to perfectly replicate max-expected-score play, and some simple techniques you can use to win over 49% of your games against an optimal player.

As of the time of this post, the proceedings from the conference have not yet been published, but a link to our paper is provided here for your convenience:

Optimal Play of the Farkle Dice Game

There are some POV-Ray images included in the paper that graphically show the game states from which you should roll. For your viewing pleasure, I've included below links to the images in their original 16 mega-pixel detail.

Maximizing Win Probability in the Game of Farkle

matt — Tue, 02 Aug 2016 20:35:01 +0000

Image by Matt Busche using modified povray source by Piotr Borys

Neller and Presser modeled a simple dice game called Pig as a Markov Decision Process (MDP) and used value iteration to find the optimal game winning strategy¹. Inspired by their approach, I've constructed a variant of an MDP which can be used to calculate the strategy that maximizes the chances of winning 2-player farkle. Due to the three consecutive farkle penalty, an unfortunate or foolish player can farkle repeatedly to achieve an arbitrarily large negative score. For this reason the number of game states is unbounded and a complete MDP model of farkle is not possible. To bound the problem, a limit on the lowest possible banked score is enforced. The calculated strategy is shown to converge exponentially to the optimal strategy as this bound on banked scores is lowered.

Each farkle turn proceeds by iteratively making a pre-roll banking decision, a (contingent) roll of the dice, and a post-roll scoring decision. I modified the classic MDP to include a secondary (post-roll) action to fit this turn model. A reward function that incentivizes winning the game is applied. A similarly modified version of value-iteration (that maximizes the value function for both the pre-roll banking decision, and the post-roll scoring decision) is then used to find an optimal farkle strategy.

With a lower bound of -2500 points for banked scores, there are 423,765,000 distinct game states and so it is not convenient to share the entire strategy in printed form. Instead, I provide some general characterizations of the strategy. For example, if both players use this same strategy, the player going first will win 53.487% of the time. I also provide samples of complete single-turn strategies for various initial banked scores. Currently, only the strategy for Facebook Farkle has been calculated, but the strategy for other scoring variants of farkle could easily be deduced using the same software.

The Rules of Farkle

Markov Decision Processes and Value Iteration

Extending the MDP to Support Farkle

Game State Characterization

The Farkle Value Iteration Equation

Performance

The Strategy

Optimal Strategy Case: $b = 0, d = 0, f = 0, e = 0$

Optimal Strategy Case: $b = 6000, d = 8000, f = 0, e = 0$

Optimal Strategy Case: $b = 8000, d = 6000, f = 0, e = 0$

Optimal Strategy Case: $b = 9000, d = 9500, f = 0, e = 0$

Optimal Strategy Case: $b = 9500, d = 9000, f = 0, e = 0$

Strategy Validation

Comparison to strategy that maximizes expected turn score

Effects of the banked score lower bound

Conclusions

Next Steps

References

The Rules of Farkle

Farkle rules differ only slightly from the rules of Zilch, but are provided here for completeness.

Farkle is played with two or more players and six six-sided dice. Players take turns rolling the dice. The dice in a roll can be worth points either individually or in combination. If any points are available from the roll, the player must set aside some or all of those scoring dice, adding the score from those dice to their point total for the turn. After each roll, a player may either re-roll the remaining dice to try for more points or may bank the points accumulated this turn (though you can never bank fewer than 300 points). When a player banks his points, the player's turn is ended and the dice are passed to the next player.

If no dice in a roll score, then the player loses all points accumulated this turn and their turn is ended. This is called a farkle, a sorrowful event indeed.

If all dice in a roll score, the player gets to continue his turn with all six dice. This is called hot dice and is guaranteed to brighten your day.

A player may continue rolling again and again accumulating ever more points until he either decides to bank those points or loses them all to a farkle.

If a player ends three consecutive turns with a farkle, they not only lose their points from the turn but also lose 500 points from their banked game score. (This is the only way to lose banked points.) After a triple farkle, your consecutive farkle count is reset to zero so you're safe from another triple farkle penalty for at least three more turns.

The game ends when one player has banked a total of at least 10,000 points. Unlike in Zilch, other players do not get a final turn.

Scoring is as follows:

Each 1 is worth 100 points.
Each 5 is worth 50 points.
A set of three 1s is worth 1000 points.
A set of three 2s is worth 200 points.
A set of three 3s is worth 300 points.
A set of three 4s is worth 400 points.
A set of three 5s is worth 500 points.
A set of three 6s is worth 600 points.
Unlike in Zilch, each extra die in a set increases the value of the set by a like amount. So four 4s are worth 800 points, five 5s are worth 1500, and six 1s are worth 4000.
Three pair is worth 750 points. Unlike in Zilch, a set of four 2s and two 4s may not be scored as three pairs.
A six die straight is worth 1500 points.
Each die can only be used once when scoring. (If you roll two 1s, two 2s, and two 3s you can either count the two 1s for 200 or use all six dice for three-pair and 750 points; you can't use the ones both ways for 950 points.)
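The scoring rules listed above can be captured in a short function. This is a sketch of my own, not code from the solver: `best_score` is a hypothetical name, and it returns the most points obtainable from a roll when every scoring die is counted exactly once, even though a player may prefer to set aside fewer dice.

```python
from collections import Counter


def best_score(roll):
    """Maximum points available from a roll, using each die at most once."""
    counts = Counter(roll)
    total = 0
    for face, n in counts.items():
        if n >= 3:
            # A set of three, plus a like amount for each extra die.
            base = 1000 if face == 1 else 100 * face
            total += base * (n - 2)
        elif face == 1:
            total += 100 * n
        elif face == 5:
            total += 50 * n
    if len(roll) == 6:
        if len(counts) == 6:
            total = max(total, 1500)  # six-die straight
        if sorted(counts.values()) == [2, 2, 2]:
            total = max(total, 750)   # three pair (three distinct pairs only)
    return total


print(best_score([1, 1, 2, 2, 3, 3]))  # three pair beats two loose 1s
print(best_score([2, 2, 2, 2, 4, 4]))  # four 2s only; not three pair
print(best_score([2, 3, 4, 6]))        # nothing scores: a farkle
```

A return value of 0 means the roll is a farkle.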

Markov Decision Processes and Value Iteration

Note: those familiar with MDPs may find the non-standard variable names I use to present this standard subject distracting. My aim is to use variable names most meaningful in the final value-iteration equation. I beg your indulgence. As a courtesy, I've included mouse-hover popups where each such non-standard variable name is introduced (highlighted in blue) explaining the motivation for the change.

A Markov Decision Process (MDP)² is a system having a finite set of states $S$. For each state $s \in S$, there is a set of actions $A(s)$ that may be taken. For each action $a \in A(s)$, there is a set of transition probabilities $P_a(s, s')$ defining the probability of transitioning from $s$ to each state $s' \in S$ given that action $a$ was taken while in state $s$. When action $a$ is taken from state $s$, the MDP responds by randomly moving to a new state $s'$ as governed by the transition probabilities $P_a(s, s')$ and then assigning the decision maker a reward $D_a(s, s')$.

The objective is to find a strategy function $G(s)$ that returns the particular action at each state $s \in S$ that will maximize the expected cumulative reward given by:

$$\sum_{k=0}^{\infty} \gamma^k D_{a_k}(s_k, s_{k+1})$$

where $k$ is a discrete time variable, $s_k$ is the game state at time $k$, $a_k$ is the player action taken from state $s_k$, and $\gamma \in [0,1)$ is a constant discount factor for future rewards. The expectation must be taken over all possible state transition paths, and maximized over all possible choices for the actions $a_k$ taken in each state $s_k$. Then $G(s)$ will be defined by the $a_k$ taken from each state $s_k$ that maximizes this expectation.

One technique for solving this problem is value iteration. With this technique, each state $s$ is given a decimal value $W(s)$ which is an estimate of the expected discounted sum of all future rewards gained from state $s$. The estimate for $W(s)$ is iteratively refined for all $s$ by applying this update equation sequentially to all states $s$: $$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{s'} P_a(s, s')(D_a(s, s') + \gamma W_i(s'))\right]$$

Note that at each state $s$, the action which provides maximum total reward is selected. So not only is the estimate of $W(s)$ iteratively refined, but the best action $a$ taken for each state $s$ is also simultaneously improved. Iteration continues until $W_{i+1}(s)$ and $W_i(s)$ converge for all $s$, and $G(s)$ is then the set of actions $a$ selected for each state $s$ in the final iteration.
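As an illustration of the update equation, here is a minimal value-iteration sketch using the same symbols ($P$, $D$, $W$, $G$, $\gamma$). The dictionary layout and the tiny two-action example are assumptions made up for this sketch; they have nothing to do with farkle.

```python
def value_iteration(states, actions, P, D, gamma, tol=1e-12):
    """Sequentially apply the update equation until W stops changing.

    actions[s]  -> list of actions available in s (empty for terminal states)
    P[(a, s)]   -> {s2: probability of moving from s to s2 under action a}
    D[(a, s, s2)] -> reward for that transition
    """
    W = {s: 0.0 for s in states}
    G = {}
    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue  # terminal state: value stays at 0
            best_a, best_v = None, float("-inf")
            for a in actions[s]:
                v = sum(p * (D[(a, s, s2)] + gamma * W[s2])
                        for s2, p in P[(a, s)].items())
                if v > best_v:
                    best_a, best_v = a, v
            delta = max(delta, abs(best_v - W[s]))
            W[s], G[s] = best_v, best_a
        if delta < tol:
            return W, G


# Toy problem: "go" wins a reward of 1 half the time, otherwise stays put;
# "quit" ends the process with no reward.
states = ["s0", "win", "lose"]
actions = {"s0": ["go", "quit"], "win": [], "lose": []}
P = {("go", "s0"): {"win": 0.5, "s0": 0.5}, ("quit", "s0"): {"lose": 1.0}}
D = {("go", "s0", "win"): 1.0, ("go", "s0", "s0"): 0.0,
     ("quit", "s0", "lose"): 0.0}

W, G = value_iteration(states, actions, P, D, gamma=0.9)
print(W["s0"], G["s0"])
```

For this toy problem the fixed point satisfies $W = 0.5 + 0.45 W$, so $W(s_0) = 0.5 / 0.55 \approx 0.909$ and $G(s_0)$ is "go".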

Extending the MDP to Support Farkle

During a farkle turn, a player must iteratively

decide whether to bank his current points,
then, if he decides not to bank, roll the dice,
then decide how to score the dice.

So that's potentially two decisions (actions) for each state transition: a pre-roll banking decision (to bank, or to roll), and a post-roll scoring decision (how to score the dice just rolled). But an MDP (as described above) consists of only one action followed by a random transition. So one may reasonably ask if farkle can even be modeled with an MDP.

To fit farkle to an MDP model, one need only consider the result of a roll as part of the game state, and then reorder the turn sequence to that of a combined scoring-and-bank/roll action followed by either an end-of-turn-event or another random-roll-event. But this increases the number of game states by at least a few orders of magnitude and makes the problem unsolvable without a commensurate increase in computer resources.

Alternatively, the roll-decision could be splintered to include any conceivable combination of scoring instructions, directing the MDP how to score each potential roll before the roll is even made, thereby allowing the state machine to transition without additional input from the player once the roll is made. Aside from being a painful way of thinking about the problem, the number of possible scoring instructions is enormous, and the approach is again not feasible.

Rather than forcing the game to fit the MDP model, I instead define an extended MDP (EMDP) to more naturally model the game. Like an MDP, an EMDP has a finite set of states $S$. For each state $s \in S$, there is a set of primary actions $A(s)$ that may be taken. For each primary action $a \in A(s)$, there is a set of sets of secondary actions $R_a(s)$. Once primary action $a$ is taken, one set of secondary actions $r \in R_a(s)$ is selected randomly by the EMDP according to a probability distribution $P_a(s)$, with $P_{a,r}(s)$ denoting the probability that set $r$ is selected. So instead of transitioning, the EMDP responds to action $a$ by offering a randomly chosen set of secondary actions that may be taken. For each secondary action $c \in r$, a deterministic transition state $s'$ is defined by a transition matrix $s' = X_{a,c}(s)$. Selection of action $c$ causes the EMDP to transition to $s'$ and reward $D_{a,c}(s, s')$ is granted.

Because there are two actions to be taken in a turn, the optimal strategy also has two parts: $G_A(s)$ is the optimal primary action in state $s$, and $G_C(s, r)$ is the optimal secondary action to take in state $s$ given that secondary action set $r$ was randomly offered by the EMDP in response to action $a$.

Value iteration processing is also extended to account for the secondary action: $$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} \Big[ D_{a,c}(s, s') + \gamma W_i(s') \Big] \right]$$

Neller and Presser chose a reward function to incentivize winning the game¹. For any transition from a non-winning game state $s$ to a winning game state $s_w$, $D_{a,c}(s, s_w) = 1$ (where $s_w$ is any state where the player's banked points plus his turn points meet or exceed the game goal of 10,000 points, and where his turn points meet or exceed the minimum banking threshold). For transitions to any non-winning state $s_v$, $D_{a,c}(s, s_v) = 0$. Because all game winning states $s_w$ are terminal, all future rewards from such states must be zero, so $W(s_w) = 0$. Because $W$ is known for all game winning states $s_w$, $W(s_w)$ is never updated during value iteration. (I.e., although a game winning state can appear on the right side of the value iteration update equation, it never appears on the left.)

Normally for an MDP, $0 \le \gamma < 1$, but we do not wish to value a game you win in 30 turns less than a game you win in 10. Following Neller and Presser's approach, I instead set $\gamma = 1$. In general this can prevent value iteration from converging, but it does not cause a problem for farkle. (I think this is because there is no circular state transition path offering unbounded rewards, and because only the player that gets to a game winning state first actually wins, which ensures that the optimal strategy can't be attracted to some infinitely long path to a game winning state.) With $\gamma = 1$, $W(s)$ converges to the probability of winning from any non-terminal state $s$ when using an optimal strategy; the $a$ selected in the $\max_a$ operation converges to the optimal banking strategy from state $s$: $a = G_A(s)$; and the $c$ selected by the $\max_c$ operation converges to the optimal scoring strategy from state $s$ given that roll $r$ was thrown: $c = G_C(s,r)$.

Given that $\gamma = 1$, simplifications can be achieved if you move the reward for transitions to a game winning state out of the reward function, and into the value function for those same game winning states. That is, instead of defining $W(s_w) = 0$, define $W(s_w) = 1$, and set $D_{a,c}(s, s') = 0$ everywhere, allowing the reward function to be completely eliminated from the update equation. This definition also results in $W(s)$ consistently being the probability of winning the game from any state $s$ (including terminal game winning states).

Applying the simplifications of $\gamma = 1$, $D_{a,c}(s, s') = 0$, and $W(s_w) = 1$ reduces the extended value iteration update formula to:

$$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} W_i(s') \right]$$
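To make the simplified update concrete, here is a minimal Python sketch of extended value iteration on a tiny made-up EMDP. The data layout (`emdp`, `terminal`), the function name, and the toy game itself are assumptions chosen for illustration; this is not the farkle solver described later.

```python
def extended_value_iteration(emdp, terminal, tol=1e-12, max_iters=10_000):
    """Iterate W_{i+1}(s) = max_a sum_r P_{a,r}(s) * max_{c in r} W_i(s').

    emdp[s][a] is a list of (probability, successors) pairs. Each pair is one
    secondary-action set r; each entry of `successors` is the deterministic
    next state s' reached by one secondary action c in r.
    terminal maps terminal states to fixed values (1.0 for a winning state).
    """
    W = {s: 0.0 for s in emdp}
    W.update(terminal)
    for _ in range(max_iters):
        new_W = dict(terminal)  # terminal values are never updated
        for s, actions in emdp.items():
            new_W[s] = max(
                sum(p * max(W[s2] for s2 in successors) for p, successors in outcomes)
                for outcomes in actions.values()
            )
        delta = max(abs(new_W[s] - W[s]) for s in emdp)
        W = new_W
        if delta < tol:
            break
    return W


# Hypothetical toy game: "risky" offers {win, mid} half the time and {lose}
# otherwise; "steady" always moves to "mid", which wins 25% of the time.
toy_emdp = {
    "start": {
        "risky": [(0.5, ["win", "mid"]), (0.5, ["lose"])],
        "steady": [(1.0, ["mid"])],
    },
    "mid": {"flip": [(0.25, ["win"]), (0.75, ["lose"])]},
}
toy_terminal = {"win": 1.0, "lose": 0.0}
toy_values = extended_value_iteration(toy_emdp, toy_terminal)
```

In this toy game the inner max picks "win" over "mid" whenever both are offered, and the outer max then prefers "risky" over "steady".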

This general approach for finding the optimal play strategy for games having both pre-roll and post-roll actions is further detailed for the specifics of 2-player farkle in the sections below.

Game State Characterization

The current game state $s$ is characterized by six component state variables:

$$s = (t, n, b, d, f, e)$$

where $t$ is the number of points accumulated only from your current turn. Once $b + t \gt 9950$ (which means you've hit the goal of 10,000 points) and you've met the minimum requirement to bank $t > 250$, the game is over, so for non-terminal game states we have:

$$t \in \{0, 50, 100, ..., \max [250, 9950-b]\}\text{.}$$

$n$ is the number of dice you have to roll

$$n \in \{1, 2, 3, 4, 5, 6\}\text{,}$$

$b$ is your banked score for which I enforce a lower bound $L$

$$b \in \{L, L+50, ..., -100, -50, 0, 50, 100, ..., 9950\}\text{,}$$

$d$ is your opponent's banked score which is also lower bounded to $L$

$$d \in \{L, L+50, ..., -100, -50, 0, 50, 100, ..., 9950\}\text{,}$$

$f$ is your consecutive farkle count (from previous turns)

$$f \in \{0, 1, 2\}\text{, and}$$

$e$ is your opponent's consecutive farkle count

$$e \in \{0, 1, 2\}\text{.}$$

The Farkle Value Iteration Equation

In this section we apply the farkle component state variables and rules defined in previous sections to the EMDP value iteration equation:

$$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} W_i(s') \right]$$

Let's detail $A(s)$ first. For farkle, $A(s)$ is the set of available banking actions having at most two members: BANK and ROLL. For game states where you have so far accumulated fewer than 300 turn points ($t < 300$), you have only one pre-roll (primary) action available to you: ROLL the dice. But for all other game states you have two pre-roll actions available: BANK, or ROLL. This yields:

\begin{equation} W_{i+1}(s) := \left\{ \begin{array}{ll} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s'), & \text{if $t < 300$}.\\ \\ \max \bigg[\sum\limits_{r \in R_{\text{BANK}}(s)} P_{{\text{BANK}},r}(s) \max\limits_{c \in r} W_i(s'), \\ \hspace{30pt} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') \bigg], & \text{if $t \ge 300$}.\\ \\ \end{array} \right. \end{equation}

In the case of a bank action the equation collapses. $R_{\text{BANK}}(s)$ has only one entry: $r_{\text{BANK}}$. There's only one member of the probability distribution: $P_{{\text{BANK}}, r_{\text{BANK}}}(s) = 1$. Also, $r_{\text{BANK}}$ has only one entry: $c_{\text{BANK}}$. So for the case of $t \ge 300$, the first member of the outer max operation reduces to just $W_i(s')$. The mathematical machinery is flexible enough to handle the banking case, but is entirely unnecessary for it. But what exactly is $s'$ after you bank?

Looking back at the game state characterization from the previous section, there is no variable that encodes whose turn it is. Everything I've written so far is from the perspective of the player who controls the dice, and there is no $s'$ expressible in terms of our six component state variables that identifies your state after a banking operation. (This is by design and is consistent with Neller and Presser's approach for optimizing pig game play strategy.) After you bank, it becomes the turn of your opponent, who we assume is also playing the optimal strategy, and we can express his state after you bank. If your game state just before you banked was $s = (t, n, b, d, f, e)$, then your opponent's state after you bank will be $o' = (0, 6, d, b + t, e, 0)$. To be clear, after you bank your opponent will have 0 turn points, 6 dice to roll, a banked score of $d$, an opponent's banked score of $b + t$ (which is your new banked score), $e$ consecutive farkles, and his opponent will have $0$ consecutive farkles (your farkle count reverting to zero having just banked). Your opponent's win probability is $W_i(o')$, which means your win probability after the bank must be $W_i(s') = 1 - W_i(o') = 1 - W_i(0, 6, d, b + t, e, 0)$, which yields:

\begin{equation} W_{i+1}(s) := \left\{ \begin{array}{ll} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s'), & \text{if $t < 300$}.\\ \\ \max \bigg[\Big(1 - W_i(0,6,d,b+t,e,0)\Big), \\ \hspace{30pt} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') \bigg], & \text{if $t \ge 300$}.\\ \\ \end{array} \right. \end{equation}
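The change of perspective after a bank can be sketched in a few lines of Python. The function names are hypothetical, and `W` is assumed to be any callable that maps a state tuple $(t, n, b, d, f, e)$ to the current win probability estimate for the player about to act.

```python
def opponent_state_after_bank(s):
    """Return o' = (0, 6, d, b + t, e, 0): the state seen by the opponent
    once the player in state s = (t, n, b, d, f, e) banks."""
    t, n, b, d, f, e = s
    return (0, 6, d, b + t, e, 0)


def value_of_banking(W, s):
    """Win probability for the banking player: 1 - W(o')."""
    return 1.0 - W(opponent_state_after_bank(s))
```

For example, banking 350 turn points with 1000 already banked hands the opponent a state in which his opponent's score reads 1350 and that opponent's farkle count has reset to 0.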

In the case of a ROLL action, $R_{\text{ROLL}}(s)$ corresponds to the set of all possible rolls of the dice from state $s$. More precisely, each roll $r$ is defined as a set of possible scoring decisions for some permutation of thrown dice. The $\sum_r$ operation is summing over each possible roll $r \in R_{\text{ROLL}}(s)$. $P_{\text{ROLL},r}(s)$ is the probability of making roll $r$ from game state $s$. And each $c \in r$ is one possible scoring decision given that roll $r$ was thrown. Given scoring decision $c$, the new game state $s'$ is determined. The $\max_c$ operation maximizes the expected win probability $W_i(s')$ given that roll $r$ was thrown over these possible scoring decisions.

First note that the set of potential rolls from game state $s = (t, n, b, d, f, e)$ is only dependent on the number of dice you are rolling, so:

$$R_{\text{ROLL}}(s) = R_{\text{ROLL}}(n)$$

Second note that

$$P_{\text{ROLL},r}(s) = {1 \over {6^n}}$$

Third, because the expression for $W_i(s')$ is fundamentally different for the case of farkling rolls vs. scoring rolls, it is convenient to partition $R_{\text{ROLL}}(n)$ into two subsets: the subset of all farkling rolls $R_{\text{FARKLE}}(n)$, and the subset of all scoring rolls $R_{\text{SCORE}}(n)$:

$$R_{\text{ROLL}}(n) = R_{\text{FARKLE}}(n) \bigcup R_{\text{SCORE}}(n)$$

Fourth, if roll $r$ is a farkle, then $r$ will have only one member: a zero point farkle scoring decision, and the $\max_c$ operation can be dropped.

Applying these four observations gives:

\begin{equation} \begin{array}{rl} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') &= \sum\limits_{r \in R_{\text{FARKLE}}(n)} {1 \over {6^n}} W_i(s') + \sum\limits_{r \in R_{\text{SCORE}}(n)} {1 \over {6^n}} \max\limits_{c \in r} W_i(s') \\ &= {1 \over {6^n}} \sum\limits_{r \in R_{\text{FARKLE}}(n)} W_i(s') + {1 \over {6^n}} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} W_i(s') \end{array} \end{equation}

After a farkle, it becomes your opponent's turn and there is again no expression for the new game state. We again instead express $W_i(s')$ in terms of your opponent's win probability after your farkle:

$$W_i(s') = 1 - W_i(0, 6, d, b', e, f')$$

where $b'$ is your new banked score after your farkle, which may have decreased due to the three consecutive farkle penalty, but is enforced to always be at least $L$ to keep the problem tractable:

$$b' = \max [ L, b - Y_f ]\text{;}$$

where $Y_f$ is the number of points you lose from your banked score when you farkle while already having $f$ consecutive farkles

$$\begin{align*} Y_0 &= 0 \\ Y_1 &= 0 \\ Y_2 &= 500\text{;} \end{align*}$$

and where $f'$ is your new consecutive farkle count which normally just increments, but is reset back to zero if you just had your third consecutive farkle

$$f' = (f+1) \mod 3\text{.}$$
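As a small illustration of the farkle bookkeeping, the following Python sketch computes the opponent's state after you farkle. The function name is hypothetical, and the default floor of $-2500$ is only an assumed value of $L$ for the example.

```python
# Y_f: points lost when farkling with f consecutive farkles already on record.
FARKLE_PENALTY = {0: 0, 1: 0, 2: 500}


def opponent_state_after_farkle(s, floor=-2500):
    """Return (0, 6, d, b', e, f') for a farkle thrown from s = (t, n, b, d, f, e).

    b' = max(L, b - Y_f) and f' = (f + 1) mod 3; the turn points t are lost.
    """
    t, n, b, d, f, e = s
    new_b = max(floor, b - FARKLE_PENALTY[f])
    new_f = (f + 1) % 3
    return (0, 6, d, new_b, e, new_f)
```

A third consecutive farkle ($f = 2$) both costs 500 banked points and resets the count to 0, while the floor keeps the banked score from dropping below $L$.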

Note also that for all farkling rolls the expression inside the sum is the same, so we can replace the sum with a multiplicative factor equaling the number of ways to roll a farkle with $n$ dice. Combining that count with the ${1 \over {6^n}}$ factor gives $F_n$, the probability of farkling with $n$ dice³:

$$\begin{align*} F_1 &= 2/3 \\ F_2 &= 4/9 \\ F_3 &= 5/18 \\ F_4 &= 17/108 \\ F_5 &= 25/324 \\ F_6 &= 5/216 \\ \end{align*}$$
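These fractions can be checked by brute-force enumeration. The Python sketch below assumes a roll scores whenever it contains a 1, a 5, three or more of a kind, or (with six dice) three pairs; that rule set is an assumption made for this example, and the scoring rules defined earlier in the article remain authoritative. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_farkle(roll):
    """True if the roll contains no scoring combination (assumed rules)."""
    counts = Counter(roll)
    if 1 in counts or 5 in counts:
        return False  # single 1s and 5s always score
    if max(counts.values()) >= 3:
        return False  # three or more of a kind
    if len(roll) == 6 and all(c == 2 for c in counts.values()):
        return False  # three pairs
    return True


def farkle_probability(n):
    """Exact probability of farkling when rolling n dice."""
    farkles = sum(is_farkle(roll) for roll in product(range(1, 7), repeat=n))
    return Fraction(farkles, 6 ** n)
```

A six-die straight needs no separate test here because it always contains a 1 and a 5.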

Combining the above observations gives this substitution:

$${1 \over {6^n}} \sum\limits_{r \in R_{\text{FARKLE}}(n)} W_i(s') = F_n (1 - W_i(0,6,d,b',e,f'))$$

To detail the expression for the case of scoring rolls, first let $C_T(c)$ be the number of points taken with scoring combination $c$, and let $C_N(c)$ be the number of dice used with scoring combination $c$. So after rolling roll $r \in R_{\text{SCORE}}(n)$ and selecting scoring action $c \in r$ the state transitions from $s = (t, n, b, d, f, e)$ to

$$s' = (t', n', b, d, f, e)$$

where

$$ \begin{align*} t' &= t+C_T(c) \\ n' &= h(n-C_N(c)) \end{align*} $$

and where $h(x)$ is a hot-dice function for resetting the number of available dice back to $6$ when all dice are successfully scored:

$$h(n)=\begin{cases}6, & \text{for $n = 0$.} \\ n, & \text{otherwise.}\end{cases}$$
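The scoring transition is simple enough to sketch directly. In this Python fragment the function names are hypothetical; `points` and `dice_used` stand for $C_T(c)$ and $C_N(c)$.

```python
def hot_dice(n):
    """h(n): all six dice come back when every die has been scored."""
    return 6 if n == 0 else n


def state_after_scoring(s, points, dice_used):
    """Return s' = (t + C_T(c), h(n - C_N(c)), b, d, f, e)."""
    t, n, b, d, f, e = s
    return (t + points, hot_dice(n - dice_used), b, d, f, e)
```

Scoring a single 5 from the opening state $(0, 6, 0, 0, 0, 0)$ gives $(50, 5, 0, 0, 0, 0)$, and using up all remaining dice returns the count to 6.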

This gives our final value iteration equation, where (below) I repeat all the supporting equations for convenience:

$$ W_{i+1}(t,n,b,d,f,e) := \left\{ \begin{array}{ll} F_n (1 - W_i(0,6,d,b',e,f')) + \\ \hspace{30pt} {1 \over 6^n} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} \left[ W_i(t',n',b,d,f,e)\right], & \text{if $t < 300$}.\\ \\ \max \bigg[ \Big(1 - W_i(0,6,d,b+t,e,0)\Big), \\ \hspace{30pt} \Big( F_n (1 - W_i(0,6,d,b',e,f')) + \\ \hspace{60pt} {1 \over 6^n} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} \left[ W_i(t',n',b,d,f,e) \right] \Big) \bigg], & \text{if $t \ge 300$}. \end{array} \right. $$

where

$$ \begin{align*} b' &= \max [ L, b - Y_f ] \\ f' &= (f+1) \mod 3 \\ t' &= t+C_T(c) \\ n' &= h(n-C_N(c)) \\ h(n) &=\begin{cases}6, & \text{for $n = 0$.} \\ n, & \text{otherwise.}\end{cases} \\ Y_0 &= 0 \\ Y_1 &= 0 \\ Y_2 &= 500 \\ F_1 &= 2/3 \\ F_2 &= 4/9 \\ F_3 &= 5/18 \\ F_4 &= 17/108 \\ F_5 &= 25/324 \\ F_6 &= 5/216 \\ \end{align*} $$
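Putting the pieces together, one application of the update to a single state might look like the Python sketch below. It is an illustration under stated assumptions, not the C++ solver: `W` is assumed to be a callable returning the current estimate for any state (including 1 for game winning states), and `score_sets[n]` is a hypothetical precomputed list of `(roll_count, options)` pairs, where `roll_count` is how many of the $6^n$ ordered rolls offer exactly those `(points, dice_used)` options.

```python
F = {1: 2 / 3, 2: 4 / 9, 3: 5 / 18, 4: 17 / 108, 5: 25 / 324, 6: 5 / 216}
Y = {0: 0, 1: 0, 2: 500}


def hot_dice(n):
    return 6 if n == 0 else n


def updated_value(W, s, score_sets, floor=-2500):
    """Compute W_{i+1}(s) from the current estimate W for s = (t, n, b, d, f, e)."""
    t, n, b, d, f, e = s
    b_new = max(floor, b - Y[f])   # b'
    f_new = (f + 1) % 3            # f'
    # Farkle term: the opponent takes over from (0, 6, d, b', e, f').
    roll_value = F[n] * (1.0 - W((0, 6, d, b_new, e, f_new)))
    # Scoring term: pick the best scoring decision for each group of rolls.
    for roll_count, options in score_sets[n]:
        best = max(
            W((t + points, hot_dice(n - used), b, d, f, e))
            for points, used in options
        )
        roll_value += roll_count / 6 ** n * best
    if t < 300:
        return roll_value          # banking is not allowed yet
    bank_value = 1.0 - W((0, 6, d, b + t, e, 0))
    return max(bank_value, roll_value)


# One-die example: a 1 scores 100 and a 5 scores 50, each on one ordered roll.
one_die_sets = {1: [(1, [(100, 1)]), (1, [(50, 1)])]}
```

Grouping rolls that share the same scoring options is only a bookkeeping convenience; the sum is the same as iterating over all $6^n$ ordered rolls.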

Performance

With a lower bound of $L=-2500$ there are 423,765,000 game states. Each state is modeled with a double precision floating point number requiring 8 bytes, so the entire state matrix requires 3.39012 billion bytes of RAM (plus some array overhead). With game goal $g$ the number of states grows as $(g-L)^3$. Lowering $L$ from -2500 to, say, -10,000 (to eliminate all reasonable doubt that you'll ever venture into portions of the calculated strategy that are non-optimal) will increase the number of game states to over 1.7 billion and memory requirements to 14 GB (which is more than I have on any of my computers).
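The state counts quoted above follow directly from the ranges of the six state variables. Here is a short Python sketch of that arithmetic; the function name and its parameters are hypothetical.

```python
def count_states(floor, goal=10_000, step=50):
    """Count non-terminal states (t, n, b, d, f, e) for lower bound L = floor."""
    max_score = goal - step                      # 9950
    banked_values = range(floor, max_score + step, step)
    # For each banked score b, t runs over 0, 50, ..., max(250, 9950 - b).
    turn_states = sum(max(250, max_score - b) // step + 1 for b in banked_values)
    # Multiply by the choices for d, n (6), f (3), and e (3).
    return turn_states * len(banked_values) * 6 * 3 * 3


states = count_states(-2500)
bytes_needed = states * 8  # one 8-byte double per state
```

With `floor=-2500` this gives 423,765,000 states and 3,390,120,000 bytes; with `floor=-10_000` it gives 1,732,644,000 states.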

The value iteration software to solve the optimal farkle problem was written in C++. Obvious optimizations were made. For example, I don't actually iterate over possible rolls, but only over a precalculated set of unique score sets (which is orders of magnitude smaller in count than rolls), weighting each score set by the number of rolls that share that same set. I let the program iterate over all game states until the maximum relative change over all states from one iteration to the next was less than 1 part in a billion. (I.e., iterations continued until no state had a change in value from one iteration to the next of more than 1 part in a billion, so that even those states with extremely small win probabilities were calculated with precision.) With the floor set to -2500 points, it took 62 iterations for the matrix to converge. Using one core of a dual-core Intel Core i3-4130T, the software performed 1.30 million state updates per second, each pass of the matrix taking 5 minutes 26 seconds, and convergence taking 5 hours 37 minutes.
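A relative-change stopping rule of this kind can be sketched as follows. This Python fragment is only an assumption about how such a test might be written (here the change is measured relative to the new value); the actual C++ code may define it differently.

```python
def max_relative_change(old, new):
    """Largest |new - old| / |new| over all states.

    A state whose value moves from nonzero to exactly zero counts as an
    infinite relative change; a state that stays at zero counts as no change.
    """
    worst = 0.0
    for o, v in zip(old, new):
        if v != 0.0:
            worst = max(worst, abs(v - o) / abs(v))
        elif o != 0.0:
            worst = float("inf")
    return worst


def has_converged(old, new, tol=1e-9):
    """True once no state changed by more than 1 part in a billion."""
    return max_relative_change(old, new) < tol
```

The point of a relative test is that a tiny win probability such as 1e-12 changing to 1.1e-12 still counts as unconverged, even though the absolute change is far below 1e-9.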

The Strategy

It is not practical to list the win probabilities and banking rules for all of the nearly half a billion game states in two player farkle. Here I provide only a limited view into the complete strategy by means of five examples. Each of the five subsections below provides a two-dimensional slice of the six-dimensional strategy matrix, sufficient for optimal play of a single turn for a particular start-of-turn game state.

Optimal Strategy Case: $b = 0, d = 0, f = 0, e = 0$

Table 1 shows the win probabilities and banking actions needed to play an opening turn optimally. Each cell shows the win probability for a particular turn score (starting at $t=0$ in the top row and increasing down the page) and number of dice to roll (starting at $n=6$ in the first column and decreasing as you move to the right). For all entries in this table, the other four component state variables are fixed: your banked score ($b$) at 0 points, your opponent's banked score ($d$) at 0 points, your consecutive farkle count ($f$) at 0, and your opponent's consecutive farkle count ($e$) at 0. States shaded green are states from which your optimal banking action is to roll. States shaded red are states from which your optimal banking action is to bank. States with an asterisk are inaccessible and can be ignored (although the software still calculates your win probability in these states and the optimal play out of them, which would be useful if, say through a disruption in the laws of the universe, you somehow found yourself in such a state).

Table 1. Win probabilities and banking actions for optimal turn play for b=0, d=0, f=0, and e=0.

t \ n	6	5	4	3	2	1
0	0.534870	0.506721^*	0.493622^*	0.487801^*	0.486163^*	0.489950^*
50	0.540099^*	0.511005	0.495849^*	0.489177^*	0.487586^*	0.491718^*
100	0.545359^*	0.515776	0.499374	0.490798^*	0.489034^*	0.493525^*
150	0.550709^*	0.520616^*	0.503815	0.493135	0.490494^*	0.495375^*
200	0.556198^*	0.525472^*	0.508564	0.497158	0.492462	0.497250^*
250	0.561811^*	0.530484^*	0.513285^*	0.502074	0.495439	0.499128
300	0.567448	0.535760^*	0.518054^*	0.506680	0.503290	0.503290
350	0.573082	0.541181	0.523166^*	0.511296^*	0.509711	0.509711
400	0.578710	0.546604	0.528579	0.516146	0.516146	0.516146
450	0.584332	0.552028	0.533996	0.522592	0.522592	0.522592
500	0.589966	0.557452	0.539418	0.529048	0.529048	0.529048
550	0.595675	0.562873	0.544842	0.535513	0.535513	0.535513
600	0.601436	0.568338	0.550268	0.541986	0.541986	0.541986
650	0.607195	0.573937	0.555693	0.548463	0.548463	0.548463
700	0.612941	0.579588	0.561115	0.554944	0.554944	0.554944
750	0.618671	0.585232	0.566534	0.561427	0.561427	0.561427
800	0.624400	0.590866	0.571948	0.567910	0.567910	0.567910
850	0.630173	0.596489	0.577354	0.574392	0.574392	0.574392
900	0.635987	0.602137	0.582753	0.580870	0.580870	0.580870
950	0.641793	0.607914	0.588142	0.587343	0.587343	0.587343
1000	0.647615	0.613782	0.593809	0.593809	0.593809	0.593809
1050	0.653417	0.619633	0.600266	0.600266	0.600266	0.600266
1100	0.659192	0.625466	0.606713	0.606713	0.606713	0.606713
1150	0.664939	0.631280	0.613147	0.613147	0.613147	0.613147
1200	0.670657	0.637072	0.619567	0.619567	0.619567	0.619567
1250	0.676345	0.642842	0.625970	0.625970	0.625970	0.625970
1300	0.682002	0.648587	0.632356	0.632356	0.632356	0.632356
1350	0.687625	0.654306	0.638721	0.638721	0.638721	0.638721
1400	0.693212	0.659998	0.645065	0.645065	0.645065	0.645065
1450	0.698762	0.665660	0.651385	0.651385	0.651385	0.651385
1500	0.704289	0.671292	0.657681	0.657681	0.657681	0.657681
1550	0.709840	0.676892	0.663950	0.663950	0.663950	0.663950
1600	0.715350	0.682458	0.670190	0.670190	0.670190	0.670190
1650	0.720818	0.687989	0.676400	0.676400	0.676400	0.676400
1700	0.726243	0.693483	0.682579	0.682579	0.682579	0.682579
1750	0.731622	0.698938	0.688723	0.688723	0.688723	0.688723
1800	0.736955	0.704354	0.694832	0.694832	0.694832	0.694832
1850	0.742239	0.709727	0.700904	0.700904	0.700904	0.700904
1900	0.747472	0.715058	0.706936	0.706936	0.706936	0.706936
1950	0.752655	0.720345	0.712927	0.712927	0.712927	0.712927
2000	0.757846	0.725587	0.718875	0.718875	0.718875	0.718875
2050	0.763028	0.730782	0.724781	0.724781	0.724781	0.724781
2100	0.768162	0.735929	0.730640	0.730640	0.730640	0.730640
2150	0.773241	0.741026	0.736453	0.736453	0.736453	0.736453
2200	0.778262	0.746072	0.742217	0.742217	0.742217	0.742217
2250	0.783224	0.751066	0.747932	0.747932	0.747932	0.747932
2300	0.788127	0.756006	0.753594	0.753594	0.753594	0.753594
2350	0.792969	0.760892	0.759204	0.759204	0.759204	0.759204
2400	0.797788	0.765722	0.764759	0.764759	0.764759	0.764759
2450	0.802600	0.770495	0.770258	0.770258	0.770258	0.770258
2500	0.807367	0.775700	0.775700	0.775700	0.775700	0.775700
2550	0.812069	0.781084	0.781084	0.781084	0.781084	0.781084
2600	0.816707	0.786408	0.786408	0.786408	0.786408	0.786408
2650	0.821278	0.791671	0.791671	0.791671	0.791671	0.791671
2700	0.825781	0.796871	0.796871	0.796871	0.796871	0.796871
2750	0.830218	0.802008	0.802008	0.802008	0.802008	0.802008
2800	0.834586	0.807081	0.807081	0.807081	0.807081	0.807081
2850	0.838886	0.812087	0.812087	0.812087	0.812087	0.812087
2900	0.843116	0.817025	0.817025	0.817025	0.817025	0.817025
2950	0.847277	0.821894	0.821894	0.821894	0.821894	0.821894
3000	0.851367	0.826696	0.826696	0.826696	0.826696	0.826696
3050	0.855387	0.831428	0.831428	0.831428	0.831428	0.831428
3100	0.859336	0.836090	0.836090	0.836090	0.836090	0.836090
3150	0.863214	0.840682	0.840682	0.840682	0.840682	0.840682
3200	0.867019	0.845202	0.845202	0.845202	0.845202	0.845202
3250	0.870753	0.849650	0.849650	0.849650	0.849650	0.849650
3300	0.874415	0.854025	0.854025	0.854025	0.854025	0.854025
3350	0.878005	0.858327	0.858327	0.858327	0.858327	0.858327
3400	0.881522	0.862555	0.862555	0.862555	0.862555	0.862555
3450	0.884967	0.866709	0.866709	0.866709	0.866709	0.866709
3500	0.888343	0.870788	0.870788	0.870788	0.870788	0.870788
3550	0.891652	0.874791	0.874791	0.874791	0.874791	0.874791
3600	0.894888	0.878719	0.878719	0.878719	0.878719	0.878719
3650	0.898052	0.882571	0.882571	0.882571	0.882571	0.882571
3700	0.901145	0.886347	0.886347	0.886347	0.886347	0.886347
3750	0.904167	0.890047	0.890047	0.890047	0.890047	0.890047
3800	0.907118	0.893671	0.893671	0.893671	0.893671	0.893671
3850	0.909999	0.897219	0.897219	0.897219	0.897219	0.897219
3900	0.912809	0.900691	0.900691	0.900691	0.900691	0.900691
3950	0.915548	0.904088	0.904088	0.904088	0.904088	0.904088
4000	0.918219	0.907409	0.907409	0.907409	0.907409	0.907409
4050	0.920820	0.910655	0.910655	0.910655	0.910655	0.910655
4100	0.923353	0.913826	0.913826	0.913826	0.913826	0.913826
4150	0.925819	0.916920	0.916920	0.916920	0.916920	0.916920
4200	0.928219	0.919939	0.919939	0.919939	0.919939	0.919939
4250	0.930560	0.922884	0.922884	0.922884	0.922884	0.922884
4300	0.932847	0.925756	0.925756	0.925756	0.925756	0.925756
4350	0.935068	0.928555	0.928555	0.928555	0.928555	0.928555
4400	0.937226	0.931282	0.931282	0.931282	0.931282	0.931282
4450	0.939321	0.933937	0.933937	0.933937	0.933937	0.933937
4500	0.941354	0.936521	0.936521	0.936521	0.936521	0.936521
4550	0.943326	0.939035	0.939035	0.939035	0.939035	0.939035
4600	0.945236	0.941479	0.941479	0.941479	0.941479	0.941479
4650	0.947087	0.943854	0.943854	0.943854	0.943854	0.943854
4700	0.948879	0.946160	0.946160	0.946160	0.946160	0.946160
4750	0.950612	0.948400	0.948400	0.948400	0.948400	0.948400
4800	0.952287	0.950573	0.950573	0.950573	0.950573	0.950573
4850	0.953905	0.952679	0.952679	0.952679	0.952679	0.952679
4900	0.955468	0.954721	0.954721	0.954721	0.954721	0.954721
4950	0.956977	0.956699	0.956699	0.956699	0.956699	0.956699
5000	0.958614	0.958614	0.958614	0.958614	0.958614	0.958614

The game begins in the upper left cell at $t=0$ and $n=6$. The win probability listed here means that if both players play optimally, the player going first wins 53.487% of the time. To play a turn, first you must choose your banking action. Since cell (t=0, n=6) is green, the optimal banking action is to roll.

Second, you roll the dice. Let's say for your opening roll, you throw:

$$6, 5, 3, 3, 3, 2$$

Third, you must choose how to score the dice. Here you have three choices:

take the 5 for 50 points,
take the three 3s for 300 points, or
take the 5 and the three 3s for 350 points.

To determine your optimal scoring choice, you must look at the win probabilities from the three states corresponding to each scoring option:

$$ \begin{align*} W(t=50, n=5) &= 0.511005\\ W(t=300, n=3) &= 0.506680\\ W(t=350, n=2) &= 0.509711 \end{align*} $$

So in this case, you should score only the 5 since it moves you to the state with the best probability of ultimately winning the game. This ends one primary-action, random-response, secondary-action sequence effecting one state change in the EMDP. This process is then simply repeated (starting with another banking decision from the new state (t=50, n=5)) until you either roll a farkle, or end in a state where you're supposed to bank.
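The comparison above amounts to taking an argmax over the candidate successor states. A minimal Python sketch, using the three win probabilities quoted from Table 1 (the dictionary layout and function name are just for illustration):

```python
# (turn points t', dice to roll n') -> win probability from Table 1
options = {
    (50, 5): 0.511005,   # take the 5
    (300, 3): 0.506680,  # take the three 3s
    (350, 2): 0.509711,  # take the 5 and the three 3s
}


def best_scoring_option(options):
    """Return the successor state with the highest win probability."""
    return max(options, key=options.get)


best = best_scoring_option(options)
```

Here `best` is the state $(t=50, n=5)$, matching the choice to score only the 5.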

It is interesting to see how the optimal play strategy for the opening turn differs from the strategy that simply maximizes your expected farkle turn score. To maximize expected score you wouldn't bank with 6 dice available to roll unless you had 16,400 or more points on the turn. In contrast, to optimize your chances of winning a game, you would bank 5000 or more points on your opening turn even when you have six dice available to roll. The banking threshold for 5 dice is similarly reduced from 3050 to a more conservative 2500. The banking thresholds for the other dice counts do not differ.

Note that if you manage to bank 1000 points on your opening turn, your chances of winning increase to almost 60%; banking 1850 points on your first turn leaves you with a better than 70% chance of ultimately winning; bank 2750 points and your chances of winning are above 80%; and manage to put up 3900 points on the opening turn and your win probability is over 90%.

Optimal Strategy Case: $b = 6000, d = 8000, f = 0, e = 0$

As a second example, consider the case of Table 2 which shows how to optimally play a turn where you start with 6000 points and are 2000 points behind your opponent who has 8000 points.

Table 2. Win probabilities and banking actions for optimal turn play for b=6000, d=8000, f=0, and e=0.

t \ n	6	5	4	3	2	1
0	0.162365	0.135452^*	0.124658^*	0.118916^*	0.116823^*	0.120172^*
50	0.167053^*	0.138098	0.126521^*	0.120462^*	0.118340^*	0.121980^*
100	0.172282^*	0.141184	0.128485	0.122068^*	0.119912^*	0.123894^*
150	0.177898^*	0.144859^*	0.130513	0.123738	0.121540^*	0.125881^*
200	0.183766^*	0.149237^*	0.133183	0.125473	0.123237	0.127935^*
250	0.189820^*	0.153963^*	0.136653^*	0.127264	0.125005	0.130069
300	0.196092	0.158915^*	0.140895^*	0.130017	0.126838	0.132298
350	0.202623	0.164030	0.145391^*	0.133939^*	0.128728	0.134616
400	0.209467	0.169363	0.150024	0.138163	0.133991	0.137010
450	0.216533	0.174975	0.154798	0.142503	0.139457	0.139475
500	0.223831	0.180897	0.159828	0.146975	0.145115	0.145115
550	0.231322	0.187028	0.165248	0.151603	0.150876	0.150876
600	0.239035	0.193362	0.171012	0.156843	0.156843	0.156843
650	0.246918	0.199976	0.176975	0.163015	0.163015	0.163015
700	0.255037	0.206817	0.183154	0.169453	0.169453	0.169453
750	0.263357	0.213830	0.189533	0.176109	0.176109	0.176109
800	0.271945	0.221070	0.196123	0.183059	0.183059	0.183059
850	0.280741	0.228460	0.202874	0.190242	0.190242	0.190242
900	0.289790	0.236097	0.209824	0.197712	0.197712	0.197712
950	0.298954	0.243918	0.216938	0.205281	0.205281	0.205281
1000	0.308358	0.251943	0.224242	0.213245	0.213245	0.213245
1050	0.317895	0.260087	0.231740	0.221146	0.221146	0.221146
1100	0.327785	0.268509	0.239462	0.229404	0.229404	0.229404
1150	0.337937	0.277274	0.247380	0.237923	0.237923	0.237923
1200	0.348393	0.286386	0.255529	0.246764	0.246764	0.246764
1250	0.358985	0.295733	0.263897	0.255751	0.255751	0.255751
1300	0.369743	0.305337	0.272544	0.264989	0.264989	0.264989
1350	0.380541	0.315057	0.281403	0.274404	0.274404	0.274404
1400	0.391642	0.324961	0.290466	0.284353	0.284353	0.284353
1450	0.402933	0.334964	0.299633	0.294503	0.294503	0.294503
1500	0.414660	0.345257	0.308958	0.305091	0.305091	0.305091
1550	0.426484	0.355749	0.318423	0.315621	0.315621	0.315621
1600	0.438586	0.366590	0.328154	0.326360	0.326360	0.326360
1650	0.450706	0.377696	0.338163	0.337112	0.337112	0.337112
1700	0.463015	0.389278	0.348576	0.348173	0.348173	0.348173
1750	0.475329	0.401117	0.359573	0.359573	0.359573	0.359573
1800	0.488007	0.413244	0.371759	0.371759	0.371759	0.371759
1850	0.500848	0.425457	0.384134	0.384134	0.384134	0.384134
1900	0.514151	0.437907	0.397273	0.397273	0.397273	0.397273
1950	0.527546	0.450291	0.410346	0.410346	0.410346	0.410346
2000	0.541169	0.463032	0.423910	0.423910	0.423910	0.423910
2050	0.553892	0.475650	0.435848	0.435848	0.435848	0.435848
2100	0.566687	0.488532	0.448671	0.448671	0.448671	0.448671
2150	0.579544	0.501372	0.462112	0.462112	0.462112	0.462112
2200	0.593075	0.514412	0.476647	0.476647	0.476647	0.476647
2250	0.606845	0.527568	0.490860	0.490860	0.490860	0.490860
2300	0.620874	0.541235	0.505044	0.505044	0.505044	0.505044
2350	0.634230	0.554962	0.518525	0.518525	0.518525	0.518525
2400	0.647380	0.568960	0.532648	0.532648	0.532648	0.532648
2450	0.659830	0.582404	0.547160	0.547160	0.547160	0.547160
2500	0.672922	0.595574	0.563571	0.563571	0.563571	0.563571
2550	0.685577	0.608065	0.579471	0.579471	0.579471	0.579471
2600	0.699046	0.620640	0.594828	0.594828	0.594828	0.594828
2650	0.712677	0.633358	0.608315	0.608315	0.608315	0.608315
2700	0.727273	0.647372	0.621304	0.621304	0.621304	0.621304
2750	0.740566	0.661998	0.632781	0.632781	0.632781	0.632781
2800	0.752872	0.677539	0.646547	0.646547	0.646547	0.646547
2850	0.763740	0.691731	0.661928	0.661928	0.661928	0.661928
2900	0.775267	0.704645	0.683575	0.683575	0.683575	0.683575
2950	0.787428	0.716595	0.704022	0.704022	0.704022	0.704022
3000	0.801293	0.729617	0.722026	0.722026	0.722026	0.722026
3050	0.812468	0.740574	0.732915	0.732915	0.732915	0.732915
3100	0.824009	0.752452	0.744321	0.744321	0.744321	0.744321
3150	0.834312	0.764049	0.754637	0.754637	0.754637	0.754637
3200	0.844465	0.775780	0.768552	0.768552	0.768552	0.768552
3250	0.854392	0.785990	0.782612	0.782612	0.782612	0.782612
3300	0.865209	0.800609	0.800609	0.800609	0.800609	0.800609
3350	0.875145	0.811712	0.811712	0.811712	0.811712	0.811712
3400	0.887417	0.826357	0.826357	0.826357	0.826357	0.826357
3450	0.896088	0.833481	0.833481	0.833481	0.833481	0.833481
3500	0.907011	0.847263	0.847263	0.847263	0.847263	0.847263
3550	0.912996	0.864340	0.864340	0.864340	0.864340	0.864340
3600	0.918091	0.883970	0.883970	0.883970	0.883970	0.883970
3650	0.920604	0.894832	0.894832	0.894832	0.894832	0.894832
3700	0.925351	0.903017	0.903017	0.903017	0.903017	0.903017
3750	0.933956	0.903045	0.903045	0.903045	0.903045	0.903045
3800	0.950436	0.903078	0.903078	0.903078	0.903078	0.903078
3850	0.962853	0.903098	0.903098	0.903098	0.903098	0.903098
3900	0.973675	0.917497	0.903124	0.903124	0.903124	0.903124
3950	0.979061	0.930203	0.903136	0.903136	0.903136	0.903136

Given your 2000 point deficit, it is interesting to see how much more aggressively this turn should be played compared to an opening turn: the banking thresholds here for 6, 5, 4, 3, 2 and 1 dice are, respectively, 4000, 3300, 1750, 600, 400, and 500 points. Compare that to the banking thresholds for an opening turn at 5000, 2500, 1000, 350, 300, and 300 points.

Another interesting observation about this strategy is that although you should bank between 3300 and 3850 points when you have 5 dice to roll, you should instead roll if you have 3900 to 3950 points (because you have such a high probability of closing out the game).

Optimal Strategy Case: $b = 8000, d = 6000, f = 0, e = 0$

Now consider the reverse of the situation in the previous example. Table 3 shows how to optimally play a turn where you start with 8000 points and are 2000 points ahead of your opponent who has 6000 points.

Table 3. Win probabilities and banking actions for optimal turn play for b=8000, d=6000, f=0, and e=0.

t \ n	6	5	4	3	2	1
0	0.903422	0.878090^*	0.862982^*	0.857184^*	0.856210^*	0.860061^*
50	0.907970^*	0.883520	0.866464^*	0.858208^*	0.857345^*	0.861539^*
100	0.912448^*	0.888520	0.872004	0.860637^*	0.858466^*	0.863007^*
150	0.916837^*	0.893230^*	0.877734	0.864994	0.859560^*	0.864460^*
200	0.921258^*	0.898001^*	0.882899	0.871213	0.863650	0.865889^*
250	0.925550^*	0.902725^*	0.887727^*	0.877180	0.868829	0.867273
300	0.929835	0.907504^*	0.892777^*	0.882276	0.882276	0.882276
350	0.933851	0.912127	0.897734^*	0.888668^*	0.888668	0.888668
400	0.937676	0.916796	0.902726	0.894905	0.894905	0.894905
450	0.941287	0.921250	0.907616	0.900824	0.900824	0.900824
500	0.944922	0.925413	0.912287	0.907268	0.907268	0.907268
550	0.948355	0.929370	0.916532	0.913595	0.913595	0.913595
600	0.951784	0.933391	0.920430	0.919922	0.919922	0.919922
650	0.955073	0.937177	0.925407	0.925407	0.925407	0.925407
700	0.958290	0.940961	0.930450	0.930450	0.930450	0.930450
750	0.961368	0.944716	0.934554	0.934554	0.934554	0.934554
800	0.964239	0.948516	0.938914	0.938914	0.938914	0.938914
850	0.966594	0.952008	0.943151	0.943151	0.943151	0.943151
900	0.968986	0.955211	0.948312	0.948312	0.948312	0.948312
950	0.971335	0.958000	0.953239	0.953239	0.953239	0.953239
1000	0.973780	0.960716	0.958124	0.958124	0.958124	0.958124
1050	0.975945	0.963165	0.961202	0.961202	0.961202	0.961202
1100	0.978076	0.965682	0.964291	0.964291	0.964291	0.964291
1150	0.979977	0.968029	0.966812	0.966812	0.966812	0.966812
1200	0.981763	0.970175	0.970114	0.970114	0.970114	0.970114
1250	0.983328	0.973408	0.973408	0.973408	0.973408	0.973408
1300	0.984740	0.977098	0.977098	0.977098	0.977098	0.977098
1350	0.986091	0.979125	0.979125	0.979125	0.979125	0.979125
1400	0.987692	0.981344	0.981344	0.981344	0.981344	0.981344
1450	0.988993	0.982226	0.982226	0.982226	0.982226	0.982226
1500	0.990373	0.984029	0.984029	0.984029	0.984029	0.984029
1550	0.991218	0.985999	0.985999	0.985999	0.985999	0.985999
1600	0.991737	0.989914	0.989914	0.989914	0.989914	0.989914
1650	0.992073	0.992073	0.992073	0.992073	0.992073	0.992073
1700	0.992947	0.992947	0.992947	0.992947	0.992947	0.992947
1750	0.992964	0.992964	0.992964	0.992964	0.992964	0.992964
1800	0.993911	0.992981	0.992981	0.992981	0.992981	0.992981
1850	0.994722	0.992990	0.992990	0.992990	0.992990	0.992990
1900	0.995640	0.992998	0.992998	0.992998	0.992998	0.992998
1950	0.996180	0.993000	0.993000	0.993000	0.993000	0.993000

In this example, you have a 2000 point lead and are yourself only 2000 points from winning the game. It is interesting to see how much more conservatively this turn should be played compared to an opening turn: the banking thresholds here for 6, 5, 4, 3, 2, and 1 dice to throw are 1650, 1250, 650, 300, 300, and 300 points, respectively. Compare that to the banking thresholds for an opening turn of 5000, 2500, 1000, 350, 300, and 300 points.

As in the previous example, there is another banking rule inversion in this table, but this time for the case of 6 dice, where at point values above 1750 it again becomes advantageous to roll to try to end the game.

Optimal Strategy Case: $b = 9000, d = 9500, f = 0, e = 0$

As a fourth example, consider Table 4, which shows how to optimally play a turn where you start with 9000 points and are 500 points behind your opponent, who has 9500 points.

Table 4. Win probabilities and banking actions for optimal turn play for b=9000, d=9500, f=0, and e=0.

t	n
t	6	5	4	3	2	1
0	0.454366	0.351125^*	0.303308^*	0.277271^*	0.270069^*	0.285659^*
50	0.463700^*	0.358576	0.309820^*	0.283131^*	0.274544^*	0.289393^*
100	0.475303^*	0.366847	0.316369	0.289394^*	0.280312^*	0.294362^*
150	0.486102^*	0.376478^*	0.323286	0.295374	0.286679^*	0.300901^*
200	0.505119^*	0.388392^*	0.331770	0.301490	0.292906	0.309194^*
250	0.525335^*	0.399404^*	0.342403^*	0.308335	0.298948	0.317272
300	0.554879	0.417884^*	0.354687^*	0.317757	0.305197	0.325207
350	0.573800	0.432225	0.366843^*	0.329936^*	0.313516	0.332809
400	0.602493	0.454655	0.382838	0.343746	0.325562	0.341277
450	0.619405	0.462471	0.391661	0.356792	0.340660	0.354200
500	0.653307	0.481158	0.404518	0.367897	0.356345	0.372251
550	0.696939	0.489452	0.408392	0.376025	0.370389	0.393025
600	0.761617	0.519126	0.416628	0.381381	0.381505	0.412584
650	0.821580	0.577904	0.419050	0.384455	0.389121	0.429136
700	0.878974	0.671873	0.455551	0.391137	0.393701	0.441170
750	0.920889	0.759474	0.538352	0.391141	0.396326	0.448721
800	0.951179	0.838237	0.658773	0.453755	0.398829	0.452947
850	0.966192	0.886061	0.752905	0.563689	0.401559	0.455471
900	0.976536	0.921141	0.831614	0.696407	0.522215	0.459381
950	0.981336	0.937788	0.873088	0.776038	0.641661	0.462492

I find it interesting that the only time you should bank during such a turn is if you have exactly 3 dice to roll and either 700 or 750 points on the turn. Why does it make sense to roll just 1 or 2 dice with that same number of points? And why does it make sense to roll three dice when you have 800 or more points?

To answer the first question, observe that it's easier to ultimately hot-dice (score all dice thrown) when you are rolling only 1 or 2 dice than when you are rolling 3. If you can manage to get back to 6 dice, your chances of ending the game this turn increase dramatically. Furthermore, any additional game points you gain by rolling 3 dice are meaningless unless you hot-dice. If only 1 or 2 of the 3 thrown dice score, you will still find yourself below the game goal and (if you then decide to bank these few extra points) will not have even reduced the number of points you have to accumulate on your subsequent turn due to the minimum banking threshold.

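To answer the second question, note that it's possible to reach 10,000 points without hot-dice: in particular, there's about a 7% chance (15 of the 216 possible rolls) of rolling exactly two ones with three dice, and those two ones alone are worth 200 points. Such a roll ends the game if you already have 9800 points (9000 banked plus 800 on the turn).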
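The roughly 7% figure can be checked by brute-force enumeration. The sketch below is illustrative only (the function name is hypothetical and is not taken from the solver described in this post), and it assumes the usual scoring in which a one or a five is a scoring die. It counts the three-dice rolls showing exactly two ones, optionally excluding rolls whose remaining die is a five.

```cpp
#include <cassert>

// Count three-dice rolls (out of 6*6*6 = 216) showing exactly two ones.
// If thirdMayBeFive is false, rolls whose remaining die is a 5 are excluded,
// leaving only rolls where the third die is non-scoring (2, 3, 4, or 6).
int countTwoOnes(bool thirdMayBeFive) {
    int count = 0;
    for (int a = 1; a <= 6; ++a)
        for (int b = 1; b <= 6; ++b)
            for (int c = 1; c <= 6; ++c) {
                int ones  = (a == 1) + (b == 1) + (c == 1);
                int fives = (a == 5) + (b == 5) + (c == 5);
                if (ones == 2 && (thirdMayBeFive || fives == 0))
                    ++count;
            }
    return count;
}
```

Counting every roll with exactly two ones gives 15 of 216 (about 6.9%). Excluding the rolls whose third die is a five leaves 12 of 216 (about 5.6%).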

Optimal Strategy Case: $b = 9500, d = 9000, f = 0, e = 0$

Now consider the reverse of the situation in the previous example. Table 5 shows how to optimally play a turn where you start with 9500 points and are 500 points ahead of your opponent who has 9000 points.

Table 5. Win probabilities and banking actions for optimal turn play for b=9500, d=9000, f=0, and e=0.

t	n
t	6	5	4	3	2	1
0	0.801016	0.702303^*	0.658594^*	0.637598^*	0.630975^*	0.640081^*
50	0.826173^*	0.706846	0.660815^*	0.642258^*	0.639026^*	0.652003^*
100	0.863321^*	0.724040	0.665063	0.645329^*	0.645400^*	0.663218^*
150	0.897707^*	0.757952^*	0.666217	0.647091	0.649767^*	0.672708^*
200	0.930612^*	0.811876^*	0.687617	0.648362	0.652392	0.679607^*
250	0.954643^*	0.862100^*	0.735325^*	0.649657	0.653897	0.683936
300	0.972009	0.907257^*	0.804365^*	0.686823	0.655332	0.686359
350	0.980617	0.934675	0.858333^*	0.749851^*	0.656897	0.687806
400	0.986548	0.954788	0.903460	0.825942	0.726073	0.690048
450	0.989300	0.964332	0.927238	0.871597	0.794555	0.691832

Here I found it surprising that even though you have a 500 point lead and control of the dice, the optimal strategy requires that you never bank until you've reached the game-winning 500 points. This is surprisingly more aggressive play than on an opening turn where the score is tied. I would have guessed that banking 300 points when faced with rolling 1 or 2 dice would have been better to avoid the high-probability farkle, then allow your opponent a chance to force his way to a 1000 point turn, and assuming he fails, follow up with a high-probability minimum bank turn to win. It turns out if your opponent plays perfectly, he has about a 1/3 chance of reaching 1000 points to steal the win, and so playing a little more aggressively to attempt to end the game and deny him that possible steal is advantageous.

Also note how for many point totals, you are better off with fewer dice. This is often the case when playing turns where your best play is to try to reach 10,000 points and end the game, and also in cases where you are just really far behind an opponent that is fast approaching a game winning score.

Strategy Validation

In this section I share the efforts I've made to sanity check the calculated strategies for reasonableness, and to determine the deviation between the calculated strategy and the truly optimal strategy. This section definitely gets into the weeds and is not for the faint of heart, but may be of interest to skeptics or the particularly nerdy (of which I am both).

Comparison to strategy that maximizes expected turn score

Strategies that maximize expected turn score for farkle variants have been independently calculated with consistent results^4,5,6. These strategies do not consider your current banked score, your opponent's banked score or farkle count, or the end of game scoring goal. Still, one would expect that for an opening turn (where the score is tied at zero and where the end-of-game scoring goal is distant) the game winning strategy documented here and the strategy that maximizes expected turn score would be similar for low turn score states. Furthermore, the two strategies should converge if the game scoring goal is increased from 10,000 points towards infinity. Unfortunately, the 10,000 point goal was already stretching both the memory limits and processing power of my computer. To reduce the size of the state space, I eliminated the three consecutive farkle penalty. This effectively eliminates the f and e state variables, reducing the size of the state space by a factor of 9, and also eliminates the need to model banked score states of less than 0 points, which shrinks things even more.

My Farkle Strategy Generator (FSG)⁶ produces farkle strategies that maximize expected turn score. The FSG outputs strategy tables that prioritize each turn state with a sequential preference number. Those states associated with higher expected turn scores have a higher preference number. To follow the strategy, you score thrown dice to move to the state with the highest number. This is exactly the same process you use to follow the game winning strategy presented in this document: you score your dice to move to the state with the highest probability of winning. By indexing the states from lowest win probability to highest, the two strategies can easily be compared.
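The indexing step can be sketched as a dense ranking of state values. This is a minimal illustration, not the FSG's or the solver's actual code: the function name is hypothetical, and the tie handling (equal values share one preference number, and the next distinct value gets the next integer) is an assumption inferred from the preference tables in this section. It also compares values for exact equality, whereas probabilities produced by value iteration might need a small tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assign each state a preference number: 1 for the lowest value, with
// equal values sharing a number (dense ranking).
std::vector<int> preferenceNumbers(const std::vector<double>& value) {
    // Build the sorted list of distinct values.
    std::vector<double> distinct(value);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()),
                   distinct.end());

    std::vector<int> pref(value.size());
    for (std::size_t i = 0; i < value.size(); ++i) {
        // 1-based position of value[i] among the distinct sorted values.
        pref[i] = static_cast<int>(
            std::lower_bound(distinct.begin(), distinct.end(), value[i])
            - distinct.begin()) + 1;
    }
    return pref;
}
```

Applying the same function to win probabilities and to expected turn scores yields two integer charts that can be compared cell by cell.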

First compare the two strategies for the case of a 10,000 point goal. Table 6W shows the strategy that maximizes your chances of winning the game (produced by value iteration) for an opening turn. Table 6E shows the strategy that maximizes expected turn score (produced by the FSG). Turn scores in both cases are truncated to a maximum of 1500. States with different preference numbers are highlighted. There are numerous slight differences, but they are obviously very similar: the banking thresholds are almost identical, and the state preference numbers deviate only slightly.

Table 6W (first table below). Preferred states for strategy that maximizes win probability for an opening turn of a 2-player game with a 10,000 point goal. Discrepancies with the strategy that maximizes expected score (shown in Table 6E) are highlighted.

Table 6E (second table below). Preferred states for strategy that maximizes expected score. Discrepancies with the strategy that maximizes win probability for an opening turn of a 2-player game with a 10,000 point goal (shown in Table 6W) are highlighted.

t	n
t	6	5	4	3	2	1
0	21	*	*	*	*	*
50	*	13	*	*	*	*
100	*	15	5	*	*	*
150	*	*	9	2	*	*
200	*	*	11	4	1	*
250	*	*	*	7	3	6
300	38	*	*	10	8	8
350	42	24	*	*	12	12
400	46	27	18	16	14	14
450	50	30	20	17	17	17
500	54	33	23	19	19	19
550	57	36	26	22	22	22
600	60	40	29	25	25	25
650	63	43	32	28	28	28
700	65	47	34	31	31	31
750	68	51	37	35	35	35
800	71	55	41	39	39	39
850	74	58	45	44	44	44
900	77	61	49	48	48	48
950	80	64	53	52	52	52
1000	83	67	56	56	56	56
1050	86	70	59	59	59	59
1100	89	72	62	62	62	62
1150	92	75	66	66	66	66
1200	95	78	69	69	69	69
1250	97	81	73	73	73	73
1300	100	84	76	76	76	76
1350	103	87	79	79	79	79
1400	106	90	82	82	82	82
1450	109	93	85	85	85	85
1500	112	96	88	88	88	88

t	n
t	6	5	4	3	2	1
0	21	*	*	*	*	*
50	*	13	*	*	*	*
100	*	14	5	*	*	*
150	*	*	9	2	*	*
200	*	*	11	4	1	*
250	*	*	*	7	3	6
300	39	*	*	10	8	8
350	42	24	*	*	12	12
400	46	27	18	16	15	15
450	50	30	20	17	17	17
500	54	33	23	19	19	19
550	58	36	26	22	22	22
600	61	40	29	25	25	25
650	64	43	32	28	28	28
700	67	47	35	31	31	31
750	70	51	37	34	34	34
800	72	55	41	38	38	38
850	75	59	45	44	44	44
900	78	62	49	48	48	48
950	81	65	53	52	52	52
1000	84	68	57	56	56	56
1050	87	71	60	60	60	60
1100	90	74	63	63	63	63
1150	94	76	66	66	66	66
1200	97	79	69	69	69	69
1250	100	82	73	73	73	73
1300	103	85	77	77	77	77
1350	106	88	80	80	80	80
1400	109	91	83	83	83	83
1450	112	93	86	86	86	86
1500	115	96	89	89	89	89

Tables 7W and 7E compare these same two strategies but with the scoring goal increased to 30,000 points. Here the two strategies are seen to be almost identical for these low turn score states. The two strategies do appear to be converging as the scoring goal increases. While this doesn't assure strategy correctness, it offers convincing evidence of correctness in at least this limiting case.

Table 7W (first table below). Preferred states for strategy that maximizes win probability for an opening turn of a 2-player game with a 30,000 point goal. Discrepancies with the strategy that maximizes expected score (shown in Table 7E) are highlighted.

Table 7E (second table below). Preferred states for strategy that maximizes expected score. Discrepancies with the strategy that maximizes win probability for an opening turn of a 2-player game with a 30,000 point goal (shown in Table 7W) are highlighted.

t	n
t	6	5	4	3	2	1
0	21	*	*	*	*	*
50	*	13	*	*	*	*
100	*	14	5	*	*	*
150	*	*	9	2	*	*
200	*	*	11	4	1	*
250	*	*	*	7	3	6
300	39	*	*	10	8	8
350	42	24	*	*	12	12
400	46	27	18	16	15	15
450	50	30	20	17	17	17
500	54	33	23	19	19	19
550	58	36	26	22	22	22
600	61	40	29	25	25	25
650	64	43	32	28	28	28
700	67	47	34	31	31	31
750	70	51	37	35	35	35
800	72	55	41	38	38	38
850	75	59	45	44	44	44
900	78	62	49	48	48	48
950	81	65	53	52	52	52
1000	84	68	57	56	56	56
1050	87	71	60	60	60	60
1100	90	74	63	63	63	63
1150	93	76	66	66	66	66
1200	96	79	69	69	69	69
1250	99	82	73	73	73	73
1300	103	85	77	77	77	77
1350	106	88	80	80	80	80
1400	109	91	83	83	83	83
1450	111	94	86	86	86	86
1500	114	97	89	89	89	89

t	n
t	6	5	4	3	2	1
0	21	*	*	*	*	*
50	*	13	*	*	*	*
100	*	14	5	*	*	*
150	*	*	9	2	*	*
200	*	*	11	4	1	*
250	*	*	*	7	3	6
300	39	*	*	10	8	8
350	42	24	*	*	12	12
400	46	27	18	16	15	15
450	50	30	20	17	17	17
500	54	33	23	19	19	19
550	58	36	26	22	22	22
600	61	40	29	25	25	25
650	64	43	32	28	28	28
700	67	47	35	31	31	31
750	70	51	37	34	34	34
800	72	55	41	38	38	38
850	75	59	45	44	44	44
900	78	62	49	48	48	48
950	81	65	53	52	52	52
1000	84	68	57	56	56	56
1050	87	71	60	60	60	60
1100	90	74	63	63	63	63
1150	94	76	66	66	66	66
1200	97	79	69	69	69	69
1250	100	82	73	73	73	73
1300	103	85	77	77	77	77
1350	106	88	80	80	80	80
1400	109	91	83	83	83	83
1450	112	93	86	86	86	86
1500	115	96	89	89	89	89

Effects of the banked score lower bound

The calculated strategy is non-optimal when playing in regions of the state space where either your banked score or your opponent's banked score is near the lower bound. This discrepancy between the calculated strategy and the truly optimal strategy is further exacerbated as the consecutive farkle count associated with a banked score near the bound increases.

To estimate the extent of the discrepancy, I lowered the bound on banked scores from -2500 to -3000 and regenerated the strategy matrix. One expects the relative deviation between the win probabilities of corresponding states in the two strategies to decrease as the minimum of the players' banked scores increases. To see the rate of convergence, for each possible banking score $B$ ($-2500 \le B < 10{,}000$), I found the maximum relative deviation between win probabilities (calculated as $\frac{2|w_1-w_2|}{w_1+w_2}$) over all game states satisfying $\min(b,d) \ge B$. Table 8 lists selected results. Note that each time $B$ increases by 500 points (which just happens to be the triple farkle penalty), the maximum relative deviation consistently decreases by 2 orders of magnitude (with one of those two orders of magnitude coming with the last 50 point increase). Given this exponential convergence rate, for any given $B$, the maximum relative deviation between the strategies for $L=-2500$ and $L=-\infty$ must be only slightly greater (on the order of 1%) than the maximum relative deviation between the strategies for $L=-2500$ and $L=-3000$.
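The deviation measure described above can be sketched as follows. The `StatePair` record and both function names are hypothetical, chosen only for illustration. The actual solver stores its state matrix differently and also tracks the farkle counts, which are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Symmetric relative deviation between two win probabilities:
// 2|w1 - w2| / (w1 + w2). Assumes w1 + w2 > 0.
double relativeDeviation(double w1, double w2) {
    return 2.0 * std::fabs(w1 - w2) / (w1 + w2);
}

// Hypothetical record of one game state as evaluated by two strategies.
struct StatePair {
    int b;      // your banked score
    int d;      // opponent's banked score
    double w1;  // win probability computed with the first lower bound
    double w2;  // win probability computed with the second lower bound
};

// Maximum relative deviation over all states with min(b, d) >= B.
double maxRelativeDeviation(const std::vector<StatePair>& states, int B) {
    double worst = 0.0;
    for (const StatePair& s : states)
        if (std::min(s.b, s.d) >= B)
            worst = std::max(worst, relativeDeviation(s.w1, s.w2));
    return worst;
}
```

Evaluating `maxRelativeDeviation` for a sequence of increasing `B` values would produce one row per value, in the manner of the table that follows.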

Table 8. For two different farkle strategies calculated with $L=-2500$ and $L=-3000$, the maximum relative deviation of win probabilities over all states satisfying $\min(b,d) \ge B$ is shown.

B	Maximum Relative Deviation
-2500	0.296463834
-2050	0.028572943
-2000	0.003438359
-1550	0.000249319
-1500	0.000029516
-1050	0.000002074
-1000	0.000000234
-550	0.000000016
-500	0.000000001

Note that the state win probabilities for two calculated strategies do not have to be identical for the rules about when to bank, when to roll, and how to score the dice in each situation to be identical. So how close do the calculated win probabilities and the optimal win probabilities need to be to make it likely we are playing with a truly optimal strategy? Using state preference charts like those shown in the previous section, I compared complete turn strategies for all combinations of state variables $b$, $d$, $f$, and $e$. If two state preference charts are identical, then the strategies are identical. Once I found a bizarre state preference discrepancy associated with a relative difference in win probability on the same order as the convergence threshold for the entire state matrix. I ignored this and any others like it that might be lurking in the tables. Table 9 shows the banked score and consecutive farkle count thresholds above which no more strategy discrepancies otherwise appeared. So for example if both you and your opponent have at most 1 consecutive farkle, and you both have banked scores of at least -1200 points, then there are no discrepancies between the two strategies.

So although there's no way to pick a lower bound $L$ that will assure all turn strategies will be truly optimal for a given working range of banked scores and consecutive farkle counts, driving the maximum relative deviation down to within about 1 part in 10⁸ seems to assure that the vast majority of turn strategies (and likely all of them) are optimal. Given the convergence rate seen above, this means that as long as both players are about 2000 points above the lower bound on banked scores, all turn strategies are likely to be optimal.

Table 9. When both players' banked scores and consecutive farkle counts satisfy the listed constraints, the two strategies that maximize win probability for -2500 and -3000 point lower bounds on banked scores are identical. The maximum relative deviation over all game states satisfying these constraints is also listed.

Minimum Banked Score	Maximum Farkle Count	Maximum Relative Deviation
-1500	0	0.000001175
-1200	1	0.000001529
-700	2	0.000000070

Conclusions

Extended forms of a Markov Decision Process (MDP) and value iteration were devised that allow more efficient modeling of 2-player Farkle than is possible with a standard MDP. Using this model, the strategy that maximizes win probability using Facebook scoring rules has been determined under the constraint that banked scores are lower bounded to a large negative score. The strategy is shown to be unaffected by this bound in regions where banked scores are at least 2000 points above the bound. The strategy appears to converge (as expected) to the strategy that maximizes expected turn score as the end-of-game scoring goal is increased.

With nearly a half billion game states, and so many different dice rolls and scoring options possible from each of these different states, it wasn't immediately obvious whether the problem was solvable with a home computer. I was quite pleased when I saw the first value iteration update complete, and even more pleased when I saw the matrix was actually converging! It was a nice problem in the sense that its complexity puts finding the solution at the limits of what can be done with a home computer (or at least at the limits of what I can do with a home computer).

The optimal strategy showed many expected behaviors including more aggressive play when behind, and more conservative play when ahead. But it also showed many behaviors that were unexpected to me, such as playing aggressively even with a sizable lead when nearing the game goal of 10,000 points. Numerous times I was convinced I had discovered a flaw in the strategy (and therefore either a bug in my code or an error in my analysis) only to realize later that the strategy was actually reasonable.

Next Steps

The very limited view I've offered into the game winning strategy is unsatisfying. Time permitting, I intend to develop a web page that will allow you to explore the gargantuan optimal Farkle strategy in detail, and perhaps even play a game of farkle against a perfect computer opponent. I am motivated by your interest, so please leave a comment if you'd like to see it.

A Facebook Farkle game is won by the first player to bank 10,000 or more points, but there are other variations to end-of-game rules. In Zilch, for example, the other player always gets a final go at the dice. Some people like to assure that both players get the same number of turns. In this latter case, the optimal strategies played by the two players are not even the same — with the player going second clearly having a strategic advantage. I think any of these could be modeled, but they are each very different, and not currently supported by my solver. These are ickier problems, and I don't intend to work on them anytime soon.

References

1. T Neller and C Presser, "Optimal Play of the Dice Game Pig," The UMAP Journal 25(1) (2004), pp. 25-47.
2. R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
3. Cap Khoury, Multinomial Coefficients and Farkle, Jul. 2009.
4. M Busche, Maximizing Expected Scores in the Game of Zilch, Matt's Maniacal Musings, Aug. 2010.
5. E Farmer, Analysis of Farkle, Possibly Wrong, Apr. 2013.
6. Farkle Strategy Generator, Matt's Maniacal Musings.

Tracking the States of a Set of Objects by Partition: An Introduction to the Partition Container

matt — Wed, 27 Mar 2013 17:58:54 +0000

In this post I share a useful programming technique I first saw used twenty-some years ago while reading through some C code. The technique combined aspects of an array and a linked-list into a single data construct. I've here abstracted the technique into a reusable container I call a partition. (If someone knows of some library in some language that offers a similar container I'd appreciate a reference.)

A partition is a high performance container that organizes a set of N sequential integer IDs (which together are called the domain) into an arbitrary number of non-overlapping groups. (I.e., each ID of the domain can be in at most one group.) The functionality of a partition includes these constant time operations:

Given an ID, determine the group to which the ID belongs (if any).
Given a group, find an ID in that group (if any).
Given an ID, move that ID to a particular group (simultaneously removing it from the group of which it was previously a member, if any).

None of the above operations uses the heap, and each is considerably faster than even a single push to standard implementations of a doubly linked list.
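To make the three operations concrete, here is one plausible way to get them in constant time: thread a doubly linked list per group through fixed arrays indexed by ID. This is a stripped-down sketch and not the implementation in partition.hpp. The class name `MiniPartition`, its method names, and the use of plain integer group indices are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

const int kNone = -1;  // marks "no ID" or "no group"

// Illustrative only: a partition over IDs 0..numIds-1 and groups
// 0..numGroups-1. All storage is allocated once in the constructor.
class MiniPartition {
    std::vector<int> group_;  // group_[id]: current group of id, or kNone
    std::vector<int> prev_;   // prev_[id]: previous ID in the same group
    std::vector<int> next_;   // next_[id]: next ID in the same group
    std::vector<int> head_;   // head_[g]: first ID in group g, or kNone

    // Unlink id from whatever group it is in (no-op if ungrouped).
    void unlink(int id) {
        int g = group_[id];
        if (g == kNone) return;
        if (prev_[id] != kNone) next_[prev_[id]] = next_[id];
        else                    head_[g] = next_[id];
        if (next_[id] != kNone) prev_[next_[id]] = prev_[id];
        group_[id] = kNone;
    }

public:
    MiniPartition(int numIds, int numGroups)
        : group_(numIds, kNone), prev_(numIds, kNone),
          next_(numIds, kNone), head_(numGroups, kNone) {}

    // Operation 1: the group of an ID (kNone if it has none).
    int groupOf(int id) const { return group_[id]; }

    // Operation 2: some ID in a group (kNone if the group is empty).
    int anyIn(int g) const { return head_[g]; }

    // Operation 3: move an ID to the front of a group, removing it
    // from its previous group if it had one.
    void moveTo(int id, int g) {
        unlink(id);
        prev_[id] = kNone;
        next_[id] = head_[g];
        if (head_[g] != kNone) prev_[head_[g]] = id;
        head_[g] = id;
        group_[id] = g;
    }
};
```

After construction, none of the three operations allocates memory. Each one reads or writes only a handful of array slots, which is where the speed advantage over a node-allocating linked list would come from.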

A partition has not one, but two fundamental classes: Domain and Group. The set of sequential IDs in a partition is defined by a Domain object, and a Group is a list of IDs from a Domain. Domain and Group each have two templatization parameters M and V. IDs can be bound to a user-defined member object of type M and Groups can be bound to a user-defined value object of type V. Together these associations enable mappings from objects of type M (with known IDs) to objects of type V, and conversely from objects of type V (with known Groups) to a list of objects of type M.

The partition is potentially useful any time objects need to be organized into groups; and that happens all the time! In this post, I show how you can use a partition to track the states of a set of like objects. This is just one possible usage of a partition and is intended only as a tutorial example of how you can use this versatile storage class.

Software Downloads and Documentation

Motivation

Suppose you have a set of N objects of some common type each having an independent state variable. Perhaps you have hardware monitoring the bit-error-rates on N communications ports and it is your job to categorize them into signal quality states of CLEAR, DEGRADED or FAILED. Or perhaps you are tracking the BUSY-IDLE states of N processors in a massively parallel computer processing environment. Or you might be tracking the activity of N threads in a thread pool. Obviously, such problems abound.

A common use-case for such systems is to find one or more objects in a particular state. For example, you may need to find one port in the CLEAR state for allocation to some new network communications service; or you may want to find and list all processors in the BUSY state to perform an audit. If the number of objects is small, you could simply iterate over them all until you find the objects with the desired state; but if you have many objects, this approach can be highly inefficient. To solve such problems for large sets, you will naturally use containers to hold objects that share a common state; but what type of container should you use?

For C++ you might group your objects into standard STL maps. Similarly, for Java you could use HashMaps from the java.util package. These classes provide convenient solutions, but such heavy-weight containers may be prohibitively slow for some applications.

A lighter-weight alternative is to use linked-lists, but then a state change of, for example, a circuit from CLEAR to DEGRADED requires that you first find the circuit in the CLEAR list so that it can be removed from that list before adding it to the DEGRADED list; this search can again be slow for large sets. In the C++ domain you can remedy this by giving each circuit object an iterator that tracks its own location in the linked-list of which it is currently a member. This is actually an effective (if slightly cumbersome) solution. In the Java domain, such an approach is not possible if you restrict yourself to standard library containers, which suffer from the limitation that iterators point between two members and are in any case necessarily invalidated whenever the container they reference is updated. (Oh, the horror! I need a Prozac!)
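The C++ iterator approach described above might look something like the following sketch. The `CircuitTracker` class and its members are hypothetical names invented for this illustration. The technique relies on `std::list::splice`, which relinks a single node into another list in constant time and leaves iterators to that node valid.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

enum SignalState { CLEAR, DEGRADED, FAILED, NUM_STATES };

// Hypothetical tracker: one std::list per state plus, for each circuit,
// its current state and an iterator to its node in that state's list.
class CircuitTracker {
    std::list<int> byState_[NUM_STATES];
    std::vector<SignalState> stateOf_;
    std::vector<std::list<int>::iterator> where_;

public:
    // All circuits start in the CLEAR state.
    explicit CircuitTracker(int numCircuits)
        : stateOf_(numCircuits, CLEAR), where_(numCircuits) {
        for (int id = 0; id < numCircuits; ++id) {
            byState_[CLEAR].push_back(id);
            where_[id] = std::prev(byState_[CLEAR].end());
        }
    }

    SignalState stateOf(int id) const { return stateOf_[id]; }

    // Constant-time state change: splice relinks the node without any
    // search, and where_[id] remains valid afterwards.
    void setState(int id, SignalState s) {
        std::list<int>& from = byState_[stateOf_[id]];
        std::list<int>& to = byState_[s];
        to.splice(to.end(), from, where_[id]);
        stateOf_[id] = s;
    }

    // Any circuit in state s, or -1 if there is none.
    int findAny(SignalState s) const {
        return byState_[s].empty() ? -1 : byState_[s].front();
    }

    std::size_t count(SignalState s) const { return byState_[s].size(); }
};
```

The bookkeeping is the cumbersome part: every object needs a stored iterator that must be kept consistent with the list it currently lives in, and the initial `push_back` calls allocate one node per circuit on the heap.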

The partition container is specialized to handle exactly this type of problem and is both easier to use and faster than any of the above alternatives.

An Example

The problem

A partition is often useful for solving resource management problems. For our working example, we'll track the state of a box of eight crayons being shared by my two girls Becky and Cate to draw pictures. (This is not the most compelling resource management problem, but it is at least easy to understand and certainly doesn't look like any proprietary software I've ever written.) I want to show how you might model a resource that has multiple states, so we'll give each crayon both a User state (which can be one of BECKY, CATE, or BOX), and a Quality state (which can be one of SHARP or DULL). The programming problem is merely to efficiently keep track of this state information for all eight crayons and support a variety of use cases, like "find a sharp crayon in the box", "get the state of the red crayon", or "return all of Cate's crayons to the box".

A solution

#include "partition.hpp"
#include <iomanip>
#include <iostream>
#include <string>

using namespace std;
using namespace partition;

enum Color { NONE_AVAILABLE = -1, RED, ORANGE, YELLOW, GREEN, BLUE,
                                  VIOLET, BROWN, BLACK, NUM_CRAYONS };
string crayonName[] = { "red", "orange", "yellow", "green", "blue",
                        "violet", "brown", "black" };

enum User { BOX, CATE, BECKY, NUM_USER };
string userName[] = { "Box", "Cate", "Becky" };

enum Quality { SHARP, DULL, NUM_QUALITY };
string qualityName[] = { "Sharp", "Dull" };

class CrayonState {
    private:
        User user;
        Quality quality;
    public:
        CrayonState() {}
        CrayonState(User user, Quality quality)
            :   user(user), quality(quality) {
        }
        User getUser() const { return user; }
        Quality getQuality() const { return quality; }
        friend ostream& operator << (ostream& os, const CrayonState& cs);
};

inline ostream& operator << (ostream& os, const CrayonState& cs) {
    return os << "(" << setw(5) << userName[cs.user] << ", " <<
                 setw(5) << qualityName[cs.quality] << ")";
}

typedef Group<string, CrayonState> CrayonGroup;

inline ostream& operator << (ostream& os, const CrayonGroup& g) {
    os << g.getValue() << ":";
    for(CrayonGroup::ConstIterator i = g.front(); !i.isAfterBack(); ++i)
        os << " " << (*i).member;
    return os;
}

class CrayonManager {
    private:
        Domain<string, CrayonState> crayons;
        CrayonGroup state[NUM_USER][NUM_QUALITY];

    public:
        CrayonManager() {
            for(int i = 0; i < NUM_CRAYONS; ++i)
                crayons.addEntry(i, crayonName[i]);
            for(int u = 0; u < NUM_USER; ++u) {
                for(int q = 0; q < NUM_QUALITY; ++q) {
                    state[u][q].setDomain(crayons);
                    state[u][q].setValue(CrayonState(User(u),Quality(q)));
                }
            }
            state[BOX][SHARP].addAll();
        }

        const CrayonState& getState(Color c) const {
            return crayons.getValue(c);
        }

        User getUser(Color c) const {
            return crayons.getValue(c).getUser();
        }

        Quality getQuality(Color c) const {
            return crayons.getValue(c).getQuality();
        }

        void setState(Color c, User u, Quality q) {
            if(c != NONE_AVAILABLE) {
                cout << CrayonState(u,q) << " <- " << getState(c) <<
                    ": " << crayonName[c] << endl;
                state[u][q].addBack(c);
            }
        }

        void setUser(Color c, User u) {
            if(c != NONE_AVAILABLE)
                setState(c, u, getQuality(c));
        }

        void setQuality(Color c, Quality q) {
            if(c != NONE_AVAILABLE)
                setState(c, getUser(c), q);
        }

        Color find(Quality q, User u = BOX) const {
            if(state[u][q].size() > 0)
                return (Color) state[u][q].peekFront().id;
            else
                return NONE_AVAILABLE;
        }

        Color findPreferred(Quality q = SHARP, User u = BOX) const {
            if(state[u][q].size() > 0)
                return (Color) state[u][q].peekFront().id;
            else
                return find(q == SHARP ? DULL : SHARP, u);
        }

        Color findPreferred(Color c, Quality q=SHARP, User u=BOX) const {
            if(getUser(c) == u)
                return c;
            else
                return findPreferred(q, u);
        }

        void moveAll(User from, User to, Quality q) {
            cout << CrayonState(to, q) << " <- " << state[from][q] << endl;
            state[to][q].addBack(state[from][q]);
        }

        void moveAll(User from, User to) {
            for(int q = 0; q < NUM_QUALITY; ++q)
                moveAll(from, to, Quality(q));
        }

        friend ostream& operator << (ostream& os, const CrayonManager& cm);
};

inline ostream& operator << (ostream& os, const CrayonManager& cm) {
    for(int u = 0; u < NUM_USER; ++u)
        for(int q = 0; q < NUM_QUALITY; ++q)
            os << cm.state[u][q] << endl;
    return os;
}

int main(int argc, char** argv) {
    CrayonManager manager;

    Color c;

    // Becky grabs three crayons, preferably orange, blue, and a sharp.
    // She dulls the first two.
    //
    c = manager.findPreferred(ORANGE);
    manager.setState(c, BECKY, DULL);

    c = manager.findPreferred(BLUE);
    manager.setState(c, BECKY, DULL);

    c = manager.findPreferred();
    manager.setUser(c, BECKY);

    // Cate grabs two crayons, preferably blue and green.
    // She dulls the first one.
    //
    c = manager.findPreferred(BLUE);
    manager.setState(c, CATE, DULL);

    c = manager.findPreferred(GREEN);
    manager.setUser(c, CATE);

    // Becky returns all her crayons to the box...
    //
    manager.moveAll(BECKY, BOX);

    // ...and then grabs a sharp one...
    //
    c = manager.find(SHARP);
    manager.setUser(c, BECKY);

    // ...and makes it dull...
    //
    manager.setQuality(c, DULL);

    // Cate returns her dullest crayon to the box.
    //
    c = manager.findPreferred(DULL, CATE);
    manager.setUser(c, BOX);

    // Cate notices the sharp ones are disappearing fast,
    // so she grabs all the remaining sharp ones...
    //
    manager.moveAll(BOX, CATE, SHARP);

    // Becky gets mad and shows dad.
    //
    cout << "\nFinal States:\n" << manager << endl;

    return 0;
}


STEP 1

Define an enum for the 8 crayon colors. The enum values of these crayons (from 0 to 7) will form the ids of the Domain. Also define a string array that maps each crayon to its string name.

STEP 2

Likewise, define enums and string arrays for the User and Quality state variables.

STEP 3

Define a simple class to bind the two state variables together into a single compound state. One of these will be bound to each Group. They won't change value, so we won't need any setters.

STEP 4

Now create a class CrayonManager to track the crayon states. Start by giving CrayonManager a Domain to hold the crayons. We'll use the partition to map from a crayon color to a CrayonState; accordingly, the Domain has templatization parameters string and CrayonState. Alternatively, we could have defined two Domains (one to map crayons to user state, and another to map crayons to quality state), but it turns out to be more useful to have a single Domain that maps to a multi-valued state.

STEP 5

Next we'll need some Groups. Each Group will hold all the crayons in a particular compound state, so we define a double-array of Groups indexed by user state and quality state.

STEP 6

Initialize the domain by loading up all the crayon ids (zero through seven) and mapping each to its color string.

STEP 7

Initialize each Group by binding it to the Domain object and setting its value to the CrayonState it represents.

STEP 8

The last initialization step is to give all the crayons an initial state. We'll start with them all sharp and in the box.

STEP 9

Add a method to get the current compound state of a crayon. Remember that the state of a crayon is defined by its Group membership. The Domain's getValue() method finds the Group of the given crayon, and returns the user-assigned value of that Group (which is just the CrayonState object we set above). I've also added a couple of convenience methods to return just the crayon's User state and Quality state.

STEP 10

The setState method sets a crayon's full compound state by moving that crayon to the Group representing the given state. Note that we only have to add the crayon to its new Group — the crayon is removed from its previous Group automatically. I've also provided convenience methods for setting each component of a crayon's state independently. For simplicity, I'll largely be ignoring exceptional conditions, but here I at least go to the effort to ignore attempts to set the state of the NONE_AVAILABLE crayon.

STEP 11

Add a method to find a crayon with a particular quality and user state; if no crayon with that compound state exists, return NONE_AVAILABLE.

STEP 12

findPreferred will try to find a crayon with the requested quality and user state. If none exists, it will look for a crayon with the opposite quality but that same user state. If the user has no crayons at all, it returns NONE_AVAILABLE.

STEP 13

Add an overloaded version of findPreferred that first looks for a particular color in the given user state, but if that fails, it works just like the first version of findPreferred above.

STEP 14

Add a method to transfer all crayons of a particular quality from one user to another, and another method to move all crayons (independent of quality) from one user to another.

STEP 15

So we can see what's going on, update the state setters to print the state changes that are taking effect, and define a streaming operator for the crayon manager itself that prints the current state of all crayons.

STEP 16

Finally, add a main routine to exercise things. To do it right, I'd have to define what to do when any of the find methods return NONE_AVAILABLE, but for this tutorial I just pass these failures onto the state setters which ignore them.

RUN IT

Compile the program and run it.

> g++ -o crayons -I ../src crayons.cpp
> ./crayons
(Becky,  Dull) <- (  Box, Sharp): orange
(Becky,  Dull) <- (  Box, Sharp): blue
(Becky, Sharp) <- (  Box, Sharp): red
( Cate,  Dull) <- (  Box, Sharp): yellow
( Cate, Sharp) <- (  Box, Sharp): green
(  Box, Sharp) <- (Becky, Sharp): red
(  Box,  Dull) <- (Becky,  Dull): orange blue
(Becky, Sharp) <- (  Box, Sharp): violet
(Becky,  Dull) <- (Becky, Sharp): violet
(  Box,  Dull) <- ( Cate,  Dull): yellow
( Cate, Sharp) <- (  Box, Sharp): brown black red

Final States:
(  Box, Sharp):
(  Box,  Dull): orange blue yellow
( Cate, Sharp): green brown black red
( Cate,  Dull):
(Becky, Sharp):
(Becky,  Dull): violet

>

Conclusions

A software partition is an exceptionally fast container useful for organizing a set of like objects into groups. Applicable programming problems are common: I personally have used a partition-based approach to achieve elegant solutions to circuit state modeling problems, problems in graph theory, and others. If you too are a software developer, I suspect you'll find uses for Partition as well if you only look for them.

Software Downloads and Documentation

This software is protected by the GNU General Public License version 3. This is free software (as defined by the License), but I'd very much appreciate it if you leave a comment to let me know if and/or how you've found the software useful.

I have not gone to the effort to put together a commercial license, but if someone is interested, I'll make one available.

Although I've used partition programming techniques at multiple companies during my software career, this design and implementation is new and should be considered experimental. I'm not sure I'm satisfied with the public interface (particularly due to arguably cumbersome method names that resulted from the decision to not include a reverse iterator). Future versions may not be backwards compatible. This software has been extensively unit-tested, but only with gcc version 4.4.3 on Ubuntu 10.04.4. Please, please contact me if you discover platform compatibility problems; bugs; design deficiencies; and/or documentation errors (including misspellings or grammatical errors). Thanks!

	Downloads	Documentation
C++	partition_c++_0.1.1.tgz	0.1.1
	partition_c++_0.1.1.zip
Java	partition_java_0.1.tgz	0.1
	partition_java_0.1.zip

Solving Polyomino and Polycube Puzzles: Algorithms, Software, and Solutions

matt — Wed, 23 Mar 2011 19:05:24 +0000

Image by Matt Busche

I've implemented a set of backtrack algorithms to find solutions to various polyomino and polycube puzzles (2-D and 3-D puzzles where you have to fit pieces composed of little squares or cubes into a confined space). Examples of such puzzles include the Tetris Cube, the Bedlam Cube, the Soma Cube, and Pentominoes. My approach to the problem is perhaps unusual in that I've implemented many different algorithmic techniques simultaneously into a single puzzle solving software application. I've found that the best algorithm to use for a problem can depend on the size and dimensionality of the puzzle. To take advantage of this, when the number of remaining pieces reaches configurable transition thresholds, my software can turn off one algorithm and turn on another. Three different algorithms are implemented: de Bruijn's algorithm, Knuth's DLX, and my own algorithm, which I call most-constrained-hole (MCH). DLX is most commonly used with an ordering heuristic that picks the hole or piece with fewest fit choices, but other simple ordering heuristics are shown to improve performance for some puzzles.

In addition to these three core algorithms, a set of constraints is woven into the algorithms, giving great performance benefits. These include constraints on the volumes of isolated subspaces, parity (or coloring) constraints, fit constraints, and constraints to take advantage of rotational symmetries.

In this (rather long) blog entry I present the algorithms and techniques I've implemented and share my insights into where each works well, and where they don't. You can also download my software source, an executable version of the solver, and the solutions to various well known puzzles.

Motivation

Backtrack Algorithms

Puzzle Complexity

De Bruijn's Algorithm

Dancing Links (DLX) Algorithm

DLX Description

Ordering Heuristics

Most Constrained Hole (MCH) Algorithm

Combining the Algorithms

Software Optimizations

Bit Fields

Early MCH Fit Count Exit

Fast Permutation Algorithm

Constraints

Backtrack Triggers

Parity Constraint Violations

Volume Constraint Violations

Image Filters

Puzzle Bounds Constraint Violations

Rotational Redundancy Constraint Violations

Parity Constraint Violations

Volume Constraint Violations

Fit Constraint Violations

Image Filter Performance (or lack thereof)

Performance Comparisons

Test Case P: Pentominoes in a 10x6 Box

Test Case OP: One-sided Pentominoes in a 30x3 Box

Test Case TC: Tetris Cube

Test Case HP: Hexominoes in a 15x14 Parallelogram

Test Case HBD: Hexominoes Box-in-a-Diamond

Motivation

My original impetus for writing this software was an annoyingly difficult little puzzle called the Tetris Cube. The Tetris Cube consists of the twelve oddly-shaped plastic pieces shown in Figure 1 which you have to try to fit into a cube-shaped box. Each piece has a shape you could make yourself by sticking together little cubes of fixed size. You would need 64 cubes to make them all and accordingly, the puzzle box measures 4x4x4 — just big enough to hold them all. (This is an example of a polycube puzzle: a 3-D puzzle where all the pieces are formed from little cubes. We'll also be looking at polyomino puzzles: 2-D puzzles where all the pieces are formed from little squares.)

Figure 1. POV-Ray image of the 12 pieces of the Tetris Cube puzzle. Each piece is given a single character label used by the software to output solutions.

The appeal of the Tetris Cube to me is three-fold. First, it's intriguing (and surprising to most folks) that a puzzle with only twelve pieces could be so wicked hard. I had spent much time in my youth looking for solutions to the far simpler (but still quite challenging) two-dimensional Pentominoes puzzle, so when I first saw the Tetris Cube in a gaming store about three years ago, I knew that with the introduction of the third dimension the Tetris Cube was an abomination straight from Hell. I had to buy it. Since then I've spent many an hour trying to solve it, but I've never found even one of its nearly 10,000 solutions manually.

Second, I enjoy the challenge of visualizing ways of fitting the remaining pieces into the spaces I have left, and enjoy the logic you can apply to identify doomed partial assemblies.

Third, I think working any such puzzle provides a certain amount of tactile pleasure. I should really buy the wooden version of this thing.

But alas, I think that short of having an Einstein-like parietal lobe mutation, you will need both persistence and a fair amount of luck to find even one solution. If I ever found a solution, I think I'd feel not so much proud as lucky; or maybe just embarrassed that I wasted all that time looking for the solution. In this sense, I think perhaps the puzzle is flawed. But for those of you up for a serious challenge, you should really go buy the cursed thing! But do yourself a favor and make a sketch of the initial solution and keep it in the box so you can put it away as needed.

Having some modest programming skill, I decided to kick the proverbial butt of this vexing puzzle back into the fiery chasm from whence it came. My initial program, written in January 2010, made use of only my own algorithmic ideas. But during my debugging, I came across Scott Kurowski's web page describing the software he wrote to solve this very same puzzle. I really enjoyed the page and it motivated me to share my own puzzle solving algorithm and also to read up on the techniques others have devised for solving these types of puzzles. In my zeal to make the software run as fast as possible, over the next couple of weeks I incorporated several of these techniques as well as a few more of my own ideas. Then I stumbled upon Donald Knuth's Dancing Links (DLX) algorithm which I thought simply beautiful. But DLX caused me two problems: first it used a radically different data model and would not be at all easy to add to my existing software; second it was so elegant, I questioned whether there was any real value in the continued pursuit of this pet project.

Still I wasn't sure how DLX would compare to and possibly work together with the other approaches I had implemented. The following November, curiosity finally got the better of me and I began to lie awake at night thinking about how to integrate DLX into my polycube solver software application.

Backtrack Algorithms

The popular algorithms used to solve these types of problems are all recursive backtracking algorithms. With one algorithm that falls in this category you sequentially consider all the ways of placing the first piece; for each successful placement of that piece, you examine all the ways of placing the second piece. You continue placing pieces in this way until you find yourself in a situation where the next piece simply won't fit in the box, at which point you back up one step (backtrack) and try the next possible placement of the previously placed piece. Following this algorithm to its completion will examine all possible permutations of piece placements including those placements that happen to be solutions to the puzzle. This approach typically performs horribly. Another similar approach is to instead consider all the ways you can fill a single target open space (hole) in the puzzle; for each possible piece placement that fills the hole, pick another hole and consider all the ways to fill it; etc. This approach can also behave quite badly if you choose your holes willy-nilly, but if you make good choices about which hole to fill at each step, it performs much better. But in general you can mix these two basic approaches so that at each step of your algorithm you can either consider all ways to fill a particular hole, or consider all ways to place a particular piece. Donald Knuth gives a nice abstraction of this general approach that he dubbed Algorithm X.
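To make the hole-filling approach concrete, here is a minimal sketch that is not taken from the solver described in this article. It assumes a deliberately simplified puzzle: an unlimited supply of identical dominoes on a small rectangular board. The function names `fillFrom` and `countDominoTilings` are invented for illustration. At each step it targets the first open cell in row-major order, tries every placement that fills that cell, recurses, and then removes the piece again (the backtrack).

```cpp
#include <cassert>
#include <vector>

// Recursively fill holes, always targeting the first open cell at or after
// `start` in row-major order. Returns the number of complete tilings found.
long fillFrom(std::vector<bool>& filled, int rows, int cols, int start) {
    int hole = start;
    while (hole < rows * cols && filled[hole]) ++hole;
    if (hole == rows * cols) return 1;  // no holes remain: one solution found

    long solutions = 0;
    filled[hole] = true;

    // Placement 1: domino lying horizontally, also covering the cell to the right.
    if (hole % cols + 1 < cols && !filled[hole + 1]) {
        filled[hole + 1] = true;
        solutions += fillFrom(filled, rows, cols, hole + 1);
        filled[hole + 1] = false;  // backtrack
    }
    // Placement 2: domino standing vertically, also covering the cell below.
    if (hole + cols < rows * cols && !filled[hole + cols]) {
        filled[hole + cols] = true;
        solutions += fillFrom(filled, rows, cols, hole + 1);
        filled[hole + cols] = false;  // backtrack
    }

    filled[hole] = false;  // backtrack
    return solutions;
}

// Hypothetical entry point: count all domino tilings of a rows x cols board.
long countDominoTilings(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<bool> filled(rows * cols, false);
    return fillFrom(filled, rows, cols, 0);
}
```

With this sketch, `countDominoTilings(4, 4)` returns 36 and `countDominoTilings(3, 3)` returns 0, since a board with an odd number of cells always leaves a hole that cannot be filled. A real puzzle with distinct pieces would additionally track which pieces remain unused.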

Puzzle Complexity

To appreciate the true complexity of these types of problems it is perhaps useful to examine the Tetris Cube more closely. First note that most of the pieces have 24 different ways you can orient (rotate) them. (To see where the number 24 comes from, hold a piece in your hand and see that you have 6 different choices for which side faces up. After picking a side to face up, you still have 4 more choices for how to spin the piece about the vertical axis while leaving the same side up.) Two of the pieces, however, have symmetries that reduce their number of uniquely shaped orientations to just 12. For each orientation of a piece, there can be many ways to translate the piece in that orientation within the confines of the box. I call a particular translation of a particular orientation of a particular piece an image.
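The count of unique orientations can be checked mechanically. The following is a sketch under stated assumptions rather than the solver's own code: a piece is assumed to be a list of integer cube coordinates, and the names `Cube`, `Shape`, `normalize`, and `countUniqueOrientations` are hypothetical. It generates the 24 rotations as signed axis permutations with determinant +1, translates each rotated copy to a canonical position, and counts the distinct results.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>
#include <vector>

using Cube = std::array<int, 3>;   // {x, y, z}
using Shape = std::vector<Cube>;

// Translate a shape so its minimum coordinate on each axis is zero, then
// sort the cubes so that equal shapes compare equal.
Shape normalize(Shape s) {
    Cube lo = s.front();
    for (const Cube& c : s)
        for (int i = 0; i < 3; ++i) lo[i] = std::min(lo[i], c[i]);
    for (Cube& c : s)
        for (int i = 0; i < 3; ++i) c[i] -= lo[i];
    std::sort(s.begin(), s.end());
    return s;
}

// Count the distinct shapes produced by the 24 rotations of the cube.
int countUniqueOrientations(const Shape& piece) {
    assert(!piece.empty());
    std::set<Shape> seen;
    std::array<int, 3> perm = {0, 1, 2};
    do {
        // Sign of the axis permutation (+1 if even, -1 if odd).
        int inversions = (perm[0] > perm[1]) + (perm[0] > perm[2]) + (perm[1] > perm[2]);
        int permSign = (inversions % 2 == 0) ? 1 : -1;
        for (int mask = 0; mask < 8; ++mask) {
            int sign[3];
            int det = permSign;
            for (int i = 0; i < 3; ++i) {
                sign[i] = ((mask >> i) & 1) ? -1 : 1;
                det *= sign[i];
            }
            if (det != 1) continue;  // determinant -1 is a mirror image, not a rotation
            Shape rotated;
            rotated.reserve(piece.size());
            for (const Cube& c : piece)
                rotated.push_back(Cube{sign[0] * c[perm[0]],
                                       sign[1] * c[perm[1]],
                                       sign[2] * c[perm[2]]});
            seen.insert(normalize(rotated));
        }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return static_cast<int>(seen.size());
}
```

A flat L-shaped piece of four cubes has no rotational symmetry and yields 24, while a bent piece of three cubes has one symmetry that halves the count to 12.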

If we stick with the algorithmic approach of recursively filling empty holes to look for solutions, then we'll start by picking just one of the 64 holes in the puzzle cube (call the hole Z₁); and then one-by-one try to fit each of the pieces in that hole. For each piece, all unique orientations are examined; and for each orientation, an attempt is made to place each of the piece's constituent cubes in Z₁. The size of a piece multiplied by its number of unique orientations I loosely call the complexity of a piece, which gives the total number of images of a piece that can possibly fill a hole. If, for example, a piece has 6 cubes and has 24 unique orientations, then 144 different images of that piece could be used to fill any particular hole. The complexities of the twelve Tetris Cube pieces are shown in Table 1. Each time a piece is successfully fitted into Z₁, our processing of Z₁ is temporarily interrupted while the whole algorithm is recursively repeated with the remaining pieces on some new target hole, Z₂. And so on, and so on. Each time we successfully place the last piece, we've found a solution.

Table 1. Piece Complexity
Piece Name	Size	Unique Orientations	Complexity
A	6	24	144
B	6	24	144
C	5	24	120
D	5	24	120
E	6	24	144
F	5	24	120
G	5	12	60
H	5	24	120
I	5	24	120
J	5	12	60
K	5	24	120
L	6	24	144

The number of steps in such an algorithm cannot be determined without actually running it since the number of successful fits at each step is impossible to predict and varies wildly with which hole you choose to fill. It is useful however (or at least mildly entertaining) to consider how many steps you'd have if you didn't back up when a piece didn't fit, but instead let pieces hang outside the box, or even let two pieces happily occupy the same space (perhaps slipping one piece into Buckaroo Banzai's eighth dimension) and blindly forged ahead until all twelve pieces were positioned. In this case, the total number of ways to place pieces is easily written down. There are 12 × 11 × 10 . . . × 1 = 12! possible orderings of the pieces. And for each such ordering, each piece can be placed a number of times given by its complexity. So the total number of distinct permutations of pieces that differ either in the order the pieces are placed, or in the translation or orientation of those pieces is:

12! × 144⁴ × 120⁶ × 60² = 2,213,996,395,638,416,883,056,640,000,000,000

That's 2.2 decillion. The total number of algorithm steps would be more than double that (since each piece placed also has to be removed). But this is just silliness: any backtracking algorithm worth its salt (and none of them are very salty) will reduce the number of steps to well below a quadrillion, and a good one can get the number of steps down to the tens of millions. I now examine some specific algorithms and explain how they work.
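The arithmetic behind that figure is easy to reproduce. The sketch below assumes only the complexities listed in Table 1. The function name `blindPlacementCount` is hypothetical, and a `double` is used because the result overflows 64-bit integers, so the value is approximate rather than exact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Approximate count of blind placements: n! orderings of the n pieces,
// multiplied by the complexity (size x unique orientations) of each piece.
double blindPlacementCount(const std::vector<int>& complexities) {
    assert(!complexities.empty());
    double total = 1.0;
    for (std::size_t i = 0; i < complexities.size(); ++i) {
        total *= static_cast<double>(i + 1);            // accumulates n!
        total *= static_cast<double>(complexities[i]);  // images of piece i
    }
    return total;
}
```

Passing the twelve complexities from Table 1 (four pieces at 144, six at 120, and two at 60) gives roughly 2.214e33, in agreement with the figure above.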

De Bruijn's Algorithm

The first algorithm examined was first formulated independently by John G. Fletcher and N.G. de Bruijn. I first stumbled upon the algorithm when reading Scott Kurowski's source code for solving the Tetris Cube. To read Fletcher's work you'll either need to find a library with old copies of the Communications of the ACM or drop $200.00 for online access. (I've yet to do either.) De Bruijn's work can be viewed online for free, but you'll need to learn Dutch to read it. (It's on my to-do list.) Despite my ignorance of the two original publications on the algorithm, I'll take a shot at explaining it here. With no intended slight to Fletcher, from here on, I simply refer to the algorithm as de Bruijn's algorithm. (I feel slightly less foolish calling it de Bruijn's algorithm since I have at least examined and understood the diagrams in his paper.)

De Bruijn's algorithm takes the tack of picking holes to fill. I previously said that when filling a hole, for each orientation of each piece, an attempt must be made to place each of the piece's constituent cubes in that hole; but with de Bruijn's technique, only one of the cubes must be attempted. This saves a lot of piece fitting attempts. To understand how this works, first assume the puzzle is partially filled with pieces. De Bruijn's algorithm requires you pick a particular hole to fill next. A simple set of nested for loops will find the correct hole. The code could look like this:

        GridPoint* Puzzle::getNextBruijnHole()
        {
            for(int z = 0; z < zDim; ++z)
                for(int y = 0; y < yDim; ++y)
                    for(int x = 0; x < xDim; ++x)
                        if(grid[x][y][z]->occupied == false)
                            return grid[x][y][z];
            return NULL;
        }

This search order is also shown visually on the left hand side of Figure 2.

Figure 2. Diagram showing de Bruijn's tiling technique. The puzzle is scanned in x, y, z order until a hole is found. Only the root node of an oriented piece must be tested for fit. No other cube in the piece while held in that orientation can possibly fit at the target hole.

Because of the search order there can be no holes in the puzzle with a lesser z value than the target hole. Similarly, there can be no holes with the same z value as the target hole having a lesser y value. And finally, there can be no hole with the same y and z values as the target hole having a lesser x value.

Now consider a particular orientation of an arbitrary piece like the one shown in the center of Figure 2. Because there can be no holes with a lesser z value than the target hole, it would be pointless to attempt to place either of its two top cubes in the hole. That would only force the lower cubes of the piece into necessarily occupied GridPoints of the puzzle. So only those cubes at the bottom of the piece could possibly fit in the target hole. But of those three bottom cubes, the one with the greater y value (in the foreground of the graphic) can't possibly fit in the hole because it would force the other two bottom-tier cubes into occupied puzzle GridPoints at the same height as the hole but with lesser y value. Applying the same argument along the x axis leads to the conclusion that for any orientation of a puzzle piece, only the single cube with minimum coordinate values in z, y, x priority order (which I call the root tiling node of the piece) can possibly fit the hole. This cube is highlighted pink in Figure 2.

So with de Bruijn's algorithm, a piece with 6 cubes and 24 orientations would only require 24 fit attempts instead of 144 at a given target hole. This allows the algorithm to fly through piece permutations in a hurry.
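The root tiling node can be found with a single comparison in z, y, x priority order. The following is an illustrative sketch, not the solver's own code: it assumes an oriented piece is stored as a list of cube coordinates, and the names `Cube`, `rootTilingNode`, and `placeAtHole` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <tuple>
#include <vector>

using Cube = std::array<int, 3>;  // {x, y, z}

// Return the cube with the minimum coordinates in z, y, x priority order.
Cube rootTilingNode(const std::vector<Cube>& orientation) {
    assert(!orientation.empty());
    return *std::min_element(
        orientation.begin(), orientation.end(),
        [](const Cube& a, const Cube& b) {
            return std::tie(a[2], a[1], a[0]) < std::tie(b[2], b[1], b[0]);
        });
}

// Translate an oriented piece so that its root tiling node lands on the
// target hole. This is the only translation that needs a fit test there.
std::vector<Cube> placeAtHole(const std::vector<Cube>& orientation, const Cube& hole) {
    const Cube root = rootTilingNode(orientation);
    std::vector<Cube> image;
    image.reserve(orientation.size());
    for (const Cube& c : orientation)
        image.push_back(Cube{c[0] - root[0] + hole[0],
                             c[1] - root[1] + hole[1],
                             c[2] - root[2] + hole[2]});
    return image;
}
```

Note that the root need not have the smallest x value in the piece: for the cubes (1,0,0), (0,1,0), (0,0,1), and (1,1,0), the root is (1,0,0), because z is compared first and y second.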

Figure 3. The twelve pentomino pieces with their single character names and a solution to the 10x6 pentomino puzzle.

De Bruijn's paper focused on the 10x6 pentomino puzzle, perhaps the most famous of all polyomino and polycube puzzles. The puzzle pieces in this problem consist of all the ways you can possibly stick 5 squares together to form rotationally unique shapes. There are 12 such pieces in all and each is given a standard single character name. Figure 3 shows the twelve pieces with their names as well as one of the 2339 ways you can fit these pieces into a 10x6 box. To be accurate, de Bruijn only used the algorithmic steps described above to place the last 11 pieces in this puzzle. He forced the X piece to be placed first in each of seven possible positions in the lower left quadrant of the box. This was done to eliminate rotationally redundant solutions from the search, and significantly sped up processing. But where possible I'd like to avoid optimization techniques that require processing by University professors. So when I speak of the de Bruijn algorithm, I do not include this special case processing. This restriction significantly weakens the algorithm. (I found it to take ten times longer to find all solutions to the 10x6 pentomino puzzle without this trick.) As I explain later, I've implemented an image filter that can constrain a piece to eliminate rotationally redundant solutions from the search. Applying this filter to the 10x6 pentomino puzzle algorithmically reproduces de Bruijn's constraint on the X piece.

Figure 4. The de Bruijn algorithm stumbling over a hole that can't be filled. The troublesome hole is created in step 2, but is not cleared until step 195.

In Figure 4, I've captured an excerpt of de Bruijn's algorithm working on the 10x6 pentomino puzzle at a point where it's behaving particularly badly. It reveals an interesting weakness of the algorithm: it can be slow to recognize a position in the puzzle that's clearly impossible to fill. The algorithm doesn't recognize this problem until it selects the troublesome hole as a fill target, but even then it won't back up all the way to the point where the hole is freed from its confinement: it only backs up one step. So depending on how far away the isolated hole is in the hole selection sequence at the time the hole appeared, it may get stuck trying to fill the hole many, many times. Because pentomino pieces are all fairly small, and because the algorithm uses a strict packing order from bottom to top and from left to right, such troublesome holes can never be that far away from the current fill target and are thus usually discovered fairly quickly. The example I've given may be among the most troublesome to the algorithm, but things can get worse if you are working with larger pieces, or if you are working in 3 dimensions instead of 2. In either case, unfillable holes can appear further down the hole selection sequence and the algorithm can stumble over them many more times before the piece that created the problem is finally removed.

The next algorithm examined does not suffer from this weakness.

Dancing Links (DLX) Algorithm

Dancing Links (DLX) is Donald Knuth's algorithm for solving these types of puzzles. The DLX data model provides a view of each remaining hole and each remaining piece and can pick either a hole to fill or a piece to place depending on which (among them all) is deemed most advantageous.

DLX Description

Knuth's own paper on DLX is quite easy to understand, but I'll attempt to summarize the algorithm here. Create a matrix that has one column for each hole in the puzzle and one column for each puzzle piece. So for the case of the Tetris Cube the matrix will have 64 hole columns + 12 piece columns = 76 columns in all. We can label the columns for the holes 1 through 64, and the columns for the pieces A through L. The matrix has one row for each unique image. (Only images that fit wholly inside the puzzle box are included.) If you look at one row that represents, say, a particular image of piece B, it will have a 1 in column B and a 1 in each of the columns corresponding to the holes it fills. All other columns for that row will have a 0. (Those are the only numbers that ever appear in the matrix: ones and zeros.) Now, if you select a subset of rows that correspond to piece placements that actually solve the puzzle, you'll notice something interesting: the twelve selected rows together will have a single 1 in every column. And so the problem of solving the puzzle is abstracted to the problem of finding a set of rows that cover (with a 1) all the columns in the matrix. This is the exact cover problem: finding a set of rows that together have exactly one 1 in every column. With Knuth's abstraction there is no distinction between filling a particular hole, or placing a particular piece; and that is truly beautiful.
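As a small illustration of this matrix layout (a sketch with invented names, not Knuth's code or the solver's), the snippet below builds one row per image, with the hole columns first and the piece columns after them, and checks whether a chosen set of rows puts exactly one 1 in every column. `Row`, `makeImageRow`, and `isExactCover` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

using Row = std::vector<int>;  // one 0/1 entry per matrix column

// Build the matrix row for one image. Hole columns come first
// (0 .. numHoles-1), followed by one column per piece.
Row makeImageRow(int numHoles, int numPieces, int piece,
                 const std::vector<int>& holesFilled) {
    assert(piece >= 0 && piece < numPieces);
    Row row(numHoles + numPieces, 0);
    for (int h : holesFilled) {
        assert(h >= 0 && h < numHoles);
        row[h] = 1;  // this image fills hole h
    }
    row[numHoles + piece] = 1;  // this image uses up the piece
    return row;
}

// True when the selected rows together have exactly one 1 in every column.
bool isExactCover(const std::vector<Row>& selected, int numColumns) {
    std::vector<int> ones(numColumns, 0);
    for (const Row& row : selected)
        for (int c = 0; c < numColumns; ++c) ones[c] += row[c];
    for (int count : ones)
        if (count != 1) return false;
    return true;
}
```

For a toy puzzle with three holes in a line, a two-cell piece, and a one-cell piece, the rows for "two-cell piece on holes 0 and 1" and "one-cell piece on hole 2" form an exact cover, while any selection that reuses a piece or overlaps a hole does not.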

In each iteration of the algorithm, DLX first picks a column to process. This decision is rather important and I discuss it at length below. Once a column is selected, DLX will in turn place each image having a 1 in that column. For each such placement, DLX reduces the matrix removing every column covered by the image just placed, and removing every row for every image that conflicts with the image just placed. In other words, after this matrix reduction, the only rows that remain are for those images that still fit in the puzzle, and the only columns that remain are for those holes that are still open and for those pieces that have yet to be placed. Knuth uses some nifty linked list constructions to perform this manipulation, which you can read about in his paper if interested.

DLX maintains the total number of ones in each column as state information. If the number of ones remaining in any column hits zero, then the puzzle is unsolvable with the current piece configuration and so the algorithm backtracks. The situation in Figure 4 that gave the de Bruijn algorithm so much trouble gives DLX no trouble at all: it immediately recognizes that the matrix column corresponding to the hole that can't be filled has no rows with a one and backtracks, removing the piece that isolated the hole immediately after it was placed.

Some benefits of DLX are:

the elimination of all processing associated with "checking" to see if a particular image fits (which is where the de Bruijn algorithm spends most of its time);
tracking of the number of fit options for each hole and piece (which is used to trigger backtracks and can also be used to make good decisions about which hole to fill or piece to place next as discussed below);
radically reducing the processing complexity of sub-branches of the new puzzle configuration; and
a data model that is well suited to other types of puzzle processing. (See image filters below.)

Ordering Heuristics

As noted above, the first step in each iteration of the algorithm is to pick a column to process. If the column selected is a hole column, then the algorithm will one-by-one place all images that fill that hole. If the column selected is a piece column, then the algorithm will one-by-one place all the images of that piece that fit anywhere in the puzzle. There are any number of ways to determine this column selection order, which Knuth refers to as ordering heuristics. Knuth found that the minimum fit heuristic (simply picking the column that has the fewest number of ones) does well. Using this selection criterion, DLX will always pick the more constrained of either the hole that's hardest to fill or the piece that's hardest to place. By reducing the number of choices that have to be explored at each algorithmic step, the total number of steps to execute the entire algorithm is greatly reduced. In the case of the Tetris Cube with one piece rotationally constrained (to eliminate rotationally redundant solutions), the de Bruijn algorithm places pieces in the box almost 8 billion times, whereas DLX running with the min-fit heuristic places pieces only 68 million times: a reduction in the number of algorithmic steps by two orders of magnitude. (Remember though that each DLX step requires much more processing and DLX was actually only twice as fast as de Bruijn for this problem.)
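The column selection and matrix reduction steps can be sketched without the linked-list machinery. The code below is a simplified stand-in for DLX, not Knuth's implementation: it copies boolean flag vectors at every level instead of using dancing links, and the names `ExactCover`, `search`, and `countExactCovers` are hypothetical. It applies the min-fit heuristic, backtracks as soon as any open column has no remaining rows, and counts the exact covers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ExactCover {
    int numColumns;
    std::vector<std::vector<int>> rows;  // each row lists the columns holding a 1
};

// colOpen[c]: column c still needs covering.
// rowLive[r]: row r conflicts with nothing placed so far.
long search(const ExactCover& ec, const std::vector<bool>& colOpen,
            const std::vector<bool>& rowLive) {
    // Count the live rows in each column: the number of fit options.
    std::vector<int> fits(ec.numColumns, 0);
    for (std::size_t r = 0; r < ec.rows.size(); ++r)
        if (rowLive[r])
            for (int c : ec.rows[r]) ++fits[c];

    // Min-fit heuristic: pick the open column with the fewest options.
    int best = -1;
    for (int c = 0; c < ec.numColumns; ++c)
        if (colOpen[c] && (best < 0 || fits[c] < fits[best])) best = c;
    if (best < 0) return 1;         // every column is covered: a solution
    if (fits[best] == 0) return 0;  // some column cannot be covered: backtrack

    long solutions = 0;
    for (std::size_t r = 0; r < ec.rows.size(); ++r) {
        if (!rowLive[r]) continue;
        bool coversBest = false;
        for (int c : ec.rows[r])
            if (c == best) coversBest = true;
        if (!coversBest) continue;

        // Reduce: close the columns row r covers, then drop every row that
        // has a 1 in a closed column (this includes row r itself).
        std::vector<bool> nextCol = colOpen;
        for (int c : ec.rows[r]) nextCol[c] = false;
        std::vector<bool> nextRow = rowLive;
        for (std::size_t other = 0; other < ec.rows.size(); ++other) {
            if (!nextRow[other]) continue;
            for (int c : ec.rows[other])
                if (!nextCol[c]) { nextRow[other] = false; break; }
        }
        solutions += search(ec, nextCol, nextRow);
    }
    return solutions;
}

// Hypothetical entry point: count all exact covers of the matrix.
long countExactCovers(const ExactCover& ec) {
    for (const std::vector<int>& row : ec.rows)
        for (int c : row) assert(c >= 0 && c < ec.numColumns);
    return search(ec, std::vector<bool>(ec.numColumns, true),
                  std::vector<bool>(ec.rows.size(), true));
}
```

Copying the flag vectors keeps the sketch short but costs time proportional to the matrix size at every step. Avoiding that cost is precisely what the linked-list constructions in DLX are for.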

Knuth stated in the conclusions of his paper, "On large cases [DLX] appears to run even faster than those special-purpose algorithms, because of its [min-fit] ordering heuristic." But I don't think things are quite this simple. I have found that for larger puzzles, the min-fit heuristic is often only beneficial for placing the last N pieces of the puzzle where the number N depends upon both the complexity of the pieces and upon the geometries of the puzzle. I also believe that using the min-fit heuristic for more than N pieces can actually negatively impact performance relative to other simpler ordering heuristics.

Figure 5. The 35 free hexominoes.

To see the problem, we need a larger puzzle: let's up the ante from pentominoes to hexominoes. There are 35 uniquely shaped free hexominoes shown in Figure 5. Each piece has area 6 so the total area of any shape constructed from the pieces is 210 — 3.5 times the area of a pentomino puzzle.

Figure 6. The 35 hexominoes placed into the shape of a 15x14 parallelogram.

Consider first a hexomino puzzle shaped like a parallelogram consisting of 14 rows of 15 squares each stacked in a sloping stair-step pattern. Figure 6 shows one solution to this puzzle. (As I explain in a later section, you can't actually pack hexominoes in a rectangular box, so the parallelogram is one of the simplest feasible hexomino constructions.) The first time I ran DLX on this puzzle I used a one-time application of a volume constraint filter which throws out a bunch of images up front that can't possibly be part of a solution (see below). It ran for quite some time without finding a single solution. A trace revealed that DLX had placed the first few pieces into the rather unfortunate construction shown in Figure 7. Note the small area at the lower left that has almost been enclosed. Every square in that area has many ways to cover it, so the min-fit heuristic didn't consider this pocket very constrained and ignored it. There is actually no way to fill all the squares: the only piece that could fill it has already been placed on the board. DLX didn't recognize the problem and so continued to try to solve the puzzle. I call such a well concealed spot in the puzzle that can't be filled a landmine.

Figure 7. DLX plants a landmine. There is no way to fill the almost enclosed area at the lower left, but DLX can't see the problem and continues to fill in the rest of the puzzle, stumbling over this landmine many times before dismantling it.

This behavior closely resembles the problem the de Bruijn algorithm exhibited in Figure 4: DLX can also create spaces in the puzzle that can't possibly be filled, not immediately see them, and stumble upon them many times before dismantling the problem. It is interesting that the de Bruijn algorithm is actually less susceptible to this particular pitfall. Although de Bruijn's algorithm can also create landmines, it can't wander all over the puzzle before discovering them. DLX running with the min-fit heuristic is able to wander far and wide filling in holes it thinks more constrained than the landmine, finally step on it, get stuck, back up a little, and wander off in some other direction. And because the landmine created in Figure 7 was created so early in the piece placement sequence, there were many ways for DLX to go astray: it took almost two million steps for DLX to dismantle this landmine. (I decided not to make a movie of this one.)

As a second example, consider the box-in-a-diamond hexomino puzzle of Figure 8. Due to the center rectangle, DLX frequently partitions the open space into two isolated regions as shown. Each time the open space is divided, there's only 1 chance in 6 that the two areas will have a volume that can possibly be filled. Out of 1000 runs of the solver each using a different random ordering of the nodes in each DLX column, 842 runs resulted in a partitioning of the open space into two (or more) large unfillable regions before the eleventh-to-last piece was placed. When such a partition is created, DLX examines every possible permutation of pieces that fails to fill (at least one of) the isolated regions.

Figure 8. DLX partitioning the box-in-a-diamond hexomino puzzle. The area of the smaller open region is 39 which is unfillable by pieces of size 6. DLX doesn't recognize the problem and gets stuck on this configuration for many algorithmic steps before removing enough pieces to join the two isolated regions back together.

So here again, the min-fit ordering heuristic has led to the creation of a topology that can't possibly be filled. And again, DLX can't see the problem, and wastes time exhaustively exploring fruitless branches of the search tree. De Bruijn's algorithm can also be made to foolishly partition the open space of puzzles: if you ask the de Bruijn algorithm to, say, start at the bottom of a U-shaped puzzle and fill upwards, it will inevitably partition the open space into the left and right columns of the U. But aside from such cruel constructions, de Bruijn's algorithm is relatively immune to this pitfall.

When I first saw these troubles, my faith in the min-fit heuristic was unshaken: the extreme reduction in the number of algorithmic steps seen for the Tetris Cube and other small puzzles had me convinced it was the way to go. So I built landmine sweepers and volume checkers to provide protection against these pitfalls. These worked pretty well, but as I thought about what was happening more, I began to doubt the approach. As you pursue the most constrained hole or piece, you end up wandering around the puzzle space haphazardly leaving a complex topology in your wake. This strikes me as the wrong way to go: it's certainly not the way I'd work a puzzle manually. When you finally get down to the last few pieces you are likely to have many nooks and crannies that simply can't all be filled with the pieces that are left. And I think that is ultimately the most important thing with these kinds of puzzles: you want to fill the puzzle in such a way that when you near completion you have simple geometries that have a higher probability of producing solutions.

One could argue that wandering around the board spoiling the landscape is a good thing! Won't that make it more likely to find a hole or a piece with very few fit options; reducing the number of choices that have to be examined for the next algorithmic step and ultimately reducing the run time of the entire algorithm? I used to have such thoughts in my head, but I now think these ideas flawed. When solving the puzzle, your current construction either leads to solutions or it doesn't. You want to minimize how much time you spend exploring constructions that have no solutions. (The day after I wrote this down, Gerard sent me an email saying exactly the same thing...so I decided it was worth putting in bold.) The real advantage of picking the min-fit column is not that it has fewer cases to examine, but rather that there's a better chance that all the fit choices lead quickly to a dead end. In other words, by using the min-fit heuristic DLX tends to more quickly identify problems with the current construction that preclude solutions, and more quickly backtrack to a construction that does have solutions. The problem with this approach is that as it wanders about examining the most constrained elements, it can create more difficulties than it resolves.

For large puzzles, instead of looking for individual holes or pieces that are likely to have problems, I think it is better to look at the overall topology of the puzzle and fill in regions that appear most confined (a macro view instead of a micro view). By strategically placing pieces so that at the end you have a single simply shaped opening, you will find solutions with a higher probability. This is just another way of saying that there is a high probability that you are still in a construction that has solutions — which is your ultimate goal.

So if the puzzle is shaped like a rectangle, start at one narrow side and work your way across to the other narrow side. If the puzzle is shaped like a U, then fill it in the way you'd draw the U with a pencil: down one side, around the bottom and up the other side. If the puzzle is a five pointed star, fill in the points of the star first, and leave the open space in the middle of the star for last. (Hmmm, or maybe it would be better to finish heading out one point? I'm not sure.)

So if what I say is true, then why does the min-fit heuristic work so well for the Tetris Cube? I think the min-fit heuristic works well once the number of pieces remaining in a puzzle drops to some threshold which depends on the complexity of the pieces and the geometries of the puzzle. Because Tetris Cube pieces are rather complicated, and because the geometry of the puzzle is small relative to the size of these pieces, the min-fit heuristic works well for that puzzle from the start.

Knuth explored the possibility of using the min-fit heuristic for the first pieces placed, but then simply choosing holes in a predefined order for the last pieces placed. The thinking was that for the last few pieces, picking the most constrained hole or piece doesn't buy you much and you're better off just trying permutations as fast as you can and skipping the search for the most constrained element (and skipping the maintenance of the column counts that support this search). Knuth was not able to get any significant gains with this technique. I propose the opposite approach: initially, deterministically fill in regions of the puzzle that are most confined, then when you've worked your way down to a smaller area (and placement options are more limited) start using the min-fit heuristic.

To explore this idea, my solver supports a few alternative ordering heuristics. You can turn off one heuristic and enable another when the number of remaining pieces hits some configured threshold. One available heuristic (named heuristic x) has these behaviors:

If there exists a DLX column with no ones in it, then this column is selected;
else, if there exists a DLX column with exactly one 1 in it, then this column is selected;
otherwise, the hole column with the minimum x coordinate value is selected. In the case of a tie, the column with fewest ones (i.e., the hole with fewest fits) is selected.
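
The three rules above can be sketched in a few lines of C++. This is only an illustration of the selection order, not the solver's actual code: the `Column` struct, its field names, and the function `selectColumnX` are all assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one active DLX column; these field names are
// assumptions for illustration, not the solver's real data model.
struct Column {
    int  ones;    // number of 1s (candidate images) currently in the column
    bool isHole;  // true for a hole column, false for a piece column
    int  x;       // x coordinate of the hole (ignored for piece columns)
};

// Returns the index of the column heuristic x would select, or -1 if no
// column qualifies (for example, when the list is empty).
int selectColumnX(const std::vector<Column>& cols) {
    // Rule 1: a column with no ones means the current construction is dead.
    for (std::size_t i = 0; i < cols.size(); ++i) {
        assert(cols[i].ones >= 0);
        if (cols[i].ones == 0) return static_cast<int>(i);
    }
    // Rule 2: a column with exactly one 1 is a forced move.
    for (std::size_t i = 0; i < cols.size(); ++i)
        if (cols[i].ones == 1) return static_cast<int>(i);
    // Rule 3: the hole column with minimum x; ties go to the fewest ones.
    int best = -1;
    for (std::size_t i = 0; i < cols.size(); ++i) {
        if (!cols[i].isHole) continue;
        if (best < 0 || cols[i].x < cols[best].x ||
            (cols[i].x == cols[best].x && cols[i].ones < cols[best].ones))
            best = static_cast<int>(i);
    }
    return best;
}
```

Note that rules 1 and 2 consider piece columns as well as hole columns, while rule 3 looks only at holes.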

So I ran the solver against the 15x14 parallelogram initially applying the x heuristic, but switching to the min-fit heuristic when the number of remaining pieces hit some configured number. Unfortunately, an exhaustive examination of the search tree for this puzzle is not feasible. Instead, I used Monte-Carlo techniques to estimate the best time to stop using the x heuristic and start using the min-fit heuristic. Each data point shown in Figure 9 shows the average number of solutions found per second over 10,000 runs of the solver each initialized with a different random ordering of the nodes in each column of the DLX matrix. Each run was terminated the first time the 16th-to-last piece was removed from the puzzle. (In other words, once the 16th-to-last piece is placed, the solver examines all possible ways to place the last 15 pieces, but then terminates that run.) Solutions to this puzzle tend to appear in great bursts, and even at 10,000 runs I think there is quite a bit of uncertainty in these solution rates. It should also be noted that the DLX processing load for the early piece placements is enormous compared to the later pieces. Terminating algorithm processing each time the 16th-to-last piece is removed means the great efforts expended to reduce the matrix were largely wasted. This results in reduced average performance.

Despite these weaknesses the analysis offers evidence that when there are more than approximately 20 pieces left to be placed in this puzzle, there is no real benefit to using the min-fit heuristic. In fact, using the min-fit heuristic beyond 20 pieces seems to show some slight degradation in performance; although the last data point (where the min-fit heuristic is used for the last 34 pieces placed) seems to again offer a performance increase. This could be a statistical fluke, but I rather suspect there is some significant benefit to filling the two opposite acute corners of the puzzle early. The x-heuristic simply ignores the tight corner at the opposite side of the puzzle. It is my suspicion that as puzzle sizes increase application of the min-fit heuristic across the entire puzzle will result in ever worsening performance relative to heuristics that (at least initially) pack confined regions of the puzzle first; but larger puzzles are exceedingly difficult to analyze even with Monte-Carlo techniques.

Figure 9. DLX solution rates for the 15x14 hexomino parallelogram. The chart shows DLX solution rates for the 15x14 hexomino parallelogram when pieces are initially packed from left to right (using the x ordering heuristic), but when the number of remaining pieces is at or below a configured threshold (shown on the x-axis of the chart), the min-fit heuristic is used. Each data point on the chart represents the average solution rates over 10,000 runs of the solver where for each run the nodes in each column of the DLX matrix were randomly ordered, and each run was terminated the first time the 16th-to-last piece is removed from the puzzle.

Depending on the geometry of the puzzle, it can be even more important to follow your intuition. Gerard's Polyomino Solution Page includes a puzzle similar to the one shown in Figure 10. This puzzle, however, is 2 squares longer and somewhat more difficult to solve. I was unable to find any solutions to this puzzle through exclusive use of the min-fit heuristic even after hours of execution time; but by initially picking the holes farthest from the geometric center of the puzzle (my "R" heuristic) and then switching to min-fit heuristic once the less-confined central cavity was reached I averaged about 9 solutions per minute over 6 hours of run time.

Figure 10. A hexomino puzzle shaped like a rifle. Exclusive use of the min-fit heuristic on this puzzle performs very badly. Packing the long skinny channels first and switching to the min-fit heuristic once the central cavity is reached yields solutions at a much higher rate.

Most Constrained Hole (MCH) Algorithm

DLX is quite crafty, but all the linked list operations can be overly burdensome for small to medium sized puzzles. In a head-to-head match up, my implementation of de Bruijn's algorithm runs more than 6 times faster than DLX for the 10x6 pentomino puzzle (and remember, that's without the de Bruijn algorithm using the trick of placing the X piece first). For the more complex three dimensional Tetris Cube puzzle, DLX fares much better, but still takes more than twice as long to run as de Bruijn's algorithm.

The Most Constrained Hole (MCH) algorithm attempts to reap at least some of the benefits of DLX without incurring the high cost of its matrix manipulations.

I present this algorithm third, because the story flows better that way, but it is actually the first polycube solving algorithm I implemented and is of my own design. I make no claims of originality: it is a simple idea and others have independently devised similar algorithms. I first implemented a variant of this technique to solve pentomino puzzles one afternoon in the summer of 1997 whilst at a TCL/TK training class in San Jose, CA.

MCH simply chooses the hole that has the fewest number of fit possibilities as the next fill target. Therefore DLX (when using the min fit ordering heuristic) and MCH only deviate in their decision making process in those situations where a piece turns out to be more constrained than any hole. To find the MCH, the software examines each remaining hole in the puzzle, and for each of those holes it counts the number of ways the remaining pieces can be placed to fill that hole. The MCH is then taken to be the hole with the fewest fits.

Although DLX will sometimes choose a piece to place next rather than a hole to fill, the biggest difference between MCH and DLX is not in the step-by-step behavior of the two algorithms, but rather in their implementation. My polycube solving software has as one of its fundamental data structures an object called a GridPoint. During Puzzle initialization I create an array of GridPoint objects with the same dimensions as the Puzzle. So for the Tetris Cube I create a 4x4x4 matrix of GridPoints. To support MCH, at each GridPoint I create an array of 12 lists — one list for each of the twelve puzzle pieces. The list for piece B at GridPoint (3, 1, 2) contains all the images of piece B that fill the hole at grid coordinates (3, 1, 2). To count the total number of fits at (3, 1, 2) I traverse the image lists for all the pieces that have not yet been placed in the puzzle and count the total number that fit. To find the most constrained hole, I perform this operation at every open hole in the puzzle and take the hole with the minimum fit count.

Recall that for the de Bruijn algorithm a piece of size 6 with 24 unique rotations only requires 24 fit attempts, but that only works because the algorithm restricts itself to filling the hole with minimum coordinate values in z, y, x priority order. For MCH the image lists must contain all the images of a piece that fill a hole. So a piece with 6 cubelets and 24 unique rotations would have nominally 144 entries in the list. (I actually throw out images that don't fit inside the puzzle box as discussed in the section on image filters below.) So these lists can be rather long, and many lists at many holes have to be checked to find the MCH.

The whole idea sounds loopy I know, but for the case of the 10x6 pentomino puzzle, MCH runs 25% faster than DLX (which still makes it almost 5 times slower than de Bruijn). For the case of the Tetris Cube, MCH is the fastest of the three algorithms running about 2.5 times faster than DLX and about 10% faster than de Bruijn.

The solver also includes a variant of MCH that only considers those holes with the fewest number of neighboring holes. I call this estimated MCH (EMCH). This approach sometimes gets the true MCH wrong, but overall seems to perform better — about 25% faster for 10x6 Pentominoes and more than a third faster for the Tetris Cube.

I think for larger puzzles, when the number of images at each grid point starts to increase by orders of magnitude, this approach of explicit fit counting will break down. There are other ways you can estimate the MCH: I had one MCH variant that didn't count fits at all, but rather looked purely at the geometry of the open spaces near a hole to gauge how difficult it was going to be to fill. In any case, I only apply MCH on smaller 3-D puzzles because this is where I've found it to outpace the other two algorithms.

Combining the Algorithms

Each of the three algorithms examined had different strengths. When there are very few pieces, the simple de Bruijn algorithm had best performance. For medium sized 3-D puzzles, EMCH performed best. Only DLX can choose to iterate over all placements of a piece which can provide huge performance benefits in the right situation. (See, for example, the section on rotational redundancy constraint violations.) Also the ability to define different ordering heuristics makes DLX quite useful for large puzzles with non-trivial topologies.

To allow the best algorithm to be applied at the right time to the right problem, I've implemented all three algorithms into a single puzzle solving application with the capability to turn off one algorithm and turn on another when the number of remaining pieces reaches configured thresholds. As you shall see, this combined algorithmic approach gives much improved performance for many puzzles.

Software Optimizations

I still have the broad topic of constraints to discuss, but I first want to share some software optimizations I've made on the de Bruijn and MCH algorithms. Together these software optimizations reduced the time to find all Tetris Cube solutions with these algorithms by about a factor of five. (This probably just means my initial implementation was really bad, but I think the optimizations are still worth discussing.)

Bit Fields

I was originally tracking the occupancy state of the puzzle via a flag named occupied in each GridPoint object. To determine if an image fit in the puzzle, this flag was examined for each GridPoint used by the image. Most of the popular polyomino (e.g., Pentominoes) and polycube puzzles (e.g., Soma, Bedlam, Tetris) have an area or volume of not more than 64. This is rather convenient as it allows one to model the occupancy state of any of these puzzles with a 64-bit field. So I ditched all the occupied flags in the GridPoint array and replaced them all with a single 64 bit integer variable (named occupancyState) bound to the Puzzle as a whole. Each image object was similarly modified to store a 64-bit layoutMask identifying the GridPoints it occupies. To see if a particular image fits in the puzzle you now need only perform a binary-and of the puzzle's occupancyState with the image's layoutMask and check for a zero result. To place the piece, you need only perform the binary-or of the puzzle's occupancyState with the image's layoutMask and store that result as the new occupancyState. This is really greasy fast and cut the run times by more than a factor of two.
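
Here is a minimal sketch of the bit-field idea. The names `occupancyState` and `layoutMask` come from the description above; the `Puzzle` and `Image` structs, the `bitIndex` packing order, and the `unplace` helper are assumptions for the example (the text only describes the fit test and the placement).

```cpp
#include <cassert>
#include <cstdint>

// Bit index for cell (x, y, z) in a 4x4x4 box. This particular packing
// order is an assumption made for the example.
constexpr int bitIndex(int x, int y, int z) { return x + 4 * y + 16 * z; }

struct Puzzle {
    std::uint64_t occupancyState = 0;  // bit set => cell is filled
};

struct Image {
    std::uint64_t layoutMask;  // bit set => this image occupies that cell
};

// An image fits when it shares no set bit with the current occupancy.
inline bool fits(const Puzzle& p, const Image& img) {
    return (p.occupancyState & img.layoutMask) == 0;
}

// Placing ORs the image's cells into the occupancy state.
inline void place(Puzzle& p, const Image& img) {
    assert(fits(p, img));
    p.occupancyState |= img.layoutMask;
}

// Removal clears the same bits (assumed here as the natural inverse).
inline void unplace(Puzzle& p, const Image& img) {
    p.occupancyState &= ~img.layoutMask;
}
```

Whatever the size of the piece, each fit test costs a single AND and a comparison, which is where the speedup over per-cell flag checks comes from.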

The only down-side to this approach is that it prevents you from solving puzzles that are larger than the size of the bit field. You could increase the size of the field, but this quickly starts to wipe out the benefit. But you can still take advantage of the performance benefit of bit masks for puzzles that are bigger than size 64 by simply using DLX until the total volume of remaining pieces is 64 or less. Then you can morph the data model into a bit-oriented form and use the MCH or de Bruijn algorithms to place the last several pieces (which is the only time speed really matters). For very large puzzles (e.g., a heptominoes puzzle) I think this approach will break down: by the time you get down to an area of size 64 the search tree is collapsing and it's probably too late for a data model morph to pay off.

Early MCH Fit Count Exit

The MCH routine examines different holes remaining in the puzzle and finds the number of possible fits for each of them. I modified the procedure that counts fits to simply return early once the number of fits surpasses the number of fits of the most constrained hole found so far. This trivial change sped the software up 20% for the Tetris Cube.
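
A sketch of the early exit, under simplifying assumptions: each hole carries one flat list of image masks (the solver described above keeps a separate list per piece at each GridPoint), and the names `ImageList`, `countFits`, and `mostConstrainedHole` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One entry per open hole: the layout masks of all images (of unplaced
// pieces) that cover that hole. A flat list per hole is a simplification.
using ImageList = std::vector<std::uint64_t>;

// Counts images that fit, giving up as soon as the count exceeds `limit`.
int countFits(const ImageList& images, std::uint64_t occupancy, int limit) {
    assert(limit >= 0);
    int fits = 0;
    for (std::uint64_t mask : images) {
        if ((occupancy & mask) == 0 && ++fits > limit)
            return fits;  // already worse than the best hole seen so far
    }
    return fits;
}

// Returns the index of the most constrained hole, or -1 if there are none.
int mostConstrainedHole(const std::vector<ImageList>& holes,
                        std::uint64_t occupancy) {
    int best = -1;
    int bestFits = std::numeric_limits<int>::max();
    for (std::size_t h = 0; h < holes.size(); ++h) {
        const int fits = countFits(holes[h], occupancy, bestFits);
        if (fits < bestFits) {
            bestFits = fits;
            best = static_cast<int>(h);
        }
    }
    return best;
}
```

Once a hole with few fits has been found, every later hole is abandoned after at most `bestFits + 1` successful fit tests.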

Fast Permutation Algorithm

In my original implementation of both MCH and de Bruijn's algorithm, I was lazily using an STL set (sorted binary tree) to store the index numbers of the remaining pieces. (Some of you are rudely laughing at me. Stop that.) Only the pieces in this set should be used to fill the next target hole. The index of the piece being processed is removed from the set. If the piece fits, the procedure recurses on itself starting with the now smaller set. Once all attempts to place a piece are exhausted, the piece is added back to the set, and the next entry in the set is removed and processed. This worked fine, but STL sets are not the fastest thing in the galaxy. As you might imagine there's been lots of research on fast permutation algorithms (dating back to the 17th century). I settled on an approach that was quite similar to what I was already doing, but the store for the list of free index numbers is a simple integer array instead of a binary tree. An index is "removed" from the array by swapping its position with the entry at the end of the array. So my STL set erase and insert operations were replaced with a pair of integer swaps. This change improved the fastest run times by about another 20%.

Constraints

The algorithms above observe the obvious fit constraint: when a hole can't be filled or (in the case of DLX) a piece can't be placed, they back up. But other constraints exist for polycube and polyomino puzzles which, if violated, prohibit solutions. My solver can take advantage of these constraints in two different ways. First, I've implemented monitors that watch a particular constraint and trigger an immediate backtrack when a violation is detected. Second, I've implemented a set of image filters that remove images that would violate constraints if used.

Backtrack Triggers

Let's first look at the technique of monitoring constraints during algorithm execution and triggering a backtrack when the constraint is violated.

Parity Constraint Violations

I first read about the notion of parity at Thorleif's SOMA page where in one of his Soma newsletters he references Jotun's proof. It's a simple idea: color each cube in the solution space either black or white in a three dimensional checkerboard pattern and then define the parity of the solution space to be the number of black cubes minus the number of white cubes. When you place a piece in the solution space, the number of black cubes it fills less the number of white cubes it fills is the parity of that image. Suppose that the parity for some image of a piece is 2. If you move that piece one position along any axis, all of its constituent cubes flip color and the parity of the piece will become -2. But those would be the only possible parities you could achieve with that piece: either 2 or -2. So the magnitude of the parity of a piece is defined by its shape, but depending where you place the piece, it could be either positive or negative.
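
To make the definition concrete, here is a small sketch that computes the parity of an image and shows the sign flip under a one-step translation. The `Cell` alias and the functions `imageParity` and `translate` are assumptions for the example, as is the choice to call a cell black when x + y + z is even.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Cell = std::array<int, 3>;  // (x, y, z)

// Parity of an image: black cells minus white cells, where a cell is
// "black" when x + y + z is even (an arbitrary but consistent choice).
int imageParity(const std::vector<Cell>& cells) {
    int parity = 0;
    for (const Cell& c : cells) {
        const int sum = c[0] + c[1] + c[2];
        parity += (sum % 2 == 0) ? 1 : -1;
    }
    return parity;
}

// Translates every cell of an image by (dx, dy, dz).
std::vector<Cell> translate(std::vector<Cell> cells, int dx, int dy, int dz) {
    for (Cell& c : cells) { c[0] += dx; c[1] += dy; c[2] += dz; }
    return cells;
}
```

For a T-tetromino with cells (0,0,0), (1,0,0), (2,0,0) and (1,1,0), `imageParity` gives 2, and after shifting the piece one step along x it gives -2.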

As you place pieces, the total parity of all placed pieces takes a random walk away from an initial parity of zero, but to achieve a solution the random walk must end exactly at the parity of the solution space. It is possible for the random walk to get so far from the destination parity that it is no longer possible to walk back before you run out of pieces. More generally, you can get yourself in situations where it's just not possible to use the remaining pieces to walk back to exactly the right parity.

It is possible to show that some puzzles can't possibly be solved because the provided pieces have parities that just can't add up to the parity of the solution. As an example, consider again the 35 hexominoes shown in Figure 5. The total area of these pieces is 35x6 = 210. It is quite tempting to try to fit these pieces in a rectangular box. You could try boxes measuring 14x15, 10x21, 7x30, 6x35, 5x42 or even 3x70. The parity of all of these boxes is 0, so our random parity walk must end at 0. Of the 35 hexominoes 24 have parity 0 and the other 11 have parity magnitude 2. Because there is no way to take 11 steps of size 2 along a number line and end up back at 0, there is no way to fit the 35 hexominoes into any rectangular box.
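
To spell out the arithmetic of that last step: suppose $k$ of the 11 non-zero-parity hexominoes are placed with parity $+2$ and the remaining $11-k$ with parity $-2$ (the 24 zero-parity pieces contribute nothing). The total parity is then

$$P = 2k - 2(11 - k) = 2(2k - 11), \qquad k \in \{0, 1, \dots, 11\}.$$

Since $2k - 11$ is always odd, $P$ can never be zero; the reachable totals are $\pm 2, \pm 6, \pm 10, \pm 14, \pm 18, \pm 22$.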

Knowing that certain puzzles can't be solved without ever having to try to solve them is quite useful, but how can we make more general use of the parity constraints to speed up the search for solutions in puzzles?

Knuth attempted to enhance the performance of DLX through the use of parity constraints for the case of a one-sided hexiamond puzzle. The puzzle has four pieces with parity magnitude two (and the rest have parity zero). The puzzle as a whole has parity zero, so exactly two of these four pieces must be placed so their parity is -2 and two must be placed so their parity is 2. Knuth took the approach of dividing this problem into 6 subproblems, one for each way to choose the two pieces that will have positive parity. His expectation was that since each of the four pieces was constrained to have half as many images, each subproblem would run 16 times as fast. Then, the total work for all 6 subproblems should be only 6/16 of the work to solve the original (unconstrained) problem. But the total work to solve all 6 subproblems was actually more than the original problem. (I offer an explanation as to why this experiment failed below.)

I use a different approach to take advantage of parity constraints: simply monitor the parity of the remaining holes in the puzzle and if it ever reaches a value that the remaining pieces cannot achieve, then immediately trigger a backtrack.

To implement this parity-based backtracking feature, after each piece placement you must determine if the remaining puzzle pieces can be placed to achieve the parity of the remaining holes in the puzzle. This may sound computationally expensive, but it's not. Consider the Tetris Cube puzzle as an example. Piece A has parity 0, pieces B, E and L have a parity magnitude of 2, and the remaining eight pieces have a parity magnitude of 1. We can immediately forget about piece A since it has parity 0. So we have three pieces with parity magnitude 2 and eight pieces with parity magnitude 1. If you look at the parity of the pieces that are left at any given time, there are only (3+1) x (8+1) = 36 cases. During puzzle initialization I create a parity state object for each of these situations. So, for example, there is a parity state object that represents the specific case of having three remaining pieces of parity magnitude 1 and two remaining pieces of parity magnitude 2. In each of these 36 cases, I precalculate the parities that you can possibly achieve with that specific combination of pieces. I store these results in a boolean array inside the state object. So if you know your parity state, the task of determining if the parity of the remaining holes in the puzzle is achievable reduces to an array lookup. It looks something like this:

        if ( ! parityState->parityIsAchievable[parityOfRemainingHoles] ) {
            // force a backtrack due to parity violation
        } else {
            // parity looks ok so forge on
        }

In addition to this boolean array, each parity state object also keeps track of its neighboring states in two arrays indexed by parity. One array is called place which you use to lookup your new state when a piece is placed; and the other is called unplace which you use to lookup your new state when a piece is removed. The only other task is to update the running sum of the parity of the remaining holes in the puzzle. So the processing for a piece placement looks like this:

        parityState = parityState->place[parityOfPiecePlaced];
        parityOfRemainingHoles -= parityOfPiecePlaced;

and piece removal processing looks like this:

        parityOfRemainingHoles += parityOfPieceRemoved;
        parityState = parityState->unplace[parityOfPieceRemoved];

Here, I'm using double-sided arrays, so place[-2] and place[2] actually take you to the same state, saving the trouble of calculating the absolute value of parityOfPiecePlaced.
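
The precalculation of achievable parities for one state can be sketched as follows. This covers only the boolean table for a given mix of remaining pieces, not the place/unplace links between states, and the `ParityTable` struct and `buildParityTable` function are hypothetical names. It assumes, as in the Tetris Cube example, that every remaining piece has parity magnitude 1 or 2.

```cpp
#include <cassert>
#include <vector>

// Achievable total parities for a set of remaining pieces, stored in a
// double-sided table: index (parity + maxParity) holds the flag for parity.
struct ParityTable {
    int maxParity;                 // largest reachable magnitude
    std::vector<bool> achievable;  // size 2 * maxParity + 1

    bool isAchievable(int parity) const {
        if (parity < -maxParity || parity > maxParity) return false;
        return achievable[parity + maxParity];
    }
};

// Builds the table for `ones` pieces of parity magnitude 1 and `twos`
// pieces of parity magnitude 2, each of which may contribute +m or -m.
ParityTable buildParityTable(int ones, int twos) {
    assert(ones >= 0 && twos >= 0);
    ParityTable t;
    t.maxParity = ones + 2 * twos;
    t.achievable.assign(2 * t.maxParity + 1, false);
    t.achievable[t.maxParity] = true;  // with no pieces, only parity 0

    // Folds one more piece of magnitude m into the reachable set.
    auto addPiece = [&t](int m) {
        std::vector<bool> next(t.achievable.size(), false);
        for (int p = -t.maxParity; p <= t.maxParity; ++p) {
            if (!t.achievable[p + t.maxParity]) continue;
            // p + m and p - m stay in range because maxParity bounds
            // the sum of all magnitudes added so far.
            next[p + m + t.maxParity] = true;
            next[p - m + t.maxParity] = true;
        }
        t.achievable.swap(next);
    };
    for (int i = 0; i < ones; ++i) addPiece(1);
    for (int i = 0; i < twos; ++i) addPiece(2);
    return t;
}
```

Building one such table for each of the 36 Tetris Cube states happens once at initialization, so the per-placement cost stays at the array lookup shown earlier.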

So the cost of parity checking is quite small, but typically parity violations do not start to appear until the last few pieces are being placed. In the 10x6 pentomino puzzle, the first parity violations did not appear until the 9th piece was placed; and adding the parity backtrack trigger to the fastest solver configuration for that puzzle actually increased run times by about 8%. (So adding just the above 5 lines of code to the de Bruijn processing loop increased the work to solve the problem by 1 part in 12! Indeed, even the time required to process the if statements that are used to see if various algorithm features are turned on, or if trace output is enabled, etc, measurably impairs the performance of the de Bruijn algorithm for this puzzle.) For the Tetris Cube, parity violations started to appear after only 6 pieces were placed, and use of the parity monitor improved performance by about 3%.

This parity backtrack trigger technique leaves the algorithms blinded to the true constraints on pieces with non-zero parity; so parity constraints are only hit haphazardly as opposed to being actively sought out by the algorithms. There is likely some better way to take advantage of the parity constraints on a puzzle. Thorleif found that for the case of the Soma cube puzzle, forcibly placing pieces with non-zero parity first improved performance markedly; but I am skeptical that such an approach would work well in general because typically it's so much better to fill holes, rather than place pieces. One approach might be to simply assign some fractional weight to the counts maintained for piece columns that have non-zero parity. This would gently coax DLX into considering placing them earlier. I have not pursued such an investigation.

Figure 11. Checker board pattern for the box-in-a-diamond hexomino puzzle. Counting the number of black squares and subtracting the number of white squares shows the parity of this puzzle to be 22, which is the maximum parity a 35 piece hexomino puzzle can possibly achieve.

Still, with the right puzzle, monitoring parity constraints can be more useful. I've reproduced the box-in-a-diamond puzzle layout in Figure 11 to call attention to the parity of this puzzle. I designed this puzzle to have the interesting property that its parity is exactly 22, which is the maximum parity the 35 hexomino puzzle pieces can possibly achieve. Any solution to this puzzle requires all eleven hexomino pieces with non-zero parity to favor black squares. Figure 12 shows one such solution with the 11 pieces having non-zero parity highlighted. Monte Carlo estimation techniques showed that enabling the parity backtrack trigger on this puzzle produces about a two-thirds increase in performance. Although a substantial performance boost, this is less than I would have expected.

Figure 12. A solution to the box-in-a-diamond hexomino puzzle. The 11 non-zero parity pieces are marked with ■ symbols.

Volume Constraint Violations

The volume backtrack trigger, when enabled, performs the following processing after each piece placed:

1. scans the open space of the puzzle, measuring the volume of each partition it finds, and
2. triggers a backtrack if a volume is discovered that cannot be achieved by the remaining pieces.

To find and measure the volumes of subspaces in step 1, I use a simple fill algorithm. Step 2 of the problem — determining whether a particular volume is achievable — is easy if all pieces have the same size; but to handle the problem generally (when pieces are not all the same size) I use the same technique used by the parity monitor above: I precalculate the achievable volumes for each possible grouping of remaining pieces and track the group as a state variable.
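
For the easy case where all pieces have the same size, the two steps can be sketched like this for a 2-D board. The board representation (a vector of strings with '.' for open squares) and the names `regionAreas` and `volumesAchievable` are assumptions for the example; the general case with mixed piece sizes needs the precalculated state table described above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Step 1: measure the area of every isolated open region in a 2-D board.
// '.' marks an open square; any other character is filled or outside.
std::vector<int> regionAreas(std::vector<std::string> board) {
    std::vector<int> areas;
    const int rows = static_cast<int>(board.size());
    const int dr[4] = {1, -1, 0, 0};
    const int dc[4] = {0, 0, 1, -1};
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < static_cast<int>(board[r].size()); ++c) {
            if (board[r][c] != '.') continue;
            // Iterative flood fill from (r, c) using an explicit stack.
            int area = 0;
            std::vector<std::pair<int, int>> stack;
            stack.emplace_back(r, c);
            board[r][c] = '#';  // mark on push so no square is counted twice
            while (!stack.empty()) {
                const std::pair<int, int> cur = stack.back();
                stack.pop_back();
                ++area;
                for (int k = 0; k < 4; ++k) {
                    const int nr = cur.first + dr[k];
                    const int nc = cur.second + dc[k];
                    if (nr < 0 || nr >= rows) continue;
                    if (nc < 0 || nc >= static_cast<int>(board[nr].size()))
                        continue;
                    if (board[nr][nc] != '.') continue;
                    board[nr][nc] = '#';
                    stack.emplace_back(nr, nc);
                }
            }
            areas.push_back(area);
        }
    }
    return areas;
}

// Step 2 (equal-sized pieces only): every region's area must be a
// multiple of the piece size.
bool volumesAchievable(const std::vector<int>& areas, int pieceSize) {
    assert(pieceSize > 0);
    for (int a : areas)
        if (a % pieceSize != 0) return false;
    return true;
}
```

The region of area 39 in Figure 8 fails this test for hexominoes, since 39 is not a multiple of 6.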

The solver allows you to configure the minimum number of remaining pieces required to perform volume backtrack processing. As long as you turn off the volume backtrack feature sufficiently early, its cost is insignificant.

I originally implemented this backtrack trigger to keep DLX from partitioning polyomino puzzles into unfillable volumes while it wandered about the puzzle pursuing constrained holes or pieces; but I now believe it's often better to initially follow a simple packing strategy that precludes the creation of isolated volumes. I think this backtrack trigger may still be useful for some puzzles once the min-fit heuristic is enabled, but I have not had the time to study its effects.

Image Filters

In the previous sections we examined the technique of monitoring a constraint during algorithm execution and triggering a backtrack when the constraint is violated. Another (more aggressive) way you can take advantage of a constraint is to check all images one-by-one to see if using them would result in a constraint violation. For each image that causes a violation, simply remove it from the cache of available images. Some of the image filters discussed below are applied only once before the algorithms begin. Other image filters can be applied repeatedly (after each piece placement). In my solver, image filters can only be applied when DLX is active because the linked list structure of DLX makes it easy to remove and replace images at each algorithmic step.

Puzzle Bounds Constraint Violations

N.G. de Bruijn's original software used to solve the 10x6 pentomino problem predefined the layouts of the 12 puzzle pieces in each of their unique orientations. There are 63 unique orientations of the pieces in total, but the various possible translations of these orientations were not precalculated. This was a simple approach, and perhaps more importantly (in those days) made for a quite small program. This results, however, in the de Bruijn algorithm spending a lot of time checking fits for images that clearly fall outside the 10x6 box. These days, memory is cheap, so it is easy to improve on this basic approach. I've already explained the technique in the section on MCH above: for MCH I keep at each GridPoint a separate list for each piece which holds only those images of a piece that both fill the hole and actually fit in the puzzle box. I've done exactly the same thing for the de Bruijn algorithm: I created another array of image lists at each GridPoint, each list holding only the images of a particular piece that fill the hole with its root tiling node and also fit in the puzzle box. This completely eliminates all processing associated with placement attempts for images that don't even fit in the solution box.

By filtering out images that don't fit in the box, the average number of de Bruijn images at each hole in the 10x6 pentomino puzzle drops from 63 to 33.87 -- an almost 50% reduction. This should translate to a significant performance boost, though I can't say for sure since this is the only way I've ever implemented the algorithm.
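
As a rough illustration of precalculating only the images that fit in the box, here is a Python sketch. It is not the solver's code: the shape table, the function names, and the choice of `(x, y)` cell tuples are all assumptions made for the example. It generates every distinct orientation of each free pentomino and then every translation of each orientation that stays inside a given box.

```python
PENTOMINOES = {
    "F": [(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)],
    "I": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    "L": [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)],
    "N": [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3)],
    "P": [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)],
    "T": [(0, 2), (1, 2), (2, 2), (1, 1), (1, 0)],
    "U": [(0, 0), (2, 0), (0, 1), (1, 1), (2, 1)],
    "V": [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)],
    "W": [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)],
    "X": [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)],
    "Y": [(0, 1), (1, 0), (1, 1), (1, 2), (1, 3)],
    "Z": [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)],
}

def normalize(cells):
    """Translate a cell set so its bounding box starts at (0, 0)."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)

def orientations(cells):
    """Distinct rotations and reflections of a free polyomino."""
    found = set()
    current = list(cells)
    for _ in range(4):
        current = [(y, -x) for x, y in current]               # rotate 90 degrees
        found.add(normalize(current))
        found.add(normalize([(-x, y) for x, y in current]))   # mirror image
    return found

def images_in_box(cells, width, height):
    """Every translation of every orientation that lies inside the box."""
    images = []
    for shape in orientations(cells):
        w = max(x for x, _ in shape) + 1
        h = max(y for _, y in shape) + 1
        for dx in range(width - w + 1):
            for dy in range(height - h + 1):
                images.append(frozenset((x + dx, y + dy) for x, y in shape))
    return images

total_orientations = sum(len(orientations(c)) for c in PENTOMINOES.values())
total_images = sum(len(images_in_box(c, 10, 6)) for c in PENTOMINOES.values())
print(total_orientations, total_images)
```

The first number should come out as the 63 unique orientations mentioned above. The second should match the 2056 images cited for the 10x6 box in the discussion of the volume constraint filter.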

Rotational Redundancy Constraint Violations

If you picked up a Tetris Cube when it was already solved; turned it on its side; and then excitedly told your brother you found a new solution, you'd likely get thumped. Because the puzzle box is cubic, there are actually 23 ways to rotate any solution to produce a new solution that is of the same shape as the original. My software can filter out solutions that are just rotated copies of previously discovered solutions (just enable the --unique command line option), but the search algorithms as described so far do actually find all 24 rotations of every solution (only to have 23 of them filtered out).

If by imperial decree, we only allow rotationally unique solutions, then it is possible to produce an image filter to take advantage of this constraint. If we simply fix the orientation of a piece that has 24 unique orientations, then the algorithms will only find rotationally unique solutions. Why does this work? If you fix the orientation of a piece, any solution you find is going to have that constrained piece in its fixed orientation; and the other 23 rotations of that same solution cannot possibly be found because those solutions have the constrained piece in some orientation that you never allowed to enter the box. Application of just this one filter reduced the time it takes DLX to find all solutions to the Tetris Cube from over seven hours down to about 20 minutes. Quite useful indeed.

It is possible to apply this same technique to puzzles that are not cubic; but instead of keeping the orientation of the piece completely fixed, you limit the set of rotations allowed.

But what if all of the pieces have fewer unique rotations than the puzzle has symmetric rotations? In this case you can also try constraining the translations of the piece within the solution box. This is slightly harder to do (it was for me anyway), and is not always guaranteed to eliminate all rotationally redundant solutions from the search. As an example try eliminating the rotationally redundant solutions from a 3x3x3 puzzle by constraining a puzzle piece that is a 1x1x1 cube. It can't be done. The best you can do is to constrain the piece to appear at one of four places: the center, the center of a face, the center of an edge and at a corner. This will eliminate some rotationally redundant solutions from the search, but not all.
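
The four position classes in the 3x3x3 example can be reproduced with a small Python sketch (illustrative names, not the solver's code). It builds the 24 proper rotations of a cube as signed axis permutations with determinant +1, then groups the 27 cells of a box centered on the origin into orbits under those rotations.

```python
from itertools import permutations, product

def cube_rotations():
    """The 24 proper rotations of a cube as (axis permutation, signs) pairs."""
    rotations = []
    for perm in permutations(range(3)):
        inversions = sum(
            1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j]
        )
        perm_sign = -1 if inversions % 2 else 1
        for signs in product((1, -1), repeat=3):
            # Keep only determinant +1: rotations, not mirror images.
            if perm_sign * signs[0] * signs[1] * signs[2] == 1:
                rotations.append((perm, signs))
    return rotations

def rotate(rotation, cell):
    """Apply a signed axis permutation to a cell given as (x, y, z)."""
    perm, signs = rotation
    return tuple(signs[i] * cell[perm[i]] for i in range(3))

def position_classes(rotations):
    """Group the 27 cells of a 3x3x3 box (centered on the origin) into orbits."""
    cells = set(product((-1, 0, 1), repeat=3))
    classes = []
    while cells:
        cell = cells.pop()
        orbit = {rotate(r, cell) for r in rotations}
        cells -= orbit
        classes.append(orbit)
    return classes

rotations = cube_rotations()
classes = position_classes(rotations)
print(len(rotations), sorted(len(c) for c in classes))
```

The orbit sizes should be 1, 6, 8 and 12: the center, the face centers, the corners and the edge centers. Keeping one representative per orbit leaves four images, but a piece sitting at the center, say, is left unchanged by every rotation, which is why some rotationally redundant solutions survive.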

A much harder problem is to try to eliminate rotationally redundant solutions from the search when none of the pieces in the puzzle have a unique shape. In this case, you can't simply constrain a single piece, but must instead somehow constrain in concert an entire set of pieces that share the same shape. I have some rough ideas on how one might algorithmically approach this problem, but I have not yet tried to work the problem out in detail.

For now, you can ask my solver to constrain any uniquely shaped piece so as to eliminate as many rotationally redundant solutions as possible. But even better, you can ask the solver to pick the piece to constrain. In this case it compares the results of constraining each uniquely shaped puzzle piece and picks the piece that will do the best job of eliminating rotationally redundant solutions. If two or more pieces tie, then it will pick the piece that after constraint has the fewest images that fit in the puzzle box. If for example you ask my solver to pick a piece for constraint on the 10x6 pentomino puzzle, it will pick X (the piece that looks like a plus sign), and constrain it so that it can only appear in the lower-left quadrant of the box. This is exactly the approach de Bruijn took when he solved the 10x6 pentomino puzzle 40 years ago, but de Bruijn identified this as the best constraint piece through manual analysis of the puzzle and programmed it as a special initial condition in his software. With my solver, you need only add the option -r to the command line.
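
For the specific case of the X pentomino in a rectangular box, the effect of the constraint can be sketched in Python as follows. This is only an illustration with made-up function names, not the solver's general algorithm. It relies on two assumptions: the X piece looks the same in every orientation, so an image is identified by the cell at its center; and because free pentominoes may be flipped, the symmetries of a non-square box are the identity, the two mirror flips and the half turn. One image is kept from each group of symmetric images.

```python
def x_images(width, height):
    """All placements of the X pentomino, identified by the cell at its center."""
    return [(cx, cy) for cx in range(1, width - 1) for cy in range(1, height - 1)]

def box_symmetries(width, height):
    """Maps that carry a non-square width x height box onto itself."""
    return [
        lambda c: (c[0], c[1]),                            # identity
        lambda c: (width - 1 - c[0], c[1]),                # flip left-right
        lambda c: (c[0], height - 1 - c[1]),               # flip top-bottom
        lambda c: (width - 1 - c[0], height - 1 - c[1]),   # half turn
    ]

def constrained_images(width, height):
    """Keep only the smallest image of each group of symmetric images."""
    symmetries = box_symmetries(width, height)
    return [
        image
        for image in x_images(width, height)
        if image == min(s(image) for s in symmetries)
    ]

print(len(x_images(10, 6)), constrained_images(10, 6))
```

With the origin at the lower-left corner, the surviving centers all lie in the lower-left quadrant, and there should be 8 of them out of 32, which agrees with the count given in the discussion of the fit filter.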

Oftentimes a piece that has been constrained will have so few remaining images that it becomes the best candidate for the first algorithm step. But of the algorithms I've implemented, only DLX will consider the option of iterating over the images of a single piece. So when running my solver with a piece constraint I usually use the DLX algorithm with a min-fit heuristic for at least the first piece placement. For the 10x6 pentomino problem, if you turn on constraint processing (which constrains the images of the X piece), but fail to use DLX for the first piece placement you'll find the run time to be eight times longer.

This feature was far and away the most difficult part of the solver for me to design and implement. (Perhaps some formal education in the field of spatial modeling would have been useful.) I have copious comments on the approach in the software. There are two parts to the problem: I first identify which rotations of the puzzle box as a whole produce a new puzzle box of exactly the same shape. This is normally a trivial problem, but the solver also handles situations where some of the puzzle pieces are loaded into fixed positions. If some of those pieces have a shape in common with pieces that are free to move about, then things get tricky. One-sided polyomino problems (which the solver also handles) also add complexity. Once I know the set of rotations that when applied to the puzzle can possibly result in a completely symmetric shape, I apply a second algorithm that filters the images produced for a (uniquely shaped) piece through a combination of rotational and/or translational constraints that eliminate these symmetries and have the net effect of preventing the algorithms from discovering rotationally redundant solutions. For a more exacting description of these techniques, please read my software comments for the methods Puzzle::initSymmetricRotationAndPermutationLists() and for Puzzle::genImageLists().

Parity Constraint Violations

You can also filter images based on parity constraints. So instead of waiting around for an image to be placed to trigger a parity backtrack; after each piece placement, you can look at the parity of each remaining image and determine if placing that image would introduce a parity violation; and if so, remove the image.

Of course I don't actually do the check for all images — only twice for each remaining piece with non-zero parity (once for positive parity and once for negative parity). If a violation would be introduced through the use of that piece when its parity is, say, negative, then I traverse the list of images for that piece and remove all the copies that have a negative parity. Also, the parity filter is skipped completely if the parity of the last piece placed was zero: nothing has changed in that case and it's pointless to look for new potential parity violations.
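
The per-piece sign check can be sketched in Python like this. It is a simplified illustration, not the solver's code: the names are invented, the target is assumed to be the signed parity (black minus white) still to be made up by the unplaced pieces, and the reachable sums are recomputed on each call instead of being looked up from a precalculated table.

```python
def reachable_sums(parities):
    """Every total obtainable by giving each parity a + or - sign."""
    sums = {0}
    for p in parities:
        sums = {s + p for s in sums} | {s - p for s in sums}
    return sums

def allowed_signs(index, parities, target):
    """Signs the piece at `index` may take without making `target` unreachable."""
    others = parities[:index] + parities[index + 1:]
    sums = reachable_sums(others)
    p = parities[index]
    return {sign for sign in (+1, -1) if target - sign * p in sums}

# Box-in-a-diamond: eleven pieces of parity 2 must add up to 22.
print(allowed_signs(0, [2] * 11, 22))
```

For the box-in-a-diamond numbers only the positive sign survives for each of the eleven pieces, so every image on which such a piece favors white squares can be removed. With two pieces of parity 2 and a target of 0, both signs remain possible and nothing is filtered.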

Applying the parity filter to the box-in-a-diamond puzzle causes the solver to filter out roughly half of the images of eleven pieces before DLX even starts. Replacing the parity backtrack trigger with the parity filter for this puzzle increased performance by more than 40%. In total, the solver running with the parity filter generates solutions 2.4 times as fast as it does without any parity constraint-based checks at all.

Volume Constraint Violations

You can also use volume constraints to filter images. This is very much akin to using volume constraints to trigger backtracks, but instead of waiting around for an image to be placed that partitions the open space; you can instead, one-by-one, place all remaining images in the puzzle and perform a volume check operation. This can be particularly useful as an initial step before you even set the algorithms to working. Of the 2056 images that fit in the 10x6 pentomino puzzle box, 128 of them are jammed up against a side or into a corner in such a way as to produce a little confined area that can't possibly be filled as seen in Figure 13. Searching for and eliminating these images up front improved my best run times for this puzzle by about 13%. This is the only technique I've found (other than the puzzle bounds filter discussed above) that actually improved performance for this classic puzzle.

Figure 13. Volume constraint violations in the 10x6 pentomino puzzle. 128 of the 2056 pentomino images (over 5%) produce volume constraint violations. The volume constraint filter can expunge these from the image set before the algorithmic search for solutions begins.
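
A one-shot volume filter for a puzzle whose pieces all have the same size can be sketched in Python as below. The names are illustrative and this is not the solver's code. The sketch assumes a plain rectangular box and uses the simplest possible test: after an image is placed, every connected region of open cells must have a size that is a multiple of the piece size.

```python
def region_sizes(open_cells):
    """Sizes of the connected regions of open cells (4-neighbor flood fill)."""
    remaining = set(open_cells)
    sizes = []
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            x, y = stack.pop()
            size += 1
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sizes

def volume_filter(images, width, height, piece_size=5):
    """Drop images that wall off a region no set of equal-sized pieces can fill."""
    box = {(x, y) for x in range(width) for y in range(height)}
    kept = []
    for image in images:
        sizes = region_sizes(box - set(image))
        if all(size % piece_size == 0 for size in sizes):
            kept.append(image)
    return kept

# An I pentomino along the bottom edge, and a Y pentomino that traps cell (0, 0).
i_image = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
y_image = [(1, 0), (1, 1), (1, 2), (1, 3), (0, 1)]
print(volume_filter([i_image, y_image], 10, 6))
```

Only the I image survives: the Y image isolates the single corner cell, which no pentomino can fill.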

The previous polyomino puzzles were all based on free polyominoes: polyomino pieces that you are free not only to rotate in the plane of the puzzle, but also to flip upside down. There is another class of puzzles based on one-sided polyominoes: polyomino pieces that you are allowed to rotate within the plane, but are not allowed to flip upside down. Whereas there are only twelve free pentominoes, there are eighteen uniquely shaped one-sided pentominoes. Consider the problem of placing the eighteen one-sided pentominoes into a 30x3 box as shown in Figure 14. Because pieces can actually reach from one long wall of this puzzle box to the other, 40% of the images (776 out of 1936) that fit in this box produce unfillable volumes. (See Figure 15.) Applying the volume constraint filter to the images of this puzzle improved performance by about a factor of nine.

Figure 14. One solution to the 30x3 one-sided pentomino puzzle.

Figure 15. A volume constraint violation in the 30x3 one-sided pentomino puzzle. No matter how you rotate the Z piece, it will always partition the puzzle box. Roughly 4 out of 5 of these images result in a volume constraint violation.

Consider next another puzzle I came across at Gerard's Polyomino Solution Page: placing the 108 free heptominoes into a 33x23 box with 3 symmetrically spaced holes left over as shown in Figure 16. One of the heptomino pieces has the shape of a doughnut with a corner bitten out of it. This piece is shown in red in Figure 16. There's no way for another piece to fill the hole in this piece, so heptomino puzzles always leave at least one unfilled hole. To solve this puzzle, the doughnut piece clearly must be placed around one of the three prescribed holes; but none of the algorithms are smart enough to take advantage of this constraint and will only place the doughnut around a hole by chance. And this could take a very long time indeed! Applying the volume constraint filter to this problem removes not only the images that produce confined spaces around the perimeter of the puzzle, but also all images of the doughnut piece except those few that wrap one of the prescribed holes. The DLX min-fit heuristic will then correctly recognize the true inflexibility of this piece and place it first.

Figure 16. Solution to a 33x23 heptomino problem with 3 symmetrically spaced holes. The piece highlighted in red must be placed around one of these holes. All other images of this piece are nicely eliminated by the volume constraint filter.

For 3-D puzzles, I think it would be rare for pieces to construct a planar barrier isolating two volumes large enough to cause serious trouble; accordingly, I have not studied the effects of this filter on 3-D puzzles.

In all of these examples I've only applied the volume filter once to the initial image set (prior to algorithm execution), but you can also apply the filter repeatedly, after each step in the algorithm (turning it off when the number of remaining pieces reaches some prescribed threshold). This should have the effect of giving DLX a better view of the puzzle constraints; but I haven't studied this primarily because my current implementation of the filter is so inefficient: at each algorithmic step each remaining image is temporarily placed and a graphical fill operation is performed to detect isolated volumes. This is simple, but the vast majority of these image checks are pointless. The next time I work on this project, I'll be improving the implementation of this filter which I hope will offer performance benefits when reapplied after each piece placement.

Fit Constraint Violations

Another filter I've implemented is based on a next-step fit constraint. By this I mean, if placing an image would result in either a hole that can't be filled, or a piece that can't be placed, then it is pointless to include that image in the image set. Running this fit filter on the 2056 images of the 10x6 pentomino puzzle finds all of the 128 images found by the volume constraint filter plus an additional 16 images like those shown in Figure 17. There can obviously be no puzzle solution with these piece placements. If the rotational redundancy filter is also enabled (which constrains the X piece to 8 possible positions in the lower left quadrant of the box), then the fit filter will eliminate 227 images. (There are numerous images that conflict with all of the constrained placements of the X piece.)

Figure 17. Samples of two images expunged by the fit filter for the 10x6 pentomino puzzle. These images of the I and L pentomino pieces preclude the possibility of solutions. There are 16 such images and the fit constraint filter detects and removes them all.

Note that running the fit filter twice on the same image set can sometimes filter additional images: on the first run you may remove images that were required for previously tested images to pass the fit check. If you run the fit filter twice on the 10x6 pentomino problem while the X piece is constrained, the number of filtered images jumps from 227 to 235. To do a thorough job you'd have to filter repeatedly until there was no change in the image set.
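
The idea of filtering until nothing changes can be sketched in Python. This is a toy stand-in, not the solver's approach (which uses DLX itself for the check): the names are invented, an image is a `(piece, cells)` pair, and the check looks only one step ahead, requiring that every other cell and every other piece still has at least one compatible image.

```python
def passes_fit_check(image, images, cells, pieces):
    """True if, with `image` placed, each other cell and piece keeps an option."""
    piece, covered = image
    # Images of other pieces that do not overlap the image under test.
    compatible = [(p, c) for p, c in images if p != piece and not (c & covered)]
    holes_ok = all(any(cell in c for _, c in compatible) for cell in cells - covered)
    pieces_ok = all(any(p == q for q, _ in compatible) for p in pieces - {piece})
    return holes_ok and pieces_ok

def fit_filter(images, cells, pieces):
    """One pass of the filter over the whole image set."""
    return [img for img in images if passes_fit_check(img, images, cells, pieces)]

def fit_filter_to_fixed_point(images, cells, pieces):
    """Repeat the filter until a pass removes nothing; also report the pass count."""
    passes = 0
    while True:
        filtered = fit_filter(images, cells, pieces)
        passes += 1
        if len(filtered) == len(images):
            return filtered, passes
        images = filtered

# Toy puzzle: two dominoes A and B in a 1x4 strip with cells 0..3.
cells = {0, 1, 2, 3}
pieces = {"A", "B"}
images = [(p, frozenset({i, i + 1})) for p in "AB" for i in range(3)]
survivors, passes = fit_filter_to_fixed_point(images, cells, pieces)
print(sorted((p, sorted(c)) for p, c in survivors), passes)
```

In this toy case the first pass removes the two images that sit in the middle of the strip, and a second pass confirms that nothing else can be removed.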

Although this filter is interesting, its current implementation is too slow to be of much practical use. I use DLX itself to identify fit-constraint violations and it takes 45 seconds to perform just the first fit filtering operation for the box-in-a-diamond hexomino puzzle on my 2.5 GHz P4. I suspect I could write something that does the same job much quicker, but I'm skeptical I could make it fast enough to be effective. Still, if your aim is the lofty goal of finding all solutions to this puzzle, this filter could prove worthwhile: 45 seconds is nothing for at least the first several piece placements of this sizable puzzle.

Image Filter Performance (or lack thereof)

Some of the image filters discussed above are only run once before the algorithms begin. I wanted to share some insight as to why such filters sometimes don't give the performance gains you might expect.

As a first example, consider the effects of filtering pentomino images that fall outside the 10x6 puzzle box. This cuts the total number of images that have to be examined by the de Bruijn algorithm at each algorithm step by almost a factor of 2. In an extraordinary flash of illogic, you might conclude that since there are 12 puzzle pieces, the performance advantage to the algorithm would be a factor of 2¹² = 4096. The problem with this logic is that the algorithm immediately skips these images as soon as they are considered for placement anyway.

For the same reason, filtering images that produce volume constraint violations before you begin running the algorithms does not give such exponential performance gains: such images typically construct tiny little confined spaces that the algorithm would have quickly identified as unfillable anyway.

But the filter that removes images of a single piece to eliminate rotational redundancies among discovered solutions seems different: the images removed are not images that will necessarily cause the algorithm to immediately backtrack and so you might reasonably expect the filter to not only reduce the number of solutions found for the Tetris Cube by a factor of 24 (which it does); but also to improve the overall performance of the algorithm by a factor of 24, but it only gave a factor of 21. (Close!)

Knuth expected that reducing the number of images of 4 pieces each by a factor of 2 (to take advantage of a parity constraint on a one-sided hexiamond puzzle) would lead to a reduction in the work needed to solve the puzzle by a factor of 16, but the gains again fell far short of this expectation.

And although I wasn't expecting a performance improvement factor of 2¹¹ from the parity filter for the box-in-a-diamond problem, I thought I'd get a lot more than a factor of 2.4 (which is all it gave me). This result was very surprising to me.

The problem in all of these cases is that you're trying to extract efficiencies from a search tree that is already significantly pruned by the algorithm. Here are some other observations that might be illuminating.

First, consider the case where you a priori force a piece whose images are to be filtered to be placed first; and then reduce the number of images of that piece by a factor of N. Then the number of ways to place that first piece is reduced by a factor of N. Assuming each of those placements originally had similar work loads, then the total work would indeed be reduced by a factor of N. But what if you always placed this piece last? Would performance still improve by a factor of N? Of course not! The vast majority of search paths terminate in failure before you even get to the last piece. Now assuming the piece is not being placed first, or last but is instead placed uniformly across the search tree, you'll find that a sizable percentage of search paths don't even use the filtered piece: they die off before that piece can be placed. Filtering the images of that piece won't reduce the weight from these branches of the search tree at all.

Second, the vast majority of the appearances of the piece will be high up in the branches of the search tree. At this part of the tree, the branching factor is small and obviously drops below 1 at some point. Because of this, when you prune the tree at one of the appearances of the constraint piece, you can't assume that the weight of the path left behind is negligible (even though that weight is shared by other paths).

These arguments are obviously imprecise and contain (at least) one serious flaw: the DLX algorithm (unlike the other two) can reshape the entire search tree after you constrain the piece to take advantage of its new constrained nature, but if during execution of the algorithm, the piece is still found to be less constrained than other elements of the puzzle, then the arguments above still apply. Even if DLX decides to place the newly constrained piece first (and it often does), the average branching factor will still not typically improve sufficiently to achieve a factor of N performance improvement.

Performance Comparisons

Table 2 shows the performance results for a few different puzzles with many different combinations of algorithms, backtrack triggers and image filters. Many of these results have already been discussed in earlier sections but are provided here in detail. The run producing the best unique solution production rate is highlighted in yellow for each puzzle. The table key explains everything.

Using the unique solution generation rate as a means of comparing algorithm quality is flawed as these rates are not completely consistent from run-to-run. The relative performance of the different algorithms can also change with the processor design because, for example, one algorithm may make better use of a larger instruction or data cache. I liked Knuth's technique of simply counting linked list updates to measure performance, but since I'm comparing different algorithms, such an approach seems difficult to apply.

Table 2. Test Cases
Test Case	Algorithms				Image Filters				Backtrack Triggers		Monte-Carlo			Attempts	Fits	Unique	Run Time (hh:mm:ss)	Rate
Test Case	DLX	MCH	EMCH	de Bruijn	R	P	V	F	P	V	N	R	S	Attempts	Fits	Unique	Run Time (hh:mm:ss)	Rate
P-1	12f	0	0	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	3,615,971	3,615,971	2339	00:00:38.761	60
P-2	0	12	0	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	138,791,760	9,077,194	2339	00:00:29.250	80
P-3	0	0	12	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	191,960,438	12,114,615	2339	00:00:22.485	104
P-4	0	0	0	12	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	178,983,597	25,848,915	2339	00:00:06.086	384
P-5	12f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	892,247	892,247	2339	00:00:09.449	246
P-6	0	12	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	114,753,421	7,646,476	2339	00:00:24.504	95
P-7	0	0	12	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	153,036,807	9,875,973	2339	00:00:19.992	117
P-8	0	0	0	12	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	133,086,329	20,073,791	2339	00:00:04.700	498
P-9	12f	11	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	12,411,752	924,167	2339	00:00:02.701	866
P-10	12f	0	11	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	20,374,275	1,425,356	2339	00:00:02.569	911
P-11	12f	0	0	11	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	17,703,679	2,455,947	2339	00:00:00.579	4049
P-12	12f	0	0	11	ON	OFF	OFF	OFF	ON	OFF	-	-	-	17,572,247	2,454,746	2339	00:00:00.620	3781
P-13	12f	0	0	11	ON	OFF	12	OFF	OFF	OFF	-	-	-	15,198,004	2,091,215	2339	00:00:00.510	4592

OP-1	18f	0	0	12	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	38,479,316	7,060,175	46	00:00:03.611	12.7
OP-2	18f	0	0	12	ON	OFF	18	OFF	OFF	OFF	-	-	-	1,930,304	668,117	46	00:00:00.411	112.1

TC-1	12f	0	0	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	1,502,932,134	1,502,932,134	9839	07:21:31	0.37
TC-2	0	12	0	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	58,306,235,943	1,604,152,199	9839	02:54:07	0.94
TC-3	0	0	12	0	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	109,746,141,977	2,835,090,958	9839	02:19:27	1.18
TC-4	0	0	0	12	OFF	OFF	OFF	OFF	OFF	OFF	-	-	-	737,892,116,733	38,637,085,619	9839	03:29:26	0.78
TC-5	12f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	68,141,081	68,141,081	9839	00:20:37	7.95
TC-6	0	12	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	9,727,894,584	297,896,605	9839	00:33:17	4.93
TC-7	0	0	12	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	19,436,156,238	551,894,232	9839	00:28:45	5.70
TC-8	0	0	0	12	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	140,658,669,459	7,992,209,655	9839	00:43:08	3.80
TC-9	12f	11	0	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	2,153,543,323	72,670,225	9839	00:07:20	22.35
TC-10	12f	0	11	0	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	4,196,219,275	129,746,342	9839	00:06:02	27.16
TC-11	12f	0	0	11	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	32,810,876,767	1,898,921,763	9839	00:09:48	16.74
TC-12	12f	11	6	4	ON	OFF	OFF	OFF	OFF	OFF	-	-	-	10,380,361,756	453,289,747	9839	00:04:04	40.25
TC-13	12f	11	6	4	ON	OFF	OFF	OFF	ON	OFF	-	-	-	9,421,945,256	439,737,621	9839	00:03:57	41.53

HP-1	35x 0f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,131,766,165	1,131,766,165	17,435	05:28:15	0.885
HP-2	35x 2f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,131,686,519	1,131,686,519	17,435	05:29:15	0.883
HP-3	35x 4f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,097,629,194	1,097,629,194	17,435	05:28:58	0.883
HP-4	35x 6f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	849,818,771	849,818,771	17,435	04:48:02	1.009
HP-5	35x 8f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	507,044,709	507,044,709	17,435	03:14:58	1.490
HP-6	35x 10f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	326,708,081	326,708,081	17,435	02:07:31	2.279
HP-7	35x 12f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	272,000,566	272,000,566	17,435	01:44:36	2.778
HP-8	35x 14f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	241,487,173	241,487,173	17,435	01:34:34	3.073
HP-9	35x 16f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	420,363,728	420,363,728	30,945	02:15:46	3.799
HP-10	35x 18f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	880,638,007	880,638,007	60,415	04:04:34	4.117
HP-11	35x 20f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,363,568,660	1,363,568,660	106,960	05:53:36	5.041
HP-12	35x 22f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,508,462,348	1,508,462,348	112,975	06:18:54	4.970
HP-13	35x 24f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,728,349,705	1,728,349,705	119,370	07:10:09	4.625
HP-14	35x 26f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,718,965,943	1,718,965,943	119,862	07:09:28	4.652
HP-15	35x 28f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,858,118,645	1,858,118,645	133,872	07:44:54	4.799
HP-16	35x 30f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,700,793,486	1,700,793,486	108,882	07:07:56	4.241
HP-17	35x 32f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,735,853,837	1,735,853,837	115,466	07:19:01	4.384
HP-18	35x 34f	0	0	0	ON	OFF	OFF	OFF	OFF	OFF	10,000	16	5	1,912,535,104	1,912,535,104	140,085	07:53:32	4.930

HBD-1	35f	0	0	0	OFF	OFF	OFF	OFF	OFF	OFF	10,000	22	0	9,381,994,687	9,381,994,687	16,735	49:54:40	0.093
HBD-2	35a@160 15f	0	0	0	OFF	OFF	OFF	OFF	OFF	OFF	10,000	22	0	978,785,202	978,785,202	3731	05:28:58	0.189
HBD-3	35a@160 15f	0	0	0	OFF	OFF	OFF	OFF	ON	OFF	10,000	22	0	968,436,589	968,436,589	5761	05:08:05	0.312
HBD-4	35a@160 15f	0	0	0	OFF	35	OFF	OFF	OFF	OFF	10,000	22	0	1,156,373,212	1,156,373,212	8738	05:27:04	0.445
HBD-5	35a@160 15f	0	0	0	OFF	35	35	OFF	OFF	OFF	10,000	22	0	629,292,460	629,292,460	15,652	02:44:54	1.582

KEY
Test Case	P	Pentomino 10x6	Test Cases P, OP, and TC were run on a 2.5 GHz P4 running Ubuntu Linux. Test Cases HP and HBD were run on a 2.4 GHz Core 2 Quad CPU Q6600 running Windows XP (using only one of the four processors).
	OP	One-Sided Pentomino 30x3
	TC	Tetris Cube
	HP	Hexomino 15x14 Parallelogram
	HBD	Hexomino Box-in-a-Diamond
Algorithms	The number in each algorithm column is the number of remaining pieces when the algorithm is activated. For the case of DLX multiple activation numbers can be given, each with a different ordering heuristic. An entry 12f means the min-fit ordering heuristic is activated when 12 pieces remain. Other heuristics used are the x heuristic which picks the hole with minimum x coordinate value; and the a@160 heuristic which picks the hole that forms the minimum angle from the center of the puzzle with an initial angle of 160 degrees.
Image Filters	R	Rotational Redundancy Filter	A number in a column gives the minimum number of remaining pieces for the image filter to be applied.
	P	Parity Constraint Filter
	V	Volume Constraint Filter
	F	Fit Constraint Filter
Backtrack Triggers	P	Parity Constraint Backtrack Trigger	A number in a column gives the minimum number of remaining pieces for the backtrack trigger to be applied.
	V	Volume Constraint Backtrack Trigger
Monte-Carlo	N	Number of trials.
	R	If after removing a piece from the puzzle there are exactly R pieces left to place, the Monte-Carlo trial is ended.
	S	Seed value to the Mersenne Twister random number generator.
Attempts	The number of times pieces were attempted to be placed.
Fits	The number of times pieces were successfully placed.
Unique	The number of unique solutions found.
Run Time	The total run time for the test.
Rate	The number of unique solutions found per second (Unique / Run Time).
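
The Rate column can be recomputed from the Unique and Run Time columns with a few lines of Python. The helper names are illustrative. Because the displayed run times are rounded, a recomputed rate may differ slightly from the tabulated one, especially for the very short runs.

```python
def run_time_seconds(text):
    """Convert an hh:mm:ss or hh:mm:ss.mmm string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def rate(unique, run_time):
    """Unique solutions found per second of run time."""
    return unique / run_time_seconds(run_time)

print(round(rate(9839, "07:21:31"), 2))    # test case TC-1
print(round(rate(8738, "05:27:04"), 3))    # test case HBD-4
```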

Test Case P: Pentominoes in a 10x6 Box

The first set of test cases (P) examines the 10x6 pentomino puzzle shown in Figure 3. Runs 1 through 4 show the performance of the four basic algorithms.

Comparing these first four runs with runs 5 through 8 shows the significant performance advantage of the rotational redundancy filter. This filter consistently offers significant performance gains when looking for all solutions to a symmetric puzzle. Also note that DLX performs relatively better with this filter enabled as it's the only algorithm capable of iterating over the possible placements of the piece constrained by the filter.

Runs 9 through 11 use DLX only for the first piece placement (to take full advantage of the rotational redundancy filter) but then switch to the other lighter-weight algorithms to place the last 11 pieces. Comparing run 8 with run 11 shows this combined algorithmic approach to be about eight times faster than any single algorithm.

Run 12 shows that although the parity backtrack trigger does offer a very modest reduction in attempts and fits, the net effect is a reduction in the production rate of unique solutions.

Run 13 uses a one-shot volume filter to expunge many useless images from the image set and results in about a 13% increase in performance.

Test Case OP: One-sided Pentominoes in a 30x3 Box

The second set of test cases (OP) examines the problem of placing the one-sided pentominoes in a 30x3 box as shown in Figure 14. The volume filter is shown to be particularly useful for this puzzle delivering a factor-of-nine performance improvement.

Test Case TC: Tetris Cube

The third set of test cases (TC) examines the Tetris Cube as shown in Figure 1. The first four runs show the performance of MCH and EMCH to be superior to DLX and de Bruijn for this small 3-D puzzle.

Runs 5 through 8 again show the huge performance benefits of the rotational redundancy filter; and again DLX performs relatively better than the other algorithms with the rotational redundancy filter active, even outperforming de Bruijn for this 3D puzzle.

In runs 9 through 11 I start to combine the algorithms only using DLX to place the first piece (to get it to iterate over the possible placements of the piece constrained by the rotational redundancy filter) but then switching to just one of the simpler algorithms. As can be seen from the table, the benefits of this combined approach are quite significant.

In run 12 all four algorithms are combined to solve the puzzle. If you number pieces as you place them in the box counting down from 12 (so the last piece placed is numbered 1); then DLX was used to place piece 12; MCH to place pieces 11 through 7; EMCH to place pieces 6 and 5; and de Bruijn was used to place pieces 4 through 1. As the number of remaining pieces gets smaller, it pays to use simpler algorithms. Compare the performance of run 12 with the performance of runs 5 through 8 (where just one algorithm was used) and you see that the combined algorithmic approach is more than 5 times faster than the fastest of those single algorithm runs.
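
The way the activation numbers in the Algorithms columns translate into a placement schedule can be sketched in Python. The function and the dictionary are illustrative, not the solver's configuration interface. The assumed rule is the one given in the table key: an algorithm becomes active once the number of remaining pieces has dropped to its activation number, a value of 0 means the algorithm is unused, and the most recently activated algorithm is the one that runs.

```python
def algorithm_for(remaining, activation):
    """Pick the algorithm activated most recently for this many remaining pieces."""
    active = [
        (limit, name)
        for name, limit in activation.items()
        if limit > 0 and remaining <= limit
    ]
    # The smallest activation number reached so far wins.
    return min(active)[1]

TC_12 = {"DLX": 12, "MCH": 11, "EMCH": 6, "de Bruijn": 4}
schedule = {n: algorithm_for(n, TC_12) for n in range(12, 0, -1)}
for remaining, name in schedule.items():
    print(remaining, name)
```

For the settings of run 12 this reproduces the assignment described above: DLX for piece 12, MCH for pieces 11 through 7, EMCH for pieces 6 and 5, and de Bruijn for pieces 4 through 1. The sketch raises an error if no algorithm is active for the starting piece count.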

Run 13 shows that the parity backtrack trigger offers a small benefit (about 3%) for this puzzle. It is interesting that run 13 is well over 100 times faster than the straight DLX approach used in run 1.

Test Case HP: Hexominoes in a 15x14 Parallelogram

The fourth set of test cases (HP) examines the problem of placing the 35 hexominoes in the 15x14 parallelogram shown in Figure 6. Here I did not try to find the overall best solver configuration, but instead only studied the effects of packing pieces simply from left to right (using the x ordering heuristic) for initial piece placements and then switching to the DLX min-fit heuristic for later piece placements. I should not have had the rotational redundancy filter active for these tests (this only slows solution production rates when examining such a small portion of the search tree), but I didn't want to tie up my computer for another week to rerun the tests. The best performance was had when using the min-fit heuristic only for the last twenty piece placements. Using the min-fit heuristic for more than twenty pieces resulted in little performance change, though it seems to exhibit some small degradation.

It is likely that application of the volume constraint filter, the parity constraint backtrack trigger, and the de Bruijn algorithm (for later piece placements) would offer additional performance gains for this puzzle.

Test Case HBD: Hexominoes Box-in-a-Diamond

The last set of test cases (HBD) examines the hexomino box-in-a-diamond puzzle shown in Figure 12. The first run is a straight no-frills DLX run using the min-fit heuristic. For the second run I instead used my angular ordering heuristic, which packs pieces into the puzzle in a counterclockwise direction. I started placing pieces at 160 degrees (about ten o'clock) so that the less confined region at the top of the puzzle would be left for last. Once there were only 15 pieces left I switched to the min-fit heuristic. The number 15 was just a guess and probably too low for best performance, but this approach was still twice as fast as using a pure min-fit heuristic.
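To make the idea of an angular ordering concrete, here is a small Python sketch that sorts cells counterclockwise around a center point, beginning at a chosen start angle. It is only an illustration under stated assumptions: `angular_order` is a hypothetical helper, angles are measured from the +x axis with y increasing upward, and the solver's real heuristic may well differ in its details.

```python
import math

def angular_order(cells, center, start_deg=160.0):
    """Sort (x, y) cells counterclockwise around `center`, starting at start_deg."""
    cx, cy = center

    def sweep(cell):
        x, y = cell
        # atan2 gives the angle from the +x axis, in the range -180..180 degrees
        angle = math.degrees(math.atan2(y - cy, x - cx))
        # how far counterclockwise this cell lies past the start angle
        return (angle - start_deg) % 360.0

    return sorted(cells, key=sweep)
```

With a start angle of 160 degrees, a cell directly left of center sorts first and a cell directly above center sorts last, which matches the goal of leaving the top of the puzzle for the end.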

Run 3 shows that enabling the parity constraint backtrack trigger improved performance by about 65% in this very high-parity puzzle. Run 4 switches to the parity constraint filter which improves performance by another 42%.

Most interestingly, Run 5 shows a one-shot application of the volume constraint filter increased performance by a factor of 3.5.

Software

This software is protected by the GNU General Public License (GPL) Version 3. See the README.txt file included in the zip file download for more information.

EDIT (February 11, 2019): the software described and linked below is several years old, but is retained here as it is consistent with this document. I encourage you, however, to instead download and use my improved version of polycube linked from my more recent article on FILA.

Download

WINDOWS: polycube_1.2.1.zip

Contents:

README.txt (copyright, build and run instructions)
RELEASE_NOTES.txt (summary of changes for each release)
polycube.exe (polycube solver executable for Windows)
tetriscube_def.txt (definition file for the Tetris Cube puzzle)
bedlamcube_def.txt (definition file for the Bedlam Cube puzzle)
somacube_def.txt (definition file for the Soma Cube puzzle)
Several definitions for the Pentominoes puzzle for solutions of different shapes.
Puzzle solver C++ source

LINUX / UNIX: polycube_1.2.1.tgz

Contents: same as for Windows, but no executable and all text files are carriage return stripped.

The source is about 10,000 lines of C++ code, with dependencies on two other libraries (boost and the Mersenne Twister random number generator) which are also included in the download. The executable file polycube.exe is a Windows console application (sorry, no GUIs folks). For non-Windows platforms you'll need to compile the source.

Running the Software

The README.txt file gives the full details about how to run the software (algorithm control, solution output control, trace output control, puzzle definition file formats, etc.); but here is a brief introduction. Simply pass polycube one (or more) puzzle definition files on the command line like this:

    polycube def/pentominoes_10x6_def.txt

This will immediately start spitting out solutions to the 10x6 pentominoes puzzle. Once you see that it's working you'll probably want to explore available command line options. To see them run:

    polycube --help

There are several puzzle definition files provided. These are simple text files that look like this:

# Tetris Cube Puzzle Definition
D:xDim=4:yDim=4:zDim=4
C:name=A:type=M:layout=0 0 2,  1 0 2,  2 0 2,  2 0 1,  2 0 0,  2 1 0  # Blue angle with end-notch
C:name=B:type=M:layout=0 0 0,  1 0 0,  2 0 0,  2 1 0,  2 2 0,  2 1 1  # Blue angle with mid-notch
C:name=C:type=M:layout=0 0 0,  1 0 0,  1 0 1,  2 0 1,  2 1 1          # Blue angled staircase
C:name=D:type=M:layout=0 0 0,  1 0 0,  2 0 0,  2 1 0,  3 1 0          # Blue 2-D serpent
C:name=E:type=M:layout=0 0 1,  0 0 0,  1 0 0,  1 1 0,  1 2 0,  2 1 0  # Red dog with tail
C:name=F:type=M:layout=0 0 0,  1 0 0,  0 0 1,  1 0 1,  0 1 0          # Red ziggurat
C:name=G:type=M:layout=0 2 0,  0 1 0,  0 0 0,  1 0 0,  2 0 0          # Red angle
C:name=H:type=M:layout=0 0 0,  1 0 0,  2 0 0,  3 0 0,  2 1 0          # Red line with mid-notch
C:name=I:type=M:layout=0 0 1,  1 0 1,  2 0 1,  0 1 1,  0 1 0          # Yellow pole with twisty top
C:name=J:type=M:layout=0 0 0,  1 0 0,  1 0 1,  1 1 1,  2 1 1          # Yellow cork-screw
C:name=K:type=M:layout=0 0 0,  1 0 0,  2 0 0,  1 1 0,  1 1 1          # Yellow radar dish
C:name=L:type=M:layout=0 1 0,  1 1 0,  1 1 1,  2 1 1,  1 0 1,  1 2 1  # Yellow sphinx
~D
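For readers who want to work with these files from their own scripts, here is a minimal Python sketch that parses one `C:` piece line of the kind shown above. The field layout is inferred only from this sample; the README.txt is the authority on the format, and `parse_piece` is a hypothetical helper, not part of polycube.

```python
def parse_piece(line):
    """Parse one 'C:' line into (name, type, list of (x, y, z) cells)."""
    body = line.split("#", 1)[0].strip()        # drop the trailing comment
    fields = body.split(":")
    if fields[0] != "C":
        raise ValueError("not a piece line")
    # remaining fields look like key=value, e.g. name=A, type=M, layout=...
    attrs = dict(field.split("=", 1) for field in fields[1:])
    cells = [tuple(int(n) for n in triple.split())
             for triple in attrs["layout"].split(",")]
    return attrs["name"], attrs["type"], cells
```

One quick sanity check this enables: the twelve pieces above have 64 cells in total (four pieces of six cells and eight of five), which matches the volume of the 4x4x4 box.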

The type=M means the piece is mobile and free to move about (the typical case), but you can also declare a piece to be type S (stationary) to forcibly load the piece at the given coordinates. It's often easier to define the puzzle pieces graphically. Here's a definition file for the box-in-a-diamond hexomino puzzle that uses graphical layouts for piece definitions.

# Box-in-a-Diamond Hexomino Puzzle Definition
D:xDim=23:yDim=23:zDim=1
L
. . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . A . B B . C . . D . . . E . . F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C C . D . . E E . . F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D D . E . . F F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D . . E . . F . . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D . . E . . F . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. G G . H H . I I . J . . K . . . L . M M M . N . . . O . . . P . . . Q . . .
. G G . H . . I . . J J . K K . L L . M . . . N N N . O . . . P . . . Q . . .
. G . . H H . I . . J J . K K . L . . M . . . N . . . O O O . P P . . Q Q Q .
. G . . H . . I I . J . . . K . L L . M . . . N . . . . O . . . P P . . . Q .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
R . . . S . . . T . . . U U U . V V . . W W . . X X . . . Y . . . Z . . 1 . .
R R R . S S . . T T . . . U . . . V V . . W . . . X . . Y Y Y . Z Z . . 1 1 .
. R . . . S S . . T . . . U . . . V . . . W W . . X . . . Y . . . Z Z . . 1 1
. R . . . S . . . T T . . U . . . V . . . W . . . X X . . Y . . . Z . . . . 1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 2 2 . 3 . . . 4 . . . 5 . . . 6 . . . 7 7 . . 8 8 . . 9 9 . . . . .
. . . . . 2 2 . 3 3 3 . 4 4 . . 5 5 5 . 6 . 6 . 7 7 7 . 8 8 . . . 9 9 . . . .
. . . . . 2 2 . 3 3 . . 4 4 4 . 5 . 5 . 6 6 6 . . 7 . . . 8 8 . 9 9 . . . . .
~L
L:stationary=*
* * * * * * * * * * * . * * * * * * * * * * *
* * * * * * * * * * . . . * * * * * * * * * *
* * * * * * * * * . . . . . * * * * * * * * *
* * * * * * * * . . . . . . . * * * * * * * *
* * * * * * * . . . . . . . . . * * * * * * *
* * * * * * . . . . . . . . . . . * * * * * *
* * * * * . . . . . . . . . . . . . * * * * *
* * * * . . . . . . . . . . . . . . . * * * *
* * * . . . . . . . . . . . . . . . . . * * *
* * . . . . * * * * * * * * * * * . . . . * *
* . . . . . * * * * * * * * * * * . . . . . *
. . . . . . * * * * * * * * * * * . . . . . .
* . . . . . * * * * * * * * * * * . . . . . *
* * . . . . * * * * * * * * * * * . . . . * *
* * * . . . . . . . . . . . . . . . . . * * *
* * * * . . . . . . . . . . . . . . . * * * *
* * * * * . . . . . . . . . . . . . * * * * *
* * * * * * . . . . . . . . . . . * * * * * *
* * * * * * * . . . . . . . . . * * * * * * *
* * * * * * * * . . . . . . . * * * * * * * *
* * * * * * * * * . . . . . * * * * * * * * *
* * * * * * * * * * . . . * * * * * * * * * *
* * * * * * * * * * * . * * * * * * * * * * *
~L
~D

Note that a single stationary piece named * is used to shape the puzzle.
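The graphical layout can be read in much the same spirit. The Python sketch below groups the cells of a layout block by label, treating `.` as an empty cell. It rests on assumptions not confirmed here: that tokens are separated by spaces, and that the bottom row is y = 0 (the README.txt defines the real conventions). `parse_layout` is a hypothetical helper.

```python
from collections import defaultdict

def parse_layout(rows, empty="."):
    """Group the cells of a graphical layout by piece label."""
    pieces = defaultdict(list)
    height = len(rows)
    for r, row in enumerate(rows):
        for x, label in enumerate(row.split()):
            if label != empty:
                # assumption: the bottom row of the block is y = 0
                pieces[label].append((x, height - 1 - r))
    return dict(pieces)
```

Note that the first layout block in the sample is 39 columns wide, wider than the 23x23 box, which suggests that only the shape of a mobile piece matters and not where it is drawn. That is an inference from the sample rather than a documented rule.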

Planned Features

When I get time to work on this again, these are the features I'll be adding:

Improve the implementation of the volume constraint filter. This filter often offers remarkable performance enhancements, but I think I can make it faster and increase the range of problems where it is effective.
Extend the rotational symmetry filter to also detect and filter 3-D reflective symmetries. This is exactly like one-sided polyomino constraints: one-sided polyomino pieces are free to rotate in 2 dimensions but not in 3; but because every polyomino piece has a mirror that's also in the piece set, constructions that have mirror symmetries can be flipped upside down (a 180 degree rotation through the third dimension) to produce a new valid solution. Similarly, all 3-D puzzles are effectively one-sided relative to the 4th dimension: we puny humans cannot take a puzzle piece and rotate it 180 degrees through the 4th dimension. If we could, we could transform any piece into its mirror image. But if the piece set as a whole is complete (again, in the sense that the mirror of every piece is also in the available piece set), then it is possible to rotate any solution to any puzzle having mirror symmetries 180 degrees through the 4th dimension to produce a new valid solution. I think it should be straightforward to further constrain the allowed rotations and/or translations of a piece to eliminate such trivial solutions from the search and achieve additional performance gains for those puzzles that have this property. (E.g., any soma cube or pentacube construction with mirror symmetries should be able to take advantage of this technique.)
Add support for multi-threading and/or distributed processing.
Add support for graphical rendering of puzzle trace and solution output.
Additional constraint-based image filters and/or backtrack triggers.
Possibly add other algorithms.
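The "complete piece set" condition mentioned in the reflective symmetry item can be checked mechanically. Here is a Python sketch of the 2-D (one-sided polyomino) analogue: it tests whether the mirror image of every piece is also in the set, up to in-plane rotation. All function names are hypothetical, and a 3-D version would need the 24 cube rotations instead of the four used here.

```python
def normalize(cells):
    """Translate cells so the minimum x and y are zero."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)

def rotations(cells):
    """The four in-plane rotations of a 2-D piece, each normalized."""
    out = []
    current = list(cells)
    for _ in range(4):
        out.append(normalize(current))
        current = [(-y, x) for x, y in current]   # rotate by 90 degrees
    return out

def canonical(cells):
    """A key that is the same for every rotation of a one-sided piece."""
    return min(rotations(cells), key=sorted)

def mirror(cells):
    """Reflect a piece across the y axis."""
    return [(-x, y) for x, y in cells]

def is_mirror_complete(pieces):
    """True if the mirror of every piece is also in the set (up to rotation)."""
    keys = {canonical(p) for p in pieces}
    return all(canonical(mirror(p)) in keys for p in pieces)
```

For example, a set holding only the L tetromino is not complete, while a set holding both the L tetromino and its mirror is, and a set holding only the T tetromino is complete because that piece is its own mirror.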

Solutions

Here are all the solutions to the Tetris Cube and a few other popular puzzles:

WINDOWS: solutions.zip

Contents:

bedlamcube_out.txt (all 19186 solutions to the Bedlam Cube puzzle)
pentominoes_10x6_out.txt (all 2339 solutions to the 10x6 pentomino puzzle)
pentominoes_onesided_30x3_out.txt (all 46 solutions to the 30x3 one-sided pentomino puzzle)
somacube_out.txt (all 480 solutions to the Soma Cube puzzle)
tetriscube_out.txt (all 9839 solutions to the Tetris Cube puzzle)

LINUX / UNIX: solutions.tgz

Contents: same as for Windows, but all text files are carriage return stripped.

The solutions for 3-D puzzles need explanation. The first three solutions to the Tetris Cube are shown below. Each solution is displayed as four horizontal slices of the puzzle box like the layers of a four-layer cake. The first slice (on the left) shows the bottom layer of the box; the next slice is the second layer of the box; etc. The letters match the labels of the pieces shown in the photo above, identifying which piece occupies each cell in each layer of the puzzle box. The background color is also set to match the color of the pieces. Because the pieces are three dimensional, they can span multiple layers of this output.

One thing I find fun to do is to use the solutions to place just the first 6 or 7 pieces in the box, and then see if I can solve it from there. It's still challenging, but won't cost you a week of vacation to find a solution.

Conclusions

Applying a host of different algorithms and constraints-based optimizations to a single polyomino or polycube problem can deliver great performance benefits. For large problems it appears that initially simply packing confined regions of the puzzle works well. When the number of remaining pieces reaches some critical threshold (that depends on the complexity of the pieces and the topology of the remaining puzzle), switching to an algorithm that seeks out constrained holes or pieces does better. Examples of such algorithms include DLX using a min-fit ordering heuristic, and the MCH algorithm. For the final piece placements, the de Bruijn algorithm appears most efficient. Parity-based constraints can offer performance benefits especially for high-parity puzzles. Investing the time to purge the images of pieces that partition a puzzle into two regions that are clearly unfillable based on volume considerations consistently offered considerable performance benefits for all polyomino puzzles examined. Constraining the allowed images of a piece to eliminate rotationally redundant solutions from the search also provides great performance benefits when enumerating all solutions to a puzzle that has rotational symmetries. Some of these techniques are easy to apply, others (like knowing when to start applying the min-fit heuristic, or when to switch to the de Bruijn algorithm) unfortunately require some combination of intuition, experimentation, or experience to use effectively.

These types of puzzles are certainly a marvelous distraction and I end this effort (at least for the time being) leaving many ideas unexplored. I haven't even examined the effectiveness of some of the techniques I've implemented in the solver (e.g., the volume constraint backtrack trigger). Correspondence with other folks interested in these puzzles has brought other promising strategies for attacking these types of problems to my attention, but I must for now return to other more practical projects.

References

Donald Knuth, Dancing Links, Nov 2000.
N.G. de Bruijn, Programmeren van de pentomino puzzle, Euclides 47 (1971/72), 90-104.
Scott Kurowski, Tetris Cube Solved. [Scott's web pages inspired me to share my own software and his source code offered me my first view of the Fletcher/de Bruijn algorithm.]
Gerard's Polyomino Solution Page [Wonderful stuff. His solver is really fast. He told me how it works in an email. Pure genius.]
David Goodger, Polyform Puzzler: Puzzles and Solutions [Really impressive. The software appears to be a python-based implementation of DLX, but I haven't looked at the source myself.]
Daniel Tebutt, Bedlam Cube Solver. [The solver includes a graphical rendering of solutions via java applet.]
Thorleif Bundgård, Thorleif's SOMA page. [I first learned of coloring tricks at this extensive website. Thorleif's solver is written in BASIC.]
Seppe Vanden Broucke, Solving A Tetris Cube, Recursive Backtracking, Algorithm X, Oh My!. [The solver is written in PHP.]
[Stephen Montgomery-Smith developed distributed processing facilities with which he enumerated all solutions to some larger polyomino problems.]
Josh Carrier, Tetris Cube Solver Cloud. [Josh also set up a distributed processing environment and enslaved some of his friends' computers to solve the Tetris Cube.]
Wikipedia, Polyomino.

Matt’s Double Octagon Deck (Part 1)

matt — Sun, 29 Aug 2010 07:04:30 +0000

For ten years I’ve thought about replacing the decrepit deck behind my house. I wanted to do something unique to appease my sinful pride. This was the initial concept sketch for my deck consisting of two 15 foot diameter octagons placed side-by-side.

The octagon on the right is only about 14 inches above grade. The octagon on the left is one step up (maybe 7 inches higher) and surrounds an octagon-shaped hot-tub. I wanted the decking for each octagon to be laid out circularly (as shown) to emphasize the octagon shapes.

Design

Framing design

Although my web searches did turn up a few pictures of octagon decks with circular decking, I found no plans on how to build them. Figuring out how to frame this deck was quite challenging to me — someone with zippo framing experience. Let me emphasize this point with the following disclaimer:

DISCLAIMER: I am not a professional engineer and have no training or experience in construction. I’m a novice. This page chronicles my own deck building experience. I hope you find the information provided here useful, but it is provided WITHOUT ANY WARRANTY; without even any implied warranties. I make no claims as to the structural integrity of this deck design or whether it is fit for any particular use.

I spent hours talking to my father-in-law (who’s built a deck or two before) and to the folks at Front Range Lumber who familiarized me with the various hardware available from Simpson. With their help, here’s the framing design I finally came up with:

Unfortunately, this drawing is drawn looking north instead of south so left and right are reversed relative to the previous sketch, so I’ll refer to the two octagons as lower (the octagon without the hot tub) and upper (the octagon that’s one step up and surrounds the hot-tub).

Lower octagon design

The lower octagon has one long double-2×10 beam that stretches between two opposite corners, and two more double-2×10 beams that connect at the center of the long beam to form a cross. Each side of the octagon is also a load-bearing double 2×10 beam. The beams forming a cross split the octagon into four quadrants. All joists are of 2×8 lumber. The joists in each quadrant all run parallel to each other. Note that the joist running through the center of each quadrant is double-wide to provide ample surface area to screw down the ends of the deck boards. Because I could find no hardware to interconnect eight incident pieces of double-wide 2x lumber, these four double-wide joists don’t run all the way to the center of the deck but instead hang from a side of the small diamond structure which is made just big enough to clear all hanger hardware at the center.

Upper octagon design

Each side of the upper octagon is a double-wide 2×12 beam. (Note that 2×10 beams would be sufficient structurally, but I’m using 2x12s to provide more room to bolt the two octagons together on their common side.) All joists on this side are made of 2×6 lumber because they are so short. At the outside the joists hang from the beams that form the sides of the octagon. On the inside the joists rest on top of (and are cantilevered over) four underlying double 2×10 beams laid out in a simple square. The four concrete piers at the corners of this square have good clearance from the concrete slab for the hot tub.

Post design

Each octagon corner (both lower and upper), has three incident beams (or two beams and a double-wide joist) coming in at 67.5 degree angles. So how do you tie these all together and support them? There’s no post cap I could find that lets you do this. My solution was to notch a post in such a way to support all three beams. I needed a rather fat post to have enough cross sectional area to both support all three beams and still have enough wood left over to bolt everything together. I settled on a post size of 6 inches by 8 inches which I fabricated from two 4×6 rough-cut treated posts that are bolted together. Here is an orthographic three-view drawing of one half of one of my deck posts.

The other half of the deck post is just the mirror image of this drawing. Note that all dimensions in the drawing are in inches and all bolt holes are 1/2 inch in diameter. The notch is shown as nine inches high which is appropriate for a double-2×10. (You’d obviously need to modify this dimension to accommodate 2×8 or 2×12 beams.) It’s important to get the 1.5 inch depth cut just right to match the thickness of your lumber, lest you snap the post when you sandwich the two post halves around the radial beam. The 3.247 inch dimension is chosen so that the exterior face of the side beam will just graze the underlying post corner (so the corner of the post will not stick out underneath the beam).

Here are a couple of pictures of my sample-post.

This sample-post served to give me some confidence that I could actually build this deck before I placed the big order for all the building materials. It was also useful during my first visit to the Lakewood building office. They were initially saying I’d need to pay an engineer to approve my post design, but when I showed them my sample post they decided the thing looked so solid I could forgo an engineer’s approval.

Construction

To minimize the delay before having at least a portion of the deck ready for use and also to reduce the risk of lost time and money in the woeful event this whole project flops, I decided early on to build the lower octagon first and save the other half of the deck for some later summer.

Marking post locations

Once I finished all the drawings required to get my building permit, I marked the post hole locations. I also had to move some sprinklers around. (The black tube is a section of a sprinkler zone that ran under the deck’s new location.)

Digging post holes

Digging the holes to the three foot depth required by local building code was truly a challenge. I started by renting a rather large 9 horse-power auger which worked great until I hit some nasty breccia about 15 inches down. I rented the auger for two days (at $110.00 per day) trying to drill through this stuff, but made very little progress. Then I rented a jack hammer which easily broke up the breccia, but since I could only loosen about 2 inches of rock at a time, several applications of the jack hammer were required on each hole to get them to depth. Getting the loose dirt and rock out of a hole after using the jack hammer was a tedious and back-wrenching process. (My father-in-law did actually wrench his back.) After accruing over $300.00 in rental fees, I got annoyed and bought my own electric jack hammer for about $600.00. I just wish I had back all the money I’ve spent renting jack hammers and that all but useless auger.

Building a temporary support structure

Usually when you build a deck, you pour your piers first; then place your posts; then cut or notch them all to the right height; then connect your beams to your posts, etc. But I needed my posts to be located exactly at the octagon corners. I doubted my ability to get them located accurately enough to make the octagon look true. With so many posts, I thought it would also be quite challenging to get them all level. Lastly, the funny 22.5 degree notch cuts are all made on a band-saw and table-saw, which is not something I can do once the post is mounted. So instead I built this temporary square support structure out of some 14-foot long 2x4s.

The idea is to build the basic frame of the octagon on top of this structure, with the posts at each corner floating over their respective holes. And then once everything is lined up nice, pour the piers. There are actually three 2x4s on each side glued and nailed together to provide sufficient strength to support the framing which I estimated to weigh around 1500 pounds (without the joists). Each corner of this square is supported by a threaded rod which itself is stuck in a short length of buried 4×4 to keep it from sinking into the earth. A nut and washer under each corner is used to adjust the height to get the top of the square structure level with the top of the foundation wall of my house.

Making the corner posts

Now it was time to start making posts. Because the lower octagon is so low to the ground my posts are only about twelve inches long. Nine inches of the twelve is notched out so that the post really only extends a scant three inches below the bottom of the beams it supports. (This is actually my major worry about my deck design: that the three inch tall block of wood the beams sit on could split off from the rest of the post. On the bright side, if my deck does ever fail, it can’t possibly fall more than a couple of inches.) Here I am cutting sixteen 12-inch-long post-halves from 4×6 rough cut treated lumber. (I was using clamps to hold the lumber steady because it was a little crooked and I didn’t want it shifting while I cut it.)

SO MUCH SAWDUST! Although I used a full breather most of the time, there was at least one evening I worked late in my garage without a mask and assuredly breathed in more chromated copper arsenate (CCA) than anyone should. Hopefully I’ll finish this deck before lung cancer sets in.

Each post-half required 4 cuts to make the needed notches. Two of the four cuts were trivial, but the other two required a jig to hold the lumber in place as I ran it through the saw. Here’s one of the two jigs I had to make:

And here’s the other. This one was only needed because my band-saw table only tilted clock-wise. You sort of expect this kind of thing from a table saw, but I was surprised to find this limitation on my band saw.

Interconnecting the beams

With the posts all cut, I could start attaching posts to beams (and double-wide joists) and begin framing the deck. One thing I noticed before I started was that not all my 2x lumber was the same size. For example my 2x10s ran anywhere from 9 3/16 inches to 9 7/16 inches in width: a full quarter inch of variance. Normally this isn’t a big deal, but I didn’t want to deal with notching my posts all differently to accommodate this variance so I ended up ripping all my 2x10s down to the standard width of 9 1/4 inches. I figured since this is the nominal size of a 2×10 the inspector wouldn’t complain too much and it did give the added benefit of straightening my lumber. It’s probably worth mentioning that the small diamond structure can’t be nailed with a hammer: I had to buy an air-powered palm nail driver to work in the tight spaces on the interior of this diamond.

Once in place, I verified the lengths of all the radial beams and double-wide joists (measured from one corner to the opposite corner) were close to their designed size of 188 5/16 inches. I then cut the eight perimeter beams to their design length of 72 1/16 inches (measured on their shorter inside). I set them on the posts and stretched a rope around the octagon to pull them all in as tight as I could. I then started bolting them in place. I used a couple of clamps to make sure the beam was well seated in the notch and to hold the beam tight against the post face. A sledge hammer was useful for tapping the post (and radial beam) into proper alignment before drilling the holes. I also countersunk the holes from the front so the carriage-bolt heads wouldn’t get in the way of decking and/or steps I intend to mount around the periphery of the deck.
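The two design lengths quoted above are related by simple octagon geometry: for a regular octagon, the side equals the corner-to-corner diagonal times sin(22.5°). The short Python check below is only a sketch under the assumption that both lengths are measured between the same set of corner points; in the real frame, beam thickness and notch offsets shift things slightly. The function name is hypothetical.

```python
import math

def octagon_side(corner_to_corner):
    """Side length of a regular octagon given its corner-to-corner diagonal."""
    return corner_to_corner * math.sin(math.pi / 8)   # sin(22.5 degrees)

diagonal = 188 + 5 / 16               # inches, the radial beam design length
side = octagon_side(diagonal)         # about 72.06 inches

interior_angle = 180 * (8 - 2) / 8    # 135 degrees at each octagon corner
half_angle = interior_angle / 2       # 67.5 degrees between a side and a radial beam
notch_angle = 90 - half_angle         # 22.5 degrees
```

The computed side agrees with the 72 1/16 inch perimeter beam length to within a few thousandths of an inch, and the same geometry gives the 67.5 and 22.5 degree angles that show up in the post design.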

Oct 28, 2010

Dropping in the sonotubes and attaching post bases

After a two month hiatus (spent pursuing the creation of this web-site and replacing some rotting siding on my house) I am again working on my deck. I used my clamps in spreader mode between the deck frame and the under-lying square support structure to scootch the deck a few inches to the side giving me access to the holes. I cleaned out the holes (that had partially filled with loose soil from recent rains) and dropped in the sonotubes. I trimmed the tubes to their proper length.

The previous deck happened to have a concrete pier where I could reuse it for my center post. While I had the deck frame slid to the side, I drilled a hole in this pier and installed a wedge anchor bolt to which the post base will attach. It was really hard to find a wedge anchor bolt that was both galvanized and long enough to meet code. I highly recommend wholesalebolts.com. They have a great selection and charge only about 20% of what Home Depot charges.

I scootched the deck frame back into place and then attached the Simpson Strong Tie post bases to each of the eight corner posts. Before I attached each base to its post, I attached a 5/8″ L-Bolt to the bottom of the base. I used two nuts to attach these bolts: one nut on the top-side of the base and one on the bottom. In this way the bolts will be held tight to the base during the concrete pour.

Bridging the deck to the house

I installed the two beams that bridge the deck to the house. These beams simply sit on the house foundation wall and are secured to the house rim joist by lag bolts. I think it’s more typical to use a hanger to connect these to the rim joist, but since my basement is unfinished I was easily able to lag them into place.

Leveling the deck

I was having a devil-of-a-time trying to get everything level with my four-foot level. Is gravity crooked in Lakewood? I began spending my evening hours lusting after absurdly expensive laser levels. Then an old college friend of mine, David Cenedella, told me about a water level. If you have a bucket and some clear plastic tubing, you have everything you need to make an extremely accurate level. Just put some water in the bucket; put one end of the tube in the bucket; and fill the tube with water (like you would to make a siphon). Holding the other end of the tube slightly higher than the water level in the bucket will ensure no water siphons out. The water level at the end of the tube is always the same no matter where you move it. Using this technique I leveled each corner of the deck with the top of the beams at the foundation wall. I made gross adjustments by raising or lowering the nuts that the support square rested on; and made finer adjustments with some thin wooden shims. This was so simple and worked so very very well! Thanks David!

Pouring the concrete piers

My father-in-law came out again to visit and together we hand mixed the thirty-four 80 pound bags of concrete needed to fill the eight sonotubes. We were worried about how hard it would be to get concrete in the small gap between the post base and the top of the tube. Initially I made a huge funnel out of a piece of sheet metal, but then we found a simple sheet metal chute worked wonderfully. The concrete pour was then actually straight-forward. It took us about seven hours of very hard work over two days. Here’s one thing I learned: it’s about four times harder to simultaneously mix two bags of concrete than it is to mix one!
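For anyone planning a similar pour, the bag count can be roughed out ahead of time. The Python sketch below is only an estimate built on assumptions that are not from this page: a yield of about 0.6 cubic feet per 80 pound bag (a typical published figure for premixed concrete) and a hypothetical 12 inch tube diameter. `bags_needed` is an illustrative name.

```python
import math

BAG_YIELD_CUFT = 0.6   # assumption: typical yield of one 80 lb bag of premix

def bags_needed(diameter_in, depth_ft, tubes):
    """Estimate how many bags fill `tubes` cylindrical forms."""
    radius_ft = diameter_in / 12 / 2
    volume = math.pi * radius_ft ** 2 * depth_ft * tubes   # cubic feet
    return math.ceil(volume / BAG_YIELD_CUFT)

estimate = bags_needed(diameter_in=12, depth_ft=3, tubes=8)   # 32 bags
```

Under those assumptions eight tubes filled to the three foot hole depth come to 32 bags, in the same ballpark as the thirty-four actually mixed; the difference is easily explained by tubes that extend above grade and by spillage.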

After the piers had cured for a few days, I removed the square structure that the beams had previously been sitting on.

Ground leveling

The ground in my backyard slopes toward my house. Although I made my posts as short as I dared, the ground level at the posts farthest from the house was still an inch or two higher than the top of the concrete pier so the bottom of these posts would be about an inch below ground. This would cause the posts to rot out far faster than they otherwise would. So I spent a lot of time lowering the ground level around and under the deck so that the tops of all of the concrete piers were at least 2 inches above ground and to create a slight ground slope away and to the side of my house. I moved probably three or four yards of dirt (which I’m still working to get rid of).

Installing the center post

My center post isn’t quite in the center because the center of the deck has hanger hardware in the way. Instead it’s about a foot-and-a-half off center under the main deck beam. The only thing of note about this step is that my 16′ long main beam had begun to sag under the weight of the other beams hanging off of it, so I had to get out my car jack to raise it about a half inch before installing this post. Fortunately, this upward force wasn’t enough to crack my new concrete piers.

November 11, 2010

Installing flashing

The aluminum sill under the sliding glass door has been precariously unsupported since I removed the old deck. To remedy this, I shaped a couple of boards to sit under this sill and fill the gap in the exterior wall. I also shaped some flashing that should help keep water out of the wall. Once the boards and flashing were installed, I sealed up all the joints with silicone (not shown).

I followed this same procedure to fill the pocket hole in the brick wall.

November 26, 2010

Bolting the beams

Over the months that I’ve been working on this deck the 2x lumber has cupped and curled. About half of my beams developed sizeable gaps between the two boards as seen here.

If I had angled the nails in when I put them together I suspect I would have prevented much of this gapping. To remedy the problem I bolted all the beams and double-wide joists together.

I placed several bolts on each beam placing them alternately about 2 inches from the top and bottom edge of the beam as shown below. (You don’t want to drill any holes too close to the edge of the beam as that would structurally weaken the beam. Check your local building codes for the exact rules.)

Installing joists

The hangers I installed months earlier (the double-wide hangers supporting the crossing beams and the small diamond structure) gave me some troubles because things wanted to move around on me as I nailed in the hangers. I’ve since learned I should have first toe-nailed the beams into place and then installed the hanger. I did much better installing the joist hangers. Here are the steps I took for each joist.

I first cut a joist to length (chopped off at 45 degree on one side and square on the other).
Once I had it to the right length (which would take a couple of attempts as I always initially cut them a bit long), I’d nail on the slopeable/skewable hanger (Simpson Strong-Tie model LSSU28) to the square end of the board.
I jammed the board into place with a spreader clamp and used a sledge-hammer to tap the joist into proper alignment.
I toe-nailed the 45 degree side (just one nail) for additional stability.
At this point the LSSU28 was easily nailed into place.
On the 45 degree side, I used either an SUL26 or SUR26 hanger. These hangers are made of a fairly heavy-gauge metal and were difficult to get into proper alignment by hand, so I used some clamps to hold these hangers flush against the wood before I nailed them.

I used galvanized 10d nails for all angled connections into the beam; but used galvanized “hanger” nails (1.5 inch long nails with a fat shank and a thick head) for everything else (as permitted by Lakewood building code). Here’s a look at the installation of one of my joists.

And here’s a look at the deck after all joists were installed.

The Lakewood building inspector thought everything looked great and was musing he should show pictures of my deck to the professionals to show them how things should be done. I suspect he was blowing smoke with this remark, but I appreciated the compliment nonetheless.

August 26, 2011

Incredible Shrinking Joists

Over the winter months, the framing lumber (which was initially weeping wet with CCA treatment) dried resulting in significant shrinkage in board width and thickness. (Board lengths, in contrast, seemed relatively unaffected.) This caused the tops of every one of my joists to drop so they were an eighth of an inch (or more) below flush with the tops of the beams. This same shrinkage caused the bottoms of the joists to pull upwards so they were no longer seated in their hangers. What a mess! Here’s a look at the top of one of my troubled joists.

I ended up pulling every nail out of every joist, and using hundreds and hundreds of metal shims to get the joists back to flush with the tops of the beams. It turns out that the fat heads of hanger nails are easy to grab with a pair of vice grips which gave me something to pry against to pull the nail; so this job wasn’t as hard as you might think. If I had taken the hangers off altogether it would have saved the time to cut all the shims, but the longer nails used to mount the hanger to the beams had no such fat heads making the removal of these nails all but impossible; and in any case I didn’t relish the idea of repositioning all the hangers. I couldn’t figure out where to buy metal shims, so I cut them from galvanized landscape edging. This was a cheap (but time consuming) solution. Here’s a look at one of my joists after I’ve shimmed it to bring it back to flush with the beam.

Many web-sites offer warnings that treated lumber can shrink significantly, but I’ve found very little documenting how shrinking joists in hangers can lose level with the beam or pull up out of the hanger seat. When (and if) I get around to building the other half of this deck, I’m going to search long and hard to find KDAT (kiln-dried-after-treatment) lumber. I also intend to redesign the second half of this deck to use a cantilever arrangement where the joists instead rest on top of the beams (avoiding the use of joist hangers altogether).

Blocking the joists

Over the winter months, my joists also bowed quite a bit. If I had managed to get the decking on before winter this surely would not have been a problem. To straighten them, I blocked all the joists. This should stiffen the floor up nicely too.

Weed block and rock

I put down a bunch of weed block under the deck and bought some landscaping rock to hold it down. This step may have been unnecessary as I don’t expect much sunlight will make it down through the decking.

Ice and Water

At the suggestion of a local deck builder professional (whom I was chatting with at the building office), I’m placing ice-and-water down over all the beams, joists and post-heads. He claimed that water tends to sit on all the horizontal surfaces under the decking eventually leading to rot. Having seen how my cedar siding rotted away where a joist (from the old deck) was lagged against the side of the house, I decided to follow his advice.

Although the ice-and-water is quite easy to cut to shape (with an X-Acto knife), it would not stay folded down on the sides of the joists and beams, so I spent a lot of time tacking it down. I was surprised by this as a couple of different people told me this stuff is so sticky it’s impossible to pull up once placed.

I’m only covering a pie-piece of the deck at a time since (as I understand) ice-and-water degrades in full sunlight.

Decking

I first needed to mark the center of the deck. Now you could simply measure to the mid-point of opposite corners, but I decided to stretch some mason lines between corners to get a precise placement. Remarkably, all the lines crossed at a single point. I put a nail at this point which I used to make sure all the decking is placed concentrically.

To deck one pie-section, I started by placing a nail at two adjacent corners of the deck positioned so they fall in the gap between the outer two deck-boards. In my case, this was 93 and 1/4 inches from the center nail. I left these nails in place so I could later use them to mark the cut line for the deck boards.

Later, I’ll be using some cedar decking as fascia boards to cover the sides of the perimeter beams (for a clean appearance). I want the outer deck board to overhang that by almost half an inch, which means it needs to overhang the beam by an inch-and-a-half. I also cut a drip edge on this first board which (supposedly) will help keep the rain off the fascia boards.

For each pie section, I chopped off one end of a bunch of deck boards to the needed 22.5 degree angle and let the other end of the board run long. I jammed each deck board tight against the end of the deck board from the previous pie-section. Once the first deck board was screwed down, I stretched some mason strings along each joist-line to help me keep all my screws in a neat line with the screws in the first board.
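The 22.5 degree end cut and the 93 1/4 inch corner measurement both fall out of basic octagon geometry. Here is a minimal Python sketch of that arithmetic. It assumes the deck is a regular octagon and that the boards in each pie-section run parallel to the outer edge; neither is stated outright above, and the variable names are only illustrative.

```python
import math

SIDES = 8                   # assumption: the deck is a regular octagon
CORNER_RADIUS_IN = 93.25    # center nail to corner guide nail (from the text)

# Each pie-section spans 360 / 8 = 45 degrees at the center nail.
wedge_angle_deg = 360 / SIDES

# Assumption: boards run parallel to the outer edge, so each board end is
# cut at half the wedge angle off square.
miter_angle_deg = wedge_angle_deg / 2

# Straight-line distance between two adjacent corner guide nails.
corner_to_corner_in = 2 * CORNER_RADIUS_IN * math.sin(math.radians(miter_angle_deg))

# Perpendicular distance from the center nail to the line between those nails.
center_to_edge_in = CORNER_RADIUS_IN * math.cos(math.radians(miter_angle_deg))

print(f"end cut angle:      {miter_angle_deg:.1f} degrees")
print(f"corner to corner:   {corner_to_corner_in:.2f} inches")
print(f"center to edge:     {center_to_edge_in:.2f} inches")
```

Under those assumptions the guide nails at adjacent corners sit a little under six feet apart, which is a handy sanity check before cutting anything.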

I actually used 10 penny nails as spacers between each deck board for my first pie-section, but because the widths of my boards varied by a quarter inch, this made it difficult to match this spacing on subsequent sections. I wish instead I would have precisely placed about every fourth board (assuming some nominal board width) and then simply placed the boards between by eye to achieve a good look. This would have kept the variance in board width from accumulating during placement.

Once all the boards were screwed down, I used a circular saw to cut a gap between the pie-section just finished and the previous pie-section. (Be sure to use a high-tooth-count blade to minimize splintering. I failed to do this on my first cut; you can see how the boards on the right show a lot more chipping.) This cut gives the boards a little room to expand with moisture and heat and also makes a nice looking line. The cut depth must be set very slightly less than the board thickness to remove most of the board without cutting into the ice-and-water below.

I then moved the straight edge to chop off the long ends of the deck boards. To find the right line, I simply stretched a line between the guide nail at the corner and the center nail. I clamped my straight edge so the circular saw would follow this line. After the cut (which again was about 1/32 of an inch shy of cutting through the board), I snapped off the boards and flipped up the splinters left behind (seen in the image below) with a knife edge.

This procedure worked fine until I got to the final pie-section at which point I had to carefully cut each deck board to length before slipping it into place. I cut these boards so they fit tight at their ends so I could again cut the nice spacer lines as I did between other pie-sections.

To simplify the task of decking the bridge between the house and the octagon, I carefully set up some mason lines to mark the outer edge of the octagon and then temporarily removed the adjacent deck boards from the octagon. This allowed me to again cut the deck boards sloppy long. Once they were screwed down, I cut the ends off with the circular saw along the line marked by the mason lines, and finally replaced the outer deck boards from the octagon. I should have taken a picture of this process, but here’s a look at the finished work.

For the center piece, I cut some decking down to about 2-inches wide and glued several together with polyurethane glue and biscuits. (Gorilla Glue sure is messy stuff.) I’m skeptical this will hold together in the weather. If it falls apart I’ll try building something different.

I wanted this center piece to be of a different color, so I went ahead and stained it chocolate brown before screwing it down. Here’s a look at the deck with completed decking.

Fascia Boards

I screwed deck boards to the sides of the beams giving the deck a polished look.

Rail Posts

Keeping with the octagon theme, I first planed some rough-cut cedar 4×4 posts down to a consistent 3⁵/₈" x 3⁵/₈" and ripped the corners off on my table saw to give the posts an octagon-shaped cross-section. I followed this up with some cross-cuts to fashion an octagon-shaped ball on the post head. This was all pretty darn easy. I then cut slots through the post to accommodate the top and bottom rails. The slots were really hard. Unable to find a mortising bit that would do the job, I actually cut them with a jig-saw; but the wander on the 7 inch blade was bad enough that I had to clean all the cuts up with a hand chisel. A neighbor friend of mine (who happens to be a retired craftsman) tells me I should have used a fluted router bit with a jig to guide the cuts. Live and learn.

I bolted the railposts to the beams that run around the perimeter of the deck. (I had to unscrew a few deck boards to do this — no big deal.) A few posts wanted to lean in or out noticeably (because the perimeter beams weren’t all vertically aligned perfectly). I’ve heard people use washers to correct this problem. I also heard it’s a good idea to install washers to produce an air-gap between the post and the beam (or fascia board in my case) to prevent rot; but it seems to me that concentrating post torque on the smaller washer-sized surface area would lead to wood fiber compression and ultimately a loose post. This is just my guess, I’m likely wrong. In any case I chose to instead use a power sander to knock off a sixteenth of an inch on the top or bottom of the fascia boards to correct my leaning posts.

While I was installing these posts, I also stapled some hail-screen to the same perimeter beams. This screen hangs to the ground, forming a fencing that should keep skunks, foxes, cats and other varmints out from under the deck.

I also had to put a few posts in the ground around my window well. These posts were tricky since they not only had to be plumb, square and in good alignment with posts on the deck, but also had to be at exactly the right height (since the rail slots were pre-cut). So I first poured concrete for just one post at the far end of the window well and let that set. I dug the post holes for the other two posts slightly deeper than needed and dropped the posts in, but before I poured the concrete, I installed all my rails around the window-well, propping them up as necessary. This suspended the last two posts in proper position while I poured the concrete.

December 24, 2011

Railings

For some reason, I can only seem to find rough sawn cedar in my area, so I used my planer to manufacture smooth and consistently sized lumber. Although the slots in my rail posts were a full 1 1/2 inch in width; I ended up planing my rails down to 1 7/16″ so they would slide in easily and allow for expansion.

I slid the top and bottom rails into place and then marked positions for the 3/4 inch round metal balusters which I spaced at four inches on center. I then took all the rails down and drilled 3/4 inch diameter holes one half-inch deep at each baluster mark. For the bottom rail, I drilled a quarter inch diameter hole through the center of each baluster hole all the way through the board. This should allow any water that gets in the hole to drain out the bottom of the rail. I also countersunk some screw holes in the underside of the top rail that will allow me to screw down the rail caps from below.

I then put everything back together. Notice that the rail slots for the top rail are an inch taller than the rail itself. Once the rails are slid into place, you can pull the top-rail up by one inch which gives you room to drop the balusters into the holes on the bottom rail and then slide the top-rail back down over the metal balusters. I drove two 3 inch deck screws through the posts at each rail slot to secure the rails. When it comes time to restain the deck, I can remove these screws, slide the top rails up, and remove all balusters in just a few minutes time.

Window Well Gate

I wanted the window well gate to have the same look as the fencing, so I wanted to avoid using the diagonal brace that is typical of out-door gates. I used my band-saw and planer to make a dozen cedar boards that were 1/2″ thick and 3 1/2″ wide. I sandwiched and glued these boards together to make a gate frame that was 3 boards thick on each side, the corners interleaved to make strong joints. (I’m not sure what you call this kind of joint or if it even has a name.) Note that the baluster holes at the top are drilled all the way through so the balusters can be dropped in. I dropped some cedar sticks into the holes before I attached the top rail so the balusters could not slide up once everything was put together. I used a spacer board and some clamps to hold the gate to the post while attaching hinges and latch hardware. This worked great.

At this point I had my final building inspection. I had a different inspector this time and he didn’t have much to say, which I guess is a good thing.

I was working on my steps around the deck when the winter snows hit. I’ll finish this up and finish with the staining as soon as weather conditions permit, but since this deck is on the north side of my house, this may not be till spring time.

May 22, 2012

Steps

I framed out a single continuous step that wraps half-way around the deck. Only two sides are functional, the rest lead up to a railing and are just for aesthetics. Perhaps we’ll put some potted plants here. The steps are fashioned from box frames that rest on half-buried cinder blocks and are bolted to the side of the deck (and to each other). I had to trim down some 2x4s to use as spacers to clear the rail posts.

Staining the Deck

I applied two coats of Cabot Transparent Cedar stain. The color matches well with the rail posts that I stained last year. (No fading or darkening.) This stain does have a bit of a sheen. I do like the color contrast of the dark brown center piece, but I don’t like the difference in gloss. Someday, I may sand down and restain the center piece with a different Cabot stain.

November 16, 2012

Flagstone Patio Around the Deck

Here’s my first attempt at a flagstone patio. Because I’m both inexperienced and wholly ignorant of good flagstone installation techniques, I strongly advise you to read other flagstone installation guides and make your own informed decisions about how best to install a flagstone patio. Trying to do anything seen here will probably lead to uneven stones, stones that slide around, smashed fingers, loss-of-eye, excessive weed growth, environmental damage to your soil, the growth of brain-killing fungus in and around your patio, gunfire from angry neighbors, dementia, global warming, lawsuits from multiple government agencies and environmental groups, and the spontaneous evolution of a new globally dominant life form. Proceed at your own risk.

Ground Leveling

You don’t want to place flagstone on dirt. Dirt is difficult to level, difficult to compact, and susceptible to settling. So I started by excavating a few inches of earth around the deck. I then put down 1-to-3 inches of crusher fines. I’ve read other sites that recommend a layer of road-base, followed by a layer of crusher-fines or sand. Given the hardness of my soil, I suspect I’ll do just fine with only the fines.

I buried lengths of 2×4 lumber every 6 feet or so, and leveled the tops with the desired level of the fines. I used a small mason level to level each 2×4 along its length and used a long straight edge between three adjacent 2x4s to make sure they were all level with each other.

I spread the crusher fines in a section with a rake, and then dragged a straight edge across adjacent 2x4s to get a level surface.

I shoveled out any excess soil and tossed it into the next section. I bought a tamper to compact the surface, but the crusher fines really don’t seem to need much compaction. If I had to do it over I probably wouldn’t have bought the tamper and just skipped this step.

I pulled out the 2×4 closest to the flagstone already laid and patched up the hole using a little board with a straight-edge. I then put down flagstone in this section before leveling the next section. I only leveled one section at a time because I couldn’t seem to avoid disturbing the ground in the work area.

Positioning the Flagstone

I found selecting which piece to lay next a time-consuming and laborious task. Each of my stones was nearly 2 inches thick and weighed between 100 and 150 pounds. It was no fun lugging them around trying to find pieces that fit well with the pieces already laid. And finding just one piece that fits well is not really enough: you might find a stone that fills a nook well, only to later find it created a new space that is difficult to fill. For this reason I think it important to always know how the next 2 or 3 pieces are going to be laid. Usually I’d do this by putting pieces roughly into place (slightly laying them on top of each other at the edges) to be sure the next few pieces would cover the immediate work area without much waste. Here you can see how I’m testing the fit of a couple of pieces.

The remaining open space above was the final space in the patio. I only had four pieces left and it wasn’t obvious how to best cover this area with them. And of course they were all big heavy things. To save my back I traced the remaining stones onto newspaper and cut them out. I then used these paper copies to puzzle out how to cover the hole. In the first image below I managed to cover the hole with just 3 of my 4 paper cutouts. I doubt this saved time, but it sure saved a lot of grunting and also kept the flagstone scratch free. To make things look right I had to trim the existing pieces as well as the new pieces. The second picture below shows how things looked in the end. (I failed to take the second photo at the same angle. It’s a bit confusing. Sorry!)

Cutting the Flagstone

Before cutting flagstone, be sure to don protective gear for your eyes and ears. Little bits of rock would regularly fly into my safety glasses, and cutting rock is really really loud. Here’s a picture of me in my safety gear. I’m still waiting for that first big movie contract.

Cutting stone puts out a lot of dust which seems to destroy my power tools. I’m sure there are expensive professional quality concrete saws built to withstand this dust, but I chose instead to sacrifice an old cheap nearly worn-out circular saw that I probably paid $25.00 for 25 years ago. By the end of the project this saw sounded horrible, but it still spins.

Because I was only putting in a 3-foot-wide patio around the deck, most flagstone pieces needed one straight edge. Some pieces had one edge that was straight enough and I just used it as is, but others I decided to straighten. I started by flipping the stone over and drawing a chalk line (using some of my kid’s side-walk chalk). I set the blade depth so it only cut a little over half way through the stone and made the cut. Then I’d flip the stone back over and tap the edge with a sledge hammer to snap it off. I made all cuts from the back side because it leaves a natural looking broken edge on the visible top side of the stone. Be sure to knock it off from the top (uncut) side: hitting it from the bottom (cut) side will sometimes knock off a layer of rock on the walking surface.

To trim a piece so that it fits well with other pieces already laid, I’d start by slipping the new piece under any pieces already laid that it intersects. I then traced the edge of the pieces already laid onto the new stone with chalk. I’d mark the line to get about a one inch gap. I’d then pull the piece out and transcribe the chalk line to the back of the stone. I did this by hand: I found simply placing a pointed finger from my left hand on the chalk line and trying to touch the same point on the other side with a finger from my right hand works very well. After marking three or four points this way, I’d just connect the dots to sketch the line. It doesn’t have to be perfect: some imperfection looks better anyway.

I’d then cut the stone, sometimes making a few straight cuts to carve out a curved line. For interior angle cuts, I’d make a series of cuts perpendicular to the chalk line and knock it out with a sledge. There may be better ways to do this. This worked ok though.

Leveling the flagstone

After getting a piece to the right shape and placing it, it would almost always need adjustment either because it wasn’t quite level with the existing stones or because it would rock slightly when you walk on it. I’d typically have to remove a piece just placed three or four times before I got it both level and stable.

Weed Block

I wanted to be sure no weeds grew in the gap between my deck and the flagstone. The plan was to use the weed block I had previously put down under the deck which I purposefully left long to extend out under the flagstone patio. Here I’ve pulled back the rock on some of Home Depot’s “Contractor Grade” weed block fabric. This has only been down for a year and grass has already penetrated the fabric. Complete Junk. Frankly, I’m not sure where to buy good weed block. So I ended up supplementing this fabric with some very heavy black plastic. After I had a large section of flagstone down and level, I temporarily pulled up the stones near the deck and put down some plastic under the deck and under the inner patio. Using plastic like this is bad for your soil and promotes the growth of harmful fungus and mold, so don’t do what I did! If someone knows where I can buy landscaping fabric that will last more than a year in the Denver area please let me know!

Filling the Cracks

I filled the cracks with a product called Envirostone. You water it into place and it sets up hard; but if it ever cracks you can just get it wet again and repack it. I’m not sure this is the best product, but some friends of mine used it on their patio and were pleased. I started by sweeping it into the cracks. The instructions on the bag then recommend blowing the remaining dust off the flagstone with a leaf blower. This removed some of the dust for me, but not all. I went ahead with the last step of using a hose to soak the Envirostone in the cracks and also to further clean the flagstone surface. I thought things looked good, but after drying I found I was still left with some residue on my walking surface making my previously red flagstone look pink. On the whole, I think the patio still looks very good, but maybe a power washer would have served me better (if I could have avoided blasting the Envirostone out of the cracks). I don’t own a power washer, but I may buy or rent one to try to further clean the surface of the flagstone. Short of that I’m hopeful the red color will return on its own in time.

The End!

Crooked Dice in Facebook Super Farkle?

matt — Sun, 29 Aug 2010 07:02:28 +0000

I modified my Zilch strategy generation software to model the scoring rules for the Super Farkle game available at Facebook. I wasn’t previously a Facebook user so I created my account just to try out my strategy. Over several days, I played about 180 games of Farkle and was winning about 55% of the time. But I’m not sure if this means much for a few reasons.

First, almost everyone I played, played very well. I guess this makes sense since in Super Farkle you play for chips; and if you don’t play reasonably well, it will be very difficult to win enough chips to play at the higher stakes tables. The people I played against rarely made strategy errors that cost more than a few points. One notable exception was the common mistake of taking two-ones on an opening roll instead of just one-one. This play returns 57 fewer points for your expected score and it occurs with enough frequency to be significant in a typical game. But in general, I was impressed with how closely people played to the strategy that maximizes expected scores — especially at low turn score states. So if my strategy offered any advantage at all, it was probably very slight and I doubt 180 games was enough to make a clear differentiation.

Second, in Super Farkle whoever forms the table rolls first. The average score for a well-played Super Farkle turn is just under 550 points. One can argue the disadvantage to the player going second is half that or about 275 points. (Why don’t they just roll to see who goes first?) To make a fair test of the strategy, I should have played half my games by forming a new table, and played the other half by joining an existing table, but my competitive nature just wouldn’t allow me to concede 275 points to my opponent. So instead I spoiled my own test by always forming my own table. I suspect that the advantage of going first may have overshadowed any advantage my strategy was offering over the high quality play of my opponents.

Finally, and perhaps most importantly, there’s almost certainly something wrong with the Super Farkle dice. I detailed why I believe this to be so on the Farkle review page at Facebook. Here’s the text from that review:

This game is quite nice; but there is a serious problem. The probability of rolling a 6-die FARKLE is exactly 1 chance in 43.2. You can find this calculation all over the web. Here’s one professor at Michigan Ann Arbor that shows the calculation: http://notaboutapples.wordpress.com/2009/07/27/multinomial-coefficients-and-farkle/

Apparently I’ve played about 180 games of Farkle, but I’ve never once thrown a 6 die Farkle. If you assume a typical game has 15 turns, then that’s 15 x 180 = 2700 6-die rolls — and that’s not even considering hot-dice rolls. The probability of not throwing even one 6 die farkle in that many rolls is exactly:

(1-(1/43.2))^2700 = .000 000 000 000 000 000 000 000 000 344

If I’m counting my zeros right, that’s less than one chance in an octillion. Yes, octillion is a real number — a very very big number. So I suggest there is something wrong with the dice. Can I be sure there’s something wrong with the dice? Of course not, but I can say this. According to wikipedia, the visible universe is about 92 billion light years across. And 1 light year is about 6 trillion miles. And there are 5280 feet in a mile. If you lined people up one foot apart (you’d have to use skinny people) across our entire visible universe; and then sat them all down in front of their own laptop playing farkle; and had them all roll 6 dice over-and-over only stopping when they had their first 6-die farkle; then you’d expect about ONE of them (yes just one) to go as far as 2700 rolls without farkling. I suppose I could be that one person….uhmmm…yeah…right.

Maybe some manager made a marketing decision that 6-die farkles just annoyed people too much and the developers were simply asked to reroll the dice one time when a 6-die farkle showed up. Or maybe they are just using a really bad random number generator for their dice rolling engine. Or maybe there’s something more insidious going on. But something is surely amiss.
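The numbers in the review above can be checked with a few lines of Python. This is a minimal sketch, not the Super Farkle rule set: it assumes a 6-die roll scores whenever it contains a 1, a 5, three or more of a kind, or three pairs, which is the common Farkle convention that reproduces the 1-in-43.2 figure. The function name is made up for illustration.

```python
from collections import Counter
from itertools import product

def is_six_die_farkle(roll):
    """True if a 6-die roll scores nothing under the assumed rules."""
    counts = Counter(roll)
    if counts[1] or counts[5]:
        return False                      # any 1 or 5 scores
    if max(counts.values()) >= 3:
        return False                      # three or more of a kind scores
    if sorted(counts.values()) == [2, 2, 2]:
        return False                      # assumption: three pairs score
    return True                           # a 1-6 straight needs a 1, so it is already excluded

# Enumerate all 6^6 = 46656 equally likely rolls.
farkles = sum(is_six_die_farkle(r) for r in product(range(1, 7), repeat=6))
total = 6 ** 6
p_farkle = farkles / total

# Chance of 2700 six-die rolls in a row with no farkle.
p_none = (1 - p_farkle) ** 2700

# One person per foot across a 92-billion-light-year universe,
# using the review's round figure of 6 trillion miles per light year.
feet_across = 92e9 * 6e12 * 5280
expected_survivors = feet_across * p_none

print(f"farkle rolls: {farkles} of {total} (1 in {total / farkles:.1f})")
print(f"P(no farkle in 2700 rolls) = {p_none:.3g}")
print(f"expected survivors = {expected_survivors:.2f}")
```

The enumeration finds 1080 farkle rolls out of 46656, and the last figure comes out very close to one person, which agrees with the line-of-laptops picture in the review.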

Interestingly, shortly after I posted this review, I was mysteriously logged out of Facebook and subsequent login attempts were denied. Coincidence? In any case, my foray into Super Farkle play is ended. I played enough games to at least see that the strategy was doing very well, and was highly consistent with the play of seasoned Farkle veterans.

Maximizing Expected Scores in the Game of Zilch

matt — Thu, 19 Aug 2010 19:38:52 +0000

Image by Thunderchild7

Zilch is a fun little dice game codified into an online game by Gaby Vanhegan that can be played at http://playr.co.uk/. Zilch is actually a variation of the game Farkle which goes by several other names including Zonk, 5000, 10000, Wimp Out, Greed, Squelch and Hot Dice¹. I've worked out the strategy that maximizes your expected game score and wanted to share the analysis, my strategy finder software, and the strategy itself. Depending on whether you have zero, one or two consecutive zilches from previous turns, three successively more conservative turn-play strategies are required to maximize your long term average score. Using these three strategies you rack up an average of 620.855 points per turn, which is the best you can possibly do.

Beyond the scope of Gaby's implementation of Zilch, the scoring rules of Farkle vary from venue to venue and the strategies provided here do not generally apply, but the analysis and the software do.

If you understand conditional probabilities, expectations, and can do a little algebra, you should be able to follow along. If you're just here to take the money and go pound someone in the game, you'll need to at least read and understand Strategy Formulation before you try to interpret the tables.

Finding E(s,n) - the Zilch strategy function

Maximizing expected score across all turns

Results

The optimal strategies

Strategy T₀: playing with no consecutive zilches

Strategy T₁: playing with one consecutive zilch

Strategy T₂: playing with two consecutive zilches

Software

Conclusions

References

Background

I've found several blog postings where folks have offered probabilistic analyses of various aspects of the game²,³,⁴, but none (that I've seen) find the game strategy that maximizes your expected points. It is possible that I'm the first to publish these solutions. If not, it was still a fun problem. I've always enjoyed software, algorithms, optimization, and probabilities and this problem delves into all of these areas.

The Rules of Zilch

Zilch is played with two players and six six-sided dice. (Though really there's nothing to stop you playing with more people, but this is not supported in the online game.)

Each player takes turns rolling the dice. The dice in a roll can be worth points either individually or in combination. If any points are available from the roll, the player must set aside some or all of those scoring dice, adding the score from those dice to their point total for the turn. After each roll, a player may either reroll the remaining dice to try for more points or may bank the points accumulated this turn (though you can never bank less than 300 points).

If no dice in a roll score, then the player loses all points accumulated this turn and their turn is ended. This is called a zilch, a sorrowful event indeed.

If all dice in a roll score, the player gets to continue his turn with all six dice. This is called a free roll and is guaranteed to brighten your day.

A player may continue rolling again and again accumulating ever more points until he either decides to bank those points or loses them all to a zilch.

If a player ends three consecutive turns with a zilch, they not only lose their points from the turn but also lose 500 points from their banked game score. (This is the only way to lose banked points.) After a triple zilch, your consecutive zilch count is reset to zero so you're safe from another triple zilch penalty for at least three more turns.

The game ends when one player has banked a total of 10,000 points and all other players have had a final turn.

Scoring is as follows:

Each 1 is worth 100 points.
Each 5 is worth 50 points.
A set of three 1s is worth 1000 points.
A set of three 2s is worth 200 points.
A set of three 3s is worth 300 points.
A set of three 4s is worth 400 points.
A set of three 5s is worth 500 points.
A set of three 6s is worth 600 points.
Each extra die in a set doubles the value of the set. So four 4s are worth 800 points and six 1s are worth 8000.
Three pair is worth 1500 points.
A six die straight is worth 1500 points.
Six dice with no other scoring options at all are worth 500 points. (And this is why a 6 die roll is called a free roll: you can't zilch when rolling 6 dice.)
Each die can only be used once when scoring. (If you roll two 1s, two 2s, and two 3s you can either count the two 1s for 200 or use all six dice for three-pair and 1500 points — you can't use the ones both ways for 1700 points.)

Limitations

The strategy presented will maximize your expected Zilch scores, but this is not necessarily the same strategy that will let you reach 10,000 points in the fewest turns, and it certainly falls short of a complete gaming strategy that maximizes your chances of winning the game⁵, the holy grail of Zilch analysis. In particular, the strategy considers neither your current overall score, nor your opponent's score, nor the fact that the game ends when a player reaches 10,000 points (after the other player gets a final turn). All that I offer is a way to maximize your expected Zilch scores.

My intuition is that when you're in the lead you should play more conservatively; and when you're behind you should play more aggressively. (Though I think it a common mistake to be too aggressive too early when behind.) Consider this extreme example. Let's say you're currently beating your opponent 7500 to 1500 and it's your turn. On your turn you rack up 2500 points and are faced with the choice of either banking the 2500 or rolling five dice to go for more points. The strategy identified here advises you to roll the five dice; but surely in this case it is better to bank the 2500, putting you at the game goal of 10,000 points and forcing your opponent to try to put out 8550 points in a single turn to steal the win away from you.

Analysis

Strategy Formulation

I will start by showing how to maximize the expected points for a particular turn. Because of the three consecutive zilch rule, the strategy that actually maximizes the average points gained across all turns is different: it is possible to trade off some expected gain in those turns where you have either zero or one consecutive zilches to reduce your zilch probability and more strongly avoid even getting into a turn where you are facing your third consecutive zilch. I will solve for this more complete strategy later, but for now let's stick with maximizing the expected points for a single turn and just ignore how such a greedy strategy might negatively affect the outcome of subsequent turns.

For my purposes, a Zilch turn has a state that may be completely defined by two variables (s, n) where s is the number of points accumulated in the current turn, and n is the number of dice you are about to roll. At the beginning of a new turn, the turn state is (s=0, n=6). Let's say for your opening roll you throw:

1, 3, 3, 4, 4, 6

The turn state will then advance to (s=100, n=5). You actually have no choice here: you must always select at least one scoring die and since the 1 (worth 100 points) is the only scoring die, you must select it. Furthermore, you are not allowed to bank less than 300 points so you must roll the five remaining dice.

Suppose with the remaining 5 dice you roll:

1, 1, 2, 3, 5

Here you have three scoring dice: two 1s and a 5. You now have a choice of turn states that you may enter:

A = (s=150, n=4) (take just the 5)
B = (s=200, n=4) (take a single 1)
C = (s=250, n=3) (take a single 1 plus the 5)
D = (s=300, n=3) (take the two 1s)
E = (s=350, n=2) (take the two 1s plus the 5)

Note that s includes not just the points taken from this roll, but also all points accumulated in previous rolls during this turn as well. It should be clear that state B is better than A and state D is better than C. Of the remaining three states (B, D and E) it's not so obvious which is better. You also have the option of banking from either of states D or E (but not from B since you don't have 300 points in that case). Obviously, banking from state D is just plain dumb: if you're going to bank you'll do so from state E to bank as many points as you can! That leaves you with four reasonable choices:

enter state B = (s=200, n=4) and roll;
enter state D = (s=300, n=3) and roll;
enter state E = (s=350, n=2) and roll; or
enter state E = (s=350, n=2) and bank.

My objective is to find the optimal turn play strategy that defines what to do in all such situations which when followed will maximize the expected number of points for the entire turn starting from any given turn state.

Let E(s, n) be the expected number of additional points you will gain for the turn if you (perhaps non-optimally) roll while in state (s, n) but then follow the optimal turn strategy (which we hope to find) for all subsequent decisions in the turn. Note that E(s, n) includes not just the expected points for the upcoming roll, but all the expected points from all subsequent rolls, if any, as dictated by chance and the optimal play strategy.

Suppose we somehow solve for E(s, n) and find that:

	E(s=200, n=4) =	149
	E(s=300, n=3) =	34
	E(s=350, n=2) =	-20

Applying this information to the example leads to the following final expected scores for the turn.

Final expected score by rolling from state B = 200 + 149 = 349.
Final expected score by rolling from state D = 300 + 34 = 334.
Final expected score by rolling from state E = 350 - 20 = 330.
Final expected score by banking from state E = 350.

So, the choice that leads to the highest expected score for the turn is to bank the 350 points. From this example, it should be clear that if we can find E(s, n) for all possible game states (s, n) we'll have the optimal Zilch turn play strategy.
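The comparison above is mechanical enough to sketch in a few lines of Python. The E values below are the made-up ones from this example (not solver output), and the names `MIN_BANK` and `turn_options` are mine, purely for illustration.

```python
# Hypothetical E(s, n) values from the example above (not solver output).
E = {(200, 4): 149, (300, 3): 34, (350, 2): -20}
MIN_BANK = 300  # you can never bank fewer than 300 points


def turn_options(states):
    """Map each legal (action, s, n) choice to its expected final turn score."""
    options = {}
    for s, n in states:
        # Rolling is always allowed: current points plus expected additional points.
        options[("roll", s, n)] = s + E[(s, n)]
        # Banking is only allowed with at least MIN_BANK points.
        if s >= MIN_BANK:
            options[("bank", s, n)] = s
    return options


# States B, D and E from the example.
options = turn_options([(200, 4), (300, 3), (350, 2)])
best = max(options, key=options.get)
print(best, options[best])
```

The sketch also scores "bank from state D" (300 points), which the text already dismissed as dominated; it simply loses to banking from state E.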

Finding E(s,n) - the Zilch strategy function

Let,

$$
T(s, n) =
\begin{cases}
s + \max(0, E(s, n)) & \text{for } s \ge 300 \\
s + E(s, n) & \text{for } s < 300
\end{cases}
\tag{1}
$$

T(s, n) is simply the total expected points for the turn given that you are in turn state (s, n) and you follow the optimal strategy. The special case for s < 300 models the rule that you can't bank less than 300 points. The max function used when s ≥ 300 models the requirement that you bank when E(s, n) is negative, and roll otherwise.

Suppose we are in some particular state (S, N), and let r₁, r₂, …, r_R be all possible rolls of N dice that do not result in a zilch. For any given roll r_i you can potentially enter multiple game states (s₁, n₁), (s₂, n₂), …, (s_K, n_K), depending on which combination of scoring dice you choose, just like in the previous example. Define C(r_i, S, N) to be the particular scoring combination among all scoring combinations possible with roll r_i that, when applied to turn state (S, N), advances the turn to the new state (S_i, N_i) that maximizes T. What could be simpler? Let C_S be the number of points taken in scoring combination C, and let C_N be the number of dice used in scoring combination C.

I also need a simple little function F(n) to reset the state variable n back to 6 when a score is selected that uses all remaining dice:

$$
F(n) =
\begin{cases}
6 & \text{for } n = 0 \\
n & \text{for } n \ne 0
\end{cases}
\tag{2}
$$

We can now express E(S, N) as a weighted sum of the expected scores of all states reachable from (S, N):

$$
E(S, N) = -p_N (S + y) + \sum_i \frac{T(S_i, N_i) - S}{6^N}
\tag{3}
$$

where

S_i	=	S + C_S(r_i, S, N)
N_i	=	F(N - C_N(r_i, S, N))
p_N	=	probability of zilching when you roll N dice
y	=	zilch penalty.

To handle the three zilch rule, I've introduced the constant y which gives the additional penalty (beyond loss of all turn points) for rolling a zilch. Setting y to 0 models turns where you have only zero or one consecutive zilches. Setting y to 500 models turns where you are playing with two consecutive zilches. As we shall see, these two cases will lead to two different turn play strategies.

The term -p_N(S+y) gives the expected decrease in your score due to the likelihood of a zilch. The terms (T(S_i, N_i) - S) give the expected increase in your score given that you throw r_i. Summing over all possible r_i and multiplying by the probability of rolling any particular r_i gives the appropriate weighted sum.
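The zilch probabilities p_N are easy to get by brute force. Here is a minimal Python sketch, assuming (as the scoring rules above imply) that with fewer than six dice the only ways to score are 1s, 5s, and sets of three or more, and that a six-dice roll can never zilch. The function names are mine and are not taken from my strategy finder.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_zilch(roll):
    """True if no die in the roll scores under the rules listed earlier."""
    if len(roll) == 6:
        return False  # six dice always score (worst case: the 500 point roll)
    counts = Counter(roll)
    if counts[1] or counts[5]:
        return False  # any 1 or 5 scores on its own
    return max(counts.values()) < 3  # otherwise you need three of a kind


def zilch_probability(n):
    """Exact probability p_n of zilching when rolling n dice."""
    zilches = sum(is_zilch(roll) for roll in product(range(1, 7), repeat=n))
    return Fraction(zilches, 6 ** n)


for n in range(1, 7):
    p = zilch_probability(n)
    print(n, p, round(float(p), 4))
```

For five dice this gives 25/324, or a little under 8%.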

Equation 3 expresses E(S, N) in terms of the T values of all the game states reachable from (S, N). But here's the important thing: any game state (s, n) reachable by any roll r from (S, N) has s > S. (Your score can only go up if you don't zilch and by definition r is not a zilching roll.) So, if we already know T(s, n) for all s > S, then we can calculate E(S, N) using the above summation.
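To see Equation 3 in action, take the simplest case, N = 1. The only non-zilching rolls are a 1 (100 points) and a 5 (50 points); either one uses your last die, so F sends you on to six dice. The sketch below borrows two E(s, 6) entries from Table 2 in the Results section (y = 0) and recomputes E(300, 1). The helper names are mine.

```python
# E(s, n) entries copied from Table 2 in the Results section (y = 0).
E = {(350, 6): 576.985, (400, 6): 572.248}


def F(n):
    """Equation 2: using up every die earns a fresh set of six."""
    return 6 if n == 0 else n


def T(s, n):
    """Equation 1: total expected points for the turn from state (s, n)."""
    e = E[(s, n)]
    return s + max(0.0, e) if s >= 300 else s + e


def E_one_die(S, y=0.0):
    """Equation 3 specialised to N = 1."""
    p_zilch = 4 / 6  # faces 2, 3, 4 and 6 score nothing
    total = 0.0
    for points in (100, 50):  # a lone 1 or a lone 5
        s_i, n_i = S + points, F(1 - 1)
        total += T(s_i, n_i) - S
    return -p_zilch * (S + y) + total / 6 ** 1


print(round(E_one_die(300), 3))
```

The result agrees with the E(300, 1) entry of Table 2 to three decimal places.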

I claim there exists some large value of accumulated turn points S_BIG where the optimal turn play strategy is to always bank when faced with rolling fewer than six dice and to always roll when you have six dice to roll. If I set S_BIG equal to a million points, then I'm claiming that if you've somehow accumulated a million or more points on the current turn (an absurdly large number of points to be sure) you'll want to bank them if you're ever faced with rolling five (or fewer) dice: the chance of losing all of your points (about 7.7% with five dice, and higher with fewer) far outweighs any comparatively meager gains you might achieve by continuing to roll. This claim is equivalent to saying:

$$
E(s, n) < 0 \quad \text{for } s \ge S_{\text{BIG}},\ 1 \le n \le 5
\tag{4}
$$

Now if you have six dice to roll, you risk nothing so you might as well further insult your opponent by adding to your million point score. The number of points you expect to gain in this situation through the end of your turn is a constant:

$$
E(s, n) = E_{\text{BIG6}} \quad \text{for } s \ge S_{\text{BIG}},\ n = 6
\tag{5}
$$

Here's how to solve for E_BIG6. Let

E_B	=	the expected number of points gained from a single roll of 6 dice given that the roll does not grant another free roll (so you have to bank).
E_F	=	the expected number of points gained from a single roll of 6 dice given that the roll does grant another free roll.
p_F	=	probability of a 6 die roll granting another free roll.

These terms are easily calculable by simply enumerating all the six die rolls and determining the best possible scoring combination in each case. (There's a subtlety here I'm not going to bore you with regarding how to score a roll of four 1s and a pair of either 2s, 3s, 4s or 6s; I explain this in detail in the software comments for the interested reader.) Once found they can be used in the following sum:

$$
E_{\text{BIG6}} = (1-p_F)\,E_B + p_F \bigl( E_F + (1-p_F)\,E_B + p_F ( E_F + \cdots ) \bigr)
\tag{6}
$$

This nicely simplifies to,

$$
E_{\text{BIG6}} = E_B + \frac{p_F}{1-p_F}\,E_F
\tag{7}
$$
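For anyone who wants the intermediate step: the nested tail of Equation 6 is just the whole series over again, so (assuming p_F < 1) it can be replaced by E_BIG6 itself and solved algebraically.

```latex
\begin{aligned}
E_{\text{BIG6}} &= (1-p_F)\,E_B + p_F\,\bigl(E_F + E_{\text{BIG6}}\bigr) \\
(1-p_F)\,E_{\text{BIG6}} &= (1-p_F)\,E_B + p_F\,E_F \\
E_{\text{BIG6}} &= E_B + \frac{p_F}{1-p_F}\,E_F
\end{aligned}
```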

Combining Equations 1, 4 and 5 gives

$$
T(s, n) =
\begin{cases}
s + E_{\text{BIG6}} & \text{for } s \ge S_{\text{BIG}},\ n = 6 \\
s & \text{for } s \ge S_{\text{BIG}},\ 1 \le n \le 5
\end{cases}
\tag{8}
$$

Knowing T(s, n) for s ≥ S_BIG, we can now use Equation 3 to iteratively calculate E(S_BIG - 50, n), E(S_BIG - 100, n), … E(0, n). The rest is just the grunt work of writing the software to implement the curious function C; solving for E_BIG6; and solving for all E(s, n) for s < S_BIG. (Did I just slander my own profession?) But before we start grunting let's see what we can do about the three consecutive zilch problem.

Maximizing expected score across all turns

There are actually three different types of turns:

T₀: a turn played with no previous consecutive zilches;

T₁: a turn played with one previous consecutive zilch; and

T₂: a turn played with two previous consecutive zilches.

Using the technique already described, we can find the strategies that will maximize the expected points in each of these turns independently, but what we really want is a set of strategies, one per turn type, that when used together will maximize the average score across all turns, with each turn type weighted by how often it appears in a game.

If z_i is the probability of zilching while in turn type T_i (while following some strategy designed specifically for that turn type) then we have the state transition diagram shown in Figure 1.

Figure 1. State transition diagram governing the transitions between states T₀, T₁ and T₂.

Performing a steady state analysis of this system we can find the probability t_i of being in any particular state T_i. (I.e., we want to find what fraction of our turns will be of each type.) We have these flow equations which must balance:

t₀ = (1-z₀)t₀ + (1-z₁)t₁ + t₂
t₁ = z₀t₀
t₂ = z₁t₁

Also

t₀ + t₁ + t₂ = 1

Solving gives

t₀ = 1 / (1 + z₀ + z₀z₁)
t₁ = z₀ / (1 + z₀ + z₀z₁)
t₂ = z₀z₁ / (1 + z₀ + z₀z₁)
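If you'd rather check the algebra numerically, here is a small Python sketch. The z values are the ones reported later in the Results section and are used here only as an illustration; the function name is mine.

```python
def steady_state(z0, z1):
    """Long-run fraction of turns of type T0, T1 and T2.

    z0 and z1 are the zilch probabilities for turn types T0 and T1.
    (z2 does not matter: every T2 turn is followed by a T0 turn.)
    """
    d = 1 + z0 + z0 * z1
    return 1 / d, z0 / d, z0 * z1 / d


z0, z1 = 0.193326, 0.170988  # illustrative values from the Results section
t0, t1, t2 = steady_state(z0, z1)

# The solution should satisfy the flow equations and sum to one.
assert abs(t0 - ((1 - z0) * t0 + (1 - z1) * t1 + t2)) < 1e-12
assert abs(t1 - z0 * t0) < 1e-12
assert abs(t2 - z1 * t1) < 1e-12
assert abs(t0 + t1 + t2 - 1) < 1e-12

print(round(t0, 4), round(t1, 4), round(t2, 4))
```

With these inputs, roughly 82% of turns are of type T₀, 16% of type T₁, and under 3% of type T₂.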

Define E_i to be the expected points gained for a turn of type T_i. Then the average score for all turns is:

E_AVG = t₀E₀ + t₁E₁ + t₂E₂

$$
E_{\text{AVG}} = \frac{E_0 + z_0 E_1 + z_0 z_1 E_2}{1 + z_0 + z_0 z_1}
\tag{9}
$$

E_AVG is what we want to maximize. Both E_i and z_i are functions only of the strategy used to play a turn of type T_i. The strategy employed for T₂ only affects E₂, so E₂ can be independently maximized, something we already know how to do. That leaves the strategies for T₀ and T₁. E₂ is the term that's pulling down our average score since it's the turn played with the 500 point penalty for zilching. Can we modify our strategies for T₀ and/or T₁ so as to trade off some of our expected gains in those turns to reduce the coefficient z₀z₁ on E₂ and thereby actually increase E_AVG?

In Equation 3, I introduced the variable y to model the penalty for a zilch in a game. I said it should be set to 0 normally, but set to 500 if we are playing a turn where the third consecutive zilch is imminent. If we extend this idea and allow y to become a free variable, we can examine different levels of trade-off between expected score and the probability of zilching. For each y value, we'll find the optimal strategy given that zilch penalty; and then find both the expected number of points per turn and the probability of zilching on the turn for that strategy. E_AVG then becomes a function of just two variables y₀ and y₁. We then need only to find the particular values Y₀ and Y₁ that maximize E_AVG. Piece of cake!

When doing this analysis, it's important to understand that the penalties y₀ and y₁ are artificial. The true zilch penalty for these turns is of course zero. Accordingly, the values calculated for E(s, n) will not represent the true expected change in points for the turn from state (s, n). But the values E(s, n) do still define a strategy, dictating that you roll if E(s, n) is positive, and that you bank when E(s, n) is negative. Likewise, the E(s, n) values are still used in the normal way to determine which state among the reachable states after a roll is most desirable. To get the actual expected increase in score from state (s, n), you must add back the false zilch penalty times the probability of zilching for the remainder of the turn. Although you could calculate this for all states (s, n), we only really need to know the true expectation for the turn as a whole, which we can get by correcting E(0, 6). This gives rise to the notion of a corrected expectation for the turn:

E_C = E(0,6) + yz

(10)

Enough analysis! On to the results! Now I am become death, destroyer of Zilch.

Results

The optimal strategies

I wrote a little Java program that solves for E_BIG6; finds E(s,n) for a supplied zilch penalty, y; calculates the probability of zilching, z, for that strategy; and also outputs the corrected expected points for the turn, E_C. Running the software for the case y=0 we get:

	E_BIG6 =	478.237
	E_C = E(0, 6) =	623.017
	z =	.193326

So the best you can do for a single turn is to rack up an average of about 623 points, and zilch about 1 time in 5. I'll get to the actual strategy tables shortly, but first let's solve for the optimal strategies required for the three zilch rule for turn types T₀, T₁ and T₂.

Finding the best strategy for T₂ is easy: just set y=500 and you get these results:

	E₂ = E(0, 6) =	547.157
	z₂ =	.132148

You don't use E_C here since the 500 point zilch penalty is not artificial but real. This penalty reduces the maximum expected points per turn by about 12%. The more conservative play required here also reduces the zilch probability by about a third.

To find the best strategies for T₀ and T₁ we need to let the zilch penalty for those two turn types (y₀ and y₁) vary and then maximize E_AVG as given by Equation 9. Table 1 shows how varying the penalty for zilching (y) affects the probability of zilching (z) and the corrected expected points per turn (E_C). Due to the integral nature of the problem, there are fairly large ranges of y that have no effect on the strategy. I'm only listing y values among those tried that produced a strategy change:

Table 1. Probability of zilching for the turn, z, and corrected expected score, E_C, as a function of the zilch penalty, y.

y	z	E_C
0	0.193326	623.017489
10	0.193326	623.017488
15	0.193302	623.017141
17	0.193296	623.017049
22	0.190399	622.955542
24	0.182110	622.759187
26	0.178151	622.657753
27	0.177759	622.647306
30	0.177757	622.647238
38	0.177723	622.645977
42	0.177619	622.641662
44	0.174575	622.509618
65	0.174551	622.508057
67	0.174543	622.507569
68	0.170991	622.268940
72	0.170988	622.268745
77	0.170631	622.241338
80	0.170620	622.240487
88	0.170484	622.228569
92	0.170389	622.219825
115	0.170387	622.219696
117	0.170383	622.219130
122	0.157678	620.678963
127	0.157507	620.657322
130	0.157498	620.656168
138	0.157427	620.646349
142	0.157364	620.637441
165	0.157362	620.637123
167	0.157357	620.636297
172	0.157356	620.636150
177	0.157286	620.623706
180	0.157271	620.620945
188	0.157239	620.615023
192	0.157171	620.602036
203	0.144131	617.962533
215	0.144129	617.962108
217	0.144120	617.960165
222	0.143469	617.816023
227	0.143448	617.811242
230	0.143424	617.805798
231	0.142245	617.534058
235	0.140672	617.165031
238	0.140661	617.162252
242	0.140573	617.141121
265	0.140572	617.140733
267	0.140558	617.137106
272	0.140556	617.136678
277	0.140553	617.135694
280	0.140521	617.126870
288	0.140519	617.126208
292	0.140413	617.095273
315	0.140411	617.094663
317	0.140401	617.091542
322	0.140391	617.088225
327	0.140390	617.088121
330	0.140376	617.083493
338	0.140376	617.083407
340	0.140343	617.072035
342	0.140031	616.965512
365	0.140029	616.964776
367	0.139995	616.952543
372	0.139981	616.947224
380	0.139967	616.942009
392	0.139576	616.788919
415	0.139575	616.788296
417	0.139555	616.780056
422	0.139544	616.775513
430	0.139522	616.765914
442	0.139326	616.679483
465	0.139325	616.678925
467	0.139318	616.675611
472	0.139308	616.670956
480	0.139279	616.657114
481	0.132230	613.270797
492	0.132148	613.230640

Pumping this table through a little awk script (which I hacked out at a command prompt and didn't save for you), I found that E_AVG is maximized when Y₀ = 0 and Y₁ = 72. Here are the summary statistics:

	y₀ =	0
	z₀ =	.193326
	E₀ =	623.017

	y₁ =	72
	z₁ =	.170988
	E₁ =	622.269

	y₂ =	500
	z₂ =	.132148
	E₂ =	547.157

This gives

E_AVG = 620.855
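Since I didn't save the awk script, here is a minimal Python sketch of the final step only: plugging the chosen Table 1 rows into Equation 9. It does not repeat the search over all (y₀, y₁) pairs; it just evaluates the winning pair and, for comparison, the pair where both turn types play the greedy y = 0 strategy. The function and variable names are mine.

```python
def e_avg(E0, z0, E1, z1, E2):
    """Equation 9: average points per turn across turn types T0, T1 and T2."""
    return (E0 + z0 * E1 + z0 * z1 * E2) / (1 + z0 + z0 * z1)


E2 = 547.157                      # T2 result for y = 500
greedy = (623.017489, 0.193326)   # (E_C, z) from the y = 0 row of Table 1
careful = (622.268745, 0.170988)  # (E_C, z) from the y = 72 row of Table 1

# Y0 = 0, Y1 = 72: greedy in T0, slightly cautious in T1.
best = e_avg(greedy[0], greedy[1], careful[0], careful[1], E2)

# Y0 = 0, Y1 = 0: greedy in both T0 and T1.
both_greedy = e_avg(greedy[0], greedy[1], greedy[0], greedy[1], E2)

print(round(best, 3), round(both_greedy, 3))
```

The cautious T₁ strategy comes out a bit more than a tenth of a point per turn ahead of playing greedily in both turn types.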

For turn type T₀, you're best off just going for the maximum expected points possible: trying to play more conservatively doesn't reduce your zilch probability (or the probability of entering state T₂) enough to offset the corresponding loss in expected points for turns of type T₀.

For turn type T₁ (when you've got one consecutive zilch) you're best off pretending you will be penalized an extra 72 points if you zilch. This reduces your expected score by only about 0.1% (622.269 versus 623.017) but reduces your probability of zilching by about 12% (.170988 versus .193326). This little extra protection against your third consecutive zilch slightly increases your overall average turn scores.

Let's move on to the actual strategies.

Strategy T₀: playing with no consecutive zilches

Table 2 below gives E(s, n) for all s ≤ 3200 for the case y = 0. This is the strategy achieving the maximum expected points for a turn and is the best strategy to use if you didn't zilch on your previous turn. The first table entry is E_BIG6 = 478.237. The last table entry gives the total expected points for the turn: E₀ = 623.017. The probability of zilching for the entire turn (not shown in the table) is z₀ = .193326.

Table 2. E(s,n) for turn type T₀.

	n
s	6	5	4	3	2	1
3200	478.237	-6.608	-340.997	-775.515	-1319.085	-1948.921
3150	478.237	-2.750	-333.126	-761.626	-1296.863	-1915.588
3100	478.237	1.108	-325.256	-747.737	-1274.640	-1882.254
3050	478.323	4.966	-317.386	-733.848	-1252.418	-1848.921
3000	478.706	8.824	-309.515	-719.959	-1230.196	-1815.573
2950	479.301	12.682	-301.645	-706.070	-1207.971	-1782.162
2900	479.897	16.540	-293.774	-692.181	-1185.734	-1748.665
2850	480.492	20.398	-285.904	-678.291	-1163.471	-1715.134
2800	481.088	24.256	-278.033	-664.394	-1141.189	-1681.602
2750	481.683	28.114	-270.161	-650.488	-1118.900	-1648.070
2700	482.278	31.973	-262.286	-636.578	-1096.612	-1614.538
2650	482.874	35.833	-254.407	-622.667	-1074.324	-1581.006
2600	483.470	39.694	-246.527	-608.754	-1052.035	-1547.475
2550	484.069	43.558	-238.645	-594.840	-1029.747	-1513.943
2500	484.677	47.423	-230.761	-580.924	-1007.458	-1480.410
2450	485.290	51.290	-222.876	-567.008	-985.170	-1446.876
2400	485.975	55.157	-214.989	-553.089	-962.881	-1413.339
2350	486.949	59.026	-207.101	-539.170	-940.591	-1379.789
2300	488.222	62.896	-199.211	-525.251	-918.299	-1346.179
2250	489.496	66.767	-191.320	-511.331	-895.995	-1312.472
2200	490.771	70.640	-183.429	-497.410	-873.664	-1278.714
2150	492.048	74.513	-175.538	-483.482	-851.309	-1244.955
2100	493.326	78.386	-167.645	-469.545	-828.945	-1211.197
2050	494.604	82.260	-159.748	-455.601	-806.581	-1177.438
2000	495.884	86.136	-151.848	-441.655	-784.217	-1143.678
1950	497.164	90.013	-143.944	-427.706	-761.853	-1109.919
1900	498.448	93.894	-136.037	-413.755	-739.488	-1076.159
1850	499.740	97.776	-128.128	-399.802	-717.124	-1042.398
1800	501.041	101.661	-120.218	-385.848	-694.759	-1008.635
1750	502.344	105.547	-112.306	-371.893	-672.394	-974.870
1700	503.867	109.434	-104.392	-357.936	-650.029	-941.102
1650	505.684	113.323	-96.476	-343.979	-627.662	-907.298
1600	507.502	117.213	-88.558	-330.022	-605.289	-873.408
1550	509.327	121.105	-80.641	-316.064	-582.895	-839.469
1500	511.172	124.998	-72.723	-302.102	-560.480	-805.529
1450	513.033	128.891	-64.805	-288.132	-538.055	-771.583
1400	514.894	132.784	-56.883	-274.156	-515.630	-737.633
1350	516.756	136.679	-48.958	-260.177	-493.203	-703.679
1300	518.619	140.575	-41.030	-246.196	-470.774	-669.725
1250	520.484	144.474	-33.100	-232.212	-448.345	-635.771
1200	522.356	148.374	-25.167	-218.227	-425.916	-601.816
1150	524.236	152.277	-17.233	-204.240	-403.487	-567.860
1100	526.119	156.180	-9.297	-190.252	-381.057	-533.901
1050	528.180	160.085	-1.360	-176.263	-358.627	-499.941
1000	530.368	163.992	6.579	-162.273	-336.196	-465.950
950	532.560	168.763	14.520	-148.283	-313.760	-431.909
900	534.870	174.577	22.461	-134.293	-291.310	-397.845
850	537.684	180.570	30.403	-120.299	-268.848	-363.762
800	540.959	186.564	38.345	-106.301	-246.379	-329.574
750	544.307	192.559	46.289	-92.299	-223.889	-295.226
700	547.655	198.555	54.235	-78.294	-201.356	-260.789
650	551.006	204.879	62.183	-64.276	-178.780	-226.340
600	554.457	212.146	70.136	-50.240	-156.188	-191.890
550	558.365	219.989	78.096	-36.194	-133.594	-157.423
500	562.820	227.838	86.062	-22.143	-110.997	-122.863
450	567.530	235.694	94.033	-8.089	-88.381	-88.136
400	572.248	243.557	102.010	5.970	-65.722	-53.275
350	576.985	251.428	-	-	-43.013	-18.370
300	581.746	-	-	34.134	-20.274	16.539
250	-	-	-	48.243	6.148	51.455
200	-	-	149.232	64.645	40.331	-
150	-	-	163.981	91.507	-	-
100	-	306.667	184.939	-	-	-
50	-	322.318	-	-	-	-
0	623.017	-	-	-	-	-

Using this table, you can easily figure out what to do in any turn play situation. Consider these examples.

With 300 points and 3 dice to roll, you should roll.
With 300 points and 2 dice to roll, you should bank.
With 300 points and 1 die to roll, you should roll.
With an opening roll of (3, 3, 3, 5, 2, 6) you should take the 5 and roll five dice.
With an opening roll of (1, 1, 1, 1, 4, 4), you should score it as three pair for 1500 and take a free roll.
If you already have 500 points and then roll (1, 1, 1, 1, 4, 4), you should score it as a set of four 1s for 2000 and bank.
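The last two examples can be replayed with a short Python sketch. It uses four entries copied from Table 2 and, to keep things small, considers only the two scoring choices discussed above (three pair versus four 1s); a full implementation would enumerate every legal scoring combination. The helper names are mine.

```python
# E(s, n) entries copied from Table 2 (turn type T0).
E = {
    (1500, 6): 511.172,
    (2000, 6): 495.884,
    (2000, 2): -784.217,
    (2500, 2): -1007.458,
}


def T(s, n):
    """Equation 1: total expected points for the turn from state (s, n)."""
    e = E[(s, n)]
    return s + max(0.0, e) if s >= 300 else s + e


def best_scoring(s, candidates):
    """Pick the candidate whose resulting state has the highest T.

    candidates maps a label to (points taken, dice you would roll next).
    """
    return max(candidates, key=lambda name: T(s + candidates[name][0], candidates[name][1]))


# Rolling (1, 1, 1, 1, 4, 4): three pair uses all six dice (free roll),
# four 1s leaves the two 4s to roll.
candidates = {"three pair": (1500, 6), "four 1s": (2000, 2)}

print(best_scoring(0, candidates))    # opening roll
print(best_scoring(500, candidates))  # with 500 points already accumulated
```

On the opening roll, three pair plus a free roll is worth about 2011 points against a flat 2000 for banking the four 1s; with 500 points already in hand the comparison flips to about 2496 against 2500.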

Strategy T₁: playing with one consecutive zilch

Table 3 below gives E(s, n) for all s ≤ 3200 for the case y = 72. This is the optimal strategy for turns of type T₁ (when you're playing with one consecutive zilch). The corrected expected points for the turn is: E₁ = 622.269. The probability of zilching for the entire turn is z₁ = .170988.

Table 3. E(s,n) for turn type T₁.

	n
s	6	5	4	3	2	1
3200	478.237	-12.164	-352.330	-795.515	-1351.085	-1996.921
3150	478.237	-8.306	-344.460	-781.626	-1328.863	-1963.588
3100	478.237	-4.448	-336.589	-767.737	-1306.640	-1930.254
3050	478.237	-0.590	-328.719	-753.848	-1284.418	-1896.921
3000	478.237	3.268	-320.848	-739.959	-1262.196	-1863.588
2950	478.490	7.126	-312.978	-726.070	-1239.974	-1830.254
2900	479.039	10.984	-305.108	-712.181	-1217.751	-1796.879
2850	479.635	14.842	-297.237	-698.292	-1195.522	-1763.412
2800	480.230	18.700	-289.367	-684.403	-1173.271	-1729.888
2750	480.826	22.558	-281.497	-670.510	-1150.994	-1696.356
2700	481.421	26.416	-273.625	-656.607	-1128.707	-1662.824
2650	482.016	30.275	-265.751	-642.699	-1106.419	-1629.292
2600	482.612	34.134	-257.874	-628.788	-1084.130	-1595.760
2550	483.208	37.995	-249.995	-614.876	-1061.842	-1562.229
2500	483.804	41.858	-242.113	-600.962	-1039.554	-1528.697
2450	484.408	45.722	-234.230	-587.047	-1017.265	-1495.165
2400	485.020	49.588	-226.345	-573.131	-994.977	-1461.631
2350	485.634	53.456	-218.460	-559.214	-972.688	-1428.095
2300	486.436	57.324	-210.572	-545.295	-950.399	-1394.558
2250	487.662	61.193	-202.683	-531.375	-928.109	-1360.988
2200	488.935	65.064	-194.792	-517.456	-905.813	-1327.317
2150	490.210	68.936	-186.901	-503.536	-883.495	-1293.567
2100	491.486	72.808	-179.010	-489.612	-861.147	-1259.809
2050	492.764	76.682	-171.118	-475.678	-838.785	-1226.051
2000	494.042	80.555	-163.224	-461.737	-816.421	-1192.292
1950	495.321	84.430	-155.325	-447.791	-794.057	-1158.532
1900	496.600	88.307	-147.422	-433.843	-771.693	-1124.773
1850	497.882	92.186	-139.516	-419.893	-749.329	-1091.013
1800	499.169	96.067	-131.608	-405.942	-726.964	-1057.253
1750	500.468	99.951	-123.698	-391.988	-704.600	-1023.492
1700	501.770	103.837	-115.788	-378.034	-682.235	-989.727
1650	503.075	107.723	-107.874	-364.077	-659.870	-955.960
1600	504.884	111.611	-99.959	-350.120	-637.503	-922.192
1550	506.702	115.501	-92.042	-336.163	-615.137	-888.340
1500	508.520	119.393	-84.125	-322.206	-592.755	-854.402
1450	510.357	123.285	-76.207	-308.248	-570.346	-820.463
1400	512.214	127.178	-68.289	-294.281	-547.922	-786.520
1350	514.075	131.071	-60.370	-280.306	-525.497	-752.572
1300	515.936	134.965	-52.446	-266.328	-503.071	-718.619
1250	517.799	138.861	-44.519	-252.348	-480.643	-684.665
1200	519.663	142.758	-36.590	-238.365	-458.214	-650.711
1150	521.529	146.658	-28.658	-224.381	-435.785	-616.756
1100	523.408	150.559	-20.724	-210.394	-413.356	-582.801
1050	525.290	154.463	-12.789	-196.408	-390.926	-548.844
1000	527.217	158.367	-4.853	-182.418	-368.496	-514.884
950	529.405	162.273	3.086	-168.429	-346.066	-480.915
900	531.595	166.585	11.026	-154.439	-323.633	-446.896
850	533.840	171.940	18.967	-140.449	-301.191	-412.833
800	536.398	177.933	26.908	-126.458	-278.733	-378.761
750	539.486	183.927	34.850	-112.461	-256.266	-344.627
700	542.833	189.921	42.793	-98.460	-233.787	-310.353
650	546.182	195.917	50.738	-84.457	-211.274	-275.947
600	549.531	201.970	58.685	-70.445	-188.717	-241.498
550	552.913	208.696	66.637	-56.417	-166.130	-207.048
500	556.531	216.537	74.593	-42.375	-143.535	-172.593
450	560.750	224.384	82.556	-28.326	-120.940	-138.093
400	565.457	232.237	90.525	-14.273	-98.336	-103.453
350	570.171	240.097	-	-	-75.702	-68.632
300	574.899	-	-	13.848	-53.014	-33.729
250	-	-	-	27.930	-30.282	1.178
200	-	-	130.189	35.303	-7.274	-
150	-	-	141.074	47.867	-	-
100	-	288.116	151.530	-	-	-
50	-	298.518	-	-	-	-
0	609.958	-	-	-	-	-
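Note that the last entry of Table 3 is the uncorrected E(0, 6), computed with the artificial 72 point penalty. Applying Equation 10 with y = 72 and z₁ = .170988 recovers the corrected expectation quoted above:

```latex
E_C = E(0,6) + y\,z = 609.958 + 72 \times 0.170988 \approx 609.958 + 12.311 = 622.269
```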

Strategy T₂: playing with two consecutive zilches

Table 4 below gives E(s, n) for all s ≤ 3200 for the case y = 500. This is the optimal strategy for turns of type T₂ (when you're playing with two consecutive zilches). The expected points for the turn is: E₂ = 547.157. The probability of zilching for the entire turn is z₂ = .132148.

Table 4. E(s,n) for turn type T₂.

	n
s	6	5	4	3	2	1
3200	478.237	-45.189	-419.700	-914.403	-1541.307	-2282.254
3150	478.237	-41.331	-411.830	-900.515	-1519.085	-2248.921
3100	478.237	-37.472	-403.960	-886.626	-1496.863	-2215.588
3050	478.237	-33.614	-396.089	-872.737	-1474.640	-2182.254
3000	478.237	-29.756	-388.219	-858.848	-1452.418	-2148.921
2950	478.237	-25.898	-380.348	-844.959	-1430.196	-2115.588
2900	478.237	-22.040	-372.478	-831.070	-1407.974	-2082.254
2850	478.237	-18.182	-364.608	-817.181	-1385.751	-2048.921
2800	478.237	-14.324	-356.737	-803.292	-1363.529	-2015.588
2750	478.237	-10.466	-348.867	-789.403	-1341.307	-1982.254
2700	478.237	-6.608	-340.997	-775.515	-1319.085	-1948.921
2650	478.237	-2.750	-333.126	-761.626	-1296.863	-1915.588
2600	478.237	1.108	-325.256	-747.737	-1274.640	-1882.254
2550	478.323	4.966	-317.386	-733.848	-1252.418	-1848.921
2500	478.706	8.824	-309.515	-719.959	-1230.196	-1815.573
2450	479.301	12.682	-301.645	-706.070	-1207.971	-1782.162
2400	479.897	16.540	-293.774	-692.181	-1185.734	-1748.665
2350	480.492	20.398	-285.904	-678.291	-1163.471	-1715.134
2300	481.088	24.256	-278.033	-664.394	-1141.189	-1681.602
2250	481.683	28.114	-270.161	-650.488	-1118.900	-1648.070
2200	482.278	31.973	-262.286	-636.578	-1096.612	-1614.538
2150	482.874	35.833	-254.407	-622.667	-1074.324	-1581.006
2100	483.470	39.694	-246.527	-608.754	-1052.035	-1547.475
2050	484.069	43.558	-238.645	-594.840	-1029.747	-1513.943
2000	484.677	47.423	-230.761	-580.924	-1007.458	-1480.410
1950	485.290	51.290	-222.876	-567.008	-985.170	-1446.876
1900	485.975	55.157	-214.989	-553.089	-962.881	-1413.339
1850	486.949	59.026	-207.101	-539.170	-940.591	-1379.789
1800	488.222	62.896	-199.211	-525.251	-918.299	-1346.179
1750	489.496	66.767	-191.320	-511.331	-895.995	-1312.472
1700	490.771	70.640	-183.429	-497.410	-873.664	-1278.714
1650	492.048	74.513	-175.538	-483.482	-851.309	-1244.955
1600	493.326	78.386	-167.645	-469.545	-828.945	-1211.197
1550	494.604	82.260	-159.748	-455.601	-806.581	-1177.438
1500	495.884	86.136	-151.848	-441.655	-784.217	-1143.678
1450	497.164	90.013	-143.944	-427.706	-761.853	-1109.919
1400	498.448	93.894	-136.037	-413.755	-739.488	-1076.159
1350	499.740	97.776	-128.128	-399.802	-717.124	-1042.398
1300	501.041	101.661	-120.218	-385.848	-694.759	-1008.635
1250	502.344	105.547	-112.306	-371.893	-672.394	-974.870
1200	503.867	109.434	-104.392	-357.936	-650.029	-941.102
1150	505.684	113.323	-96.476	-343.979	-627.662	-907.298
1100	507.502	117.213	-88.558	-330.022	-605.289	-873.408
1050	509.327	121.105	-80.641	-316.064	-582.895	-839.469
1000	511.172	124.998	-72.723	-302.102	-560.480	-805.529
950	513.033	128.891	-64.805	-288.132	-538.055	-771.583
900	514.894	132.784	-56.883	-274.156	-515.630	-737.633
850	516.756	136.679	-48.958	-260.177	-493.203	-703.679
800	518.619	140.575	-41.030	-246.196	-470.774	-669.725
750	520.484	144.474	-33.100	-232.212	-448.345	-635.771
700	522.356	148.374	-25.167	-218.227	-425.916	-601.816
650	524.236	152.277	-17.233	-204.240	-403.487	-567.860
600	526.119	156.180	-9.297	-190.252	-381.057	-533.901
550	528.180	160.085	-1.360	-176.263	-358.627	-499.941
500	530.368	163.992	6.579	-162.273	-336.196	-465.950
450	532.560	168.763	14.520	-148.283	-313.760	-431.909
400	534.870	174.577	22.461	-134.293	-291.310	-397.845
350	537.684	180.570	-	-	-268.848	-363.762
300	540.959	-	-	-106.301	-246.379	-329.574
250	-	-	-	-92.299	-223.889	-295.226
200	-	-	37.142	-128.047	-266.961	-
150	-	-	8.190	-189.755	-	-
100	-	196.017	-32.853	-	-	-
50	-	167.775	-	-	-	-
0	547.157	-	-	-	-	-

Comparing Table 4 with Table 2, you can see that playing with two consecutive zilches is almost identical to playing without any consecutive zilches while pretending you have 500 more points for the turn than you really do. To see this, compare the 500-point line in Table 4 with the 1000-point line in Table 2: they are identical. This remains true until you get down to point values below 300, at which point the 300-point minimum bank rule forces you to roll even though rolling gives a negative expected change in your score.

Software

The software I wrote to find optimized Zilch strategies is 718 lines of Java code (or 322 lines with comments stripped). Please observe the GNU General Public License copyright protection or I may have to introduce you to my friend Guido. You can download it with either zip or tgz compression, whichever is convenient:

Downloads: zilch.zip OR zilch.tgz

The compile command is simply:

javac Zilch.java

Then to run it type:

java Zilch

You can optionally add a zilch penalty to the command line. For example, to run the program with y = 500 type:

java Zilch 500

To find the best strategy for variations of the game that use different scoring rules, just change the scoring constants at the top of the file. If you set a score to zero, then that score combination is effectively eliminated from the game and is instead treated as a zilch. So if in your Farkle variant, the six-die nothing roll is just a zilch, you need only set NOTHING_SCORE = 0. The software will then interpret this as a zilching roll.

E_BIG6 is calculated for you. If you set NOTHING_SCORE to 0 (giving you a non-zero chance of zilching on a six-die roll), then E_BIG6 will be correctly initialized to 0. There's a chicken-and-egg problem associated with the calculation of E_BIG6, which required a bit of finger work to resolve reliably in the face of various possible scoring changes. Check out the comments for method initEBIG6 if you're interested.

The smallest valid value for S_BIG is also determined through a binary search, so you need not worry about changing that for different scoring options.
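As a rough illustration of that kind of search (not the code from Zilch.java), here is a minimal Java sketch that finds the smallest value satisfying a monotone validity test. The class name, the method smallestValid, and the stand-in predicate are all hypothetical; the real validity condition for S_BIG lives in the downloadable source.

```java
import java.util.function.IntPredicate;

public class SmallestValid {
    /**
     * Returns the smallest value in [lo, hi] for which isValid holds.
     * Assumes isValid is monotone (false ... false, true ... true)
     * and that isValid.test(hi) is true.
     */
    public static int smallestValid(int lo, int hi, IntPredicate isValid) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (isValid.test(mid)) {
                hi = mid;      // mid works, so the answer is mid or smaller
            } else {
                lo = mid + 1;  // mid fails, so the answer must be larger
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Arbitrary stand-in for the real test "this value is big enough".
        IntPredicate bigEnough = x -> x >= 137;
        System.out.println(smallestValid(0, 1000, bigEnough));
    }
}
```

The search only needs the test to be monotone: once a value is big enough, every larger value is too.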

On my 10-year-old home computer, the strategy for Zilch is determined in about 7 seconds. Farkle strategies take about 30 seconds. (Farkle is much slower because S_BIG has to be set big enough that the chance of a six-die farkle outweighs the potential gains of a six-die roll.) No doubt you could solve these same problems in a tiny fraction of a second with appropriate optimizations, but I personally don't have a need for better performance.

Conclusions

This was a fun problem. The trick is to work the problem backwards: finding the expected scores for high point states first, and then working your way back down to lower and lower scores until finally you get the expected score from the starting state. Everything else is just details (which hopefully I've gotten correct).
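To make the backwards approach concrete, here is a minimal Java sketch for a toy one-die game rather than Zilch itself. In this assumed game, a roll of 1 ends the turn with nothing, and any other face adds its value to the turn total s, after which you may bank or roll again. The class name, the cap S_BIG = 100, and both method names are illustrative assumptions and are not taken from Zilch.java.

```java
public class BackwardInduction {
    // Assumed cap for the toy game: at or above this total we always bank.
    static final int S_BIG = 100;

    /** Fills e[s] = expected final turn score from turn total s under optimal play. */
    public static double[] expectedScores() {
        double[] e = new double[S_BIG + 7];
        // High states are known outright: bank immediately.
        for (int s = S_BIG; s < e.length; s++) {
            e[s] = s;
        }
        // Work backwards: e[s] depends only on e[s+2] .. e[s+6], already filled.
        for (int s = S_BIG - 1; s >= 0; s--) {
            double sum = 0.0; // rolling a 1 (the zilch) contributes 0
            for (int face = 2; face <= 6; face++) {
                sum += e[s + face];
            }
            double roll = sum / 6.0;
            e[s] = Math.max(s, roll); // bank s, or roll if that is worth more
        }
        return e;
    }

    /** Smallest turn total at which rolling is no better than banking. */
    public static int firstBankScore(double[] e) {
        for (int s = 0; s < S_BIG; s++) {
            if (e[s] <= s) {
                return s;
            }
        }
        return S_BIG;
    }

    public static void main(String[] args) {
        double[] e = expectedScores();
        System.out.println("Expected turn score from the start: " + e[0]);
        System.out.println("Bank once the turn total reaches: " + firstBankScore(e));
    }
}
```

In this toy game the sketch banks from a turn total of 20 upward. The real problem has the same shape, just with a two-dimensional state (s, n) and far more roll outcomes per state.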

One thing I found surprising about the results is just how incredibly insensitive the corrected expected turn score is to the zilch penalty. The optimal strategy for the case of an infinite zilch penalty drops the probability of zilching from 0.193326 down to 0.126959 (the minimum zilch probability you can achieve for a turn). Playing that same strategy on a turn where the zilch penalty is actually zero drops your expected score from 623.017 down to 605.851: it only costs you about 17 points per turn! That's less than 300 points over the course of a typical game, and that's nothing in a game of Zilch. I think this is true because almost all the big points in Zilch come from six-die rolls where there's no chance to zilch. So, playing to reach 300 points as reliably as you can and then banking as soon as you face a roll of fewer than six dice reduces your expected scores very little compared to playing for maximum expected points. I found this very surprising and somehow unsatisfying.

I'd be pleased to know if you found this document comprehensible, or if you found any errors in the analysis or the software. Leave a comment or send me an email. If you're lucky, you might even meet me masquerading as pips in a game of Zilch. Just don't expect to win.

References

FARKLE, Wikipedia.
Multinomial Coefficients and Farkle, Cap Khoury, July 2009.
Farkle Odds, Gregory Graham, August 2009.
Study of the game Zilch part 1, Leadhyena Inrandomtan, November 2008.
FARKLE, Expectation, and Knowing What You Want, Cap Khoury, August 2009.

Piece Name	Size	Unique Orientations	Complexity
A	6	24	144
B	6	24	144
C	5	24	120
D	5	24	120
E	6	24	144
F	5	24	120
G	5	12	60
H	5	24	120
I	5	24	120
J	5	12	60
K	5	24	120
L	6	24	144
