I've designed a new backtrack algorithm for solving tiling problems that I call the fixed image list algorithm (FILA). The algorithm is flexible in that it supports various ordering heuristics.^{1} With a heuristic that always selects the first open cell, FILA behaves like Fletcher's and de Bruijn's algorithms.^{2,3} With a heuristic that always picks the cell that has the fewest fit options, it behaves like my most constrained hole (MCH) algorithm.^{4} A key distinguishing feature of FILA is that ordering heuristics return neither a target cell to fill nor a target piece to place, but rather a set of image lists that should be tried (where an image is defined as a particular translation of a particular rotation of a puzzle piece). The returned set contains one list of images for each uniquely shaped puzzle piece. Although ordering heuristics that target cells are best suited to FILA, the interface does allow heuristics to target pieces; that possibility is a subject for additional research.
This interface allows the heuristic to select and return a precalculated set of image lists that is customized in three different ways to radically reduce the size of the lists by eliminating most images that cannot possibly fit. First, because different lists are calculated for each cell, only images bounded by the puzzle walls are included in the lists. Second, some heuristics (like that used by Fletcher's algorithm) guarantee that cells are filled in a particular order. For such fixed selection order heuristics, FILA identifies this order during initialization and, through a procedure I call priority occupancy filtering (POF), only includes images in a list for a cell that do not conflict with cells that must already be filled. Third, a technique I call neighbor occupancy filtering (NOF) (which is similar to a technique Gerard Putter described to me in a 2011 email conversation) precalculates a different set of image lists for each possible occupancy state of the adjacent neighbors of each target cell. For a 3D polycube puzzle, there are up to six adjacent neighbors (in the $-x$, $-y$, $-z$, $+x$, $+y$, and $+z$ directions), and so up to $2^6 = 64$ different sets of image lists are precalculated for each puzzle cell. Later, when a cell is selected by the heuristic, the current occupancy state of those adjacent neighbors is determined, and the set of image lists corresponding to that compound state is returned, guaranteeing that no image conflicts with those neighboring cells. In this way, the number of images that must be tried by FILA at each recursive step is radically reduced, improving efficiency relative to algorithms that make no such optimization, but without the expense of the continuous list maintenance required by Donald Knuth's DLX algorithm.
Version 2.0 of my polycube puzzle solver only includes the DLX and FILA algorithms, but the retired algorithms (de Bruijn, EMCH, and MCH) can all be recreated with FILA by using the f (first), e (estimate), and s (size) heuristics respectively. In addition, all of the other implemented heuristics, previously only available to DLX, may now also be used with FILA. Despite the additional abstraction, the new FILA algorithm (even with the new NOF optimization disabled) has improved puzzle solve times (I've seen improvements from 10% to 35%). Enabling NOF (by simply adding n to the command line) consistently provides additional incremental performance gains (I've seen from 5% to 27%). Because the performance gains afforded by NOF are not attributable to changes in the search tree, but rather are limited to the efficient elimination of many images that don't fit at each branch, these improvement percentages should not compound as puzzle size increases, but should rather be largely independent of puzzle size.
Although the examples shown here and the polycube puzzle solver software are limited to 3D puzzles on a cubic lattice, FILA and all of its supporting components have no such constraints and can be used to solve tiling problems in any number of dimensions on any lattice.
December 3, 2018 Edit: I changed the title of this blog entry and edited the above introduction to make it clear that FILA was not limited to 3D puzzles on a cubic lattice.
I was recently invited to work on a paper on polyomino puzzle parity with Professor Marcus Garvie of the University of Guelph, which got me back to the subject of polyomino and polycube puzzle solving. During my research for that paper, I read Fletcher's 1965 publication on solving the 10x6 pentomino problem shown in Figure 1, and also spent an evening making my best-effort translation (via Google Translate) of de Bruijn's 1971/72 paper on solving the same puzzle.
Although some 50 years old, Fletcher's and de Bruijn's algorithms (which are extremely similar) are still widely regarded as the fastest for many tiling problems. These algorithms define a fixed list of the 63 possible rotations of the 12 pentominoes. At each recursive step, the algorithms target an unfilled cell (a hole), and attempt to fill that cell by translating each of the 63 piece images to the absolute position of the target, and then attempting to place each of those translated images to cover the target. Here an image is defined as a particular translation of a particular rotation of a particular puzzle piece. (This is a slightly looser definition than I used in my previous blog article^{4} in that here I allow an image to fall partially outside the boundary of the puzzle box, as required by Fletcher's and de Bruijn's algorithms. I'll use the term bounded image to refer to images contained wholly inside the puzzle box.) If you were to simultaneously watch animations of the Fletcher and de Bruijn algorithms placing and removing pieces from the puzzle as they search for solutions, you would find them to be almost identical. In particular, the search trees the two algorithms explore are identical. (I.e., the set of partial assemblies the two algorithms produce is identical.) Only the order in which the branches of that tree are explored differs.
Both Fletcher and de Bruijn modeled the 10x6 puzzle box so the shorter sides (dimension 6) are on the left and right, and the longer sides (dimension 10) are top and bottom. To eliminate rotationally redundant solutions, both Fletcher and de Bruijn started by placing the X piece in one of 7 locations in the upper left quadrant of the box. Then, starting with the top-left cell, the algorithms scan down searching for an unfilled puzzle cell. When the bottom edge of the puzzle box is reached, the scan continues at the top of the next column to the right. When an unfilled cell is found (the target), the algorithms try all 63 possible orientations of the 12 pentomino pieces to fill that target. For each such orientation one could try to place each of the five constituent cells of the pentomino into the target, but because all cells to the left of and directly above the target are guaranteed to be previously filled, there is only one cell of each oriented piece that can possibly fit there. (For a more complete explanation, see Figure 2 of my previous blog article.^{4}) So each algorithm need only try 63 images to fill the target. For each successfully placed image, the algorithm recurses, scanning for the next unfilled cell in order. When the list of images is exhausted at any particular target, the algorithms backtrack to the previous target and continue processing where they left off, considering each of the remaining images in the list at that cell.
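The count of 63 fixed images is easy to check. The Python sketch below is only an illustration and is not taken from either paper: the coordinate lists in `PENTOMINOES` and the helper names `anchored` and `orientations` are assumptions made for this example. It enumerates every rotation and reflection of the 12 pentominoes and shifts each one so that its anchor cell (the topmost cell of its leftmost column, the only cell that can land on the target under this scan order) sits at the origin.

```python
# Cell coordinates (x, y) of one orientation of each pentomino, with x growing
# to the right and y growing downward. These lists are illustrative assumptions.
PENTOMINOES = {
    "F": [(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)],
    "I": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    "L": [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3)],
    "N": [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3)],
    "P": [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)],
    "T": [(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)],
    "U": [(0, 0), (2, 0), (0, 1), (1, 1), (2, 1)],
    "V": [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)],
    "W": [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)],
    "X": [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)],
    "Y": [(0, 0), (0, 1), (1, 1), (0, 2), (0, 3)],
    "Z": [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)],
}


def anchored(cells):
    """Translate cells so the anchor (smallest x, then smallest y) is at (0, 0)."""
    ax, ay = min(cells)
    return tuple(sorted((x - ax, y - ay) for x, y in cells))


def orientations(cells):
    """Distinct rotations and reflections of a shape, each anchored at (0, 0)."""
    found = set()
    pts = list(cells)
    for _ in range(2):
        for _ in range(4):
            pts = [(y, -x) for x, y in pts]   # rotate by 90 degrees
            found.add(anchored(pts))
        pts = [(-x, y) for x, y in pts]       # mirror, then rotate again
    return sorted(found)


IMAGES = [(name, img) for name, cells in PENTOMINOES.items()
          for img in orientations(cells)]
print(len(IMAGES))  # 63
```

Every entry of `IMAGES` starts with the offset `(0, 0)`; the other four offsets play the same role as the relative positions that a fixed-list algorithm adds to the target cell.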
The order in which the 63 images are considered at each cell does differ, but other than that the algorithms are identical if your view of them is limited to the animations. Internally, however, their designs differ significantly in how they process the 63 images to quickly sift through those that do not fit and identify the ones that do. The two approaches each have certain advantages and disadvantages in efficiency. It was these differences that got me thinking about how to better optimize this aspect of backtrack algorithms that rely on fixed image lists, which ultimately led to my design of FILA.
During any particular recursive step, most of the images in the list of 63 images will be found to be unusable either because the image corresponds to a piece that is already used, or because it conflicts with pieces already placed on the board, or because it intersects the boundary of the puzzle box itself. Donald Knuth's dancing links (DLX) algorithm,^{1} in contrast, takes the different approach of maintaining dynamic image lists for each puzzle cell and for each puzzle shape, that are continuously pruned and restored during algorithm execution. At any recursive step, for each unfilled puzzle cell a perfect image list is available that includes all images that cover that cell, but only those images that still fit in the puzzle without conflict, and only images for pieces that are still available. Likewise, there is a perfect image list available for each remaining piece that includes all images of that piece that still fit in the puzzle without conflict. But for many problems the time to keep these lists updated is more than the time saved by not having to sift through images that are either no longer available or no longer fit.
Still, many other tiling problems, due to their curious geometries, are not efficiently solved with the simple Fletcher algorithm that always fills the puzzle from left to right. DLX's abstract data model easily supports any conceivable ordering heuristic, allowing it to better solve these problems. Further, this same abstract data model allows it to be used for a wide variety of problem domains that go beyond tiling problems in $\mathbb{Z}^n$ space. Recognizing the performance advantages of fixed (precalculated) image lists, I designed FILA to use them while keeping the flexibility to work with a variety of ordering heuristics.
In this article, I'll start by showing how Fletcher and de Bruijn sifted through the list of 63 pentomino images. I'll then detail FILA, and show how its use of cell-specific image lists generated with POF and NOF image filtering attempts to capture the best aspects of both approaches and improve upon them.
Fletcher's approach to checking for image fits is interesting. Instead of checking the 63 images for conflicts independently, he explores the cells around the target cell following a predefined tree structure. The distance from the root to each leaf of this tree is length 5. There are 63 leaves on the tree, each corresponding to one of the 63 pentomino orientations. If either a filled cell or a puzzle wall is encountered while following a branch, then exploration of that branch is terminated and all images subtending that branch are efficiently skipped. For example, the first cell of the tree that is checked is directly below the target cell. If that cell is filled, then 29 images from the list of 63 are skipped and the cell to the immediate right of the target cell is then tested to see if it is filled. Each time a leaf is reached, the pentomino corresponding to that leaf is checked for availability (it may have already been used). If available, the pentomino is placed and the algorithm recurses.
I've mapped the tree Fletcher designed to an animated player shown in Figure 2. The black area to the left represents cells that were found to be previously filled, and the red cell (shown at step 0) is the first empty cell found during the search for an empty cell. In the original 10x6 pentomino problem, the tree cannot possibly be fully traversed because either the top or bottom of the puzzle would interfere, so I've increased the vertical dimension of the puzzle area so that the entire search tree can be examined. Each time a leaf of the tree is reached, I display an image counter in the top-right area of the puzzle so you can more easily keep track of where you are in the image list.
The entire tree can be explored with only 90 memory accesses, but in practice, far fewer steps are typically needed for the 6x10 pentomino problem as either the puzzle boundary or previously placed pieces will interrupt many branch explorations. Still, there are some aspects of this approach that are undesirable. First, Fletcher doesn't check to see whether a piece is even available until a leaf of the search tree is reached. 83% of the piece placements made during algorithm execution on the 10x6 pentomino puzzle are for the last 4 pieces of the puzzle, so a significant amount of time is spent checking to see if pieces that are no longer available will fit. Second, note that despite the tree structure, the occupancy state of many cells is checked multiple times. For example, you can see from the movie player that the cell just to the right of the target cell is checked in steps 5, 9, 20, and 44. If that cell is filled, those four checks respectively eliminate 1, 4, 10, and 34 images. It would be nice if only one check of that cell were made and, if it were found to be occupied, all $1 + 4 + 10 + 34 = 49$ of these conflicting images were eliminated at once. Unfortunately, due to the diversity of the pentomino shapes, there is no way to construct a simple exploration tree that avoids revisiting cells.
I'll make one other observation which is important to understanding the effectiveness of NOF filtering (discussed later). Because the puzzle is filled from left to right, cells further to the right of the root cell are decreasingly likely to be found occupied. Also, note that occupancies detected further from the root eliminate fewer images of the tree. So the cells nearest the root are among the most likely to be occupied, and also eliminate the most images when they are occupied.
Like Fletcher, de Bruijn's software started by placing the X piece in one of 7 board positions. Then when trying to fill a targeted cell, de Bruijn simply linearly iterated over the remaining 62 images. But where Fletcher checked for piece availability last, de Bruijn made this check first. Here is an excerpt of his program that I attempted to translate to English:
    refillingAttempt:
        if warehouse[pieceNum[slice]] = 0 then goto nextSlice;
        for i := 1 step 1 until 4 do
            if occupied[cell + relpos[slice, i]] = 1 then goto nextSlice;
        warehouse[pieceNum[slice]] := 0;
        occupied[cell] := 1;
        for i := 1 step 1 until 4 do
            occupied[cell + relpos[slice, i]] := 1;
        .
        .   // recurse, or produce solution if this was the last piece
        .
    nextSlice:
        slice := slice + 1;
        if slice <= 63 then goto refillingAttempt;
As you can see, he kept the definitions for all 63 images in a two-dimensional array, relpos[slice, i], where slice = 1 to 63 was what I'd call an image number, and i = 1 to 4 identifies the four cells of the pentomino shape (other than the cell occupying the target cell, which is known to be open). The value of each array entry is an integer that specifies the relative position of a constituent cell of the image (slice). This number could be added to the integer cell location of the target to give the integer cell location of the i^{th} cell of the slice. He also had an array pieceNum defined so that pieceNum[slice] mapped the image number slice back to its prototype polyomino number (1 to 12). He had a boolean array called warehouse which tracked the availability of the 12 polyominoes. There was also a boolean array called occupied which tracked whether each puzzle cell was occupied or not.
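The availability-first ordering of those two checks is the important part, and it can be sketched in a few lines of Python. This is a loose illustration rather than a port of de Bruijn's program: the function name `fitting_slices`, the zero-based indexing, the toy three-image list, and the sentinel row and column used to represent the walls are all assumptions made for the example.

```python
def fitting_slices(cell, relpos, piece_num, warehouse, occupied):
    """Return the image numbers (slices) that can be placed at the target cell."""
    fits = []
    for slice_no, offsets in enumerate(relpos):
        # Availability first: skip every image of a piece that is already used,
        # without touching the board at all.
        if not warehouse[piece_num[slice_no]]:
            continue
        # Only then test the remaining cells of the image against the board.
        if any(occupied[cell + d] for d in offsets):
            continue
        fits.append(slice_no)
    return fits


# Toy 3x3 board stored row by row with a stride of 4. The extra column and the
# extra row are permanently occupied sentinels that stand in for the walls.
STRIDE = 4
occupied = [False] * 16
for r in range(4):
    occupied[r * STRIDE + 3] = True      # sentinel column
for c in range(4):
    occupied[3 * STRIDE + c] = True      # sentinel row

# Three hypothetical images: two orientations of piece 0 and one of piece 1.
relpos = [(1, 2), (STRIDE, 2 * STRIDE), (1, STRIDE)]
piece_num = [0, 0, 1]
warehouse = [True, True]

print(fitting_slices(0, relpos, piece_num, warehouse, occupied))  # [0, 1, 2]
```

Marking piece 0 as used (`warehouse = [False, True]`) makes the loop skip its two images immediately, which is the saving described for de Bruijn's approach.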
This linear iteration over 62 images seems far less efficient than Fletcher's tree-like search over the grid space, but this approach does have the one advantage of not checking cell occupancy states for images of puzzle pieces that have already been placed. Because most of the search is done when only a few pieces remain, most images in the list of 63 are skipped without checking board availability at all. Running the algorithm on the 6x10 pentomino problem, I found that on average only 3.65 pieces must be considered at each recursive step. Because each piece has on average $63 / 12 = 5.25$ unique images, at each recursive step the algorithm only checks cell availability for about $3.65 \times 5.25 \approx 19.2$ images.
The original motivation for my design of FILA was to try to somehow take advantage of Fletcher's approach of quickly eliminating many potential images from an image list due to a single puzzle cell being occupied, while simultaneously somehow quickly skipping over images for pieces that are no longer available (in the spirit of de Bruijn's algorithm).
We'll start by looking at the pseudo code for the main backtrack processing of FILA, which will reveal its recursive nature and the abstract interface to the ordering heuristic. The ordering heuristic is where the new and interesting stuff happens, and is explained over a few sections wherein the workings of NOF and POF are explained.
Assume you have a puzzle with $P$ polycube pieces that are to be used to fill some puzzle region $R$. We will not require that each piece have a unique shape, so let $\mathbb{Q}$ be the set of shapes unique under rotation from which the $P$ pieces are chosen. $\mathbb{Q}$ is a minimal set in that every shape in $\mathbb{Q}$ must be used to form at least one of the $P$ pieces. Let $Q$ be the number of unique shapes: $Q = \vert \mathbb{Q} \vert$. Identify each shape in $\mathbb{Q}$ with a number $s = 1, 2, 3, \ldots, Q$. The algorithm solveFila maintains an ordered set $S$ (e.g., an array) holding these shape numbers. Initialize $S$ by arbitrarily loading these numbers in order: $S_1 = 1, S_2 = 2, \ldots, S_Q = Q$. Define $N_s$ to be the number of pieces having shape $s$. Initially $N_s > 0$ for all $s$, but each time a piece of shape $s$ is placed, the value of $N_s$ will be decremented, and when a piece of shape $s$ is removed, the value of $N_s$ will be incremented. So during algorithm execution, $N_s$ represents the number of pieces of shape $s$ that have yet to be placed. Let $V$ (volume) be the number of cells in the puzzle region $R$ that must be filled, and denote the cells themselves $c_0, c_1, c_2, \ldots, c_{V-1}$. Although it's probably inappropriate for pseudo code, we'll assume that the occupancy state of the puzzle is modeled as a bit field $o$, where bit $v$ of $o$ is one if and only if $c_v$ is occupied. The list $O$ is used as a stack of images currently placed in the puzzle and is used only for producing output when solutions are found.
solveFila invokes the function selectFila, which returns a set $I$ of lists of (references to) images to be tried in the puzzle. There is one image list $I_s \in I$ for each shape $s$. In general all lists in $I$ could contain images, but only the images in lists $I_s$ for which at least one piece of shape $s$ remains to be placed should be tried. All image list sets are precalculated, but many such sets exist. The process by which an image list set is chosen by selectFila, and the exact content of each set, are detailed in subsequent sections. Each image $i$ in $I_s$ has a layout field $L[i]$ which is itself a bit field. Bit $v$ of $L[i]$ is set if and only if image $i$ occupies cell $c_v$.
solveFila takes three arguments: $p$ is the number of remaining puzzle pieces; $q$ is the number of remaining shapes; and $o$ is the current occupancy state of the puzzle region $R$. So to start things off, you invoke solveFila with parameters $p = P$, $q = Q$, and $o = 0$. Below, I use the notation $x \land y$ to represent the bitwise and of bit fields $x$ and $y$, and $x \lor y$ to represent the bitwise or of $x$ and $y$.
     1. solveFila$(p, q, o)$
     2.     If $p = 0$, process the solution and return.
     3.     Set $I \leftarrow$ selectFila$(p, q, o)$.
     4.     For each $j \leftarrow 1, 2, 3, \ldots, q$,
     5.         set $s \leftarrow S_j$;
     6.         set $N_s \leftarrow N_s - 1$;
     7.         if $N_s = 0$,
     8.             swap$(S_j, S_q)$,
     9.             set $q \leftarrow q - 1$;
    10.         for each $i$ in $I_s$,
    11.             if $o \land L[i] = 0$,
    12.                 set $O_p \leftarrow i$;
    13.                 solveFila$(p - 1, q, o \lor L[i])$;
    14.         if $N_s = 0$,
    15.             set $q \leftarrow q + 1$;
    16.             swap$(S_j, S_q)$;
    17.         set $N_s \leftarrow N_s + 1$.
So unlike Fletcher's and de Bruijn's algorithms, FILA keeps track of which shapes still have unused pieces, and only considers placing images of those shapes. This information is maintained by lines 4-9 of the algorithm, and perhaps deserves some explanation. Line 4 iterates over the numbers $j$ from 1 to the number of remaining shapes $q$. Note that $j$ is not a shape number, but just a sequence number. The numbers of the available shapes are stored in the ordered list $S$, which serves as a warehouse of available shape numbers. The available shape numbers are kept in the first $q$ positions of $S$, so $s = S_j$ is an available shape number for all iterated $j$ values. While shape $s$ is under consideration (starting at line 5), the number of copies of that shape, $N_s$, is decremented (line 6). If that counter hits zero (line 7), then no more copies of that shape are available. In that case, the values of $S_j$ and $S_q$ are swapped (line 8) so that shape number $s$ is listed as the last available shape in $S$. Then the number of available shapes $q$ is decremented (line 9), so that subsequent recursive calls to solveFila will no longer see shape $s$ in the now smaller window into the warehouse $S$. After all images of shape $s$ have been tested for fit, and a recursive search for solutions has been performed for each image that does fit (lines 10-13), the shape bookkeeping operations (performed in lines 6-9) are undone (lines 14-17) to restore the shape-related data to its previous state, and the next sequence number $j$ is processed (starting again at line 4).
Puzzle boundary filtering, and POF and NOF filtering (explained in the next sections), can be so effective that it is not uncommon for an image list $I_s$ to be empty. For this reason, a small overall performance benefit can be had by inserting a check immediately after line 5 to see if $I_s$ is empty, and if so, skipping immediately to the next $j$ value, bypassing the shape bookkeeping updates, the pointless loop over the empty image list, and the subsequent undo of the shape bookkeeping.
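To make the recursion and the warehouse bookkeeping concrete, here is a minimal Python sketch of the solveFila loop, including the empty-list check. It is an illustration under stated assumptions, not the solver's actual code: the helper names (`orientations`, `build_lists`, `solve`) are hypothetical, the puzzle is 2D, and the stand-in `select_fila` simply targets the first open cell and returns lists that hold only bounded images whose lowest-numbered cell is the target (which mimics wall filtering and POF for that fill order, with no NOF). It also assumes the total volume of the pieces equals the volume of the region.

```python
def orientations(cells):
    """Distinct rotations/reflections of a 2D shape, shifted to the origin."""
    found = set()
    pts = list(cells)
    for _ in range(2):
        for _ in range(4):
            pts = [(y, -x) for x, y in pts]            # rotate 90 degrees
            mx = min(x for x, _ in pts)
            my = min(y for _, y in pts)
            found.add(frozenset((x - mx, y - my) for x, y in pts))
        pts = [(-x, y) for x, y in pts]                # mirror
    return found


def build_lists(shapes, width, height):
    """lists[v][s]: bitmasks of bounded images of shape s whose lowest-numbered
    cell is v, with cells numbered x * height + y."""
    lists = [[[] for _ in shapes] for _ in range(width * height)]
    for s, cells in enumerate(shapes):
        for ori in orientations(cells):
            w = max(x for x, _ in ori) + 1
            h = max(y for _, y in ori) + 1
            for dx in range(width - w + 1):
                for dy in range(height - h + 1):
                    mask = 0
                    for x, y in ori:
                        mask |= 1 << ((x + dx) * height + (y + dy))
                    v = (mask & -mask).bit_length() - 1    # lowest set bit
                    lists[v][s].append(mask)
    return lists


def solve(shapes, counts, width, height):
    lists = build_lists(shapes, width, height)
    full = (1 << (width * height)) - 1
    S = list(range(len(shapes)))   # warehouse of available shape numbers
    N = list(counts)               # N[s]: unplaced pieces of shape s
    P = sum(counts)
    O = [None] * (P + 1)           # O[p]: image placed when p pieces remained
    solutions = []

    def select_fila(p, q, o):
        holes = ~o & full
        v = (holes & -holes).bit_length() - 1              # first open cell
        return lists[v]

    def solve_fila(p, q, o):
        if p == 0:
            solutions.append(tuple(O[1:]))
            return
        I = select_fila(p, q, o)
        for j in range(q):
            s = S[j]
            if not I[s]:                   # empty list: skip the bookkeeping
                continue
            N[s] -= 1
            if N[s] == 0:                  # last copy: shrink the window
                S[j], S[q - 1] = S[q - 1], S[j]
                q -= 1
            for i in I[s]:
                if o & i == 0:
                    O[p] = i
                    solve_fila(p - 1, q, o | i)
            if N[s] == 0:                  # undo the bookkeeping
                q += 1
                S[j], S[q - 1] = S[q - 1], S[j]
            N[s] += 1

    solve_fila(P, len(shapes), 0)
    return solutions


domino = [(0, 0), (1, 0)]
ell = [(0, 0), (1, 0), (0, 1)]
sols = solve([domino, ell], [1, 2], width=4, height=2)
print(len(sols))  # 6
```

The toy puzzle (one domino and two L-trominoes in a 4x2 box) has six tilings, and because the two trominoes share one shape number, each tiling is reported once rather than once per permutation of the identical pieces.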
The function selectFila returns the set of image lists $I$ that should be tried by solveFila. The implementation will vary depending on the desired behavior of the ordering heuristic. I will give here three example implementations.
The first works well for any fixed-order ordering heuristic, wherein the heuristic keeps an array, $C$, of (references to) all the puzzle cells in a particular (fixed) order, and always picks the first unoccupied cell from this list as the fill target. Through an appropriate ordering of the cells in $C$, any fixed-order heuristic can be realized. For example, by ordering the cells so as to minimize coordinates in $x$, $y$, $z$ priority order, Fletcher's left-to-right heuristic is produced. By sorting cells to maximize the quantity $x^2+y^2+z^2$, cells are filled radially from the outside towards the puzzle center. Assume the index into $C$ is zero-based: $C_0$, $C_1$, $\ldots$, $C_{V-1}$. Let each cell $c_v$ have a bit field $B[c_v]$ with only bit $v$ set, so that $c_v$ is occupied if and only if $B[c_v] \land o \ne 0$. As a performance optimization, this implementation of selectFila maintains a stack $M$ of the indices into $C$ of previously selected cells, so that subsequent calls to selectFila don't have to start searching from the beginning of $C$ for the next unoccupied cell. $M_{P+1}$ is initialized to $-1$ to ensure the first invocation of selectFila starts its search for an empty cell at position $0$ in $C$.
selectFila is invoked with the same three arguments as solveFila: the number of remaining pieces $p$, the number of remaining shapes $q$, and the current puzzle occupancy state $o$. The function getImageListSet returns the appropriate set of image lists $I$ for the selected fill target $C_m$ and is detailed in the next section.
    selectFila$(p, q, o)$ {
        Set $m \leftarrow M_{p+1} + 1$.
        While $o \land B[C_m] \ne 0$, set $m \leftarrow m + 1$.
        Set $M_p \leftarrow m$.
        Return getImageListSet$(C_m, o)$;
    }
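The role of the stack $M$ may be easier to see in running code. The following Python sketch mirrors the pseudo code above under a few assumptions: `make_select_fila` is a hypothetical factory name, cells are plain integers, `B` is a list of single-bit masks, and `get_image_list_set` is passed in as a callable (a stub in the usage lines).

```python
def make_select_fila(C, B, P, get_image_list_set):
    """Build a fixed-order selectFila over the cell order C.

    C: cell numbers in fill-priority order
    B: B[c] is the bit field with only cell c's occupancy bit set
    P: total number of puzzle pieces
    """
    M = [0] * (P + 2)      # M[p]: index into C chosen when p pieces remained
    M[P + 1] = -1          # so the first search starts at position 0

    def select_fila(p, q, o):
        m = M[p + 1] + 1               # resume just past the previous target
        while o & B[C[m]]:             # skip cells that are already filled
            m += 1
        M[p] = m
        return get_image_list_set(C[m], o)

    return select_fila


# Usage with a stub that just reports the chosen cell.
C = [2, 0, 1, 3]
B = [1 << c for c in range(4)]
select = make_select_fila(C, B, 3, lambda c, o: c)
first = select(3, 1, 0b0000)     # nothing filled: picks C[0]
second = select(2, 1, 0b0101)    # cells 0 and 2 filled: skips C[1], picks C[2]
print(first, second)             # 2 1
```

Because the entry for $p$ remaining pieces is overwritten every time that depth is revisited, no explicit undo is needed when the search backtracks.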
selectFila for F Heuristic
For our second example, first recall that the puzzle cells $c$ are themselves numbered, $c_0$, $c_1$, $\ldots$, $c_{V-1}$. This numbering defines their bit position in the occupancy state variable $o$. If this ordering happens to be that of a desirable fixed-order heuristic, you can use that natural order directly with no need for the list $C$. In my solver, I number my cells according to their numerical coordinate positions with the $x$ coordinate taking precedence over $y$, and $y$ taking precedence over $z$. This ordering is exactly the left-to-right fill order used by Fletcher's and de Bruijn's algorithms. I call this heuristic that just picks the first open cell the "F" heuristic. (Or you can think about the F standing for Fletcher if you want.) The F heuristic in my solver overrides the default selectFila implementation used by all other fixed-order heuristics with a simpler (and faster) implementation that takes advantage of the natural cell ordering. Abstractly, it looks like this:
    selectFila$(p, q, o)$ {
        Set $v \leftarrow$ lowestSetBit$(\lnot o)$.
        Return getImageListSet$(c_v, o)$.
    }
The operation $\lnot o$ is the binary negation of $o$ (to produce the bit field representing the holes in the puzzle), and lowestSetBit$(x)$ returns the number of the lowest bit that is set in bit field $x$ (an operation most modern processors implement in silicon).
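In a language without a dedicated instruction for this, the same result can be had with a little two's-complement arithmetic. The Python sketch below is only illustrative; the names `lowest_set_bit` and `first_open_cell` are assumptions, and because Python integers are unbounded, the negated occupancy state has to be masked to the puzzle volume.

```python
def lowest_set_bit(x):
    """Number of the lowest set bit of x (x must be nonzero)."""
    # x & -x isolates the lowest set bit; bit_length() then gives its index + 1.
    return (x & -x).bit_length() - 1


def first_open_cell(o, volume):
    """Cell number of the first hole in occupancy state o."""
    holes = ~o & ((1 << volume) - 1)   # keep only the bits of real cells
    return lowest_set_bit(holes)


print(lowest_set_bit(0b101000))      # 3
print(first_open_cell(0b0111, 6))    # 3
```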
selectFila for E Heuristic
As a third example, consider a heuristic that picks a cell estimated to be hardest to fill by first identifying all cells that have a minimum number of open neighbor cells, and then picking the cell among that set at which a minimum number of images fit (by explicitly counting the number of fits). So it acts sort of like a poor man's version of the S heuristic most often used by DLX. Give each cell $c$ an additional field $N[c]$, which is a bit field with up to six bits set that identify the occupancy bits of the adjacent neighbors of $c$ in the six ordinal directions: $+x$, $+y$, $+z$, $-x$, $-y$, and $-z$. If one or more of these six neighbors are nonexistent (because $c$ is at the perimeter of the puzzle, and/or because the puzzle is only two-dimensional), then $N[c]$ will have fewer than six bits set. Then the number of open neighbor cells of $c$ may be found by counting the number of bits set in the quantity $N[c] \land \lnot o$. The algorithm below starts by loading the cells with a minimum number of open neighbors into the set $C$, then iterates over all cells in $C$, using a fit counting helper function to find a cell for which a minimum number of images fit.
    selectFila$(p, q, o)$
        Set $h \leftarrow \lnot o$.
        Set $n_{min} \leftarrow \infty$.
        Set $C \leftarrow \emptyset$.
        For each bit number $v$ set in $h$,
            set $n \leftarrow$ countBits$(N[c_v] \land h)$;
            if $n \le n_{min}$,
                if $n < n_{min}$,
                    set $n_{min} \leftarrow n$;
                    set $C \leftarrow \emptyset$;
                add $c_v \rightarrow C$.
        Set $f_{min} \leftarrow \infty$.
        For each $c$ in $C$,
            set $I \leftarrow$ getImageListSet$(c, o)$;
            set $f \leftarrow$ countFits$(I, q, o, f_{min})$;
            if $f < f_{min}$,
                set $f_{min} \leftarrow f$,
                set $I_{min} \leftarrow I$.
        Return $I_{min}$.

    countFits$(I, q, o, f_{max})$
        Set $f \leftarrow 0$.
        For each $j \leftarrow 1, 2, 3, \ldots, q$,
            set $s \leftarrow S_j$;
            for each $i$ in $I_s$,
                if $o \land L[i] = 0$,
                    set $f \leftarrow f + 1$;
                    if $f \ge f_{max}$, return $f$.
        Return $f$.
countBits$(x)$ returns the number of bits set in bit field $x$ (which is another operation that most modern processors implement in silicon). Also note that countFits only counts image fits up to a supplied maximum. (Since we are only looking for the cell with the minimum fits, counting beyond the minimum found so far is unnecessary.)
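The two-stage selection and the early exit in countFits can be exercised on a tiny example. The Python below is a sketch under assumptions, not the solver's code: the function names, the argument lists (the warehouse `S`, the neighbor fields `Nbr`, and the volume are passed explicitly), and the one-dimensional four-cell strip used for the usage lines are all made up for illustration.

```python
from math import inf


def count_fits(I, q, o, f_max, S):
    """Count images of available shapes that fit, giving up once f_max is reached."""
    f = 0
    for j in range(q):
        for image in I[S[j]]:
            if o & image == 0:
                f += 1
                if f >= f_max:
                    return f
    return f


def select_fila_e(q, o, volume, Nbr, S, get_image_list_set):
    holes = ~o & ((1 << volume) - 1)
    n_min, C = inf, []
    for v in range(volume):
        if (holes >> v) & 1:
            n = bin(Nbr[v] & holes).count("1")     # open neighbors of cell v
            if n < n_min:
                n_min, C = n, [v]
            elif n == n_min:
                C.append(v)
    f_min, I_min = inf, None
    for c in C:
        I = get_image_list_set(c, o)
        f = count_fits(I, q, o, f_min, S)
        if f < f_min:
            f_min, I_min = f, I
    return I_min


# Usage: a strip of 4 cells, one shape (a domino), cell 0 already filled.
DOMINOES = [0b0011, 0b0110, 0b1100]
Nbr = [0b0010, 0b0101, 0b1010, 0b0100]             # neighbor bits of each cell


def lists_for(c, o):
    """Stub image list set: every domino image covering cell c."""
    return [[m for m in DOMINOES if (m >> c) & 1]]


chosen = select_fila_e(1, 0b0001, 4, Nbr, [0], lists_for)
print(chosen)  # [[3, 6]]
```

Cells 1 and 3 each have a single open neighbor, so both become candidates; cell 1 is counted first and has one fitting image, and the count for cell 3 is abandoned as soon as it reaches that same value.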
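The function getImageListSet(c, o) returns an image list set $I$ (i.e., a set of lists of images). List $I_s$ in $I$ is a list of all images of shape $s$ that cover $c$ with the following two restrictions:

1. No image conflicts with an adjacent neighbor of $c$ that is currently occupied.
2. If the heuristic fills cells in a fixed order, no image conflicts with a cell that must already have been filled before $c$ was selected as a target.

The first property is guaranteed by NOF. The second is guaranteed by POF.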
Each ordering heuristic holds a two-dimensional array, $A$, of image list sets. Each entry $A_{c,z}$ is an image list set composed specifically for cell $c$ and for the occupancy state of adjacent neighbors encoded in the index variable $z$. The number $z$ is called an image list set index (ILSI). Without explaining how an ILSI is calculated, getImageListSet looks (roughly) like this:
    getImageListSet$(c, o)$
        Set $z \leftarrow$ getIlsi$(c, o)$.
        Return $A_{c, z}$.
So you simply calculate an ILSI $z$ for cell $c$, and then return the $z$^{th} image list set for cell $c$ from the matrix $A$. I'm glossing over one detail here: each cell can use a different (optimized) getIlsi function. An updated (real) version of getImageListSet is given below after I've explained how the ILSI $z$ is calculated and, by implication, how the associated image list set is defined.
As a group, the six adjacent neighbors of some cell $c$ can take on $2^6 = 64$ different compound states, and each of these states will have associated with it a different image list set. (Note that some neighbor occupancy states for some cells cannot be entered, and the associated image list sets for these states need not be populated.) We want to extract from the puzzle occupancy state $o$ just the six bits that represent the occupancy state of the six adjacent neighbors of $c$. Then we'll take those bits and repack them into a new bit field that is just 6 bits long. This six-bit bit field is our ILSI.
The ILSI is constructed as follows. The highest order bit of an ILSI (bit 5) is always loaded with the bit representing the occupancy state of the adjacent neighbor in the $-x$ direction relative to cell $c$. Similarly bits 4, 3, 2, 1, and 0 respectively are loaded with the bit representing the occupancy state of the adjacent neighbors in the $-y$, $-z$, $+x$, $+y$, and $+z$ directions. These assignments of neighbors to bit positions in the ILSI are arbitrary, but must be consistent. If one or more of the six neighbors of $c$ are outside the puzzle bounds (and therefore are not represented by any bit in $o$), then a 1 is loaded into the corresponding bits in the ILSI (so that a zero consistently identifies an open neighbor).
The resulting six-bit bit field is then interpreted as an integer between 0 and 63, which is in turn used as the second index into the matrix $A$ to retrieve an image list set that contains all puzzle images that fill cell $c$, but do not conflict with the occupied neighbors of $c$ and (if POF filtering is possible) do not conflict with cells that must have been filled prior to the selection of $c$ as a target. This process of identifying the occupancy states of neighbors and then returning an image list set from which all images that conflict with occupied neighbors have been filtered is what I mean by NOF.
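The precalculation side of NOF amounts to filtering one cell's image lists 64 times, once per ILSI. The Python sketch below shows the idea for a single cell; it is an assumption-laden illustration (the function name `build_nof_sets`, the argument layout, and the three-cell strip in the usage lines are hypothetical), and it populates all 64 entries even though some of them can never be reached.

```python
def build_nof_sets(image_lists, neighbor_masks):
    """Return A where A[z] is the image list set of one cell for ILSI z.

    image_lists[s]    : bitmasks of the images of shape s that cover the cell
    neighbor_masks[k] : occupancy bit of the neighbor tied to ILSI bit 5 - k
                        (order -x, -y, -z, +x, +y, +z), or 0 if no such neighbor
    """
    A = []
    for z in range(64):
        blocked = 0
        for k, mask in enumerate(neighbor_masks):
            if (z >> (5 - k)) & 1:         # this neighbor is flagged occupied
                blocked |= mask
        A.append([[img for img in images if img & blocked == 0]
                  for images in image_lists])
    return A


# Usage: the middle cell of a 1x3 strip laid out along x, with one shape
# (a domino). Its -x neighbor is cell 0 and its +x neighbor is cell 2.
A = build_nof_sets([[0b011, 0b110]], [0b001, 0, 0, 0b100, 0, 0])
print(A[0b011011])   # both x neighbors open    -> [[3, 6]]
print(A[0b111011])   # -x neighbor occupied     -> [[6]]
print(A[0b111111])   # both x neighbors filled  -> [[]]
```

Nonexistent neighbors carry a zero mask, so the 1 loaded into their ILSI bits never removes an image.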
Figure 3 graphically depicts this process for a 10x6 pentomino puzzle that is in the process of being solved using the F heuristic. The cells are numbered from 0 to 59. These cell numbers identify the position of the bit in the occupancy state $o$ that indicates whether the cell is filled. Knowing that the F heuristic picks open cells in order, we know that cell 0 was targeted first, then cell 3, then cell 6, and then cell 10. The next hole is cell 14, which is our current target. The adjacent neighbors of cell 14 are cells 8, 13, 15, and 20. These bits are extracted from $o$ and loaded into their preassigned bit positions in the ILSI. Since this is a two-dimensional puzzle, the bits in the ILSI corresponding to the neighbors in the $-z$ and $+z$ directions are each loaded with a 1. The resulting ILSI bit field has the binary value $111101$, which has a decimal value of 61, so the 61^{st} image list set for cell 14 will be returned.
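| Bit Number | 59-54 | 53-48 | 47-42 | 41-36 | 35-30 | 29-24 | 23-18 | 17-12 | 11-6 | 5-0 |
|---|---|---|---|---|---|---|---|---|---|---|
| Occupancy State | 000000 | 000000 | 000000 | 000000 | 000000 | 000000 | 100111 | 110011 | 111111 | 111111 |

| Bit Number | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|
| Neighbor Direction | $-x$ | $-y$ | $-z$ | $+x$ | $+y$ | $+z$ |
| Neighbor Location | $8$ | $13$ | $-$ | $20$ | $15$ | $-$ |
| ILSI | 1 | 1 | 1 | 1 | 0 | 1 |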
To do this algorithmically, we'll start by defining some additional fields for each cell $c$. Recall that $B[c]$ is a bit field with a single bit set in the same position as $c$'s occupancy bit in $o$. Define $N_{x^-}[c]$ to be the bit field $B$ of the neighbor adjacent to $c$ in the $-x$ direction, or a zeroed bit field if no such neighbor exists. Similarly define $N_{y^-}[c]$, $N_{z^-}[c]$, $N_{x^+}[c]$, $N_{y^+}[c]$, and $N_{z^+}[c]$ to be the $B$ fields of the adjacent neighbors of $c$ in the $-y$, $-z$, $+x$, $+y$, and $+z$ directions respectively. With these definitions, we can now write getIlsi(c, o). Although there are far more concise ways to write getIlsi, I have found an obnoxious set of nested if statements six levels deep with hard-coded integer return values to be far faster than several other approaches I've tried. Here is a portion of one way to implement getIlsi that is quite fast:
getIlsi$(c, o)$
Set $h \leftarrow \lnot o$.
If $N_{x^-}[c] \land h \ne 0$,
    if $N_{y^-}[c] \land h \ne 0$,
        if $N_{z^-}[c] \land h \ne 0$,
            if $N_{x^+}[c] \land h \ne 0$,
                if $N_{y^+}[c] \land h \ne 0$,
                    if $N_{z^+}[c] \land h \ne 0$,
                        return 0;
                    else
                        return 1;
                else
                    if $N_{z^+}[c] \land h \ne 0$,
                        return 2;
                    else
                        return 3;
            else
                if $N_{y^+}[c] \land h \ne 0$,
                    if $N_{z^+}[c] \land h \ne 0$,
                        return 4;
                    else
                        return 5;
                else
                    if $N_{z^+}[c] \land h \ne 0$,
                        return 6;
                    else
                        return 7;
        else
            $\ldots$
                    else
                        return 63.
Notice that the first thing I do in this version of getIlsi is to negate the occupancy state to produce a new bit field $h$ that contains a 1 for each hole in the puzzle. This is done because an operation like $N_{z^+}[c] \land o$ will produce a zero result if the neighbor is either empty or nonexistent. This behavior is not well suited for producing an ILSI, since we want bit positions corresponding to nonexistent neighbors to be loaded with a 1 and bit positions corresponding to empty neighbors to be loaded with a 0. To avoid this ambiguity, getIlsi(c, o) instead works with the puzzle holes $h$. Then $N_{z^+}[c] \land h$ will produce a nonzero result if and only if the neighbor exists and is unoccupied.
So the above implementation of getIlsi works fine, but the approach raises the question, “Why are we even checking the occupancy states of neighbors that don't exist?” For fixed-order heuristics, there will also be neighbors that are guaranteed to be occupied. Checking the occupancy states of those neighbors is equally wasteful.
In for a penny, in for a pound! My solver actually has not 1, but 64 different getIlsi methods (which I wrote with a little code generator I hacked out). They are named getIlsi00, getIlsi01, $\ldots$, getIlsi63. The number in the name of each function, when interpreted as an ILSI, conveys the ILSI bits that the function assumes are known to be set. The function won't check the occupancy of the neighbors that correspond to those bit positions, and simply returns an ILSI with those same bits always set. These functions can be accessed through an array $G$ indexed by ILSI, so that, for example, $G_{34} =$ getIlsi34. Then, as part of initialization, each heuristic, for each puzzle cell $c$, determines the ILSI bits which must be set for the cell and composes an ILSI mask $m_c$ with these bits set and all other bits clear. Then a list $Z$ is defined for the heuristic which associates each cell with its appropriate getIlsiXX method: $Z_c = G_{m_c}$. We can now update our previous implementation of getImageListSet to use an optimized getIlsi method:
getImageListSet$(c, o)$
    Set $z \leftarrow Z_c(c, o)$.
    Return $A_{c, z}$.
So instead of invoking a general-purpose getIlsi method, the particular getIlsiXX method best suited for cell $c$ (referenced through the list entry $Z_c$) is invoked.
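To make the dispatch idea concrete, here is a rough Python sketch. It is not the generated code used in polycube: `make_get_ilsi` and `get_image_list_set` are hypothetical names, the specialized functions take the six neighbor bit fields of the cell as their first argument (an assumption made to keep the example self-contained), and closures stand in for the 64 hand-unrolled functions, so none of the speed benefit carries over.

```python
# Values of the six ILSI bit positions, highest first: -x, -y, -z, +x, +y, +z.
BIT_VALUES = (32, 16, 8, 4, 2, 1)


def make_get_ilsi(known):
    """Build the analogue of getIlsiXX where XX == known.

    Bits set in `known` are assumed to always be set, so the neighbors
    in those positions are never examined.
    """
    tested = [(i, v) for i, v in enumerate(BIT_VALUES) if not known & v]

    def get_ilsi_xx(neighbors, o):
        # neighbors: the six neighbor bit fields of the target cell
        ilsi = known
        for i, v in tested:
            if neighbors[i] & o:  # this neighbor is occupied
                ilsi |= v
        return ilsi

    return get_ilsi_xx


# G is indexed by ILSI mask, so G[57] plays the role of getIlsi57.
G = [make_get_ilsi(m) for m in range(64)]


def get_image_list_set(c, o, A, Z, N):
    """A[c][z]: image list sets; Z[c]: specialized function; N[c]: neighbor fields."""
    z = Z[c](N[c], o)
    return A[c][z]
```

Because the tested neighbors are guaranteed to exist, there is no need to negate $o$ here; an occupied neighbor simply sets its bit.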
For example, for the problem depicted in Figure 3, cell 0 is associated with getIlsi57, since only the two neighbors in the $+x$ and $+y$ directions (which correspond to bits 2 and 1 of the ILSI) can possibly be open. That method's implementation looks like this:
getIlsi57$(c, o)$
If $N_{x^+}[c] \land o \ne 0$,
    if $N_{y^+}[c] \land o \ne 0$,
        return 63;
    else
        return 61;
else
    if $N_{y^+}[c] \land o \ne 0$,
        return 59;
    else
        return 57.
Because these specialized functions never even look at nonexistent neighbors, there's no need to negate the occupancy state as was done for the general-purpose getIlsi. Since the F heuristic is being used to solve that puzzle (which guarantees that cells are targeted in their numbered order), it is always the case that cells to the left of and below the target are filled. So all cells in this puzzle except those on the top row and the rightmost column would be associated with getIlsi57.
Cells 5, 11, 17, …, 53, which can only possibly have an open neighbor in the $+x$ direction, are bound to getIlsi59:
getIlsi59$(c, o)$
If $N_{x^+}[c] \land o \ne 0$,
    return 63;
else
    return 59.
Cells 54, 55, 56, 57 and 58 can only have an open neighbor in the $+y$ direction and are bound to getIlsi61:
getIlsi61$(c, o)$
If $N_{y^+}[c] \land o \ne 0$,
    return 63;
else
    return 61.
And cell 59, which can have no open neighbors, is bound to getIlsi63:
getIlsi63$(c, o)$
Return 63.
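The bindings listed above follow mechanically from the mask $m_c$. The sketch below derives them for the 10x6 puzzle under the F heuristic; `ilsi_mask` is a hypothetical helper, and the cell numbering (cell number = 6x + y) is again inferred from Figure 3. A neighbor's bit is known to be set when the neighbor does not exist or has a lower number than the target.

```python
WIDTH, HEIGHT = 10, 6  # assumed layout: cell number = x * HEIGHT + y


def ilsi_mask(c):
    """ILSI bits guaranteed to be set when cell c is targeted by the F heuristic."""
    x, y = divmod(c, HEIGHT)
    neighbors = [  # (ILSI bit value, neighbor cell number or None if nonexistent)
        (32, c - HEIGHT if x > 0 else None),          # -x
        (16, c - 1 if y > 0 else None),               # -y
        (8, None),                                    # -z
        (4, c + HEIGHT if x < WIDTH - 1 else None),   # +x
        (2, c + 1 if y < HEIGHT - 1 else None),       # +y
        (1, None),                                    # +z
    ]
    m = 0
    for value, n in neighbors:
        if n is None or n < c:  # nonexistent, or necessarily filled before c
            m |= value
    return m


bindings = {}
for c in range(WIDTH * HEIGHT):
    bindings.setdefault(ilsi_mask(c), []).append(c)

for mask in sorted(bindings):
    print(f"getIlsi{mask}: {len(bindings[mask])} cells")
```

Only four of the 64 specialized functions end up in use for this puzzle and heuristic.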
I know this all seems daft (see my web site name), but in my testing, these specialized getIlsiXX methods improved overall run times for some 2D puzzles by about 10% compared to a getIlsi method written in a single line that just checks all 6 neighbors and sums (or bitwise ORs) the corresponding bit values.
So that pretty much completes the algorithm description, but I still haven't detailed exactly what I mean by priority occupancy filtering (POF). POF doesn't affect the solver algorithm at all, but it does affect how the image list sets are composed as explained in the next section.
Consider again the partially solved puzzle shown in Figure 3. Recall that the ILSI for target cell 14 is 61. So image list set $A_{14,61}$ would be returned from getImageListSet(c, o). Exactly what images are in that set? If only NOF is applied, the answer is any bounded image that covers cell 14 but does not cover cells 8, 13, or 20 as shown in Figure 4. For ordering heuristics that target cells in an unpredictable order (like heuristics e and s in my solver), this is a complete definition of $A$.
But because in this example we're using the F heuristic (which always targets cells in their numbered order), we also know that in order for cell 14 to be targeted, all cells with a smaller number must also be occupied as shown in Figure 5:
So we can also filter from list $A_{14,61}$ all images that conflict with the black cells above. This is priority occupancy filtering: excluding from all lists $A_{c,z}$ (for all $0 \le z \le 63$) any image that conflicts with cells that must have been filled before $c$ is selected as a target by a fixed-order heuristic. Understand that in this example, the cells that must be previously filled are those cells with a lower number than the target, but that's only because this example uses the F heuristic, which targets the lowest numbered hole. In general the order in which cells are filled by a fixed-order heuristic can vary, but POF will work with any fixed-order heuristic to eliminate all images that conflict with any set of cells that must have been previously filled by that heuristic.
Combining the occupancies in Figures 4 and 5 produces the occupancy map shown in Figure 6:
And so through the combined application of NOF and POF filtering, image list set $A_{14,61}$ is loaded only with those images that cover cell 14 but avoid all of the black cells in Figure 6.
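The combined filter is easy to state in code. The following is a simplified sketch, not the logic of polycube's loadImages() routines: `mask_of` and `build_image_list` are hypothetical names, the result is one flat list rather than one list per piece, and the four candidate images are toy trominoes chosen only to exercise each rule.

```python
def mask_of(*cells):
    """Layout bitmask with one bit set per covered cell."""
    m = 0
    for cell in cells:
        m |= 1 << cell
    return m


def build_image_list(images, c, z, neighbors, must_be_filled=0):
    """Images that cover cell c and survive NOF (for ILSI z) and POF.

    neighbors: bit fields of the -x, -y, -z, +x, +y, +z neighbors (0 if none).
    must_be_filled: POF mask of cells known to be filled before c is targeted.
    """
    blocked = must_be_filled
    for i, n in enumerate(neighbors):
        if z & (32 >> i):  # ILSI bit set: neighbor occupied (or nonexistent)
            blocked |= n
    target = 1 << c
    return [img for img in images if img & target and not img & blocked]


NEIGHBORS_14 = [mask_of(8), mask_of(13), 0, mask_of(20), mask_of(15), 0]
IMAGES = [
    mask_of(14, 15, 16),  # covers 14 and avoids everything blocked
    mask_of(14, 20, 26),  # covers occupied neighbor 20
    mask_of(9, 14, 15),   # avoids the neighbors but covers lower-numbered cell 9
    mask_of(15, 16, 17),  # does not cover the target at all
]

nof_only = build_image_list(IMAGES, 14, 61, NEIGHBORS_14)
nof_and_pof = build_image_list(IMAGES, 14, 61, NEIGHBORS_14,
                               must_be_filled=(1 << 14) - 1)
print(len(nof_only), len(nof_and_pof))
```

With NOF alone the first and third images survive; adding the POF mask for the F heuristic (all cells numbered below 14) removes the third as well.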
If you want to know more about how to algorithmically set up these image list sets, take a look at my source code for OrderingHeuristicStore::loadImages(), OrderingHeuristic::initNeighborOccupancy(), OrderingEntity::loadImages(), and OrderingEntity::initPriorityOccupancy().
We'll start by taking a macro view of the algorithms, comparing the overall performance characteristics of FILA both with and without NOF enabled relative to and in coordination with other good puzzle solving tactics. Then we'll take a micro view to better understand the effects of NOF filtering on a per-target-cell basis. Finally I'll make some brief statements comparing the performance of this new polycube version 2.0 to the previously available polycube version 1.2.1.
Figure 7 shows four puzzles used to analyze the performance of polycube 2.0 and FILA. Table 1 shows the results of several test cases run on each of these puzzles. Each series of tests starts with straight DLX using Knuth's S heuristic (which picks the cell or piece target that has the fewest fit options), and with the -r option enabled to eliminate rotationally redundant solutions. (Some of these puzzles take an annoyingly long time to run without that optimization. And who wants rotationally redundant solutions anyway?) Each successive test in a test group adds one additional feature or optimization so you can see the incremental effect of each. The key below the table explains everything. A discussion of the test case results follows.




| Test Case | Command Line | Fits | $\Delta$ % | NoFits | $\Delta$ % | Run Time (hh:mm:ss) | $\Delta$ % | Solutions |
|---|---|---|---|---|---|---|---|---|
| P1 | ./polycube -i -q -r def/pentominoes_10x6.txt | 892,247 | | 0 | | 00:00:02.082 | | 2339 |
| P2 | ./polycube -i -q -r -V def/pentominoes_10x6.txt | 768,356 | -13.9% | 0 | 0.0% | 00:00:01.754 | -15.8% | 2339 |
| P3 | ./polycube -i -q -r -V -of=11 def/pentominoes_10x6.txt | 1,000,250 | +30.2% | 0 | 0.0% | 00:00:02.050 | +16.9% | 2339 |
| P4 | ./polycube -i -q -r -V -of=11 -f11 def/pentominoes_10x6.txt | 2,091,215 | +109.1% | 13,106,789 | +$\infty$% | 00:00:00.168 | -91.8% | 2339 |
| P5 | ./polycube -i -q -r -V -of=11 -f11 -n def/pentominoes_10x6.txt | 2,091,215 | 0.0% | 4,682,886 | -64.3% | 00:00:00.157 | -6.3% | 2339 |
| OP1 | ./polycube128 -i -q -r def/pentominoes_1s_18x5.txt | 1,816,931,170 | | 0 | | 01:14:18.016 | | 686,628 |
| OP2 | ./polycube128 -i -q -r -V def/pentominoes_1s_18x5.txt | 1,771,195,065 | -2.5% | 0 | 0.0% | 01:11:25.145 | -3.9% | 686,628 |
| OP3 | ./polycube128 -i -q -r -V -of=17 -f def/pentominoes_1s_18x5.txt | 13,151,493,569 | +642.5% | 83,733,447,441 | +$\infty$% | 00:23:07.672 | -67.6% | 686,628 |
| OP4 | ./polycube128 -i -q -r -V -of=17 -f17 -n def/pentominoes_1s_18x5.txt | 13,151,493,569 | 0.0% | 25,422,589,384 | -69.6% | 00:19:12.259 | -17.0% | 686,628 |
| TC1 | ./polycube -i -q -rL def/tetriscube.txt | 30,255,329 | | 0 | | 00:01:52.951 | | 9839 |
| TC2 | ./polycube -i -q -rL -f11 -oe=11 def/tetriscube.txt | 48,705,459 | +61.0% | 1,093,916,558 | +$\infty$% | 00:00:22.206 | -80.3% | 9839 |
| TC3 | ./polycube -i -q -rL -f11 -oe=11:f=3 def/tetriscube.txt | 80,346,268 | +65.0% | 1,526,897,959 | +39.6% | 00:00:19.945 | -10.2% | 9839 |
| TC4 | ./polycube -i -q -rL -f11 -oe=11:f=3 -n def/tetriscube.txt | 80,346,268 | 0.0% | 393,143,352 | -74.3% | 00:00:17.007 | -14.7% | 9839 |
| PT1 | ./polycube -i -q -r def/PT12.txt | 207,341,751 | | 0 | | 00:10:45.529 | | 51,184 |
| PT2 | ./polycube -i -q -r -V13 def/PT12.txt | 78,145,746 | -62.3% | 0 | 0.0% | 00:03:02.362 | -71.7% | 51,184 |
| PT3 | ./polycube -i -q -r -V13 -f13 -oe=13 def/PT12.txt | 153,069,413 | +95.9% | 1,094,305,862 | +$\infty$% | 00:01:10.218 | -61.5% | 51,184 |
| PT4 | ./polycube -i -q -r -V13 -f13 -oe=13:f=3 def/PT12.txt | 185,469,244 | +21.2% | 1,203,943,050 | +10.0% | 00:01:07.707 | -3.6% | 51,184 |
| PT5 | ./polycube -i -q -r -V13 -f13 -oe=13:f=3 -n def/PT12.txt | 185,469,244 | 0.0% | 291,191,337 | -75.8% | 00:01:00.625 | -10.5% | 51,184 |
KEY

All test cases were run on an Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz running Ubuntu Linux using only one thread on one processor.

| Item | Code | Meaning |
|---|---|---|
| Test Case | P | Pentomino 10x6 |
| | OP | One-Sided Pentomino 18x5 |
| | TC | Tetris Cube |
| | PT | Tetromino+Pentomino 13x13 Diamond |
| Command Line | | This is the command line you can use to reproduce the test. Two different builds of polycube were used: polycube uses a 64-bit occupancy bit field; polycube128 was built with the preprocessor definition -DGRIDBITFIELD_SIZE=128 to produce a 128-bit occupancy field, which slows FILA but allows it to be activated earlier in the puzzle search process. The command line options used are summarized below. Additional details of these and other command line options can be found by running polycube with the help option, or reading README.txt. |
| | -i | info: Turns on informational output including statistics and performance measurements. |
| | -q | quiet: Turns off solution output (so as not to impact performance measurements). |
| | -r | redundancyFilter: Attempts to eliminate rotationally redundant solutions by constraining the position and/or rotation of one uniquely shaped puzzle piece. If no argument is given, a piece is chosen for you; or you can supply the name of a piece to attempt to pick a better piece yourself. |
| | -V | volumeFilter: With no arguments (as used here), before the search for puzzle solutions begins, every bounded image is considered to see if placing it will partition the puzzle region into two or more isolated subregions with at least one of those subregions having a volume that cannot be matched by any subset of the remaining pieces. Each such image found is filtered out (removed). |
| | -f<N> | fila: Activates FILA every time N pieces remain to be placed. All solves start with DLX. Each time the number of remaining pieces hits the number N, a FILA data model of the remaining open space and remaining pieces (as modeled by the DLX matrix) is constructed, DLX is deactivated, and FILA is activated. When FILA has completed exploration of this subpuzzle, DLX continues where it left off. |
| | -o<H> | order: Sets a colon-separated list of ordering heuristic configurations, H. For example, e=11:f=5 activates the estimated-most-constrained-hole heuristic when 11 pieces are left, and activates the first-hole heuristic when 5 pieces are left. |
| | -n | nof: Enables neighbor occupancy filtering (NOF). |
| No Fits | | The number of times an algorithm attempts to place a piece in the puzzle only to find it doesn't fit. |
| Fits | | The number of times an algorithm successfully places a piece. |
| Run Time | | The run time of the program in hours, minutes and seconds. This is the total program run time including program load, puzzle parsing, puzzle and solver initialization, the solve itself, and all cleanup time. This detail is not really important since in all cases the solve took more than 99% of the run time. |
| Solutions | | The total number of rotationally unique solutions found. |
| $\Delta$ % | | The incremental percent change of the statistic to the left from the previous row to the current row. |
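The $\Delta$ % columns can be reproduced from adjacent rows of Table 1. Here is a minimal sketch (`pct_change` is a hypothetical helper name); reductions come out negative, and a change from zero, as when FILA first introduces no-fits, has no finite percentage, which is why the table reports it as infinite.

```python
def pct_change(previous, current):
    """Incremental percent change from the previous row to the current row."""
    return (current - previous) / previous * 100.0


# Values taken from rows P1, P2, P4 and P5 of Table 1.
print(round(pct_change(892_247, 768_356), 1))       # fits, P1 to P2
print(round(pct_change(2.082, 1.754), 1))           # run time in seconds, P1 to P2
print(round(pct_change(13_106_789, 4_682_886), 1))  # no-fits, P4 to P5
```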
The first set of test cases (P) operates on the 10x6 pentomino puzzle. Test case P1 uses DLX with the (default) S heuristic and the rotational redundancy filter enabled. Of the $63 \times 60 = 3780$ piece images that could possibly be placed anywhere in the puzzle, only 2056 are bounded to the 10x6 puzzle region, and so the DLX matrix begins with 2056 rows. The rotational redundancy filter selects a uniquely shaped piece (if available) to rotationally and/or translationally constrain to prevent rotationally redundant solutions from ever being discovered and (as a beneficial side effect) to significantly reduce program run times. In this case it chooses to constrain piece X. Originally, piece X has 32 images that fit in the puzzle. After constraint, only 8 images positioned in the lower-left quadrant of the puzzle remain (reducing the total number of rows in the matrix to 2032). This is identical to how Fletcher and de Bruijn started their algorithms, save that they placed the X piece in the top-left quadrant, and excluded the image of the X piece jammed into the corner (which obviously can lead to no solutions). Because the X piece now has so few fit options it becomes the first target of DLX, so the algorithm begins by placing one of these 8 images as the first step, again just like Fletcher and de Bruijn. With this configuration, DLX finds all 2339 solutions in 2.082 seconds.
Test case P2 adds a one-time application of the volume constraint filter to all images as a preliminary step of solver processing. This filter examines the placement of each image in the DLX matrix (one at a time) to determine if it results in a partitioning of the puzzle into two or more subregions where at least one subregion has a volume that cannot possibly be equaled by any combination of the remaining pieces. If so, that image is discarded from the DLX matrix. For this puzzle, this eliminates 125 of the 2032 bounded images, or about 6.2%. Among these is one of the remaining images of the X piece that was jammed into the lower-left corner of the puzzle, reducing the number of images of piece X to 7 (which completes the replication of the starting conditions used by Fletcher and de Bruijn). This filtering took 1.3 msec of processing, but reduced the total run time by 328 msec, or by 15.8%: a good investment.
Test case P3 disables the default DLX S heuristic (which always picks a column from the DLX matrix with a minimum number of entries), and enables the F heuristic. My F heuristic when applied to DLX is identical to Fletcher's F heuristic except that it (like all of my DLX heuristics) has an overriding behavior of always selecting a DLX matrix column with zero or one 1s over any other target normally selected by the heuristic. This increased the run time back up to 2.050 seconds. This was a bad idea for this puzzle: sometimes DLX performance can be improved with an ordered fill (like that enforced by the F heuristic), but not for such a small puzzle. The motivation for this test case was to allow an apples-to-apples comparison between DLX and FILA with test case P4.
Test case P4 enables FILA each time 11 pieces are remaining. (So I still use DLX to first place the X piece in one of 7 positions in the lower-left quadrant, but then FILA is used to place the remaining 11 pieces. Currently, no available FILA heuristic can select a piece as a target; only cells are selected, so I still always use DLX to place at least one piece when using the rotational redundancy filter.) FILA runs about 12 times faster than DLX using the F heuristic, finding all 2339 solutions in just 0.168 seconds. This is despite the fact that I let DLX cheat by picking columns of size 0 (which leads to an immediate backtrack) or size 1 (where there is but one fill choice) over the first cell normally picked by the F heuristic.
In test case P5, I enabled NOF filtering (POF cannot be turned off). First notice that the number of images that failed to fit in the puzzle was reduced by almost two-thirds (64.3%), from 13.1 million down to just 4.7 million. The elimination of these 8.4 million useless fit checks saved an additional 11 msec of processing time, bringing the total run time down by 6.3% to just 0.157 seconds.
The second set of test cases (OP) examines the problem of placing the 18 one-sided pentominoes in an 18x5 box as shown in Figure 7. The set of one-sided pentominoes is the set of pentominoes unique under rotation in the plane but not reflection. For these tests I compiled the solver so that FILA uses a 128-bit occupancy bit field. This is obviously a little slower, but does allow FILA to be used for the entire puzzle. (With the default 64-bit occupancy bit field, only the last 12 pieces could be placed with FILA. This is probably all you really need since for these types of puzzles, the vast majority of the work is typically done placing the last several pieces, and so it only really matters that FILA be active for these last pieces. But for these test cases, to keep things clear and simple, I wanted to show how FILA performs on the whole puzzle.)
Test case OP1 again uses DLX with the rotational redundancy filter to find all 686,628 solutions in 1 hour 14 minutes 18 seconds.
In test case OP2, the volume filter is added to reduce the run time by 3.9% down to 1 hour 11 minutes 25 seconds.
In test case OP3, FILA was activated using the F heuristic when 17 pieces remain, reducing the run time by 68% down to just 23 minutes 8 seconds.
In test case OP4, NOF was enabled, which reduced the number of non-fitting images by 70% and the overall run time by an additional 17% down to 19 minutes 12 seconds.
The third set of test cases examines the Tetris Cube puzzle. This puzzle has 12 oddly shaped pieces that must be placed in a 4x4x4 box as shown in Figure 8.
Test case TC1 starts like other test cases with DLX, the default S heuristic, and the redundancy filter enabled. Notice that in the TC test group, I supplied the argument L to the -r option. This deserves some explanation. The redundancy filter with no argument given picks a uniquely shaped piece that does the best job of eliminating rotationally redundant solutions by rotationally and/or translationally constraining that piece. If multiple pieces are equally effective in this regard, then it picks a piece among those candidates that has the minimum number of constrained images. I no longer think this is the best rule to use as a tiebreaker. I now believe that instead picking a piece that is large and/or complicated may be a better choice. This makes intuitive sense: it's easier to place the large hard pieces first, and then fill in the smaller more flexible pieces around the big complicated piece, than to place the little easy pieces first and then hope you happen to form a void that the large complicated piece happens to fit nicely into. For example, people who have spent significant time playing with pentomino puzzles know that piece X is difficult to place, and hence it makes a good choice for constraint. I'm not sure how to gauge what makes a piece 'complicated', and so I have not yet tried to modify polycube's selection criteria. In any case, my auto-selection routine does not always make the best choice for the piece to rotationally and/or translationally constrain to eliminate rotationally redundant solutions. Stephan Westen discovered that piece L in the Tetris Cube is a much better choice, so in this example I'm passing piece L as the argument to the -r option to force the redundancy filter to constrain that piece to eliminate rotationally redundant solutions. Under this configuration, the solver found all 9839 solutions in about 1 minute 53 seconds. Pieces were placed in the box (and subsequently removed) around 30 million times.
Test case TC2 enables FILA after DLX places the first piece, and also switches from the S heuristic to my E heuristic at the same time (as described above). Notice that the number of fits increases by about 60% to 49 million, indicating the E heuristic does not do as good a job of picking the minimum fit target. The number of no-fit images found by searchFila also increases from 0 to over a billion, and this does not count the vastly larger number of fit checks performed by the E heuristic itself. But despite these extensive activities and degraded ability to pick the minimum fit target, use of FILA with the E heuristic yields a better than 5-fold increase in solver performance, reducing the total run time by 80.3% down to just 22.2 seconds.
Test case TC3 switches from the E heuristic to the much lighter weight F heuristic when only 3 pieces remain. All the time spent counting neighbor holes at each remaining cell, and then doing fit counts for those candidate cells with a minimum number of open neighbors, just can't pay off when there are so few pieces left. You are better off just placing pieces as fast as you can with the lightweight F heuristic. This again increases the number of image fits by 65% to 80 million, and the number of no-fit images to 1.5 billion, but actually reduces the run time by another 10.2% down to 19.9 seconds.
Test case TC4 enables NOF filtering, which reduced no-fit images by 74% from 1.5 billion down to just 393 million. Note that NOF not only reduces the number of fit checks made by searchFila, but also the fit checks made by the E heuristic. This efficiency in image processing reduced run time by 14.7% to just 17.0 seconds.
The fourth set of test cases (PT) examines the problem of placing the 12 pentominoes and 5 tetrominoes into a diamond-shaped puzzle measuring 13 squares wide and 13 high as shown in Figure 7. Five squares are eliminated from the center to achieve the correct volume.
Test case PT1 uses DLX, with the S heuristic, and the redundancy filter enabled. All 51,184 unique solutions are discovered in about 10 minutes 46 seconds.
Test case PT2 enables the volume filter, but instead of only applying the filter once at the beginning, the volume filter is reapplied after every piece placement until fewer than 13 pieces remain to be placed. This technique is particularly effective on this puzzle because the jagged puzzle edges and the central island make it susceptible to partition. This produced a better than 3-fold improvement in solver speed, reducing the total run time by 72% down to just 3 minutes 2 seconds.
Test case PT3 enabled FILA and the estimate heuristic when 13 pieces remain, reducing the run time by another 62% down to just 1 minute 10 seconds.
Test case PT4 uses the lighter weight F heuristic for the last 3 piece placements giving an additional small performance improvement of 3.6%, reducing the total run time by another 2 seconds down to 1 minute 8 seconds.
Test case PT5 enabled NOF. No fit images were reduced by 76% (the largest percent reduction seen over all test cases examined in this document), and run time was reduced by another 10.5%, down to 1 minute 1 second.
Let's focus on just test case P4 where FILA was used with the F heuristic without NOF enabled to solve the pentominoes 10x6 problem. The informational output from that run includes the following:
# Number of placement attempts when N pieces were left to be placed:
ATTEMPTS[ 1]= 301677
ATTEMPTS[ 2]= 3478035
ATTEMPTS[ 3]= 5722296
ATTEMPTS[ 4]= 3665538
ATTEMPTS[ 5]= 1284992
ATTEMPTS[ 6]= 386776
ATTEMPTS[ 7]= 200366
ATTEMPTS[ 8]= 126819
ATTEMPTS[ 9]= 28279
ATTEMPTS[10]= 3088
ATTEMPTS[11]= 131
ATTEMPTS[12]= 7
# Number of fits when N pieces were left to be placed:
FITS[ 1]= 2339
FITS[ 2]= 302256
FITS[ 3]= 760374
FITS[ 4]= 617667
FITS[ 5]= 272072
FITS[ 6]= 82406
FITS[ 7]= 26950
FITS[ 8]= 17275
FITS[ 9]= 7994
FITS[10]= 1744
FITS[11]= 131
FITS[12]= 7
This output gives, as a function of the remaining number of pieces $p$, the number of times the solver tried to place a piece in the puzzle (ATTEMPTS), and how many times it actually succeeded in placing a piece in the puzzle (FITS). We can learn a lot from this information through some simple calculations. This program output is transcribed to the second and third columns of Table 2. Subtracting fits from attempts gives the no-fits information in the fourth column. Note that each time a piece is successfully placed in the puzzle when, say, 5 pieces were left, the result is a recursive invocation of solveFila with the number of remaining pieces $p$ reduced by 1 to 4. So dividing the total number of attempts when 4 pieces were left (3,665,538) by the total number of fits when 5 pieces were left (272,072) gives the average number of piece fitting attempts per cell (or piece) targeted by a single recursive call to solveFila when $p = 4$: $3,665,538 / 272,072 = 13.473$. Similarly, dividing the total number of fits or no-fits when $p$ pieces are left by the total number of fits when $p+1$ pieces are left yields the number of times a piece fit or (respectively) didn't fit per target when $p$ pieces were left. This information is tabulated in the last three columns of Table 2 as attempts-per-target, fits-per-target, and no-fits-per-target.
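These per-target figures can be recomputed directly from the program output quoted above. The sketch below is only a convenience for checking the arithmetic (`per_target` is a hypothetical helper name); the totals it derives agree with the Fits and NoFits entries for test case P4 in Table 1.

```python
ATTEMPTS = {1: 301677, 2: 3478035, 3: 5722296, 4: 3665538, 5: 1284992, 6: 386776,
            7: 200366, 8: 126819, 9: 28279, 10: 3088, 11: 131, 12: 7}
FITS = {1: 2339, 2: 302256, 3: 760374, 4: 617667, 5: 272072, 6: 82406,
        7: 26950, 8: 17275, 9: 7994, 10: 1744, 11: 131, 12: 7}


def per_target(p):
    """(attempts, fits, no-fits) per target when p pieces are left.

    Each fit with p + 1 pieces left triggers one recursive call with p left,
    so FITS[p + 1] is the number of targets processed at level p.
    """
    targets = FITS[p + 1]
    attempts = ATTEMPTS[p] / targets
    fits = FITS[p] / targets
    return attempts, fits, attempts - fits


for p in range(11, 0, -1):
    a, f, n = per_target(p)
    print(f"p={p:2d}  attempts={a:7.3f}  fits={f:6.3f}  no-fits={n:7.3f}")

total_fits = sum(FITS.values())
total_nofits = sum(ATTEMPTS.values()) - total_fits
print(total_fits, total_nofits)
```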


Table 3 gives the same information as Table 2 but for test case P5 where NOF was enabled. Compare Tables 2 and 3 to verify that NOF doesn't affect fits at all; rather it only reduces the number of images that don't fit in the puzzle that must be processed by each recursive invocation of solveFila. Comparing the last column of Table 2 with the last column of Table 3 shows the level to which NOF filtering reduces the number of no-fit images for each invocation of solveFila. As can be determined from column fits-total, over 93% of solveFila invocations are for $p$ values from 1 to 5. In this range, when NOF is enabled, the total number of images that must be considered is (in the worst case) only about 8. Of these 8, fewer than 5 are no-fit images. So of the original 63 images that de Bruijn examined at every recursive step of his algorithm, FILA with NOF only has to look at 8, and almost half of these do actually fit. Why so few images?
Many images are eliminated on a cell-by-cell basis due to puzzle bounds considerations: of the $63 \times 60 = 3780$ images that one could try to place at the 60 puzzle cells, only 2056 are bounded by the puzzle walls. The rotational redundancy filter reduces the number of allowed placements of the X piece by 24 (from 32 down to 8). The volume filter eliminates another 125 images (including one of the remaining X images), reducing the total number of images down to $2056-24-125=1907$. Recall that DLX is used to place the X piece in one of 7 starting locations in the lower-left quadrant. Each such placement eliminates a large number of images (primarily from the left side of the puzzle) from the DLX matrix. For example, the last such placement (when the X piece is placed very close to the center of the puzzle) leaves the matrix with only 1101 images. These 1101 images are then used to populate the matrix of image list sets $A$ used by FILA. POF filtering would nominally keep the total number of images in each image list set to 63, but all of the other reductions to this point make these lists on average much smaller. For the case where only 1101 DLX images remain after placing the X piece for the last time, POF reduces the average number of images per image list set over the remaining 55 holes to just $1101 / 55 \approx 20.0$. This average of 20 is not typical since, for example, the image list sets for cells near the X piece and near the right border wall will have fewer images. Likewise cells just to the right of the X will have significantly more than the average 20. When work is most intense (when 4 pieces are left), about an additional 8/12 (67%) of these images are discounted simply because 8 pieces are unavailable (and so their images are never attempted). Finally, NOF filtering reduces the no-fit images in the image list sets by (on average) another 64.3% (as seen from test case P5 in Table 1) to produce the overall sizes seen here in Table 3.
After all this reduction, remember that each of the few remaining no-fit images that still has to be considered is ruled out by a single machine instruction that performs a bitwise AND of the puzzle occupancy state with the layout bitmask of the image (line 11 of solveFila). I am skeptical, therefore, that more specialized NOF image lists based on the occupancy states of additional neighbors near the target cell could possibly reduce no-fit images in sufficient number that the resulting reduction in image fit-check processing time could outweigh the increased processing time needed to calculate the more detailed ILSI. And this does not even consider the increased initialization times to generate the larger number of image list sets in $A$ (which grows by a factor of 2 for each additional neighbor considered).
Tables 4 and 5 provide the same piece-by-piece and per-target statistics for test cases TC3 and TC4 of the Tetris Cube. I won't bore you with as much detail here, but I wanted to impress upon you the usefulness of NOF when used in combination with the E heuristic and in higher dimensional puzzles (3D instead of 2D). Compare, for example, the number of no-fits-per-target when 4 pieces remain from Tables 4 and 5: by enabling NOF, the number of no-fit images is reduced from 27.7 all the way down to 3.2, a reduction factor of 8.6. There are two reasons for this large reduction. First, the E heuristic is not an ordered heuristic, so no POF filtering is possible. Whereas for the pentominoes puzzle there are only 63 images to choose from to populate the image list sets, the lack of POF filtering and the increased rotational freedom result in 1416 Tetris Cube piece images that could possibly populate each image list set at a cell. Again, puzzle boundary considerations, the rotational redundancy filter, and the placement of the first piece by DLX will drastically reduce the numbers of available images by the time FILA is actually activated, but we are still left with much larger image list sets. Second, because the E heuristic always targets a cell with a maximum number of occupied (or nonexistent) neighbors, it naturally targets cells that produce ILSI with many bits set, for which NOF filtering is most effective. Because no POF filtering is possible, all filtering is due to NOF, which makes NOF all the more useful for heuristics that don't follow a fixed targeting order.


Although NOF does seem to consistently provide a significant performance improvement, there were other software implementation changes that provided even greater performance benefits. Most significantly, the old EMCH algorithm counted open neighbors one neighbor at a time. FILA's new E heuristic uses either a silicon-based bit population count instruction (if available) or table-based lookups to count the number of neighbor holes at each cell, which is far faster. Similarly, my old variation of the de Bruijn algorithm iterated over the heads of the DLX matrix to find unoccupied cells. It did remember where it last left off (so it wasn't starting from the beginning with each request), but FILA's new F heuristic more efficiently iterates over the occupancy bit field looking for zeroes. It can use silicon-based instructions (if available) or table lookups to do this efficiently. These and other small optimizations together with NOF have improved the solve times for some puzzles by almost a factor of two. For example, the best solve time I can produce with polycube version 1.2.1 for the Tetris Cube is 31.6 seconds. Version 2.0 solves the same puzzle on the same machine in just 17.0 seconds (as noted above), a reduction in run time of 46%. 2D puzzle performance (for which the E heuristic is not typically most useful) has improved by a lesser, but still significant amount. For example, the best solve times for the pentominoes 10x6 puzzle have improved on my machine from about 0.212 seconds down to 0.156 seconds, a 26% improvement.
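As a rough illustration of the two bit tricks described above, here is a minimal sketch (not the polycube source). It assumes the occupancy state fits in a single 32-bit word; the names `popcountTable`, `popcount32`, and `firstOpenCell` are hypothetical. The g++ builtins are used when available, with a table lookup or a simple loop as the fallback.

```cpp
#include <array>
#include <cstdint>

// Build a table holding the number of set bits in every 8-bit value.
inline std::array<std::uint8_t, 256> makeBitCountTable() {
    std::array<std::uint8_t, 256> t{};
    for (int i = 1; i < 256; ++i) {
        t[i] = static_cast<std::uint8_t>((i & 1) + t[i >> 1]);
    }
    return t;
}

// Table-based fallback: count set bits one byte at a time.
inline int popcountTable(std::uint32_t x) {
    static const std::array<std::uint8_t, 256> t = makeBitCountTable();
    return t[x & 0xFFu] + t[(x >> 8) & 0xFFu] + t[(x >> 16) & 0xFFu] + t[x >> 24];
}

// Count set bits, preferring the compiler builtin when compiling with g++.
inline int popcount32(std::uint32_t x) {
#if defined(__GNUC__)
    return __builtin_popcount(x);
#else
    return popcountTable(x);
#endif
}

// Index of the lowest zero bit (the first open cell), or -1 if all bits are set.
inline int firstOpenCell(std::uint32_t occupied) {
    std::uint32_t open = ~occupied;
    if (open == 0) return -1;  // __builtin_ctz(0) is undefined, so handle it here
#if defined(__GNUC__)
    return __builtin_ctz(open);
#else
    int index = 0;
    while ((open & 1u) == 0) { open >>= 1; ++index; }
    return index;
#endif
}
```

A real solver would need to span several words for puzzles with more than 32 cells; the single-word case is only meant to show the idea.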
There's nothing in the FILA-heuristic interface that precludes a heuristic from targeting a piece (rather than a cell) and returning an image list set that lists only images for a single target piece. Unfortunately, if a heuristic is targeting pieces, then it cannot be a fixed-order heuristic, which means you can't use POF to gradually reduce the size of the image list sets that target pieces as the puzzle is filled in. And NOF filtering alone would be almost completely useless for reducing the size of image list sets that target pieces. So I don't immediately see a way to, for example, extend the estimate heuristic to efficiently identify pieces that have few fits and/or identify a precalculated image list targeting such a piece that is well filtered to the current puzzle state. Still, there may be times that targeting pieces could be useful, even if it requires much work to identify the target (e.g., fit-counting across all images of a piece), and even if the returned image list set is not well filtered (e.g., just return all images of the piece with no filtering at all). Such an approach might still perform favorably compared to DLX within some limited range of puzzle sizes.
Also, for the special case where the puzzle solve is just getting started ($p = P$), every image in every image list set is guaranteed to fit. So the S and E heuristics could easily be modified to notice that $p = P$ and, instead of counting fits or neighbor holes, just look at the list sizes. Image list sets that target pieces could be included specifically for this situation, so that FILA could target, for example, a piece that's been highly constrained to eliminate rotationally redundant solutions. This would enable FILA to fully emulate de Bruijn and Fletcher for the 10x6 pentomino problem by identifying that the X piece should be placed first. It is not clear to me, however, that there would be any advantage to this approach over my current approach of always using DLX to place the first piece of a puzzle (other than the elimination of DLX itself, which is obviously no small simplification). In fact, it is my expectation that such an approach would be inferior to the current approach of using the combination of algorithms: In the 10x6 pentomino problem, placing the X piece smack in the middle of the puzzle causes DLX to eliminate many images. Using this reduced image set as a feed for the initialization of the fixed image list sets used by the FILA F heuristic is highly advantageous, and this is not a behavior that a pure FILA approach to the problem can readily replicate.
For now, I leave the subject of defining FILA ordering heuristics that can target pieces as a problem for future investigation.
I have not definitively answered the question of whether FILA is faster than Fletcher's algorithm for the 10x6 pentomino problem. I have not even taken the time to translate Fletcher's original program to a modern programming language. But even if I did, a comparison between that program and polycube wouldn't really be a fair comparison of Fletcher's algorithm and FILA: Fletcher's program is hard coded to solve the 10x6 pentomino puzzle which has several advantages:
Because polycube is a general puzzle solver, it is necessarily more cumbersome. As a result, Fletcher's program is not only simpler, but also smaller, which allows for better CPU caching. So if someone were to compare a direct translation of Fletcher's published software to polycube 2.0 and reported Fletcher's software to be faster, I would be neither surprised nor deterred.
To make a more-fair comparison, I could add some generalization of Fletcher's algorithm to polycube. This would not only require finding an algorithm to efficiently assign images to a search tree, it also would require (I think) a new data model, since Fletcher's algorithm requires checking the occupancy of cells just outside the puzzle boundary, something polycube doesn't currently allow. I guess I'm not interested in such an endeavor, especially since the results would not necessarily be definitive.
Alternatively, one could write a code generator that takes a puzzle as input and outputs a FILA solver program that's hard-coded and highly optimized to the particulars of the input puzzle. Such a program generated for the 10x6 pentomino problem could, I think, then be fairly compared to Fletcher's original hard-coded program for the same puzzle (with only those minimal modifications needed to translate it to a modern programming language). I suspect such an optimized FILA solver could be made to run far faster than Fletcher's original program. The more I think about this, the less hard it seems like it would be to do. Maybe I'll try it some time: not to prove FILA faster than Fletcher at pentominoes 10x6 (something I already believe to be true), but rather to be able to solve harder puzzles more efficiently.
polycube 2.0 is restricted to puzzles whose cells fall on the integer lattice points of a two or three dimensional Cartesian coordinate system, but there is nothing about the FILA algorithm or POF or NOF filtering that is limited in this way. (The calculation of the ILSI given here does talk about the nearest neighbors in the 6 ordinal directions, but in general a neighbor can be in any direction, and the mapping of those neighbors to ILSI bits is arbitrary.) Other puzzle geometries, like polyiamonds, polyhexes, and polysticks could also be solved with a FILA software application that was suitably abstracted to service those geometries.
This software is protected by the GNU General Public License (GPL) Version 3. See the README.txt file included in the zip file download for more information.
Contents: same as for Windows, but no executable is provided, and all text files are carriage return stripped.
The source is about 16,000 lines of C++ code, with dependencies on two other libraries (boost and the Mersenne Twister random number generator) which are also included in the download. The executable file polycube.exe is a Windows console application (sorry, no GUIs folks). For maximum platform compatibility, the provided executable has NOT been compiled to use the g++ built-in bit operations __builtin_popcount() or __builtin_ctz(). If you make the effort to compile for your own hardware (see README.txt), you should see a moderate performance improvement to FILA. I've seen 8% to 15% depending on the puzzle and the heuristics used.
FILA is a fast, flexible recursive backtracking algorithm that uses precalculated (fixed) lists of images that are prefiltered to exclude images incompatible with the cell's location, incompatible with cells that must have been previously filled by a heuristic (POF), or incompatible with occupancy states of the nearest neighbors of the targeted cell (NOF).
Fletcher and de Bruijn used a fixed list of 63 pentomino images that were considered for placement at each targeted cell in the 10x6 pentomino problem, but many of these images collide with puzzle walls. By using a separate list of images at each cell, images that lie partially outside the puzzle bounds can be eliminated. For this puzzle, this reduces the number of images in each list by on average 45.6%.
Fletcher and de Bruijn recognized that by filling a puzzle from left to right using a strict cell selection order, the number of images that had to be considered at each cell was greatly reduced (80%). POF filtering generalizes this technique to any heuristic that targets cells in a predetermined order, eliminating all images that conflict with cells that must have been filled prior to the targeted cell.
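The POF idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the polycube implementation: each image is represented by the fill-order ranks of the cells it covers (rank 0 is targeted first), and the names `Image` and `pofFilter` are hypothetical. An image survives for a target cell only if the target is the earliest-ranked cell it covers, since every earlier-ranked cell must already be filled.

```cpp
#include <algorithm>
#include <vector>

// An image, represented (by assumption) as the fill-order ranks of its cells.
using Image = std::vector<int>;

// Keep only the images that cover the target cell and cover no cell that the
// fixed selection order guarantees is already filled (any rank below the target).
std::vector<Image> pofFilter(const std::vector<Image>& images, int targetRank) {
    std::vector<Image> kept;
    for (const Image& img : images) {
        if (img.empty()) continue;
        // If the lowest rank covered equals the target, the image contains the
        // target and conflicts with no earlier (already filled) cell.
        int lowest = *std::min_element(img.begin(), img.end());
        if (lowest == targetRank) kept.push_back(img);
    }
    return kept;
}
```

Because the selection order is fixed, this filter can be run once per cell during initialization rather than at every recursive step.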
Instead of considering all images in a set one-at-a-time, Fletcher walked the cells near a fill target to eliminate images in groups. Instead of walking the whole tree (sections of which often correspond to regions outside the puzzle boundary, or to pieces that are not even available), NOF focuses on just the most important nearest neighbor cells, aggregating their occupancy states into a small index number used to select a set of images built specifically for that compound neighbor occupancy state. This approach eliminates on average an additional two-thirds to three-fourths of the images that don't fit the puzzle.
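A minimal sketch of how six neighbor occupancy states might be aggregated into a small index follows. The `Box` type, the bit ordering, and the choice to treat a neighbor outside the puzzle as a set bit are all assumptions made for illustration; they are not taken from the polycube source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal puzzle model: an nx * ny * nz box with one flag per cell.
struct Box {
    int nx, ny, nz;
    std::vector<bool> occupied;  // indexed as x + nx * (y + ny * z)

    Box(int x, int y, int z)
        : nx(x), ny(y), nz(z),
          occupied(static_cast<std::size_t>(x) * y * z, false) {}

    void fill(int x, int y, int z) { occupied[x + nx * (y + ny * z)] = true; }

    // A neighbor blocks placement if it is filled or lies outside the puzzle
    // (the latter is an assumption of this sketch).
    bool blocked(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) return true;
        return occupied[x + nx * (y + ny * z)];
    }
};

// Pack the six neighbor states into an index in the range 0..63.
// The bit order (-x, +x, -y, +y, -z, +z) is arbitrary.
unsigned neighborIndex(const Box& b, int x, int y, int z) {
    unsigned index = 0;
    if (b.blocked(x - 1, y, z)) index |= 1u << 0;
    if (b.blocked(x + 1, y, z)) index |= 1u << 1;
    if (b.blocked(x, y - 1, z)) index |= 1u << 2;
    if (b.blocked(x, y + 1, z)) index |= 1u << 3;
    if (b.blocked(x, y, z - 1)) index |= 1u << 4;
    if (b.blocked(x, y, z + 1)) index |= 1u << 5;
    return index;
}
```

The resulting index would then select one of the precalculated image list sets for the target cell, so the per-step cost is six occupancy checks and one array lookup.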
The combination of these three strategies eliminates the vast majority of images that don't fit the target. For example, for the solver configuration that produced fastest solve times for the Tetris Cube, the F heuristic (Fletcher's heuristic) was used when 3 pieces were left to be placed. The average number of images that had to be considered by a single recursive invocation of the algorithm at that stage was just 11.3, and the number of these images that didn't fit was just 9.2. This is as compared to the 1,416 unique Tetris Cube piece images that would populate these lists if no filtering was used at all. NOF filtering is particularly useful for unordered heuristics (where POF filtering is not possible). For the same Tetris Cube solver configuration, the E heuristic was used when 4 pieces were left to be placed. The average number of images that had to be considered at this stage was only 4.3, with only 3.2 of those images not fitting. So through these simple techniques the lion's share of images that don't fit are eliminated from the algorithm at the cost of checking the occupancy states of at most a few cells. (This is as compared to DLX's higher-cost approach of dynamically maintaining perfect image lists with every piece placement or removal.) The net effect is faster solve times. The NOF feature alone improved solve times 6% to 17% for the puzzles examined here (though I have seen as high as 27% in other puzzles).
The paper focuses on a more minimalist rule set (whereas my previous blog post solved for Facebook Farkle rules). The optimization equations are much simplified by using a pair of self-referential equations describing pre-roll and post-roll game states. The paper also includes a comparison of optimal play vs max-expected-score play, a mechanism allowing a human to perfectly replicate max-expected-score play, and some simple techniques you can use to win over 49% of your games against an optimal player.
As of the time of this post, the proceedings from the conference have not yet been published, but a link to our paper is provided here for your convenience:
Optimal Play of the Farkle Dice Game
There are some POVRay images included in the paper that graphically show the game states from which you should bank. For your viewing pleasure, I've included below links to the images in their original 16 megapixel detail.
Neller and Presser modeled a simple dice game called pig as a Markov Decision Process (MDP) and used value iteration to find the optimal game winning strategy^{1}. Inspired by their approach, I've constructed a variant of an MDP which can be used to calculate the strategy that maximizes the chances of winning 2-player farkle. Due to the three consecutive farkle penalty, an unfortunate or foolish player can farkle repeatedly to achieve an arbitrarily large negative score. For this reason the number of game states is unbounded and a complete MDP model of farkle is not possible. To bound the problem, a limit on the lowest possible banked score is enforced. The calculated strategy is shown to converge exponentially to the optimal strategy as this bound on banked scores is lowered.
Each farkle turn proceeds by iteratively making a pre-roll banking decision, a (contingent) roll of the dice, and a post-roll scoring decision. I modified the classic MDP to include a secondary (post-roll) action to fit this turn model. A reward function that incentivizes winning the game is applied. A similarly modified version of value iteration (that maximizes the value function for both the pre-roll banking decision and the post-roll scoring decision) is then used to find an optimal farkle strategy.
With a lower bound of -2500 points for banked scores, there are 423,765,000 distinct game states and so it is not convenient to share the entire strategy in printed form. Instead, I provide some general characterizations of the strategy. For example, if both players use this same strategy, the player going first will win 53.487% of the time. I also provide samples of complete single-turn strategies for various initial banked scores. Currently, only the strategy for Facebook Farkle has been calculated, but the strategy for other scoring variants of farkle could easily be deduced using the same software.
Farkle rules differ only slightly from the rules of Zilch, but are provided here for completeness.
Farkle is played with two or more players and six six-sided dice. Each player takes turns rolling the dice. The dice in a roll can be worth points either individually or in combination. If any points are available from the roll, the player must set aside some or all of those scoring dice, adding the score from those dice to their point total for the turn. After each roll, a player may either reroll the remaining dice to try for more points or may bank the points accumulated this turn (though you can never bank less than 300 points). When a player banks his points, the player's turn is ended and the dice are passed to the next player.
If no dice in a roll score, then the player loses all points accumulated this turn and their turn is ended. This is called a farkle, a sorrowful event indeed.
If all dice in a roll score, the player gets to continue his turn with all six dice. This is called hot dice and is guaranteed to brighten your day.
A player may continue rolling again and again accumulating ever more points until he either decides to bank those points or loses them all to a farkle.
If a player ends three consecutive turns with a farkle, they not only lose their points from the turn but also lose 500 points from their banked game score. (This is the only way to lose banked points.) After a triple farkle, your consecutive farkle count is reset to zero so you're safe from another triple farkle penalty for at least three more turns.
The game ends when one player has banked a total of 10,000 points. Unlike Zilch, other players do not get a final turn.
Scoring is as follows:
Note: those familiar with MDPs may find the nonstandard variable names I use to present this standard subject distracting. My aim is to use variable names most meaningful in the final value-iteration equation. I beg your indulgence. As a courtesy, I've included mouse-hover popups where each such nonstandard variable name is introduced (highlighted in blue) explaining the motivation for the change.
A Markov Decision Process (MDP)^{2} is a system having a finite set of states $S$. For each state $s \in S$, there are a set of actions that may be taken $A(s)$. For each action $a \in A(s)$, there is a set of transition probabilities $P_a(s, s')$ defining the probability of transitioning from $s$ to each state $s' \in S$ given that action $a$ was taken while in state $s$. When action $a$ is taken from state $s$, the MDP responds by randomly moving to a new state $s'$ as governed by the transition probabilities $P_a(s, s')$ and then assigning the decision maker a reward $D_a(s, s')$.
The objective is to find a strategy function $G(s)$ that returns the particular action at each state $s \in S$ that will maximize the expected cumulative reward given by:
$$\sum_{k=0}^{\infty} \gamma^k D_{a_k}(s_k, s_{k+1})$$ where $k$ is a discrete time variable, $s_k$ is the game state at time $k$, $a_k$ is the player action taken from state $s_k$, and $\gamma \in [0,1)$ is a constant discount factor for future rewards. The expectation must be taken over all possible state transition paths, and maximized over all possible choices for the actions $a_k$ taken in each state $s_k$. Then $G(s)$ will be defined by the $a_k$ taken from each state $s_k$ that maximizes this expectation.
One technique for solving this problem is value iteration. With this technique, each state $s$ is given a decimal value $W(s)$ which is an estimate of the expected discounted sum of all future rewards gained from state $s$. The estimate for $W(s)$ is iteratively refined for all $s$ by applying this update equation sequentially to all states $s$: $$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{s'} P_a(s, s')(D_a(s, s') + \gamma W_i(s'))\right]$$
Note that at each state $s$, the action which provides maximum total reward is selected. So not only is the estimate of $W(s)$ iteratively refined, but the best action $a$ taken for each state $s$ is also simultaneously improved. Iteration continues until $W_{i+1}(s)$ and $W_i(s)$ converge for all $s$, and $G(s)$ is then the set of actions $a$ selected for each state $s$ in the final iteration.
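To make the update rule concrete, here is a minimal sketch of value iteration on a generic MDP. The data layout (`Transition`, `Action`, `Mdp`), the function name `valueIteration`, and the absolute-change stopping test are assumptions chosen for illustration. States with no actions are treated as terminal and keep a value of zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Transition { int next; double prob; double reward; };  // s', P_a(s,s'), D_a(s,s')
using Action = std::vector<Transition>;                       // outcomes of one action
using Mdp = std::vector<std::vector<Action>>;                 // mdp[s] = A(s)

struct Solution {
    std::vector<double> W;  // value estimate per state
    std::vector<int> G;     // best action index per state (-1 for terminal states)
};

Solution valueIteration(const Mdp& mdp, double gamma, double tol) {
    std::vector<double> W(mdp.size(), 0.0), next(mdp.size(), 0.0);
    std::vector<int> G(mdp.size(), -1);
    double change;
    do {
        change = 0.0;
        for (std::size_t s = 0; s < mdp.size(); ++s) {
            if (mdp[s].empty()) { next[s] = 0.0; continue; }  // terminal state
            double best = 0.0;
            int bestAction = -1;
            for (std::size_t a = 0; a < mdp[s].size(); ++a) {
                double q = 0.0;  // expected reward plus discounted future value
                for (const Transition& t : mdp[s][a]) {
                    q += t.prob * (t.reward + gamma * W[t.next]);
                }
                if (bestAction < 0 || q > best) { best = q; bestAction = static_cast<int>(a); }
            }
            next[s] = best;
            G[s] = bestAction;
            change = std::max(change, std::fabs(next[s] - W[s]));
        }
        W.swap(next);  // W_{i+1} becomes W_i for the next pass
    } while (change > tol);
    return Solution{W, G};
}
```

For example, with one non-terminal state offering a sure reward of 1, or a 50/50 gamble between a reward of 4 and staying put, and $\gamma = 0.5$, the fixed point satisfies $W = 2 + 0.25W$, so $W = 8/3$ and the gamble is selected.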
During a farkle turn, a player must iteratively
To fit farkle to an MDP model, one need only consider the result of a roll as part of the game state, and then reorder the turn sequence to that of a combined scoring-and-bank/roll action followed by either an end-of-turn event or another random-roll event. But this increases the number of game states by at least a few orders of magnitude and makes the problem unsolvable without a commensurate increase in computer resources.
Alternatively, the roll decision could be splintered to include any conceivable combination of scoring instructions, directing the MDP how to score each potential roll before the roll is even made, thereby allowing the state machine to transition without additional input from the player once the roll is made. Aside from being a painful way of thinking about the problem, the number of possible scoring instructions is enormous, and the approach is again not feasible.
Rather than forcing the game to fit the MDP model, I instead define an extended MDP (EMDP) to more naturally model the game. Like an MDP, an EMDP has a finite set of states $S$. For each state $s \in S$, there are a set of primary actions that may be taken $A(s)$. For each primary action $a \in A(s)$, there are a set of sets of secondary actions $R_a(s)$. Once primary action $a$ is taken, one set of secondary actions $r \in R_a(s)$ is selected randomly by the EMDP according to a probability distribution $P_a(s)$. So instead of transitioning, the EMDP responds to action $a$ by offering a randomly chosen set of secondary actions that may be taken. For each secondary action $c \in r$, a deterministic transition state $s'$ is defined by a transition matrix $s' = X_{a,c}(s)$. Selection of action $c$ causes the EMDP to transition to $s'$ and reward $D_{a,c}(s, s')$ is granted.
Because there are two actions to be taken in a turn, the optimal strategy also has two parts: $G_A(s)$ is the optimal primary action in state $s$, and $G_C(s, r)$ is the optimal secondary action to take in state $s$ given that secondary action set $r$ was randomly offered by the EMDP in response to action $a$.
Value iteration processing is also extended to account for the secondary action: $$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} \Big[ D_{a,c}(s, s') + \gamma W_i(s') \Big] \right]$$
Neller and Presser chose a reward function to incentivize winning the game^{1}. For any transition from a nonwinning game state $s$ to a winning game state $s_w$, $D_{a,c}(s, s_w) = 1$ (where $s_w$ is any state where the player's banked points plus his turn points meets or exceeds the game goal of 10,000 points, and where his turn points meets or exceeds the minimum banking threshold). For transitions to any other nonwinning state $s_v$, $D_{a,c}(s, s_v) = 0$. Because all game winning states $s_w$ are terminal, all future rewards from such states must be zero, so $W(s_w) = 0$. Because $W$ is known for all game winning states $s_w$, $W(s_w)$ is never updated during value iteration. (I.e., although a game winning state can appear on the right side of the value iteration update equation, it never appears on the left.)
Normally for an MDP, $0 \le \gamma < 1$, but we do not wish to value a game you win in 30 turns, less than a game you win in 10. Following Neller and Presser's approach, I instead set $\gamma = 1$. In general this can prevent value iteration from converging, but it does not cause a problem for farkle. (I think this is because there is no circular state transition path offering unbounded rewards, and because only the player that gets to a game winning state first actually wins, which ensures that the optimal strategy can't be attracted to some infinitely long path to a game winning state.) With $\gamma = 1$, $W(s)$ converges to the probability of winning from any nonterminal state $s$ when using an optimal strategy; the $a$ selected in the $\max_a$ operation converges to the optimal banking strategy from state $s$: $a = G_A(s)$; and the $c$ selected by the $\max_c$ operation converges to the optimal scoring strategy from state $s$ given that roll $r$ was thrown: $c = G_C(s,r)$.
Given that $\gamma = 1$, simplifications can be achieved if you move the reward for transitions to a game winning state out of the reward function, and into the value function for those same game winning states. That is, instead of defining $W(s_w) = 0$, define $W(s_w) = 1$, and set $D_{a,c}(s, s') = 0$ everywhere, allowing the reward function to be completely eliminated from the update equation. This definition also results in $W(s)$ consistently being the probability of winning the game from any state $s$ (including terminal game winning states).
Applying the simplifications of $\gamma = 1$, $D_{a,c}(s, s') = 0$, and $W(s_w) = 1$ reduces the extended value iteration update formula to:
$$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} W_i(s') \right]$$ This general approach for finding the optimal play strategy for games having both pre-roll and post-roll actions is further detailed for the specifics of 2-player farkle in the sections below.
The current game state $s$ is characterized by six component state variables:
$$s = (t, n, b, d, f, e)$$where $t$ is the number of points accumulated only from your current turn. Once $b + t \gt 9950$ (which means you've hit the goal of 10,000 points) and you've met the minimum requirement to bank $t > 250$, the game is over, so for nonterminal game states we have:
$$t \in \{0, 50, 100, ..., \max [250, 9950 - b]\}\text{.}$$ $n$ is the number of dice you have to roll
$$n \in \{1, 2, 3, 4, 5, 6\}\text{,}$$$b$ is your banked score for which I enforce a lower bound $L$
$$b \in \{L, L+50, ..., -100, -50, 0, 50, 100, ..., 9950\}\text{,}$$ $d$ is your opponent's banked score which is also lower bounded to $L$
$$d \in \{L, L+50, ..., -100, -50, 0, 50, 100, ..., 9950\}\text{,}$$ $f$ is your consecutive farkle count (from previous turns)
$$f \in \{0, 1, 2\}\text{, and}$$$e$ is your opponent's consecutive farkle count
$$e \in \{0, 1, 2\}\text{.}$$In this section we apply the farkle component state variables and rules defined in previous sections to the EMDP value iteration equation:
$$W_{i+1}(s) := \max_{a \in A(s)} \left[ \sum_{r \in R_a(s)} P_{a,r}(s) \max_{c \in r} W_i(s') \right]$$ Let's detail $A(s)$ first. For farkle, $A(s)$ is the set of available banking actions having at most two members: BANK and ROLL. For game states where you have so far accumulated less than 300 points, you have only one pre-roll (primary) action available to you: ROLL the dice. But for all other game states you have two pre-roll actions available: BANK, or ROLL. This yields:
\begin{equation} W_{i+1}(s) := \left\{ \begin{array}{ll} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s'), & \text{if $t < 300$}.\\ \\ \max \bigg[\sum\limits_{r \in R_{\text{BANK}}(s)} P_{{\text{BANK}},r}(s) \max\limits_{c \in r} W_i(s'), \\ \hspace{30pt} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') \bigg], & \text{if $t \ge 300$}.\\ \\ \end{array} \right. \end{equation}In the case of a bank action the equation collapses. $R_{\text{BANK}}(s)$ has only one entry: $r_{\text{BANK}}$. There's only one member of the probability distribution: $P_{{\text{BANK}}, r_{\text{BANK}}}(s) = 1$. Also, $r_{\text{BANK}}$ has only one entry: $c_{\text{BANK}}$. So for the case of $t \ge 300$ , the first member of the outer max operation reduces to just $W_i(s')$. All the mathematical machinery in this case is flexible enough to handle the banking case, but is entirely unnecessary. But what exactly is $s'$ after you bank?
Looking back at the game state characterization from the previous section, there is no variable that encodes whose turn it is. Everything I've written so far is from the perspective of the player who controls the dice, and there is no $s'$ expressible in terms of our six component state variables that identifies your state after a banking operation. (This is by design and is consistent with Neller and Presser's approach for optimizing pig game play strategy.) After you bank, it is your opponent's turn who we assume is also playing the optimal strategy, and we can express his state after you bank. If your game state just before you banked was $s = (t, n, b, d, f, e)$, then your opponent's state after you bank will be $o' = (0, 6, d, b + t, e, 0)$. To be clear, after you bank your opponent will have 0 turn points, 6 dice to roll, a banked score of $d$, an opponent's banked score of $b + t$ (which is your new banked score), $e$ consecutive farkles, and his opponent will have $0$ consecutive farkles (your farkle count reverting to zero having just banked). Your opponent's win probability is $W(o')$, which means our win probability after the bank must be: $W_i(s') = 1 - W_i(o') = 1 - W_i(0, 6, d, b + t, e, 0)$, which yields: \begin{equation} W_{i+1}(s) := \left\{ \begin{array}{ll} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s'), & \text{if $t < 300$}.\\ \\ \max \bigg[\Big(1 - W_i(0,6,d,b+t,e,0)\Big), \\ \hspace{30pt} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') \bigg], & \text{if $t \ge 300$}.\\ \\ \end{array} \right. \end{equation}
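The change of perspective after a bank can be written out directly. This is a small sketch with hypothetical names (`State`, `afterBank`, `bankValue`); it simply encodes $o' = (0, 6, d, b + t, e, 0)$ and the complement $1 - W(o')$.

```cpp
// The six component state variables, from the perspective of the player
// who controls the dice.
struct State {
    int t;  // turn points
    int n;  // dice available to roll
    int b;  // own banked score
    int d;  // opponent's banked score
    int f;  // own consecutive farkle count
    int e;  // opponent's consecutive farkle count
};

// Opponent's state after the current player banks: o' = (0, 6, d, b + t, e, 0).
State afterBank(const State& s) {
    return State{0, 6, s.d, s.b + s.t, s.e, 0};
}

// Win probability of the banking player, given the opponent's win probability
// W(o') from the state returned by afterBank.
double bankValue(double opponentWinProbability) {
    return 1.0 - opponentWinProbability;
}
```

For instance, banking from $s = (350, 2, 1000, 4000, 1, 2)$ hands the dice to an opponent in state $(0, 6, 4000, 1350, 2, 0)$.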
In the case of a ROLL action, $R_{\text{ROLL}}(s)$ corresponds to the set of all possible rolls of the dice from state $s$. More precisely, each roll $r$ is defined as a set of possible scoring decisions for some permutation of thrown dice. The $\sum_r$ operation is summing over each possible roll $r \in R_{\text{ROLL}}(s)$. $P_{\text{ROLL},r}(s)$ is the probability of making roll $r$ from game state $s$. And each $c \in r$ is one possible scoring decision given that roll $r$ was thrown. Given scoring decision $c$, the new game state $s'$ is determined. The $\max_c$ operation maximizes the expected win probability $W_i(s')$ given that roll $r$ was thrown over these possible scoring decisions.
First note that the set of potential rolls from game state $s = (t, n, b, d, f, e)$ is only dependent on the number of dice you are rolling, so:
$$R_{\text{ROLL}}(s) = R_{\text{ROLL}}(n)$$Second note that
$$P_{\text{ROLL},r}(s) = {1 \over {6^n}}$$Third, because the expression for $W_i(s')$ is fundamentally different for the case of farkling rolls vs. scoring rolls, it is convenient to partition $R_{\text{ROLL}}(n)$ into two subsets: the subset of all farkling rolls $R_{\text{FARKLE}}(n)$, and the subset of all scoring rolls $R_{\text{SCORE}}(n)$:
$$R_{\text{ROLL}}(n) = R_{\text{FARKLE}}(n) \bigcup R_{\text{SCORE}}(n)$$ Fourth, if roll $r$ is a farkle, then $r$ will have only one member: a zero point farkle scoring decision and the $\max_c$ operation can be dropped.
Applying these four observations gives:
\begin{equation} \begin{array}{rl} \sum\limits_{r \in R_{\text{ROLL}}(s)} P_{{\text{ROLL}},r}(s) \max\limits_{c \in r} W_i(s') &= \sum\limits_{r \in R_{\text{FARKLE}}(n)} {1 \over {6^n}} W_i(s') + \sum\limits_{r \in R_{\text{SCORE}}(n)} {1 \over {6^n}} \max\limits_{c \in r} W_i(s') \\ &= {1 \over {6^n}} \sum\limits_{r \in R_{\text{FARKLE}}(n)} W_i(s') + {1 \over {6^n}} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} W_i(s') \end{array} \end{equation} After a farkle, it becomes your opponent's turn and there is again no expression for the new game state. We again instead express $W_i(s')$ in terms of your opponent's win probability after your farkle:
$$W_i(s') = 1 - W_i(0, 6, d, b', e, f')$$ where $b'$ is your new banked score after your farkle, which may have decreased due to the three consecutive farkle penalty, but is enforced to always be at least $L$ to keep the problem tractable:
$$b' = \max [ L, b - Y_f ]\text{;}$$ where $Y_f$ is the number of points you lose from your banked score when you farkle while already having $f$ consecutive farkles $$\begin{align*} Y_0 &= 0 \\ Y_1 &= 0 \\ Y_2 &= 500\text{;} \end{align*}$$ and where $f'$ is your new consecutive farkle count which normally just increments, but is reset back to zero if you just had your third consecutive farkle
$$f' = (f+1) \mod 3\text{.}$$Note also that for all farkling rolls the expression inside the sum is the same, so we can replace the sum with a multiplicative factor equaling the number of ways to roll a farkle with $n$ dice. Combining that count with the ${1 \over {6^n}}$ simplifies to the probability of farkling with $n$ dice^{3}:
$$\begin{align*} F_1 &= 2/3 \\ F_2 &= 4/9 \\ F_3 &= 5/18 \\ F_4 &= 17/108 \\ F_5 &= 25/324 \\ F_6 &= 5/216 \\ \end{align*}$$Combining the above observations gives this substitution:
$${1 \over {6^n}} \sum\limits_{r \in R_{\text{FARKLE}}(n)} W_i(s') = F_n (1 - W_i(0,6,d,b',e,f'))$$ To detail the expression for the case of scoring rolls, first let $C_T(c)$ be the number of points taken with scoring combination $c$, and let $C_N(c)$ be the number of dice used with scoring combination $c$. So after rolling roll $r \in R_{\text{SCORE}}(n)$ and selecting scoring action $c \in r$ the state transitions from $s = (t, n, b, d, f, e)$ to
$$s' = (t', n', b, d, f, e)$$where
$$ \begin{align*} t' &= t+C_T(c) \\ n' &= h(n - C_N(c)) \end{align*} $$ and where $h(x)$ is a hot-dice function for resetting the number of available dice back to $6$ when all dice are successfully scored:
$$h(n)=\begin{cases}6, & \text{for $n = 0$.} \\ n, & \text{otherwise.}\end{cases}$$This gives our final value iteration equation, where (below) I repeat all the supporting equations for convenience:
$$ W_{i+1}(t,n,b,d,f,e) := \left\{ \begin{array}{ll} F_n (1 - W_i(0,6,d,b',e,f')) + \\ \hspace{30pt} {1 \over 6^n} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} \left[ W_i(t',n',b,d,f,e)\right], & \text{if $t < 300$}.\\ \\ \max \bigg[ \Big(1 - W_i(0,6,d,b+t,e,0)\Big), \\ \hspace{30pt} \Big( F_n (1 - W_i(0,6,d,b',e,f')) + \\ \hspace{60pt} {1 \over 6^n} \sum\limits_{r \in R_{\text{SCORE}}(n)} \max\limits_{c \in r} \left[ W_i(t',n',b,d,f,e) \right] \Big) \bigg], & \text{if $t \ge 300$}. \end{array} \right. $$ where
$$ \begin{align*} b' &= \max [ L, b - Y_f ] \\ f' &= (f+1) \bmod 3 \\ t' &= t+C_T(c) \\ n' &= h(n-C_N(c)) \\ h(n) &=\begin{cases}6, & \text{for $n = 0$.} \\ n, & \text{otherwise.}\end{cases} \\ Y_0 &= 0 \\ Y_1 &= 0 \\ Y_2 &= 500 \\ F_1 &= 2/3 \\ F_2 &= 4/9 \\ F_3 &= 5/18 \\ F_4 &= 17/108 \\ F_5 &= 25/324 \\ F_6 &= 5/216 \\ \end{align*} $$With a lower bound of $L=-2500$ there are 423,765,000 game states. Each state is modeled with a double precision floating point number requiring 8 bytes, so the entire state matrix requires 3.39012 billion bytes of RAM (plus some array overhead). With game goal $g$ the number of states grows as $(g-L)^3$. Lowering $L$ from -2500 to, say, -10,000 (to eliminate all reasonable doubt that you'll ever venture into portions of the calculated strategy that are nonoptimal) will increase the number of game states to over 1.7 billion and memory requirements to 14 GB (which is more than I have on any of my computers).
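The supporting definitions repeated above are small enough to check numerically. The following Python sketch is illustrative only: the function names are hypothetical, the floor is taken to be the large negative bound of -2500, and farkle detection assumes that the only scoring dice are 1s, 5s, three or more of a kind, and three pairs (a straight always contains a 1, so it needs no separate test). Under those assumptions, brute-force enumeration reproduces the $F_n$ values listed above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LOWER_BOUND = -2500                    # L, the assumed floor on banked scores
FARKLE_PENALTY = {0: 0, 1: 0, 2: 500}  # Y_f, indexed by consecutive farkles f


def is_farkle(roll):
    """True if the roll contains no scoring dice under the assumed rules."""
    counts = Counter(roll)
    if counts[1] or counts[5]:
        return False                   # any 1 or 5 scores
    if max(counts.values()) >= 3:
        return False                   # three or more of a kind scores
    if sorted(counts.values()) == [2, 2, 2]:
        return False                   # three pairs (assumed to score)
    return True


def farkle_probability(n):
    """F_n: exact probability of farkling when rolling n dice."""
    farkles = sum(is_farkle(r) for r in product(range(1, 7), repeat=n))
    return Fraction(farkles, 6 ** n)


def after_farkle(b, f, lower_bound=LOWER_BOUND):
    """Return (b', f') for a player who farkles with bank b and count f."""
    return max(lower_bound, b - FARKLE_PENALTY[f]), (f + 1) % 3


def hot_dice(n):
    """h(n): reset to six dice when every die has been scored."""
    return 6 if n == 0 else n


def after_scoring(t, n, points, dice_used):
    """Return (t', n') after taking a combination worth `points`."""
    return t + points, hot_dice(n - dice_used)


for n in range(1, 7):
    print(n, farkle_probability(n))
```

For example, `after_farkle(1000, 2)` applies the 500 point penalty and resets the farkle count, while `after_farkle(-2300, 2)` is clamped at the floor.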
The value iteration software to solve the optimal farkle problem was written in C++. Obvious optimizations were made. For example, I don't actually iterate over possible rolls, but only over a precalculated set of unique score sets (which is orders of magnitude smaller in count than rolls), weighting each score set by the number of rolls that share that same set. I let the program iterate over all game states until the maximum relative change over all states from one iteration to the next was less than 1 part in a billion. (I.e., iterations continued until no state had a change in value from one iteration to the next of more than 1 part in a billion, so that even those states with extremely small win probabilities were calculated with precision.) With the floor set to -2500 points, it took 62 iterations for the matrix to converge. Using one core of a dual-core Intel i3-4130T, the software performed 1.30 million state updates per second, each pass of the matrix taking 5 minutes 26 seconds, and convergence taking 5 hours 37 minutes.
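The stopping rule described above (repeat full passes until no state changes by more than 1 part in a billion, relative to its own value) can be sketched in a few lines. This Python fragment is not the C++ solver: `iterate_until_converged`, `relative_change`, and `toy_update` are hypothetical names, the toy update merely stands in for the farkle value iteration equation, and whether the real solver updates states in place or from a copy is not stated here, so the sketch simply assumes a fresh list per pass.

```python
def relative_change(old, new):
    """Relative change between two values; zero if they are identical."""
    if new == old:
        return 0.0
    return abs(new - old) / max(abs(new), abs(old))


def iterate_until_converged(update, values, tolerance=1e-9):
    """Apply `update` to the whole value list until the largest relative
    change over all entries falls below `tolerance`.
    Returns the converged values and the number of passes made."""
    passes = 0
    while True:
        new_values = update(values)
        passes += 1
        worst = max(relative_change(o, n) for o, n in zip(values, new_values))
        values = new_values
        if worst < tolerance:
            return values, passes


def toy_update(values):
    """Toy stand-in for one pass: every entry moves halfway toward 2.0."""
    return [0.5 * v + 1.0 for v in values]


values, passes = iterate_until_converged(toy_update, [0.0, 1.0, 5.0])
print(values, passes)
```

Using a relative rather than an absolute tolerance is what keeps states with very small win probabilities from being declared converged too early.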
It is not practical to list the win probabilities and banking rules for all of the half billion game states in two-player farkle. Here I provide only a limited view into the complete strategy by means of five examples. Each of the five subsections below provides a two-dimensional slice of the six-dimensional strategy matrix, sufficient for optimal play of a single turn for a particular start-of-turn game state.
Table 1 shows the win probabilities and banking actions needed to play an opening turn optimally. Each cell shows the win probability for different turn scores (starting at $t=0$ in the top row and increasing down the page) and/or different number of dice to roll (starting at $n=6$ in the first column and decreasing as you move to the right). For all entries in this table, the other four component state variables are fixed: your banked score (b) is fixed at 0 points, your opponent's banked score (d) at 0 points, your consecutive farkle count (f) at 0, and your opponent's consecutive farkle count (e) at zero. States shaded green are states from which your optimal banking action is to roll. States shaded red are states from which your optimal banking action is to bank. States with an asterisk are inaccessible and can be ignored (although the software still calculates your win probability in, and optimal play out of these states which would be useful if, say through a disruption in the laws of the universe, you somehow find yourself in such a state).
t  n=6  n=5  n=4  n=3  n=2  n=1  
0  0.534870  0.506721^{*}  0.493622^{*}  0.487801^{*}  0.486163^{*}  0.489950^{*} 
50  0.540099^{*}  0.511005  0.495849^{*}  0.489177^{*}  0.487586^{*}  0.491718^{*} 
100  0.545359^{*}  0.515776  0.499374  0.490798^{*}  0.489034^{*}  0.493525^{*} 
150  0.550709^{*}  0.520616^{*}  0.503815  0.493135  0.490494^{*}  0.495375^{*} 
200  0.556198^{*}  0.525472^{*}  0.508564  0.497158  0.492462  0.497250^{*} 
250  0.561811^{*}  0.530484^{*}  0.513285^{*}  0.502074  0.495439  0.499128 
300  0.567448  0.535760^{*}  0.518054^{*}  0.506680  0.503290  0.503290 
350  0.573082  0.541181  0.523166^{*}  0.511296^{*}  0.509711  0.509711 
400  0.578710  0.546604  0.528579  0.516146  0.516146  0.516146 
450  0.584332  0.552028  0.533996  0.522592  0.522592  0.522592 
500  0.589966  0.557452  0.539418  0.529048  0.529048  0.529048 
550  0.595675  0.562873  0.544842  0.535513  0.535513  0.535513 
600  0.601436  0.568338  0.550268  0.541986  0.541986  0.541986 
650  0.607195  0.573937  0.555693  0.548463  0.548463  0.548463 
700  0.612941  0.579588  0.561115  0.554944  0.554944  0.554944 
750  0.618671  0.585232  0.566534  0.561427  0.561427  0.561427 
800  0.624400  0.590866  0.571948  0.567910  0.567910  0.567910 
850  0.630173  0.596489  0.577354  0.574392  0.574392  0.574392 
900  0.635987  0.602137  0.582753  0.580870  0.580870  0.580870 
950  0.641793  0.607914  0.588142  0.587343  0.587343  0.587343 
1000  0.647615  0.613782  0.593809  0.593809  0.593809  0.593809 
1050  0.653417  0.619633  0.600266  0.600266  0.600266  0.600266 
1100  0.659192  0.625466  0.606713  0.606713  0.606713  0.606713 
1150  0.664939  0.631280  0.613147  0.613147  0.613147  0.613147 
1200  0.670657  0.637072  0.619567  0.619567  0.619567  0.619567 
1250  0.676345  0.642842  0.625970  0.625970  0.625970  0.625970 
1300  0.682002  0.648587  0.632356  0.632356  0.632356  0.632356 
1350  0.687625  0.654306  0.638721  0.638721  0.638721  0.638721 
1400  0.693212  0.659998  0.645065  0.645065  0.645065  0.645065 
1450  0.698762  0.665660  0.651385  0.651385  0.651385  0.651385 
1500  0.704289  0.671292  0.657681  0.657681  0.657681  0.657681 
1550  0.709840  0.676892  0.663950  0.663950  0.663950  0.663950 
1600  0.715350  0.682458  0.670190  0.670190  0.670190  0.670190 
1650  0.720818  0.687989  0.676400  0.676400  0.676400  0.676400 
1700  0.726243  0.693483  0.682579  0.682579  0.682579  0.682579 
1750  0.731622  0.698938  0.688723  0.688723  0.688723  0.688723 
1800  0.736955  0.704354  0.694832  0.694832  0.694832  0.694832 
1850  0.742239  0.709727  0.700904  0.700904  0.700904  0.700904 
1900  0.747472  0.715058  0.706936  0.706936  0.706936  0.706936 
1950  0.752655  0.720345  0.712927  0.712927  0.712927  0.712927 
2000  0.757846  0.725587  0.718875  0.718875  0.718875  0.718875 
2050  0.763028  0.730782  0.724781  0.724781  0.724781  0.724781 
2100  0.768162  0.735929  0.730640  0.730640  0.730640  0.730640 
2150  0.773241  0.741026  0.736453  0.736453  0.736453  0.736453 
2200  0.778262  0.746072  0.742217  0.742217  0.742217  0.742217 
2250  0.783224  0.751066  0.747932  0.747932  0.747932  0.747932 
2300  0.788127  0.756006  0.753594  0.753594  0.753594  0.753594 
2350  0.792969  0.760892  0.759204  0.759204  0.759204  0.759204 
2400  0.797788  0.765722  0.764759  0.764759  0.764759  0.764759 
2450  0.802600  0.770495  0.770258  0.770258  0.770258  0.770258 
2500  0.807367  0.775700  0.775700  0.775700  0.775700  0.775700 
2550  0.812069  0.781084  0.781084  0.781084  0.781084  0.781084 
2600  0.816707  0.786408  0.786408  0.786408  0.786408  0.786408 
2650  0.821278  0.791671  0.791671  0.791671  0.791671  0.791671 
2700  0.825781  0.796871  0.796871  0.796871  0.796871  0.796871 
2750  0.830218  0.802008  0.802008  0.802008  0.802008  0.802008 
2800  0.834586  0.807081  0.807081  0.807081  0.807081  0.807081 
2850  0.838886  0.812087  0.812087  0.812087  0.812087  0.812087 
2900  0.843116  0.817025  0.817025  0.817025  0.817025  0.817025 
2950  0.847277  0.821894  0.821894  0.821894  0.821894  0.821894 
3000  0.851367  0.826696  0.826696  0.826696  0.826696  0.826696 
3050  0.855387  0.831428  0.831428  0.831428  0.831428  0.831428 
3100  0.859336  0.836090  0.836090  0.836090  0.836090  0.836090 
3150  0.863214  0.840682  0.840682  0.840682  0.840682  0.840682 
3200  0.867019  0.845202  0.845202  0.845202  0.845202  0.845202 
3250  0.870753  0.849650  0.849650  0.849650  0.849650  0.849650 
3300  0.874415  0.854025  0.854025  0.854025  0.854025  0.854025 
3350  0.878005  0.858327  0.858327  0.858327  0.858327  0.858327 
3400  0.881522  0.862555  0.862555  0.862555  0.862555  0.862555 
3450  0.884967  0.866709  0.866709  0.866709  0.866709  0.866709 
3500  0.888343  0.870788  0.870788  0.870788  0.870788  0.870788 
3550  0.891652  0.874791  0.874791  0.874791  0.874791  0.874791 
3600  0.894888  0.878719  0.878719  0.878719  0.878719  0.878719 
3650  0.898052  0.882571  0.882571  0.882571  0.882571  0.882571 
3700  0.901145  0.886347  0.886347  0.886347  0.886347  0.886347 
3750  0.904167  0.890047  0.890047  0.890047  0.890047  0.890047 
3800  0.907118  0.893671  0.893671  0.893671  0.893671  0.893671 
3850  0.909999  0.897219  0.897219  0.897219  0.897219  0.897219 
3900  0.912809  0.900691  0.900691  0.900691  0.900691  0.900691 
3950  0.915548  0.904088  0.904088  0.904088  0.904088  0.904088 
4000  0.918219  0.907409  0.907409  0.907409  0.907409  0.907409 
4050  0.920820  0.910655  0.910655  0.910655  0.910655  0.910655 
4100  0.923353  0.913826  0.913826  0.913826  0.913826  0.913826 
4150  0.925819  0.916920  0.916920  0.916920  0.916920  0.916920 
4200  0.928219  0.919939  0.919939  0.919939  0.919939  0.919939 
4250  0.930560  0.922884  0.922884  0.922884  0.922884  0.922884 
4300  0.932847  0.925756  0.925756  0.925756  0.925756  0.925756 
4350  0.935068  0.928555  0.928555  0.928555  0.928555  0.928555 
4400  0.937226  0.931282  0.931282  0.931282  0.931282  0.931282 
4450  0.939321  0.933937  0.933937  0.933937  0.933937  0.933937 
4500  0.941354  0.936521  0.936521  0.936521  0.936521  0.936521 
4550  0.943326  0.939035  0.939035  0.939035  0.939035  0.939035 
4600  0.945236  0.941479  0.941479  0.941479  0.941479  0.941479 
4650  0.947087  0.943854  0.943854  0.943854  0.943854  0.943854 
4700  0.948879  0.946160  0.946160  0.946160  0.946160  0.946160 
4750  0.950612  0.948400  0.948400  0.948400  0.948400  0.948400 
4800  0.952287  0.950573  0.950573  0.950573  0.950573  0.950573 
4850  0.953905  0.952679  0.952679  0.952679  0.952679  0.952679 
4900  0.955468  0.954721  0.954721  0.954721  0.954721  0.954721 
4950  0.956977  0.956699  0.956699  0.956699  0.956699  0.956699 
5000  0.958614  0.958614  0.958614  0.958614  0.958614  0.958614 
The game begins in the upper left cell at $t=0$ and $n=6$. The win probability listed here means that if both players play optimally, the player going first wins 53.487% of the time. To play a turn, first you must choose your banking action. Since cell (t=0, n=6) is green, the optimal banking action is to roll.
Second, you roll the dice. Let's say for your opening roll, you throw:
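$$6, 5, 3, 3, 3, 2$$Third, you must choose how to score the dice. Here you have three choices: score only the 5 (50 points, leaving 5 dice to roll), score only the three 3s (300 points, leaving 3 dice), or score both (350 points, leaving 2 dice).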
To determine your optimal scoring choice, you must look at the win probabilities from the three states corresponding to each scoring option:
$$ \begin{align*} W(t=50, n=5) &= 0.511005\\ W(t=300, n=3) &= 0.506680\\ W(t=350, n=2) &= 0.509711 \end{align*} $$So in this case, you should score only the 5 since it moves you to the state with the best probability of ultimately winning the game. This ends one primary-action, random-response, secondary-action sequence effecting one state change in the EMDP. This process is then simply repeated (starting with another banking decision from the new state (t=50, n=5)) until you either roll a farkle, or end in a state where you're supposed to bank.
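The scoring decision just described is a table lookup followed by a maximum. A minimal Python sketch, using only the three win probabilities quoted above (the function name and the `(points, dice_used)` encoding of a scoring option are hypothetical):

```python
# Win probabilities for the three reachable states, keyed by (t, n).
WIN_PROB = {
    (50, 5): 0.511005,
    (300, 3): 0.506680,
    (350, 2): 0.509711,
}


def best_scoring_option(t, n, options, win_prob):
    """Pick the (points, dice_used) option leading to the state with the
    highest win probability. Scoring every die resets the count to six."""
    def value(option):
        points, dice_used = option
        remaining = n - dice_used
        return win_prob[(t + points, 6 if remaining == 0 else remaining)]
    return max(options, key=value)


# Opening roll 6, 5, 3, 3, 3, 2 from state (t=0, n=6).
choice = best_scoring_option(0, 6, [(50, 1), (300, 3), (350, 4)], WIN_PROB)
print(choice)  # (50, 1): score only the 5
```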
It is interesting to see how the optimal play strategy for the opening turn differs from the strategy that simply maximizes your expected farkle turn score. To maximize expected score you wouldn't bank with 6 dice available to roll unless you had 16,400 or more points on the turn. In contrast, to optimize your chances of winning a game, you would bank 5000 or more points on your opening turn even when you have six dice available to roll. The banking threshold for 5 dice is similarly reduced from 3050 to a more conservative 2500. The banking thresholds for the other available dice counts do not differ.
Note that if you manage to bank 1000 points on your opening turn, your chances of winning increase to almost 60%; banking 1850 points on your first turn leaves you with better than 70% chance of ultimately winning; bank 2750 points and your chances of winning are above 80%; and manage to put up 3900 points on the opening turn and your win probability is over 90%.
As a second example, consider the case of Table 2 which shows how to optimally play a turn where you start with 6000 points and are 2000 points behind your opponent who has 8000 points.
t  n=6  n=5  n=4  n=3  n=2  n=1  
0  0.162365  0.135452^{*}  0.124658^{*}  0.118916^{*}  0.116823^{*}  0.120172^{*} 
50  0.167053^{*}  0.138098  0.126521^{*}  0.120462^{*}  0.118340^{*}  0.121980^{*} 
100  0.172282^{*}  0.141184  0.128485  0.122068^{*}  0.119912^{*}  0.123894^{*} 
150  0.177898^{*}  0.144859^{*}  0.130513  0.123738  0.121540^{*}  0.125881^{*} 
200  0.183766^{*}  0.149237^{*}  0.133183  0.125473  0.123237  0.127935^{*} 
250  0.189820^{*}  0.153963^{*}  0.136653^{*}  0.127264  0.125005  0.130069 
300  0.196092  0.158915^{*}  0.140895^{*}  0.130017  0.126838  0.132298 
350  0.202623  0.164030  0.145391^{*}  0.133939^{*}  0.128728  0.134616 
400  0.209467  0.169363  0.150024  0.138163  0.133991  0.137010 
450  0.216533  0.174975  0.154798  0.142503  0.139457  0.139475 
500  0.223831  0.180897  0.159828  0.146975  0.145115  0.145115 
550  0.231322  0.187028  0.165248  0.151603  0.150876  0.150876 
600  0.239035  0.193362  0.171012  0.156843  0.156843  0.156843 
650  0.246918  0.199976  0.176975  0.163015  0.163015  0.163015 
700  0.255037  0.206817  0.183154  0.169453  0.169453  0.169453 
750  0.263357  0.213830  0.189533  0.176109  0.176109  0.176109 
800  0.271945  0.221070  0.196123  0.183059  0.183059  0.183059 
850  0.280741  0.228460  0.202874  0.190242  0.190242  0.190242 
900  0.289790  0.236097  0.209824  0.197712  0.197712  0.197712 
950  0.298954  0.243918  0.216938  0.205281  0.205281  0.205281 
1000  0.308358  0.251943  0.224242  0.213245  0.213245  0.213245 
1050  0.317895  0.260087  0.231740  0.221146  0.221146  0.221146 
1100  0.327785  0.268509  0.239462  0.229404  0.229404  0.229404 
1150  0.337937  0.277274  0.247380  0.237923  0.237923  0.237923 
1200  0.348393  0.286386  0.255529  0.246764  0.246764  0.246764 
1250  0.358985  0.295733  0.263897  0.255751  0.255751  0.255751 
1300  0.369743  0.305337  0.272544  0.264989  0.264989  0.264989 
1350  0.380541  0.315057  0.281403  0.274404  0.274404  0.274404 
1400  0.391642  0.324961  0.290466  0.284353  0.284353  0.284353 
1450  0.402933  0.334964  0.299633  0.294503  0.294503  0.294503 
1500  0.414660  0.345257  0.308958  0.305091  0.305091  0.305091 
1550  0.426484  0.355749  0.318423  0.315621  0.315621  0.315621 
1600  0.438586  0.366590  0.328154  0.326360  0.326360  0.326360 
1650  0.450706  0.377696  0.338163  0.337112  0.337112  0.337112 
1700  0.463015  0.389278  0.348576  0.348173  0.348173  0.348173 
1750  0.475329  0.401117  0.359573  0.359573  0.359573  0.359573 
1800  0.488007  0.413244  0.371759  0.371759  0.371759  0.371759 
1850  0.500848  0.425457  0.384134  0.384134  0.384134  0.384134 
1900  0.514151  0.437907  0.397273  0.397273  0.397273  0.397273 
1950  0.527546  0.450291  0.410346  0.410346  0.410346  0.410346 
2000  0.541169  0.463032  0.423910  0.423910  0.423910  0.423910 
2050  0.553892  0.475650  0.435848  0.435848  0.435848  0.435848 
2100  0.566687  0.488532  0.448671  0.448671  0.448671  0.448671 
2150  0.579544  0.501372  0.462112  0.462112  0.462112  0.462112 
2200  0.593075  0.514412  0.476647  0.476647  0.476647  0.476647 
2250  0.606845  0.527568  0.490860  0.490860  0.490860  0.490860 
2300  0.620874  0.541235  0.505044  0.505044  0.505044  0.505044 
2350  0.634230  0.554962  0.518525  0.518525  0.518525  0.518525 
2400  0.647380  0.568960  0.532648  0.532648  0.532648  0.532648 
2450  0.659830  0.582404  0.547160  0.547160  0.547160  0.547160 
2500  0.672922  0.595574  0.563571  0.563571  0.563571  0.563571 
2550  0.685577  0.608065  0.579471  0.579471  0.579471  0.579471 
2600  0.699046  0.620640  0.594828  0.594828  0.594828  0.594828 
2650  0.712677  0.633358  0.608315  0.608315  0.608315  0.608315 
2700  0.727273  0.647372  0.621304  0.621304  0.621304  0.621304 
2750  0.740566  0.661998  0.632781  0.632781  0.632781  0.632781 
2800  0.752872  0.677539  0.646547  0.646547  0.646547  0.646547 
2850  0.763740  0.691731  0.661928  0.661928  0.661928  0.661928 
2900  0.775267  0.704645  0.683575  0.683575  0.683575  0.683575 
2950  0.787428  0.716595  0.704022  0.704022  0.704022  0.704022 
3000  0.801293  0.729617  0.722026  0.722026  0.722026  0.722026 
3050  0.812468  0.740574  0.732915  0.732915  0.732915  0.732915 
3100  0.824009  0.752452  0.744321  0.744321  0.744321  0.744321 
3150  0.834312  0.764049  0.754637  0.754637  0.754637  0.754637 
3200  0.844465  0.775780  0.768552  0.768552  0.768552  0.768552 
3250  0.854392  0.785990  0.782612  0.782612  0.782612  0.782612 
3300  0.865209  0.800609  0.800609  0.800609  0.800609  0.800609 
3350  0.875145  0.811712  0.811712  0.811712  0.811712  0.811712 
3400  0.887417  0.826357  0.826357  0.826357  0.826357  0.826357 
3450  0.896088  0.833481  0.833481  0.833481  0.833481  0.833481 
3500  0.907011  0.847263  0.847263  0.847263  0.847263  0.847263 
3550  0.912996  0.864340  0.864340  0.864340  0.864340  0.864340 
3600  0.918091  0.883970  0.883970  0.883970  0.883970  0.883970 
3650  0.920604  0.894832  0.894832  0.894832  0.894832  0.894832 
3700  0.925351  0.903017  0.903017  0.903017  0.903017  0.903017 
3750  0.933956  0.903045  0.903045  0.903045  0.903045  0.903045 
3800  0.950436  0.903078  0.903078  0.903078  0.903078  0.903078 
3850  0.962853  0.903098  0.903098  0.903098  0.903098  0.903098 
3900  0.973675  0.917497  0.903124  0.903124  0.903124  0.903124 
3950  0.979061  0.930203  0.903136  0.903136  0.903136  0.903136 
Given your 2000 point deficit, it is interesting to see how much more aggressively this turn should be played compared to an opening turn: the banking thresholds here for 6, 5, 4, 3, 2 and 1 dice are, respectively, 4000, 3300, 1750, 600, 400, and 500 points. Compare that to the banking thresholds for an opening turn at 5000, 2500, 1000, 350, 300, and 300 points.
Another interesting observation about this strategy is that although you should bank between 3300 and 3850 points when you have 5 dice to roll, you should instead roll if you have 3900 to 3950 points (because you have such a high probability of closing out the game).
Now consider the reverse of the situation in the previous example. Table 3 shows how to optimally play a turn where you start with 8000 points and are 2000 points ahead of your opponent who has 6000 points.
t  n=6  n=5  n=4  n=3  n=2  n=1  
0  0.903422  0.878090^{*}  0.862982^{*}  0.857184^{*}  0.856210^{*}  0.860061^{*} 
50  0.907970^{*}  0.883520  0.866464^{*}  0.858208^{*}  0.857345^{*}  0.861539^{*} 
100  0.912448^{*}  0.888520  0.872004  0.860637^{*}  0.858466^{*}  0.863007^{*} 
150  0.916837^{*}  0.893230^{*}  0.877734  0.864994  0.859560^{*}  0.864460^{*} 
200  0.921258^{*}  0.898001^{*}  0.882899  0.871213  0.863650  0.865889^{*} 
250  0.925550^{*}  0.902725^{*}  0.887727^{*}  0.877180  0.868829  0.867273 
300  0.929835  0.907504^{*}  0.892777^{*}  0.882276  0.882276  0.882276 
350  0.933851  0.912127  0.897734^{*}  0.888668^{*}  0.888668  0.888668 
400  0.937676  0.916796  0.902726  0.894905  0.894905  0.894905 
450  0.941287  0.921250  0.907616  0.900824  0.900824  0.900824 
500  0.944922  0.925413  0.912287  0.907268  0.907268  0.907268 
550  0.948355  0.929370  0.916532  0.913595  0.913595  0.913595 
600  0.951784  0.933391  0.920430  0.919922  0.919922  0.919922 
650  0.955073  0.937177  0.925407  0.925407  0.925407  0.925407 
700  0.958290  0.940961  0.930450  0.930450  0.930450  0.930450 
750  0.961368  0.944716  0.934554  0.934554  0.934554  0.934554 
800  0.964239  0.948516  0.938914  0.938914  0.938914  0.938914 
850  0.966594  0.952008  0.943151  0.943151  0.943151  0.943151 
900  0.968986  0.955211  0.948312  0.948312  0.948312  0.948312 
950  0.971335  0.958000  0.953239  0.953239  0.953239  0.953239 
1000  0.973780  0.960716  0.958124  0.958124  0.958124  0.958124 
1050  0.975945  0.963165  0.961202  0.961202  0.961202  0.961202 
1100  0.978076  0.965682  0.964291  0.964291  0.964291  0.964291 
1150  0.979977  0.968029  0.966812  0.966812  0.966812  0.966812 
1200  0.981763  0.970175  0.970114  0.970114  0.970114  0.970114 
1250  0.983328  0.973408  0.973408  0.973408  0.973408  0.973408 
1300  0.984740  0.977098  0.977098  0.977098  0.977098  0.977098 
1350  0.986091  0.979125  0.979125  0.979125  0.979125  0.979125 
1400  0.987692  0.981344  0.981344  0.981344  0.981344  0.981344 
1450  0.988993  0.982226  0.982226  0.982226  0.982226  0.982226 
1500  0.990373  0.984029  0.984029  0.984029  0.984029  0.984029 
1550  0.991218  0.985999  0.985999  0.985999  0.985999  0.985999 
1600  0.991737  0.989914  0.989914  0.989914  0.989914  0.989914 
1650  0.992073  0.992073  0.992073  0.992073  0.992073  0.992073 
1700  0.992947  0.992947  0.992947  0.992947  0.992947  0.992947 
1750  0.992964  0.992964  0.992964  0.992964  0.992964  0.992964 
1800  0.993911  0.992981  0.992981  0.992981  0.992981  0.992981 
1850  0.994722  0.992990  0.992990  0.992990  0.992990  0.992990 
1900  0.995640  0.992998  0.992998  0.992998  0.992998  0.992998 
1950  0.996180  0.993000  0.993000  0.993000  0.993000  0.993000 
In this example, you have a 2000 point lead and are yourself only 2000 points from winning the game. It is interesting to see how much more conservatively this turn should be played compared to an opening turn: the banking thresholds here for the different number of dice to throw are 1650, 1250, 650, 300, 300, and 300 points. Compare that to the banking thresholds for an opening turn at 5000, 2500, 1000, 350, 300, and 300 points.
As in the previous example, there is another banking rule inversion in this table, but this time for the case of 6 dice, where at point values above 1750 it again becomes advantageous to roll to try to end the game.
As a fourth example, consider the case of Table 4 which shows how to optimally play a turn where you start with 9000 points and are 500 points behind your opponent who has 9500 points.
t  n=6  n=5  n=4  n=3  n=2  n=1  
0  0.454366  0.351125^{*}  0.303308^{*}  0.277271^{*}  0.270069^{*}  0.285659^{*} 
50  0.463700^{*}  0.358576  0.309820^{*}  0.283131^{*}  0.274544^{*}  0.289393^{*} 
100  0.475303^{*}  0.366847  0.316369  0.289394^{*}  0.280312^{*}  0.294362^{*} 
150  0.486102^{*}  0.376478^{*}  0.323286  0.295374  0.286679^{*}  0.300901^{*} 
200  0.505119^{*}  0.388392^{*}  0.331770  0.301490  0.292906  0.309194^{*} 
250  0.525335^{*}  0.399404^{*}  0.342403^{*}  0.308335  0.298948  0.317272 
300  0.554879  0.417884^{*}  0.354687^{*}  0.317757  0.305197  0.325207 
350  0.573800  0.432225  0.366843^{*}  0.329936^{*}  0.313516  0.332809 
400  0.602493  0.454655  0.382838  0.343746  0.325562  0.341277 
450  0.619405  0.462471  0.391661  0.356792  0.340660  0.354200 
500  0.653307  0.481158  0.404518  0.367897  0.356345  0.372251 
550  0.696939  0.489452  0.408392  0.376025  0.370389  0.393025 
600  0.761617  0.519126  0.416628  0.381381  0.381505  0.412584 
650  0.821580  0.577904  0.419050  0.384455  0.389121  0.429136 
700  0.878974  0.671873  0.455551  0.391137  0.393701  0.441170 
750  0.920889  0.759474  0.538352  0.391141  0.396326  0.448721 
800  0.951179  0.838237  0.658773  0.453755  0.398829  0.452947 
850  0.966192  0.886061  0.752905  0.563689  0.401559  0.455471 
900  0.976536  0.921141  0.831614  0.696407  0.522215  0.459381 
950  0.981336  0.937788  0.873088  0.776038  0.641661  0.462492 
I find it interesting that the only time you should bank during such a turn is if you have exactly 3 dice to roll and either 700 or 750 points on the turn. Why does it make sense to roll just 1 or 2 dice with that same number of points? And why does it make sense to roll three dice when you have 800 or more points?
To answer the first question, observe that it's easier to ultimately hotdice (score all dice thrown) when you are rolling only 1 or 2 dice than when you are rolling 3. If you can manage to get back to 6 dice, your chances of ending the game this turn increase dramatically. Furthermore, any additional game points you gain by rolling 3 dice are meaningless unless you hotdice. If only 1 or 2 of the 3 thrown dice score, you will still find yourself below the game goal and (if you then decide to bank these few extra points) will not have even reduced the number of points you have to accumulate on your subsequent turn due to the minimum banking threshold.
To answer the second question, note that it's possible to reach 10,000 points without hotdice: in particular there's a 7% chance of rolling two ones and a third nonscoring die. These rolls end the game if you have 9800 points.
Now consider the reverse of the situation in the previous example. Table 5 shows how to optimally play a turn where you start with 9500 points and are 500 points ahead of your opponent who has 9000 points.
t  n=6  n=5  n=4  n=3  n=2  n=1  
0  0.801016  0.702303^{*}  0.658594^{*}  0.637598^{*}  0.630975^{*}  0.640081^{*} 
50  0.826173^{*}  0.706846  0.660815^{*}  0.642258^{*}  0.639026^{*}  0.652003^{*} 
100  0.863321^{*}  0.724040  0.665063  0.645329^{*}  0.645400^{*}  0.663218^{*} 
150  0.897707^{*}  0.757952^{*}  0.666217  0.647091  0.649767^{*}  0.672708^{*} 
200  0.930612^{*}  0.811876^{*}  0.687617  0.648362  0.652392  0.679607^{*} 
250  0.954643^{*}  0.862100^{*}  0.735325^{*}  0.649657  0.653897  0.683936 
300  0.972009  0.907257^{*}  0.804365^{*}  0.686823  0.655332  0.686359 
350  0.980617  0.934675  0.858333^{*}  0.749851^{*}  0.656897  0.687806 
400  0.986548  0.954788  0.903460  0.825942  0.726073  0.690048 
450  0.989300  0.964332  0.927238  0.871597  0.794555  0.691832 
Here I found it surprising that even though you have a 500 point lead and control of the dice, the optimal strategy requires that you never bank until you've reached the game-winning 500 points. This is surprisingly more aggressive play than on an opening turn where the score is tied. I would have guessed that banking 300 points when faced with rolling 1 or 2 dice would have been better to avoid the high-probability farkle, then allow your opponent a chance to force his way to a 1000 point turn, and assuming he fails, follow up with a high-probability minimum bank turn to win. It turns out if your opponent plays perfectly, he has about a 1/3 chance of reaching 1000 points to steal the win, and so playing a little more aggressively to attempt to end the game and deny him that possible steal is advantageous.
Also note how for many point totals, you are better off with fewer dice. This is often the case when playing turns where your best play is to try to reach 10,000 points and end the game, and also in cases where you are just really far behind an opponent that is fast approaching a game winning score.
In this section I share the efforts I've made to sanity check the calculated strategies for reasonableness, and to determine the deviation between the calculated strategy and the truly optimal strategy. This section definitely gets into the weeds and is not for the faint of heart, but may be of interest to skeptics or the particularly nerdy (of which I am both).
Strategies that maximize expected turn score for farkle variants have been independently calculated with consistent results^{4,5,6}. These strategies do not consider your current banked score, your opponent's banked score or farkle count, or the end-of-game scoring goal. Still, one would expect that for an opening turn (where the score is tied at zero and where the end-of-game scoring goal is distant) the game winning strategy documented here and the strategy that maximizes expected turn score would be similar for low turn score states. Furthermore, the two strategies should converge if the game scoring goal is increased from 10,000 points towards infinity. Unfortunately, the 10,000 point goal was already stretching both the memory limits and processing power of my computer. To reduce the size of the state space, I eliminated the three consecutive farkle penalty. This effectively eliminates the f and e state variables, reducing the size of the state space by a factor of 9, and also eliminates the need to model banked score states of less than 0 points, which shrinks things even more.
My Farkle Strategy Generator (FSG)^{6} produces farkle strategies that maximize expected turn score. The FSG outputs strategy tables that prioritize each turn state with a sequential preference number. Those states associated with higher expected turn scores have a higher preference number. To follow the strategy, you score thrown dice to move to the state with the highest number. This is exactly the same process you use to follow the game winning strategy presented in this document: you score your dice to move to the state with the highest probability of winning. By indexing the states from lowest win probability to highest, the two strategies can easily be compared.
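Converting win probabilities into preference numbers is a simple ranking. The Python sketch below is only an illustration of that indexing step: the function name is hypothetical, the sample values are a handful of entries copied from Table 1, and letting states with equal win probability share a number is my assumption about tie handling rather than a description of what the FSG does.

```python
def preference_numbers(win_prob):
    """Map each state to a preference number: 1 for the lowest win
    probability, counting upward. Equal probabilities share a number."""
    distinct = sorted(set(win_prob.values()))
    rank = {p: i + 1 for i, p in enumerate(distinct)}
    return {state: rank[p] for state, p in win_prob.items()}


# A few opening-turn states from Table 1, keyed by (t, n).
sample = {
    (300, 1): 0.503290,
    (300, 2): 0.503290,
    (300, 3): 0.506680,
    (350, 2): 0.509711,
    (50, 5): 0.511005,
}
prefs = preference_numbers(sample)
print(prefs)
```

Two strategies then agree on a turn exactly when they induce the same ordering of reachable states, which is why comparing preference charts is enough to compare strategies.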
First compare the two strategies for the case of a 10,000 point goal. Table 6W shows the strategy that maximizes your chances of winning the game (produced by value iteration) for an opening turn. Table 6E shows the strategy that maximizes expected turn score (produced by the FSG). Turn scores in both cases are truncated to a maximum of 1500. States with different preference numbers are highlighted. There are numerous slight differences, but they are obviously very similar: the banking thresholds are almost identical, and the state preference numbers deviate only slightly.


Tables 7W and 7E compare these same two strategies but with the scoring goal increased to 30,000 points. Here the two strategies are seen to be almost identical for these low turn score states. The two strategies do appear to be converging as the scoring goal increases. While this doesn't assure strategy correctness, it offers convincing evidence of correctness in at least this limiting case.


The calculated strategy is nonoptimal when playing in regions of the state space where either your banked score, or your opponent's banked score is near the lower bound. This discrepancy between the calculated strategy and the truly optimal strategy is further exacerbated as the consecutive farkle count associated with a banked score near the bound increases.
To estimate the extent of the discrepancy, I lowered the bound on banked scores from -2500 to -3000 and regenerated the strategy matrix. One expects the relative deviation between the win probabilities of corresponding states in the two strategies to decrease as the minimum of the players' banked scores increases. To see the rate of convergence, for each possible banked score $B$ ($-2500 \le B < 10{,}000$), I found the maximum relative deviation between win probabilities (calculated as ${{2|w_1-w_2|}\over{w_1+w_2}}$) over all game states satisfying $\min(b,d) \ge B$. Table 8 lists selected results. Note that each time $B$ increases by 500 points (which just happens to be the triple farkle penalty), the maximum relative deviation consistently decreases by 2 orders of magnitude (with one of those two orders of magnitude coming with the last 50 point increase). Given this exponential convergence rate, for any given $B$, the maximum relative deviation between the strategies for $L=-2500$ and $L=-\infty$ must be only slightly greater (on the order of 1%) than the maximum relative deviation between the strategies for $L=-2500$ and $L=-3000$.
B  Maximum Relative Deviation 
-2500  0.296463834 
-2050  0.028572943 
-2000  0.003438359 
-1550  0.000249319 
-1500  0.000029516 
-1050  0.000002074 
-1000  0.000000234 
-550  0.000000016 
-500  0.000000001 
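The comparison behind Table 8 can be sketched as follows. This is a hedged illustration, not the code used to produce the table: the function names are hypothetical, each strategy is assumed to be a dictionary from a state tuple `(t, n, b, d, f, e)` to a win probability, the deviation is taken as a symmetric relative difference using the absolute value of $w_1 - w_2$, and the two tiny dictionaries hold made-up numbers purely to exercise the function.

```python
def relative_deviation(w1, w2):
    """Symmetric relative difference between two win probabilities."""
    return 2 * abs(w1 - w2) / (w1 + w2)


def max_relative_deviation(strategy_a, strategy_b, B):
    """Largest relative deviation over states present in both strategies
    whose banked scores satisfy min(b, d) >= B."""
    return max(
        (
            relative_deviation(strategy_a[s], strategy_b[s])
            for s in strategy_a
            if s in strategy_b and min(s[2], s[3]) >= B
        ),
        default=0.0,
    )


# Made-up values for illustration only; states are (t, n, b, d, f, e).
a = {(0, 6, -2500, 0, 0, 0): 0.5, (0, 6, 0, 0, 0, 0): 0.6}
b = {(0, 6, -2500, 0, 0, 0): 0.4, (0, 6, 0, 0, 0, 0): 0.6}
print(max_relative_deviation(a, b, -2500))
print(max_relative_deviation(a, b, 0))
```

Raising `B` excludes states near the floor, which is where the two strategies differ most.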
Note that the state win probabilities for two calculated strategies do not have to be identical for the rules about when to bank, when to roll, and how to score the dice in each situation to be identical. So how close do the calculated win probabilities and the optimal win probabilities need to be to make it likely we are playing with a truly optimal strategy? Using state preference charts like those shown in the previous section, I compared complete turn strategies for all combinations of state variables $b$, $d$, $f$, and $e$. If two state preference charts are identical, then the strategies are identical. Once I found a bizarre state preference discrepancy associated with a relative difference in win probability on the same order as the convergence threshold for the entire state matrix. I ignored this and any others like it that might be lurking in the tables. Table 9 shows the banked score and consecutive farkle count thresholds above which no more strategy discrepancies otherwise appeared. So for example if both you and your opponent have at most 1 consecutive farkle, and you both have banked scores of at least -1500 points, then there are no discrepancies between the two strategies.
So although there's no way to pick a lower bound $L$ that will assure all turn strategies will be truly optimal for a given working range of banked scores and consecutive farkle counts, driving the maximum relative deviation down to within about 1 part in 10^{8} seems to assure that the vast majority of turn strategies (and likely all of them) are optimal. Given the convergence rate seen above, this means that as long as both players are about 2000 points above the lower bound on banked scores, all turn strategies are likely to be optimal.
Minimum Banked Score | Maximum Farkle Count | Maximum Relative Deviation
--- | --- | ---
-1500 | 0 | 0.000001175
-1200 | 1 | 0.000001529
-700 | 2 | 0.000000070
Extended forms of a Markov Decision Process (MDP) and value iteration were devised that allow more efficient modeling of 2-player Farkle than is possible with a standard MDP. Using this model, the strategy that maximizes win probability using Facebook scoring rules has been determined under the constraint that banked scores are lower bounded to a large negative score. The strategy is shown to be unaffected by this bound in regions where banked scores are at least 2000 points above the bound. The strategy appears to converge (as expected) to the strategy that maximizes expected turn score as the end-of-game scoring goal is increased.
With nearly a half billion game states, and so many different dice rolls and scoring options possible from each of these different states, it wasn't immediately obvious whether the problem was solvable with a home computer. I was quite pleased when I saw the first value iteration update complete, and even more pleased when I saw the matrix was actually converging! It was a nice problem in the sense that its complexity puts finding the solution at the limits of what can be done with a home computer (or at least at the limits of what I can do with a home computer).
The optimal strategy showed many expected behaviors including more aggressive play when behind, and more conservative play when ahead. But it also showed many behaviors that were unexpected to me, such as playing aggressively even with a sizable lead when nearing the game goal of 10,000 points. Numerous times I was convinced I had discovered a flaw in the strategy (and therefore either a bug in my code or an error in my analysis) only to realize later that the strategy was actually reasonable.
The very limited view I've offered into the game-winning strategy is unsatisfying. Time permitting, I intend to develop a web page that will allow you to explore the gargantuan optimal Farkle strategy in detail, and perhaps even play a game of Farkle against a perfect computer opponent. I am motivated by your interest, so please leave a comment if you'd like to see it.
A Facebook Farkle game is won by the first player to bank 10,000 or more points, but there are other variations to end-of-game rules. In Zilch, for example, the other player always gets a final go at the dice. Some people like to assure that both players get the same number of turns. In this latter case, the optimal strategies played by the two players are not even the same, with the player going second clearly having a strategic advantage. I think any of these could be modeled, but they are each very different, and not currently supported by my solver. These are ickier problems, and I don't intend to work on them anytime soon.
In this post I share a useful programming technique I first saw used twenty-some years ago while reading through some C code. The technique combined aspects of an array and a linked list into a single data construct. I've here abstracted the technique into a reusable container I call a partition. (If someone knows of some library in some language that offers a similar container I'd appreciate a reference.)
A partition is a high-performance container that organizes a set of N sequential integer IDs (which together are called the domain) into an arbitrary number of non-overlapping groups. (I.e., each ID of the domain can be in at most one group.) The functionality of a partition includes these constant-time operations:
None of the above operations uses the heap, and each is considerably faster than even a single push onto a standard implementation of a doubly linked list.
A partition has not one, but two fundamental classes: Domain and Group. The set of sequential IDs in a partition is defined by a Domain object; and a Group is a list of IDs from a Domain. Domain and Group each have two templatization parameters M and V. IDs can be bound to a user-defined member object of type M and Groups can be bound to a user-defined value object of type V. Together these associations enable mappings from objects of type M (with known IDs) to objects of type V, and conversely from objects of type V (with known Groups) to a list of objects of type M.
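To make the array-plus-linked-list idea concrete, here is a minimal sketch of the underlying technique. It is not the Domain and Group implementation from the download: MiniPartition, moveTo, groupOf and the other names are made up for this illustration, and it assumes IDs and group numbers are small dense integers with no member or value objects attached. Each ID owns a slot in a few parallel arrays that record its neighbors and its current group, so moving an ID between groups is a handful of index updates.

```cpp
#include <vector>

constexpr int kNone = -1;  // hypothetical marker for "no ID" / "no group"

// Illustrative only: IDs 0..numIds-1, each in at most one of numGroups groups.
class MiniPartition {
public:
    MiniPartition(int numIds, int numGroups)
        : next(numIds, kNone), prev(numIds, kNone), group(numIds, kNone),
          head(numGroups, kNone), tail(numGroups, kNone), count(numGroups, 0) {}

    // Move id to the back of group g in O(1); it leaves its old group automatically.
    void moveTo(int id, int g) {
        remove(id);
        prev[id] = tail[g];
        next[id] = kNone;
        if (tail[g] != kNone) next[tail[g]] = id; else head[g] = id;
        tail[g] = id;
        group[id] = g;
        ++count[g];
    }

    // Unlink id from its current group in O(1); does nothing if it has no group.
    void remove(int id) {
        int g = group[id];
        if (g == kNone) return;
        if (prev[id] != kNone) next[prev[id]] = next[id]; else head[g] = next[id];
        if (next[id] != kNone) prev[next[id]] = prev[id]; else tail[g] = prev[id];
        prev[id] = next[id] = kNone;
        group[id] = kNone;
        --count[g];
    }

    int groupOf(int id) const { return group[id]; }  // kNone if ungrouped
    int front(int g) const { return head[g]; }       // kNone if the group is empty
    int after(int id) const { return next[id]; }     // kNone at the end of a group
    int size(int g) const { return count[g]; }

private:
    // All storage is allocated once in the constructor; the operations above
    // only rewrite integers inside these arrays.
    std::vector<int> next, prev, group, head, tail, count;
};
```

Because the ID doubles as the array index, finding an element's list node needs no search, which is what a plain linked list lacks.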
The partition is potentially useful any time objects need to be organized into groups; and that happens all the time! In this post, I show how you can use a partition to track the states of a set of like objects. This is just one possible usage of a partition and is intended only as a tutorial example of how you can use this versatile storage class.
Suppose you have a set of N objects of some common type each having an independent state variable. Perhaps you have hardware monitoring the bit-error rates on N communications ports and it is your job to categorize them into signal quality states of CLEAR, DEGRADED or FAILED. Or perhaps you are tracking the BUSY/IDLE states of N processors in a massively parallel computer processing environment. Or you might be tracking the activity of N threads in a thread pool. Obviously, such problems abound.
A common use case for such systems is to find one or more objects in a particular state. For example, you may need to find one port in the CLEAR state for allocation to some new network communications service; or you may want to find and list all processors in the BUSY state to perform an audit. If the number of objects is small, you could simply iterate over them all until you find the objects with the desired state; but if you have many objects, this approach can be highly inefficient. To solve such problems for large sets, you will naturally use containers to hold objects that share a common state; but what type of container should you use?
For C++ you might group your objects into standard STL maps. Similarly, for Java you could use HashMaps from the java.util package. These classes provide convenient solutions, but such heavyweight containers may be prohibitively slow for some applications.
A lighter-weight alternative is to use linked lists, but then a state change of, for example, a circuit from CLEAR to DEGRADED now requires that you first find the circuit in the CLEAR list so that it can be removed from that list before adding it to the DEGRADED list; this search can again be slow for large sets. In the C++ domain you can remedy this by giving each circuit object an iterator that tracks its own location in the linked list of which it is currently a member. This is actually an effective (if slightly cumbersome) solution. In the Java domain, such an approach is not possible if you restrict yourself to standard library containers, which suffer from the limitation that iterators point between two members and are in any case necessarily invalidated whenever the container they reference is updated. (Oh, the horror! I need a Prozac!)
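The C++ iterator remedy described above might look something like the following sketch. The class and method names (CircuitTracker, setState, inState) are invented for this example, and circuits are assumed to be identified by small integers. It relies on the fact that std::list::splice relinks a node in constant time and leaves iterators to the moved element valid.

```cpp
#include <list>
#include <vector>

enum State { CLEAR, DEGRADED, FAILED, NUM_STATES };

// Illustrative only: one std::list per state, plus a stored iterator per circuit.
class CircuitTracker {
public:
    explicit CircuitTracker(int numCircuits)
        : where(numCircuits), state(numCircuits, CLEAR) {
        // Every circuit starts in the CLEAR list; remember where it landed.
        for (int id = 0; id < numCircuits; ++id)
            where[id] = lists[CLEAR].insert(lists[CLEAR].end(), id);
    }

    // Copies would hold iterators into the original's lists, so forbid copying.
    CircuitTracker(const CircuitTracker&) = delete;
    CircuitTracker& operator=(const CircuitTracker&) = delete;

    // O(1) state change: no search of the old list is needed because the
    // circuit's stored iterator already points at its node.
    void setState(int id, State s) {
        lists[s].splice(lists[s].end(), lists[state[id]], where[id]);
        state[id] = s;
    }

    State getState(int id) const { return state[id]; }
    const std::list<int>& inState(State s) const { return lists[s]; }

private:
    std::list<int> lists[NUM_STATES];
    std::vector<std::list<int>::iterator> where;
    std::vector<State> state;
};
```

This works, but the bookkeeping (a list node, an iterator, and a state per object) is the cumbersome part the paragraph above alludes to.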
The partition container is specialized to handle exactly this type of problem and is both easier to use and faster than any of the above alternatives.
A partition is often useful for solving resource management problems. For our working example, we'll track the state of a box of eight crayons being shared by my two girls Becky and Cate to draw pictures. (This is not the most compelling resource management problem, but it is at least easy to understand and certainly doesn't look like any proprietary software I've ever written.) I want to show how you might model a resource that has multiple states, so we'll give each crayon both a User state (which can be one of BECKY, CATE, or BOX), and a Quality state (which can be one of SHARP or DULL). The programming problem is merely to efficiently keep track of this state information for all eight crayons and support a variety of use cases, like "find a sharp crayon in the box", "get the state of the red crayon", or "return all of Cate's crayons to the box".
    #include "partition.hpp"
    #include <string>
    #include <iostream>
    #include <iomanip>

    using namespace std;
    using namespace partition;

    // NONE_AVAILABLE is -1 so that the eight crayon colors take the IDs 0 through 7.
    enum Color { NONE_AVAILABLE = -1, RED, ORANGE, YELLOW, GREEN, BLUE, VIOLET, BROWN, BLACK, NUM_CRAYONS };
    string crayonName[] = { "red", "orange", "yellow", "green", "blue", "violet", "brown", "black" };

    enum User { BOX, CATE, BECKY, NUM_USER };
    string userName[] = { "Box", "Cate", "Becky" };

    enum Quality { SHARP, DULL, NUM_QUALITY };
    string qualityName[] = { "Sharp", "Dull" };

    // Compound state bound to each Group as its value.
    class CrayonState
    {
        private:
            User user;
            Quality quality;

        public:
            CrayonState() {}
            CrayonState(User user, Quality quality) : user(user), quality(quality) { }
            User getUser() const { return user; }
            Quality getQuality() const { return quality; }
            friend ostream& operator << (ostream& os, const CrayonState& cs);
    };

    inline ostream& operator << (ostream& os, const CrayonState& cs)
    {
        return os << "(" << setw(5) << userName[cs.user] << ", " << setw(5) << qualityName[cs.quality] << ")";
    }

    typedef Group<string,CrayonState> CrayonGroup;

    // Prints a Group's state followed by the names of its member crayons.
    inline ostream& operator << (ostream& os, const CrayonGroup& g)
    {
        os << g.getValue() << ":";
        for(CrayonGroup::ConstIterator i = g.front(); !i.isAfterBack(); ++i)
            os << " " << (*i).member;
        return os;
    }

    class CrayonManager
    {
        private:
            Domain<string,CrayonState> crayons;
            CrayonGroup state[NUM_USER][NUM_QUALITY];

        public:
            CrayonManager()
            {
                for(int i = 0; i < NUM_CRAYONS; ++i)
                    crayons.addEntry(i, crayonName[i]);
                for(int u = 0; u < NUM_USER; ++u)
                {
                    for(int q = 0; q < NUM_QUALITY; ++q)
                    {
                        state[u][q].setDomain(crayons);
                        state[u][q].setValue(CrayonState(User(u),Quality(q)));
                    }
                }
                // All crayons start out sharp and in the box.
                state[BOX][SHARP].addAll();
            }

            const CrayonState& getState(Color c) const { return crayons.getValue(c); }
            User getUser(Color c) const { return crayons.getValue(c).getUser(); }
            Quality getQuality(Color c) const { return crayons.getValue(c).getQuality(); }

            // Adding a crayon to its new Group removes it from its old Group.
            void setState(Color c, User u, Quality q)
            {
                if(c != NONE_AVAILABLE)
                {
                    cout << CrayonState(u,q) << " <- " << getState(c) << ": " << crayonName[c] << endl;
                    state[u][q].addBack(c);
                }
            }

            void setUser(Color c, User u)
            {
                if(c != NONE_AVAILABLE)
                    setState(c, u, getQuality(c));
            }

            void setQuality(Color c, Quality q)
            {
                if(c != NONE_AVAILABLE)
                    setState(c, getUser(c), q);
            }

            Color find(Quality q, User u = BOX) const
            {
                if(state[u][q].size() > 0)
                    return (Color) state[u][q].peekFront().id;
                else
                    return NONE_AVAILABLE;
            }

            // Falls back to the opposite quality if the preferred one is unavailable.
            Color findPreferred(Quality q = SHARP, User u = BOX) const
            {
                if(state[u][q].size() > 0)
                    return (Color) state[u][q].peekFront().id;
                else
                    return find(q == SHARP ? DULL : SHARP, u);
            }

            Color findPreferred(Color c, Quality q=SHARP, User u=BOX) const
            {
                if(getUser(c) == u)
                    return c;
                else
                    return findPreferred(q, u);
            }

            void moveAll(User from, User to, Quality q)
            {
                cout << CrayonState(to, q) << " <- " << state[from][q] << endl;
                state[to][q].addBack(state[from][q]);
            }

            void moveAll(User from, User to)
            {
                for(int q = 0; q < NUM_QUALITY; ++q)
                    moveAll(from, to, Quality(q));
            }

            friend ostream& operator << (ostream& os, const CrayonManager& cm);
    };

    inline ostream& operator << (ostream& os, const CrayonManager& cm)
    {
        for(int u = 0; u < NUM_USER; ++u)
            for(int q = 0; q < NUM_QUALITY; ++q)
                os << cm.state[u][q] << endl;
        return os;
    }

    int main(int argc, char** argv)
    {
        CrayonManager manager;
        Color c;

        // Becky grabs three crayons, preferably orange, blue, and a sharp.
        // She dulls the first two.
        //
        c = manager.findPreferred(ORANGE);
        manager.setState(c, BECKY, DULL);
        c = manager.findPreferred(BLUE);
        manager.setState(c, BECKY, DULL);
        c = manager.findPreferred();
        manager.setUser(c, BECKY);

        // Cate grabs two crayons, preferably blue and green.
        // She dulls the first one.
        //
        c = manager.findPreferred(BLUE);
        manager.setState(c, CATE, DULL);
        c = manager.findPreferred(GREEN);
        manager.setUser(c, CATE);

        // Becky returns all her crayons to the box...
        //
        manager.moveAll(BECKY, BOX);

        // ...and then grabs a sharp one...
        //
        c = manager.find(SHARP);
        manager.setUser(c, BECKY);

        // ...and makes it dull...
        //
        manager.setQuality(c, DULL);

        // Cate returns her dullest crayon to the box.
        //
        c = manager.findPreferred(DULL, CATE);
        manager.setUser(c, BOX);

        // Cate notices the sharp ones are disappearing fast,
        // so she grabs all the remaining sharp ones...
        //
        manager.moveAll(BOX, CATE, SHARP);

        // Becky gets mad and shows dad.
        //
        cout << "\nFinal States:\n" << manager << endl;

        return 0;
    }
Define an enum for the 8 crayon colors. The enum values of these crayons (from 0 to 7) will form the ids of the Domain. Also define a string array that maps each crayon to its string name.
Likewise, define enums and string arrays for the User and Quality state variables.
Define a simple class to bind the two state variables together into a single compound state. One of these will be bound to each Group. They won't change value, so we won't need any setters.
Now create a class CrayonManager to track the crayon states. Start by giving CrayonManager a Domain to hold the crayons. We'll use the partition to map from a crayon color to a CrayonState; accordingly, the Domain has templatization parameters string and CrayonState. Alternatively, we could have defined two Domains (one to map crayons to user state, and another to map crayons to quality state), but it turns out to be more useful to have a single Domain that maps to a multivalued state.
Next we'll need some Groups. Each Group will hold all the crayons in a particular compound state, so we define a two-dimensional array of Groups indexed by user state and quality state.
Initialize the domain by loading up all the crayon ids (zero through seven) and mapping each to its color string.
Initialize each Group by binding it to the Domain object and setting its value to the CrayonState it represents.
The last initialization step is to give all the crayons an initial state. We'll start with them all sharp and in the box.
Add a method to get the current compound state of a crayon. Remember that the state of a crayon is defined by its Group membership. The Domain's getValue() method finds the Group of the given crayon, and returns the user-assigned value of that Group (which is just the CrayonState object we set above). I've also added a couple of convenience methods to return just the crayon's User state and Quality state.
The setState method sets a crayon's full compound state by moving that crayon to the Group representing the given state. Note that we only have to add the crayon to its new Group — the crayon is removed from its previous Group automatically. I've also provided convenience methods for setting each component of a crayon's state independently. For simplicity, I'll largely be ignoring exceptional conditions, but here I at least go to the effort to ignore attempts to set the state of the NONE_AVAILABLE crayon.
Add a method to find a crayon with a particular quality and user state; if no crayon with that compound state exists, return NONE_AVAILABLE.
findPreferred will try to find a crayon with the requested quality and user state. If none exists, it will look for a crayon with the opposite quality but that same user state. If the user has no crayons at all, it returns NONE_AVAILABLE.
Add an overloaded version of findPreferred that first looks for a particular color in the given user state, but if that fails, it works just like the first version of findPreferred above.
Add a method to transfer all crayons of a particular quality from one user to another, and another method to move all crayons (independent of quality) from one user to another.
So we can see what's going on, update the state setters to print the state changes that are taking effect, and define a streaming operator for the crayon manager itself that prints the current state of all crayons.
Finally, add a main routine to exercise things. To do it right, I'd have to define what to do when any of the find methods return NONE_AVAILABLE, but for this tutorial I just pass these failures onto the state setters which ignore them.
Compile the program and run it.
> g++ -o crayons -I ../src crayons.cpp
> ./crayons
(Becky,  Dull) <- (  Box, Sharp): orange
(Becky,  Dull) <- (  Box, Sharp): blue
(Becky, Sharp) <- (  Box, Sharp): red
( Cate,  Dull) <- (  Box, Sharp): yellow
( Cate, Sharp) <- (  Box, Sharp): green
(  Box, Sharp) <- (Becky, Sharp): red
(  Box,  Dull) <- (Becky,  Dull): orange blue
(Becky, Sharp) <- (  Box, Sharp): violet
(Becky,  Dull) <- (Becky, Sharp): violet
(  Box,  Dull) <- ( Cate,  Dull): yellow
( Cate, Sharp) <- (  Box, Sharp): brown black red

Final States:
(  Box, Sharp):
(  Box,  Dull): orange blue yellow
( Cate, Sharp): green brown black red
( Cate,  Dull):
(Becky, Sharp):
(Becky,  Dull): violet
>
A software partition is an exceptionally fast container useful for organizing a set of like objects into groups. Applicable programming problems are common: I personally have used a partition-based approach to achieve elegant solutions to circuit state modeling problems, problems in graph theory, and others. If you too are a software developer, I suspect you'll find uses for Partition as well if you only look for them.
This software is protected by the GNU General Public License version 3. This is free software (as defined by the License), but I'd very much appreciate it if you leave a comment to let me know if and/or how you've found the software useful.
I have not gone to the effort to put together a commercial license, but if someone is interested, I'll make one available.
Although I've used partition programming techniques at multiple companies during my software career, this design and implementation is new and should be considered experimental. I'm not sure I'm satisfied with the public interface (particularly due to arguably cumbersome method names that resulted from the decision to not include a reverse iterator). Future versions may not be backwards compatible. This software has been extensively unit-tested, but only with gcc version 4.4.3 on Ubuntu 10.04.4. Please, please contact me if you discover platform compatibility problems; bugs; design deficiencies; and/or documentation errors (including misspellings or grammatical errors). Thanks!
 | Downloads | Documentation
--- | --- | ---
C++ | partition_c++_0.1.1.tgz, partition_c++_0.1.1.zip | 0.1.1
Java | partition_java_0.1.tgz, partition_java_0.1.zip | 0.1
I've implemented a set of backtrack algorithms to find solutions to various polyomino and polycube puzzles (2D and 3D puzzles where you have to fit pieces composed of little squares or cubes into a confined space). Examples of such puzzles include the Tetris Cube, the Bedlam Cube, the Soma Cube, and Pentominoes. My approach to the problem is perhaps unusual in that I've implemented many different algorithmic techniques simultaneously into a single puzzle solving software application. I've found that the best algorithm to use for a problem can depend on the size and dimensionality of the puzzle. To take advantage of this, when the number of remaining pieces reaches configurable transition thresholds, my software can turn off one algorithm and turn on another. Three different algorithms are implemented: de Bruijn's algorithm, Knuth's DLX, and my own algorithm which I call most-constrained-hole (MCH). DLX is most commonly used with an ordering heuristic that picks the hole or piece with fewest fit choices; but other simple ordering heuristics are shown to improve performance for some puzzles.
In addition to these three core algorithms, a set of constraints is woven into the algorithms, giving great performance benefits. These include constraints on the volumes of isolated subspaces, parity (or coloring) constraints, fit constraints, and constraints to take advantage of rotational symmetries.
In this (rather long) blog entry I present the algorithms and techniques I've implemented and share my insights into where each works well, and where they don't. You can also download my software source, an executable version of the solver, and the solutions to various well known puzzles.
My original impetus for writing this software was an annoyingly difficult little puzzle called the Tetris Cube. The Tetris Cube consists of the twelve oddly-shaped plastic pieces shown in Figure 1 which you have to try to fit into a cube-shaped box. Each piece has a shape you could make yourself by sticking together little cubes of fixed size. You would need 64 cubes to make them all and accordingly, the puzzle box measures 4x4x4, just big enough to hold them all. (This is an example of a polycube puzzle: a 3D puzzle where all the pieces are formed from little cubes. We'll also be looking at polyomino puzzles: 2D puzzles where all the pieces are formed from little squares.)
The appeal of the Tetris Cube to me is threefold. First, it's intriguing (and surprising to most folks) that a puzzle with only twelve pieces could be so wicked hard. I had spent much time in my youth looking for solutions to the far simpler (but still quite challenging) two-dimensional Pentominoes puzzle, so when I first saw the Tetris Cube in a gaming store about three years ago, I knew that with the introduction of the third dimension the Tetris Cube was an abomination straight from Hell. I had to buy it. Since then I've spent many an hour trying to solve it, but I've never found even one of its nearly 10,000 solutions manually.
Second, I enjoy the challenge of visualizing ways of fitting the remaining pieces into the spaces I have left, and enjoy the logic you can apply to identify doomed partial assemblies.
Third, I think working any such puzzle provides a certain amount of tactile pleasure. I should really buy the wooden version of this thing.
But alas, I think that short of having an Einstein-like parietal lobe mutation, you will need both persistence and a fair amount of luck to find even one solution. If I ever found a solution, I think I'd feel not so much proud as lucky; or maybe just embarrassed that I wasted all that time looking for the solution. In this sense, I think perhaps the puzzle is flawed. But for those of you up for a serious challenge, you should really go buy the cursed thing! But do yourself a favor and make a sketch of the initial solution and keep it in the box so you can put it away as needed.
Having some modest programming skill, I decided to kick the proverbial butt of this vexing puzzle back into the fiery chasm from whence it came. My initial program, written in January 2010, made use of only my own algorithmic ideas. But during my debugging, I came across Scott Kurowski's web page describing the software he wrote to solve this very same puzzle. I really enjoyed the page and it motivated me to share my own puzzle solving algorithm and also to read up on the techniques others have devised for solving these types of puzzles. In my zeal to make the software run as fast as possible, over the next couple of weeks I incorporated several of these techniques as well as a few more of my own ideas. Then I stumbled upon Donald Knuth's Dancing Links (DLX) algorithm which I thought simply beautiful. But DLX caused me two problems: first it used a radically different data model and would not be at all easy to add to my existing software; second it was so elegant, I questioned whether there was any real value in the continued pursuit of this pet project.
Still I wasn't sure how DLX would compare to and possibly work together with the other approaches I had implemented. The following November, curiosity finally got the better of me and I began to lie awake at night thinking about how to integrate DLX into my polycube solver software application.
The popular algorithms used to solve these types of problems are all recursive backtracking algorithms. With one algorithm that falls in this category you sequentially consider all the ways of placing the first piece; for each successful placement of that piece, you examine all the ways of placing the second piece. You continue placing pieces in this way until you find yourself in a situation where the next piece simply won't fit in the box, at which point you back up one step (backtrack) and try the next possible placement of the previously placed piece. Following this algorithm to its completion will examine all possible permutations of piece placements including those placements that happen to be solutions to the puzzle. This approach typically performs horribly. Another similar approach is to instead consider all the ways you can fill a single target open space (hole) in the puzzle; for each possible piece placement that fills the hole, pick another hole and consider all the ways to fill it; etc. This approach can also behave quite badly if you choose your holes willy-nilly, but if you make good choices about which hole to fill at each step, it performs much better. But in general you can mix these two basic approaches so that at each step of your algorithm you can either consider all ways to fill a particular hole, or consider all ways to place a particular piece. Donald Knuth gives a nice abstraction of this general approach that he dubbed Algorithm X.
To appreciate the true complexity of these types of problems it is perhaps useful to examine the Tetris Cube more closely. First note that most of the pieces have 24 different ways you can orient (rotate) them. (To see where the number 24 comes from, hold a piece in your hand and see that you have 6 different choices for which side faces up. After picking a side to face up, you still have 4 more choices for how to spin the piece about the vertical axis while leaving the same side up.) Two of the pieces, however, have symmetries that reduce their number of uniquely shaped orientations to just 12. For each orientation of a piece, there can be many ways to translate the piece in that orientation within the confines of the box. I call a particular translation of a particular orientation of a particular piece an image.
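The orientation counts above can be checked mechanically. The following is a minimal sketch, not code from my solver: the names Cube, Piece, normalize and countOrientations are made up for this example. It assumes a piece is given as a list of unit-cube coordinates, generates the 24 rotations as the axis permutations with sign flips that have determinant +1, and counts how many distinct shapes result once each rotated copy is slid back to the origin.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <vector>

using Cube = std::array<int, 3>;   // {x, y, z} of one unit cube
using Piece = std::vector<Cube>;

// Translate a piece so its minimum coordinates are zero and sort its cubes,
// giving a canonical form that ignores translation and cube ordering.
Piece normalize(Piece p) {
    Cube lo = p.front();
    for (const Cube& c : p)
        for (int a = 0; a < 3; ++a) lo[a] = std::min(lo[a], c[a]);
    for (Cube& c : p)
        for (int a = 0; a < 3; ++a) c[a] -= lo[a];
    std::sort(p.begin(), p.end());
    return p;
}

// Count the uniquely shaped orientations of a piece under the 24 rotations.
int countOrientations(const Piece& piece) {
    std::set<Piece> seen;
    std::array<int, 3> perm = {0, 1, 2};
    do {
        int inversions = (perm[0] > perm[1]) + (perm[0] > perm[2]) + (perm[1] > perm[2]);
        int permSign = (inversions % 2 == 0) ? 1 : -1;
        for (int mask = 0; mask < 8; ++mask) {
            int s[3] = { (mask & 1) ? -1 : 1, (mask & 2) ? -1 : 1, (mask & 4) ? -1 : 1 };
            // Keep proper rotations only (determinant +1); skip mirror images.
            if (permSign * s[0] * s[1] * s[2] != 1) continue;
            Piece rotated;
            for (const Cube& c : piece)
                rotated.push_back(Cube{ s[0] * c[perm[0]], s[1] * c[perm[1]], s[2] * c[perm[2]] });
            seen.insert(normalize(rotated));
        }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return static_cast<int>(seen.size());
}
```

An asymmetric piece yields 24 orientations, while a piece with a two-fold rotational symmetry yields 12, matching the two cases described above.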
If we stick with the algorithmic approach of recursively filling empty holes to look for solutions, then we'll start by picking just one of the 64 holes in the puzzle cube (call the hole Z_{1}); and then one by one try to fit each of the pieces in that hole. For each piece, all unique orientations are examined; and for each orientation, an attempt is made to place each of the piece's constituent cubes in Z_{1}. The size of a piece multiplied by its number of unique orientations I loosely call the complexity of a piece, which gives the total number of images of a piece that can possibly fill a hole. If, for example, a piece has 6 cubes and has 24 unique orientations, then 144 different images of that piece could be used to fill any particular hole. The complexities of the twelve Tetris Cube pieces are shown in Table 1. Each time a piece is successfully fitted into Z_{1}, our processing of Z_{1} is temporarily interrupted while the whole algorithm is recursively repeated with the remaining pieces on some new target hole, Z_{2}. And so on, and so on. Each time we successfully place the last piece, we've found a solution.
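Piece Name | Size | Unique Orientations | Complexity
--- | --- | --- | ---
A | 6 | 24 | 144
B | 6 | 24 | 144
C | 5 | 24 | 120
D | 5 | 24 | 120
E | 6 | 24 | 144
F | 5 | 24 | 120
G | 5 | 12 | 60
H | 5 | 24 | 120
I | 5 | 24 | 120
J | 5 | 12 | 60
K | 5 | 24 | 120
L | 6 | 24 | 144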
The number of steps in such an algorithm cannot be determined without actually running it since the number of successful fits at each step is impossible to predict and varies wildly with which hole you choose to fill. It is useful however (or at least mildly entertaining) to consider how many steps you'd have if you didn't back up when a piece didn't fit, but instead let pieces hang outside the box, or even let two pieces happily occupy the same space (perhaps slipping one piece into Buckaroo Banzai's eighth dimension) and blindly forged ahead until all twelve pieces were positioned. In this case, the total number of ways to place pieces is easily written down. There are 12 × 11 × 10 × ... × 1 = 12! possible orderings of the pieces. And for each such ordering, each piece can be placed a number of times given by its complexity. So the total number of distinct permutations of pieces that differ either in the order the pieces are placed, or in the translation or orientation of those pieces is:

$$12! \times 144^{4} \times 120^{6} \times 60^{2} \approx 2.2 \times 10^{33}$$
That's 2.2 decillion. The total number of algorithm steps would be more than double that (since each piece placed also has to be removed). But this is just silliness: any backtracking algorithm worth its salt (and none of them are very salty) will reduce the number of steps to well below a quadrillion, and a good one can get the number of steps down to the tens of millions. I now examine some specific algorithms and explain how they work.
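As a quick sanity check on the 2.2 decillion figure, the count can be reproduced by multiplying 12! by the twelve complexities from Table 1. The function name totalPlacements is only illustrative, and floating point is used because the result is far too large for a 64-bit integer.

```cpp
// Illustrative arithmetic check: 12! orderings times the product of the
// piece complexities listed in Table 1 (pieces A through L).
long double totalPlacements() {
    const int complexity[12] = { 144, 144, 120, 120, 144, 120,
                                  60, 120, 120,  60, 120, 144 };
    long double total = 1.0L;
    for (int i = 0; i < 12; ++i) {
        // Folding in (i + 1) on each pass accumulates the 12! factor.
        total *= (i + 1) * static_cast<long double>(complexity[i]);
    }
    return total;
}
```

The result lands between 2.2 and 2.3 times 10 to the 33rd power, in agreement with the figure quoted above.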
The first algorithm examined was first formulated independently by John G. Fletcher and N.G. de Bruijn. I first stumbled upon the algorithm when reading Scott Kurowski's source code for solving the Tetris Cube. To read Fletcher's work you'll either need to find a library with old copies of the Communications of the ACM or drop $200.00 for online access. (I've yet to do either.) De Bruijn's work can be viewed online for free, but you'll need to learn Dutch to read it. (It's on my to-do list.) Despite my ignorance of the two original publications on the algorithm, I'll take a shot at explaining it here. With no intended slight to Fletcher, from here on, I simply refer to the algorithm as de Bruijn's algorithm. (I feel slightly less foolish calling it de Bruijn's algorithm since I have at least examined and understood the diagrams in his paper.)
De Bruijn's algorithm takes the tack of picking holes to fill. Now I previously said that when filling a hole, for each orientation of each piece, an attempt must be made to place each of the piece's constituent cubes in that hole; but with de Bruijn's technique, only one of the cubes must be attempted. This saves a lot of piece fitting attempts. To understand how this works, first assume the puzzle is partially filled with pieces. De Bruijn's algorithm requires that you pick a particular hole to fill next. A simple set of nested for loops will find the correct hole. The code could look like this:
    GridPoint* Puzzle::getNextBruijnHole()
    {
        // Scan x fastest, then y, then z; the first unoccupied cell is the target hole.
        for(int z = 0; z < zDim; ++z)
            for(int y = 0; y < yDim; ++y)
                for(int x = 0; x < xDim; ++x)
                    if(grid[x][y][z]->occupied == false)
                        return grid[x][y][z];
        return NULL;
    }
This search order is also shown visually on the left-hand side of Figure 2.
Because of the search order there can be no holes in the puzzle with a lesser z value than the target hole. Similarly, there can be no holes with the same z value as the target hole having a lesser y value. And finally, there can be no hole with the same y and z values as the target hole having a lesser x value.
Now consider a particular orientation of an arbitrary piece like the one shown in the center of Figure 2. Because there can be no holes with a lesser z value than the target hole, it would be pointless to attempt to place either of its two top cubes in the hole. That would only force the lower cubes of the piece into necessarily occupied GridPoints of the puzzle. So only those cubes at the bottom of the piece could possibly fit in the target hole. But of those three bottom cubes, the one with the greater y value (in the foreground of the graphic) can't possibly fit in the hole because it would force the other two bottom-tier cubes into occupied puzzle GridPoints at the same height as the hole but with lesser y value. Applying the same argument along the x axis leads to the conclusion that for any orientation of a puzzle piece, only the single cube with minimum coordinate values in z, y, x priority order (which I call the root tiling node of the piece) can possibly fit the hole. This cube is highlighted pink in Figure 2.
So with de Bruijn's algorithm, a piece with 6 cubes and 24 orientations would only require 24 fit attempts instead of 144 at a given target hole. This allows the algorithm to fly through piece permutations in a hurry.
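The root tiling node argument can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `Cube` type and the functions `rootCubeIndex` and `placeAtHole` are hypothetical names, not taken from the solver. The sketch finds the cube with minimum coordinates in z, y, x priority order and translates the orientation so that this cube lands on the target hole.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One cube of a piece orientation, in integer coordinates (hypothetical type).
struct Cube { int x, y, z; };

// Index of the cube with minimum coordinates in z, y, x priority order.
// Under de Bruijn's hole ordering this is the only cube worth trying.
std::size_t rootCubeIndex(const std::vector<Cube>& cubes) {
    assert(!cubes.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < cubes.size(); ++i) {
        if (std::tie(cubes[i].z, cubes[i].y, cubes[i].x) <
            std::tie(cubes[best].z, cubes[best].y, cubes[best].x))
            best = i;
    }
    return best;
}

// Translate an orientation so that its root cube sits on the target hole.
std::vector<Cube> placeAtHole(const std::vector<Cube>& cubes, Cube hole) {
    const Cube root = cubes[rootCubeIndex(cubes)];
    std::vector<Cube> placed;
    placed.reserve(cubes.size());
    for (const Cube& c : cubes)
        placed.push_back({c.x - root.x + hole.x,
                          c.y - root.y + hole.y,
                          c.z - root.z + hole.z});
    return placed;
}
```

With one such translation per orientation, a piece with 24 orientations yields 24 candidate placements at a hole, whatever its cube count.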
De Bruijn's paper focused on the 10x6 pentomino puzzle, perhaps the most famous of all polyomino and polycube puzzles. The puzzle pieces in this problem consist of all the ways you can possibly stick 5 squares together to form rotationally unique shapes. There are 12 such pieces in all and each is given a standard single character name. Figure 3 shows the twelve pieces with their names as well as one of the 2339 ways you can fit these pieces into a 10x6 box. To be accurate, de Bruijn only used the algorithmic steps described above to place the last 11 pieces in this puzzle. He forced the X piece to be placed first in each of seven possible positions in the lower left quadrant of the box. This was done to eliminate rotationally redundant solutions from the search, and significantly sped up processing. But where possible I'd like to avoid optimization techniques that require processing by University professors. So when I speak of the de Bruijn algorithm, I do not include this special case processing. This restriction significantly weakens the algorithm. (I found it to take ten times longer to find all solutions to the 10x6 pentomino puzzle without this trick.) As I explain later, I've implemented an image filter that can constrain a piece to eliminate rotationally redundant solutions from the search. Applying this filter to the 10x6 pentomino puzzle algorithmically reproduces de Bruijn's constraint on the X piece.
In Figure 4, I've captured an excerpt of de Bruijn's algorithm working on the 10x6 pentomino puzzle at a point where it's behaving particularly badly. It reveals an interesting weakness of the algorithm: it can be slow to recognize a position in the puzzle that's clearly impossible to fill. The algorithm doesn't recognize this problem until it selects the troublesome hole as a fill target, but even then it won't back up all the way to the point where the hole is freed from its confinement: it only backs up one step. So depending on how far away the isolated hole is in the hole selection sequence at the time the hole appeared, it may get stuck trying to fill the hole many, many times. Because pentomino pieces are all fairly small, and because the algorithm uses a strict packing order from bottom to top and from left to right, such troublesome holes can never be that far away from the current fill target and are thus usually discovered fairly quickly. The example I've given may be among the most troublesome to the algorithm, but things can get worse if you are working with larger pieces, or if you are working in 3 dimensions instead of 2. In either case, unfillable holes can appear further down the hole selection sequence and the algorithm can stumble over them many more times before the piece that created the problem is finally removed.
The next algorithm examined does not suffer from this weakness.
Dancing Links (DLX) is Donald Knuth's algorithm for solving these types of puzzles. The DLX data model provides a view of each remaining hole and each remaining piece and can pick either a hole to fill or a piece to place depending on which (among them all) is deemed most advantageous.
Knuth's own paper on DLX is quite easy to understand, but I'll attempt to summarize the algorithm here. Create a matrix that has one column for each hole in the puzzle and one column for each puzzle piece. So for the case of the Tetris Cube the matrix will have 64 hole columns + 12 piece columns = 76 columns in all. We can label the columns for the holes 1 through 64, and the columns for the pieces A through L. The matrix has one row for each unique image. (Only images that fit wholly inside the puzzle box are included.) If you look at one row that represents, say, a particular image of piece B, it will have a 1 in column B and a 1 in each of the columns corresponding to the holes it fills. All other columns for that row will have a 0. (Those are the only numbers that ever appear in the matrix: ones and zeros.) Now, if you select a subset of rows that correspond to piece placements that actually solve the puzzle, you'll notice something interesting: the twelve selected rows together will have a single 1 in every column. And so the problem of solving the puzzle is abstracted to the problem of finding a set of rows that cover (with a 1) all the columns in the matrix. This is the exact cover problem: finding a set of rows that together have exactly one 1 in every column. With Knuth's abstraction there is no distinction between filling a particular hole, or placing a particular piece; and that is truly beautiful.
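To make the matrix abstraction concrete, here is a minimal sketch of how a row could be built and how a candidate set of rows could be checked. The sparse row layout and the names `makeRow` and `isExactCover` are assumptions made for illustration; they are not Knuth's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One matrix row, stored sparsely as the sorted list of columns holding a 1.
// Columns 0..numHoles-1 are hole columns; column numHoles + p is piece p.
std::vector<int> makeRow(int numHoles, int pieceIndex, std::vector<int> holesCovered) {
    for (int h : holesCovered) assert(h >= 0 && h < numHoles);
    std::sort(holesCovered.begin(), holesCovered.end());
    holesCovered.push_back(numHoles + pieceIndex);  // the piece column
    return holesCovered;
}

// True when the chosen rows together put exactly one 1 in every column.
bool isExactCover(int numColumns, const std::vector<std::vector<int>>& rows) {
    std::vector<int> ones(numColumns, 0);
    for (const auto& row : rows)
        for (int c : row) ++ones[c];
    return std::all_of(ones.begin(), ones.end(), [](int n) { return n == 1; });
}
```

For the Tetris Cube, `numHoles` would be 64 and `numColumns` would be 76; a solution is any set of twelve rows for which `isExactCover` returns true.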
In each iteration of the algorithm, DLX first picks a column to process. This decision is rather important and I discuss it at length below. Once a column is selected, DLX will in turn place each image having a 1 in that column. For each such placement, DLX reduces the matrix removing every column covered by the image just placed, and removing every row for every image that conflicts with the image just placed. In other words, after this matrix reduction, the only rows that remain are for those images that still fit in the puzzle, and the only columns that remain are for those holes that are still open and for those pieces that have yet to be placed. Knuth uses some nifty linked list constructions to perform this manipulation, which you can read about in his paper if interested.
DLX maintains the total number of ones in each column as state information. If the number of ones remaining in any column hits zero, then the puzzle is unsolvable with the current piece configuration and so the algorithm backtracks. The situation in Figure 4 that gave the de Bruijn algorithm so much trouble gives DLX no trouble at all: it immediately recognizes that the matrix column corresponding to the hole that can't be filled has no rows with a one and backtracks, removing the piece that isolated the hole immediately after it was placed.
Some benefits of DLX are:
As noted above, the first step in each iteration of the algorithm is to pick a column to process. If the column selected is a hole column, then the algorithm will one-by-one place all images that fill that hole. If the column selected is a piece column, then the algorithm will one-by-one place all the images of that piece that fit anywhere in the puzzle. There are any number of ways to determine this column selection order, which Knuth refers to as ordering heuristics. Knuth found that the minimum fit heuristic (simply picking the column that has the fewest number of ones) does well. Using this selection criterion, DLX will always pick the more constrained of either the hole that's hardest to fill or the piece that's hardest to place. By reducing the number of choices that have to be explored at each algorithmic step, the total number of steps to execute the entire algorithm is greatly reduced. In the case of the Tetris Cube with one piece rotationally constrained (to eliminate rotationally redundant solutions), the de Bruijn algorithm places pieces in the box almost 8 billion times, whereas DLX running with the minfit heuristic places pieces only 68 million times: a reduction in the number of algorithmic steps by two orders of magnitude. (Remember though that each DLX step requires much more processing and DLX was actually only twice as fast as de Bruijn for this problem.)
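The control flow of an exact cover search with the minimum fit column choice can be sketched as follows. This is deliberately not DLX: plain vectors stand in for Knuth's linked lists, and the column counts are recomputed at each step rather than maintained incrementally, so it illustrates the decisions being made, not the performance. The type `ExactCoverSketch` and its members are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct ExactCoverSketch {
    int numColumns;
    std::vector<std::vector<int>> rows;  // each row lists the columns holding a 1
    std::vector<bool> covered;           // covered[c] is true once column c is filled
    long placements = 0;                 // rows placed over the whole search

    ExactCoverSketch(int columns, std::vector<std::vector<int>> r)
        : numColumns(columns), rows(std::move(r)), covered(columns, false) {
        for (const auto& row : rows)
            for (int c : row) assert(c >= 0 && c < numColumns);
    }

    // A row is still "in the matrix" if none of its columns is covered yet.
    bool rowFits(const std::vector<int>& row) const {
        for (int c : row)
            if (covered[c]) return false;
        return true;
    }

    static bool hasColumn(const std::vector<int>& row, int c) {
        return std::find(row.begin(), row.end(), c) != row.end();
    }

    long countSolutions() {
        // Min-fit choice: the open column with the fewest remaining ones.
        int best = -1, bestCount = 0;
        for (int c = 0; c < numColumns; ++c) {
            if (covered[c]) continue;
            int n = 0;
            for (const auto& row : rows)
                if (hasColumn(row, c) && rowFits(row)) ++n;
            if (best < 0 || n < bestCount) { best = c; bestCount = n; }
        }
        if (best < 0) return 1;        // no open column left: a solution
        if (bestCount == 0) return 0;  // an open column has no ones: backtrack
        long total = 0;
        for (const auto& row : rows) {
            if (!hasColumn(row, best) || !rowFits(row)) continue;
            for (int c : row) covered[c] = true;   // place the image
            ++placements;
            total += countSolutions();
            for (int c : row) covered[c] = false;  // remove it again
        }
        return total;
    }
};
```

Because hole columns and piece columns are treated identically, the same loop picks whichever of the two kinds is currently most constrained.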
Knuth stated in the conclusions of his paper, "On large cases [DLX] appears to run even faster than those special-purpose algorithms, because of its [minfit] ordering heuristic." But I don't think things are quite this simple. I have found that for larger puzzles, the minfit heuristic is often only beneficial for placing the last N pieces of the puzzle where the number N depends upon both the complexity of the pieces and upon the geometries of the puzzle. I also believe that using the minfit heuristic for more than N pieces can actually negatively impact performance relative to other simpler ordering heuristics.
To see the problem, we need a larger puzzle: let's up the ante from pentominoes to hexominoes. There are 35 uniquely shaped free hexominoes shown in Figure 5. Each piece has area 6 so the total area of any shape constructed from the pieces is 210 — 3.5 times the area of a pentomino puzzle.
Consider first a hexomino puzzle shaped like a parallelogram consisting of 14 rows of 15 squares each stacked in a sloping stairstep pattern. Figure 6 shows one solution to this puzzle. (As I explain in a later section, you can't actually pack hexominoes in a rectangular box, so the parallelogram is one of the simplest feasible hexomino constructions.) The first time I ran DLX on this puzzle I used a one-time application of a volume constraint filter which throws out a bunch of images up front that can't possibly be part of a solution (see below). It ran for quite some time without finding a single solution. A trace revealed that DLX had placed the first few pieces into the rather unfortunate construction shown in Figure 7. Note the small area at the lower left that has almost been enclosed. Every square in that area has many ways to cover it, so the minfit heuristic didn't consider this pocket very constrained and ignored it. There is actually no way to fill all the squares: the only piece that could fill it has already been placed on the board. DLX didn't recognize the problem and so continued to try to solve the puzzle. I call such a well concealed spot in the puzzle that can't be filled a landmine.
This behavior closely resembles the problem the de Bruijn algorithm exhibited in Figure 4: DLX can also create spaces in the puzzle that can't possibly be filled, not immediately see them, and stumble upon them many times before dismantling the problem. It is interesting that the de Bruijn algorithm is actually less susceptible to this particular pitfall. Although de Bruijn's algorithm can also create landmines, it can't wander all over the puzzle before discovering them. DLX running with the minfit heuristic is able to wander far and wide filling in holes it thinks more constrained than the landmine; finally step on it; get stuck; back up a little and wander off in some other direction. And because the landmine created in Figure 7 was created so early in the piece placement sequence, there were many ways for DLX to go astray: it took almost two million steps for DLX to dismantle this landmine. (I decided not to make a movie of this one.)
As a second example, consider the box-in-a-diamond hexomino puzzle of Figure 8. Due to the center rectangle, DLX frequently partitions the open space into two isolated regions as shown. Each time the open space is divided, there's only 1 chance in 6 that the two areas will have a volume that can possibly be filled. Out of 1000 runs of the solver each using a different random ordering of the nodes in each DLX column, 842 runs resulted in a partitioning of the open space into two (or more) large unfillable regions before the eleventh-to-last piece was placed. When such a partition is created, DLX examines every possible permutation of pieces that fails to fill (at least one of) the isolated regions.
So here again, the minfit ordering heuristic has led to the creation of a topology that can't possibly be filled. And again, DLX can't see the problem, and wastes time exhaustively exploring fruitless branches of the search tree. De Bruijn's algorithm can also be made to foolishly partition the open space of puzzles: if you ask the de Bruijn algorithm to, say, start at the bottom of a U-shaped puzzle and fill upwards, it will inevitably partition the open space into the left and right columns of the U. But aside from such cruel constructions, de Bruijn's algorithm is relatively immune to this pitfall.
When I first saw these troubles, my faith in the minfit heuristic was unshaken: the extreme reduction in the number of algorithmic steps seen for the Tetris Cube and other small puzzles had me convinced it was the way to go. So I built landmine sweepers and volume checkers to provide protection against these pitfalls. These worked pretty well, but as I thought about what was happening more, I began to doubt the approach. As you pursue the most constrained hole or piece, you end up wandering around the puzzle space haphazardly leaving a complex topology in your wake. This strikes me as the wrong way to go: it's certainly not the way I'd work a puzzle manually. When you finally get down to the last few pieces you are likely to have many nooks and crannies that simply can't all be filled with the pieces that are left. And I think that is ultimately the most important thing with these kinds of puzzles: you want to fill the puzzle in such a way that when you near completion you have simple geometries that have a higher probability of producing solutions.
One could argue that wandering around the board spoiling the landscape is a good thing! Won't that make it more likely to find a hole or a piece with very few fit options; reducing the number of choices that have to be examined for the next algorithmic step and ultimately reducing the run time of the entire algorithm? I used to have such thoughts in my head, but I now think these ideas flawed. When solving the puzzle, your current construction either leads to solutions or it doesn't. You want to minimize how much time you spend exploring constructions that have no solutions. (The day after I wrote this down, Gerard sent me an email saying exactly the same thing...so I decided it was worth putting in bold.) The real advantage of picking the minfit column is not that it has fewer cases to examine, but rather that there's a better chance that all the fit choices lead quickly to a dead end. In other words, by using the minfit heuristic DLX tends to more quickly identify problems with the current construction that preclude solutions, and more quickly backtrack to a construction that does have solutions. The problem with this approach is that as it wanders about examining the most constrained elements, it can create more difficulties than it resolves.
For large puzzles, instead of looking for individual holes or pieces that are likely to have problems, I think it is better to look at the overall topology of the puzzle and fill in regions that appear most confined (a macro view instead of a micro view). By strategically placing pieces so that at the end you have a single simply shaped opening, you will find solutions with a higher probability. This is just another way of saying that there is a high probability that you are still in a construction that has solutions — which is your ultimate goal.
So if the puzzle is shaped like a rectangle, start at one narrow side and work your way across to the other narrow side. If the puzzle is shaped like a U, then fill it in the way you'd draw the U with a pencil: down one side, around the bottom and up the other side. If the puzzle is a five pointed star, fill in the points of the star first, and leave the open space in the middle of the star for last. (Hmmm, or maybe it would be better to finish heading out one point? I'm not sure.)
So if what I say is true, then why does the minfit heuristic work so well for the Tetris Cube? I think the minfit heuristic works well once the number of pieces remaining in a puzzle drops to some threshold which depends on the complexity of the pieces and the geometries of the puzzle. Because Tetris Cube pieces are rather complicated, and because the geometry of the puzzle is small relative to the size of these pieces, the minfit heuristic works well for that puzzle from the start.
Knuth explored the possibility of using the minfit heuristic for the first pieces placed, but then simply choosing holes in a predefined order for the last pieces placed. The thinking was that for the last few pieces, picking the most constrained hole or piece doesn't buy you much and you're better off just trying permutations as fast as you can and skipping the search for the most constrained element (and skipping the maintenance of the column counts that support this search). Knuth was not able to get any significant gains with this technique. I propose the opposite approach: initially, deterministically fill in regions of the puzzle that are most confined, then when you've worked your way down to a smaller area (and placement options are more limited) start using the minfit heuristic.
To explore this idea, my solver supports a few alternative ordering heuristics. You can turn off one heuristic and enable another when the number of remaining pieces hits some configured threshold. One available heuristic (named heuristic x) has these behaviors:
So I ran the solver against the 15x14 parallelogram initially applying the x heuristic, but switching to the minfit heuristic when the number of remaining pieces hit some configured number. Unfortunately, an exhaustive examination of the search tree for this puzzle is not feasible. Instead, I used Monte Carlo techniques to estimate the best time to stop using the x heuristic and start using the minfit heuristic. Each data point shown in Figure 9 shows the average number of solutions found per second over 10,000 runs of the solver each initialized with a different random ordering of the nodes in each column of the DLX matrix. Each run was terminated the first time the 16th-to-last piece was removed from the puzzle. (In other words, once the 16th-to-last piece is placed, the solver examines all possible ways to place the last 15 pieces, but then terminates that run.) Solutions to this puzzle tend to appear in great bursts, and even at 10,000 runs I think there is quite a bit of uncertainty in these solution rates. It should also be noted that the DLX processing load for the early piece placements is enormous compared to the latter pieces. Terminating algorithm processing each time the 16th-to-last piece is removed means the great efforts expended to reduce the matrix were largely wasted. This results in reduced average performance.
Despite these weaknesses the analysis offers evidence that when there are more than approximately 20 pieces left to be placed in this puzzle, there is no real benefit to using the minfit heuristic. In fact, using the minfit heuristic beyond 20 pieces seems to show some slight degradation in performance; although the last data point (where the minfit heuristic is used for the last 34 pieces placed) seems to again offer a performance increase. This could be a statistical fluke, but I rather suspect there is some significant benefit to filling the two opposite acute corners of the puzzle early. The x heuristic simply ignores the tight corner at the opposite side of the puzzle. It is my suspicion that as puzzle sizes increase, application of the minfit heuristic across the entire puzzle will result in ever worsening performance relative to heuristics that (at least initially) pack confined regions of the puzzle first; but larger puzzles are exceedingly difficult to analyze even with Monte Carlo techniques.
Depending on the geometry of the puzzle, it can be even more important to follow your intuition. Gerard's Polyomino Solution Page includes a puzzle similar to the one shown in Figure 10. This puzzle, however, is 2 squares longer and somewhat more difficult to solve. I was unable to find any solutions to this puzzle through exclusive use of the minfit heuristic even after hours of execution time; but by initially picking the holes farthest from the geometric center of the puzzle (my "R" heuristic) and then switching to the minfit heuristic once the less-confined central cavity was reached I averaged about 9 solutions per minute over 6 hours of run time.
DLX is quite crafty, but all the linked list operations can be overly burdensome for small to medium sized puzzles. In a head-to-head match up, my implementation of de Bruijn's algorithm runs more than 6 times faster than DLX for the 10x6 pentomino puzzle (and remember, that's without the de Bruijn algorithm using the trick of placing the X piece first). For the more complex three dimensional Tetris Cube puzzle, DLX fares much better, but still takes more than twice as long to run as de Bruijn's algorithm.
The Most Constrained Hole (MCH) algorithm attempts to reap at least some of the benefits of DLX without incurring the high cost of its matrix manipulations.
I present this algorithm third, because the story flows better that way, but it is actually the first polycube solving algorithm I implemented and is of my own design. I make no claims of originality: it is a simple idea and others have independently devised similar algorithms. I first implemented a variant of this technique to solve pentomino puzzles one afternoon in the summer of 1997 whilst at a TCL/TK training class in San Jose, CA.
MCH simply chooses the hole that has the fewest number of fit possibilities as the next fill target. Therefore DLX (when using the min fit ordering heuristic) and MCH only deviate in their decision making process in those situations where a piece turns out to be more constrained than any hole. To find the MCH, the software examines each remaining hole in the puzzle, and for each of those holes it counts the number of ways the remaining pieces can be placed to fill that hole. The MCH is then taken to be the hole with the fewest fits.
Although DLX will sometimes choose a piece to place next rather than a hole to fill, the biggest difference between MCH and DLX is not in the step-by-step behavior of the two algorithms, but rather in their implementation. My polycube solving software has as one of its fundamental data structures an object called a GridPoint. During Puzzle initialization I create an array of GridPoint objects with the same dimensions as the Puzzle. So for the Tetris Cube I create a 4x4x4 matrix of GridPoints. To support MCH, at each GridPoint I create an array of 12 lists, one list for each of the twelve puzzle pieces. The list for piece B at GridPoint (3, 1, 2) contains all the images of piece B that fill the hole at grid coordinates (3, 1, 2). To count the total number of fits at (3, 1, 2) I traverse the image lists for all the pieces that have not yet been placed in the puzzle and count the total number that fit. To find the most constrained hole, I perform this operation at every open hole in the puzzle and take the hole with the minimum fit count.
Recall that for the de Bruijn algorithm a piece of size 6 with 24 unique rotations only requires 24 fit attempts, but that only works because the algorithm restricts itself to filling the hole with minimum coordinate values in z, y, x priority order. For MCH the image lists must contain all the images of a piece that fill a hole. So a piece with 6 cubelets and 24 unique rotations would have nominally 144 entries in the list. (I actually throw out images that don't fit inside the puzzle box as discussed in the section on image filters below.) So these lists can be rather long, and many lists at many holes have to be checked to find the MCH.
The whole idea sounds loopy I know, but for the case of the 10x6 pentomino puzzle, MCH runs 25% faster than DLX (which still makes it almost 5 times slower than de Bruijn). For the case of the Tetris Cube, MCH is the fastest of the three algorithms running about 2.5 times faster than DLX and about 10% faster than de Bruijn.
The solver also includes a variant of MCH that only considers those holes with the fewest number of neighboring holes. I call this estimated MCH (EMCH). This approach sometimes gets the true MCH wrong, but overall seems to perform better — about 25% faster for 10x6 Pentominoes and more than a third faster for the Tetris Cube.
I think for larger puzzles, when the number of images at each grid point starts to increase by orders of magnitude, this approach of explicit fit counting will break down. There are other ways you can estimate the MCH: I had one MCH variant that didn't count fits at all, but rather looked purely at the geometry of the open spaces near a hole to gauge how difficult it was going to be to fill. In any case, I only apply MCH on smaller 3D puzzles because this is where I've found it to outpace the other two algorithms.
Each of the three algorithms examined had different strengths. When there are very few pieces, the simple de Bruijn algorithm had best performance. For medium sized 3D puzzles, EMCH performed best. Only DLX can choose to iterate over all placements of a piece which can provide huge performance benefits in the right situation. (See, for example, the section on rotational redundancy constraint violations.) Also the ability to define different ordering heuristics makes DLX quite useful for large puzzles with nontrivial topologies.
To allow the best algorithm to be applied at the right time to the right problem, I've implemented all three algorithms into a single puzzle solving application with the capability to turn off one algorithm and turn on another when the number of remaining pieces reaches configured thresholds. As you shall see, this combined algorithmic approach gives much improved performance for many puzzles.
I still have the broad topic of constraints to discuss, but I first want to share some software optimizations I've made on the de Bruijn and MCH algorithms. Together these software optimizations reduced the time to find all Tetris Cube solutions with these algorithms by about a factor of five. (This probably just means my initial implementation was really bad, but I think the optimizations are still worth discussing.)
I was originally tracking the occupancy state of the puzzle via a flag named occupied in each GridPoint object. To determine if an image fit in the puzzle, this flag was examined for each GridPoint used by the image. Most of the popular polyomino (e.g., Pentominoes) and polycube puzzles (e.g., Soma, Bedlam, Tetris) have an area or volume of not more than 64. This is rather convenient as it allows one to model the occupancy state of any of these puzzles with a 64-bit field. So I ditched all the occupied flags in the GridPoint array and replaced them all with a single 64-bit integer variable (named occupancyState) bound to the Puzzle as a whole. Each image object was similarly modified to store a 64-bit layoutMask identifying the GridPoints it occupies. To see if a particular image fits in the puzzle you now need only perform a binary-and of the puzzle's occupancyState with the image's layoutMask and check for a zero result. To place the piece, you need only perform the binary-or of the puzzle's occupancyState with the image's layoutMask and store that result as the new occupancyState. This is really greasy fast and cut the run times by more than a factor of two.
The only downside to this approach is that it prevents you from solving puzzles that are larger than the size of the bit field. You could increase the size of the field, but this quickly starts to wipe out the benefit. But you can still take advantage of the performance benefit of bit masks for puzzles that are bigger than size 64 by simply using DLX until the total volume of remaining pieces is 64 or less. Then you can morph the data model into a bit-oriented form and use the MCH or de Bruijn algorithms to place the last several pieces (which is the only time speed really matters). For very large puzzles (e.g., a heptomino puzzle) I think this approach will break down: by the time you get down to an area of size 64 the search tree is collapsing and it's probably too late for a data model morph to pay off.
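The fit test and placement described above reduce to a pair of bitwise operations. A minimal sketch follows; `occupancyState` and `layoutMask` are the names used in the text, while `BitPuzzle`, `bitIndex`, the bit layout, and the `remove` operation (an and-not used when backtracking) are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Map cell (x, y, z) of a box to a bit position. This x-fastest layout is
// an assumption; any fixed one-to-one mapping onto 0..63 would do.
inline int bitIndex(int x, int y, int z, int xDim, int yDim) {
    return x + xDim * (y + yDim * z);
}

struct BitPuzzle {
    std::uint64_t occupancyState = 0;  // bit set => that cell is filled

    // An image fits when it shares no cell with the pieces already placed.
    bool fits(std::uint64_t layoutMask) const {
        return (occupancyState & layoutMask) == 0;
    }

    void place(std::uint64_t layoutMask) {
        assert(fits(layoutMask));
        occupancyState |= layoutMask;
    }

    // Undo a placement when backtracking.
    void remove(std::uint64_t layoutMask) {
        assert((occupancyState & layoutMask) == layoutMask);
        occupancyState &= ~layoutMask;
    }
};
```

For a 4x4x4 box, `bitIndex(3, 3, 3, 4, 4)` is 63, so every cell of the Tetris Cube fits in one 64-bit word.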
The MCH routine examines different holes remaining in the puzzle and finds the number of possible fits for each of them. I modified the procedure that counts fits to simply return early once the number of fits surpasses the number of fits of the most constrained hole found so far. This trivial change sped the software up 20% for the Tetris Cube.
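A sketch of fit counting with this early return, built on the bit mask fit test described earlier. The container layout and the names `Image`, `HoleLists`, `countFits` and `findMostConstrainedHole` are hypothetical. Note one small liberty: the sketch stops as soon as the count reaches the best count found so far (rather than surpassing it), since a tie cannot improve on the current best hole either.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Image { std::uint64_t layoutMask; };  // bit h set => image fills hole h

// listsAtHole[p] holds the images of piece p that fill one particular hole.
using HoleLists = std::vector<std::vector<Image>>;

// Count the images of unused pieces that fit at one hole, giving up once
// the count reaches `limit` (the hole can no longer beat the best so far).
int countFits(const HoleLists& listsAtHole, const std::vector<bool>& pieceUsed,
              std::uint64_t occupancyState, int limit) {
    int fits = 0;
    for (std::size_t p = 0; p < listsAtHole.size(); ++p) {
        if (pieceUsed[p]) continue;
        for (const Image& img : listsAtHole[p]) {
            if ((img.layoutMask & occupancyState) != 0) continue;
            if (++fits >= limit) return fits;  // early return
        }
    }
    return fits;
}

// Returns the open hole with the fewest fits, or -1 if no hole is open.
int findMostConstrainedHole(const std::vector<HoleLists>& imagesAt,
                            const std::vector<bool>& pieceUsed,
                            std::uint64_t occupancyState) {
    assert(imagesAt.size() <= 64);
    int bestHole = -1;
    int bestFits = INT_MAX;
    for (std::size_t h = 0; h < imagesAt.size(); ++h) {
        if ((occupancyState >> h) & 1u) continue;  // hole already filled
        const int fits = countFits(imagesAt[h], pieceUsed, occupancyState, bestFits);
        if (fits < bestFits) { bestFits = fits; bestHole = static_cast<int>(h); }
    }
    return bestHole;
}
```

A returned hole whose fit count is zero signals that the current construction cannot be completed and the caller should backtrack.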
In my original implementation of both MCH and de Bruijn's algorithm, I was lazily using an STL set (sorted binary tree) to store the index numbers of the remaining pieces. (Some of you are rudely laughing at me. Stop that.) Only the pieces in this set should be used to fill the next target hole. The index of the piece being processed is removed from the set. If the piece fits, the procedure recurses on itself starting with the now smaller set. Once all attempts to place a piece are exhausted, the piece is added back to the set, and the next entry in the set is removed and processed. This worked fine, but STL sets are not the fastest thing in the galaxy. As you might imagine there's been lots of research on fast permutation algorithms (dating back to the 17th century). I settled on an approach that was quite similar to what I was already doing, but the store for the list of free index numbers is a simple integer array instead of a binary tree. An index is "removed" from the array by swapping its position with the entry at the end of the array. So my STL set erase and insert operations were replaced with a pair of integer swaps. This change improved the fastest run times by about another 20%.
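The swap-based bookkeeping can be sketched like this. The function `permute` and its parameters are hypothetical, and the fit test is omitted so that the sketch simply visits every ordering of the free pieces; in a solver the recursion would only continue when the chosen piece actually fits.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// freePieces[0..numFree) holds the indexes of the pieces not yet placed.
// An index is "removed" by swapping it to the end of the live range and
// shrinking the range by one; swapping back restores it.
template <typename Visit>
void permute(std::vector<int>& freePieces, std::size_t numFree,
             std::vector<int>& placed, const Visit& visit) {
    assert(numFree <= freePieces.size());
    if (numFree == 0) { visit(placed); return; }
    for (std::size_t i = 0; i < numFree; ++i) {
        std::swap(freePieces[i], freePieces[numFree - 1]);  // remove entry i
        placed.push_back(freePieces[numFree - 1]);
        permute(freePieces, numFree - 1, placed, visit);
        placed.pop_back();
        std::swap(freePieces[i], freePieces[numFree - 1]);  // put it back
    }
}
```

Each level of recursion leaves the array exactly as it found it, so no copying or tree rebalancing is needed.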
The algorithms above observe the constraint that when a hole can't be filled, or (in the case of DLX) a piece can't be placed, they back up. But other constraints (beyond this obvious fit constraint) exist for polycube and polyomino puzzles which, if violated, prohibit solutions. My solver can take advantage of these constraints in two different ways. First, I've implemented monitors that watch a particular constraint and trigger an immediate backtrack when a violation is detected. Second, I've implemented a set of image filters that remove images that would violate constraints if used.
Let's first look at the technique of monitoring constraints during algorithm execution and triggering a backtrack when the constraint is violated.
I first read about the notion of parity at Thorleif's SOMA page where in one of his Soma newsletters he references Jotun's proof. It's a simple idea: color each cube in the solution space either black or white in a three dimensional checkerboard pattern and then define the parity of the solution space to be the number of black cubes minus the number of white cubes. When you place a piece in the solution space, the number of black cubes it fills less the number of white cubes it fills is the parity of that image. Suppose that the parity for some image of a piece is 2. If you move that piece one position along any axis, all of its constituent cubes flip color and the parity of the piece will become -2. But those would be the only possible parities you could achieve with that piece: either 2 or -2. So the magnitude of the parity of a piece is defined by its shape, but depending where you place the piece, it could be either positive or negative.
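Computing the parity of an image takes only a few lines. In this sketch the `Cell` type and the `imageParity` function are hypothetical names, and treating cells with an even coordinate sum as black is an arbitrary convention.

```cpp
#include <cassert>
#include <vector>

struct Cell { int x, y, z; };  // one cube of a placed image

// Parity of an image: black cells minus white cells, where a cell is black
// when x + y + z is even (a three dimensional checkerboard coloring).
int imageParity(const std::vector<Cell>& cells) {
    assert(!cells.empty());
    int parity = 0;
    for (const Cell& c : cells)
        parity += ((c.x + c.y + c.z) % 2 == 0) ? 1 : -1;
    return parity;
}
```

A flat T-shaped piece of four cells, for example, has parity 2 in one position and -2 after shifting it one cell along any axis.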
As you place pieces, the total parity of all placed pieces takes a random walk away from an initial parity of zero, but to achieve a solution the random walk must end exactly at the parity of the solution space. It is possible for the random walk to get so far from the destination parity that it is no longer possible to walk back before you run out of pieces. More generally, you can get yourself in situations where it's just not possible to use the remaining pieces to walk back to exactly the right parity.
It is possible to show that some puzzles can't possibly be solved because the provided pieces have parities that just can't add up to the parity of the solution. As an example, consider again the 35 hexominoes shown in Figure 5. The total area of these pieces is 35x6 = 210. It is quite tempting to try to fit these pieces in a rectangular box. You could try boxes measuring 14x15, 10x21, 7x30, 6x35, 5x42 or even 3x70. The parity of all of these boxes is 0, so our random parity walk must end at 0. Of the 35 hexominoes 24 have parity 0 and the other 11 have parity magnitude 2. Because there is no way to take 11 steps of size 2 along a number line and end up back at 0, there is no way to fit the 35 hexominoes into any rectangular box.
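The hexomino argument can be checked mechanically. The sketch below enumerates every total reachable when each piece contributes either plus or minus its parity magnitude; `achievableParities` is a hypothetical helper written for this illustration, not part of the solver.

```cpp
#include <cassert>
#include <set>
#include <vector>

// All totals reachable when piece i contributes +magnitudes[i] or -magnitudes[i].
std::set<int> achievableParities(const std::vector<int>& magnitudes) {
    std::set<int> totals = {0};
    for (int m : magnitudes) {
        assert(m >= 0);
        std::set<int> next;
        for (int t : totals) {
            next.insert(t + m);
            next.insert(t - m);
        }
        totals.swap(next);
    }
    return totals;
}
```

Feeding it 24 zeros and 11 twos produces a set that does not contain 0, which is the impossibility result stated above: an odd number of steps of size 2 always lands on an odd multiple of 2.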
Knowing that certain puzzles can't be solved without ever having to try to solve them is quite useful, but how can we make more general use of the parity constraints to speed up the search for solutions in puzzles?
Knuth attempted to enhance the performance of DLX through the use of parity constraints for the case of a one-sided hexiamond puzzle. The puzzle has four pieces with parity magnitude two (and the rest have parity zero). The puzzle as a whole has parity zero, so exactly two of these four pieces must be placed so their parity is +2 and two must be placed so their parity is -2. Knuth took the approach of dividing this problem into 6 subproblems, one for each way to choose the two pieces that will have positive parity. His expectation was that since each of the four pieces was constrained to have half as many images, each subproblem would run 16 times as fast. Then the total work for all 6 subproblems should be only 6/16 of the work to solve the original (unconstrained) problem. But the total work to solve all 6 subproblems was actually more than the original problem. (I offer an explanation as to why this experiment failed below.)
I use a different approach to take advantage of parity constraints: simply monitor the parity of the remaining holes in the puzzle and if it ever reaches a value that the remaining pieces cannot achieve, then immediately trigger a backtrack.
To implement this parity-based backtracking feature, after each piece placement you must determine if the remaining puzzle pieces can be placed to achieve the parity of the remaining holes in the puzzle. This may sound computationally expensive, but it's not. Consider the Tetris Cube puzzle as an example. Piece A has parity 0, pieces B, E and L have a parity magnitude of 2, and the remaining eight pieces have a parity magnitude of 1. We can immediately forget about piece A since it has parity 0. So we have three pieces with parity magnitude 2 and eight pieces with parity magnitude 1. If you look at the parity of the pieces that are left at any given time, there are only (3+1) x (8+1) = 36 cases. During puzzle initialization I create a parity state object for each of these situations. So, for example, there is a parity state object that represents the specific case of having three remaining pieces of parity magnitude 1 and two remaining pieces of parity magnitude 2. In each of these 36 cases, I precalculate the parities that you can possibly achieve with that specific combination of pieces. I store these results in a boolean array inside the state object. So if you know your parity state, the task of determining if the parity of the remaining holes in the puzzle is achievable reduces to an array lookup. It looks something like this:
    if ( ! parityState->parityIsAchievable[parityOfRemainingHoles] ) {
        // force a backtrack due to parity violation
    } else {
        // parity looks ok so forge on
    }
In addition to this boolean array, each parity state object also keeps track of its neighboring states in two arrays indexed by parity. One array is called place, which you use to look up your new state when a piece is placed; the other is called unplace, which you use to look up your new state when a piece is removed. The only other task is to update the running sum of the parity of the remaining holes in the puzzle. So the processing for a piece placement looks like this:
    parityState = parityState->place[parityOfPiecePlaced];
    parityOfRemainingHoles -= parityOfPiecePlaced;
and piece removal processing looks like this:
    parityOfRemainingHoles += parityOfPieceRemoved;
    parityState = parityState->unplace[parityOfPieceRemoved];
Here, I'm using double-sided arrays so place[-2] and place[2] actually take you to the same state, saving the trouble of calculating the absolute value of parityOfPiecePlaced.
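To make the parity state objects more concrete, here is a minimal sketch of how such a table might be built for pieces of parity magnitude 1 and 2 (as in the Tetris Cube example). It is not the solver's code: the struct layout, `buildParityStates`, and `kMaxMagnitude` are hypothetical, and instead of true double-sided arrays it indexes `place` and `unplace` with an explicit offset of `kMaxMagnitude`.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>
#include <vector>

constexpr int kMaxMagnitude = 2;  // largest parity magnitude handled (assumed)

struct ParityState {
    int maxParity = 0;             // largest |parity| reachable from this state
    std::vector<bool> achievable;  // achievable[p + maxParity]
    // place[p + kMaxMagnitude]: state after placing a piece of parity p;
    // unplace[p + kMaxMagnitude]: state after removing such a piece.
    std::array<ParityState*, 2 * kMaxMagnitude + 1> place{};
    std::array<ParityState*, 2 * kMaxMagnitude + 1> unplace{};

    bool parityIsAchievable(int p) const {
        return std::abs(p) <= maxParity && achievable[p + maxParity];
    }
};

// Build the (count1+1) x (count2+1) states for count1 pieces of parity
// magnitude 1 and count2 pieces of magnitude 2. The state with all pieces
// remaining is states.back(). The states point at each other, so the
// returned vector must not be copied or resized afterwards.
std::vector<ParityState> buildParityStates(int count1, int count2) {
    assert(count1 >= 0 && count2 >= 0);
    const int width = count2 + 1;
    std::vector<ParityState> states((count1 + 1) * width);
    auto at = [&](int n1, int n2) { return &states[n1 * width + n2]; };
    for (int n1 = 0; n1 <= count1; ++n1) {
        for (int n2 = 0; n2 <= count2; ++n2) {
            ParityState* s = at(n1, n2);
            s->maxParity = n1 + 2 * n2;
            s->achievable.assign(2 * s->maxParity + 1, false);
            // a (b) = how many magnitude-1 (magnitude-2) pieces are positive.
            for (int a = 0; a <= n1; ++a)
                for (int b = 0; b <= n2; ++b) {
                    int p = (2 * a - n1) + 2 * (2 * b - n2);
                    s->achievable[p + s->maxParity] = true;
                }
            // Both signs of a parity lead to the same neighboring state.
            for (int sign : {-1, +1}) {
                if (n1 > 0) s->place[sign * 1 + kMaxMagnitude] = at(n1 - 1, n2);
                if (n2 > 0) s->place[sign * 2 + kMaxMagnitude] = at(n1, n2 - 1);
                if (n1 < count1) s->unplace[sign * 1 + kMaxMagnitude] = at(n1 + 1, n2);
                if (n2 < count2) s->unplace[sign * 2 + kMaxMagnitude] = at(n1, n2 + 1);
            }
            // A parity-0 piece never changes the state.
            s->place[kMaxMagnitude] = s->unplace[kMaxMagnitude] = s;
        }
    }
    return states;
}
```

Calling `buildParityStates(8, 3)` produces the 36 states described for the Tetris Cube. In this sketch the state with a single magnitude-2 piece remaining reports that a hole parity of 0 is not achievable, which is the kind of late violation the backtrack trigger catches.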
So the cost of parity checking is quite small, but typically parity violations do not start to appear until the last few pieces are being placed. In the 10x6 pentomino puzzle, the first parity violations did not appear until the 9th piece was placed; and adding the parity backtrack trigger to the fastest solver configuration for that puzzle actually increased run times by about 8%. (So adding just the above 5 lines of code to the de Bruijn processing loop increased the work to solve the problem by 1 part in 12! Indeed, even the time required to process the if statements that are used to see if various algorithm features are turned on, or if trace output is enabled, etc, measurably impairs the performance of the de Bruijn algorithm for this puzzle.) For the Tetris Cube, parity violations started to appear after only 6 pieces were placed, and use of the parity monitor improved performance by about 3%.
This parity backtrack trigger technique leaves the algorithms blinded to the true constraints on pieces with nonzero parity; so parity constraints are only hit haphazardly as opposed to being actively sought out by the algorithms. There is likely some better way to take advantage of the parity constraints on a puzzle. Thorleif found that for the case of the Soma cube puzzle, forcibly placing pieces with nonzero parity first improved performance markedly; but I am skeptical that such an approach would work well in general because typically it's so much better to fill holes, rather than place pieces. One approach might be to simply assign some fractional weight to the counts maintained for piece columns that have nonzero parity. This would gently coax DLX into considering placing them earlier. I have not pursued such an investigation.
Still, with the right puzzle, monitoring parity constraints can be more useful. I've reproduced the box-in-a-diamond puzzle layout in Figure 11 to call attention to the parity of this puzzle. I designed this puzzle to have the interesting property that its parity is exactly 22, which is the maximum parity the 35 hexomino puzzle pieces can possibly achieve. Any solution to this puzzle requires all eleven hexomino pieces with nonzero parity to favor black squares. Figure 12 shows one such solution with the 11 pieces having nonzero parity highlighted. Monte Carlo estimation techniques showed that enabling the parity backtrack trigger on this puzzle produces about a two-thirds increase in performance. Although a substantial performance boost, this is less than I would have expected.
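The volume backtrack trigger, when enabled, performs the following processing after each piece is placed:
1. Find each isolated subspace of open cells in the puzzle and measure its volume.
2. Determine whether each of those volumes can be achieved with some combination of the remaining pieces; if any cannot, trigger a backtrack.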
To find and measure the volumes of subspaces in step 1, I use a simple fill algorithm. Step 2 of the problem — determining whether a particular volume is achievable — is easy if all pieces have the same size; but to handle the problem generally (when pieces are not all the same size) I use the same technique used by the parity monitor above: I precalculate the achievable volumes for each possible grouping of remaining pieces and track the group as a state variable.
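The two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the solver's implementation: it uses a 2D character grid instead of the solver's 3D space, the names `Grid`, `regionVolumes`, `achievableVolumes` and `volumeViolation` are hypothetical, and it recomputes the achievable volumes on each call rather than looking them up in a precalculated state object.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Rows of '.' (open cell) and '#' (filled cell); an assumed representation.
using Grid = std::vector<std::string>;

// Step 1: measure the volume of every isolated open region with a flood fill.
std::vector<int> regionVolumes(Grid grid) {
    std::vector<int> volumes;
    const int rows = static_cast<int>(grid.size());
    const int dy[] = {1, -1, 0, 0}, dx[] = {0, 0, 1, -1};
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < static_cast<int>(grid[r].size()); ++c) {
            if (grid[r][c] != '.') continue;
            int volume = 0;
            std::vector<std::pair<int, int>> stack;
            stack.emplace_back(r, c);
            grid[r][c] = '#';  // mark as visited
            while (!stack.empty()) {
                const std::pair<int, int> cell = stack.back();
                stack.pop_back();
                ++volume;
                for (int k = 0; k < 4; ++k) {
                    const int ny = cell.first + dy[k], nx = cell.second + dx[k];
                    if (ny < 0 || ny >= rows || nx < 0 ||
                        nx >= static_cast<int>(grid[ny].size())) continue;
                    if (grid[ny][nx] != '.') continue;
                    grid[ny][nx] = '#';
                    stack.emplace_back(ny, nx);
                }
            }
            volumes.push_back(volume);
        }
    }
    return volumes;
}

// Step 2: achievable[v] is true when some subset of the remaining pieces has
// total volume v (a standard subset-sum table).
std::vector<bool> achievableVolumes(const std::vector<int>& pieceSizes) {
    int total = 0;
    for (int s : pieceSizes) { assert(s > 0); total += s; }
    std::vector<bool> achievable(total + 1, false);
    achievable[0] = true;
    for (int s : pieceSizes)
        for (int v = total; v >= s; --v)
            if (achievable[v - s]) achievable[v] = true;
    return achievable;
}

// True when some isolated region has a volume no subset of the remaining
// pieces can match, so the search should backtrack.
bool volumeViolation(const Grid& grid, const std::vector<int>& remainingPieceSizes) {
    const std::vector<bool> ok = achievableVolumes(remainingPieceSizes);
    for (int v : regionVolumes(grid))
        if (v >= static_cast<int>(ok.size()) || !ok[v]) return true;
    return false;
}
```

The check is a necessary condition only: a region whose volume is achievable may still be unfillable because of its shape.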
The solver allows you to configure the minimum number of remaining pieces required to perform volume backtrack processing. As long as you turn off the volume backtrack feature sufficiently early, its cost is insignificant.
I originally implemented this backtrack trigger to keep DLX from partitioning polyomino puzzles into unfillable volumes while it wandered about the puzzle pursuing constrained holes or pieces; but I now believe it's often better to initially follow a simple packing strategy that precludes the creation of isolated volumes. I think this backtrack trigger may still be useful for some puzzles once the min-fit heuristic is enabled, but I have not had the time to study its effects.
In the previous sections we examined the technique of monitoring a constraint during algorithm execution and triggering a backtrack when the constraint is violated. Another (more aggressive) way you can take advantage of a constraint is to check all images one-by-one to see if using them would result in a constraint violation. For each image that causes a violation, simply remove it from the cache of available images. Some of the image filters discussed below are applied only once before the algorithms begin. Other image filters can be applied repeatedly (after each piece placement). In my solver, image filters can only be applied when DLX is active because the linked list structure of DLX makes it easy to remove and replace images at each algorithmic step.
N.G. de Bruijn's original software used to solve the 10x6 pentomino problem predefined the layouts of the 12 puzzle pieces in each of their unique orientations. There are 63 unique orientations of the pieces in total, but the various possible translations of these orientations were not precalculated. This was a simple approach, and perhaps more importantly (in those days) made for a quite small program. This results, however, in the de Bruijn algorithm spending a lot of time checking fits for images that clearly fall outside the 10x6 box. These days, memory is cheap, so it is easy to improve on this basic approach. I've already explained the technique in the section on MCH above: for MCH I keep at each GridPoint a separate list for each piece which holds only those images of a piece that both fill the hole and actually fit in the puzzle box. I've done exactly the same thing for the de Bruijn algorithm: I created another array of image lists at each GridPoint, each list holding only the images of a particular piece that fill the hole with its root tiling node and also fit in the puzzle box. This completely eliminates all processing associated with placement attempts for images that don't even fit in the solution box.
By filtering out images that don't fit in the box, the average number of de Bruijn images at each hole in the 10x6 pentomino puzzle drops from 63 to 33.87, an almost 50% reduction. This should translate to a significant performance boost, though I can't say for sure since this is the only way I've ever implemented the algorithm.
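The idea of keeping, at each grid point, only the images that fit in the box can be sketched like this. It is a 2D illustration with assumed names (`Orientation`, `fitsInBox`, `buildImageLists`) and an assumed representation in which each orientation is stored as offsets from its root cell; the solver's GridPoint structures are not reproduced here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// An orientation is a list of (dx, dy) offsets from its root cell, the cell
// that must land on the hole being filled (hypothetical representation).
using Orientation = std::vector<std::pair<int, int>>;

// True when the orientation, rooted at (x, y), lies inside a width x height box.
bool fitsInBox(const Orientation& o, int x, int y, int width, int height) {
    for (const auto& d : o) {
        const int cx = x + d.first, cy = y + d.second;
        if (cx < 0 || cx >= width || cy < 0 || cy >= height) return false;
    }
    return true;
}

// For every grid point, the indices of the orientations worth trying there:
// lists[y * width + x] holds only orientations that stay inside the box.
std::vector<std::vector<int>> buildImageLists(
        const std::vector<Orientation>& orientations, int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::vector<int>> lists(width * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int i = 0; i < static_cast<int>(orientations.size()); ++i)
                if (fitsInBox(orientations[i], x, y, width, height))
                    lists[y * width + x].push_back(i);
    return lists;
}
```

Because the lists are built once during initialization, the search loop never has to test an image against the box walls; it only iterates the list stored at the selected hole.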
If you picked up a Tetris Cube when it was already solved; turned it on its side; and then excitedly told your brother you found a new solution, you'd likely get thumped. Because the puzzle box is cubic, there are actually 23 ways to rotate any solution to produce a new solution that is of the same shape as the original. My software can filter out solutions that are just rotated copies of previously discovered solutions (just enable the unique command line option), but the search algorithms as described so far do actually find all 24 rotations of every solution (only to have 23 of them filtered out).
If by imperial decree, we only allow rotationally unique solutions, then it is possible to produce an image filter to take advantage of this constraint. If we simply fix the orientation of a piece that has 24 unique orientations, then the algorithms will only find rotationally unique solutions. Why does this work? If you fix the orientation of a piece, any solution you find is going to have that constrained piece in its fixed orientation; and the other 23 rotations of that same solution cannot possibly be found because those solutions have the constrained piece in some orientation that you never allowed to enter the box. Application of just this one filter reduced the time it takes DLX to find all solutions to the Tetris Cube from over seven hours down to about 20 minutes. Quite useful indeed.
It is possible to apply this same technique to puzzles that are not cubic; but instead of keeping the orientation of the piece completely fixed, you limit the set of rotations allowed.
But what if all of the pieces have fewer unique rotations than the puzzle has symmetric rotations? In this case you can also try constraining the translations of the piece within the solution box. This is slightly harder to do (it was for me anyway), and is not always guaranteed to eliminate all rotationally redundant solutions from the search. As an example try eliminating the rotationally redundant solutions from a 3x3x3 puzzle by constraining a puzzle piece that is a 1x1x1 cube. It can't be done. The best you can do is to constrain the piece to appear at one of four places: the center, the center of a face, the center of an edge and at a corner. This will eliminate some rotationally redundant solutions from the search, but not all.
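The four placement classes for the 1x1x1 piece can be checked by computing the orbits of the 27 cells of a 3x3x3 box under rotation. The following is a small sketch written for this purpose (the names `rotateX`, `rotateY`, `orbitOf` and `orbitSizes` are illustrative, and the box is assumed to be centered on the origin); it relies on the fact that quarter turns about two perpendicular axes generate all 24 rotations of a cube.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

using Cell = std::array<int, 3>;  // coordinates in {-1, 0, 1}

// Quarter turns about the x and y axes.
Cell rotateX(const Cell& c) { return {c[0], -c[2], c[1]}; }
Cell rotateY(const Cell& c) { return {c[2], c[1], -c[0]}; }

// All cells reachable from `start` by rotating the box.
std::set<Cell> orbitOf(const Cell& start) {
    std::set<Cell> orbit;
    std::vector<Cell> pending;
    orbit.insert(start);
    pending.push_back(start);
    while (!pending.empty()) {
        const Cell c = pending.back();
        pending.pop_back();
        for (const Cell& next : {rotateX(c), rotateY(c)})
            if (orbit.insert(next).second) pending.push_back(next);
    }
    return orbit;
}

// Sizes of the distinct orbits of the 27 cells, in order of first discovery.
std::vector<int> orbitSizes() {
    std::set<Cell> seen;
    std::vector<int> sizes;
    int total = 0;
    for (int x = -1; x <= 1; ++x)
        for (int y = -1; y <= 1; ++y)
            for (int z = -1; z <= 1; ++z) {
                const Cell c{x, y, z};
                if (seen.count(c)) continue;
                const std::set<Cell> orbit = orbitOf(c);
                seen.insert(orbit.begin(), orbit.end());
                sizes.push_back(static_cast<int>(orbit.size()));
                total += static_cast<int>(orbit.size());
            }
    assert(total == 27);  // every cell belongs to exactly one orbit
    return sizes;
}
```

`orbitSizes()` finds four orbits: the 8 corners, the 12 edge centers, the 6 face centers, and the single center cell. Constraining the 1x1x1 piece to one representative of each orbit is the best a translational constraint on that piece can do.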
A much harder problem is to try to eliminate rotationally redundant solutions from the search when none of the pieces in the puzzle have a unique shape. In this case, you can't simply constrain a single piece, but must instead somehow constrain in concert an entire set of pieces that share the same shape. I have some rough ideas on how one might algorithmically approach this problem, but I have not yet tried to work the problem out in detail.
For now, you can ask my solver to constrain any uniquely shaped piece so as to eliminate as many rotationally redundant solutions as possible. But even better, you can ask the solver to pick the piece to constrain. In this case it compares the results of constraining each uniquely shaped puzzle piece and picks the piece that will do the best job of eliminating rotationally redundant solutions. If two or more pieces tie, then it will pick the piece that after constraint has the fewest images that fit in the puzzle box. If for example you ask my solver to pick a piece for constraint on the 10x6 pentomino puzzle, it will pick X (the piece that looks like a plus sign), and constrain it so that it can only appear in the lower-left quadrant of the box. This is exactly the approach de Bruijn took when he solved the 10x6 pentomino puzzle 40 years ago, but de Bruijn identified this as the best constraint piece through manual analysis of the puzzle and programmed it as a special initial condition in his software. With my solver, you need only add the option r to the command line.
Oftentimes a piece that has been constrained will have so few remaining images that it becomes the best candidate for the first algorithm step. But of the algorithms I've implemented, only DLX will consider the option of iterating over the images of a single piece. So when running my solver with a piece constraint I usually use the DLX algorithm with a min-fit heuristic for at least the first piece placement. For the 10x6 pentomino problem, if you turn on constraint processing (which constrains the images of the X piece), but fail to use DLX for the first piece placement you'll find the run time to be eight times longer.
This feature was far and away the most difficult part of the solver for me to design and implement. (Perhaps some formal education in the field of spatial modeling would have been useful.) I have copious comments on the approach in the software. There are two parts to the problem. I first identify which rotations of the puzzle box as a whole produce a new puzzle box of exactly the same shape. This is normally a trivial problem, but the solver also handles situations where some of the puzzle pieces are loaded into fixed positions. If some of those pieces have a shape in common with pieces that are free to move about, then things get tricky. One-sided polyomino problems (which the solver also handles) also add complexity. Once I know the set of rotations that when applied to the puzzle can possibly result in a completely symmetric shape, I apply a second algorithm that filters the images produced for a (uniquely shaped) piece through a combination of rotational and/or translational constraints that eliminate these symmetries and have the net effect of preventing the algorithms from discovering rotationally redundant solutions. For a more exacting description of these techniques, please read my software comments for the methods Puzzle::initSymmetricRotationAndPermutationLists() and for Puzzle::genImageLists().
You can also filter images based on parity constraints. So instead of waiting around for an image to be placed to trigger a parity backtrack; after each piece placement, you can look at the parity of each remaining image and determine if placing that image would introduce a parity violation; and if so, remove the image.
Of course I don't actually do the check for all images — only twice for each remaining piece with nonzero parity (once for positive parity and once for negative parity). If a violation would be introduced through the use of that piece when its parity is, say, negative, then I traverse the list of images for that piece and remove all the copies that have a negative parity. Also, the parity filter is skipped completely if the parity of the last piece placed was zero: nothing has changed in that case and it's pointless to look for new potential parity violations.
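The two-checks-per-piece idea can be sketched as below. This is an illustration under simplifying assumptions, not the solver's implementation: each image is represented only by its parity, the `Piece` struct and function names are hypothetical, and the reachable parities are recomputed on the fly instead of being looked up in precalculated parity state objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Parities reachable by placing every piece in `magnitudes` with either sign.
std::set<int> reachableParities(const std::vector<int>& magnitudes) {
    std::set<int> sums;
    sums.insert(0);
    for (int m : magnitudes) {
        std::set<int> next;
        for (int s : sums) { next.insert(s + m); next.insert(s - m); }
        sums.swap(next);
    }
    return sums;
}

struct Piece {
    int magnitude;                   // parity magnitude fixed by the shape
    std::vector<int> imageParities;  // parity of each remaining image
};

// For each remaining piece with nonzero parity, test the positive and the
// negative sign once; if a sign would leave a hole parity the other pieces
// cannot reach, remove every image of that sign. Returns the number removed.
int applyParityFilter(std::vector<Piece>& remaining, int parityOfRemainingHoles) {
    int removed = 0;
    for (std::size_t i = 0; i < remaining.size(); ++i) {
        const int m = remaining[i].magnitude;
        assert(m >= 0);
        if (m == 0) continue;  // parity-0 pieces are never constrained
        std::vector<int> others;
        for (std::size_t j = 0; j < remaining.size(); ++j)
            if (j != i) others.push_back(remaining[j].magnitude);
        const std::set<int> reach = reachableParities(others);
        for (int sign : {+1, -1}) {
            // Placing this piece with parity sign*m leaves holes - sign*m.
            if (reach.count(parityOfRemainingHoles - sign * m)) continue;
            std::vector<int>& imgs = remaining[i].imageParities;
            auto newEnd = std::remove(imgs.begin(), imgs.end(), sign * m);
            removed += static_cast<int>(imgs.end() - newEnd);
            imgs.erase(newEnd, imgs.end());
        }
    }
    return removed;
}
```

With three magnitude-2 pieces and a hole parity of 6, every negative-parity image is removed, which mirrors in miniature what happens to the eleven nonzero-parity hexominoes when the puzzle parity is 22.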
Applying the parity filter to the box-in-a-diamond puzzle causes the solver to filter out roughly half of the images of eleven pieces before DLX even starts. Replacing the parity backtrack trigger with the parity filter for this puzzle increased performance by more than 40%. In total, the solver running with the parity filter generates solutions 2.4 times as fast as it does without any parity constraint-based checks at all.
You can also use volume constraints to filter images. This is very much akin to using volume constraints to trigger backtracks, but instead of waiting around for an image to be placed that partitions the open space, you can instead, one-by-one, place all remaining images in the puzzle and perform a volume check operation. This can be particularly useful as an initial step before you even set the algorithms to working. Of the 2056 images that fit in the 10x6 pentomino puzzle box, 128 of them are jammed up against a side or into a corner in such a way as to produce a little confined area that can't possibly be filled as seen in Figure 13. Searching for and eliminating these images up front improved my best run times for this puzzle by about 13%. This is the only technique I've found (other than the puzzle bounds filter discussed above) that actually improved performance for this classic puzzle.
The previous polyomino puzzles were all based on free polyominoes: polyomino pieces that you are free to not only rotate in the plane of the puzzle, but are also free to flip upside down; but there is another class of puzzles based on one-sided polyominoes: polyomino pieces that you are allowed to rotate within the plane, but are not allowed to flip upside down. Where there are only twelve free pentominoes, there are eighteen uniquely shaped one-sided pentominoes. Consider the problem of placing the eighteen one-sided pentominoes into a 30x3 box as shown in Figure 14. Because pieces can actually reach from one long wall of this puzzle box to the other, 40% of the images (776 out of 1936) that fit in this box produce unfillable volumes. (See Figure 15.) Applying the volume constraint filter to the images of this puzzle improved performance by about a factor of nine.
Consider next another puzzle I came across at Gerard's Polyomino Solution Page: placing the 108 free heptominoes into a 33x23 box with 3 symmetrically spaced holes left over as shown in Figure 16. One of the heptomino pieces has the shape of a doughnut with a corner bit out of it. This piece is shown in red in Figure 16. There's no way for another piece to fill this hole, so heptomino puzzles always leave at least one unfilled hole. To solve this puzzle, the doughnut piece clearly must be placed around one of these holes; but none of the algorithms are smart enough to take advantage of this constraint and will only place the doughnut around a hole by chance. And this could take a very long time indeed! Applying the volume constraint filter to this problem removes not only the images that produce confined spaces around the perimeter of the puzzle, but also all images of the doughnut piece except those few that wrap one of the prescribed holes. The DLX min-fit heuristic will then correctly recognize the true inflexibility of this piece and place it first.
For 3D puzzles, I think it would be rare for pieces to construct a planar barrier isolating two volumes large enough to cause serious trouble; accordingly, I have not studied the effects of this filter on 3D puzzles.
In all of these examples I've only applied the volume filter once to the initial image set (prior to algorithm execution), but you can also apply the filter repeatedly, after each step in the algorithm (turning it off when the number of remaining pieces reaches some prescribed threshold). This should have the effect of giving DLX a better view of the puzzle constraints; but I haven't studied this primarily because my current implementation of the filter is so inefficient: at each algorithmic step each remaining image is temporarily placed and a graphical fill operation is performed to detect isolated volumes. This is simple, but the vast majority of these image checks are pointless. The next time I work on this project, I'll be improving the implementation of this filter which I hope will offer performance benefits when reapplied after each piece placement.
Another filter I've implemented is based on a next-step fit constraint. By this I mean, if placing an image would result in either a hole that can't be filled, or a piece that can't be placed, then it is pointless to include that image in the image set. Running this fit filter on the 2056 images of the 10x6 pentomino puzzle finds all of the 128 images found by the volume constraint filter plus an additional 16 images like those shown in Figure 17. There can obviously be no puzzle solution with these piece placements. If the rotational redundancy filter is also enabled (which constrains the X piece to 8 possible positions in the lower-left quadrant of the box), then the fit filter will eliminate 227 images. (There are numerous images that conflict with all of the constrained placements of the X piece.)
Note that running the fit filter twice on the same image set can sometimes filter additional images: on the first run you may remove images that were required for previously tested images to pass the fit check. If you run the fit filter twice on the 10x6 pentomino problem while the X piece is constrained, the number of filtered images jumps from 227 to 235. To do a thorough job you'd have to filter repeatedly until there was no change in the image set.
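Filtering until there is no change is a fixed-point iteration, which can be sketched generically as follows. The function name `filterToFixedPoint` and the predicate signature are assumptions for illustration; the solver's fit filter uses DLX itself for the check and is not reproduced here. Each sweep in this sketch tests every image against the set as it stood at the start of the sweep.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Keep only the images that pass, and repeat until a full sweep removes
// nothing. `passes(image, images)` receives the current image set because
// whether an image passes can depend on which other images remain.
// Returns the number of sweeps performed (including the final, no-op one).
template <typename Image, typename Pred>
int filterToFixedPoint(std::vector<Image>& images, Pred passes) {
    int sweeps = 0;
    bool changed = true;
    while (changed) {
        ++sweeps;
        std::vector<Image> kept;
        std::copy_if(images.begin(), images.end(), std::back_inserter(kept),
                     [&](const Image& img) { return passes(img, images); });
        assert(kept.size() <= images.size());  // a filter only removes images
        changed = kept.size() != images.size();
        images.swap(kept);
    }
    return sweeps;
}
```

As a toy usage example, suppose an integer "image" n passes only if n is 10 or n - 1 is still present. Starting from {2, 3, 4, 10, 11}, the first sweep removes 2, the second removes 3, the third removes 4, and a fourth sweep confirms that {10, 11} is stable. The cascade is the same effect that raised the pentomino count from 227 to 235 on a second run.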
Although this filter is interesting, its current implementation is too slow to be of much practical use. I use DLX itself to identify fit-constraint violations and it takes 45 seconds to perform just the first fit filtering operation for the box-in-a-diamond hexomino puzzle on my 2.5 GHz P4. I suspect I could write something that does the same job much quicker, but I'm skeptical I could make it fast enough to be effective. Still, if your aim is the lofty goal of finding all solutions to this puzzle, this filter could prove worthwhile: 45 seconds is nothing for at least the first several piece placements of this sizable puzzle.
Some of the image filters discussed above are only run once before the algorithms begin. I wanted to share some insight as to why such filters sometimes don't give the performance gains you might expect.
As a first example, consider the effects of filtering pentomino images that fall outside the 10x6 puzzle box. This cuts the total number of images that have to be examined by the de Bruijn algorithm at each algorithm step by almost a factor of 2. In an extraordinary flash of illogic, you might conclude that since there are 12 puzzle pieces, the performance advantage to the algorithm would be a factor of 2^{12} = 4096. The problem with this logic is that the algorithm immediately skips these images as soon as they are considered for placement anyway.
For the same reason, filtering images that produce volume constraint violations before you begin running the algorithms does not give such exponential performance gains: such images typically construct tiny little confined spaces that the algorithm would have quickly identified as unfillable anyway.
But the filter that removes images of a single piece to eliminate rotational redundancies among discovered solutions seems different: the images removed are not images that will necessarily cause the algorithm to immediately backtrack and so you might reasonably expect the filter to not only reduce the number of solutions found for the Tetris Cube by a factor of 24 (which it does); but also to improve the overall performance of the algorithm by a factor of 24, but it only gave a factor of 21. (Close!)
Knuth expected that reducing the number of images of 4 pieces each by a factor of 2 (to take advantage of a parity constraint on a one-sided hexiamond puzzle) would lead to a reduction in the work needed to solve the puzzle by a factor of 16, but the gains again fell far short of this expectation.
And although I wasn't expecting a performance improvement factor of 2^{11} from the parity filter for the box-in-a-diamond problem, I thought I'd get a lot more than a factor of 2.4 (which is all it gave me). This result was very surprising to me.
The problem in all of these cases is that you're trying to extract efficiencies from a search tree that is already significantly pruned by the algorithm. Here are some other observations that might be illuminating.
First, consider the case where you a priori force a piece whose images are to be filtered to be placed first; and then reduce the number of images of that piece by a factor of N. Then the number of ways to place that first piece is reduced by a factor of N. Assuming each of those placements originally had similar work loads, then the total work would indeed be reduced by a factor of N. But what if you always placed this piece last? Would performance still improve by a factor of N? Of course not! The vast majority of search paths terminate in failure before you even get to the last piece. Now assuming the piece is not being placed first, or last but is instead placed uniformly across the search tree, you'll find that a sizable percentage of search paths don't even use the filtered piece: they die off before that piece can be placed. Filtering the images of that piece won't reduce the weight from these branches of the search tree at all.
Second, the vast majority of the appearances of the piece will be high up in the branches of the search tree. At this part of the tree, the branching factor is small and obviously drops below 1 at some point. Because of this, when you prune the tree at one of the appearances of the constraint piece, you can't assume that the weight of the path left behind is negligible (even though that weight is shared by other paths).
These arguments are obviously imprecise and contain (at least) one serious flaw: the DLX algorithm (unlike the other two) can reshape the entire search tree after you constrain the piece to take advantage of its new constrained nature, but if during execution of the algorithm, the piece is still found to be less constrained than other elements of the puzzle, then the arguments above still apply. Even if DLX decides to place the newly constrained piece first (and it often does), the average branching factor will still not typically improve sufficiently to achieve a factor of N performance improvement.
Table 2 shows the performance results for a few different puzzles with many different combinations of algorithms, backtrack triggers and image filters. Many of these results have already been discussed in earlier sections but are provided here in detail. The run producing the best unique solution production rate is highlighted in yellow for each puzzle. The table key explains everything.
Using the unique solution generation rate as a means of comparing algorithm quality is flawed as these rates are not completely consistent from run-to-run. The relative performance of the different algorithms can also change with the processor design because, for example, one algorithm may make better use of a larger instruction or data cache. I liked Knuth's technique of simply counting linked list updates to measure performance, but since I'm comparing different algorithms, such an approach seems difficult to apply.
Test Case  Algorithms  Image Filters  Backtrack Triggers  MonteCarlo  Attempts  Fits  Unique  Run Time (hh:mm:ss)  Rate  

DLX  MCH  EMCH  de Bruijn  R  P  V  F  P  V  N  R  S  
P1  12f  0  0  0  OFF  OFF  OFF  OFF  OFF  OFF        3,615,971 
3,615,971 
2339 
00:00:38.761  60 
P2  0  12  0  0  OFF  OFF  OFF  OFF  OFF  OFF        138,791,760 
9,077,194 
2339 
00:00:29.250  80 
P3  0  0  12  0  OFF  OFF  OFF  OFF  OFF  OFF        191,960,438 
12,114,615 
2339 
00:00:22.485  104 
P4  0  0  0  12  OFF  OFF  OFF  OFF  OFF  OFF        178,983,597 
25,848,915 
2339 
00:00:06.086  384 
P5  12f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF        892,247 
892,247 
2339 
00:00:09.449  246 
P6  0  12  0  0  ON  OFF  OFF  OFF  OFF  OFF        114,753,421 
7,646,476 
2339 
00:00:24.504  95 
P7  0  0  12  0  ON  OFF  OFF  OFF  OFF  OFF        153,036,807 
9,875,973 
2339 
00:00:19.992  117 
P8  0  0  0  12  ON  OFF  OFF  OFF  OFF  OFF        133,086,329 
20,073,791 
2339 
00:00:04.700  498 
P9  12f  11  0  0  ON  OFF  OFF  OFF  OFF  OFF        12,411,752 
924,167 
2339 
00:00:02.701  866 
P10  12f  0  11  0  ON  OFF  OFF  OFF  OFF  OFF        20,374,275 
1,425,356 
2339 
00:00:02.569  911 
P11  12f  0  0  11  ON  OFF  OFF  OFF  OFF  OFF        17,703,679 
2,455,947 
2339 
00:00:00.579  4049 
P12  12f  0  0  11  ON  OFF  OFF  OFF  ON  OFF        17,572,247 
2,454,746 
2339 
00:00:00.620  3781 
P13  12f  0  0  11  ON  OFF  12  OFF  OFF  OFF        15,198,004 
2,091,215 
2339 
00:00:00.510  4592 
OP1  18f  0  0  12  ON  OFF  OFF  OFF  OFF  OFF        38,479,316 
7,060,175 
46 
00:00:03.611  12.7 
OP2  18f  0  0  12  ON  OFF  18  OFF  OFF  OFF        1,930,304 
668,117 
46 
00:00:00.411  112.1 
TC1  12f  0  0  0  OFF  OFF  OFF  OFF  OFF  OFF        1,502,932,134 
1,502,932,134 
9839 
07:21:31  0.37 
TC2  0  12  0  0  OFF  OFF  OFF  OFF  OFF  OFF        58,306,235,943 
1,604,152,199 
9839 
02:54:07  0.94 
TC3  0  0  12  0  OFF  OFF  OFF  OFF  OFF  OFF        109,746,141,977 
2,835,090,958 
9839 
02:19:27  1.18 
TC4  0  0  0  12  OFF  OFF  OFF  OFF  OFF  OFF        737,892,116,733 
38,637,085,619 
9839 
03:29:26  0.78 
TC5  12f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF        68,141,081 
68,141,081 
9839 
00:20:37  7.95 
TC6  0  12  0  0  ON  OFF  OFF  OFF  OFF  OFF        9,727,894,584 
297,896,605 
9839 
00:33:17  4.93 
TC7  0  0  12  0  ON  OFF  OFF  OFF  OFF  OFF        19,436,156,238 
551,894,232 
9839 
00:28:45  5.70 
TC8  0  0  0  12  ON  OFF  OFF  OFF  OFF  OFF        140,658,669,459 
7,992,209,655 
9839 
00:43:08  3.80 
TC9  12f  11  0  0  ON  OFF  OFF  OFF  OFF  OFF        2,153,543,323 
72,670,225 
9839 
00:07:20  22.35 
TC10  12f  0  11  0  ON  OFF  OFF  OFF  OFF  OFF        4,196,219,275 
129,746,342 
9839 
00:06:02  27.16 
TC11  12f  0  0  11  ON  OFF  OFF  OFF  OFF  OFF        32,810,876,767 
1,898,921,763 
9839 
00:09:48  16.74 
TC12  12f  11  6  4  ON  OFF  OFF  OFF  OFF  OFF        10,380,361,756 
453,289,747 
9839 
00:04:04  40.25 
TC13  12f  11  6  4  ON  OFF  OFF  OFF  ON  OFF        9,421,945,256 
439,737,621 
9839 
00:03:57  41.53 
HP1  35x 0f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,131,766,165  1,131,766,165  17,435  05:28:15  0.885
HP2  35x 2f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,131,686,519  1,131,686,519  17,435  05:29:15  0.883
HP3  35x 4f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,097,629,194  1,097,629,194  17,435  05:28:58  0.883
HP4  35x 6f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  849,818,771  849,818,771  17,435  04:48:02  1.009
HP5  35x 8f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  507,044,709  507,044,709  17,435  03:14:58  1.490
HP6  35x 10f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  326,708,081  326,708,081  17,435  02:07:31  2.279
HP7  35x 12f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  272,000,566  272,000,566  17,435  01:44:36  2.778
HP8  35x 14f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  241,487,173  241,487,173  17,435  01:34:34  3.073
HP9  35x 16f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  420,363,728  420,363,728  30,945  02:15:46  3.799
HP10  35x 18f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  880,638,007  880,638,007  60,415  04:04:34  4.117
HP11  35x 20f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,363,568,660  1,363,568,660  106,960  05:53:36  5.041
HP12  35x 22f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,508,462,348  1,508,462,348  112,975  06:18:54  4.970
HP13  35x 24f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,728,349,705  1,728,349,705  119,370  07:10:09  4.625
HP14  35x 26f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,718,965,943  1,718,965,943  119,862  07:09:28  4.652
HP15  35x 28f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,858,118,645  1,858,118,645  133,872  07:44:54  4.799
HP16  35x 30f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,700,793,486  1,700,793,486  108,882  07:07:56  4.241
HP17  35x 32f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,735,853,837  1,735,853,837  115,466  07:19:01  4.384
HP18  35x 34f  0  0  0  ON  OFF  OFF  OFF  OFF  OFF  10,000  16  5  1,912,535,104  1,912,535,104  140,085  07:53:32  4.930
HBD1  35f  0  0  0  OFF  OFF  OFF  OFF  OFF  OFF  10,000  22  0  9,381,994,687  9,381,994,687  16,735  49:54:40  0.093
HBD2  35a@160 15f  0  0  0  OFF  OFF  OFF  OFF  OFF  OFF  10,000  22  0  978,785,202  978,785,202  3731  05:28:58  0.189
HBD3  35a@160 15f  0  0  0  OFF  OFF  OFF  OFF  ON  OFF  10,000  22  0  968,436,589  968,436,589  5761  05:08:05  0.312
HBD4  35a@160 15f  0  0  0  OFF  35  OFF  OFF  OFF  OFF  10,000  22  0  1,156,373,212  1,156,373,212  8738  05:27:04  0.445
HBD5  35a@160 15f  0  0  0  OFF  35  35  OFF  OFF  OFF  10,000  22  0  1,629,292,460  629,292,460  5652  02:44:54  1.582
KEY

Test Case
P  Pentomino 10x6
OP  One-Sided Pentomino 30x3
TC  Tetris Cube
HP  Hexomino 15x14 Parallelogram
HBD  Hexomino Box-in-a-Diamond
Test cases P, OP, and TC were run on a 2.5 GHz P4 running Ubuntu Linux. Test cases HP and HBD were run on a 2.4 GHz Core 2 Quad CPU Q6600 running Windows XP (using only one of the four processors).

Algorithms
The number in each algorithm column is the number of remaining pieces when the algorithm is activated. For DLX, multiple activation numbers can be given, each with a different ordering heuristic. An entry 12f means the min-fit ordering heuristic is activated when 12 pieces remain. Other heuristics used are the x heuristic, which picks the hole with minimum x coordinate value, and the a@160 heuristic, which picks the hole that forms the minimum angle from the center of the puzzle with an initial angle of 160 degrees.

Image Filters
A number in a column gives the minimum number of remaining pieces for the image filter to be applied.
R  Rotational Redundancy Filter
P  Parity Constraint Filter
V  Volume Constraint Filter
F  Fit Constraint Filter

Backtrack Triggers
A number in a column gives the minimum number of remaining pieces for the backtrack trigger to be applied.
P  Parity Constraint Backtrack Trigger
V  Volume Constraint Backtrack Trigger

Monte Carlo
N  Number of trials.
R  If after removing a piece from the puzzle there are exactly R pieces left to place, the Monte Carlo trial is ended.
S  Seed value to the Mersenne Twister random number generator.

Attempts  The number of times pieces were attempted to be placed.
Fits  The number of times pieces were successfully placed.
Unique  The number of unique solutions found.
Run Time  The total run time for the test.
Rate  The number of unique solutions found per second (Unique / Run Time).
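As a quick check on how the Rate column relates to the other columns, here is a minimal Python sketch (not part of polycube; the function names are my own for illustration) that converts a Run Time entry to seconds and divides it into the Unique count. Because the table's run times are rounded to whole seconds, a recomputed rate can differ slightly from the published one.

```python
def run_time_seconds(hms: str) -> int:
    """Convert an HH:MM:SS run time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def solution_rate(unique: int, run_time: str) -> float:
    """Rate column: unique solutions found per second of run time."""
    return unique / run_time_seconds(run_time)


# Row TC13 from the table: 9839 unique solutions in 00:03:57.
print(f"{solution_rate(9839, '00:03:57'):.1f} solutions per second")
```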
The first set of test cases (P) examines the 10x6 pentomino puzzle shown in Figure 3. Runs 1 through 4 show the performance of the four basic algorithms.
Comparing these first four runs with runs 5 through 8 shows the significant performance advantage of the rotational redundancy filter. This filter consistently offers significant performance gains when looking for all solutions to a symmetric puzzle. Also note that DLX performs relatively better with this filter enabled as it's the only algorithm capable of iterating over the possible placements of the piece constrained by the filter.
Runs 9 through 11 use DLX only for the first piece placement (to take full advantage of the rotational redundancy filter) but then switch to the other lighter-weight algorithms to place the last 11 pieces. Comparing run 8 with run 11 shows this combined algorithmic approach to be about eight times faster than any single algorithm.
Run 12 shows that although the parity filter does offer a very moderate reduction in attempts and fits, the net effect is a reduction in the production rate of unique solutions.
Run 13 uses a one-shot volume filter to expunge many useless images from the image set and results in about a 13% increase in performance.
The second set of test cases (OP) examines the problem of placing the one-sided pentominoes in a 30x3 box as shown in Figure 14. The volume filter is shown to be particularly useful for this puzzle, delivering a factor-of-nine performance improvement.
The third set of test cases (TC) examines the Tetris Cube as shown in Figure 1. The first four runs show the performance of MCH and EMCH to be superior to DLX and de Bruijn for this small 3D puzzle.
Runs 5 through 8 again show the huge performance benefits of the rotational redundancy filter; and again DLX performs relatively better than the other algorithms with the rotational redundancy filter active, even outperforming de Bruijn for this 3D puzzle.
In runs 9 through 11 I start to combine the algorithms, using DLX only to place the first piece (to get it to iterate over the possible placements of the piece constrained by the rotational redundancy filter) but then switching to just one of the simpler algorithms. As can be seen from the table, the benefits of this combined approach are quite significant.
In run 12 all four algorithms are combined to solve the puzzle. If you number pieces as you place them in the box, counting down from 12 (so the last piece placed is numbered 1), then DLX was used to place piece 12; MCH to place pieces 11 through 7; EMCH to place pieces 6 and 5; and de Bruijn to place pieces 4 through 1. As the number of remaining pieces gets smaller, it pays to use simpler algorithms. Compare the performance of run 12 with the performance of runs 5 through 8 (where just one algorithm was used) and you see that the combined algorithmic approach is more than 5 times faster than the fastest of those single-algorithm runs.
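The activation numbers in the table can be read as a simple lookup: the algorithm in effect is the one whose activation threshold is the smallest value still at or above the number of remaining pieces. The sketch below illustrates that reading for run 12. It is not polycube's code; the `ACTIVATIONS` list and `active_algorithm` function are hypothetical names introduced only for this example.

```python
# Activation thresholds from run TC12 (illustrative data layout, not polycube's).
ACTIVATIONS = [(12, "DLX"), (11, "MCH"), (6, "EMCH"), (4, "de Bruijn")]


def active_algorithm(remaining: int, activations=ACTIVATIONS) -> str:
    """Return the algorithm in effect when `remaining` pieces are left to place."""
    # A threshold of 0 in the table means the algorithm is never activated.
    eligible = [(t, name) for t, name in activations if t > 0 and t >= remaining]
    if not eligible:
        raise ValueError(f"no algorithm is active with {remaining} pieces remaining")
    # The most recently activated algorithm has the smallest eligible threshold.
    return min(eligible)[1]


for remaining in range(12, 0, -1):
    print(remaining, active_algorithm(remaining))
```

Running the loop lists DLX for piece 12, MCH for pieces 11 through 7, EMCH for pieces 6 and 5, and de Bruijn for pieces 4 through 1, matching the description of run 12.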
Run 13 shows that the parity backtrack trigger offers a small benefit (about 3%) for this puzzle. It is interesting that run 13 is well over 100 times faster than the straight DLX approach used in run 1.
The fourth set of test cases (HP) examines the problem of placing the 35 hexominoes in the 15x14 parallelogram shown in Figure 6. Here I did not try to find the overall best solver configuration, but instead only studied the effects of packing pieces simply from left to right (using the x ordering heuristic) for initial piece placements and then switching to the DLX min-fit heuristic for later piece placements. I should not have had the rotational redundancy filter active for these tests (this only slows solution production rates when examining such a small portion of the search tree), but I didn't want to tie up my computer for another week to rerun the tests. The best performance was had when using the min-fit heuristic only for the last twenty piece placements. Using the min-fit heuristic for more than twenty pieces resulted in little performance change but seems to exhibit some small degradation.
It is likely that application of the volume constraint filter, the parity constraint backtrack trigger, and the de Bruijn algorithm (for later piece placements) would offer additional performance gains for this puzzle.
The last set of test cases (HBD) examines the hexomino box-in-a-diamond puzzle shown in Figure 12. The first run is a straight no-frills DLX run using the min-fit heuristic. For the second run I instead used my angular ordering heuristic, which packs pieces into the puzzle in a counterclockwise direction. I started placing pieces at 160 degrees (about ten o'clock) so that the less confined region at the top of the puzzle would be left for last. Once there were only 15 pieces left I switched to the min-fit heuristic. The number 15 was just a guess and probably too low for best performance, but this approach was still twice as fast as using a pure min-fit heuristic.
Run 3 shows that enabling the parity constraint backtrack trigger improved performance by about 65% in this very high-parity puzzle. Run 4 switches to the parity constraint filter, which improves performance by another 42%.
Most interestingly, run 5 shows that a one-shot application of the volume constraint filter increased performance by a factor of 3.5.
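To make the angular ordering idea concrete, here is a minimal Python sketch of one way such a heuristic could rank open cells. It assumes angles are measured counterclockwise from the +x axis with y increasing upward; polycube's actual conventions may differ, and `angular_key` and `pick_hole` are hypothetical names used only here.

```python
import math


def angular_key(cell, center, start_deg=160.0):
    """Counterclockwise angle (degrees) from the start direction to the cell."""
    dx = cell[0] - center[0]
    dy = cell[1] - center[1]
    angle = math.degrees(math.atan2(dy, dx))  # -180..180, measured from +x
    return (angle - start_deg) % 360.0        # 0..360, measured from start_deg


def pick_hole(open_cells, center, start_deg=160.0):
    """Pick the open cell that forms the minimum angle from the start direction."""
    return min(open_cells, key=lambda cell: angular_key(cell, center, start_deg))


# Three open cells around the center of a 23x23 board (cells indexed 0..22).
center = (11, 11)
holes = [(11, 22), (0, 11), (22, 11)]  # at 90, 180, and 0 degrees from the center
print(pick_hole(holes, center))         # the cell at 180 degrees is only 20 past the start
```

Because the key wraps around at 360 degrees, cells just clockwise of the 160 degree start direction sort last, which is what leaves that region for the final placements.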
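The following Python sketch shows one simple volume test of the kind such a filter might apply. It is an assumption for illustration, not polycube's implementation: it supposes every remaining piece has the same number of cells (six for hexominoes) and rejects any image that would leave an open region whose size is not a multiple of that number.

```python
from collections import deque


def region_sizes(open_cells):
    """Sizes of the edge-connected regions in a collection of (x, y) cells."""
    remaining = set(open_cells)
    sizes = []
    while remaining:
        queue = deque([remaining.pop()])
        size = 1
        while queue:  # breadth-first flood fill of one region
            x, y = queue.popleft()
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
                    size += 1
        sizes.append(size)
    return sizes


def image_passes_volume_check(open_cells, image_cells, piece_size=6):
    """True if placing the image leaves only regions that whole pieces could fill."""
    left_over = set(open_cells) - set(image_cells)
    return all(size % piece_size == 0 for size in region_sizes(left_over))


# A 3x4 open area; this T-shaped image splits it into two unfillable 3-cell strips.
area = [(x, y) for x in range(3) for y in range(4)]
t_image = [(1, 0), (1, 1), (1, 2), (1, 3), (0, 3), (2, 3)]
print(image_passes_volume_check(area, t_image))
```

A filter built on a test like this can be run once against the whole image set before the search begins, which is the sense in which the application is one-shot.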
This software is protected by the GNU General Public License (GPL) Version 3. See the README.txt file included in the zip file download for more information.
EDIT (February 11, 2019): the software described and linked below is several years old, but is retained here as it is consistent with this document. I encourage you, however, to instead download and use my improved version of polycube linked from my more recent article on FILA.
The source is about 10,000 lines of C++ code, with dependencies on two other libraries (boost and the Mersenne Twister random number generator) which are also included in the download. The executable file polycube.exe is a Windows console application (sorry, no GUIs folks). For non-Windows platforms you'll need to compile the source.
The README.txt file gives the full details about how to run the software (algorithm control, solution output control, trace output control, puzzle definition file formats, etc.), but here is a brief introduction. Simply pass polycube one (or more) puzzle definition files on the command line like this:
polycube def/pentominoes_10x6_def.txt
This will immediately start spitting out solutions to the 10x6 pentominoes puzzle. Once you see that it's working you'll probably want to explore available command line options. To see them run:
polycube help
There are several puzzle definition files provided. These are simple text files that look like this:
# Tetris Cube Puzzle Definition
D:xDim=4:yDim=4:zDim=4
C:name=A:type=M:layout=0 0 2, 1 0 2, 2 0 2, 2 0 1, 2 0 0, 2 1 0 # Blue angle with end-notch
C:name=B:type=M:layout=0 0 0, 1 0 0, 2 0 0, 2 1 0, 2 2 0, 2 1 1 # Blue angle with mid-notch
C:name=C:type=M:layout=0 0 0, 1 0 0, 1 0 1, 2 0 1, 2 1 1 # Blue angled staircase
C:name=D:type=M:layout=0 0 0, 1 0 0, 2 0 0, 2 1 0, 3 1 0 # Blue 2D serpent
C:name=E:type=M:layout=0 0 1, 0 0 0, 1 0 0, 1 1 0, 1 2 0, 2 1 0 # Red dog with tail
C:name=F:type=M:layout=0 0 0, 1 0 0, 0 0 1, 1 0 1, 0 1 0 # Red ziggurat
C:name=G:type=M:layout=0 2 0, 0 1 0, 0 0 0, 1 0 0, 2 0 0 # Red angle
C:name=H:type=M:layout=0 0 0, 1 0 0, 2 0 0, 3 0 0, 2 1 0 # Red line with mid-notch
C:name=I:type=M:layout=0 0 1, 1 0 1, 2 0 1, 0 1 1, 0 1 0 # Yellow pole with twisty top
C:name=J:type=M:layout=0 0 0, 1 0 0, 1 0 1, 1 1 1, 2 1 1 # Yellow corkscrew
C:name=K:type=M:layout=0 0 0, 1 0 0, 2 0 0, 1 1 0, 1 1 1 # Yellow radar dish
C:name=L:type=M:layout=0 1 0, 1 1 0, 1 1 1, 2 1 1, 1 0 1, 1 2 1 # Yellow sphinx
~D
The type=M means the piece is mobile and free to move about (the typical case), but you can also declare a piece to be type S (stationary) to forcibly load the piece at the given coordinates. It's often easier to define the puzzle pieces graphically. Here's a definition file for the box-in-a-diamond hexomino puzzle that uses graphical layouts for piece definitions.
# Box-in-a-Diamond Hexomino Puzzle Definition
D:xDim=23:yDim=23:zDim=1
L
. . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . A . B B . C . . D . . . E . . F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C C . D . . E E . . F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D D . E . . F F . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D . . E . . F . . . . . . . . . . . . .
. . . . . . . . . . . A . B . . C . . D . . E . . F . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. G G . H H . I I . J . . K . . . L . M M M . N . . . O . . . P . . . Q . . .
. G G . H . . I . . J J . K K . L L . M . . . N N N . O . . . P . . . Q . . .
. G . . H H . I . . J J . K K . L . . M . . . N . . . O O O . P P . . Q Q Q .
. G . . H . . I I . J . . . K . L L . M . . . N . . . . O . . . P P . . . Q .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
R . . . S . . . T . . . U U U . V V . . W W . . X X . . . Y . . . Z . . 1 . .
R R R . S S . . T T . . . U . . . V V . . W . . . X . . Y Y Y . Z Z . . 1 1 .
. R . . . S S . . T . . . U . . . V . . . W W . . X . . . Y . . . Z Z . . 1 1
. R . . . S . . . T T . . U . . . V . . . W . . . X X . . Y . . . Z . . . . 1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 2 2 . 3 . . . 4 . . . 5 . . . 6 . . . 7 7 . . 8 8 . . 9 9 . . . . .
. . . . . 2 2 . 3 3 3 . 4 4 . . 5 5 5 . 6 . 6 . 7 7 7 . 8 8 . . . 9 9 . . . .
. . . . . 2 2 . 3 3 . . 4 4 4 . 5 . 5 . 6 6 6 . . 7 . . . 8 8 . 9 9 . . . . .
~L
L:stationary=*
* * * * * * * * * * * . * * * * * * * * * * *
* * * * * * * * * * . . . * * * * * * * * * *
* * * * * * * * * . . . . . * * * * * * * * *
* * * * * * * * . . . . . . . * * * * * * * *
* * * * * * * . . . . . . . . . * * * * * * *
* * * * * * . . . . . . . . . . . * * * * * *
* * * * * . . . . . . . . . . . . . * * * * *
* * * * . . . . . . . . . . . . . . . * * * *
* * * . . . . . . . . . . . . . . . . . * * *
* * . . . . * * * * * * * * * * * . . . . * *
* . . . . . * * * * * * * * * * * . . . . . *
. . . . . . * * * * * * * * * * * . . . . . .
* . . . . . * * * * * * * * * * * . . . . . *
* * . . . . * * * * * * * * * * * . . . . * *
* * * . . . . . . . . . . . . . . . . . * * *
* * * * . . . . . . . . . . . . . . . * * * *
* * * * * . . . . . . . . . . . . . * * * * *
* * * * * * . . . . . . . . . . . * * * * * *
* * * * * * * . . . . . . . . . * * * * * * *
* * * * * * * * . . . . . . . * * * * * * * *
* * * * * * * * * . . . . . * * * * * * * * *
* * * * * * * * * * . . . * * * * * * * * * *
* * * * * * * * * * * . * * * * * * * * * * *
~L
~D
Note that a single stationary piece named * is used to shape the puzzle.
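For readers who want to process graphical layouts with their own tools, here is a minimal Python sketch of how a layer's rows could be turned into per-piece cell lists. It assumes tokens are whitespace-separated and that "." marks an empty cell, as in the example above; it is not polycube's parser, and `parse_layer` is a hypothetical helper name.

```python
def parse_layer(rows):
    """Map each piece name in a graphical layer to its list of (x, y) cells."""
    pieces = {}
    for y, row in enumerate(rows):
        for x, token in enumerate(row.split()):
            if token != ".":  # "." marks an empty cell
                pieces.setdefault(token, []).append((x, y))
    return pieces


# The G and H pieces cut from the layer above.
sample = [
    ". G G . H H .",
    ". G G . H . .",
    ". G . . H H .",
    ". G . . H . .",
]
for name, cells in parse_layer(sample).items():
    print(name, len(cells), cells)
```

Each hexomino should come out with exactly six cells, which makes a handy sanity check on a hand-drawn layout.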
When I get time to work on this again, these are the features I'll be adding:
Here are all the solutions to the Tetris Cube and a few other popular puzzles:
The solutions for 3D puzzles need explanation. The first three solutions to the Tetris Cube are shown below. Each solution is displayed as four horizontal slices of the puzzle box like the layers of a four-layer cake. The first slice (on the left) shows the bottom layer of the box; the next slice is the second layer of the box; etc. The letters match the labels of the pieces shown in the photo above, identifying which piece occupies each cell in each layer of the puzzle box. The background color is also set to match the color of the pieces. Because the pieces are three dimensional, they can span multiple layers of this output.
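The slice layout is easy to reproduce in a few lines. The Python sketch below is only an illustration of the display idea, not polycube's output code; it assumes a solution is stored as `box[z][y][x]` with `z = 0` as the bottom layer, and it uses a made-up 2x2x2 toy box rather than a real Tetris Cube solution.

```python
def format_slices(box):
    """Render box[z][y][x] as side-by-side layers, bottom layer on the left."""
    rows_per_layer = len(box[0])
    lines = []
    for y in range(rows_per_layer):
        # Take row y from every layer and join the layers with a wider gap.
        lines.append("   ".join(" ".join(layer[y]) for layer in box))
    return "\n".join(lines)


# Toy 2x2x2 box with three made-up pieces (not an actual puzzle solution).
toy = [
    [["A", "A"], ["B", "A"]],  # bottom layer
    [["C", "A"], ["B", "B"]],  # top layer
]
print(format_slices(toy))
```

In the toy output, piece A shows up in both slices because it spans both layers, just as the real pieces do in the four-slice display.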
One thing I find fun to do is to use the solutions to place just the first 6 or 7 pieces in the box, and then see if I can solve it from there. It's still challenging, but won't cost you a week of vacation to find a solution.
Applying a host of different algorithms and constraints-based optimizations to a single polyomino or polycube problem can deliver great performance benefits. For large problems it appears that initially simply packing confined regions of the puzzle works well. When the number of remaining pieces reaches some critical threshold (that depends on the complexity of the pieces and the topology of the remaining puzzle), switching to an algorithm that seeks out constrained holes or pieces does better. Examples of such algorithms include DLX using a min-fit ordering heuristic, and the MCH algorithm. For the final piece placements, the de Bruijn algorithm appears most efficient. Parity-based constraints can offer performance benefits, especially for high-parity puzzles. Investing the time to purge the images of pieces that partition a puzzle into two regions that are clearly unfillable based on volume considerations consistently offered considerable performance benefits for all polyomino puzzles examined. Constraining the allowed images of a piece to eliminate rotationally redundant solutions from the search also provides great performance benefits when enumerating all solutions to a puzzle that has rotational symmetries. Some of these techniques are easy to apply; others (like knowing when to start applying the min-fit heuristic, or when to switch to the de Bruijn algorithm) unfortunately require some combination of intuition, experimentation, or experience to use effectively.
These types of puzzles are certainly a marvelous distraction and I end this effort (at least for the time being) leaving many ideas unexplored. I haven't even examined the effectiveness of some of the techniques I've implemented in the solver (e.g., the volume constraint backtrack trigger). Correspondence with other folks interested in these puzzles has brought other promising strategies for attacking these types of problems to my attention, but I must for now return to other more practical projects.
The octagon on the right is only about 14 inches above grade. The octagon on the left is one step up (maybe 7 inches higher) and surrounds an octagon-shaped hot tub. I wanted the decking for each octagon to be laid out circularly (as shown) to emphasize the octagon shapes.
Although my web searches did turn up a few pictures of octagon decks with circular decking, I found no plans on how to build them. Figuring out how to frame this deck was quite challenging to me — someone with zippo framing experience. Let me emphasize this point with the following disclaimer:
DISCLAIMER: I am not a professional engineer and have no training or experience in construction. I'm a novice. This page chronicles my own deck building experience. I hope you find the information provided here useful, but it is provided WITHOUT ANY WARRANTY; without even any implied warranties. I make no claims as to the structural integrity of this deck design or whether it is fit for any particular use.
I spent hours talking to my father-in-law (who's built a deck or two before) and to the folks at Front Range Lumber who familiarized me with the various hardware available from Simpson. With their help, here's the framing design I finally came up with:
Unfortunately, this drawing is drawn looking north instead of south, so left and right are reversed relative to the previous sketch. I'll therefore refer to the two octagons as lower (the octagon without the hot tub) and upper (the octagon that's one step up and surrounds the hot tub).
The lower octagon has one long double-2x10 beam that stretches between two opposite corners, and two more double-2x10 beams that connect at the center of the long beam to form a cross. Each side of the octagon is also a load-bearing double-2x10 beam. The beams forming the cross split the octagon into four quadrants. All joists are of 2x8 lumber. The joists in each quadrant all run parallel to each other. Note that the joist running through the center of each quadrant is double-wide to provide ample surface area to screw down the ends of the deck boards. Because I could find no hardware to interconnect eight incident double-wide 2x members, these four double-wide joists don't run all the way to the center of the deck but instead hang from a side of the small diamond structure, which is made just big enough to clear all hanger hardware at the center.
Each side of the upper octagon is a double-wide 2x12 beam. (Note that 2x10 beams would be sufficient structurally, but I'm using 2x12s to provide more room to bolt the two octagons together on their common side.) All joists on this side are made of 2x6 lumber because they are so short. At the outside the joists hang from the beams that form the sides of the octagon. On the inside the joists rest on top of (and are cantilevered over) four underlying double-2x10 beams laid out in a simple square. The four concrete piers at the corners of this square have good clearance from the concrete slab for the hot tub.
Each octagon corner (both lower and upper) has three incident beams (or two beams and a double-wide joist) coming in at 67.5 degree angles. So how do you tie these all together and support them? There's no post cap I could find that lets you do this. My solution was to notch a post in such a way as to support all three beams. I needed a rather fat post to have enough cross-sectional area to both support all three beams and still have enough wood left over to bolt everything together. I settled on a post size of 6 inches by 8 inches, which I fabricated from two 4x6 rough-cut treated posts that are bolted together. Here is an orthographic three-view drawing of one half of one of my deck posts.
The other half of the deck post is just the mirror image of this drawing. Note that all dimensions in the drawing are in inches and all bolt holes are 1/2 inch in diameter. The notch is shown as nine inches high, which is appropriate for a double-2x10. (You'd obviously need to modify this dimension to accommodate 2x8 or 2x12 beams.) It's important to get the 1.5 inch depth cut just right to match the thickness of your lumber, lest you snap the post when you sandwich the two post halves around the radial beam. The 3.247 inch dimension is chosen so that the exterior face of the side beam will just graze the underlying post corner (so the corner of the post will not stick out underneath the beam).
Here are a couple of pictures of my sample post.
This sample post served to give me some confidence that I could actually build this deck before I placed the big order for all the building materials. It was also useful during my first visit to the Lakewood building office. They were initially saying I'd need to pay an engineer to approve my post design, but when I showed them my sample post they decided the thing looked so solid I could forgo an engineer's approval.
To minimize the delay before having at least a portion of the deck ready for use and also to reduce the risk of lost time and money in the woeful event this whole project flops, I decided early on to build the lower octagon first and save the other half of the deck for some later summer.
Once I finished all the drawings required to get my building permit, I marked the post hole locations. I also had to move some sprinklers around. (The black tube is a section of a sprinkler zone that ran under the deck's new location.)
Digging the holes to the three-foot depth required by local building code was truly a challenge. I started by renting a rather large 9 horsepower auger which worked great until I hit some nasty breccia about 15 inches down. I rented the auger for two days (at $110.00 per day) trying to drill through this stuff, but made very little progress. Then I rented a jack hammer which easily broke up the breccia, but since I could only loosen about 2 inches of rock at a time, several applications of the jack hammer were required on each hole to get them to depth. Getting the loose dirt and rock out of a hole after using the jack hammer was a tedious and back-wrenching process. (My father-in-law did actually wrench his back.) After accruing over $300.00 in rental fees, I got annoyed and bought my own electric jack hammer for about $600.00. I just wish I had back all the money I've spent renting jack hammers and that all but useless auger.
Usually when you build a deck, you pour your piers first; then place your posts; then cut or notch them all to the right height; then connect your beams to your posts, etc. But I needed my posts to be located exactly at the octagon corners. I doubted my ability to get them located accurately enough to make the octagon look true. With so many posts, I thought it would also be quite challenging to get them all level. Lastly, the funny 22.5 degree notch cuts are all made on a band saw and table saw, which is not something I can do once the post is mounted. So instead I built this temporary square support structure out of some 14-foot-long 2x4s.
The idea is to build the basic frame of the octagon on top of this structure, with the posts at each corner floating over their respective holes, and then, once everything is lined up nicely, pour the piers. There are actually three 2x4s on each side, glued and nailed together to provide sufficient strength to support the framing, which I estimated to weigh around 1500 pounds (without the joists). Each corner of this square is supported by a threaded rod which itself is stuck in a short length of buried 4x4 to keep it from sinking into the earth. A nut and washer under each corner is used to adjust the height to get the top of the square structure level with the top of the foundation wall of my house.
Now it was time to start making posts. Because the lower octagon is so low to the ground, my posts are only about twelve inches long. Nine inches of the twelve is notched out so that the post really only extends a scant three inches below the bottom of the beams it supports. (This is actually my major worry about my deck design: that the three-inch-tall block of wood the beams sit on could split off from the rest of the post. On the bright side, if my deck does ever fail, it can't possibly fall more than a couple of inches.) Here I am cutting sixteen 12-inch-long post halves from 4x6 rough-cut treated lumber. (I was using clamps to hold the lumber steady because it was a little crooked and I didn't want it shifting while I cut it.)
SO MUCH SAWDUST! Although I used a full breather most of the time, there was at least one evening I worked late in my garage without a mask and assuredly breathed in more chromated copper arsenate (CCA) than anyone should. Hopefully I'll finish this deck before lung cancer sets in.
Each post half required 4 cuts to make the needed notches. Two of the four cuts were trivial, but the other two required a jig to hold the lumber in place as I ran it through the saw. Here's one of the two jigs I had to make:
And here's the other. This one was only needed because my bandsaw table only tilted clockwise. You sort of expect this kind of thing from a table saw, but I was surprised to find this limitation on my band saw.
With the posts all cut, I could start attaching posts to beams (and double-wide joists) and begin framing the deck. One thing I noticed before I started was that not all my 2x lumber was the same size. For example my 2x10s ran anywhere from 9 3/16 inches to 9 7/16 inches in width: a full quarter inch of variance. Normally this isn't a big deal, but I didn't want to deal with notching my posts all differently to accommodate this variance, so I ended up ripping all my 2x10s down to the standard width of 9 1/4 inches. I figured since this is the nominal size of a 2x10 the inspector wouldn't complain too much, and it did give the added benefit of straightening my lumber. It's probably worth mentioning that the small diamond structure can't be nailed with a hammer: I had to buy an air-powered palm nail driver to work in the tight spaces on the interior of this diamond.
Once in place, I verified the lengths of all the radial beams and double-wide joists (measured from one corner to the opposite corner) were close to their designed size of 188 5/16 inches. I then cut the eight perimeter beams to their design length of 72 1/16 inches (measured on their shorter inside). I set them on the posts and stretched a rope around the octagon to pull them all in as tight as I could. I then started bolting them in place. I used a couple of clamps to make sure the beam was well seated in the notch and to hold the beam tight against the post face. A sledge hammer was useful for tapping the post (and radial beam) into proper alignment before drilling the holes. I also countersank the holes from the front so the carriage-bolt heads wouldn't get in the way of decking and/or steps I intend to mount around the periphery of the deck.
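The angles and lengths quoted for the octagon can be cross-checked with a little trigonometry. The Python sketch below treats the frame as an idealized regular octagon, which is my assumption: the text does not say exactly which faces the corner-to-corner and side measurements are taken from, so this is a sanity check rather than a layout procedure. The function names are hypothetical.

```python
import math


def regular_polygon_angles(sides: int = 8):
    """Interior corner angle, its half, and the half's complement, in degrees."""
    interior = (sides - 2) * 180.0 / sides  # 135 for an octagon
    half = interior / 2.0                   # angle between a side beam and a radial beam
    return interior, half, 90.0 - half      # the last value is the notch angle off square


def corner_to_corner(side_length: float, sides: int = 8) -> float:
    """Distance between opposite corners of a regular polygon with an even side count."""
    return side_length / math.sin(math.pi / sides)


print(regular_polygon_angles())                        # 135, 67.5, and 22.5 degrees
span = corner_to_corner(72 + 1 / 16)                   # side length in inches
print(f"{span:.2f} inches vs. design {188 + 5 / 16:.2f} inches")
```

For a 72 1/16 inch side the idealized span comes out within a small fraction of an inch of the 188 5/16 inch design figure, and the 67.5 and 22.5 degree values match the beam and notch angles mentioned earlier.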
Oct 28, 2010
After a two month hiatus (spent pursuing the creation of this website and replacing some rotting siding on my house) I am again working on my deck. I used my clamps in spreader mode between the deck frame and the underlying square support structure to scootch the deck a few inches to the side giving me access to the holes. I cleaned out the holes (that had partially filled with loose soil from recent rains) and dropped in the sonotubes. I trimmed the tubes to their proper length.
The previous deck happened to have a concrete pier where I could reuse it for my center post. While I had the deck frame slid to the side, I drilled a hole in this pier and installed a wedge anchor bolt to which the post base will attach. It was really hard to find a wedge anchor bolt that was both galvanized and long enough to meet code. I highly recommend wholesalebolts.com. They have a great selection and charge only about 20% of what Home Depot charges.
I scootched the deck frame back into place and then attached the Simpson Strong-Tie post bases to each of the eight corner posts. Before I attached each base to its post, I attached a 5/8" L-bolt to the bottom of the base. I used two nuts to attach these bolts: one nut on the topside of the base and one on the bottom. In this way the bolts will be held tight to the base during the concrete pour.
I installed the two beams that bridge the deck to the house. These beams simply sit on the house foundation wall and are secured to the house rim joist by lag bolts. I think it's more typical to use a hanger to connect these to the rim joist, but since my basement is unfinished I was easily able to lag them into place.
I was having a devil of a time trying to get everything level with my four-foot level. Is gravity crooked in Lakewood? I began spending my evening hours lusting after absurdly expensive laser levels. Then an old college friend of mine, David Cenedella, told me about a water level. If you have a bucket and some clear plastic tubing, you have everything you need to make an extremely accurate level. Just put some water in the bucket; put one end of the tube in the bucket; and fill the tube with water (like you would to make a siphon). Holding the other end of the tube slightly higher than the water level in the bucket will ensure no water siphons out. The water level at the end of the tube is always the same no matter where you move it. Using this technique I leveled each corner of the deck with the top of the beams at the foundation wall. I made gross adjustments by raising or lowering the nuts that the support square rested on, and made finer adjustments with some thin wooden shims. This was so simple and worked so very, very well! Thanks David!
My father-in-law came out again to visit and together we hand-mixed the thirty-four 80-pound bags of concrete needed to fill the eight sonotubes. We were worried about how hard it would be to get concrete in the small gap between the post base and the top of the tube. Initially I made a huge funnel out of a piece of sheet metal, but then we found a simple sheet metal chute worked wonderfully. The concrete pour was then actually straightforward. It took us about seven hours of very hard work over two days. Here's one thing I learned: it's about four times harder to simultaneously mix two bags of concrete than it is to mix one!
After the piers had cured for a few days, I removed the square structure that the beams had previously been sitting on.
The ground in my backyard slopes toward my house. Although I made my posts as short as I dared, the ground level at the posts farthest from the house was still an inch or two higher than the top of the concrete pier, so the bottom of these posts would be about an inch below ground. This would cause the posts to rot out far faster than they otherwise would. So I spent a lot of time lowering the ground level around and under the deck so that the tops of all of the concrete piers were at least 2 inches above ground, and to create a slight ground slope away from and to the side of my house. I moved probably three or four yards of dirt (which I'm still working to get rid of).
My center post isn't quite in the center because the center of the deck has hanger hardware in the way. Instead it's about a foot and a half off center under the main deck beam. The only thing of note about this step is that my 16' long main beam had begun to sag under the weight of the other beams hanging off of it, so I had to get out my car jack to raise it about a half inch before installing this post. Fortunately, this upward force wasn't enough to crack my new concrete piers.
November 11, 2010
The aluminum sill under the sliding glass door has been precariously unsupported since I removed the old deck. To remedy this, I shaped a couple of boards to sit under this sill and fill the gap in the exterior wall. I also shaped some flashing that should help keep water out of the wall. Once the boards and flashing were installed, I sealed up all the joints with silicone (not shown).
I followed this same procedure to fill the pocket hole in the brick wall.
November 26, 2010
Over the months that I've been working on this deck the 2x lumber has cupped and curled. About half of my beams developed sizeable gaps between the two boards as seen here.
If I had angled the nails in when I put them together I suspect I would have prevented much of this gapping. To remedy the problem I bolted all the beams and double-wide joists together.
I placed several bolts on each beam placing them alternately about 2 inches from the top and bottom edge of the beam as shown below. (You don't want to drill any holes too close to the edge of the beam as that would structurally weaken the beam. Check your local building codes for the exact rules.)
The hangers I installed months earlier (the double-wide hangers supporting the crossing beams and the small diamond structure) gave me some troubles because things wanted to move around on me as I nailed in the hangers. I've since learned I should have first toenailed the beams into place and then installed the hanger. I did much better installing the joist hangers. Here are the steps I took for each joist.
I used galvanized 10d nails for all angled connections into the beam; but used galvanized "hanger" nails (1.5 inch long nails with a fat shank and a thick head) for everything else (as permitted by Lakewood building code). Here's a look at the installation of one of my joists.
And here's a look at the deck after all joists were installed.
The Lakewood building inspector thought everything looked great and was musing he should show pictures of my deck to the professionals to show them how things should be done. I suspect he was blowing smoke with this remark, but appreciated the compliment nonetheless.
August 26, 2011
Over the winter months, the framing lumber (which was initially weeping wet with CCA treatment) dried resulting in significant shrinkage in board width and thickness. (Board lengths, in contrast, seemed relatively unaffected.) This caused the tops of every one of my joists to drop so they were an eighth of an inch (or more) below flush with the tops of the beams. This same shrinkage caused the bottoms of the joists to pull upwards so they were no longer seated in their hangers. What a mess! Here's a look at the top of one of my troubled joists.
I ended up pulling every nail out of every joist, and using hundreds and hundreds of metal shims to get the joists back to flush with the tops of the beams. It turns out that the fat heads of hanger nails are easy to grab with a pair of vice grips which gave me something to pry against to pull the nail; so this job wasn't as hard as you might think. If I had taken the hangers off altogether it would have saved the time to cut all the shims, but the longer nails used to mount the hanger to the beams had no such fat heads making the removal of these nails all but impossible; and in any case I didn't relish the idea of repositioning all the hangers. I couldn't figure out where to buy metal shims, so I cut them from galvanized landscape edging. This was a cheap (but time consuming) solution. Here's a look at one of my joists after I've shimmed it to bring it back to flush with the beam.
Many websites offer warnings that treated lumber can shrink significantly, but I've found very little documenting how shrinking joists in hangers can lose level with the beam or pull up out of the hanger seat. When (and if) I get around to building the other half of this deck, I'm going to search long and hard to find KDAT (kiln-dried after treatment) lumber. I also intend to redesign the second half of this deck to use a cantilever arrangement where the joists instead rest on top of the beams (avoiding the use of joist hangers altogether).
Over the winter months, my joists also bowed quite a bit. If I had managed to get the decking on before winter this surely would not have been a problem. To straighten them, I blocked all the joists. This should stiffen the floor up nicely too.
I put down a bunch of weed block under the deck and bought some landscaping rock to hold it down. This step may have been unnecessary as I don't expect much sunlight will make it down through the decking.
At the suggestion of a local deck builder professional (whom I was chatting with at the building office), I'm placing ice-and-water down over all the beams, joists and post heads. He claimed that water tends to sit on all the horizontal surfaces under the decking, eventually leading to rot. Having seen how my cedar siding rotted away where a joist (from the old deck) was lagged against the side of the house, I decided to follow his advice.
Although the ice-and-water is quite easy to cut to shape (with an X-Acto knife), it would not stay folded down on the sides of the joists and beams, so I spent a lot of time tacking it down. I was surprised by this as a couple of different people told me this stuff is so sticky it's impossible to pull up once placed.
I'm only covering one pie piece of the deck at a time since (as I understand) ice-and-water degrades in full sunlight.
I first needed to mark the center of the deck. Now you could simply measure to the midpoint of opposite corners, but I decided to stretch some mason lines between corners to get a precise placement. Remarkably, all the lines crossed at a single point. I put a nail at this point which I used to make sure all the decking is placed concentrically.
To deck one pie section, I started by placing a nail at two adjacent corners of the deck positioned so they fall in the gap between the outer two deck boards. In my case, this was 93 and 1/4 inches from the center nail. I left these nails in place so I could later use them to mark the cut line for the deck boards.
Later, I'll be using some cedar decking as fascia boards to cover the sides of the perimeter beams (for a clean appearance). I want the outer deck board to overhang that by almost half an inch, which means it needs to overhang the beam by an inch and a half. I also cut a drip edge on this first board which (supposedly) will help keep the rain off the fascia boards.
For each pie section, I chopped off one end of a bunch of deck boards to the needed 22.5 degree angle and let the other end of the board run long. I jammed each deck board tight against the end of the deck board from the previous pie section. Once the first deck board was screwed down, I stretched some mason strings along each joist line to help me keep all my screws in a neat line with the screws in the first board.
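For anyone wondering where the 22.5 degrees comes from, here's a minimal sketch of the geometry, assuming a regular octagon with boards running parallel to the outer edge of each pie section. The function name and the returned keys are my own inventions for illustration, and the edge length is just the straight-line distance between two adjacent guide nails, not a measurement I took on the deck.

```python
import math

def pie_section_geometry(sides: int, center_to_corner: float) -> dict:
    """Angles and corner-to-corner distance for one wedge of a regular polygon deck."""
    wedge_angle = 360.0 / sides        # angle each pie section spans at the center nail
    miter_angle = wedge_angle / 2.0    # board-end cut, measured away from a square cut
    # Straight-line distance (chord) between two adjacent corner nails.
    edge_length = 2.0 * center_to_corner * math.sin(math.radians(miter_angle))
    return {
        "wedge_angle": wedge_angle,
        "miter_angle": miter_angle,
        "edge_length": edge_length,
    }

# Eight pie sections, guide nails 93 and 1/4 inches from the center nail.
octagon = pie_section_geometry(sides=8, center_to_corner=93.25)
print(octagon["wedge_angle"])            # 45.0
print(octagon["miter_angle"])            # 22.5
print(round(octagon["edge_length"], 2))  # inches between adjacent guide nails
```

The same function works for a hexagon or any other regular polygon by changing `sides`.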
I actually used 10 penny nails as spacers between each deck board for my first pie section, but because the widths of my boards varied by a quarter inch, this made it difficult to match this spacing on subsequent sections. I wish I had instead precisely placed about every fourth board (assuming some nominal board width) and then simply placed the boards between by eye to achieve a good look. This would have kept the variance in board width from accumulating during placement.
Once all the boards were screwed down, I used a circular saw to cut a gap between the pie section just finished and the previous pie section. (Be sure to use a high-tooth-count blade to minimize splintering. I failed to do this on my first cut: you can see how the boards on the right show a lot more chipping.) This cut gives the boards a little room to expand with moisture and heat and also makes a nice looking line. The cut depth must be set very slightly less than the board thickness to remove most of the board but without cutting into the ice-and-water below.
I then moved the straight edge to chop off the long ends of the deck boards. To find the right line, I simply stretched a line between the guide nail at the corner and the center nail. I clamped my straight edge so the circular saw would follow this line. After the cut (which again was about 1/32 of an inch shy of cutting through the board), I snapped off the boards and flipped up the splinters left behind (seen in the image below) with a knife edge.
This procedure worked fine until I got to the final pie section, at which point I had to carefully cut each deck board to length before slipping it into place. I cut these boards so they fit tight at their ends so I could again cut the nice spacer lines as I did between other pie sections.
To simplify the task of decking the bridge between the house and the octagon, I carefully set up some mason lines to mark the outer edge of the octagon and then temporarily removed the adjacent deck boards from the octagon. This allowed me to again cut the deck boards sloppy long. Once they were screwed down I cut the ends off with the circular saw along the line marked by the mason lines, and finally replaced the outer deck boards from the octagon. I should have taken a picture of this process, but here's a look at the finished work.
For the center piece, I cut some decking down to about 2 inches wide and glued several together with polyurethane glue and biscuits. (Gorilla Glue sure is messy stuff.) I'm skeptical this will hold together in the weather. If it falls apart I'll try building something different.
I wanted this center piece to be of a different color, so I went ahead and stained it chocolate brown before screwing it down. Here's a look at the deck with completed decking.
I screwed deck boards to the sides of the beams giving the deck a polished look.
Keeping with the octagon theme, I first planed some rough-cut cedar 4x4 posts down to a consistent 3^{5}/_{8}" x 3^{5}/_{8}" and ripped the corners off on my table saw to give the posts an octagon-shaped cross-section. I followed this up with some crosscuts to fashion an octagon-shaped ball on the post head. This was all pretty darn easy. I then cut slots through the post to accommodate the top and bottom rails. The slots were really hard. Unable to find a mortising bit that would do the job, I actually cut them with a jigsaw; but the wander on the 7 inch blade was bad enough that I had to clean all the cuts up with a hand chisel. A neighbor friend of mine (who happens to be a retired craftsman) tells me I should have used a fluted router bit with a jig to guide the cuts. Live and learn.
I bolted the rail posts to the beams that run around the perimeter of the deck. (I had to unscrew a few deck boards to do this; no big deal.) A few posts wanted to lean in or out noticeably (because the perimeter beams weren't all vertically aligned perfectly). I've heard people use washers to correct this problem. I also heard it's a good idea to install washers to produce an air gap between the post and the beam (or fascia board in my case) to prevent rot; but it seems to me that concentrating post torque on the smaller washer-sized surface area would lead to wood fiber compression and ultimately a loose post. This is just my guess; I'm likely wrong. In any case I chose to instead use a power sander to knock off a sixteenth of an inch on the top or bottom of the fascia boards to correct my leaning posts.
While I was installing these posts, I also stapled some hail screen to the same perimeter beams. This screen hangs to the ground, forming a fencing that should keep skunks, foxes, cats and other varmints out from under the deck.
I also had to put a few posts in the ground around my window well. These posts were tricky since they not only had to be plumb, square and in good alignment with posts on the deck, but also had to be at exactly the right height (since the rail slots were precut). So I first poured concrete for just one post at the far end of the window well and let that set. I dug the post holes for the other two posts slightly deeper than needed and dropped the posts in, but before I poured the concrete, I installed all my rails around the window well, propping them up as necessary. This suspended the last two posts in proper position while I poured the concrete.
December 24, 2011
For some reason, I can only seem to find rough sawn cedar in my area, so I used my planer to manufacture smooth and consistently sized lumber. Although the slots in my rail posts were a full 1 1/2 inch in width, I ended up planing my rails down to 1 7/16" so they would slide in easily and allow for expansion.
I slid the top and bottom rails into place and then marked positions for the 3/4 inch round metal balusters which I spaced at four inches on center. I then took all the rails down and drilled 3/4 inch diameter holes one half-inch deep at each baluster mark. For the bottom rail, I drilled a quarter inch diameter hole through the center of each baluster hole all the way through the board. This should allow any water that gets in the hole to drain out the bottom of the rail. I also countersank some screw holes from the underside of the top rail that will allow me to screw down the rail caps from the underside.
I then put everything back together. Notice that the rail slots for the top rail are an inch taller than the rail itself. Once the rails are slid into place, you can pull the top rail up by one inch, which gives you room to drop the balusters into the holes on the bottom rail and then slide the top rail back down over the metal balusters. I drove two 3 inch deck screws through the posts at each rail slot to secure the rails. When it comes time to restain the deck, I can remove these screws, slide the top rails up, and remove all balusters in just a few minutes' time.
I wanted the window well gate to have the same look as the fencing, so I wanted to avoid using the diagonal brace that is typical of outdoor gates. I used my bandsaw and planer to make a dozen cedar boards that were 1/2" thick and 3 1/2" wide. I sandwiched and glued these boards together to make a gate frame that was 3 boards thick on each side, the corners interleaved to make strong joints. (I'm not sure what you call this kind of joint or if it even has a name.) Note that the baluster holes at the top are drilled all the way through so the balusters can be dropped in. I dropped some cedar sticks into the holes before I attached the top rail so the balusters could not slide up once everything was put together. I used a spacer board and some clamps to hold the gate to the post while attaching hinges and latch hardware. This worked great.
At this point I had my final building inspection. I had a different inspector this time and he didn't have much to say, which I guess is a good thing.
I was working on my steps around the deck when the winter snows hit. I'll finish this up and finish with the staining as soon as weather conditions permit, but since this deck is on the north side of my house, this may not be till spring time.
May 22, 2012
I framed out a single continuous step that wraps halfway around the deck. Only two sides are functional; the rest lead up to a railing and are just for aesthetics. Perhaps we'll put some potted plants here. The steps are fashioned from box frames that rest on half-buried cinder blocks and are bolted to the side of the deck (and to each other). I had to trim down some 2x4s to use as spacers to clear the rail posts.
I applied two coats of Cabot Transparent Cedar stain. The color matches well with the rail posts that I stained last year. (No fading or darkening.) This stain does have a bit of a sheen. I do like the color contrast of the dark brown center piece, but I don't like the difference in gloss. Someday, I may sand down and restain the center piece with a different Cabot stain.
November 16, 2012
Here's my first attempt at a flagstone patio. Because I'm both inexperienced and wholly ignorant of good flagstone installation techniques, I strongly advise you to read other flagstone installation guides and make your own informed decisions about how best to install a flagstone patio. Trying to do anything seen here will probably lead to uneven stones, stones that slide around, smashed fingers, loss-of-eye, excessive weed growth, environmental damage to your soil, the growth of brain-killing fungus in and around your patio, gunfire from angry neighbors, dementia, global warming, lawsuits from multiple government agencies and environmental groups, and the spontaneous evolution of a new globally dominant life form. Proceed at your own risk.
You don't want to place flagstone on dirt. Dirt is difficult to level, difficult to compact, and susceptible to settling. So I started by excavating a few inches of earth around the deck. I then put down 1 to 3 inches of crusher fines. I've read other sites that recommend a layer of road base, followed by a layer of crusher fines or sand. Given the hardness of my soil, I suspect I'll do just fine with only the fines.
I buried lengths of 2x4 lumber every 6 feet or so, and leveled the tops with the desired level of the fines. I used a small mason level to level each 2x4 along its length and used a long straight edge between three adjacent 2x4s to make sure they were all level with each other.
I spread the crusher fines in a section with a rake, and then dragged a straight edge across adjacent 2x4s to get a level surface.
I shoveled out any excess soil and tossed it into the next section. I bought a tamper to compact the surface, but the crusher fines really don't seem to need much compaction. If I had to do it over I probably wouldn't have bought the tamper and just skipped this step.
I pulled out the 2x4 closest to the flagstone already laid and patched up the hole using a little board with a straight edge. I then put down flagstone in this section before leveling the next section. I only leveled one section at a time because I couldn't seem to avoid disturbing the ground in the work area.
I found selecting which piece to lay next a time-consuming and laborious task. Each of my stones was nearly 2 inches thick and weighed between 100 and 150 pounds. It was no fun lugging them around trying to find pieces that fit well with the pieces already laid. And finding just one piece that fits well is not really enough: you might find a stone that fills a nook well, only to later find it created a new space that is difficult to fill. For this reason I think it important to always know how the next 2 or 3 pieces are going to be laid. Usually I'd do this by putting pieces roughly into place (slightly laying them on top of each other at the edges) to be sure the next few pieces would cover the immediate work area without much waste. Here you can see how I'm testing the fit of a couple of pieces.
The remaining open space above was the final space in the patio. I only had four pieces left and it wasn't obvious how to best cover this area with them. And of course they were all big heavy things. To save my back I traced the remaining stones onto newspaper and cut them out. I then used these paper copies to puzzle out how to cover the hole. In the first image below I managed to cover the hole with just 3 of my 4 paper cutouts. I doubt this saved time, but it sure saved a lot of grunting and also kept the flagstone scratch free. To make things look right I had to trim the existing pieces as well as the new pieces. The second picture below shows how things looked in the end. (I failed to take the second photo at the same angle. It's a bit confusing. Sorry!)
Before cutting flagstone, be sure to don protective gear for your eyes and ears. Little bits of rock would regularly fly into my safety glasses, and cutting rock is really really loud. Here's a picture of me in my safety gear. I'm still waiting for that first big movie contract.
Cutting stone puts out a lot of dust which seems to destroy my power tools. I'm sure there are expensive professional quality concrete saws built to withstand this dust, but I chose instead to sacrifice an old cheap nearly worn-out circular saw that I probably paid $25.00 for 25 years ago. By the end of the project this saw sounded horrible, but it still spins.
Because I was only putting in a 3-foot-wide patio around the deck, most flagstone pieces needed one straight edge. Some pieces had one edge that was straight enough and I just used it as is, but others I decided to straighten. I started by flipping the stone over and drawing a chalk line (using some of my kid's sidewalk chalk). I set the blade depth so it only cut a little over half way through the stone and made the cut. Then I'd flip the stone back over and tap the edge with a sledge hammer to snap it off. I made all cuts from the back side because it leaves a natural looking broken edge on the visible top side of the stone. Be sure to knock it off from the top (uncut) side: hitting it from the bottom (cut) side will sometimes knock off a layer of rock on the walking surface.
To trim a piece so that it fits well with other pieces already laid, I'd start by slipping the new piece under any pieces already laid that it intersects. I then traced the edge of the pieces already laid onto the new stone with chalk. I'd mark the line to get about a one inch gap. I'd then pull the piece out and transcribe the chalk line to the back of the stone. I did this by hand: I found simply placing a pointed finger from my left hand on the chalk line and trying to touch the same point on the other side with a finger from my right hand works very well. After marking three or four points this way, I'd just connect the dots to sketch the line. It doesn't have to be perfect: some imperfection looks better anyway.
I'd then cut the stone, sometimes making a few straight cuts to carve out a curved line. For interior angle cuts, I'd make a series of cuts perpendicular to the chalk line and knock it out with a sledge. There may be better ways to do this. This worked ok though.
After getting a piece to the right shape and placing it, it would almost always need adjustment either because it wasn't quite level with the existing stones or because it would rock slightly when you walk on it. I'd typically have to remove a piece just placed three or four times before I got it both level and stable.
I wanted to be sure no weeds grew in the gap between my deck and the flagstone. The plan was to use the weed block I had previously put down under the deck which I purposefully left long to extend out under the flagstone patio. Here I've pulled back the rock on some of Home Depot's "Contractor Grade" weed block fabric. This has only been down for a year and grass has already penetrated the fabric. Complete Junk. Frankly, I'm not sure where to buy good weed block. So I ended up supplementing this fabric with some very heavy black plastic. After I had a large section of flagstone down and level, I temporarily pulled up the stones near the deck and put down some plastic under the deck and under the inner patio. Using plastic like this is bad for your soil and promotes the growth of harmful fungus and mold, so don't do what I did! If someone knows where I can buy landscaping fabric that will last more than a year in the Denver area please let me know!
I filled the cracks with a product called Envirostone. You water it into place and it sets up hard; but if it ever cracks you can just get it wet again and repack it. I'm not sure this is the best product, but some friends of mine used it on their patio and were pleased. I started by sweeping it into the cracks. The instructions on the bag then recommend blowing the remaining dust off the flagstone with a leaf blower. This removed some of the dust for me, but not all. I went ahead with the last step of using a hose to soak the Envirostone in the cracks and also to further clean the flagstone surface. I thought things looked good, but after drying I found I was still left with some residue on my walking surface making my previously red flagstone look pink. On the whole, I think the patio still looks very good, but maybe a power washer would have served me better (if I could have avoided blasting the Envirostone out of the cracks). I don't own a power washer, but I may buy or rent one to try to further clean the surface of the flagstone. Short of that I'm hopeful the red color will return on its own in time.
The End!
I modified my Zilch strategy generation software to model the scoring rules for the Super Farkle game available at Facebook. I wasn’t previously a Facebook user so I created my account just to try out my strategy. Over several days, I played about 180 games of Farkle and was winning about 55% of the time. But I’m not sure if this means much for a few reasons.
First, almost everyone I played, played very well. I guess this makes sense since in Super Farkle you play for chips; and if you don’t play reasonably well, it will be very difficult to win enough chips to play at the higher stakes tables. The people I played against rarely made strategy errors that cost more than a few points. One notable exception was the common mistake of taking two ones on an opening roll instead of just one one. This play returns 57 fewer points for your expected score and it occurs with enough frequency to be significant in a typical game. But in general, I was impressed with how closely people played to the strategy that maximizes expected scores, especially at low turn score states. So if my strategy offered any advantage at all, it was probably very slight and I doubt 180 games was enough to make a clear differentiation.
Second, in Super Farkle whoever forms the table rolls first. The average score for a well-played Super Farkle turn is just under 550 points. One can argue the disadvantage to the player going second is half that or about 275 points. (Why don’t they just roll to see who goes first?) To make a fair test of the strategy, I should have played half my games by forming a new table, and played the other half by joining an existing table, but my competitive nature just wouldn’t allow me to concede 275 points to my opponent. So instead I spoiled my own test by always forming my own table. I suspect that the advantage of going first may have overshadowed any advantage my strategy was offering over the high quality play of my opponents.
Finally, and perhaps most importantly, there’s almost certainly something wrong with the Super Farkle dice. I detailed why I believe this to be so on the Farkle review page at Facebook. Here’s the text from that review:
This game is quite nice; but there is a serious problem. The probability of rolling a 6-die FARKLE is exactly 1 chance in 43.2. You can find this calculation all over the web. Here’s one professor at Michigan Ann Arbor that shows the calculation: http://notaboutapples.wordpress.com/2009/07/27/multinomialcoefficientsandfarkle/
Apparently I’ve played about 180 games of Farkle, but I’ve never once thrown a 6-die Farkle. If you assume a typical game has 15 turns, then that’s 15 x 180 = 2700 6-die rolls, and that’s not even considering hot-dice rolls. The probability of not throwing even one 6-die farkle in that many rolls is exactly:
(1 - (1/43.2))^2700 = .000 000 000 000 000 000 000 000 000 344
If I’m counting my zeros right, that’s less than one chance in an octillion. Yes, octillion is a real number: a very very big number. So I suggest there is something wrong with the dice. Can I be sure there’s something wrong with the dice? Of course not, but I can say this. According to Wikipedia, the visible universe is about 92 billion light years across. And 1 light year is about 6 trillion miles. And there are 5280 feet in a mile. If you lined people up one foot apart (you’d have to use skinny people) across our entire visible universe; and then sat them all down in front of their own laptop playing farkle; and had them all roll 6 dice over and over only stopping when they had their first 6-die farkle; then you’d expect about ONE of them (yes just one) to go as far as 2700 rolls without farkling. I suppose I could be that one person….uhmmm…yeah…right.
Maybe some manager made a marketing decision that 6-die farkles just annoyed people too much and the developers were simply asked to reroll the dice one time when a 6-die farkle showed up. Or maybe they are just using a really bad random number generator for their dice rolling engine. Or maybe there’s something more insidious going on. But something is surely amiss.
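The figures quoted in the review are easy to check by brute force. The sketch below is not taken from my strategy software; it simply enumerates all 46,656 six-dice rolls under an assumed set of common Farkle scoring rules (ones, fives, three or more of a kind, three pairs, and a 1-6 straight all score), which is the rule set that yields the 1-in-43.2 figure. The function and variable names are made up for this illustration.

```python
from collections import Counter
from itertools import product

def is_farkle(roll):
    """True if a six-dice roll has no scoring dice under the assumed rules."""
    counts = Counter(roll)
    if 1 in counts or 5 in counts:
        return False                      # single ones and fives always score
    if max(counts.values()) >= 3:
        return False                      # three or more of a kind scores
    if sorted(counts.values()) == [2, 2, 2]:
        return False                      # three pairs scores (assumed rule)
    # A 1-6 straight needs a one and a five, so it is already excluded above.
    return True

rolls = list(product(range(1, 7), repeat=6))
farkles = sum(is_farkle(r) for r in rolls)
p_farkle = farkles / len(rolls)

# Chance of getting through 2700 six-dice rolls without a single farkle.
p_no_farkle = (1 - p_farkle) ** 2700

# People standing one foot apart across a 92-billion-light-year universe,
# using the review's round figure of 6 trillion miles per light year.
people = 92e9 * 6e12 * 5280
expected_survivors = people * p_no_farkle

print(farkles, len(rolls))        # 1080 46656
print(1 / p_farkle)               # about 43.2
print(p_no_farkle)                # about 3.4e-28
print(expected_survivors)         # about 1
```

If Super Farkle scores some combination differently from the rules assumed here, the farkle count changes, but it would take a large change to rescue a streak of 2700 rolls.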
Interestingly, shortly after I posted this review, I was mysteriously logged out of Facebook and subsequent login attempts were denied. Coincidence? In any case, my foray into Super Farkle play is ended. I played enough games to at least see that the strategy was doing very well, and was highly consistent with the play of seasoned Farkle veterans.
Zilch is a fun little dice game codified into an online game by Gaby Vanhegan that can be played at http://playr.co.uk/. Zilch is actually a variation of the game Farkle which goes by several other names including Zonk, 5000, 10000, Wimp Out, Greed, Squelch and Hot Dice^{1}. I've worked out the strategy that maximizes your expected game score and wanted to share the analysis, my strategy finder software, and the strategy itself. Depending on whether you have zero, one or two consecutive zilches from previous turns, three successively more conservative turn-play strategies are required to maximize your long term average score. Using these three strategies you rack up an average of 620.855 points per turn, which is the best you can possibly do.
Beyond the scope of Gaby's implementation of Zilch, the scoring rules of Farkle vary from venue to venue and the strategies provided here do not generally apply, but the analysis and the software do.
If you understand conditional probabilities, expectations, and can do a little algebra, you should be able to follow along. If you're just here to take the money and go pound someone in the game, you'll need to at least read and understand Strategy Formulation before you try to interpret the tables.
I've found several blog postings where folks have offered probabilistic analyses of various aspects of the game^{2,3,4}, but none (that I've seen) find the game strategy that maximizes your expected points. It is possible that I'm the first to publish these solutions. If not, it was still a fun problem. I've always enjoyed software, algorithms, optimization, and probabilities and this problem delves into all of these areas.
Zilch is played with two players and six six-sided dice. (Though really there's nothing to stop you playing with more people, but this is not supported in the online game.)
Each player takes turns rolling the dice. The dice in a roll can be worth points either individually or in combination. If any points are available from the roll, the player must set aside some or all of those scoring dice, adding the score from those dice to their point total for the turn. After each roll, a player may either reroll the remaining dice to try for more points or may bank the points accumulated this turn (though you can never bank less than 300 points).
If no dice in a roll score, then the player loses all points accumulated this turn and their turn is ended. This is called a zilch, a sorrowful event indeed.
If all dice in a roll score, the player gets to continue his turn with all six dice. This is called a free roll and is guaranteed to brighten your day.
A player may continue rolling again and again accumulating ever more points until he either decides to bank those points or loses them all to a zilch.
If a player ends three consecutive turns with a zilch, they not only lose their points from the turn but also lose 500 points from their banked game score. (This is the only way to lose banked points.) After a triple zilch, your consecutive zilch count is reset to zero so you're safe from another triple zilch penalty for at least three more turns.
The game ends when one player has banked a total of 10,000 points and all other players have had a final turn.
Scoring is as follows:
The strategy presented will maximize your expected Zilch scores, but this is not necessarily the same strategy that will let you reach 10,000 points in the fewest number of turns; and certainly falls short of giving a complete gaming strategy that will maximize your chances of winning the game^{5}, the holy grail of Zilch analysis. In particular, the strategy considers neither your current overall score, nor your opponent's score, nor the fact that the game ends when a player reaches 10,000 points (after the other player gets a final turn). All that I offer is a way to maximize your expected Zilch scores.
My intuition is that when you're in the lead you should play more conservatively; and when you're behind you should play more aggressively. (Though I think it a common mistake to be too aggressive too early when behind.) Consider this extreme example. Let's say you're currently beating your opponent 7500 to 1500 and it's your turn. On your turn you rack up 2500 points and are faced with the choice of either banking the 2500 or rolling five dice to go for more points. The strategy identified here advises you to roll the five dice; but surely in this case it is better to bank the 2500, putting you at the game goal of 10,000 points and forcing your opponent to try to put out 8550 points in a single turn to steal the win away from you.
I will start by showing how to maximize the expected points for a particular turn. Because of the three consecutive zilch rule, the strategy that actually maximizes the average points gained across all turns is different: it is possible to trade off some expected gain in those turns where you have either zero or one consecutive zilches to reduce your zilch probability and more strongly avoid even getting into a turn where you are facing your third consecutive zilch. I will solve for this more complete strategy later, but for now let's stick with maximizing the expected points for a single turn and just ignore how such a greedy strategy might negatively affect the outcome of subsequent turns.
For my purposes, a Zilch turn has a state that may be completely defined by two variables (s, n) where s is the number of points accumulated in the current turn, and n is the number of dice you are about to roll. At the beginning of a new turn, the turn state is (s=0, n=6). Let's say for your opening roll you throw:
1, 3, 3, 4, 4, 6
The turn state will then advance to (s=100, n=5). You actually have no choice here: you must always select at least one scoring die and since the 1 (worth 100 points) is the only scoring die, you must select it. Furthermore, you are not allowed to bank less than 300 points so you must roll the five remaining dice.
Suppose with the remaining 5 dice you roll:
1, 1, 2, 3, 5
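Here you have three scoring dice: two 1s and a 5. You now have a choice of turn states that you may enter:
A: (s=150, n=4), taking only the 5
B: (s=200, n=4), taking only one 1
C: (s=250, n=3), taking one 1 and the 5
D: (s=300, n=3), taking both 1s
E: (s=350, n=2), taking both 1s and the 5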
Note that s includes not just the points taken from this roll, but also all points accumulated in previous rolls during this turn. It should be clear that state B is better than A and state D is better than C. Of the remaining three states (B, D and E) it's not so obvious which is better. You also have the option of banking from either of states D or E (but not from B since you don't have 300 points in that case). Obviously, banking from state D is just plain dumb: if you're going to bank you'll do so from state E to bank as many points as you can! That leaves you with four reasonable choices:
1. Roll four dice from state B.
2. Roll three dice from state D.
3. Roll two dice from state E.
4. Bank 350 points from state E.
My objective is to find the optimal turn play strategy that defines what to do in all such situations which when followed will maximize the expected number of points for the entire turn starting from any given turn state.
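The choice of states in the example above can be enumerated mechanically. The following Python sketch is not taken from the Java program described later; it assumes only that a single 1 is worth 100 points and a single 5 is worth 50 points (the values implied by the example), and it ignores three of a kind and other combinations, none of which occur in this roll. The function name `reachable_states` is hypothetical.

```python
def reachable_states(s, roll):
    """Return the set of turn states (s', n') reachable from turn score s after `roll`.

    Assumption: only single 1s (100 points) and single 5s (50 points) score.
    Combinations such as three of a kind are deliberately ignored in this sketch,
    and the free-roll reset (n' = 0 becoming 6) is not applied.
    """
    ones, fives = roll.count(1), roll.count(5)
    states = set()
    for a in range(ones + 1):          # number of 1s set aside
        for b in range(fives + 1):     # number of 5s set aside
            if a + b == 0:
                continue               # at least one scoring die must be taken
            states.add((s + 100 * a + 50 * b, len(roll) - a - b))
    return states


# The second roll of the example, entered from state (s=100, n=5).
states = reachable_states(100, [1, 1, 2, 3, 5])
print(sorted(states))
```

The five states it lists are the candidates a turn play strategy has to rank after this roll.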
Let E(s, n) be the expected number of additional points you will gain for the turn if you (perhaps nonoptimally) roll while in state (s, n) but then follow the optimal turn strategy (which we hope to find) for all subsequent decisions in the turn. Note that E(s, n) includes not just the expected points for the upcoming roll, but all the expected points from all subsequent rolls, if any, as dictated by chance and the optimal play strategy.
Suppose we somehow solve for E(s, n) and find that:
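E(s=200, n=4) = 149
E(s=300, n=3) = 34
E(s=350, n=2) = -20
Applying this information to the example leads to the following final expected scores for the turn:
Roll from B (s=200, n=4): 200 + 149 = 349
Roll from D (s=300, n=3): 300 + 34 = 334
Roll from E (s=350, n=2): 350 - 20 = 330
Bank from E: 350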
So, the choice that leads to the highest expected score for the turn is to bank the 350 points. From this example, it should be clear that if we can find E(s, n) for all possible game states (s, n) we'll have the optimal Zilch turn play strategy.
Let,
$$T(s, n) = \begin{cases} s + \max(0, E(s, n)) & \text{for } s \ge 300 \\ s + E(s, n) & \text{for } s < 300 \end{cases} \tag{1}$$
T(s, n) is simply the total expected points for the turn given that you are in turn state (s, n) and you follow the optimal strategy. The special case for s < 300 models the rule that you can't bank less than 300 points. The max function used when s ≥ 300 models the requirement that you bank when E(s, n) is negative, and roll otherwise.
Suppose we are in some particular state (S, N) then let r_{1}, r_{2}, … r_{R} be all possible rolls of N dice that do not result in a zilch. For any given roll r_{i} you can potentially enter multiple game states (s_{1}, n_{1}), (s_{2}, n_{2}), … (s_{K}, n_{K}) (depending on which combination of scoring dice you choose — just like in the previous example). Define C(r_{i}, S, N) to be the particular scoring combination among all scoring combinations possible with roll r_{i} that when applied to turn state (S, N) will advance the turn to the new state (S_{i}, N_{i}) that maximizes T. What could be simpler? Let C_{S} be the number of points taken in scoring combination C, and let C_{N} be the number of dice used in scoring combination C.
I also need a simple little function F(n) to reset the state variable n back to 6 when a score is selected that uses all remaining dice:
$$F(n) = \begin{cases} 6 & \text{for } n = 0 \\ n & \text{for } n \ne 0 \end{cases} \tag{2}$$
We can now express E(S, N) as a weighted sum of the expected scores of all states reachable from (S, N):
$$E(S, N) = -p_N (S + y) + \sum_i \frac{T(S_i, N_i) - S}{6^N} \tag{3}$$
where
$$\begin{aligned} S_i &= S + C_S(r_i, S, N) \\ N_i &= F(N - C_N(r_i, S, N)) \\ p_N &= \text{probability of zilching when you roll } N \text{ dice} \\ y &= \text{zilch penalty} \end{aligned}$$
To handle the three zilch rule, I've introduced the constant y which gives the additional penalty (beyond loss of all turn points) for rolling a zilch. Setting y to 0 models turns where you have only zero or one consecutive zilches. Setting y to 500 models turns where you are playing with two consecutive zilches. As we shall see, these two cases will lead to two different turn play strategies.
The term -p_{N}(S+y) gives the expected decrease in your score due to the likelihood of a zilch. The terms (T(S_{i}, N_{i}) - S) give the expected increase in your score given that you throw r_{i}. Summing over all possible r_{i} and multiplying by the probability of rolling any particular r_{i}, which is 1/6^{N}, gives the appropriate weighted sum.
Equation 3 expresses E(S, N) in terms of the T values of all the game states reachable from (S, N). But here's the important thing: any game state (s, n) reachable by any roll r from (S, N) has s > S. (Your score can only go up if you don't zilch and by definition r is not a zilching roll.) So, if we already know T(s, n) for all s > S, then we can calculate E(S, N) using the above summation.
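As a sanity check of Equations 1 to 3, the following Python sketch evaluates E(200, 2) for y = 0 from the T values of the states reachable from (200, 2). It is an illustration only, not the Java program described later. It handles just N ≤ 2, where the only scoring dice are single 1s and 5s (assumed to be worth 100 and 50 points), and the five E values in `E_TABLE` are copied from Table 2 further below. The names `E_TABLE` and `expected_gain` are hypothetical.

```python
from itertools import product

# E(s, n) for the states reachable from (200, 2), copied from Table 2 (y = 0).
E_TABLE = {
    (250, 1): 51.455,
    (300, 1): 16.539,
    (300, 6): 581.746,
    (350, 6): 576.985,
    (400, 6): 572.248,
}


def T(s, n):
    """Equation 1: total expected turn score from state (s, n)."""
    e = E_TABLE[(s, n)]
    return s + max(0.0, e) if s >= 300 else s + e


def F(n):
    """Equation 2: using all remaining dice earns a free roll with six dice."""
    return 6 if n == 0 else n


def expected_gain(S, N, y=0):
    """Equation 3 for N <= 2 dice, assuming only single 1s and 5s score."""
    assert N <= 2, "this sketch does not model three of a kind or six-dice rolls"
    zilches = 0
    total = 0.0
    for roll in product(range(1, 7), repeat=N):
        ones, fives = roll.count(1), roll.count(5)
        if ones + fives == 0:
            zilches += 1
            continue
        # C(r, S, N): the scoring choice whose successor state maximizes T.
        best = max(
            T(S + 100 * a + 50 * b, F(N - a - b))
            for a in range(ones + 1)
            for b in range(fives + 1)
            if a + b >= 1
        )
        total += best - S
    rolls = 6 ** N
    return -(zilches / rolls) * (S + y) + total / rolls


print(round(expected_gain(200, 2), 3))
```

Under these assumptions the result agrees with the Table 2 entry for (s=200, n=2) to within rounding, which illustrates the point made above: E(S, N) needs only T values of states with s > S.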
I claim there exists some large value of accumulated turn points S_{BIG} where the optimal turn play strategy is to always bank when faced with rolling fewer than six dice and to always roll when you have six dice to roll. If I set S_{BIG} equal to a million points, then I'm claiming that if you've somehow accumulated a million or more points on the current turn (an absurdly large number of points to be sure) you'll want to bank them if you're ever faced with rolling five (or fewer) dice: the 7% chance of losing all of your points far outweighs any comparatively meager gains you might achieve by continuing to roll. This claim is equivalent to saying:
$$E(s, n) < 0 \quad \text{for } s \ge S_{BIG},\ 1 \le n \le 5 \tag{4}$$
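The zilch probabilities p_{N} behind this claim can be found by brute-force enumeration. The Python sketch below is an illustration, not the author's code. It assumes that a roll of one to five dice scores whenever it contains a 1, a 5, or three or more dice of the same value; six-dice rolls are left out because they involve additional combinations. The function name `zilch_probability` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def zilch_probability(n):
    """Probability that a roll of n dice (1 <= n <= 5) contains no scoring dice.

    Assumption: a roll scores if it has a 1, a 5, or three or more of any value.
    """
    if not 1 <= n <= 5:
        raise ValueError("this sketch only covers 1 to 5 dice")
    zilches = 0
    for roll in product(range(1, 7), repeat=n):
        counts = Counter(roll)
        if counts[1] == 0 and counts[5] == 0 and max(counts.values()) < 3:
            zilches += 1
    return Fraction(zilches, 6 ** n)


for n in range(1, 6):
    p = zilch_probability(n)
    print(n, p, round(float(p), 4))
```

For five dice this gives 600/7776, a little under 8%, which is in line with the roughly 7% risk quoted above.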
Now if you have six dice to roll, you risk nothing so you might as well further insult your opponent by adding to your million point score. The number of points you expect to gain in this situation through the end of your turn is a constant:
$$E(s, n) = E_{BIG6} \quad \text{for } s \ge S_{BIG},\ n = 6 \tag{5}$$
Here's how to solve for E_{BIG6}. Let
E_{B} = the expected number of points gained from a single roll of 6 dice given that the roll does not grant another free roll (so you have to bank).
E_{F} = the expected number of points gained from a single roll of 6 dice given that the roll does grant another free roll.
p_{F} = the probability of a 6 die roll granting another free roll.
These terms are easily calculable by simply enumerating all the six die rolls and determining the best possible scoring combination in each case. (There's a subtlety here I'm not going to bore you with regarding how to score a roll of four 1s and a pair of either 2s, 3s, 4s or 6s; I explain this in detail in the software comments for the interested reader.) Once found they can be used in the following sum:
$$E_{BIG6} = (1 - p_F)\,E_B + p_F \bigl( E_F + (1 - p_F)\,E_B + p_F ( E_F + \cdots ) \bigr) \tag{6}$$
This nicely simplifies to,
$$E_{BIG6} = E_B + \frac{p_F}{1 - p_F}\,E_F \tag{7}$$
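One way to see the simplification (a short derivation sketch, not part of the original text): the expression that follows the first E_{F} inside the parentheses of Equation 6 is the whole series again, so E_{BIG6} satisfies a fixed-point relation that can be solved directly.

```latex
\begin{aligned}
E_{BIG6} &= (1 - p_F)\,E_B + p_F\,(E_F + E_{BIG6}) \\
(1 - p_F)\,E_{BIG6} &= (1 - p_F)\,E_B + p_F\,E_F \\
E_{BIG6} &= E_B + \frac{p_F}{1 - p_F}\,E_F
\end{aligned}
```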
Combining Equations 1, 4 and 5 gives
$$T(s, n) = \begin{cases} s + E_{BIG6} & \text{for } s \ge S_{BIG},\ n = 6 \\ s & \text{for } s \ge S_{BIG},\ 1 \le n \le 5 \end{cases} \tag{8}$$
Knowing T(s, n) for s ≥ S_{BIG}, we can now use Equation 3 to iteratively calculate E(S_{BIG} - 50, n), E(S_{BIG} - 100, n), … E(0, n). The rest is just the grunt work of writing the software to implement the curious function C; solving for E_{BIG6}; and solving for all E(s, n) for s < S_{BIG}. (Did I just slander my own profession?) But before we start grunting let's see what we can do about the three consecutive zilch problem.
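There are actually three different types of turns:
T_{0}: turns played with no consecutive zilches from previous turns
T_{1}: turns played with one consecutive zilch
T_{2}: turns played with two consecutive zilches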
Using the technique already described, we can find the strategies that will maximize the expected points in each of these turns independently, but what we really want is a strategy for each turn that when used together will maximize the average score for all of these turn types when weighted by the frequency of the appearance of the turn type in a game.
If z_{i} is the probability of zilching while in turn type T_{i} (while following some strategy designed specifically for that turn type) then we have the state transition diagram shown in Figure 1.
Performing a steady state analysis of this system we can find the probability t_{i} of being in any particular state T_{i}. (I.e., we want to find what fraction of our turns will be of each type.) We have these flow equations which must balance:
$$\begin{aligned} t_0 &= (1 - z_0)\,t_0 + (1 - z_1)\,t_1 + t_2 \\ t_1 &= z_0\,t_0 \\ t_2 &= z_1\,t_1 \end{aligned}$$
Also
$$t_0 + t_1 + t_2 = 1$$
Solving gives
$$\begin{aligned} t_0 &= \frac{1}{1 + z_0 + z_0 z_1} \\ t_1 &= \frac{z_0}{1 + z_0 + z_0 z_1} \\ t_2 &= \frac{z_0 z_1}{1 + z_0 + z_0 z_1} \end{aligned}$$
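The closed form can be checked numerically against the flow equations. The Python sketch below is an illustration only; the function name `turn_type_fractions` is hypothetical, and the zilch probability 0.193326 is the single-turn value reported in the results further below, used here for both turn types simply to have concrete numbers.

```python
def turn_type_fractions(z0, z1):
    """Steady-state fractions (t0, t1, t2) of turn types T0, T1 and T2."""
    d = 1 + z0 + z0 * z1
    return 1 / d, z0 / d, z0 * z1 / d


# Illustrative input: the same zilch probability for turn types T0 and T1.
z0 = z1 = 0.193326
t0, t1, t2 = turn_type_fractions(z0, z1)

# The closed form should satisfy the normalization and all three flow equations.
assert abs(t0 + t1 + t2 - 1) < 1e-12
assert abs(t0 - ((1 - z0) * t0 + (1 - z1) * t1 + t2)) < 1e-12
assert abs(t1 - z0 * t0) < 1e-12
assert abs(t2 - z1 * t1) < 1e-12
print(round(t0, 4), round(t1, 4), round(t2, 4))
```

With these inputs roughly 81% of turns are of type T_{0}, about 16% of type T_{1}, and about 3% of type T_{2}.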
Define E_{i} to be the expected points gained for a turn of type T_{i}. Then the average score for all turns is:
$$E_{AVG} = t_0 E_0 + t_1 E_1 + t_2 E_2$$
or
$$E_{AVG} = \frac{E_0 + z_0 E_1 + z_0 z_1 E_2}{1 + z_0 + z_0 z_1} \tag{9}$$
E_{AVG} is what we want to maximize. Both E_{i} and z_{i} are just a function of the strategy used to play a turn of type T_{i}. The strategy employed for T_{2} only affects E_{2}, so E_{2} can be independently maximized — something we already know how to do. That leaves the strategies for T_{0} and T_{1}. E_{2} is the term that's pulling down our average score since it's the turn played with the 500 point penalty for zilching. Can we modify our strategies for T_{0} and/or T_{1} in such a way so as to trade off some of our expected gains in those turns to reduce the coefficient z_{0}z_{1} on E_{2} and thereby actually increase E_{AVG}?
In Equation 3, I introduced the variable y to model the penalty for a zilch in a game. I said it should be set to 0 normally, but set to 500 if we are playing a turn where the third consecutive zilch is imminent. If we extend this idea and allow y to become a free variable, we can examine different levels of tradeoff between expected score and the probability of zilching. For each y value, we'll find the optimal strategy given that zilch penalty; and then find both the expected number of points per turn and the probability of zilching on the turn for that strategy. E_{AVG} then becomes a function of just two variables y_{0} and y_{1}. We then need only to find the particular values Y_{0} and Y_{1} that maximize E_{AVG}. Piece of cake!
When doing this analysis, it's important to understand that the penalties y_{0} and y_{1} are artificial. The true zilch penalty for these turns is of course zero. Accordingly, the values calculated for E(s, n) will not represent the true expected change in points for the turn from state (s, n). But the values E(s, n) do still define a strategy, dictating that you roll if E(s, n) is positive, and that you bank when E(s, n) is negative. Likewise, the E(s, n) values are still used in the normal way to determine which state among the reachable states after a roll is most desirable. To get the actual expected increase in score from state (s, n), you must add back the false zilch penalty times the probability of zilching for the remainder of the turn. Although you could calculate this for all states (s, n); we only really need to know the true expectation for the turn as a whole, which we can get by correcting E(0, 6). This gives rise to the notion of a corrected expectation for the turn:
$$E_C = E(0, 6) + y z \tag{10}$$
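As a worked instance of Equation 10, take the y = 72 figures reported in the results further below: the uncorrected value E(0, 6) = 609.958 in the last row of Table 3 and the zilch probability z = 0.170988. Adding back the artificial penalty recovers the corrected expectation quoted for that strategy:

```latex
E_C = 609.958 + 72 \times 0.170988 \approx 609.958 + 12.311 = 622.269
```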
Enough analysis! On to the results! Now I am become death, destroyer of Zilch.
I wrote a little Java program that solves for E_{BIG6}; finds E(s,n) for a supplied zilch penalty, y; for that strategy, calculates the probability of zilching, z; and also outputs the corrected expected points for the turn, E_{C}. Running the software for the case y=0 we get:
E_{BIG6} =  478.237  
E_{C} = E(0, 6) =  623.017  
z =  .193326 
So the best you can do for a single turn is to rack up an average of about 623 points, and zilch about 1 time in 5. I'll get to the actual strategy tables shortly, but first let's solve for the optimal strategies required for the three zilch rule for turn types T_{0}, T_{1} and T_{2}.
Finding the best strategy for T_{2} is easy: just set y=500 and you get these results:
E_{2} = E(0, 6) =  547.157  
z_{2} =  .132148 
You don't use E_{C} here since the 500 point zilch penalty is not artificial but real. This penalty reduces the maximum expected points per turn by about 12%. The more conservative play required here also reduces the zilch probability by about a third.
To find the best strategies for T_{0} and T_{1} we need to let the zilch penalty for those two turn types (y_{0} and y_{1}) vary and then maximize E_{AVG} as given by Equation 9. Table 1 shows how varying the penalty for zilching (y) affects the probability of zilching (z) and the corrected expected points per turn (E_{C}). Due to the integral nature of the problem, there are fairly large ranges of y that have no effect on the strategy. I'm only listing y values among those tried that produced a strategy change:
| y | z | E_{C} |
|---|---|---|
| 0 | 0.193326 | 623.017489 |
| 10 | 0.193326 | 623.017488 |
| 15 | 0.193302 | 623.017141 |
| 17 | 0.193296 | 623.017049 |
| 22 | 0.190399 | 622.955542 |
| 24 | 0.182110 | 622.759187 |
| 26 | 0.178151 | 622.657753 |
| 27 | 0.177759 | 622.647306 |
| 30 | 0.177757 | 622.647238 |
| 38 | 0.177723 | 622.645977 |
| 42 | 0.177619 | 622.641662 |
| 44 | 0.174575 | 622.509618 |
| 65 | 0.174551 | 622.508057 |
| 67 | 0.174543 | 622.507569 |
| 68 | 0.170991 | 622.268940 |
| 72 | 0.170988 | 622.268745 |
| 77 | 0.170631 | 622.241338 |
| 80 | 0.170620 | 622.240487 |
| 88 | 0.170484 | 622.228569 |
| 92 | 0.170389 | 622.219825 |
| 115 | 0.170387 | 622.219696 |
| 117 | 0.170383 | 622.219130 |
| 122 | 0.157678 | 620.678963 |
| 127 | 0.157507 | 620.657322 |
| 130 | 0.157498 | 620.656168 |
| 138 | 0.157427 | 620.646349 |
| 142 | 0.157364 | 620.637441 |
| 165 | 0.157362 | 620.637123 |
| 167 | 0.157357 | 620.636297 |
| 172 | 0.157356 | 620.636150 |
| 177 | 0.157286 | 620.623706 |
| 180 | 0.157271 | 620.620945 |
| 188 | 0.157239 | 620.615023 |
| 192 | 0.157171 | 620.602036 |
| 203 | 0.144131 | 617.962533 |
| 215 | 0.144129 | 617.962108 |
| 217 | 0.144120 | 617.960165 |
| 222 | 0.143469 | 617.816023 |
| 227 | 0.143448 | 617.811242 |
| 230 | 0.143424 | 617.805798 |
| 231 | 0.142245 | 617.534058 |
| 235 | 0.140672 | 617.165031 |
| 238 | 0.140661 | 617.162252 |
| 242 | 0.140573 | 617.141121 |
| 265 | 0.140572 | 617.140733 |
| 267 | 0.140558 | 617.137106 |
| 272 | 0.140556 | 617.136678 |
| 277 | 0.140553 | 617.135694 |
| 280 | 0.140521 | 617.126870 |
| 288 | 0.140519 | 617.126208 |
| 292 | 0.140413 | 617.095273 |
| 315 | 0.140411 | 617.094663 |
| 317 | 0.140401 | 617.091542 |
| 322 | 0.140391 | 617.088225 |
| 327 | 0.140390 | 617.088121 |
| 330 | 0.140376 | 617.083493 |
| 338 | 0.140376 | 617.083407 |
| 340 | 0.140343 | 617.072035 |
| 342 | 0.140031 | 616.965512 |
| 365 | 0.140029 | 616.964776 |
| 367 | 0.139995 | 616.952543 |
| 372 | 0.139981 | 616.947224 |
| 380 | 0.139967 | 616.942009 |
| 392 | 0.139576 | 616.788919 |
| 415 | 0.139575 | 616.788296 |
| 417 | 0.139555 | 616.780056 |
| 422 | 0.139544 | 616.775513 |
| 430 | 0.139522 | 616.765914 |
| 442 | 0.139326 | 616.679483 |
| 465 | 0.139325 | 616.678925 |
| 467 | 0.139318 | 616.675611 |
| 472 | 0.139308 | 616.670956 |
| 480 | 0.139279 | 616.657114 |
| 481 | 0.132230 | 613.270797 |
| 492 | 0.132148 | 613.230640 |
Pumping this table through a little awk script (which I hacked out at a command prompt and didn't save for you), I found that E_{AVG} is maximized when Y_{0} = 0 and Y_{1} = 72. Here are the summary statistics:
y_{0} =  0  
z_{0} =  .193326  
E_{0} =  623.017  
y_{1} =  72  
z_{1} =  .170988  
E_{1} =  622.269  
y_{2} =  500  
z_{2} =  .132148  
E_{2} =  547.157 
E_{AVG} = 620.855 
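Since the original awk script was not saved, here is a small Python stand-in for the search it performed. It is a sketch under stated assumptions rather than a reconstruction of that script: only six rows of Table 1 are included to keep it short (a full search would use every row), E_{2} = 547.157 is taken from the y = 500 results above, and the names `ROWS` and `e_avg` are hypothetical.

```python
# (y, z, E_C) rows copied from Table 1; a full search would include every row.
ROWS = [
    (0, 0.193326, 623.017489),
    (22, 0.190399, 622.955542),
    (72, 0.170988, 622.268745),
    (122, 0.157678, 620.678963),
    (203, 0.144131, 617.962533),
    (492, 0.132148, 613.230640),
]
E2 = 547.157  # expected points for turn type T2 (y = 500), from the text


def e_avg(z0, e0, z1, e1, e2=E2):
    """Equation 9: average points per turn across the three turn types."""
    return (e0 + z0 * e1 + z0 * z1 * e2) / (1 + z0 + z0 * z1)


# Try every pairing of a T0 strategy (y0) with a T1 strategy (y1).
best = max(
    (e_avg(z0, e0, z1, e1), y0, y1)
    for y0, z0, e0 in ROWS
    for y1, z1, e1 in ROWS
)
print(best)
```

Among these rows the best pairing is y_{0} = 0 with y_{1} = 72, and the resulting average agrees with the E_{AVG} reported above to within rounding.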
For turn type T_{0}, you're best off just going for the maximum expected points possible: trying to play more conservatively doesn't reduce your zilch probability (or the probability of entering state T_{2}) enough to offset the corresponding loss in expected points for turns of type T_{0}.
For turn type T_{1} (when you've got one consecutive zilch) you're best off pretending you will be penalized an extra 72 points if you zilch. This reduces your expected score for the turn by only about 0.1% (from 623.017 to 622.269) but reduces your probability of zilching by about 12% (from .193326 to .170988). This little extra protection against your third consecutive zilch slightly increases your overall average turn scores.
Let's move on to the actual strategies.
Table 2 below gives E(s, n) for all s ≤ 3200 for the case y = 0. This is the strategy achieving the maximum expected points for a turn and is the best strategy to use if you didn't zilch on your previous turn. The first table entry is E_{BIG6} = 478.237. The last table entry gives the total expected points for the turn: E_{0} = 623.017. The probability of zilching for the entire turn (not shown in the table) is z_{0} = .193326.
| s | n=6 | n=5 | n=4 | n=3 | n=2 | n=1 |
|---|---|---|---|---|---|---|
| 3200 | 478.237 | -6.608 | -340.997 | -775.515 | -1319.085 | -1948.921 |
| 3150 | 478.237 | -2.750 | -333.126 | -761.626 | -1296.863 | -1915.588 |
| 3100 | 478.237 | 1.108 | -325.256 | -747.737 | -1274.640 | -1882.254 |
| 3050 | 478.323 | 4.966 | -317.386 | -733.848 | -1252.418 | -1848.921 |
| 3000 | 478.706 | 8.824 | -309.515 | -719.959 | -1230.196 | -1815.573 |
| 2950 | 479.301 | 12.682 | -301.645 | -706.070 | -1207.971 | -1782.162 |
| 2900 | 479.897 | 16.540 | -293.774 | -692.181 | -1185.734 | -1748.665 |
| 2850 | 480.492 | 20.398 | -285.904 | -678.291 | -1163.471 | -1715.134 |
| 2800 | 481.088 | 24.256 | -278.033 | -664.394 | -1141.189 | -1681.602 |
| 2750 | 481.683 | 28.114 | -270.161 | -650.488 | -1118.900 | -1648.070 |
| 2700 | 482.278 | 31.973 | -262.286 | -636.578 | -1096.612 | -1614.538 |
| 2650 | 482.874 | 35.833 | -254.407 | -622.667 | -1074.324 | -1581.006 |
| 2600 | 483.470 | 39.694 | -246.527 | -608.754 | -1052.035 | -1547.475 |
| 2550 | 484.069 | 43.558 | -238.645 | -594.840 | -1029.747 | -1513.943 |
| 2500 | 484.677 | 47.423 | -230.761 | -580.924 | -1007.458 | -1480.410 |
| 2450 | 485.290 | 51.290 | -222.876 | -567.008 | -985.170 | -1446.876 |
| 2400 | 485.975 | 55.157 | -214.989 | -553.089 | -962.881 | -1413.339 |
| 2350 | 486.949 | 59.026 | -207.101 | -539.170 | -940.591 | -1379.789 |
| 2300 | 488.222 | 62.896 | -199.211 | -525.251 | -918.299 | -1346.179 |
| 2250 | 489.496 | 66.767 | -191.320 | -511.331 | -895.995 | -1312.472 |
| 2200 | 490.771 | 70.640 | -183.429 | -497.410 | -873.664 | -1278.714 |
| 2150 | 492.048 | 74.513 | -175.538 | -483.482 | -851.309 | -1244.955 |
| 2100 | 493.326 | 78.386 | -167.645 | -469.545 | -828.945 | -1211.197 |
| 2050 | 494.604 | 82.260 | -159.748 | -455.601 | -806.581 | -1177.438 |
| 2000 | 495.884 | 86.136 | -151.848 | -441.655 | -784.217 | -1143.678 |
| 1950 | 497.164 | 90.013 | -143.944 | -427.706 | -761.853 | -1109.919 |
| 1900 | 498.448 | 93.894 | -136.037 | -413.755 | -739.488 | -1076.159 |
| 1850 | 499.740 | 97.776 | -128.128 | -399.802 | -717.124 | -1042.398 |
| 1800 | 501.041 | 101.661 | -120.218 | -385.848 | -694.759 | -1008.635 |
| 1750 | 502.344 | 105.547 | -112.306 | -371.893 | -672.394 | -974.870 |
| 1700 | 503.867 | 109.434 | -104.392 | -357.936 | -650.029 | -941.102 |
| 1650 | 505.684 | 113.323 | -96.476 | -343.979 | -627.662 | -907.298 |
| 1600 | 507.502 | 117.213 | -88.558 | -330.022 | -605.289 | -873.408 |
| 1550 | 509.327 | 121.105 | -80.641 | -316.064 | -582.895 | -839.469 |
| 1500 | 511.172 | 124.998 | -72.723 | -302.102 | -560.480 | -805.529 |
| 1450 | 513.033 | 128.891 | -64.805 | -288.132 | -538.055 | -771.583 |
| 1400 | 514.894 | 132.784 | -56.883 | -274.156 | -515.630 | -737.633 |
| 1350 | 516.756 | 136.679 | -48.958 | -260.177 | -493.203 | -703.679 |
| 1300 | 518.619 | 140.575 | -41.030 | -246.196 | -470.774 | -669.725 |
| 1250 | 520.484 | 144.474 | -33.100 | -232.212 | -448.345 | -635.771 |
| 1200 | 522.356 | 148.374 | -25.167 | -218.227 | -425.916 | -601.816 |
| 1150 | 524.236 | 152.277 | -17.233 | -204.240 | -403.487 | -567.860 |
| 1100 | 526.119 | 156.180 | -9.297 | -190.252 | -381.057 | -533.901 |
| 1050 | 528.180 | 160.085 | -1.360 | -176.263 | -358.627 | -499.941 |
| 1000 | 530.368 | 163.992 | 6.579 | -162.273 | -336.196 | -465.950 |
| 950 | 532.560 | 168.763 | 14.520 | -148.283 | -313.760 | -431.909 |
| 900 | 534.870 | 174.577 | 22.461 | -134.293 | -291.310 | -397.845 |
| 850 | 537.684 | 180.570 | 30.403 | -120.299 | -268.848 | -363.762 |
| 800 | 540.959 | 186.564 | 38.345 | -106.301 | -246.379 | -329.574 |
| 750 | 544.307 | 192.559 | 46.289 | -92.299 | -223.889 | -295.226 |
| 700 | 547.655 | 198.555 | 54.235 | -78.294 | -201.356 | -260.789 |
| 650 | 551.006 | 204.879 | 62.183 | -64.276 | -178.780 | -226.340 |
| 600 | 554.457 | 212.146 | 70.136 | -50.240 | -156.188 | -191.890 |
| 550 | 558.365 | 219.989 | 78.096 | -36.194 | -133.594 | -157.423 |
| 500 | 562.820 | 227.838 | 86.062 | -22.143 | -110.997 | -122.863 |
| 450 | 567.530 | 235.694 | 94.033 | -8.089 | -88.381 | -88.136 |
| 400 | 572.248 | 243.557 | 102.010 | 5.970 | -65.722 | -53.275 |
| 350 | 576.985 | 251.428 |  |  | -43.013 | -18.370 |
| 300 | 581.746 |  |  | 34.134 | -20.274 | 16.539 |
| 250 |  |  |  | 48.243 | 6.148 | 51.455 |
| 200 |  |  | 149.232 | 64.645 | 40.331 |  |
| 150 |  |  | 163.981 | 91.507 |  |  |
| 100 |  | 306.667 | 184.939 |  |  |  |
| 50 |  | 322.318 |  |  |  |  |
| 0 | 623.017 |  |  |  |  |  |
Using this table, you can easily figure out what to do in any turn play situation. Consider these examples.
Table 3 below gives E(s, n) for all s ≤ 3200 for the case y = 72. This is the optimal strategy for turns of type T_{1} (when you're playing with one consecutive zilch). The corrected expected points for the turn is: E_{1} = 622.269. The probability of zilching for the entire turn is z_{1} = .170988.
| s | n=6 | n=5 | n=4 | n=3 | n=2 | n=1 |
|---|---|---|---|---|---|---|
| 3200 | 478.237 | -12.164 | -352.330 | -795.515 | -1351.085 | -1996.921 |
| 3150 | 478.237 | -8.306 | -344.460 | -781.626 | -1328.863 | -1963.588 |
| 3100 | 478.237 | -4.448 | -336.589 | -767.737 | -1306.640 | -1930.254 |
| 3050 | 478.237 | -0.590 | -328.719 | -753.848 | -1284.418 | -1896.921 |
| 3000 | 478.237 | 3.268 | -320.848 | -739.959 | -1262.196 | -1863.588 |
| 2950 | 478.490 | 7.126 | -312.978 | -726.070 | -1239.974 | -1830.254 |
| 2900 | 479.039 | 10.984 | -305.108 | -712.181 | -1217.751 | -1796.879 |
| 2850 | 479.635 | 14.842 | -297.237 | -698.292 | -1195.522 | -1763.412 |
| 2800 | 480.230 | 18.700 | -289.367 | -684.403 | -1173.271 | -1729.888 |
| 2750 | 480.826 | 22.558 | -281.497 | -670.510 | -1150.994 | -1696.356 |
| 2700 | 481.421 | 26.416 | -273.625 | -656.607 | -1128.707 | -1662.824 |
| 2650 | 482.016 | 30.275 | -265.751 | -642.699 | -1106.419 | -1629.292 |
| 2600 | 482.612 | 34.134 | -257.874 | -628.788 | -1084.130 | -1595.760 |
| 2550 | 483.208 | 37.995 | -249.995 | -614.876 | -1061.842 | -1562.229 |
| 2500 | 483.804 | 41.858 | -242.113 | -600.962 | -1039.554 | -1528.697 |
| 2450 | 484.408 | 45.722 | -234.230 | -587.047 | -1017.265 | -1495.165 |
| 2400 | 485.020 | 49.588 | -226.345 | -573.131 | -994.977 | -1461.631 |
| 2350 | 485.634 | 53.456 | -218.460 | -559.214 | -972.688 | -1428.095 |
| 2300 | 486.436 | 57.324 | -210.572 | -545.295 | -950.399 | -1394.558 |
| 2250 | 487.662 | 61.193 | -202.683 | -531.375 | -928.109 | -1360.988 |
| 2200 | 488.935 | 65.064 | -194.792 | -517.456 | -905.813 | -1327.317 |
| 2150 | 490.210 | 68.936 | -186.901 | -503.536 | -883.495 | -1293.567 |
| 2100 | 491.486 | 72.808 | -179.010 | -489.612 | -861.147 | -1259.809 |
| 2050 | 492.764 | 76.682 | -171.118 | -475.678 | -838.785 | -1226.051 |
| 2000 | 494.042 | 80.555 | -163.224 | -461.737 | -816.421 | -1192.292 |
| 1950 | 495.321 | 84.430 | -155.325 | -447.791 | -794.057 | -1158.532 |
| 1900 | 496.600 | 88.307 | -147.422 | -433.843 | -771.693 | -1124.773 |
| 1850 | 497.882 | 92.186 | -139.516 | -419.893 | -749.329 | -1091.013 |
| 1800 | 499.169 | 96.067 | -131.608 | -405.942 | -726.964 | -1057.253 |
| 1750 | 500.468 | 99.951 | -123.698 | -391.988 | -704.600 | -1023.492 |
| 1700 | 501.770 | 103.837 | -115.788 | -378.034 | -682.235 | -989.727 |
| 1650 | 503.075 | 107.723 | -107.874 | -364.077 | -659.870 | -955.960 |
| 1600 | 504.884 | 111.611 | -99.959 | -350.120 | -637.503 | -922.192 |
| 1550 | 506.702 | 115.501 | -92.042 | -336.163 | -615.137 | -888.340 |
| 1500 | 508.520 | 119.393 | -84.125 | -322.206 | -592.755 | -854.402 |
| 1450 | 510.357 | 123.285 | -76.207 | -308.248 | -570.346 | -820.463 |
| 1400 | 512.214 | 127.178 | -68.289 | -294.281 | -547.922 | -786.520 |
| 1350 | 514.075 | 131.071 | -60.370 | -280.306 | -525.497 | -752.572 |
| 1300 | 515.936 | 134.965 | -52.446 | -266.328 | -503.071 | -718.619 |
| 1250 | 517.799 | 138.861 | -44.519 | -252.348 | -480.643 | -684.665 |
| 1200 | 519.663 | 142.758 | -36.590 | -238.365 | -458.214 | -650.711 |
| 1150 | 521.529 | 146.658 | -28.658 | -224.381 | -435.785 | -616.756 |
| 1100 | 523.408 | 150.559 | -20.724 | -210.394 | -413.356 | -582.801 |
| 1050 | 525.290 | 154.463 | -12.789 | -196.408 | -390.926 | -548.844 |
| 1000 | 527.217 | 158.367 | -4.853 | -182.418 | -368.496 | -514.884 |
| 950 | 529.405 | 162.273 | 3.086 | -168.429 | -346.066 | -480.915 |
| 900 | 531.595 | 166.585 | 11.026 | -154.439 | -323.633 | -446.896 |
| 850 | 533.840 | 171.940 | 18.967 | -140.449 | -301.191 | -412.833 |
| 800 | 536.398 | 177.933 | 26.908 | -126.458 | -278.733 | -378.761 |
| 750 | 539.486 | 183.927 | 34.850 | -112.461 | -256.266 | -344.627 |
| 700 | 542.833 | 189.921 | 42.793 | -98.460 | -233.787 | -310.353 |
| 650 | 546.182 | 195.917 | 50.738 | -84.457 | -211.274 | -275.947 |
| 600 | 549.531 | 201.970 | 58.685 | -70.445 | -188.717 | -241.498 |
| 550 | 552.913 | 208.696 | 66.637 | -56.417 | -166.130 | -207.048 |
| 500 | 556.531 | 216.537 | 74.593 | -42.375 | -143.535 | -172.593 |
| 450 | 560.750 | 224.384 | 82.556 | -28.326 | -120.940 | -138.093 |
| 400 | 565.457 | 232.237 | 90.525 | -14.273 | -98.336 | -103.453 |
| 350 | 570.171 | 240.097 |  |  | -75.702 | -68.632 |
| 300 | 574.899 |  |  | 13.848 | -53.014 | -33.729 |
| 250 |  |  |  | 27.930 | -30.282 | 1.178 |
| 200 |  |  | 130.189 | 35.303 | -7.274 |  |
| 150 |  |  | 141.074 | 47.867 |  |  |
| 100 |  | 288.116 | 151.530 |  |  |  |
| 50 |  | 298.518 |  |  |  |  |
| 0 | 609.958 |  |  |  |  |  |
Table 4 below gives E(s, n) for all s ≤ 3200 for the case y = 500. This is the optimal strategy for turns of type T_{2} (when you're playing with two consecutive zilches). The expected points for the turn is: E_{2} = 547.157. The probability of zilching for the entire turn is z_{2} = .132148.
| s | n=6 | n=5 | n=4 | n=3 | n=2 | n=1 |
|---|---|---|---|---|---|---|
| 3200 | 478.237 | -45.189 | -419.700 | -914.403 | -1541.307 | -2282.254 |
| 3150 | 478.237 | -41.331 | -411.830 | -900.515 | -1519.085 | -2248.921 |
| 3100 | 478.237 | -37.472 | -403.960 | -886.626 | -1496.863 | -2215.588 |
| 3050 | 478.237 | -33.614 | -396.089 | -872.737 | -1474.640 | -2182.254 |
| 3000 | 478.237 | -29.756 | -388.219 | -858.848 | -1452.418 | -2148.921 |
| 2950 | 478.237 | -25.898 | -380.348 | -844.959 | -1430.196 | -2115.588 |
| 2900 | 478.237 | -22.040 | -372.478 | -831.070 | -1407.974 | -2082.254 |
| 2850 | 478.237 | -18.182 | -364.608 | -817.181 | -1385.751 | -2048.921 |
| 2800 | 478.237 | -14.324 | -356.737 | -803.292 | -1363.529 | -2015.588 |
| 2750 | 478.237 | -10.466 | -348.867 | -789.403 | -1341.307 | -1982.254 |
| 2700 | 478.237 | -6.608 | -340.997 | -775.515 | -1319.085 | -1948.921 |
| 2650 | 478.237 | -2.750 | -333.126 | -761.626 | -1296.863 | -1915.588 |
| 2600 | 478.237 | 1.108 | -325.256 | -747.737 | -1274.640 | -1882.254 |
| 2550 | 478.323 | 4.966 | -317.386 | -733.848 | -1252.418 | -1848.921 |
| 2500 | 478.706 | 8.824 | -309.515 | -719.959 | -1230.196 | -1815.573 |
| 2450 | 479.301 | 12.682 | -301.645 | -706.070 | -1207.971 | -1782.162 |
| 2400 | 479.897 | 16.540 | -293.774 | -692.181 | -1185.734 | -1748.665 |
| 2350 | 480.492 | 20.398 | -285.904 | -678.291 | -1163.471 | -1715.134 |
| 2300 | 481.088 | 24.256 | -278.033 | -664.394 | -1141.189 | -1681.602 |
| 2250 | 481.683 | 28.114 | -270.161 | -650.488 | -1118.900 | -1648.070 |
| 2200 | 482.278 | 31.973 | -262.286 | -636.578 | -1096.612 | -1614.538 |
| 2150 | 482.874 | 35.833 | -254.407 | -622.667 | -1074.324 | -1581.006 |
| 2100 | 483.470 | 39.694 | -246.527 | -608.754 | -1052.035 | -1547.475 |
| 2050 | 484.069 | 43.558 | -238.645 | -594.840 | -1029.747 | -1513.943 |
| 2000 | 484.677 | 47.423 | -230.761 | -580.924 | -1007.458 | -1480.410 |
| 1950 | 485.290 | 51.290 | -222.876 | -567.008 | -985.170 | -1446.876 |
| 1900 | 485.975 | 55.157 | -214.989 | -553.089 | -962.881 | -1413.339 |
| 1850 | 486.949 | 59.026 | -207.101 | -539.170 | -940.591 | -1379.789 |
| 1800 | 488.222 | 62.896 | -199.211 | -525.251 | -918.299 | -1346.179 |
| 1750 | 489.496 | 66.767 | -191.320 | -511.331 | -895.995 | -1312.472 |
| 1700 | 490.771 | 70.640 | -183.429 | -497.410 | -873.664 | -1278.714 |
| 1650 | 492.048 | 74.513 | -175.538 | -483.482 | -851.309 | -1244.955 |
| 1600 | 493.326 | 78.386 | -167.645 | -469.545 | -828.945 | -1211.197 |
| 1550 | 494.604 | 82.260 | -159.748 | -455.601 | -806.581 | -1177.438 |
| 1500 | 495.884 | 86.136 | -151.848 | -441.655 | -784.217 | -1143.678 |
| 1450 | 497.164 | 90.013 | -143.944 | -427.706 | -761.853 | -1109.919 |
| 1400 | 498.448 | 93.894 | -136.037 | -413.755 | -739.488 | -1076.159 |
| 1350 | 499.740 | 97.776 | -128.128 | -399.802 | -717.124 | -1042.398 |
| 1300 | 501.041 | 101.661 | -120.218 | -385.848 | -694.759 | -1008.635 |
| 1250 | 502.344 | 105.547 | -112.306 | -371.893 | -672.394 | -974.870 |
| 1200 | 503.867 | 109.434 | -104.392 | -357.936 | -650.029 | -941.102 |
| 1150 | 505.684 | 113.323 | -96.476 | -343.979 | -627.662 | -907.298 |
| 1100 | 507.502 | 117.213 | -88.558 | -330.022 | -605.289 | -873.408 |
| 1050 | 509.327 | 121.105 | -80.641 | -316.064 | -582.895 | -839.469 |
| 1000 | 511.172 | 124.998 | -72.723 | -302.102 | -560.480 | -805.529 |
| 950 | 513.033 | 128.891 | -64.805 | -288.132 | -538.055 | -771.583 |
| 900 | 514.894 | 132.784 | -56.883 | -274.156 | -515.630 | -737.633 |
| 850 | 516.756 | 136.679 | -48.958 | -260.177 | -493.203 | -703.679 |
| 800 | 518.619 | 140.575 | -41.030 | -246.196 | -470.774 | -669.725 |
| 750 | 520.484 | 144.474 | -33.100 | -232.212 | -448.345 | -635.771 |
| 700 | 522.356 | 148.374 | -25.167 | -218.227 | -425.916 | -601.816 |
| 650 | 524.236 | 152.277 | -17.233 | -204.240 | -403.487 | -567.860 |
| 600 | 526.119 | 156.180 | -9.297 | -190.252 | -381.057 | -533.901 |
| 550 | 528.180 | 160.085 | -1.360 | -176.263 | -358.627 | -499.941 |
| 500 | 530.368 | 163.992 | 6.579 | -162.273 | -336.196 | -465.950 |
| 450 | 532.560 | 168.763 | 14.520 | -148.283 | -313.760 | -431.909 |
| 400 | 534.870 | 174.577 | 22.461 | -134.293 | -291.310 | -397.845 |
| 350 | 537.684 | 180.570 |  |  | -268.848 | -363.762 |
| 300 | 540.959 |  |  | -106.301 | -246.379 | -329.574 |
| 250 |  |  |  | -92.299 | -223.889 | -295.226 |
| 200 |  |  | 37.142 | -128.047 | -266.961 |  |
| 150 |  |  | 8.190 | -189.755 |  |  |
| 100 |  | 196.017 | -32.853 |  |  |  |
| 50 |  | 167.775 |  |  |  |  |
| 0 | 547.157 |  |  |  |  |  |
Comparing Table 4 with Table 2 you can see that playing with two consecutive zilches is almost identical to playing without any consecutive zilches while pretending you have 500 more points for the turn than you really do. To see this, compare the 500 point line in Table 4 with the 1000 point line in Table 2. They are identical. This remains true until you get down to point values below 300, at which point the 300 point minimum bank rule forces you to roll even though rolling gives a negative expected change in your score.
The software I wrote to find optimized Zilch strategies is 718 lines of java code (or 322 lines comments stripped). Please observe the GNU public license copyright protection or I may have to introduce you to my friend Guido. You can download with either zip or tgz compression as convenient:
Downloads: zilch.zip OR zilch.tgz
The compile command is simply:
javac Zilch.java
Then to run it type:
java Zilch
You can optionally add a zilch penalty to the command line. For example, to run the program with y = 500 type:
java Zilch 500
To find the best strategy for variations of the game that use different scoring rules, just change the scoring constants at the top of the file. If you set a score to zero, that scoring combination is effectively eliminated from the game and is instead treated as a zilch. So if, in your Farkle variant, the six-die nothing roll is just a zilch, you need only set NOTHING_SCORE = 0. The software will then interpret this as a zilching roll.
E_{BIG6} is calculated for you. If you set NOTHING_SCORE to 0 (giving you a nonzero chance of zilching on a 6 die roll), then E_{BIG6} will be correctly initialized to 0. There's a chicken-and-egg problem associated with the calculation of E_{BIG6}, which required a bit of finger work to resolve reliably in the face of various possible scoring changes. Check out the comments for method initEBIG6 if you're interested.
The smallest valid value for S_{BIG} is also determined through a binary search, so you need not worry about changing that for different scoring options.
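As an illustration of the kind of search involved, here is a minimal sketch of a binary search for the smallest value that passes a validity test. It is not taken from Zilch.java: the class name, the method smallestValid, and the stand-in predicate are all hypothetical, and the sketch assumes validity is monotone (once a value is valid, every larger value is valid too).

```java
import java.util.function.IntPredicate;

final class SmallestValidSearch {
    /**
     * Returns the smallest v in [lo, hi] for which valid.test(v) is true.
     * Assumes validity is monotone and that valid.test(hi) is true.
     */
    static int smallestValid(int lo, int hi, IntPredicate valid) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            if (valid.test(mid)) {
                hi = mid;                   // mid works: answer is mid or smaller
            } else {
                lo = mid + 1;               // mid fails: answer is strictly larger
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real validity check on S_BIG:
        // here any threshold of 12,350 or more counts as valid.
        int sBig = smallestValid(0, 1_000_000, v -> v >= 12_350);
        System.out.println(sBig);
    }
}
```

The appeal of this approach is that the number of validity checks grows only with the logarithm of the search range, so an expensive check can still be run against a generous upper bound.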
On my 10-year-old home computer, the strategy for Zilch is determined in about 7 seconds. Farkle strategies take about 30 seconds. (Farkle is much slower because S_{BIG} has to be set big enough that the chance of a 6 die farkle outweighs the potential gains of a 6 die roll.) No doubt you could solve these same problems in a tiny fraction of a second with appropriate optimizations, but I personally don't have a need for better performance.
This was a fun problem. The trick is to work the problem backwards: finding the expected scores for high point states first, and then working your way back down to lower and lower scores until finally you get the expected score from the starting state. Everything else is just details (which hopefully I've gotten correct).
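To make the backwards approach concrete, here is a minimal sketch for a much simpler toy game, not Zilch itself. In this assumed game you roll one die per step: a 1 ends the turn with nothing, and any other face adds its value to your turn score, which you may bank at any time. The class and method names, the toy rules, and the cap parameter (a turn score at which banking is forced, loosely playing the role of S_{BIG}) are all hypothetical.

```java
final class BackwardInduction {
    /**
     * Returns e, where e[s] is the expected final turn score when the
     * current turn score is s and play is optimal from there on.
     * States at or above cap are treated as "always bank".
     */
    static double[] expectedScores(int cap) {
        double[] e = new double[cap + 6];   // rolling from cap - 1 can reach cap + 5

        // High point states first: at or above the cap you simply bank.
        for (int s = cap; s < e.length; s++) {
            e[s] = s;
        }

        // Work back down: each state depends only on higher states,
        // which have already been solved.
        for (int s = cap - 1; s >= 0; s--) {
            double roll = 0.0;              // a 1 contributes 0 (turn score lost)
            for (int face = 2; face <= 6; face++) {
                roll += e[s + face] / 6.0;
            }
            e[s] = Math.max(s, roll);       // bank (keep s) or roll, whichever is better
        }
        return e;
    }

    public static void main(String[] args) {
        double[] e = expectedScores(100);
        System.out.println("expected turn score from 0: " + e[0]);
    }
}
```

In this toy game rolling only pays while the turn score is below 20, which the table of e values confirms: e[s] exceeds s below 20 and equals s from 20 upward. The real problem has a far larger state space (dice remaining, consecutive zilches, and so on), but the order of evaluation is the same.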
One thing I found surprising about the results is just how incredibly insensitive the corrected expected turn score is to the zilch penalty. The optimal strategy for the case of an infinite zilch penalty drops the probability of zilching from 0.193326 down to 0.126959 (the minimum zilch probability you can achieve for a turn). Playing that same strategy on a turn where the zilch penalty is actually zero drops your expected score from 623.017 down to 605.851: it only costs you 17 points per turn! That's less than 300 points over the course of a typical game, and that's nothing in a game of Zilch. I think this is true because almost all the big points in Zilch come from 6 die rolls where there's no chance to zilch. So, playing to reach 300 points as reliably as you can and then banking as soon as you face a roll of less than six dice reduces your expected scores very little compared to playing for maximum expected points. I found this very surprising and somehow unsatisfying.
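A quick check of that arithmetic, as a sketch. The two expected turn scores are the figures quoted above; the 10,000 point target is an assumption about what a "typical game" means here, and estimating the number of turns as the target divided by the expected turn score is only a rough approximation. The class and method names are hypothetical.

```java
final class ZilchPenaltyCost {
    /** Expected points given up per turn by playing the safer strategy. */
    static double costPerTurn(double bestScore, double safeScore) {
        return bestScore - safeScore;
    }

    /** Rough total cost over a game: estimated turns times cost per turn. */
    static double costPerGame(double target, double bestScore, double safeScore) {
        double turns = target / safeScore;  // crude estimate of turns needed
        return turns * costPerTurn(bestScore, safeScore);
    }

    public static void main(String[] args) {
        double best = 623.017;    // expected turn score, optimal play (from the text)
        double safe = 605.851;    // expected turn score, minimum-zilch play (from the text)
        double target = 10_000;   // ASSUMPTION: points needed to win a game
        System.out.printf("cost per turn: %.3f%n", costPerTurn(best, safe));
        System.out.printf("cost per game: %.1f%n", costPerGame(target, best, safe));
    }
}
```

Under these assumptions the per-game cost comes out below 300 points, in line with the estimate above.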
I'd be pleased to know if you found this document comprehensible, or if you found any errors in the analysis or the software. Leave a comment or send me an email. If you're lucky, you might even meet me masquerading as pips in a game of Zilch. Just don't expect to win.