Sunday, 6 May 2007

Matchpeg Sudoku

We're often asked, "Why does Matchpeg have a Sudoku solver?" And also, though not usually by the same people, "How does it work?"

Important information for non-Brits: it is compulsory for every British newspaper to print at least one Sudoku puzzle daily. The freesheets for London commuters typically have three (which gives you an insight into the average journey time of a commuter into London, and the fact that these freesheets are only distantly related to "newspapers", and have rather more in common with the puzzle compendiums which you can buy in any airport before undertaking a seventeen hour flight in steerage).

More useful background information: British newspapers are well known for their bouts of collective mania. Their peerless executives reach a more-or-less simultaneous decision to fill them with spot-the-ball competitions, bingo, vouchers for free flights and meals in restaurants, free DVDs. Sudoku was the last-but-one or -two of these crazes.

The creation of Matcheg Sudoku was prompted by The Guardian printing one of these witless puzzles on every single page of its G2 supplement. This spoiled a perfectly nice breakfast, and writing a tool to solve them provided a handy little intellectual challenge to kick off the working day.

We stuck the solver on the site because we feel it chimes quite nicely with Matchpeg's aim of solving business frustrations.


How it works.

From a programming perspective, solving a Sudoku puzzle is mainly a simple exercise in recursion. The best method seems to be to keep track of all the values which a particular cell can't be. For example, if the cell in the top-left corner is known to contain 5, then the twenty other cells in the first row, first column, and first block can't contain 5.

Each time you set a known value, you record these 20 knock-on values. And each time you record a value which a cell can't be, you do two checks:

  • Does the cell now have eight values which it can't be? If so, it must be the ninth.
  • Is it now known that eight of the cells in a row, column, or block can't be a particular value? If so, then the ninth cell must contain that value.

When you discover a new value as a result of either of the above, you fill it in and process the 20 new implications of that. So, setting a single cell triggers recursive calls setting the values of other cells.

This method by itself solves the vast majority of Sudoku puzzles - typically, everything other than those rated as "very hard".

The next step involves looking for cells in a row, column, or block where only two of the cells can contain a pair of values: for example, there are only two cells which can contain 4, and the same two cells are also the only ones which can contain 8. If so, then you know that those two cells must contain either 4 or 8, and you can eliminate anything else on their can't-be list.

We've never seen a puzzle where this extra step actually gives a solution rather than just narrowing things down a bit. You could apply further subtle logic to the solution, but that's more computationally expensive than simply brute-forcing the puzzle from this point onwards. In fact, the check for pairs usually takes longer than its benefits merit: it's quicker to skip this check, and just brute-force the solution without first looking for pairs.

So, if the Matchpeg solver can't work out the puzzle through simple logic, it starts guessing. This falls into two parts: trying out guesses for single cells in isolation, and and then trying out a combination of guesses for multiple cells if necessary.

The first part simply consists of going down the grid, taking every value which each individual unknown cell could still be, and seeing what happens if that value is filled in. There are three possible results from trying out each one of these isolated hypotheses:

  • The value can't be right. Its knock-on effects are that the grid must be invalid (i.e. duplicate values in a row, column, or block).
  • Filling in that one value solves the puzzle. In other words, the hypothesis becomes a lucky guess.
  • It makes no difference. Filling in the value neither solves the puzzle nor is impossible.

Recording the hypotheses which can't be right (and their knock-on effects) usually means that you then hit a "lucky guess" very quickly. You typically only need to try out about 10 hypotheses before you get a solution.

In a very, very small number of cases, no single cell by itself gives the solution. You have to try out a combination of guesses for multiple cells at once.

The Matchpeg solver builds a list of the hardest remaining cells, and then tries all the possible combinations of the two hardest cells. If that doesn't work, it tries all the permuations of the three hardest. And then four, and then five.

However, we've never seen a puzzle which requires guessing four or five cells at once. The "world's hardest Sudoku puzzle" gets solved from the combination of guesses for only three cells.