Pratap R Chakravarthy: Concurrency with Erlang - Making the algorithm concurrent (part-3)

This is a three part write-up explaining,

The algorithm which is used as a dipstick. (part-1)
Independent algorithms executed concurrently, measuring its performance. (part-2)
Re-designing the algorithm to measure concurrency at algorithmic level (part-3).

In part 2 we saw that spawning separate processes to solve each puzzle scaled well with multiple cores. We also saw that there can be pathological input combinations that can make the puzzle solver to keep executing bad paths for a long time before finding the right path (part-1 gives an introduction on the backtracking algorithm discussed here)

Let us first generate one such combination-of-inputs for a 9x9 puzzle table.
In the erlang shell (after setting up the application),

10> T = sudoku_gen:generate(switching, 92, 3, 10). 
{{0,0,2,0,3,0,5,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,5},
 {0,0,0,0,0,0,0,0,0},
 {4,0,0,7,0,0,0,0,0},
 {0,0,0,9,0,2,0,0,0},
 {0,0,0,0,0,1,0,0,2}}

This is the exact combination that threw the graph to its ceiling in our previous experiment in part-2. We will see how long the sequential algorithm takes to solve this puzzle,

11> St=now(), sudoku_slv:solve( 3, T ), timer:now_diff(now(), St).
*30111770

and it took 30111770 uS (30.1 seconds) to solve this tricky puzzle. In part-2 of this post we noticed that the average time taken for the first 90 puzzle was around 3377 mS (i.e) approximately 40 milli-second to solve a single puzzle.

pathological combination

We will first see, what these pathological combinations are and why it is difficult to get rid of it.

Any backtracking algorithm, while executing, will have to make several choices before finding the right solution. We will take our Sudoku puzzle solver as an example and expound further. For a 9x9 puzzle, with 10 elements pre-populated, should leave 71 slots unfilled. And for each slot there are 9 choices to make [1,2,3,4,5,6,7,8,9]. If we are using a brute force backtracking algorithm, it will have to pass 71 steps (to fill each slot) and for each step it will have to make nine choices, making the number of possible paths to,

56392087339601733413306017749077372989860250021295987473736382457209L, (do not try to interpret, it is way beyond trillions)

This is the worst case for a brute force backtracking algorithm before it succeeds or fails. But our implementation is not brute force. It uses intelligent functors to reduce the choices for each slot, for every choice it makes. For the sake of argument, let us say that the functors are able to reduce the choice to 2, all the time, our algorithm will have to pass 71 steps and for each step it will have to make 2 choices. This will reduce the number of possible paths to (in worst case),

2361183241434822606848L.

It will still take light years to exhaust all possible paths.

So, let us change our algorithm and re-design it to execute in parallel and see whether it helps. Please note that we are making the algorithm itself concurrent to solve a single tough puzzle.

re-designing the algorithm

We will try to parallelize the algorithm for every choice it makes. For instance, in our 92nd puzzle listed above, the first slot is unfilled and our functors can reduce the choices to [1,3,5,6,7,8,9]. Once the algorithm knows that those are un-reducible choices, it will spawn 7 processes, picking a unique choice for each one of them. Each of those process will continue from the second slot and repeat the same thing. Soon our algorithm is going to spawn millions and millions of processes only to find that it has taken a bad path. So we will add a cap on the maximum number of processes it can spawn.

All I did was to change couple of lines in existing code and add another 40 lines to implement the concurrency logic. With our algorithm re-designed, there are two configuration parameters that are worth mentioning,

concurrent : boolean()
`false` to use sequential algorithm and `true` to a concurrent backtracking algorithm.

procs : Integer()
While executing in concurrent mode, limits the number of spawn-able process.

Let us take our 92nd puzzle that took around 30 seconds with the sequential algorithm and see it's performance with following configuration.

{ concurrent, true } and { procs, 100 }

4>St = now(), T = sudoku_gen:generate(switching, 92, 3, 10),
sudoku_slv:solve(3, T), timer:now_diff(now(), St).
*102015

How much ? 102015 micro-seconds, that is just 0.1 second !!!!!!!! But this is on my i7 processor which has 4 cores with Hyper Technology. Still it is impressive, isn't it ?

Let us now see how our new concurrent algorithm performs for all hundred puzzles that were solved and plotted in part-2.

Next we will cap the maximum number of spawnable process to different values and measure the time taken to solve each of those puzzles. Let us cap the values at, [10, 100, 1000] and plot the results.

The graph shows that,

when the max-process is capped at 10, the performance is equivalent to sequential algorithm and unable to cope with pathological combination.

When max-process is capped at 1000, the performance wobbles from 2 times slower to 100 times slower. But it is able to manage pathological combination

When max-process is capped at 100, the average performance is about 4-5 times slower (but still within 500 ms) and at the same time manages the pathological combination.
Long ago, I wrote a similar Sudoku puzzle solver in C / Linux, but the the best part of writting it in Erlang is the way it prompted me to re-design the algorithm for concurrency. In C, I would have never thought about spawning a new process for every choice it makes, but here it was only natural.

That is how a language can affect our creativity, be it for programming or for poetry. And it is those simple and un-assuming things that changes one's perspective.

Reference :

Thesis by Yako on Sudoku algorithms.

Download the source code and experiment with it.

Wiki page explaining the source code.

Prev

Aug 25, 2010

Concurrency with Erlang - Making the algorithm concurrent (part-3)

No comments:

Post a Comment

About Me

Labels

Blog Archive

Search This Blog

Followers