Pratap R Chakravarthy: August 2010

Aug 25, 2010

Concurrency with Erlang - Making the algorithm concurrent (part-3)

This is a three part write-up explaining,

The algorithm which is used as a dipstick. (part-1)
Independent algorithms executed concurrently, measuring its performance. (part-2)
Re-designing the algorithm to measure concurrency at algorithmic level (part-3).

In part 2 we saw that spawning separate processes to solve each puzzle scaled well with multiple cores. We also saw that there can be pathological input combinations that can make the puzzle solver to keep executing bad paths for a long time before finding the right path (part-1 gives an introduction on the backtracking algorithm discussed here)

Let us first generate one such combination-of-inputs for a 9x9 puzzle table.
In the erlang shell (after setting up the application),

10> T = sudoku_gen:generate(switching, 92, 3, 10). 
{{0,0,2,0,3,0,5,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,5},
 {0,0,0,0,0,0,0,0,0},
 {4,0,0,7,0,0,0,0,0},
 {0,0,0,9,0,2,0,0,0},
 {0,0,0,0,0,1,0,0,2}}

This is the exact combination that threw the graph to its ceiling in our previous experiment in part-2. We will see how long the sequential algorithm takes to solve this puzzle,

11> St=now(), sudoku_slv:solve( 3, T ), timer:now_diff(now(), St).
*30111770

and it took 30111770 uS (30.1 seconds) to solve this tricky puzzle. In part-2 of this post we noticed that the average time taken for the first 90 puzzle was around 3377 mS (i.e) approximately 40 milli-second to solve a single puzzle.

pathological combination

We will first see, what these pathological combinations are and why it is difficult to get rid of it.

Any backtracking algorithm, while executing, will have to make several choices before finding the right solution. We will take our Sudoku puzzle solver as an example and expound further. For a 9x9 puzzle, with 10 elements pre-populated, should leave 71 slots unfilled. And for each slot there are 9 choices to make [1,2,3,4,5,6,7,8,9]. If we are using a brute force backtracking algorithm, it will have to pass 71 steps (to fill each slot) and for each step it will have to make nine choices, making the number of possible paths to,

56392087339601733413306017749077372989860250021295987473736382457209L, (do not try to interpret, it is way beyond trillions)

This is the worst case for a brute force backtracking algorithm before it succeeds or fails. But our implementation is not brute force. It uses intelligent functors to reduce the choices for each slot, for every choice it makes. For the sake of argument, let us say that the functors are able to reduce the choice to 2, all the time, our algorithm will have to pass 71 steps and for each step it will have to make 2 choices. This will reduce the number of possible paths to (in worst case),

2361183241434822606848L.

It will still take light years to exhaust all possible paths.

So, let us change our algorithm and re-design it to execute in parallel and see whether it helps. Please note that we are making the algorithm itself concurrent to solve a single tough puzzle.

re-designing the algorithm

We will try to parallelize the algorithm for every choice it makes. For instance, in our 92nd puzzle listed above, the first slot is unfilled and our functors can reduce the choices to [1,3,5,6,7,8,9]. Once the algorithm knows that those are un-reducible choices, it will spawn 7 processes, picking a unique choice for each one of them. Each of those process will continue from the second slot and repeat the same thing. Soon our algorithm is going to spawn millions and millions of processes only to find that it has taken a bad path. So we will add a cap on the maximum number of processes it can spawn.

All I did was to change couple of lines in existing code and add another 40 lines to implement the concurrency logic. With our algorithm re-designed, there are two configuration parameters that are worth mentioning,

concurrent : boolean()
`false` to use sequential algorithm and `true` to a concurrent backtracking algorithm.

procs : Integer()
While executing in concurrent mode, limits the number of spawn-able process.

Let us take our 92nd puzzle that took around 30 seconds with the sequential algorithm and see it's performance with following configuration.

{ concurrent, true } and { procs, 100 }

4>St = now(), T = sudoku_gen:generate(switching, 92, 3, 10),
sudoku_slv:solve(3, T), timer:now_diff(now(), St).
*102015

How much ? 102015 micro-seconds, that is just 0.1 second !!!!!!!! But this is on my i7 processor which has 4 cores with Hyper Technology. Still it is impressive, isn't it ?

Let us now see how our new concurrent algorithm performs for all hundred puzzles that were solved and plotted in part-2.

Next we will cap the maximum number of spawnable process to different values and measure the time taken to solve each of those puzzles. Let us cap the values at, [10, 100, 1000] and plot the results.

The graph shows that,

when the max-process is capped at 10, the performance is equivalent to sequential algorithm and unable to cope with pathological combination.

When max-process is capped at 1000, the performance wobbles from 2 times slower to 100 times slower. But it is able to manage pathological combination

When max-process is capped at 100, the average performance is about 4-5 times slower (but still within 500 ms) and at the same time manages the pathological combination.
Long ago, I wrote a similar Sudoku puzzle solver in C / Linux, but the the best part of writting it in Erlang is the way it prompted me to re-design the algorithm for concurrency. In C, I would have never thought about spawning a new process for every choice it makes, but here it was only natural.

That is how a language can affect our creativity, be it for programming or for poetry. And it is those simple and un-assuming things that changes one's perspective.

Reference :

Thesis by Yako on Sudoku algorithms.

Download the source code and experiment with it.

Wiki page explaining the source code.

Prev

Aug 16, 2010

Concurrency with Erlang - Concurrent programs (part-2)

This is a three part write-up explaining,

The algorithm which is used as a dipstick. (part-1)
Independent algorithms executed concurrently, measuring its performance. (part-2)
Re-designing the algorithm to measure concurrency at algorithmic level (part-3)

Part 1 gives an introduction on concurrency, Erlang and the sudoku puzzle solving algorithm, why it was chosen to experiment with Erlang's features. The source code is available for download.

Setting up the application

By programs, I refer to them as independent logic not interfering with each other. The objective here is to spawn them as parallel processes and measure its scalability and efficiency.

$ erl

Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.7.4  (abort with ^G) 
1> application:load(games),     %% Load the application
2> application:start(games),    %% Start the application
3> sudoku_test:start_link([]).  %% Start the sudoku test server

In part-1, I explained how to setup the application, generate a sudoku puzzle and solve it. Let us now generate dozens of such puzzles and solve them first, by sequential logic and then by concurrent logic.

Sequential execution of programs

We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements,

sudoku_test:run(seq, 3, 1, 10).

Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements.

sudoku_test:run(seq, 3, N, 10).

run(seq, ...) API solves the puzzle one by one (i.e) in sequential order. So let us try to measure the time take to execute run(seq, ...) from 1 to 90,

4> [ begin f(St), St=now(), sudoku_test:run(seq, 3, N, 10),
timer:now_diff(now(),St)end  || N &amp;lt;- lists:seq(1,90) ].

Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. This data is used to plot the graph below.

concurrent execution of programs

The objective is to solve the puzzles parallel (i.e) if there are N puzzles to be solved, spawn N number of process (one for each puzzle) and solve them parallel. The following is the code snippet that sets up this concurrency model,

psolve_(rcv, Count, Count, Results) ->
    Results;
psolve_(rcv, Count, N, Results) ->
    %% Collect the results
    receive {solved, Res} -> psolve_( rcv, Count, N+1, [Res|Results] ) end.
psolve_([], N) ->
    psolve_(rcv, N, 0, [] );
psolve_( [{S, Table} | Tables], N ) ->
    %% Spawn a process for a puzzle
    spawn( ?MODULE, spawnrun, [self(), S, Table] ),
    psolve_(Tables, N+1).

spawnrun(Frompid, S, Table) ->
    Frompid ! {solved, solve_(S, Table)}.   %% Send back the result

We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements,

sudoku_test:run(parallel, 3, 1, 10).

Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements,

sudoku_test:run(parallel, 3, N, 10).

run(parallel, ...) API solve the puzzles concurrently, by spawning one process for every puzzle. So let us try to measure the time take to execute run(parallel, ...) from 1 to 90,

5> [ begin f(St), St=now(), sudoku_test:run(parallel, 3, N, 10),
timer:now_diff(now(),St) end || N &amp;lt;- lists:seq(1,90) ].

Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. The only difference is that, the puzzles are solved by concurrent processes.

Note that, the run() API internally uses the same seed value to generate identical puzzles for both sequential and concurrent logic, so that we can do a fair benchmarking.

Unlike sequential logic, concurrent program execution can be bench marked with multiple cores to measure its efficiency. In our case I used erl's +S N:N switch to limit the number of cores to 1, 2, 3, 4 and respectively collected their data. This data is used to plot the graph below.

We can observe that sequential programs and concurrent programs executed on a single core follow the same plot, except that in case of concurrent logic, a small penalty is introduced due to dozens of processes being spawned and scheduled. On the other hand concurrent programs are scaling linearly with multiple cores. Impressive !!

Now, let us extend this a little bit and see how the graph looks like for puzzles 1 to 100.

We see that the curve shoots up some where in between 90th puzzle to 100th puzzle. Reason, the 92nd puzzle combination is a pathological one. It makes the backtracking algorithm to execute a large number bad path, before it finds a good path to solve the puzzle. How bad can these pathological cases be ? It can be really bad ! It might never return ...

And that is what we will be exploring in our next part, re-designing the algorithm for concurrency and see how it performs for pathological inputs.

Reference :

Thesis by Yako on Sudoku algorithms.

Download the source code and experiment with it.

Wiki page explaining the source code.

Prev

Next

Concurrency with Erlang - Choosing a dipstick (Part-I)

For a long time I thought C was "the" programming language for me. It is my swiss army knife to solve problems from embedded systems to server applications. That changed when I met a reclusive band of hackers in my previous company. There was this deluge of possibilities and impossibilities in the programming world and among them Erlang in particular was more interesting.

Why is erlang interesting ?

For past three decades the semiconductor industry promised to double the speed of microprocessor every two years, and they kept to their promise. Now, it looks like they have exhausted all the available tricks and the only way to increase computation speed is by going parallel, packing more number of cores in the same die. But that ain't so easy, for it requires re-designing our algorithms to run parallel. Enter concurrent programming.

Concurrent programming is a paradigm of writing computer programs that can execute in parallel. Three decades of structured programming concepts popularized by languages like C and Java have made the entire eco-system of computer applications optimized for sequential programming. Now, even when we are ready to redesign our algorithms for concurrency, its implementation, testing and deployment are making life difficult for us. May be Erlang can save us from some of those difficulties. May be, may be not.

This is a three part write-up explaining,

The algorithm which is used as a dipstick. (part-1)
Independent algorithms executed concurrently, measuring its performance. (part-2)
Re-designing the algorithm to measure concurrency at algorithmic level.(part-3)

The dipstick

Get yourself introduced to Erlang's way of programming and its benefits. But to get a feel for the language we will have to pick an algorithm that can adequately challenge it. After going through the language specifications, it was clear that the language is most suited for server applications and applications that are naturally suited for concurrency, like Twitterfall. So I decided to pick computationally intensive algorithm that is also complex enough. My choice was to implement a sudoku puzzle solver.

Sudoku puzzle solver

This puzzle solving algorithm has exponential complexity and one can introduce pathological combinations throwing the program into a for-ever-loop. By tweaking the input combination and few other parameters we can adjust the complexity of the problem and study the behavior of the language and its run-time. Download the source code and experiment with it.

The program takes an unsolved sudoku puzzle with N elements pre-populated (i.e) say for a 9x9 table of numbers, may be just 10 can be pre-populated. The number of elements that are pre-populated can also be supplied as a parameter. The algorithm itself uses a double recursive backtracking logic to find the right combination of numbers satisfying the game rules. So, for a 9x9 table with 10 pre-populated elements, the program must fill the remaining 81 slots and for each slot there are 9 choices ([1,2,3,4,5,6,7,8,9]).

The first recursion logic moves from one slot to the next, column-wise and then row-wise. At every step, it tries to reduce the available choices for an unfilled slot. There are 6 intelligent functors that applies primitive human logic to reduce the choices. It then detects whether it has taken the right path by checking with the game rules. If it detects that it has taken a bad path it backtracks.

The second recursion logic kicks in when the first recursive function detects that it has taken the right path (so far), and finds that the current slot has more than one choice left un-reducible. The second recursive function is called that will recurs over each of the available choices and will once again call back to first recursive logic. So on and so forth ...

Let us have some hands on with the code now,


$ erl  # Invoke the erlang-shell
1> application:load(games).     %% Load the application
2> application:start(games).    %% Start the application
3> sudoku_test:start_link([]).  %% Start the sudoku test server

Generate a puzzle of 9x9 table size (denoted by the 3rd parameter), with 10 elements pre-populated. Second parameter is just a seed value to initialize Erlang's pseudo random generator,


4> T = sudoku_gen:generate(switching, 11, 3, 10).
{{0,0,0,0,3,0,0,0,0},
 {0,0,0,5,0,0,1,7,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,2,0,9,0},
 {0,0,0,0,0,0,0,0,0},
 {0,9,0,0,0,0,0,0,0},
 {0,0,2,0,0,3,0,0,0},
 {0,0,0,0,0,0,0,5,0},
 {0,0,0,0,0,0,0,0,0}}

Solve the generated puzzle, first parameter says the table T is of size 9x9,


5> sudoku_slv:solve( 3, T ).
*{passed,{{1,2,4,6,3,7,5,8,9},
          {3,6,8,5,2,9,1,7,4},
          {5,7,9,1,4,8,2,3,6},
          {4,1,3,7,5,2,6,9,8},
          {2,8,5,3,9,6,4,1,7},
          {6,9,7,4,8,1,3,2,5},
          {7,5,2,8,6,3,9,4,1},
          {8,3,6,9,1,4,7,5,2},
          {9,4,1,2,7,5,8,6,3}}}

This type of algorithm is necessary to check how friendly is Erlang in designing complex algorithms with concurrency. And of course its performance. But before that we will see how easy and efficient it is to get started with concurrent programming in Erlang.

Reference :

Thesis by Yako on Sudoku algorithms.

Download the source code and experiment with it.

Wiki page explaining the source code.

Next

Aug 25, 2010

Concurrency with Erlang - Making the algorithm concurrent (part-3)

Aug 16, 2010

Concurrency with Erlang - Concurrent programs (part-2)

Concurrency with Erlang - Choosing a dipstick (Part-I)

About Me

Labels

Blog Archive

Search This Blog

Followers

Aug 25, 2010

Concurrency with Erlang - Making the algorithm concurrent (part-3)

Aug 16, 2010

Concurrency with Erlang - Concurrent programs (part-2)

Concurrency with Erlang - Choosing a dipstick (Part-I)

About Me

Labels

Blog Archive

Subscribe To

Search This Blog

Followers