controls

Aug 16, 2010

Concurrency with Erlang - Concurrent programs (part-2)

This is a three part write-up explaining,
Part 1 gives an introduction on concurrency, Erlang and the sudoku puzzle solving algorithm, why it was chosen to experiment with Erlang's features. The source code is available for download

Setting up the application

By programs, I refer to them as independent logic not interfering with each other. The objective here is to spawn them as parallel processes and measure its scalability and efficiency.
$ erl

Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.7.4  (abort with ^G) 
1> application:load(games),     %% Load the application
2> application:start(games),    %% Start the application
3> sudoku_test:start_link([]).  %% Start the sudoku test server

In part-1, I explained how to setup the application, generate a sudoku puzzle and solve it. Let us now generate dozens of such puzzles and solve them first, by sequential logic and then by concurrent logic.

Sequential execution of programs

We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements,
sudoku_test:run(seq, 3, 1, 10).
Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements.   
sudoku_test:run(seq, 3, N, 10).
run(seq, ...) API solves the puzzle one by one (i.e) in sequential order. So let us try to measure the time take to execute run(seq, ...) from 1 to 90,
4> [ begin f(St), St=now(), sudoku_test:run(seq, 3, N, 10),
timer:now_diff(now(),St)end  || N <- lists:seq(1,90) ].
Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. This data is used to plot the graph below. 

concurrent execution of programs

The objective is to solve the puzzles parallel (i.e) if there are N puzzles to be solved, spawn N number of process (one for each puzzle) and solve them parallel. The following is the code snippet that sets up this concurrency model,
psolve_(rcv, Count, Count, Results) ->
    Results;
psolve_(rcv, Count, N, Results) ->
    %% Collect the results
    receive {solved, Res} -> psolve_( rcv, Count, N+1, [Res|Results] ) end.
psolve_([], N) ->
    psolve_(rcv, N, 0, [] );
psolve_( [{S, Table} | Tables], N ) ->
    %% Spawn a process for a puzzle
    spawn( ?MODULE, spawnrun, [self(), S, Table] ),
    psolve_(Tables, N+1).

spawnrun(Frompid, S, Table) ->
    Frompid ! {solved, solve_(S, Table)}.   %% Send back the result 
We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements, 
sudoku_test:run(parallel, 3, 1, 10).
Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements,
sudoku_test:run(parallel, 3, N, 10).
run(parallel, ...) API solve the puzzles concurrently, by spawning one process for every puzzle. So let us try to measure the time take to execute run(parallel, ...) from 1 to 90,
5> [ begin f(St), St=now(), sudoku_test:run(parallel, 3, N, 10),
timer:now_diff(now(),St) end || N <- lists:seq(1,90) ].
Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. The only difference is that, the puzzles are solved by concurrent processes. 

Note that, the run() API internally uses the same seed value to generate identical puzzles for both sequential and concurrent logic, so that we can do a fair benchmarking.

Unlike sequential logic, concurrent program execution can be bench marked with multiple cores to measure its efficiency. In our case I used erl's +S N:N switch to  limit the number of cores to 1, 2, 3, 4 and respectively collected their data. This data is used to plot the graph below.

We can observe that sequential programs and concurrent programs executed on a single core follow the same plot, except that in case of concurrent logic, a small penalty is introduced due to dozens of processes being spawned and scheduled. On the other hand concurrent programs are scaling linearly with multiple cores. Impressive !!

Now, let us extend this a little bit and see how the graph looks like for puzzles 1 to 100.
We see that the curve shoots up some where in between 90th puzzle to 100th puzzle. Reason, the 92nd puzzle combination is a pathological one. It makes the backtracking algorithm to execute a large number bad path, before it finds a good path to solve the puzzle. How bad can these pathological cases be ? It can be really bad ! It might never return ...

And that is what we will be exploring in our next part, re-designing the algorithm for concurrency and see how it performs for pathological inputs.

Reference :

No comments:

Post a Comment

Please provide your name while adding your comment.