controls

Aug 25, 2010

Concurrency with Erlang - Making the algorithm concurrent (part-3)

This is a three part write-up explaining,
In part 2 we saw that spawning separate processes to solve each puzzle scaled well with multiple cores. We also saw that there can be pathological input combinations that can make the puzzle solver to keep executing bad paths for a long time before finding the right path (part-1 gives an introduction on the backtracking algorithm discussed here)

Let us first generate one such combination-of-inputs for a 9x9 puzzle table.
In the erlang shell (after setting up the application),
10> T = sudoku_gen:generate(switching, 92, 3, 10). 
{{0,0,2,0,3,0,5,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,0,0,0,5},
 {0,0,0,0,0,0,0,0,0},
 {4,0,0,7,0,0,0,0,0},
 {0,0,0,9,0,2,0,0,0},
 {0,0,0,0,0,1,0,0,2}}
This is the exact combination that threw the graph to its ceiling in our previous experiment in part-2. We will see how long the sequential algorithm takes to solve this puzzle,
11> St=now(), sudoku_slv:solve( 3, T ), timer:now_diff(now(), St).
*30111770 
and it took 30111770 uS (30.1 seconds) to solve this tricky puzzle. In part-2 of this post we noticed that the average time taken for the first 90 puzzle was around 3377 mS (i.e) approximately 40 milli-second to solve a single puzzle.

pathological combination

We will first see, what these pathological combinations are and why it is difficult to get rid of it.

Any backtracking algorithm, while executing, will have to make several choices  before finding the right solution.  We will take our Sudoku puzzle solver as an  example and expound further. For a 9x9 puzzle, with 10 elements pre-populated,  should leave 71 slots unfilled.  And for each slot there are 9 choices to make  [1,2,3,4,5,6,7,8,9]. If we are using a brute force backtracking algorithm, it will  have to pass 71 steps (to fill each slot) and for each step it will have to make nine  choices, making the number of possible paths to,

56392087339601733413306017749077372989860250021295987473736382457209L, (do not try to interpret, it is way beyond trillions)
This is the worst case for a brute force backtracking algorithm before it succeeds or fails. But our implementation is not brute force. It uses intelligent functors to reduce the choices for each slot, for every choice it makes. For the sake of argument, let us say that the functors are able to reduce the choice to 2, all the time, our algorithm will have to pass 71 steps and for each step it will have to make 2 choices. This will reduce the number of possible paths to (in worst case),
2361183241434822606848L.
It will still take light years to exhaust all possible paths.

So, let us change our algorithm and re-design it to execute in parallel and see whether it helps. Please note that we are making the algorithm itself concurrent to solve a single tough puzzle.

re-designing the algorithm

We will try to parallelize the algorithm for every choice it makes. For instance, in our 92nd puzzle listed above, the first slot is unfilled and our functors can reduce the choices to [1,3,5,6,7,8,9]. Once the algorithm knows that those are  un-reducible choices, it will spawn 7 processes, picking a unique  choice for each one of them. Each of those process will continue from the second slot and repeat the same thing. Soon our algorithm is going to spawn millions and millions of processes only to find that it has taken a bad path. So we will add a cap on the maximum number of processes it can spawn.

All I did was to change couple of lines in existing code and add another 40 lines to implement the concurrency logic. With our algorithm re-designed, there are two configuration parameters that are worth mentioning,

concurrent : boolean()
    `false` to use sequential algorithm and `true` to a concurrent backtracking algorithm.

procs : Integer()
    While executing in concurrent mode, limits the number of spawn-able process.
Let us take our 92nd puzzle that took around 30 seconds with the sequential algorithm and see it's performance with following configuration. 
{ concurrent, true } and { procs, 100 }
4>St = now(), T = sudoku_gen:generate(switching, 92, 3, 10),
sudoku_slv:solve(3, T), timer:now_diff(now(), St).
*102015 
How much ? 102015 micro-seconds, that is just 0.1 second !!!!!!!! But this is on my i7 processor which has 4 cores with Hyper Technology. Still it is impressive, isn't it ?

Let us now see how our new concurrent algorithm performs for all hundred puzzles that were solved and plotted in part-2.


Next we will cap the maximum number of spawnable process to different values and measure the time taken to solve each of those puzzles. Let us cap the values at, [10, 100, 1000] and plot the results.


The graph shows that,
  • when the max-process is capped at 10, the performance is equivalent to sequential algorithm and unable to cope with pathological combination.
  • When max-process is capped at 1000, the performance wobbles from 2 times slower to 100 times slower. But it is able to manage pathological combination
  • When max-process is capped at 100, the average performance is about 4-5 times slower (but still within 500 ms) and at the same time manages the pathological combination.
Long ago, I wrote a similar Sudoku puzzle solver in C / Linux, but the the best part of writting it in Erlang is the way it prompted me to re-design the algorithm for concurrency. In C, I would have never thought about spawning a new process for every choice it makes, but here it was only natural.

That is how a language can affect our creativity, be it for programming or for poetry. And it is those simple and un-assuming things that changes one's perspective.

Reference :

Aug 16, 2010

Concurrency with Erlang - Concurrent programs (part-2)

This is a three part write-up explaining,
Part 1 gives an introduction on concurrency, Erlang and the sudoku puzzle solving algorithm, why it was chosen to experiment with Erlang's features. The source code is available for download

Setting up the application

By programs, I refer to them as independent logic not interfering with each other. The objective here is to spawn them as parallel processes and measure its scalability and efficiency.
$ erl

Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.7.4  (abort with ^G) 
1> application:load(games),     %% Load the application
2> application:start(games),    %% Start the application
3> sudoku_test:start_link([]).  %% Start the sudoku test server

In part-1, I explained how to setup the application, generate a sudoku puzzle and solve it. Let us now generate dozens of such puzzles and solve them first, by sequential logic and then by concurrent logic.

Sequential execution of programs

We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements,
sudoku_test:run(seq, 3, 1, 10).
Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements.   
sudoku_test:run(seq, 3, N, 10).
run(seq, ...) API solves the puzzle one by one (i.e) in sequential order. So let us try to measure the time take to execute run(seq, ...) from 1 to 90,
4> [ begin f(St), St=now(), sudoku_test:run(seq, 3, N, 10),
timer:now_diff(now(),St)end  || N <- lists:seq(1,90) ].
Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. This data is used to plot the graph below. 

concurrent execution of programs

The objective is to solve the puzzles parallel (i.e) if there are N puzzles to be solved, spawn N number of process (one for each puzzle) and solve them parallel. The following is the code snippet that sets up this concurrency model,
psolve_(rcv, Count, Count, Results) ->
    Results;
psolve_(rcv, Count, N, Results) ->
    %% Collect the results
    receive {solved, Res} -> psolve_( rcv, Count, N+1, [Res|Results] ) end.
psolve_([], N) ->
    psolve_(rcv, N, 0, [] );
psolve_( [{S, Table} | Tables], N ) ->
    %% Spawn a process for a puzzle
    spawn( ?MODULE, spawnrun, [self(), S, Table] ),
    psolve_(Tables, N+1).

spawnrun(Frompid, S, Table) ->
    Frompid ! {solved, solve_(S, Table)}.   %% Send back the result 
We will use the following test function in `sudoku_test` module to solve a 9x9 puzzle table pre-populated with 10 elements, 
sudoku_test:run(parallel, 3, 1, 10).
Using the same test function we can solve N number of 9x9 puzzle tables pre-populated with 10 elements,
sudoku_test:run(parallel, 3, N, 10).
run(parallel, ...) API solve the puzzles concurrently, by spawning one process for every puzzle. So let us try to measure the time take to execute run(parallel, ...) from 1 to 90,
5> [ begin f(St), St=now(), sudoku_test:run(parallel, 3, N, 10),
timer:now_diff(now(),St) end || N <- lists:seq(1,90) ].
Above list comprehension gives a list of time values, where the first value tells us the time taken to solve 1 puzzle, the second value tells us the time taken to solve 2 puzzles, .... all the way up to 90 puzzles. The only difference is that, the puzzles are solved by concurrent processes. 

Note that, the run() API internally uses the same seed value to generate identical puzzles for both sequential and concurrent logic, so that we can do a fair benchmarking.

Unlike sequential logic, concurrent program execution can be bench marked with multiple cores to measure its efficiency. In our case I used erl's +S N:N switch to  limit the number of cores to 1, 2, 3, 4 and respectively collected their data. This data is used to plot the graph below.

We can observe that sequential programs and concurrent programs executed on a single core follow the same plot, except that in case of concurrent logic, a small penalty is introduced due to dozens of processes being spawned and scheduled. On the other hand concurrent programs are scaling linearly with multiple cores. Impressive !!

Now, let us extend this a little bit and see how the graph looks like for puzzles 1 to 100.
We see that the curve shoots up some where in between 90th puzzle to 100th puzzle. Reason, the 92nd puzzle combination is a pathological one. It makes the backtracking algorithm to execute a large number bad path, before it finds a good path to solve the puzzle. How bad can these pathological cases be ? It can be really bad ! It might never return ...

And that is what we will be exploring in our next part, re-designing the algorithm for concurrency and see how it performs for pathological inputs.

Reference :

Concurrency with Erlang - Choosing a dipstick (Part-I)

For a long time I thought C was "the" programming language for me. It is my swiss army knife to solve problems from embedded systems to server applications. That changed when I met a reclusive band of hackers in my previous company. There was this deluge of possibilities and impossibilities in the programming world and among them Erlang in particular was more interesting.
 
Why is erlang interesting ?

For past three decades the semiconductor industry promised to double the speed of microprocessor every two years, and they kept to their promise. Now, it looks like they have exhausted all the available tricks and the only way to increase computation speed is by going parallel, packing more number of cores in the same die. But that ain't so easy, for it requires re-designing our algorithms to run parallel. Enter concurrent programming.

Concurrent programming is a paradigm of writing computer programs that can execute in parallel. Three decades of structured programming concepts popularized by languages like C and Java have made the entire eco-system of computer applications optimized for sequential programming. Now, even when we are ready to redesign our algorithms for concurrency, its implementation, testing and deployment are making life difficult for us. May be Erlang can save us from some of those difficulties. May be, may be not.

This is a three part write-up explaining,
The dipstick

Get yourself introduced to Erlang's way of programming and its benefits. But to get a feel for the language we will have to pick an algorithm that can adequately challenge it. After going through the language specifications, it was clear that the language is most suited for server applications and applications that are naturally suited for concurrency, like Twitterfall. So I decided to pick computationally intensive algorithm that is also complex enough. My choice was to implement a sudoku puzzle solver.

Sudoku puzzle solver

This puzzle solving algorithm has exponential complexity and one can introduce pathological combinations throwing the program into a for-ever-loop. By tweaking the input combination and few other parameters we can adjust the complexity of the problem and study the behavior of the language and its run-time. Download the source code and experiment with it.

The program takes an unsolved sudoku puzzle with N elements pre-populated (i.e) say for a 9x9 table of numbers, may be just 10 can be pre-populated. The number of elements that are pre-populated can also be supplied as a parameter. The algorithm itself uses a double recursive backtracking logic to find the right combination of numbers satisfying the game rules. So, for a 9x9 table with 10 pre-populated elements, the program must fill the remaining 81 slots and for each slot there are 9 choices ([1,2,3,4,5,6,7,8,9]).

The first recursion logic moves from one slot to the next, column-wise and then row-wise. At every step, it tries to reduce the available choices for an unfilled slot. There are 6 intelligent functors that applies primitive human logic to reduce the choices. It then detects whether it has taken the right path by checking with the game rules. If it detects that it has taken a bad path it backtracks.

The second recursion logic kicks in when the first recursive function detects that it has taken the right path (so far), and finds that the current slot has more than one choice left un-reducible. The second recursive function is called that will recurs over each of the available choices and will once again call back to first recursive logic. So on and so forth ...


Let us have some hands on with the code now,

$ erl  # Invoke the erlang-shell
1> application:load(games).     %% Load the application
2> application:start(games).    %% Start the application
3> sudoku_test:start_link([]).  %% Start the sudoku test server
Generate a puzzle of 9x9 table size (denoted by the 3rd parameter), with 10 elements pre-populated. Second parameter is just a seed value to initialize Erlang's pseudo random generator,

4> T = sudoku_gen:generate(switching, 11, 3, 10).
{{0,0,0,0,3,0,0,0,0},
 {0,0,0,5,0,0,1,7,0},
 {0,0,0,0,0,0,0,0,0},
 {0,0,0,0,0,2,0,9,0},
 {0,0,0,0,0,0,0,0,0},
 {0,9,0,0,0,0,0,0,0},
 {0,0,2,0,0,3,0,0,0},
 {0,0,0,0,0,0,0,5,0},
 {0,0,0,0,0,0,0,0,0}}
Solve the generated puzzle, first parameter says the table T is of size  9x9,

5> sudoku_slv:solve( 3, T ).
*{passed,{{1,2,4,6,3,7,5,8,9},
          {3,6,8,5,2,9,1,7,4},
          {5,7,9,1,4,8,2,3,6},
          {4,1,3,7,5,2,6,9,8},
          {2,8,5,3,9,6,4,1,7},
          {6,9,7,4,8,1,3,2,5},
          {7,5,2,8,6,3,9,4,1},
          {8,3,6,9,1,4,7,5,2},
          {9,4,1,2,7,5,8,6,3}}}
This type of algorithm is necessary to check how friendly is Erlang in designing complex algorithms with concurrency. And of course its performance. But before that we will see how easy and efficient it is to get started with concurrent programming in Erlang.

Reference :

Jul 29, 2010

On hitting out on your own

I wrote this post some time back, but got busy with our first public release of Zeta.

Recently I kept receiving references to a blog post about Top 10 idea choices, as picked by first time entrepreneurs. The post was honest and humble, but it stirred several people quite to the core, after all, their idea is about to be called as one of the most over-used idea. The comments to the blog post was about ten times the size of the post itself. Having chosen one of the most common idea, I am stirred. But before proceeding I would like to tell a story,

That was the time when we were growing out of DOS and BASIC systems, into the fabulous world of Graphics and Multi-media. And for most of our teeming minds at school, the choice of their summer project was a natural one, to do a paint-brush application.

When one of my buddy, a self educated programmer (actually he is more like scientific adventurer than a programmer) chose to do paint-brush application, I thought, another mind wasted on a me-too project. Soon, he called me to his study and showed me his creation. The paint-brush tool. It had more or less the same features that any regular pain-brush application had, but wait, there was something different in the look and feel ! My buddy had chosen to do the whole thing using SVGA graphics as opposed to doing it with standard VGA mode libraries shipped with the Turbo-C compiler. I should say that he literally created his own GUI library to do his application.

Now, he did not stop at that. We where talking about FAT16 file system (which was kind of related to a project I was doing) when I found a gleam in his eyes. Once again I landed in his study. This time he showed me how to manipulate large image files and sequence them together to create some sort of animation. Apparently he got over the 640K limitation in DOS using EMM386 driver and a custom made file allocator similar to FAT16's techniques (remember the gleam !). That was good, well, better than a pain-brush application.

Time went on, and got another invitation to take a glimpse into what he was doing, this time though, his demonstration was more interactive. That is, he called me closer, gave me a mike and told me to say 'aaah', and I religiously told 'aaah', and ta-da there comes 'aaah' on the screen ! Then showed me a video, a chasing cheetah. You guessed it, the video was running in his so called paint-brush program using some sort of temporal and spatial compression between image frames, rendered in his SVGA window, and using his memory allocator to get over the 640K limitation. Unfortunately, before he could solve "Goldbach conjecture", "Riemann hypothesis" and a host of other pressing problems using his paint-brush application, we all graduated. While most of us completed our last exams without a job on hand, he had one with a top software company, which previously requested him to do an intern at their campus, apparently seeing his "paint-brush" application winning prizes in symposiums conducted by top technology institutes.

So, here is my take on me-too projects. There can be countless possible application for technology products, but in terms of building technology, the number of segments are limited by economics of mind and matter. Innovation can happen within those segments and it can take decades to change the landscape. Again, please don't mistake these segments with markets.

Bottom line ? Here it is. If you think your idea is unique and central to a disruptive technology, get ready for a dis-appointment. Ideas have to evolve, and it evolves better across multiple minds. So, let your ideas out and follow them, hopefully you will get some company and reap the benefits.