Tuesday, June 12, 2012

Parallel computing in Python for the masses


Case scenario: you wrote a Python routine that performs some kind of time-consuming computation. Then you think: wow, my computer has N cores, but my program is using only one of them at a time. What a waste of computing resources. Is there a reasonably easy way of modifying my code to make it exploit all the cores of my multicore machine?

The answer is yes, and there are different ways of doing it. Which method is appropriate depends on how complex your code is and on how its computation can be broken up.

I will talk here about one relatively easy way of speeding up your code: the multiprocessing Python package. There are many other options out there, but multiprocessing is part of the standard library and therefore comes with every Python distribution by default.

I am assuming that you really need to make your code parallel. You will have to stop and spend time thinking about how to break your computation into smaller parts that can be sent to the different cores. And I should mention that, unsurprisingly, parallel code is harder to debug than serial code.

Parallelization is one way of optimizing your code. Other options include Cython and f2py. Both approaches can yield speedups of more than 10x and are worth exploring depending on your situation, but both involve mixing C or Fortran with your Python code.

The ideal case is when your problem is "embarrassingly parallel": the problem can be parallelized in a reasonably easy way because the computations that form the bottleneck of the code can be carried out independently and do not need to communicate with each other. Examples:

  • You have a "grid" of parameters that you need to pass to a time-consuming model (e.g., a 1000x1000 matrix with the values of two parameters). Your model needs to evaluate those parameters and provide some output.
  • Your code performs a Monte Carlo simulation with 100000 trials, carried out in a loop. You can easily "dismember" this loop and send the pieces to be computed independently by the cores in your machine (see the sketch below).
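
To illustrate the second case, here is a minimal sketch; the pi-estimation example and the names (run_trials, n_total, and so on) are placeholders I made up, not code from any particular source. The point is that the whole simulation reduces to a sum over chunks of trials, and each chunk can run without knowing anything about the others:

    import random

    def run_trials(n_trials):
        """Run n_trials independent Monte Carlo trials; here we estimate
        pi by sampling random points in the unit square."""
        hits = 0
        for _ in range(n_trials):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:
                hits += 1
        return hits

    # The full simulation is a sum over independent chunks, so each chunk
    # could be handed to a different core with no communication at all.
    n_total = 100000
    n_cores = 4  # adjust to your machine
    chunks = [n_total // n_cores] * n_cores
    pi_estimate = 4.0 * sum(run_trials(n) for n in chunks) / n_total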

Beyond the small sketches in this post, I will point out the material I used to learn parallelization. I learned the basics of parallel programming by reading the excellent tutorial "Introduction to Parallel Computing" written by Blaise Barney.

The next step was learning how to use the multiprocessing package itself. I did so with the examples posted on the AstroBetter blog. I began with the example implemented with the pprocess package; the caveat is that pprocess is a non-standard package, and the multiprocessing package that comes with Python should be used instead. Somebody later posted the original example from the blog ported to the multiprocessing package.

As the posts above explain, the basic idea behind using multiprocessing is to create a pool of worker processes and use its parallel map method to evaluate your time-consuming function on the many cores of your machine. Once you figure out how to express your calculation in terms of the map method, the rest is easy.
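
A minimal sketch of the pattern (slow_function here is just a stand-in for your real time-consuming computation):

    import multiprocessing

    def slow_function(x):
        """Stand-in for your time-consuming computation."""
        return x * x

    if __name__ == '__main__':  # guard needed when multiprocessing spawns workers
        pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
        # The parallel map evaluates slow_function on each input in a
        # separate process and returns the results in order, like map().
        results = pool.map(slow_function, range(16))
        pool.close()
        pool.join()
        print(results)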

From my experience with parallel programming in Python using multiprocessing, I learned a few things that I want to share:

  1. Do not forget to close the pool of processes with the close() method after your computation is done, and to wait for the workers with join() (see the sketch after this list)! If you do not, you will end up leaving a lot of orphan processes which can quickly consume the available memory in your machine.
  2. Avoid lambda functions when passing work to the parallel map, at all costs! multiprocessing has to pickle the function it ships to the worker processes, and lambdas cannot be pickled; use a function defined at module level instead (again, see the sketch below).
  3. Finally, as I mentioned before, parallelizing a code increases development time and the complexity of debugging. Only resort to parallelization if you really need it, i.e. if you expect a big speedup in your code's execution. For example, if your code takes 24 hours to execute and you think you can get a 6x speedup with multiprocessing, then the execution time drops to 4 hours, which is not bad.
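
Here is a sketch illustrating points 1 and 2 together (evaluate_model is a made-up example function):

    import multiprocessing

    def evaluate_model(x):
        """Hypothetical model; a module-level function like this can be
        pickled and shipped to the worker processes, unlike a lambda."""
        return 2.0 * x * x

    if __name__ == '__main__':
        pool = multiprocessing.Pool()
        # pool.map(lambda x: 2.0 * x * x, range(10)) would fail, because
        # multiprocessing pickles the mapped function and lambdas cannot
        # be pickled.
        results = pool.map(evaluate_model, range(10))
        pool.close()  # point 1: close the pool once the computation is done...
        pool.join()   # ...and wait for the worker processes to exit
        print(results)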