Thursday, October 16, 2014

Parallel computing with IPython and Python

I uploaded a quick tutorial to GitHub on how to parallelize simple computing tasks. I chose embarrassingly parallel examples that illustrate some of the powerful features of IPython.parallel and the multiprocessing module.

Examples included:

  1. Parallel function mapping to a list of arguments (multiprocessing module)
  2. Parallel execution of array function (scatter/gather) + parallel execution of scripts
  3. Easy parallel Monte Carlo (parallel magics)
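
To give a flavor of the first kind of example, here is a minimal sketch (not taken from the tutorial itself) that maps a function over a list of arguments with the multiprocessing module. It assumes Python 3; the function names count_hits and pi_from_hits and the Monte Carlo estimate of pi are illustrative choices of mine.

```python
import random
from multiprocessing import Pool


def count_hits(task):
    """Count random points that fall inside the unit quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # separate generator per task, seeded differently
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def pi_from_hits(hits_per_task, n_total):
    """Combine the per-task hit counts into an estimate of pi."""
    return 4.0 * sum(hits_per_task) / n_total


if __name__ == "__main__":
    n_tasks, n_samples = 4, 100000
    tasks = [(seed, n_samples) for seed in range(n_tasks)]
    # Each task runs in its own worker process; map preserves the task order.
    with Pool(processes=n_tasks) as pool:
        hits = pool.map(count_hits, tasks)
    print(pi_from_hits(hits, n_tasks * n_samples))
```

The tasks share no state, which is what makes the problem embarrassingly parallel: the only communication is sending the arguments out and collecting the counts.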


Parallel computing with Python. 

Please stop using colormaps which don't translate well to grayscale

I recently printed a paper with very nice results in B&W, but the color images simply did not make sense when printed in grayscale (here is the paper if you are curious). Why? The colormap was poorly chosen. Jake VanderPlas reminded me of this issue with his very nice blog post.

For the images in your paper, please avoid colormaps that do not translate well when printed in grayscale.
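
One rough way to check a colormap is to convert a few of its colors to grayscale and see whether the brightness changes steadily in one direction. The sketch below assumes the common Rec. 601 luma weights as the grayscale conversion. The two color lists are hand-picked illustrative samples, not real colormaps from any library.

```python
def luma(rgb):
    """Approximate grayscale brightness of an (r, g, b) color in [0, 1]."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights (assumed)


def is_monotonic(values):
    """True if the values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical samples along a rainbow-like and a sequential colormap.
rainbow_like = [(0, 0, 0.5), (0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0.5, 0, 0)]
sequential_like = [(0, 0, 0), (0.25, 0.1, 0.3), (0.5, 0.3, 0.4), (0.8, 0.6, 0.5), (1, 1, 1)]

for name, colors in [("rainbow-like", rainbow_like), ("sequential-like", sequential_like)]:
    grays = [luma(c) for c in colors]
    print(name, "monotonic in grayscale:", is_monotonic(grays))
```

A colormap whose brightness rises and then falls maps different data values to the same shade of gray, which is why such images become unreadable in print.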

Check out Nature's advice on color coding. Also check out the advice here.

Feel free to suggest other useful references about this issue in the comments.

Wednesday, October 8, 2014

Python Installation instructions (including IPython / IPython Notebook)

This page describes how to install Python and the other packages (Numpy, Scipy, IPython, IPython Notebook, Matplotlib) required for the course for Mac OS X, Linux and Windows.

Linux

In Linux, the installation instructions are pretty straightforward. Assuming that you are running Debian or Ubuntu, you just need to execute the following command in the terminal:

sudo apt-get install python-numpy python-scipy python-matplotlib ipython-notebook

For Fedora users, you can use the yum tool.

Mac OS X, Linux, Windows

We recommend downloading and installing the Anaconda Python distribution. The installation instructions are available here.

Just download the installer and execute it with bash.

Anaconda includes most of the packages we will use and it is pretty easy to install additional packages if required, using the conda or pip command-line tools.
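
Whichever method you used, a quick way to check that the installation worked is to try importing the packages. This is a minimal sketch; the helper name check_packages is my own, and a package that does not expose a version attribute is reported as "unknown".

```python
import importlib


def check_packages(names):
    """Map each module name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    required = ["numpy", "scipy", "matplotlib", "IPython"]
    for name, version in check_packages(required).items():
        print(name, "MISSING" if version is None else version)
```

Save it to a file and run it with the same Python interpreter you intend to use for the course; any line marked MISSING points to a package that still needs to be installed.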


If the above two methods do not work for OS X

The MacPorts way

You can try installing everything using MacPorts. First download and install MacPorts, then issue the following command in a terminal:

sudo port install py27-zmq py27-tornado py27-nose

The above dependencies are required in order to run IPython notebook. Then run:

sudo port install py27-numpy py27-matplotlib py27-scipy py27-ipython

The advantage of this method is that it is easy to do. The downsides:

  • It can take a couple of hours to finish the installation, depending on your machine and internet connection, since MacPorts will download and compile everything as it goes.
  • If you like having the bleeding-edge versions, note that it can take a while for them to be released on MacPorts.
  • Finally, MacPorts can create conflicts between different Python interpreters installed on your system.

Using Apple’s Python interpreter and pip

If you feel adventurous, you can use Apple’s built-in Python interpreter and install everything using pip. Please follow the instructions described in this blog.

If you run into trouble

Leave a comment here with the issue you found.

Wednesday, August 27, 2014

Distributed arrays for parallel applications

I recently came across a very promising module: DistArray. The idea behind DistArray is to provide general multidimensional NumPy-like distributed arrays to Python. It intends to bring the strengths of NumPy to data-parallel high-performance computing. Neat!

Some examples for easily creating distributed arrays are given in this IPython notebook.

Unfortunately, I have not been able to test DistArray so far because I am getting errors on my system, probably related to my MPI installation.

Monday, July 21, 2014

Frequentism and Bayesianism

Jake VanderPlas has been writing a series of posts discussing frequentism and Bayesianism. They are well-written, clear and insightful, and they use IPython for the statistical analysis. For convenience, I have compiled his posts on the topic here.

Frequentism and Bayesianism: A Practical Introduction
where he synthesizes the philosophical and pragmatic aspects of the frequentist and Bayesian approaches as they relate to the analysis of scientific data.

Frequentism and Bayesianism II: When Results Differ
where he discusses the difference between the frequentist and Bayesian approaches in the treatment of nuisance parameters.

Frequentism and Bayesianism III: Confidence, Credibility, and why Frequentism and Science do not Mix
where he discusses the subtle difference between frequentist confidence intervals and Bayesian credible intervals.

Frequentism and Bayesianism IV: How to be a Bayesian in Python
where he describes how to do Bayesian statistics in Python with emcee, PyMC and PyStan.


Thursday, May 15, 2014

Linear regression with errors in X and Y with Python: BCES

I finally had the chance to upload my BCES linear regression Python code to GitHub!

If you need to do linear regression with measurement errors in X and Y, including intrinsic scatter, please check it out. Even better, if you have suggestions to improve or speed up the code, please contribute by all means!
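
For readers unfamiliar with the method, here is a minimal sketch of the BCES(Y|X) point estimate as I understand it from Akritas & Bershady (1996): the ordinary least-squares slope with the measurement-error variances subtracted out. This is not the code from the repository; the function name bces_yx and its arguments are illustrative, and the uncertainties on the fitted parameters are not computed.

```python
def bces_yx(x, y, x_err, xy_cov=None):
    """Slope and intercept of y = slope * x + intercept, BCES(Y|X) style.

    x_err holds the 1-sigma measurement errors on x; xy_cov holds the
    covariance between the x and y errors of each point (zero if omitted).
    Errors on y only affect the parameter uncertainties, which this
    sketch does not compute.
    """
    n = len(x)
    if xy_cov is None:
        xy_cov = [0.0] * n
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sample (co)variances, written as sums because the 1/n factors cancel.
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var_x = sum((xi - x_mean) ** 2 for xi in x)
    # Subtract the contribution of the measurement errors.
    slope = (cov_xy - sum(xy_cov)) / (var_x - sum(e ** 2 for e in x_err))
    intercept = y_mean - slope * x_mean
    return slope, intercept


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 5.0, 7.0]
    print(bces_yx(x, y, x_err=[0.0] * 4))  # reduces to ordinary least squares
    print(bces_yx(x, y, x_err=[0.5] * 4))  # errors in x steepen the slope
```

With zero errors the estimate reduces to ordinary least squares. Errors in x inflate the observed variance of x, so removing them from the denominator corrects the slope upward.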

Wednesday, January 29, 2014

LaTeX tricks

Here is a collection of useful LaTeX tricks I have been using in my papers.

How to add "sticky note" annotations to your manuscript

In the preamble of the document, add
\usepackage{todonotes}


For sticky notes:
\todo{your sticky note comment}



For notes placed in the text:
\todo[inline]{your sticky note comment}


For a placeholder box that reminds you to add a figure there later:
\missingfigure{figure description}


Read this for more options and the documentation of the todonotes package.


How to highlight or cross-out text

In the preamble of the document, add
\usepackage{color, soul}
\setstcolor{red}


For highlighted text:
\hl{your highlighted text}


For crossing-out text:
\st{your crossed-out text}


Check this out for more options.



For commenting out larger chunks of text

In the preamble of the document, add
\usepackage{verbatim}


Then put the text you want to comment out inside the environment:
\begin{comment}
everything you do not want to show up in the document
\end{comment}

This can be much more convenient and faster than adding the % character to the beginning of each line.


Thursday, January 16, 2014

Video tutorial: Bayesian data analysis with PyMC 3

I highly recommend the tutorial by Thomas Wiecki on using PyMC 3 to perform Bayesian data analysis. Some of the cool things he demonstrates in this ~50 min video:

  • summary of Bayesian analysis and Bayes' theorem
  • application for parameter estimation for a few introductory examples: coin-flipping experiment, simple linear regression
  • new features of PyMC 3 with respect to v2
  • how to construct a model in PyMC 3 and a few notes on samplers
  • more advanced example: application to time series of correlated stocks, then linear regression of correlated stocks with a time-dependent slope (!!)
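
To make the coin-flipping example concrete without reproducing the video, here is a minimal sketch that uses the analytic Beta-Binomial result rather than PyMC 3 or any sampler. It assumes a Beta prior on the probability of heads; the function names are my own.

```python
def beta_posterior(heads, tails, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with observed coin flips.

    The Beta prior is conjugate to the binomial likelihood, so Bayes'
    theorem gives another Beta distribution with updated parameters.
    """
    return prior_a + heads, prior_b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Uniform prior (a = b = 1), then 7 heads and 3 tails are observed.
    a, b = beta_posterior(heads=7, tails=3)
    print("posterior: Beta(%g, %g), mean %.3f" % (a, b, beta_mean(a, b)))
```

A sampler such as those in PyMC 3 becomes necessary once the model has no closed-form posterior like this one.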