- sudo pip-2.7 install tornado
- sudo fink install zmq-py27
Tips and tricks on using Python and associated tools for astronomical and scientific purposes. Written by an astronomer who uses Python on a daily basis to do science.
Monday, December 19, 2011
To get the new "notebook" functionality working in iPython 0.12 ...
... I needed to install these additional modules under Mac OS X Snow Leopard:
Friday, December 16, 2011
Script illustrating how to do a linear regression and plot the confidence band of the fit
The script below illustrates how to carry out a simple linear regression of data stored in an ASCII file, plot the linear fit and the 2 sigma confidence band.
The script invokes the confband method described here to plot the confidence bands. The confband is assumed to be lying inside the "nemmen" module (not yet publicly available, sorry) but you can place it in any module you want.
I got the test data here to perform the fitting.
After running the script you should get the following plot:
where the best-fit line is displayed in green and the shaded area is in gray.
The script below is also available at Github Gist.
The script invokes the confband method described here to plot the confidence bands. The confband is assumed to be lying inside the "nemmen" module (not yet publicly available, sorry) but you can place it in any module you want.
I got the test data here to perform the fitting.
After running the script you should get the following plot:
where the best-fit line is displayed in green and the shaded area is in gray.
The script below is also available at Github Gist.
import numpy, pylab, scipy
import nemmen
# Data taken from http://orion.math.iastate.edu/burkardt/data/regression/x01.txt.
# I removed the header from the file and left only the data in 'testdata.dat'.
xdata,ydata = numpy.loadtxt('testdata.dat',unpack=True,usecols=(1,2))
xdata=numpy.log10(xdata) # take the logs
ydata=numpy.log10(ydata)
# Linear fit
a, b, r, p, err = scipy.stats.linregress(xdata,ydata)
# Generates arrays with the fit
x=numpy.linspace(xdata.min(),xdata.max(),100)
y=a*x+b
# Calculates the 2 sigma confidence band contours for the fit
lcb,ucb,xcb=nemmen.confband(xdata,ydata,a,b,conf=0.95)
# Plots the fit and data
pylab.clf()
pylab.plot(xdata,ydata,'o')
pylab.plot(x,y)
# Plots the confidence band as shaded area
pylab.fill_between(xcb, lcb, ucb, alpha=0.3, facecolor='gray')
pylab.show()
pylab.draw()
Wednesday, December 14, 2011
Quick Python reference documents
These quick reference sheets will help you to quickly learn and practice data analysis, plotting and statistics with Python:
Python data analysis reference card (created by Fabricio Ferrari)
Python data analysis reference card (created by Fabricio Ferrari)
Generating random numbers from an arbitrary probability distribution using the rejection method
I am pretty used to generating random numbers from a normal distribution. There are built-in methods to carry out that task in Python and many other data analysis softwares.
But what if you need to produce random numbers based on a non-standard probability distribution? In my case that challenge appeared and I needed to produce random numbers from a power-law (?!) probability density function. Astronomy can be pretty interesting.
The method below solves that problem. It is not very efficient since I did not try to optimize it. But it gets the job done.
Let me know if you have a more clever way of solving this problem in Python or if you use the method and find bugs.
Update Oct. 23rd 2014: This code snippet is available (and updated) at github. Correction: pmin=0 (thanks PZ!)
But what if you need to produce random numbers based on a non-standard probability distribution? In my case that challenge appeared and I needed to produce random numbers from a power-law (?!) probability density function. Astronomy can be pretty interesting.
The method below solves that problem. It is not very efficient since I did not try to optimize it. But it gets the job done.
Let me know if you have a more clever way of solving this problem in Python or if you use the method and find bugs.
def randomvariate(pdf,n=1000,xmin=0,xmax=1):
"""
Rejection method for random number generation
===============================================
Uses the rejection method for generating random numbers derived from an arbitrary
probability distribution. For reference, see Bevington's book, page 84. Based on
rejection*.py.
Usage:
>>> randomvariate(P,N,xmin,xmax)
where
P : probability distribution function from which you want to generate random numbers
N : desired number of random values
xmin,xmax : range of random numbers desired
Returns:
the sequence (ran,ntrials) where
ran : array of shape N with the random variates that follow the input P
ntrials : number of trials the code needed to achieve N
Here is the algorithm:
- generate x' in the desired range
- generate y' between Pmin and Pmax (Pmax is the maximal value of your pdf)
- if y'<P(x') accept x', otherwise reject
- repeat until desired number is achieved
Rodrigo Nemmen
Nov. 2011
"""
# Calculates the minimal and maximum values of the PDF in the desired
# interval. The rejection method needs these values in order to work
# properly.
x=numpy.linspace(xmin,xmax,1000)
y=pdf(x)
pmin=y.min()
pmax=y.max()
# Counters
naccept=0
ntrial=0
# Keeps generating numbers until we achieve the desired n
ran=[] # output list of random numbers
while naccept<n:
x=numpy.random.uniform(xmin,xmax) # x'
y=numpy.random.uniform(pmin,pmax) # y'
if y<pdf(x):
ran.append(x)
naccept=naccept+1
ntrial=ntrial+1
ran=numpy.asarray(ran)
return ran,ntrial
Update Oct. 23rd 2014: This code snippet is available (and updated) at github. Correction: pmin=0 (thanks PZ!)
Calculating the prediction band of a linear regression model
The method below calculates the prediction band of an arbitrary linear regression model at a given confidence level in Python.
If you use it, let me know if you find any bugs.
- def predband(xd,yd,a,b,conf=0.95,x=None):
- """
- Calculates the prediction band of the linear regression model at the desired confidence
- level.
- Clarification of the difference between confidence and prediction bands:
- "The 2sigma confidence interval is 95% sure to contain the best-fit regression line.
- This is not the same as saying it will contain 95% of the data points. The prediction bands are
- further from the best-fit line than the confidence bands, a lot further if you have many data
- points. The 95% prediction interval is the area in which you expect 95% of all data points to fall."
- (from http://graphpad.com/curvefit/linear_regression.htm)
- Arguments:
- - conf: desired confidence level, by default 0.95 (2 sigma)
- - xd,yd: data arrays
- - a,b: linear fit parameters as in y=ax+b
- - x: (optional) array with x values to calculate the confidence band. If none is provided, will
- by default generate 100 points in the original x-range of the data.
- Usage:
- >>> lpb,upb,x=nemmen.predband(all.kp,all.lg,a,b,conf=0.95)
- calculates the prediction bands for the given input arrays
- >>> pylab.fill_between(x, lpb, upb, alpha=0.3, facecolor='gray')
- plots a shaded area containing the prediction band
- Returns:
- Sequence (lpb,upb,x) with the arrays holding the lower and upper confidence bands
- corresponding to the [input] x array.
- References:
- 1. http://www.JerryDallal.com/LHSP/slr.htm, Introduction to Simple Linear Regression, Gerard
- E. Dallal, Ph.D.
- Rodrigo Nemmen
- v1 Dec. 2011
- v2 Jun. 2012: corrected bug in dy.
- """
- alpha=1.-conf # significance
- n=xd.size # data sample size
- if x==None: x=numpy.linspace(xd.min(),xd.max(),100)
- # Predicted values (best-fit model)
- y=a*x+b
- # Auxiliary definitions
- sd=scatterfit(xd,yd,a,b) # Scatter of data about the model
- sxd=numpy.sum((xd-xd.mean())**2)
- sx=(x-xd.mean())**2 # array
- # Quantile of Student's t distribution for p=1-alpha/2
- q=scipy.stats.t.ppf(1.-alpha/2.,n-2)
- # Prediction band
- dy=q*sd*numpy.sqrt( 1.+1./n + sx/sxd )
- upb=y+dy # Upper prediction band
- lpb=y-dy # Lower prediction band
- return lpb,upb,x
Computing the scatter of data around linear fits
The Python method below computes the scatter of data around a given linear model.
Let me know if you find any bugs.
Let me know if you find any bugs.
def scatterfit(x,y,a=None,b=None):
"""
Compute the mean deviation of the data about the linear model given if A,B
(y=ax+b) provided as arguments. Otherwise, compute the mean deviation about
the best-fit line.
x,y assumed to be Numpy arrays. a,b scalars.
Returns the float sd with the mean deviation.
Author: Rodrigo Nemmen
"""
if a==None:
# Performs linear regression
a, b, r, p, err = scipy.stats.linregress(x,y)
# Std. deviation of an individual measurement (Bevington, eq. 6.15)
N=numpy.size(x)
sd=1./(N-2.)* numpy.sum((y-a*x-b)**2); sd=numpy.sqrt(sd)
return sd
Calculating and plotting confidence bands for linear regression models
This method calculates the confidence band of an arbitrary linear regression model at a given confidence level in Python. Using this method you can get plots like this:
where the shaded area corresponds to the 2sigma confidence band of the linear fit shown in green. A script illustrating how to use the confband method below is available.
If you use it, let me know if you find any bugs.
Note: the scatterfit method called below is available here.
where the shaded area corresponds to the 2sigma confidence band of the linear fit shown in green. A script illustrating how to use the confband method below is available.
If you use it, let me know if you find any bugs.
Note: the scatterfit method called below is available here.
Tuesday, August 30, 2011
Python code to create an ASCII file from arrays
Suppose you have the Numpy arrays X, Y and Z - which can contain strings, floats, integers etc - and you want to create an ASCII file in which each column corresponds to the elements of these arrays. The code snippet below allows you to do that.
In the example below, tmp.dat corresponds to the output file and X (strings), Y (floats) and Z (floats) are the arrays you want to export.
Be sure to modify the %s and %f if you change the array data types.
Thanks to Tyrel Johnson for the trick.
In the example below, tmp.dat corresponds to the output file and X (strings), Y (floats) and Z (floats) are the arrays you want to export.
file=open('tmp.dat','w') #create file
for x,y,z in zip(X,Y,Z):
file.write('%s\t%f\t%f\n'%(x,y,z)) #assume you separate columns by tabs
file.close() #close file
Be sure to modify the %s and %f if you change the array data types.
Thanks to Tyrel Johnson for the trick.
Sunday, August 21, 2011
References for learning Python
I list in this post the reading material that I used to learn Python. When I started learning it I had considerable previous experience with several programming languages. So the reading list below is targeted at people with some experience. In addition, my goal was to apply Python to science and data analysis.
If you don't have previous programming experience, one suggested starting point is the book Think Python: How to think like a computer scientist. It is available for free.
I learned the basic Python syntax with the official tutorial at http://docs.python.org/.
Fabricio Ferrari's lecture (in portuguese) illustrates what python is capable of for scientific data analysis and plotting. Inspiring.
If you don't learn object-oriented programming (OOP) you will not be able to explore the full potential of Python. I learned OOP from
Introduction to Programming using Python, Programming Course for Biologists at the Pasteur Institute, Schuerer et al. Very clear exposition.
Tutorial on using Python for astronomical data analysis. How to plot spectra, read and manipulate FITS files and much more. Highly recommended.
Tuesday, August 16, 2011
Reading ASCII tables with Python
A common task in Python is to read tables contained in ASCII files.
then the arrays with the data will be automatically stored as t['name'], t['x'], t['y'] etc.
If you want to read the same file but using your own name convention for the arrays (or in the case the file does not have a header specifying the column names) then use the command
>>> t=asciitable.read('data/tab.dat', names=['name','redshift','ra','dec'], Reader=asciitable.NoHeader)
The arrays will be available in this case as t['name'], t['redshift'], t['ra'] etc.
Table with only numbers
If the table consists only of numbers it can be read with the method numpy.loadtxt. Suppose the table has two columns and you want to import these columns as the arrays x and y. You can use the command
>>> x,y = numpy.loadtxt(file,unpack=True,usecols=(0,1))
>>> x,y = numpy.loadtxt(file,unpack=True,usecols=(0,1))
where file is a string representing the filename with the ASCII table. If you had three columns you could use
>>> x,y,z = numpy.loadtxt(file,unpack=True,usecols=(0,1,2))
and so forth.
Unfortunately, this method does not seem to work when the input file has columns with mixed data types (e.g., strings and floats). In this case, you need to use the asciitable module.
Table with numbers and strings
Numpy.loadtxt will not work in this case and you need to use the module ASCIItable to read the ASCII data file. If you don't have it pre-installed it is straightforward to do so:
>>> easy_install asciitable
To read an ASCII file with mixed columns containing strings and numeric data and using the information in header to name the columns:
>>> t = asciitable.read('data/tab.dat', Reader=asciitable.CommentedHeader)
If the file tab.dat has the structure
#name x y z
opa 0.1 30. 50.
...
>>> easy_install asciitable
To read an ASCII file with mixed columns containing strings and numeric data and using the information in header to name the columns:
>>> t = asciitable.read('data/tab.dat', Reader=asciitable.CommentedHeader)
If the file tab.dat has the structure
#name x y z
opa 0.1 30. 50.
...
then the arrays with the data will be automatically stored as t['name'], t['x'], t['y'] etc.
If you want to read the same file but using your own name convention for the arrays (or in the case the file does not have a header specifying the column names) then use the command
>>> t=asciitable.read('data/tab.dat', names=['name','redshift','ra','dec'], Reader=asciitable.NoHeader)
The arrays will be available in this case as t['name'], t['redshift'], t['ra'] etc.
Monday, August 15, 2011
How to install python, ipython, numpy, scipy etc on Mac OS X Lion: The MacPorts way
Note added on March 7th 2012: This tutorial is deprecated. Please refer instead to this updated tutorial.
I realized that by using MacPorts to install Python, as described in this tutorial, I am mixing libraries installed via MacPorts and the ones installed with easy_install which can lead to horrible incompatibility issues.
The updated tutorial describes how to get a working Scipy + Numpy + iPython + matplotlib installation using the built-in OS X Python and pip/easy_install.
I realized that by using MacPorts to install Python, as described in this tutorial, I am mixing libraries installed via MacPorts and the ones installed with easy_install which can lead to horrible incompatibility issues.
The updated tutorial describes how to get a working Scipy + Numpy + iPython + matplotlib installation using the built-in OS X Python and pip/easy_install.
New blog
I decided to create this blog to share my experience using Python, IPython and associated modules and packages in daily scientific work. The kind of things I do with Python are: data analysis, numerical analysis, programming (structured and object-oriented), scripting, plotting (with publication quality), handling of ASCII/FITS tables, interfacing with Fortran code etc.
I started using Python 1.5 year ago and I have had an extremely good experience with it! Before that, I used to use mainly IDL for my scientific needs. Since then, I became an advocate for using Python for scientific purposes and replacing IDL with Python (when possible of course).
Expect blog updates on a weekly basis. As I discover new tricks or write interesting methods which I think should be useful to other people, I will try to keep updating this blog (when my work allows me to do so).
A clarification: this blog is not associated in any way with astropython.org.
A clarification: this blog is not associated in any way with astropython.org.
Subscribe to:
Posts (Atom)