Tuesday, August 16, 2011

Reading ASCII tables with Python

A common task in Python is to read tables contained in ASCII files.

Table with only numbers

If the table consists only of numbers, it can be read with the function numpy.loadtxt. Suppose the table has two columns and you want to import them as the arrays x and y. You can use the commands

>>> import numpy
>>> x, y = numpy.loadtxt(file, unpack=True, usecols=(0, 1))

where file is a string holding the name of the file that contains the ASCII table. If you had three columns, you could use

>>> x, y, z = numpy.loadtxt(file, unpack=True, usecols=(0, 1, 2))

and so forth. Here usecols lists the (zero-based) indices of the columns to read. unpack=True returns each of those columns as a separate array instead of a single 2-D array.
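Here is a minimal, self-contained sketch of the same call. It writes a made-up three-column table to a temporary file and then reads it back by filename. The data and the variable name fname are only for illustration. The sketch also shows two further points: usecols does not have to pick adjacent columns, and a line starting with # is skipped because it is treated as a comment by default.

```python
import os
import tempfile

import numpy

# Made-up sample data; in practice fname would point at your own table.
rows = "# x    y     z\n1.0  10.0  100.0\n2.0  20.0  200.0\n3.0  30.0  300.0\n"

with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as f:
    f.write(rows)
    fname = f.name  # loadtxt is given the filename string, as in the text

try:
    # Two adjacent columns, unpacked into separate 1-D arrays.
    x, y = numpy.loadtxt(fname, unpack=True, usecols=(0, 1))
    # Columns need not be adjacent: take the first and third only.
    x2, z = numpy.loadtxt(fname, unpack=True, usecols=(0, 2))
finally:
    os.remove(fname)  # clean up the temporary file

print(x.tolist())  # [1.0, 2.0, 3.0]
print(z.tolist())  # [100.0, 200.0, 300.0]
```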

Unfortunately, with its default settings (dtype=float), numpy.loadtxt fails when the input file has columns with mixed data types (e.g., strings and floats), because it tries to convert every field to a number. In this case, you can use the asciitable module instead.
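The failure is easy to reproduce. This short sketch uses made-up sample data with one string column followed by three numeric columns. With the default dtype, numpy.loadtxt raises a ValueError on the string field. If you only need the numbers, you can avoid the error by leaving the string column out of usecols.

```python
import io

import numpy

# Made-up sample data: a string column followed by three numeric columns.
table = "opa 0.1 30. 50.\nopb 0.2 31. 51.\n"

try:
    # With the default dtype=float, the string 'opa' cannot be converted.
    numpy.loadtxt(io.StringIO(table))
    loaded_all = True
except ValueError as err:
    loaded_all = False
    print("loadtxt failed:", err)

# If only the numbers are needed, skipping the string column avoids the error.
redshift, ra, dec = numpy.loadtxt(io.StringIO(table), unpack=True, usecols=(1, 2, 3))
print(redshift.tolist())  # [0.1, 0.2]
```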

Table with numbers and strings

For tables that mix strings and numbers, use the asciitable module. If it is not already installed, it is straightforward to install it from the shell (not from the Python prompt):

$ easy_install asciitable

To read an ASCII file whose columns mix strings and numbers, and take the column names from the commented header line, use

>>> import asciitable
>>> t = asciitable.read('data/tab.dat', Reader=asciitable.CommentedHeader)

If the file tab.dat has the structure

#name x y z
opa 0.1 30. 50.
...

then the columns are automatically available as the arrays t['name'], t['x'], t['y'], and so on.
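To make the idea concrete, here is a rough standard-library sketch of what a commented-header reader does. It is not asciitable's implementation. The function names read_commented_header and to_number are hypothetical, the second data row is made up, and the result is a dict of plain lists rather than asciitable's table object.

```python
import io


def to_number(value):
    """Return float(value) if that succeeds, otherwise the original string."""
    try:
        return float(value)
    except ValueError:
        return value


def read_commented_header(stream):
    """Read a whitespace-separated table whose first line is '#name1 name2 ...'.

    Returns a dict mapping each column name to a list of values.
    """
    # The header line gives the column names once the leading '#' is removed.
    names = stream.readline().lstrip("#").split()
    columns = {name: [] for name in names}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any further comment lines
        values = line.split()
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(values)}: {line!r}")
        for name, value in zip(names, values):
            columns[name].append(to_number(value))
    return columns


t = read_commented_header(io.StringIO("#name x y z\nopa 0.1 30. 50.\nopb 0.2 31. 51.\n"))
print(t["name"])  # ['opa', 'opb']
print(t["x"])     # [0.1, 0.2]
```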

If you want to read the same file with your own names for the columns (or if the file has no header giving the column names), use the command

>>> t = asciitable.read('data/tab.dat', names=['name', 'redshift', 'ra', 'dec'], Reader=asciitable.NoHeader)

In this case the arrays are available as t['name'], t['redshift'], t['ra'], and so on.
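For comparison, here is a standard-library sketch of reading a table with caller-supplied column names. It is not asciitable's code. The function names read_no_header and to_number are hypothetical, the second data row is made up, and the result is a dict of plain lists. In this sketch, lines starting with # are skipped as comments, so a commented header line in the file is ignored rather than read as data.

```python
import io


def to_number(value):
    """Return float(value) if that succeeds, otherwise the original string."""
    try:
        return float(value)
    except ValueError:
        return value


def read_no_header(stream, names):
    """Read a whitespace-separated table, naming the columns from `names`.

    Lines starting with '#' are treated as comments and skipped.
    Returns a dict mapping each name to a list of values.
    """
    columns = {name: [] for name in names}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (including a header)
        values = line.split()
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(values)}: {line!r}")
        for name, value in zip(names, values):
            columns[name].append(to_number(value))
    return columns


data = "#name x y z\nopa 0.1 30. 50.\nopb 0.2 31. 51.\n"
t = read_no_header(io.StringIO(data), names=["name", "redshift", "ra", "dec"])
print(t["redshift"])  # [0.1, 0.2]
print(t["ra"])        # [30.0, 31.0]
```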
