Originally published at: http://www.sitepoint.com/using-python-parse-spreadsheet-data/
When you work on web applications for large organizations and enterprises, I am sure you have faced this unique problem. Organizations often store data in spreadsheets and you need to provide an interface for them to enter this data into your web application. The general idea is to upload the file, read its contents, and store it in either files or databases that your web application uses. You may also need to export data from your web applications. For instance, how would you export the marks of all the students in a class? Again, spreadsheets are the preferred medium.
In this post, we will discuss different ways to handle these files and parse them to get the required information using Python (2.7).
A Quick Spreadsheet Primer
Before parsing spreadsheets, you must understand how they are structured. A spreadsheet file is a collection of sheets and each sheet is a collection of data cells placed in a grid, similar to a table. In a sheet, a data cell is identified by two values — its row and column numbers.
For instance, in the given screenshot, the spreadsheet contains only one sheet, “Query1”. The cell “2A” corresponds to the 2nd row and 1st column. The value of cell 2A is “12510001”.
Although programs with a GUI assign letters to the names of columns, when we parse the data, we will start row and column numbers from 0. That means, cell 2A will correspond to (1, 0), 4B to (1,3), 3C to (2, 2), and so on.
A Quick Summary of CSV Files
CSV stands for “comma-separated values” (or sometimes character-separated if the delimiter used is some character other than a comma) and the name is pretty self explanatory. A typical CSV file looks like the following:
"ID", "Name", "Age" "1", "John", "11" "2", "Mary", "12" "3", "Sherlock", "13"
You may convert spreadsheets to CSV files to ease the parsing. CSV files can be parsed easily using the csv
module in Python.