A few months ago, while the New York State Senate was debating a solution for the MTA budget shortfall, The Open Planning Project helped parse MTA budget data into a machine-searchable format. The MTA originally published the budget as a PDF. To extract the data, I used a utility called pdftohtml to convert it into an XML document. I then used the Python library lxml to convert that document into a set of CSV files. The results of this labor can be seen on TOPP’s data site.
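As a rough illustration of that XML-to-CSV step, here is a minimal sketch. It is not the script I actually ran: it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra installs, the sample XML is a made-up fragment shaped like pdftohtml's XML output (`text` elements carrying `top` and `left` attributes), and the function names `xml_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up fragment in the general shape of pdftohtml's XML output.
# The labels and numbers are placeholders, not real MTA figures.
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <text top="100" left="50" width="80" height="12" font="0">Category</text>
    <text top="100" left="200" width="30" height="12" font="0">2009</text>
    <text top="120" left="200" width="30" height="12" font="0">1.5</text>
    <text top="120" left="50" width="80" height="12" font="0">Item A</text>
  </page>
</pdf2xml>"""


def xml_to_rows(xml_text):
    """Group text fragments that share a vertical position into table rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        lines = defaultdict(list)
        for node in page.iter("text"):
            content = "".join(node.itertext()).strip()
            if content:
                # Assumption: cells on one table row share the same "top" value.
                lines[int(node.get("top"))].append((int(node.get("left")), content))
        for top in sorted(lines):
            # Order the cells of each row from left to right.
            rows.append([text for _, text in sorted(lines[top])])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE_XML)))
```

Real budget PDFs are messier than this: cells on the same row can differ in `top` by a pixel or two, so a real script would likely need some tolerance when grouping.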
Soon after I published this data, a number of people told me that it would be more useful if presented in other formats. At first I just wrote a handful of command-line Python scripts that would suck in these CSV files and spit them out in different formats. I quickly realized that I could collect these scripts into a quick and dirty web application.
Over a few train rides I created an application called DataIO, and this week I finally got a chance to upload it to Google App Engine. Specifically, I received three requests for data in different formats. I’ll give examples using the data set containing the MTA’s annual labor expenses.
- JSON for Flot:
The “base_column” query string parameter represents the column in the CSV file that will be used for the legend of the graph. The “base_row” parameter represents the row in the CSV file that contains the values for the x-axis of the graph.
It’s not obvious how that JSON will display, so DataIO allows you to preview the graph by adding a “preview” query string argument:
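To make the roles of “base_column” and “base_row” more concrete, here is a minimal sketch of how a CSV file could be turned into the list of series that Flot plots (each series is an object with a `label` and a `data` list of `[x, y]` pairs). This is not DataIO's actual code: the function name `csv_to_flot`, the zero-based indexing of the two parameters, and the sample values are all assumptions made for illustration.

```python
import csv
import io
import json

# Placeholder data, not real MTA figures.
SAMPLE_CSV = "Category,2008,2009\nItem A,1.0,2.0\nItem B,3.0,4.5\n"


def csv_to_flot(csv_text, base_column=0, base_row=0):
    """Build a list of Flot series from CSV text.

    base_column: index of the column whose cells become legend labels.
    base_row: index of the row whose cells become the x-axis values.
    (Treating both as zero-based indexes is an assumption of this sketch.)
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    x_values = [
        float(cell) for i, cell in enumerate(rows[base_row]) if i != base_column
    ]
    series = []
    for row_index, row in enumerate(rows):
        if row_index == base_row:
            continue  # the x-axis row is not a data series
        y_values = [float(cell) for i, cell in enumerate(row) if i != base_column]
        series.append(
            {
                "label": row[base_column],
                "data": [[x, y] for x, y in zip(x_values, y_values)],
            }
        )
    return series


if __name__ == "__main__":
    print(json.dumps(csv_to_flot(SAMPLE_CSV), indent=2))
```

Each remaining row becomes one line on the graph, labeled by its cell in the base column and plotted against the values from the base row.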
- Google Charts:
which returns the URL for the following image:
- Data multiplied by a factor:
The MTA publishes all of their financial data in millions of dollars. Often it is useful to see the data in other units, such as dollars:
or in millions of Euros:
The number to multiply by is passed in via the multiplication_factor argument, and the multiplication_start_row argument tells DataIO not to multiply the first row by the factor.
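The idea behind those two arguments can be sketched in a few lines of Python. Again, this is an illustration rather than DataIO's actual code: the function name `multiply_rows` is hypothetical, and I am assuming here that multiplication_start_row is the zero-based index of the first row that should be multiplied, and that cells which are not numbers (such as labels) are passed through unchanged.

```python
def scale_cell(cell, factor):
    """Multiply a cell by factor if it parses as a number; otherwise keep it."""
    try:
        return float(cell) * factor
    except ValueError:
        return cell  # labels and other non-numeric cells pass through


def multiply_rows(rows, multiplication_factor, multiplication_start_row=1):
    """Scale every numeric cell from multiplication_start_row onward.

    Rows before multiplication_start_row are copied untouched, which keeps
    values such as years in the first row from being multiplied.
    """
    result = []
    for index, row in enumerate(rows):
        if index < multiplication_start_row:
            result.append(list(row))
        else:
            result.append([scale_cell(cell, multiplication_factor) for cell in row])
    return result


if __name__ == "__main__":
    # Placeholder data, not real MTA figures: values in millions of dollars.
    table = [["Category", "2009"], ["Item A", "1.5"]]
    print(multiply_rows(table, 1_000_000))
```

Without the start row, the "2009" in the first row would be treated as a value and scaled along with the data.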