Using GAWK to Get Through CTPP Data
August 18th, 2011The 3-year CTPP website lacks a little in usability (just try getting a county-county matrix out of it).
One of the CTPP staff pointed me to the downloads, which are a double-edge sword. On one hand, you have a lot of data without an interface in the way. On the other hand, you have a lot of data.
I found it was easiest to use GAWK to get through the data, and it was pretty easy:
gawk '/.*COUNTY_CODE.*/' *.csv >Filename.txt
Where COUNTY_CODE is the code from Pn-Labels-xx.txt where n is the part number (1,2, or 3) and xx is the state abbreviation.
NOTE: Look up the county code EACH TIME. Â It changes among parts 1, 2, and 3.
This command will go through all .csv files and output any line with the county code to the new file.
UPDATE
I have multiple counties to deal with. Â There’s an easy way to start on getting a matrix:
gawk '/C4300US.*(21037|21015|21117).*32100.*/' *.csv >TotalFlowsNKY.csv
This results in a CSV table of only the total flows from three Northern Kentucky counties (21037, 21015, 21117; Campbell, Boone, and Kenton county, respectfully). Â For simplicity’s sake, I didn’t include all 11 that I used.
Finishing Up
Then, I did a little Excel magic to build a matrix for all 11 counties and externals. Â The formula is shown. Â I have an additional sheet which is basically a cross reference of the county FIPS codes to the name abbreviations I’m using. Â See the image below (click for a larger version).
After this, I built a matrix in Excel. Â The matrix uses array summation (when you build this formula, you press CTRL+Enter to set it up right, else the returned value will be 0).
Using these techniques, I was able to get a journey to work matrix fairly quickly and without a lot of manual labor.
NOTE
You need to have GNUWin32 installed to use gawk.
You must be logged in to post a comment.