Converting Cube Transit Line Files to Shapefile with Python

May 21st, 2024

I’m fairly certain this is not the first time I’ve written a script like this, but it’s the first time in a long time. This script should work out of the box for most Cube transit line files, even with differences in attributes used.

The script is a function that takes the transit line file, the line indicator (which will always be “LINE” when using PT-format transit), a key id parameter that I believe I coded out (it would have been an index in the dataframe), and a table of coordinates indexed by node number.

import pandas as pd
import numpy as np
import re
import geopandas as gpd
from shapely.geometry import Point, LineString

def read_card(input_file, group_id, key_id, nxy_table):
    with open(input_file, 'r') as fr:
        lines = fr.readlines()
    lines = [line.rstrip('\n') for line in lines]
    # skip blank lines and comment lines (Cube comments start with ';')
    lines = [line for line in lines if line and line[0] != ';']
    lines = ''.join(lines)
    lines = lines.split(group_id)
    # parse each KEYWORD=value pair into a dictionary
    lines = [dict(re.findall(r'(\S+)\s*\=\s*(.*?)\s*(?=\S+\s*\=|$)', line)) for line in lines]
    out_lines = []
    for line in lines:
        if 'NAME' in line.keys():
            x = {}
            x['route_id'] = line['NAME']
            for k, v in line.items():
                if k not in ['NAME', 'N']:
                    x[k] = v.replace('"', '').replace(',', '')
            # look up the X/Y of each node on the route (negative nodes are non-stop nodes)
            coords = nxy_table.loc[[abs(int(n)) for n in line['N'].replace('\n', '').replace(' ', '').split(',')]]
            geom = LineString([tuple(c) for c in coords.to_numpy()])
            x['geometry'] = geom
            out_lines.append(x)
    return gpd.GeoDataFrame(out_lines)

transit_lines = read_card(r'path\to\transit.lin', 'LINE', 'NAME', nodes[['N', 'X', 'Y']].set_index('N'))
transit_lines.to_file('trn_routes.shp')
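One caveat: the GeoDataFrame comes back without a coordinate system, so the shapefile is written without a .prj. If you know the projection your node X/Y values are in, it can be set before writing. A minimal sketch, where the EPSG code is just a placeholder:

transit_lines = transit_lines.set_crs(epsg=3735)  # placeholder EPSG code; use the one your node coordinates are in
transit_lines.to_file('trn_routes.shp')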

Dumping TransCAD Matrices to OMX

October 20th, 2022

I ended up in a position where I needed to dump a bunch of TransCAD matrix files to OMX files. This is a useful little script that scans all the matrices in a folder (presuming they use the .mtx file extension) and writes them out as OMX files.

Macro "convertMtx"(Args)
    base_folder = "C:\\your\\path\\here"
    di = GetDirectoryInfo(base_folder + "\\*.mtx", "File")
    for i = 1 to di.length do
        m = OpenMatrix(base_folder + "\\" + di[i][1], )
        matrix_info = GetMatrixInfo(m)
        parts = SplitPath(base_folder + "\\" + di[i][1])
        omx_filename = parts[1] + parts[2] + parts[3] + ".omx"
        mc = CreateMatrixCurrency(m, matrix_info[6].Tables[1], , , )
        new_mat = CopyMatrix(mc, {{"File Name", omx_filename}, {"OMX", "True"}})
    end
endMacro

I’m not sure if there is a better way to do this, but this works well enough once compiled and run in TransCAD 9.
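If you want to spot-check the results, the OMX files open fine with the openmatrix package in Python. A minimal sketch (the filename here is just an example):

import openmatrix as omx

f = omx.open_file('highway_skims.omx')  # example filename; use one of the converted files
print(f.list_matrices())                # table names carried over from the .mtx
print(f.shape())                        # zone dimensions
f.close()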

Anaconda Hacks (for ActivitySim… sort of)

August 16th, 2021

Anaconda is making me hate Python a little less… only a little. If I’m going to work with ActivitySim (which I am), it’s necessary.

The first hack is setting MKL_NUM_THREADS automatically. According to this page, it should be set to one thread to avoid things going nuts (that’s a technical term). Doing this is pretty easy, since Anaconda is set up to run all the scripts in a folder upon activation (there’s a deactivation folder, too). To get to this, open the Anaconda prompt and activate the appropriate workspace. Then type echo %conda_prefix%. Copy (highlight and right-click) the resulting path and paste it into a Windows Explorer window. Navigate to etc\conda\activate.d. Make a file (e.g. asr-activate.bat) – this is a Windows batch file that will run when the workspace is activated.

In this batch file, add set MKL_NUM_THREADS=1 (note: NO SPACES AROUND THE EQUALS SIGN!). This will set the variable whenever the workspace is activated.
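A quick way to confirm the hook took effect is to check the environment from Python after activating the workspace (nothing ActivitySim-specific here, just a sanity check):

import os

# Should print '1' once the activate.d batch file has run.
print(os.environ.get('MKL_NUM_THREADS'))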

Another hack in that same file is sending the command line to your working directory (I have a workspace for each client, since we sometimes have to use different versions of ActivitySim or other packages), so I added (each on its own line) D: and cd D:\Projects\Clients\FavoriteClient\Tasks\asim_setup. Now, when I open the Anaconda prompt and activate that workspace, it automatically sends me where I want to be.

If you ever need to unset MKL_NUM_THREADS, navigate ‘up a level’ and into deactivate.d (%conda_prefix%\etc\conda\deactivate.d), make a deactivate script (e.g. asr-deactivate.bat) containing set MKL_NUM_THREADS= and it will clear the variable when the workspace is deactivated.

Note: There are probably several files already in the activate.d and deactivate.d folders, and you might notice that there’s a bunch of .sh files. These are for Linux (and probably MacOS). They work the same way, but I didn’t mess with them because I’m running on a Windows 10 computer.

Geoprocessing In R – Identify Points on a Polygon Layer From Survey Data

August 24th, 2018

What happens when you have a bunch of survey data with GPS points, and you want to do Geoprocessing?

This process sucks as bad as my picture of my pencil sketch!

THERE IS AN EASIER WAY!

To start – I’m not sure if this is truly necessary but it seems like a good idea – convert your identity layer (TAZs, for example) to the same coordinate system as your points (likely WGS 1984, which matches GPS coordinates).

Then, load the sf package in R:

install.packages("sf") #only needs to be done once
library(sf)

To read the identity layer, use:

taz = st_read("gis/tazlayer.shp")

Once that’s loaded, doing the identity process is simple:

joined_df = st_join(st_as_sf(surveydf, coords = c("LongitudeFieldName", "LatitudeFieldName"), crs = 4326, agr = "field"), taz["TAZ"])

What this does:

  • st_as_sf is the function to turn another object into an sf object
  • The surveydf is the survey dataframe (you’ll want to change this to match whatever you have)
  • coords = c(“LongitudeFieldName”, “LatitudeFieldName”) tells st_as_sf the names of the coordinate fields
  • crs = 4326 tells st_as_sf that the coordinates are in WGS1984 – if your coordinates are in another coordinate system (state plane, for example), you’ll need to change this
  • agr = “field” tells st_as_sf the attribute-to-geometry relationship. In this case, the attributes are constant throughout the geometry (because it’s a point)
  • The taz[“TAZ”] is the second part of the join – we’re joining the points to the TAZ layer and only taking the TAZ field (this could be expanded with something like taz[c(“TAZ”, “AREATYPE”)])

One caveat – the return of this (what joined_df will be after the above function is run) is a collection of geometry objects, so when joining to table data, it is best to take a data frame of the object… a la:

library(dplyr)

df_out = df_in %>%
    left_join(as.data.frame(joined_df), by = "idField")

This is much faster than loading ArcMap or QGIS to do this work, and it keeps everything in one script, which makes life easier.

Building Better Desire Lines in QGIS (using AequilibraE)

July 20th, 2018

Ever build desire lines that just SUCK?

This is just nasty

There’s a solution – AequilibraE’s Delaunay Triangles. Pedro’s method can turn the above into this…

Same data as above – but much more intelligible

The One-Time Startup Process

  1. Install QGIS 2.18.12 from www.qgis.org
  2. Make sure it runs correctly (see the notes below)
  3. Activate the AequilibraE plugin (Plugins - Manage and Install Plugins – Check “AequilibraE”)

The Process

  1. Import the shapefile geography – the Add Vector Layer on the left side
  2. Import the data table – same button as above (see notes below)
  3. Convert the data table to a matrix (probably optional, but a good step to do) – AequilibraE – Data – Import Matrices
    1. Select ‘Open Layer’, make sure Matrix is the data table, and that the From, To, and Flow match the fields in the data table
    2. Click on ‘Load’, it’ll load the data
    3. Click on ‘Save’ and save the data to an aem file
    4. BE PATIENT! Mine will hang on ‘Reblocking matrices’ for quite a while and will not write to the file for several minutes, but the CPU will still be getting drilled by QGIS. The window will close itself when complete.
  4. Open AequilibraE – GIS Tools – Desire Lines
    1. Zone or Node Layer should be your zone geography
    2. ID Field should be whatever field your TAZ numbers are in
    3. Click on ‘Load Matrices’ and select your aem file
    4. Make sure ‘Delaunay Triangles’ is selected. Unless you want a mess.
    5. Click on ‘Build Desire Lines’
    6. Be patient – it can take a few
    7. Visualize the resulting desire lines (e.g. put a width on them or a color)

 

Notes

  1. I had a lot of problems with DLLs not loading and various things not being available. To remedy this, I had to fix my PATH, PYTHONHOME, and PYTHONPATH environment variables. In my case, I put %cpython% at the end of my PATH statement (’…C:\RBuildTools\3.5\bin;%cpython%’) and I rename the cpython variable as necessary (I have a cpython, which is the current-in-use, and cpython.arcgis). I use a similar tactic with PYTHONPATH and PYTHONHOME.
  2. I had a few issues with data I was exporting from R. Make sure you have no N/A values in your data. It’s not a bad idea to check it in another program before using it in AequilibraE.

RMarkdown Reports with Good Looking Tables

February 7th, 2018

This is how a table should look


Like it says, this is *not* how a table should look!

It seems to me the only purpose of using RMarkdown is to make nice looking reports from data (and don’t get me wrong, that’s EXTREMELY IMPORTANT – what good is data if it can’t be communicated). Graphics look nice, but sometimes you need to see a table.

Enter two things: kableExtra and LaTeX.

kableExtra is simple to use:

library(kableExtra)
knitr::kable(theTableToShow, "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"))

Yeah, it’s that simple! And there are options, too.

However, using Greek letters is a tad (only a tad) more difficult:

row.names(okisum) = c("Region", "$\\sum Volume$", "$\\sum AADT$", "$\\Delta$", "Pct", "$r^2$")

The “$\sum” refers to the symbols listed on this page (and there are more – just do a web search for LaTeX Math Symbols and there are hundreds of guides on what is available).  The one change is the extra backslash needed in R to escape the text.

The last part is the text formatting. I did this with the r format command:

format(round(as.numeric(table$AADT), 0), nsmall=0, big.mark=",")
paste(format(round(as.numeric(100 * table$Pct), 2), nsmall=0, big.mark=","), "%", sep="")

Note that this needs to be the last step, since the output of format is text. The first line rounds to no decimal points and adds a comma. The second line rounds 100*percentage to 2 decimal points and pastes a percent sign after it with no separator.

While this may seem menial, I’ve heard of many stories where management/leaders/etc didn’t want to believe something scrawled on a piece of paper, but when the same information was printed nicely on greenbar dot matrix paper, it was accepted as excellent work.

Debugging Programs on Remote Systems with Windbg

November 14th, 2016

Recently, we ended up with problems with a program that writes a report from the model, but the problem only occurs on the model server.  The error messages were nice and descriptive…

This is the primary error shown on the screen. I can assure you that nothing in the lower text area (that can be scrolled) indicates what the error actually was.

This is only slightly helpful.

Enter the Windows Debugger, windbg.

On the server (where this program was running), we had the executable file, the PDB file (which is built with the executable), and the source code (which I don’t think was used).

We opened the debugger and used it to open the executable file.

The commands generally went like this:

.sympath+ C:\Modelrun\Model81\
.srcpath C:\Temp\ModelReport76\ModelReport76
(debug menu - resolve unqualified symbols)
.reload
x ModelReport76!*
g
!analyze -v

The first two lines set the symbol path to the PDB file and the source path to the code.  The x ModelReport76!* command forces the program’s symbols to load.  ‘g’ tells the debugger to run the program.  !analyze -v dumps an analysis to the screen.

The analysis didn’t really tell us anything, but what did was what ended up on the actual program window:

Note the source code filename and the line…

On that particular line of the source code, the program is attempting to create an Excel sheet.  The model server does not have Excel/Microsoft Office, so that’s likely the problem.

Planes, Trains, and Automobiles

May 6th, 2014

It’s not Thanksgiving, it’s ITM.

Figures.  The fat guy has the mustache.

I figured that since nothing has really been making it to the blog lately (that will change starting now!), I should mention my way of getting from Cincinnati (CVG) to Baltimore (BWI) and back.

First off, the flights from Cincinnati have been horrible ever since Delta decided to functionally de-hub the airport.  It seems I’m always making compromises for my DC trips, and the flights from CVG to BWI are late in the day arriving after 5:00 PM.  The workshops on Sunday were at 1:00 PM, and there were things I didn’t want to miss.

Being the problem solver I am, I looked into other options.  I figured out something, and I’m writing this the day before I leave and thinking “nothing can possibly go wrong here, right?”.  The day-of-travel additions will be in italics.  Or pictures.

Automobiles: Leg 1 - Home to Airport

This is the most control I’ll have today.  I drive my little red S-10 to the parking lot across the freeway from the airport and take their shuttle from the lot to the arrivals area.  I’ve used this service (Cincinnati Fast Park & Relax) dozens of times without issues.  They rock.  God, I hope they’re awake at 4:30 AM!!!

They were awake and I got a pretty good parking spot near the front!

Planes: Leg 2 – CVG to Washington Reagan National

That is correct.  Flights from CVG to DCA are extraordinarily early (and dear reader, as you read the rest you’ll see why this is critically important).  This is a US Airways flight.  I learned recently (for the TRB Annual Meeting in January) to print my boarding pass early and fit everything into a carry-on.  Since they are now owned by Unamerican Airlines their baggage counter will have a line stretching out to downtown Cincinnati (about the distance of a half marathon!) while every other airline has few people in line and FAR BETTER SERVICE.

Thank God I printed my boarding pass in advance.  This was the line at the US Airways check-in counter.  I realized after my last trip to the TRB Annual Meeting in Washington that pre-paying for checked baggage is of no help.  In fact, I think some of these people were in line since last January.  US Airways sucks, especially now that they are part of American Airlines, which is the WORST airline I have ever flown.  


The front of the line has been there since January 12, 2014. I’m sure of it.

Just to show that this isn’t the norm, here is a view of the lines at the next few airlines… and yes, American is like this too!  Hooray for efficiency!… wait…. 


United may break guitars, but they won’t let you miss your flight to check a bag or just print your boarding pass!

Trains 1: Leg 3: DCA to Union Station

I’ve almost become a DC resident; this is my third time to DC this year.  I am a card-carrying WMATA rider, as my employer makes me do paperwork to rent a vehicle, and for anything in DC they’ll probably (rightly) tell me to take the train.  I’ve grown accustomed to it, and WMATA will probably be happy that I’ll be bringing my card up from -$0.45 to something positive.

I’ve done the Yellow/Blue -> Red line drill many times.  This will be cake.  Famous Last Words.

Thank God they didn’t penalize me for being -$0.45… yes, that’s a negative sign!  I immediately put $5 onto the card which took care of me for the rest of the trip.

Trains 2: Union Station to Penn Station

This is interesting to me because I’ve only been on subways in Chicago, Atlanta, and DC.  The only surface trains I’ve been on were Chicago (the El) and amusement trains (one north of Cincinnati and virtually every amusement park I’ve been to).  I’ve never been on a train like the MARC train.  I don’t know what to expect.

You might be a transportation planner if you know which subways you’ve been on and you look forward to a new experience on a commuter train.  You might also be a transportation planner if you’ve navigated more than one subway system while inebriated.

It was interesting to see complaints about these trains on Twitter the day after I returned from ITM.  There’s legroom and the train was surprisingly smooth and quiet.  Definitely not what I expected!


This was interesting to me, as this is my first time in a train station that tells you to use different doors and tracks. I definitely appreciated that they put messages on the bottom of the screen that were basically closed-captions of what was going over the difficult-to-hear loudspeaker!


I was on the upper level of a train (cool!) looking down on a smaller train (cool!)

 


This is as much leg and butt room as first class on an airline. Way more than cattle class!


Legroom!!!

Trains 3: Penn Station to Camden Yards

Why stop the streak with a bus now?  I’m jumping on the light rail.  Truthfully, there are reasons why.  I wouldn’t hesitate to ride a bus in Cincinnati to a new area, because nothing in Cincinnati is really that new to me, but in an area I don’t know, I’d prefer set stops that are announced over guessing when to pull a stop-request cable to get the bus driver to stop.

I wasn’t even that weirded out by the guy that kept talking to me despite little acknowledgement from me.  He was a vet and didn’t ask for money, so he wasn’t terrible… but he should probably consider holding off the details of getting busted by the transit police for an outstanding misdemeanor warrant!!!

The Return

Everything you read here went in reverse except the light rail.  On Tuesday (ITM ended on Wednesday), I got outside to run at about 6:15 AM, and sometime between the end of the run and me coming out of the shower and going downstairs for breakfast it started raining and never did stop.  So when a gentleman checking out of the hotel next to me asked for a cab to Penn Station, I immediately asked if he wanted to split the fare and he accepted.  

Blog Preview

Coming up on the blog, not sure in what order:

  • Race Reports for the Little King’s 1 Mile and the Flying Pig Half Marathon
  • GPS Survey Processing and Additional Investigations
  • Innovations Conference Recap
  • ISLR Fridays starting sometime soon

Iterating Through DBFs – R Style!

March 6th, 2014

Anyone familiar with transportation modeling is familiar with processes that iterate through data.  Gravity models iterate, feedback loops iterate, assignment processes iterate (well, normally), model estimation processes iterate, gravity model calibration steps iterate, shadow cost loops iterate… the list goes on.

Sometimes it’s good to see what is going on during those iterations, especially with calibration.  For example, in calibrating friction factors in a gravity model, I’ve frequently run around 10 iterations.  However, as an experiment I set the iterations on a step to 100 and looked at the result:

This is the mean absolute error in percentage of observed trips to modeled trips in distribution.  Note the oscillation that starts around iteration 25 – this was not useful nor detected.  Note also that the best point was very early in the iteration process – at iteration 8.

After telling these files to save after each iteration (an easy process), I faced the issue of trying to quickly read 99 files and get some summary statistics.  Writing that in R was not only the path of least resistance, it was so fast to run that it was probably the fastest solution.  The code I used is below, with comments:

Logsum Issues

February 11th, 2014

I’ve been working through distribution in the model, and I was having a little bit of trouble.  As I looked into things, I found one place where QC is necessary to verify that things are working right.

The Logsums.

I didn’t like the shape of the curve from the friction factors I was getting, so I started looking into a variety of inputs to the mode choice model.  Like time and distance by car:

This is a comparison of distance. The red line is the new model, the blue and green are two different years of the old model.

This is a comparison of zone-to-zone times. The red line is the new model, the blue and green are different years of the old model.

In both cases, these are as expected.  Since there are more (smaller) zones in the new model, there are more shorter times and distances.

The problem that crept up was the logsums coming from the mode choice model for use in distribution:

These are the logsums from the old model. Notice that the curve allows for some variation.

These are the logsums in the new model. This is a problem because of that ‘spike’.

I put all the logsums on this; notice how the curve for the old model is dwarfed by the spike in the new model. This is bad.

So the question remains, what went wrong?

I believe the ultimate problem was that there was no limit on Bike and Pedestrian trips in the model, so it was generating some extreme values, and somewhere an infinity was happening in those modes, causing the curve shown above.  I tested this by limiting the pedestrian trips to 5 miles (a fairly extreme value) and bike trips to 15 miles and re-running.  The logsums looked very different (again, the new model is the red line):

This is a comparison between the two model versions with fixed bicycle and pedestrian utility equations.

Note that the X axis range went from 650 (in the above plots) to 1000.  I’m not too concerned that the logsums in the new model have a larger range.  In fact, as long as those ranges are in the right place distribution may be better.  This is not final data, as I am still looking at a few other things to debug.

TranspoCamp and TRB Recap

January 21st, 2014

So last week these two little get-togethers happened – Transportation Camp and the Transportation Research Board Annual Meeting.  This post is the stuff I have to talk about related to both.

Transportation Camp

  • Lots of discussion about transit.  Seems nearly all sessions had the word ‘transit’ used once.
  • There was a lot of technical discussion of incremental improvements over current methods:
    • Object Tracking with Raspberry Pi (my big takeaway from this is to go get the latest RPi image for the Java support)
    • Transit On-Board Surveys
      • Using Nexus tablets isn’t all that different from the PDAs NuStats used on our Transit On-Board Survey in 2010
      • Their code was noted as open source… definitely worth a look
      • Their interface is an improvement over the PDAs because of the ability to show maps
        • There is a possibility that this could be used to reduce geocoding overhead – the tablet could do it on the fly and show it to the respondent for confirmation… there is a privacy issue here
      • Their tools for tracking surveys were awesome
      • This was done in the Philippines
    • Tracking Taxis
      • This was also done in the Philippines
      • They built some cool tracking tools
      • They used the data as floating car travel time surveys
    • Bicycle Integration
      • Bicycle planners love multi-day surveys – additional days means that they have more trips to analyze
        • One planner was using the NHTS for data – one day, not a lot of trips
      • CycleTracks!
      • RackSpotter – crowd-sourced bicycle rack data

TRB Annual Meeting

  • Applications
  • Data
    • Social Media took center stage for part of the sessions.  There were two I scheduled for, although one I didn’t make it to.  There is a lot of research looking into how we can use social media in modeling, but it is not yet ripe for use.
    • There are important balancing acts between the context of data and the presentation of data, and between the cost to collect the data and the cost to analyze it
    • More data makes decision making more difficult
    • As a profession, we need to concentrate on what decision is going to be made from data
      • We have a tendency to overwhelm decision makers
      • We frequently tell a decision maker how a watch is made when all they want to know is the time
    • Open data is important, but also open analysis is important
    • We always need to differentiate modeled data vs. observed data
    • Lots of lesser-quality data still has uses
      • Predictive modeling, like typing and driving
      • Sometimes lesser-quality data can be embellished with good data
    • GPS data modeling is still an emerging topic
      • Two presentations that I saw about getting the purpose of a trip
      • One presentation that I saw about getting the mode of a trip
  • Testing Models and the Next 50 Years of Modeling
    • Lots of discussion related to testing models
    • FHWA and OKI and PSRC are working on a project relating to testing models
    • I actually had a lot more written here, but unfortunately issues in my area that directly relate to my work means that it really isn’t within my best interest to post it here.  It’s unfortunate, because it is some good stuff (and it will be used moving forward in my work at OKI).

Goodbye TRB 2014

January 17th, 2014

Goodbye TRB #93.  This book of TRBs has closed and a new edition begins next year at the convention center.

Goodbye (for me) to the 1/2″ thick conference program.  I took one this year, but truthfully I never used it.  The app is *that good*.  I don’t plan on taking a book next year or beyond.

Goodbye to the Hilton staff, because even though many of us don’t care for the hotel itself, the staff has done lots to help us feel at home.  We’ll miss y’all, but we won’t miss the uncomfortable chairs, limited free WiFi, or many other physical aspects of the hotel.

Goodbye to the %&$# hill on Connecticut Avenue.  Many of us government employees are rejoicing that next year we will not be schlepping a rolling suitcase up that hill.

Goodbye to the Bier Baron.  Well maybe.  I’d be fine with going back as the service was better this year and, well, bacon lollipops!  Hopefully @e-lo doesn’t call my beer selection “toxic” if we make it back next year.

—-
I have been thinking about a number of things lately, and these will be topics over the next few weeks:

Recap of TRBAM and Transportation Camp.

How to blog.  I’ve been approached by a few people asking about starting a blog.  I’m going to have a post describing my process, tools, etc.

Narrowing the Research-Practice gap.  I have some ideas, and some things I’m going to put into practice here with the University of Cincinnati (whom we already have a great relationship with).

Model Testing.  It is becoming increasingly important to ensure we are testing our models, and not just calibrating and validating.  I have some new ideas that may expand what we test, even further than what TMIP will be coming out with later this year (that I am involved with)

Licensing of Government Code.  I have the feeling that we need to revisit how we license code written by MPOs and DOTs as well as code purchased by the same (and to a degree, where do we draw the line between code as an executable and code as code?)

Open Presenting.  I want to look into having presentations hosted on-line and accessible to anyone.  This is because there was a projector problem in Transportation Camp that wouldn’t have been an issue except that the presentation was a ppt/pptx and it wasn’t online.  Nearly everyone in the audience had a tablet or laptop, and I’m sure everyone had a smartphone.

Cell Phone Data.  OKI purchased cell phone data from Airsage, and I will be posting about our processing of it, and I will also post about the Cell Phone Data Symposium at TRB in February.

Decision Trees.  Among the things I learned a little bit about, this is one that I want to look more into.

I think that’s it.  I had fun this year, and it was great to talk with old friends and make new friends, too.

Loop Detectors!

August 16th, 2013

Since I haven’t done much of anything on this blog, I figured I’d add a few pictures of a loop detector site.  The Ohio Department of Transportation installed the loops, cabinet, and solar panel as part of a road project (thanks ODOT!), and I just installed the loop detector counter inside.


This is the loop detector cabinet. The wires on the upper-right are the loop lead-in wires, the big grey box below them is a battery, the small black box in the upper-left is a solar voltage regulator, and the circuit boards below that are mystery boards.


These are a mystery to me. There are two; one is powered, one is not.

Getting Started in R

May 24th, 2013

Setting Up

Download R from http://www.r-project.org. Install it normally (on Windows)… Double-click, next, next, next, etc.

Create a project folder with your data and with a shortcut to R (shout-out to Brian Gregor at Oregon DOT for this little trick). Also copy/move the data CSV there.

Inputting and Looking at Data

The data is in CSV, so we can read it with read.csv() – no extra packages needed. I’m not a fan of typing in long filepaths, so I use the file.choose() function to browse for the data.

inTab<-read.csv(file.choose())
summary(inTab)

In the code above, we’ve loaded the CSV into the inTab data frame (a data object in R) and got a summary of it. There are a few tricks to see parts of the data.

inTab$HHID (only the HHID values)
inTab[1:2] (only the first two fields)
inTab[1:10,] (only the first 10 rows)
inTab[1:10,1] (only the first field of the first 10 rows)

Data can be charted in R as well. A simple histogram is very easy to do in R.

hist(inTab$HHSize)

Sometimes data needs to be summarized. There is a function to do that, but first you’ll probably have to download a package. To download it, go to Packages – Install Packages. From the list, find plyr and install it.

Once plyr is installed (it shouldn’t take long), you can load the package and use ddply to summarize data.

library(plyr)
inTab.Per<-ddply(inTab,.(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass),
AreaType=min(HomeAT,3),summarise,T.HBSH=min(sum(TP_Text=='HBSh'),6),
T.HBSC=sum(TP_Text=='HBS'),T.HBSR=sum(TP_Text=='HBSoc'),T.HBO=sum(TP_Text=='HBO'))

Where inTab is the input table, .(HHID,HHSize6,HHVEH4,INCOME,WealthClass) are input fields to summarize by, AreaType=min(HomeAT,3) is a calculated field to summarize by, and everything following ‘summarise’ are the summaries.

Conclusion

This is a crash course in R, and in the last steps, you basically computed average trip rates.  Next week’s post will be to run linear and non-linear models on this data.

Quick Notes for the Week

April 19th, 2013

I didn’t have anything big to write on this blog this week, but there’s a few things out there worth a look.

In my other life, I’m an amateur radio operator.  I wrote a piece over on my other blog about the global economy and parts, as I have been buying parts from eBay at dirt-cheap prices.   This has continued implications on freight in this country.  It’s likely to get worse, as the makers-turned-entrepreneurs are (in droves) sending things off to China for fabrication.  Designed in the USA, made in China.

Mike Spack over on his blog mentioned that the one feature every traffic counter must have is identification.  He’s 100% correct.  I’ve seen a video of the Boston bomb squad blasting a traffic counter with a water cannon many years ago, and that’s what happens when you don’t put some sort of ID on your counters.   The original video of the counter’s demise has long since disappeared from the Internet, but you can still see the reference on Boing Boing.

 

New Website on Open Civic Hardware

January 23rd, 2013

I’ve started up a new blog that will hopefully be more maintained than this one: www.opencivichardware.org.  The idea of civic hardware came about from a presenter from Transportation Camp DC 2013.  Civic hardware are things created to help with a city (or state, or region).  It could be things like traffic counters, data loggers, tools to help with public involvement, or infrastructure.

The idea of this site is similar in nature to Hack-A-Day, but with a focus on civic hardware.  There will probably be a lot of things that can be cross-posted to both.  Additionally, look for things on this blog to be cross-posted there.

Arduino Based Bluetooth Scanners

September 30th, 2011

This is a post about a work in progress…

If you’re in the transportation field, you’ve likely heard of the Bluetooth Scanners that cost around $4,000 each. These devices scan MAC (Media Access Control) addresses and log them (with the time of the scan) and use that for travel time studies or for origin-destination studies.

My question is, can we build something good-enough with an Arduino for much less money? Something like the concept below?

 

There are reasons for everything:

Arduino

Controls it all and brings it together.  Turns on the GPS, Bluetooth, listens to the stream of data from both, writes to the memory card.

GPS

The Arduino has no real-time clock (meaning that unless you tell it what time it is, it doesn’t know!).  The GPS signal includes time.  It also includes position, which would be pretty useful.

Bluetooth

If we’re going to scan for Bluetooth MAC addresses, something to receive them might come in handy…

Something to Write To

Scanning the addresses would be pretty pointless without storing the data.

Initial Design

 

/*
Bluetooth Tracker
Written by Andrew Rohne (arohne@oki.org)
www.oki.org
*/

#include <NewSoftSerial.h>

NewSoftSerial ol(10,11);

char inByte;
boolean ext=false;

void setup(){
  String btreturn;
  Serial.begin(115200);
  delay(1500);
  Serial.print("$$$");
  delay(1000);

}

void loop(){
  byte incomingByte=-1;
  byte index=0;
  char macaddys[160];

  while(Serial.available()>0){
    index=0;
    Serial.println("IN15");
    delay(16500);
    incomingByte=Serial.read();
    while(incomingByte>-1 && index<160){
      macaddys[index]=(char)incomingByte;
      index++;
      incomingByte=Serial.read();
    }
    if(macaddys!=""){
      Serial.end();
      writelog((String)millis()+":"+macaddys+"\r\n");
      Serial.begin(115200);
    }
  }
  if(Serial.available()<=0){
    delay(1000);
    Serial.begin(115200);
  }
    
}

void writelog(String line)
{
  ol.begin(9600);
  ol.print(line);
  ol.end();
}

The Results

The program wrote about 5kb of text to the file before dying after 489986 milliseconds (about 8 minutes). I had left it on a windowsill overnight (the windowsill is literally about 15 feet from Fort Washington Way in Cincinnati, which is 6 lanes; see below for the range centered on roughly where the setup was located).

There were 9 unique Bluetooth MAC addresses scanned. During the 8 minutes, there were 25 groups of MAC addresses written to the file. 5 MAC addresses appeared in multiple groups, with 3 of the MAC addresses appearing in 24 of the groups (and they may have appeared in the last group, it appears to have been cut off). Those same 4 have been seen in earlier tests, too, so I don't know what's going on there.

The Problems to Fix

Well, first there's the problem that I had let it run all night, and it only had 8 minutes of data. Something is causing the Arduino to stop writing or the OpenLog to stop operating.

In the output file, there are a few issues. First, some processing needs to be done, and second, it appears I am reading past the end of the serial buffer (if you look in the image below, you can see a lot of characters that look like a y with an umlaut).

In the code above, the IN15 command is sent to the Bluetooth Mate Gold, which tells it to inquire for 15 seconds, and then I delay for 16.5 seconds. This is because I THINK there is a delay after the scan finishes; I don't know how long that delay is. A vehicle traveling by at 65 MPH covers 95.333 feet per second. Assuming I can get the Bluetooth device very close to the road, that 1.5 second gap SHOULD be okay, but if I have to go longer it could be a problem. The range of a Class 1 Bluetooth device is 313 feet, so a device can be scanned anytime within 626 feet (up to 313 feet before the Bluetooth station and up to 313 feet after it), meaning a vehicle would be in range for about 6.6 seconds. However, the Bluetooth signal is at 2.4 - 2.485 GHz and is susceptible to some interference from the vehicle, driver, passengers, etc., so speed is key.
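Putting numbers to that paragraph (just the arithmetic from above, nothing measured):

mph = 65
speed_ftps = mph * 5280 / 3600           # 95.33 feet per second
bt_range_ft = 313                        # Class 1 Bluetooth range
window_ft = 2 * bt_range_ft              # 626 feet of road where a device is within range
time_in_range = window_ft / speed_ftps   # about 6.6 seconds
print(round(speed_ftps, 2), window_ft, round(time_in_range, 1))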

Conclusion

I'm on the fence as to whether or not the Bluetooth Mate Gold is the right way to do this. I will still be doing some research to see if I can get better speed out of it, or if I need to look into a different receiver that can receive the 2.4 GHz area and look for MAC addresses and stream them to the Arduino.

I also need to get the GPS up and running. That is a different story altogether, as I have been trying on that and have not been successful (despite using code that works for my personal Arduino and GPS, although the model of GPS 'chip' is different).

Using Gawk to get a SimpleTransit Loadings Table from Cube PT

September 19th, 2011

One thing that I don’t like about Cube is that the transit loadings report is stuck in the big program print report.  To pull this out, the following code works pretty well:

gawk /'^REPORT LINES  UserClass=Total'/,/'^Total     '/ 63PTR00A.PRN >outputfile.txt

Where 63PTR00A.PRN is the print file. Note the spaces after ^Total. For whatever reason, using the caret (the ‘^’) isn’t working to find ‘Total’ as the first thing on the line, so I added the spaces so it gets everything. Outputfile.txt is where this will go. It will just be the table.

NOTE: You need GNUWin32 installed to do this.
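If GNUWin32 isn’t an option, the same block can be pulled out with a few lines of Python. This is just a sketch that assumes the same report markers as the gawk command above:

with open('63PTR00A.PRN') as f, open('outputfile.txt', 'w') as out:
    keep = False
    for line in f:
        if line.startswith('REPORT LINES  UserClass=Total'):
            keep = True
        if keep:
            out.write(line)
        if keep and line.startswith('Total     '):
            break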

Using GAWK to Get Through CTPP Data

August 18th, 2011

The 3-year CTPP website lacks a little in usability (just try getting a county-county matrix out of it).

One of the CTPP staff pointed me to the downloads, which are a double-edged sword. On one hand, you have a lot of data without an interface in the way. On the other hand, you have a lot of data.

I found it was easiest to use GAWK to get through the data, and it was pretty easy:

gawk '/.*COUNTY_CODE.*/' *.csv >Filename.txt

Where COUNTY_CODE is the code from Pn-Labels-xx.txt where n is the part number (1,2, or 3) and xx is the state abbreviation.

NOTE: Look up the county code EACH TIME.  It changes among parts 1, 2, and 3.

This command will go through all .csv files and output any line with the county code to the new file.

UPDATE

I have multiple counties to deal with.  There’s an easy way to start on getting a matrix:

gawk '/C4300US.*(21037|21015|21117).*32100.*/' *.csv >TotalFlowsNKY.csv

This results in a CSV table of only the total flows from three Northern Kentucky counties (21037, 21015, 21117; Campbell, Boone, and Kenton county, respectively).  For simplicity’s sake, I didn’t include all 11 that I used.

Finishing Up

Then, I did a little Excel magic to build a matrix for all 11 counties and externals.  The formula is shown.  I have an additional sheet which is basically a cross reference of the county FIPS codes to the name abbreviations I’m using.  See the image below (click for a larger version).

After this, I built a matrix in Excel.  The matrix uses array summation (when you build this formula, you press CTRL+Enter to set it up right, else the returned value will be 0).
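The same cross-tabulation can also be sketched in pandas instead of Excel. The column names below are placeholders (the actual CTPP part 3 files use different field names), so treat this as an outline rather than something to paste in:

import pandas as pd

# Placeholder layout: the gawk output has no header row, so names are assigned here.
flows = pd.read_csv('TotalFlowsNKY.csv', header=None,
                    names=['table_id', 'res_county', 'work_county', 'output_code', 'flow'])
matrix = flows.pivot_table(index='res_county', columns='work_county',
                           values='flow', aggfunc='sum', fill_value=0)
matrix.to_csv('jtw_matrix.csv')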

Using these techniques, I was able to get a journey to work matrix fairly quickly and without a lot of manual labor.

NOTE

You need to have GNUWin32 installed to use gawk.

 

 

 

Using gawk to Get PT Unassigned Trips Output into a Matrix

July 15th, 2011

In the process of quality-control checking a transit on-board survey, one task that has been routinely mentioned on things like TMIP webinars is to assign your transit trip-table from your transit on-board survey.  This serves two purposes – to check the survey and to check the transit network.

PT (and TranPlan’s LOAD TRANSIT NETWORK, and probably TRNBUILD, too) will attempt to assign all trips.  Trips that are not assigned are output into the print file.  PT (what this post will focus on) will output a line similar to the following when a transit path is not found:


W(742): 1 Trips for I=211 to J=277, but no path for UserClass 1.

With a transit on-board survey, there may be a lot of these.  Therefore, the less time spent writing code to parse them, the better.

To get this to a file that is easier to parse, start with your transit script, and add the following line near the top:


GLOBAL PAGEHEIGHT=32767

This removes the page headers. I had originally tried this with page headers in the print file, but it created problems. Really, you probably won’t print this anyway, so removing the page headers is probably a Godsend to you!

Then, open a command line, and type the following:

gawk '/(W\(742\).*)\./ {print $2,$5,$7}' TCPTR00A.PRN >UnassignedTransitTrips.PRN

Note that TCPTR00A.PRN is the transit assignment step print file, and UnassignedTransitTrips.PRN is the destination file. The {print $2,$5,$7} tells gawk to print the second, fifth, and seventh columns. Gawk figures out the columns itself based on spaces in the lines. The >UnassignedTransitTrips.PRN directs the output to that file, instead of listing it on the screen.

The UnassignedTransitTrips.PRN file should include something like:


1 I=3 J=285,
1 I=3 J=289,
1 I=3 J=292,
1 I=6 J=227,
1 I=7 J=1275,

The first column is the number of unassigned trips, the second column is the I zone, and the last column is the J zone.

This file can then be brought into two Matrix steps to move it to a matrix. The first step should include the following code:

RUN PGM=MATRIX PRNFILE="S:\USER\ROHNE\PROJECTS\TRANSIT OB SURVEY\TRAVELMODEL\MODEL\TCMAT00A.PRN"
FILEO RECO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF",
 FIELDS=IZ,JZ,V
FILEI RECI = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\UnassignedTransitTrips.PRN"

RO.V=RECI.NFIELD[1]
RO.IZ=SUBSTR(RECI.CFIELD[2],3,STRLEN(RECI.CFIELD[2])-2)
RO.JZ=SUBSTR(RECI.CFIELD[3],3,STRLEN(RECI.CFIELD[3])-2)
WRITE RECO=1

ENDRUN

This first step parses the I=, J=, and comma out of the file and inserts the I, J, and number of trips into a DBF file. This is naturally sorted by I then J because of the way PT works and because I am only using one user class in this case.

The second Matrix step is below:

RUN PGM=MATRIX
FILEO MATO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.MAT" MO=1
FILEI MATI[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF" PATTERN=IJM:V FIELDS=IZ,JZ,0,V

PAR ZONES=2425

MW[1]=MI.1.1
ENDRUN

This step simply reads the DBF file and puts it into a matrix.

At this point, you can easily draw desire lines to show the unassigned survey trips. Hopefully it looks better than mine!

Getting the 2nd Line through the Last Line of a File

June 24th, 2011

One recent work task involved compiling 244 CSV traffic count files and analyzing the data.

I didn’t want to write any sort of program to import the data into Access or FoxPro, and I didn’t want to mess with it (since it would be big) in Excel or Notepad++.

So, I took the first of the 244 files and named it CountData.csv. The remaining files all begin with ‘fifteen_min’ and they are isolated in their own folder with no subfolders.

Enter Windows PowerShell really powered up with GNUWin.

One command:
awk 'NR==2,NR<2' .\f*.csv >> CountData.csv

awk is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports (source: Wikipedia).

The first condition, NR==2, means start on record #2, or the second line of each file.
The second condition, NR<2, is where the range would end; it is never true at that point, so the remainder of each file is output. The .\f*.csv means any file in this folder where the first letter is f and the last 4 letters are .csv (and anything goes between them). The ‘>> CountData.csv’ means to append the output to CountData.csv.

Once I started this process, it ran for a good 45 minutes and created a really big file (about 420 MB).

After all this, I saw a bunch of “NUL” characters in Notepad++, roughly one every other letter, and it looked like the data was there (just separated by “NUL” characters).  This is most likely because PowerShell’s >> redirection writes Unicode (UTF-16) output by default, which puts a NUL byte between the ASCII characters.  I had to find and replace “\x00” with blank (searching as Regular Expression).  That took a while.
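For what it’s worth, the same merge can be done with pandas, which sidesteps the encoding issue entirely (a sketch, assuming every file carries the same header row):

import glob
import pandas as pd

files = sorted(glob.glob('fifteen_min*.csv'))                        # the 244 count files
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
combined.to_csv('CountData.csv', index=False)                        # header written once, data from every file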

Acknowledgements:

The Linux Commando.  His post ultimately helped me put two and two together to do what I needed to do.

Security 102.  The NUL thing.

Tour-Based Modeling: Why is it Important?

June 12th, 2010

One thing that is constantly bounced around is why tour-based modeling is better than trip based modeling.  We’ve been using trip based modeling for 50 years, isn’t it timeless?

No.

Fifty years ago, when the trip based modeling methodologies were developed, the primary reason was to evaluate highway improvements.  While tolling was in use, the bonding requirements were likely different.  Transit, while extremely important, was not in the public realm (the streetcars were normally privately owned by the area’s electric company).

Now, there are a lot of demands on travel models:

  • Tolling/Toll Road analysis at a better level
  • Different tolling schemes (area tolling, cordon tolling)
  • Travel Demand Management (telecommuting, flex hours, flex time, alternative schedules)
  • Better freight modeling (which now is becoming commodity flow and commercial vehicle modeling)
  • Varying levels of transit (local bus, express bus, intercity bus, BRT, light rail, and commuter rail)

While many of these can be done with trip based models, most of them cannot be done well with trip based models.  There are a number of reasons, but the few that come to mind are aggregation bias, modal inconsistency, and household interrelationships.

Aggregation Bias

Aggregation bias occurs when averages are used to determine an outcome.  For example, using a zonal average vehicles per household, you miss the components that form the average, such as:

20 households, average VPHH = 2.2
2 HH VPHH = 0
5 HH VPHH = 1
4 HH VPHH = 2
6 HH VPHH = 3
3 HH VPHH = 4+

The trip generation and modal choices (car, bus, bike, walk, etc.) among these households are all different, and differ even more if you look at the number of workers per household.
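To make that concrete, here is a small illustration with made-up trip rates (the rates are purely hypothetical; only the household counts come from the example above, and ‘4+’ is treated as 4):

# Hypothetical trip rates by vehicles per household (VPHH), for illustration only.
rates = {0: 4.0, 1: 7.0, 2: 9.5, 3: 10.5, 4: 11.0}   # trips per household
hh = {0: 2, 1: 5, 2: 4, 3: 6, 4: 3}                  # households from the example above

# Disaggregate: apply the rate to each household group.
disagg = sum(n * rates[v] for v, n in hh.items())

# Aggregate: interpolate a rate at the zonal average VPHH and apply it to all 20 households.
avg_vphh = sum(n * v for v, n in hh.items()) / sum(hh.values())
lo, hi = int(avg_vphh), int(avg_vphh) + 1
agg_rate = rates[lo] + (rates[hi] - rates[lo]) * (avg_vphh - lo)
agg = agg_rate * sum(hh.values())

print(disagg, round(agg, 1))  # the totals differ because the rate curve is nonlinear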

Modal Inconsistency

In trip based modeling, “people” are not tracked throughout their day.  So, if someone rides the bus to work, there is nothing in the model to ensure that they don’t drive from work to get lunch.  While we don’t want to force people to use the same mode, since many people will use the bus to get to work and then walk to lunch or to go shopping during lunch, we want to make sure that there is some compatibility of modes.

Household Interrelationships

One of the features of tour based models is determining each person’s daily activity pattern.  During this process, certain person types can determine what another person is doing.  For example, if a preschool-age child is staying home, an adult (whether they are a worker or not) HAS to stay home.  Another example is if a school-non-driving-age child is going on a non-mandatory trip, an adult must accompany them.  Trip based models don’t know about the household makeup and the household interaction.

The above are only three of the many reasons why tour-based modeling is important.  There are many more, but I feel these are some of the most important and some of the easiest to understand.

Romanian street sign warns drivers of ‘drunk pedestrians’ – Telegraph

March 15th, 2010

In what is perhaps an accidental approach to reducing pedestrian crashes using the first step of “the three Es” (education, enforcement, engineering), Pecica, Romania has installed signs that warn of drunk pedestrians ahead.

While a little odd, I applaud the mayor for experimenting with a low-cost, low-impact way to handle the problem.  I hope it works.

Romanian street sign warns drivers of ‘drunk pedestrians’ – Telegraph.

Former DOT Secretary weighs in on Transportation Bill

October 15th, 2009

Reference: National Journal Online — Insider Interviews — Bush DOT Chief Discusses Reauthorization.

I agree with the thoughts of increased tolling and more fees other than the gas tax.  I also agree with $1B per year for technology, but it has to be managed right.

I’m also glad that the performance measures are measurable:

  • Congestion (we can measure that – it is the percent of a region’s network that is operating with a demand greater than its capacity)
  • Costs (we can measure that, although we have to watch how we do it, as we don’t want to have a system be considered bad if gas prices hit $4/gallon)
  • Safety (we DO measure this – it is the number of injuries and deaths on the road)

What are those little green boxes???

April 11th, 2009

It is the start of traffic counting season in Ohio. Each year, we get about 7 months to count the cars on the road. With my involvement in this type of work, I hear a lot of horror stories. First off, I wanted to discuss how these things work and how the data is used and cannot be used, and then show some of the war stories.

Traffic Counter on side of road

First off: how these things work

Those that have been around for 30 or more years may remember when some gas stations had a hose that rang a bell to call a station attendant to pump your fuel. Those that don’t should watch Back to the Future. This is the same basic concept for most traffic counters. There are hoses that go across the road, and based on what the sensors feel and the time between them, these little green (or sometimes gray) boxes calculate the number of axles, distance between them (which can be used to derive the type of vehicle), and the speed.

I know that speed is a big issue with a lot of people. After all, some of these counters are being used for speed studies to see if they want to put a cop on a road at a certain time. This does happen, despite my wishes that cops and others would use less-detectable methods for enforcement. There are two other ways that counts, with speed, can be taken. One is by RADAR (the same thing they use for active speed enforcement). Mind you, for speed sampling, RADAR is pretty useful when installed correctly, and the boxes can be hidden quite well. The other is using magnetic loops. There are portable models of these that sit in the lane and are difficult to see (and also susceptible to theft). There are also permanent models that can be completely hidden from view.

One thing I can say with ALL hose counters: WE CANNOT USE THEM FOR SPEED ENFORCEMENT! The units do not have any cameras (etc), so if you speed while going over them, we know you did, but we don’t know who you are!

Second off: How We Use The Data We Get From These Things

This one differs by jurisdiction, but most use it for traffic studies. Speed, count, and vehicle type are very useful for roadway improvement design. Another use is for travel model validation. We (specifically me, since it is my job) use this to ensure that the region’s travel model is accurate so when we use it to plan billions of dollars in improvements, we know we’re not just guessing, which would be a waste of money.

Law enforcement will use the number of speeders per unit of time to plan when to run patrols. As I indicated, I wish they wouldn’t use hose counters for this, but they do, and the data they get is accurate. However, hoses are pretty conspicuous, which is why I wish they wouldn’t use them.

We cannot use the data in court. You cannot be detected to be going 45 MPH in a 25 MPH zone based on a traffic counter. The counters do not have cameras in them, and none that I know of can connect to a camera. A camera would be required to prove who was speeding. Without the connection, it would be difficult to prove, since the times would have to be the same, the counter has to be operating perfectly, and the hoses have to be measured very precisely. Some states also forbid the use of cameras for passive law enforcement (a cop can actively use a RADAR+camera, but not mount one on a pole and get every car that is speeding).

The War Stories

I have two, both given to me by a salesperson for Jamar Tech, one of the leading traffic counter manufacturers.

City of Boston Thinks a Counter is a Bomb. This one is proof that some cops don’t use hose counters, else they would have known what this unit is.

Counter burned, likely by an accelerant. PDF from Jamar, which the salesperson sent me just after I bought 8 counters from him.

Don’t Mess With Them!

It amazes me that 1 month into the season, I’ve had to replace several hoses because of cut or stolen hoses. This is your tax dollars at work. The more hoses we have to replace, the less money we have to improve the roads.

Travel Demand Modeling 101 Part 1: Terminology

August 22nd, 2008

It occurred to me that many people likely do not understand all of the terminology of travel demand models.  Because of this, I felt the need to list many of them here.

Random Thought: Road Nicknames

June 4th, 2008

I’ve occasionally seen some road nicknames that are particularly good.  A few that I’ve heard:

  • Malfunction Junction (I-275 and I-4, Tampa, FL)
  • The Riddle in the Middle (Alaska Way, Seattle, WA)
  • Spaghetti Junction (I-85 and I-285, Atlanta, GA)

I’ve also started calling a stretch of Columbia Parkway (Cincinnati, OH) “The Suicide Side”, which is a 45 MPH arterial that everyone goes 60 MPH on.  The divider is a double-yellow line… only.

Got any more?  Add ’em in the comments.

Four Step Model Explained: Trip Generation

June 3rd, 2008

Trip generation is likely one of the easiest parts of the four step process.  Normally, the most difficult part of dealing with trip generation is getting the input socioeconomic (population and employment) data correct.  This post explains how trip generation is calculated in the model…

Introduction to the Four Step Travel Demand Model

May 27th, 2008

The center of most travel demand models is the “Four Step Model”.  This model was created in the 1950s to determine the demand on roadways.  The four steps include:

  1. Trip Generation
  2. Trip Distribution
  3. Mode Choice
  4. Trip Assignment
