Mode Choice Modeling with R
I started this post (and the work to go with it) as a companion to A Self Instructing Course in Mode Choice Modeling by Bhat and Koppelman. Â That’s because I could reproduce the work in the book in R and can (now) reproduce in R.
To continue with this, please get the CD files from my last blog post. Â You’ll specifically need “SF MTC Work MC Data.sav”, which is in SPSS format.
The first part:
library(foreign) library(mlogit)
The items above simply load the libraries. Â If any of these are not found, go to Packages (on the menu bar) – Install Packages… and select your closest mirror and select the missing package (either foreign or mlogit).
Next, read in the data and we’ll add a field, too, as there is no unique id in this dataset.
inTab<-read.spss(file.choose(),to.data.frame=T,use.value.labels=F) inTab$HHPerID=inTab$hhid*100+inTab$perid
The first line reads in the SPSS file (it asks you for the file). Â The second adds a "HHPerID" field, which is unique to each case.
The next part is to format the data for mlogit. Â This is quite a challenge because it has to be JUST RIGHT or there will be errors.
mc<-mlogit.data(inTab,choice="chosen",shape="long",chid.var="HHPerID",alt.var="altnum",drop.index=T)
The first parts of this are pretty obvious (inTab is the input table, choice="chosen" is the choice field). Â Shape="long" indicates that the data is multiple records per case. Â "Wide" would indicate each record is on its own line. Â chid.var is the case id variable. Â alt.var is the alternatives. Â drop.index drops the index field out of the resulting table.
Finally, we'll run a simple multinomial logit estimate on this.
nlb<-mlogit(chosen~cost+tvtt|hhinc,mc)
For such a short piece of code, there is a lot going on here. Â The formula is (simply) chosen=cost+tvtt+hhinc, BUT hhinc is alternative specific and cost and travel time are not. Â So the utilities for this would be something like:
$$U_{da}=\beta_{cost}*cost+\beta_{tt}*tvtt$$
$$U_{sr2}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr2}*hhinc+K_{sr2}$$
$$U_{sr3}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr3}*hhinc+K_{sr3}$$
$$U_{transit}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,tranist}*hhinc+K_{transit}$$
$$U_{walk}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,walk}*hhinc+K_{walk}$$
$$U_{bike}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,bike}*hhinc+K_{bike}$$
The result is this:
>summary(nlb) Call: mlogit(formula = chosen ~ cost + tvtt | hhinc, data = mc, method = "nr", print.level = 0) Frequencies of alternatives: 1 2 3 4 5 6 0.7232054 0.1028037 0.0320143 0.0990257 0.0099423 0.0330086 nr method 6 iterations, 0h:0m:6s g'(-H)^-1g = 5.25E-05 successive function values within tolerance limits Coefficients : Estimate Std. Error t-value Pr(>|t|) 2:(intercept) -2.17804077 0.10463797 -20.8150 < 2.2e-16 *** 3:(intercept) -3.72512379 0.17769193 -20.9639 < 2.2e-16 *** 4:(intercept) -0.67094862 0.13259058 -5.0603 4.186e-07 *** 5:(intercept) -2.37634141 0.30450385 -7.8040 5.995e-15 *** 6:(intercept) -0.20681660 0.19410013 -1.0655 0.286643 cost -0.00492042 0.00023890 -20.5965 < 2.2e-16 *** tvtt -0.05134065 0.00309940 -16.5647 < 2.2e-16 *** 2:hhinc -0.00216998 0.00155329 -1.3970 0.162406 3:hhinc 0.00035756 0.00253773 0.1409 0.887952 4:hhinc -0.00528636 0.00182881 -2.8906 0.003845 ** 5:hhinc -0.01280827 0.00532413 -2.4057 0.016141 * 6:hhinc -0.00968627 0.00303306 -3.1936 0.001405 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Log-Likelihood: -3626.2 McFadden R^2: 0.25344 Likelihood ratio test : chisq = 2462 (p.value = < 2.22e-16)
And this matches the self-instructing course manual, page 76 (under "Base Model").
Nested Logit
R can do simple nested logit calculations, but unfortunately they have to be *very* simple (which is uncharacteristic for R). Â The best thing to do is get a copy of Biogeme and read the next post in this series.
Tags: logit, mode-choice, R