Lookups in R: The REAL Right Way!

September 9th, 2014

After waiting forever enough to get things to run, I stepped into a better way to do lookups.

mapply on matrix objects.

Basically, I do this:

TSkimLBWPk<-read.dbf("data/TSPKLBW.DBF") #Read the local bus walk skim


TSKimLBWPK_IWAIT=(acast(TSkimLBWPk,I~J,value.var="V1",drop=FALSE,fill=0)) #build a matrix from the data
TSKimLBWPK.IWAIT<-function(i,j) {

     if(i<=nrow(TSKimLBWPK_IWAIT) && j<=ncol(TSKimLBWPK_IWAIT))

          return(TSKimLBWPK_IWAIT[i,j])

     else return(0)

} #build a function to lookup, returning 0 if it is looking for a zone not found

TripsAllPk$LBW.IWAIT=mapply(TSKimLBWPK.IWAIT,TripsAllPk$PTAZ,TripsAllPk$ATAZ) #do the lookup

That’s it. This takes the input DBF (which has I, J, V1, V2, etc. fields), converts to a matrix for a quick lookup, and then applies it.

It runs in about 3 minutes.

Lookups in R: The Wrong Way and the Right Way

August 28th, 2014

I recently wrote a script that takes DBF exports of Cube matrices and prepares them for Biogeme. Â The main… well, only reason I did this in R was because I was considering using mlogit for model estimation. Â I ultimately decided to ‘go with what I know’ and changed course to use Biogeme. Mind you, the part of Stairway to Heaven applies: “There are two paths you can go by, but in the long run /Â There’s still time to change the road you’re on.”

The Wrong Way

I’ve changed my code already, so pardon that this is from memory. Â Also, these are snippets – I have a lot more code than this.

HSkimPk<-read.dbf("data/HSKIM_PK3.dbf")

for(rn in 1:nrow(TripsAll)){
HSkimPkRow<-subset(HSkimPk,I==TripsAll[rn,"PTAZ"] & J==TripsAll[rn,"ATAZ")
TripsAll$DA.IVT<-HSkimPkRow[,"V1"]
...
}

This took no less than 17 hours to complete for around 23,000 trip records and for values from 5 different tables * 2 time periods.

The Right Way

I (obviously) wanted something that wouldn't take forever, especially as I was working in Biogeme and seeing things that made me think that I wanted to change ONE LITTLE THING. Â This seems to always happen.

I took a different approach that by my calculations should be much quicker.

HSkimPk<-read.dbf("data/HSKIM_PK3.dbf")
HSkimPkD<-acast(HSkimPk,I ~ J,value.var="V2",drop=FALSE,fill=0)
HSkimPkT<-acast(HSkimPk,I ~ J,value.var="V1",drop=FALSE,fill=0)

for(rn in 1:nrow(TripsAll)){
if(I<=nrow(HSkimPkT) & J<=nrow(HSkimPkT)){
TripsAll[rn,"DA.IVT"]<-HSkimPkT[I,J]
}
}

Since this is currently running, my only metrics are to look at the time per 50 rows (in my real code, I have a line that outputs a timestamp every 50 rows), and it is taking about 0.27 seconds per record, compared to somewhere around 4.5 seconds per record. Â While not perfect, I'll take an estimated completion of 1.75 hours compared to 17 (update: 2 hours). Â However, I will say that Cube is faster in this regard and that I may not have the fastest R solution.

Lookups in R: The REAL Right Way!

Lookups in R: The Wrong Way and the Right Way

The Wrong Way

The Right Way

About this Blog

Associated Sites

Links

Pages

Categories

Subscribe

Tags

Social Networking

Recent Presentations

Photography