Getting Started in R
Setting Up
Download R from http://www.r-project.org and install it like any other program (on Windows: double-click the installer and click Next through the prompts).
Create a project folder containing a shortcut to R (shout-out to Brian Gregor at Oregon DOT for this little trick), and copy or move the data CSV there.
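The point of the shortcut is that R starts with the project folder as its working directory, so files there can be read by name alone. A quick check, assuming the shortcut is set to start R in the project folder (on Windows that is the shortcut's "Start in" setting):

```r
# Where is R looking for files? With the shortcut trick this should be
# the project folder (assumes the shortcut starts R there)
print(getwd())
# CSV files visible from the working directory, readable without a full path
csv_files <- list.files(pattern = "\\.csv$", ignore.case = TRUE)
print(csv_files)
```

If the data CSV does not show up in that list, R started somewhere else and you will need a full path or file.choose().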
Inputting and Looking at Data
The data is in CSV, which base R reads directly with read.csv(), so no extra library is needed (the foreign library is only required for formats like DBF). I’m not a fan of typing in long file paths, so I use the file.choose() function to browse for the data. Note that in many cases the
inTab<-read.csv(file.choose())
summary(inTab)
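file.choose() opens a dialog, which is handy interactively but not in a saved script. The sketch below shows the non-interactive route: it writes a tiny made-up table (the values are invented, only the column names HHID, HHSize and TP_Text come from this post) to a temporary CSV and reads it back with read.csv(). With the project-folder shortcut you would instead pass just the file name, e.g. a hypothetical read.csv("yourfile.csv").

```r
# Tiny made-up trip table: one row per trip, so a household can appear more than once
toy <- data.frame(HHID    = c(101, 101, 102, 103),
                  HHSize  = c(2, 2, 1, 4),
                  TP_Text = c("HBO", "HBSh", "HBO", "HBS"))
# Write it to a temporary CSV, then read it back the way the survey file is read
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
inTab <- read.csv(path)   # in a real project: read.csv("yourfile.csv") (hypothetical name)
summary(inTab)            # per-column summary, as in the post
str(inTab)                # column types and first few values
```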
In the code above, we’ve loaded the CSV into the inTab data frame (R’s table-like data object) and printed a summary of it. There are a few tricks to see parts of the data:
inTab$HHID       # only the HHID values
inTab[1:2]       # only the first two fields
inTab[1:10,]     # only the first 10 rows
inTab[1:10,1]    # only the first field of the first 10 rows
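Note that these tricks return different kinds of objects: $ and [rows, column] with a single column give a plain vector, while [columns] and [rows, ] keep a data frame. The sketch below runs them on a made-up 12-household stand-in for inTab (the Workers column and all values are invented for illustration) and also shows selecting columns by name.

```r
# Made-up stand-in for inTab so the indexing tricks can be tried anywhere
inTab <- data.frame(HHID    = 101:112,
                    HHSize  = c(1, 2, 2, 3, 4, 1, 2, 5, 3, 2, 1, 6),
                    Workers = c(1, 1, 2, 2, 1, 0, 1, 2, 1, 0, 1, 3))
ids    <- inTab$HHID                         # vector: just the HHID values
first2 <- inTab[1:2]                         # data frame: first two columns
top10  <- inTab[1:10, ]                      # data frame: first 10 rows, all columns
col1   <- inTab[1:10, 1]                     # vector: first column of the first 10 rows
byname <- inTab[1:10, c("HHID", "HHSize")]   # columns can also be picked by name
head(inTab, 3)                               # quick look at the first 3 rows
```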
Data can be charted in R as well; a histogram takes one line:
hist(inTab$HHSize)
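With R’s default bins, a whole-number variable like household size can be split awkwardly (when the breaks fall on whole numbers, the first bar lumps sizes 1 and 2 together). A minimal sketch, using made-up household sizes in place of inTab$HHSize, that puts the breaks at half-integers so each size gets its own bar; hist() also returns the bin counts, which is useful for checking the chart.

```r
# Made-up household sizes; in the post this would be inTab$HHSize
HHSize <- c(1, 2, 2, 3, 4, 1, 2, 5, 3, 2, 1, 6)
# Breaks at 0.5, 1.5, ..., 6.5 give one bar per household size
h <- hist(HHSize, breaks = seq(0.5, 6.5, by = 1),
          main = "Household size", xlab = "Persons per household")
h$counts   # households per size: 3 4 2 1 1 1
```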
Sometimes data needs to be summarized. The plyr package has a function for that, but first you’ll probably have to install the package. In the R GUI, go to Packages > Install package(s), find plyr in the list and install it (typing install.packages("plyr") at the console does the same).
Once plyr is installed (it shouldn’t take long), you can load the package and use ddply to summarize data.
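library(plyr)
# one row per household: grouping fields inside .(), then the summarise() columns
inTab.Per<-ddply(inTab,.(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass),summarise,
AreaType=min(HomeAT,3),T.HBSH=min(sum(TP_Text=='HBSh'),6),
T.HBSC=sum(TP_Text=='HBS'),T.HBSR=sum(TP_Text=='HBSoc'),T.HBO=sum(TP_Text=='HBO'))
Here inTab is the input table, .(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass) are the fields to summarize by, and everything following summarise are the summaries computed for each household. AreaType=min(HomeAT,3) is one of those summaries rather than a grouping field: because HomeAT is the same on every trip of a household, it simply caps the home area type at 3. The T. columns count each household’s trips by purpose (TP_Text), with T.HBSH capped at 6.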
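The household-level counts become average trip rates once they are averaged within household categories. The sketch below shows that two-step idea with base R’s aggregate(), swapped in for ddply so it runs without any package; the trip table and its values are made up, and only HHID, HHSize6, TP_Text and T.HBO are names from this post.

```r
# Made-up trip table: one row per trip, household attributes repeated on each row
trips <- data.frame(
  HHID    = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
  HHSize6 = c(2, 2, 2, 1, 1, 1, 4, 4, 4, 4),
  TP_Text = c("HBO", "HBSh", "HBO", "HBO", "HBS", "HBSh", "HBS", "HBS", "HBO", "HBSh")
)
# Step 1: one row per household with its number of HBO trips
trips$is_HBO <- as.integer(trips$TP_Text == "HBO")
perHH <- aggregate(is_HBO ~ HHID + HHSize6, data = trips, FUN = sum)
names(perHH)[names(perHH) == "is_HBO"] <- "T.HBO"
# Step 2: average HBO trips per household for each household size
rates <- aggregate(T.HBO ~ HHSize6, data = perHH, FUN = mean)
print(rates)
```

Here the one-person households average 0.5 HBO trips, the two-person household 2 and the four-person household 1. With plyr, the second step would be another ddply call on inTab.Per.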
Conclusion
This was a crash course in R, and in the last step you computed trip counts by purpose for each household, which averaged over households give trip rates. Next week’s post will run linear and non-linear models on this data.