Tuesday, May 4, 2010

R is for Round Two


This week in R, I chose data from the United States Department of Transportation's Federal Highway Administration. The data cover state motor-vehicle registrations in 2008, broken out into cars, trucks and buses, each split into private and commercial versus public, with totals of registered vehicles for each state.
At first I struggled to plot anything meaningful: no matter which variable I chose, I kept creating bar graphs with a value of one for every state. I (h/t Kebonye) figured out this was because the data were loading in as strings rather than numbers. The data file was fixed in Excel and we were cookin’ with gas!
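The all-ones bar graph is what one would expect when a numeric column arrives as text: each distinct string simply gets counted once. Below is a minimal sketch of checking and fixing this inside R rather than in Excel. It assumes (a guess, not something the original file confirms) that the culprit was thousands separators such as "2,100,000"; the data frame and its values are invented for illustration.

```r
# Toy stand-in for the imported file; values are invented.
raw <- data.frame(
  state = c("Alabama", "Arizona", "Arkansas"),
  total_car = c("2,100,000", "2,300,000", "950,000"),
  stringsAsFactors = FALSE
)

# Check how each column was read in: total_car is text, not a number.
sapply(raw, class)

# Counting text values gives one per state, hence the flat bar graph.
table(raw$total_car)

# Strip the commas, then convert the text to numbers.
raw$total_car <- as.numeric(gsub(",", "", raw$total_car))
class(raw$total_car)
```

If the real file had a different problem (stray footnote marks, for example), the gsub() pattern would need to change accordingly.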
Now having readable numbers in the columns, I was able to plot charts showing total numbers of registered vehicles by state. However, I felt that this type of information wasn’t helpful and therefore added two other columns of information: total population by state, and region. I added the region variable in the Excel file to be able to work with the data in some geographic form. This, I felt, was the only way to get any sense of space beyond individual states. The population data were added so I could normalize the total number of vehicles by total population.

With the additional variables I could do some math and calculate total registered trucks and cars per person. This information was then visualized as boxplots grouped by region, one for trucks and one for cars.
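As a rough sketch of that step, the per-capita values can be stored as columns of the data frame itself, so each value stays tied to its state and region. The numbers and state names below are invented; only the column names (total_car, total_truck, population, region) follow the ones used in this post.

```r
# Invented numbers for six made-up states.
drive_toy <- data.frame(
  state = c("A", "B", "C", "D", "E", "F"),
  region = c("West", "West", "South", "South", "Northeast", "Northeast"),
  total_car = c(500, 800, 300, 450, 900, 700),
  total_truck = c(400, 600, 350, 500, 300, 250),
  population = c(1000, 2000, 800, 1200, 2500, 1800)
)

# Store the per-capita values as columns so they travel with the states.
drive_toy$cars <- drive_toy$total_car / drive_toy$population
drive_toy$trucks <- drive_toy$total_truck / drive_toy$population

# One box per region, using the formula interface.
boxplot(trucks ~ region, data = drive_toy,
        xlab = "region", ylab = "trucks per capita")
```

Keeping everything in one data frame also means the data= argument of boxplot() finds both variables without relying on attach().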

Finally, the data about cars and trucks per person were graphed in a bivariate plot to show the relationship between the two variables, and a regression line was added.
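One detail worth spelling out when adding a regression line: in lm(), the variable on the left of the ~ is the response, so it should be whichever variable sits on the y-axis of the plot. A small sketch with invented per-capita values (the _toy names are placeholders):

```r
# Invented per-capita values for illustration only.
cars_toy <- c(0.40, 0.45, 0.50, 0.55, 0.60)
trucks_toy <- c(0.52, 0.47, 0.44, 0.38, 0.35)

# The response (y-axis variable) goes on the left of the formula.
fit <- lm(trucks_toy ~ cars_toy)

plot(cars_toy, trucks_toy,
     xlab = "registered cars per capita",
     ylab = "registered trucks per capita")
abline(fit)   # draws intercept + slope * x from the fitted model

coef(fit)     # intercept and slope of the fitted line
```

Swapping the two sides of the formula fits a different line (cars as a function of trucks), which abline() would then draw against the wrong axes.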



Note: For the scatter plot of vehicles per person with selected states identified, I had to create a separate spreadsheet in Excel to sort the data by vehicles per person. If I didn’t do this, I could not identify the states correctly.
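The sorting can probably be done without leaving R. Presumably the labels went wrong because the values were sorted on their own and lost their pairing with the state names; reordering whole rows with order() avoids that. A hedged sketch with invented values (the data frame and state names are placeholders):

```r
# Invented values; the names are placeholders, not real states.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  cars = c(0.61, 0.42, 0.55, 0.48)
)

# order() returns the row positions that sort by cars; indexing with it
# reorders whole rows, so each state stays attached to its value.
toy_sorted <- toy[order(toy$cars), ]

plot(toy_sorted$cars, xlab = "States", ylab = "registered vehicles per capita")
# Non-interactive alternative to identify(): label every point.
text(seq_along(toy_sorted$cars), toy_sorted$cars,
     labels = toy_sorted$state, pos = 3, cex = 0.7)
```

With the rows sorted this way, identify(toy_sorted$cars, labels=toy_sorted$state) should also pick out the right state for each clicked point.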

##Working with stats from the 48 states. State motor-vehicle registration in 2008 data from US DoT Federal Highway Administration available at http://www.fhwa.dot.gov/policyinformation/statistics/2008/mv1.cfm

#change the working directory, then confirm it
setwd("/Users/Erin/R_work")
getwd()


#read the tab-delimited file motor_veh_08
drive <- read.delim("/Users/Erin/R_work/motor_veh_08.txt", header=TRUE, na.strings="NA", dec=".")

#list the objects in the workspace
ls()

#make variables accessible
attach(drive)
names(drive)



##creating car and truck density per population
cars <- total_car/population
trucks <- total_truck/population

#creating box plots
boxplot(trucks~region, ylab="trucks per capita", xlab="region", main="Trucks per person by Region", data=drive, col=5)
boxplot(cars~region, ylab="cars per capita", xlab="region", main="Cars per person by Region", data=drive, col=2)

#scatter chart cars per population
plot(cars, xlab="States", ylab="registered vehicles per capita", main="2008 Registered Vehicles per Capita: selected states")
identify(cars, labels=state, cex=0.7)

#scatter chart trucks per population
plot(trucks, xlab="States", ylab="registered trucks per capita", main="2008 Registered Trucks per Capita: selected states")
identify(trucks, labels=state, cex=0.7)

#click the points on the graph to be labeled; right-click to stop identification

##figure 4
#cars vs. trucks
plot(cars, trucks, xlab="registered cars per capita", ylab="registered trucks per capita", main="Registered cars and trucks by state, 2008", col=3)
#regress trucks (y-axis) on cars (x-axis) so the line matches the plot
abline(lm(trucks~cars), col=3)

All I have to say is bring on ArcGIS!
