
Jonathan Callahan received his Ph.D. in Physical Chemistry from the University of Washington in 1993. After two years as a post-doc in a magnetic resonance imaging laboratory, Jonathan joined NOAA's Pacific Marine Environmental Laboratory to work on analysis and visualization software for oceanographic climate and model data. Since 2007 Jonathan has worked as an independent consultant for NOAA, NASA and the US EPA. His areas of expertise include: data management; data visualization; statistical analysis using R; interface design; data mining; web services architecture. Jonathan writes occasional articles on data management at Working With Data. Jonathan is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone. You can read more from them at their website.

Using R — A Script Introduction to R


To many people, R is like the Everglades. They’ve heard of it, and they know it’s big and has amazing treasures deep inside. Articles in the media can make it look irresistible. But after a personal or even second-hand experience, people also learn that R can be a big swamp where you are all but guaranteed to get soggy boots and mosquito bites before you’re done. And there is always the distinct possibility of getting lost and falling into a ‘gator hole’. Indeed, if you go in without a guide, hoping to get in and out quickly, you probably won’t enjoy it much. This post contains a script that shows you some of the sights without letting you fall in. If you like to learn by example, read on.

The rest of this post is the verbatim script with graphics embedded in the appropriate places. You can also download the script and run it yourself. The comments in this script capture a session of working with and thinking about a dataset. This script doesn’t try to cover everything. On the contrary, it pedantically reuses as few techniques as possible to show that you can do a lot with a little.

This script also demonstrates how to be systematic with respect to commenting, variable naming, setting graphical parameters, etc. One of the keys to working successfully with R is writing scripts that explain what they are doing and contain consistent, readable, verging-on-predictable code.
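The following is a minimal sketch of that style. It is not an excerpt from the script itself. It uses the `airquality` dataset that ships with base R and labels each step with a section comment. It also sticks to one variable-naming convention and saves and restores graphical parameters with `par()`, so one plot's settings do not leak into the next. The section layout and the camelCase names (`aq`, `ozoneMean`, `oldPar`) are assumptions chosen for illustration.

```r
# ----- Load data ---------------------------------------------------------------
# 'airquality' ships with base R (package 'datasets'): daily air quality
# measurements in New York, May to September 1973.
aq <- airquality

# ----- Summarize ---------------------------------------------------------------
# One naming convention (camelCase here) keeps variable names predictable.
# Ozone has missing values, so they are dropped explicitly.
ozoneMean <- mean(aq$Ozone, na.rm = TRUE)

# ----- Plot --------------------------------------------------------------------
# par() returns the previous values of any parameters it changes; saving them
# lets us restore the original settings once this plot is finished.
oldPar <- par(mar = c(4, 4, 3, 1), las = 1)

# One box per month (months 5 to 9), labelled with month abbreviations.
boxplot(Ozone ~ Month, data = aq,
        names = month.abb[5:9],
        main = "Ozone by month, New York 1973",
        ylab = "Ozone (ppb)")

# Dashed reference line at the overall mean.
abline(h = ozoneMean, lty = 2)

# Restore the graphical parameters saved above.
par(oldPar)
```

Because every change to `par()` is paired with a restore, the block can be moved or rerun anywhere in a longer script without changing how later plots look.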

I look forward to any suggestions for corrections/improvements.

(Curator's note: For the R code in text format, see this article's source.)



OK, perhaps we shouldn’t expect a job offer from the Florida Department of Tourism any time soon. But I hope this short tour through the R swamp sheds some light on how just a few techniques can help you begin interrogating large datasets and telling interesting stories.

Happy Exploring!

Published at DZone with permission of Jonathan Callahan, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)