January 07, 2013

Last week I started taking a new course on Coursera.org. It’s called Computing for Data Analysis, and it’s all about using the R programming language to get a better understanding of large datasets. I work for Yottaa, and large datasets are something we deal with a lot. I want to find an interesting way to work with and understand the data collected from all sorts of sites on the internet, and I am hoping R is the perfect solution. Unfortunately, it’s a bit weird and not all that well documented, so this training course could be just what I need.

The programming assignment from last week already taught me a few commands I hadn’t used before. We are working with a table of numbers describing ozone, temperature, and more, and we needed to find the mean of the Solar.R value for the rows where Ozone was greater than 31 and Temp was greater than 90. At first it took me about 7 lines of code to get this done, but after some searching I managed to get it down to a single beautiful line (which I am sure I would forget if I did not document it here):

`mean((subset(x,Ozone>31 & Temp>90))$Solar.R)`

Wow, that is cool!
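For my own notes, here is a rough sketch of what the one-liner above replaces. It is not my original 7 lines, just a reconstruction of the idea. I am assuming R's built-in `airquality` data frame as a stand-in for the course table `x`, since it has the same `Ozone`, `Solar.R`, and `Temp` columns. The `na.rm = TRUE` argument is my own addition to guard against missing values.

```r
# Assumption: the built-in `airquality` data frame stands in for the course table `x`.
x <- airquality

# Long version: build the row filter step by step.
# which() returns only the row indices where the condition is TRUE,
# so rows with a missing Ozone value are dropped.
rows <- which(x$Ozone > 31 & x$Temp > 90)
solar <- x$Solar.R[rows]
long_result <- mean(solar, na.rm = TRUE)

# Short version: subset() filters the rows (treating NA conditions as FALSE),
# and $Solar.R pulls out the column to average.
short_result <- mean(subset(x, Ozone > 31 & Temp > 90)$Solar.R, na.rm = TRUE)

# Both approaches should agree.
print(all.equal(long_result, short_result))
```

The nice thing about `subset()` is that column names can be used directly inside the condition, so there is no need to keep typing `x$`.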

I am looking forward to the second section of the course, which starts on Wednesday!