That was the best MOOC I have taken so far: fun and engaging. Statistics and R from Trevor and Rob at Stanford, a dream come true!
After finishing “House of Cards” I am binge-studying the Stanford online course “Statistical Learning”. It is the best class on the subject I have seen: comprehensible explanations, no rush, well-placed jokes, no excessive material, a free downloadable book that closely follows the class, and lovely R sessions run by Trevor Hastie himself.
I have learned new tricks in R. For example, the handy matplot() function, whose output is shown above. On a single plot, it draws the values of each variable (column) of the data frame against the row index (x-axis). The plot shows strong autocorrelation between consecutive rows, equivalent to about 10-20 repeats of every data point, so the effective sample size is much smaller than the number of rows. Ignoring this leads to a serious underestimate of the standard error (s.e.) of the regression coefficients when y (shown in black on the plot) is modeled as a linear function of x1 (in red) and x2 (in blue): y = b0 + b1*x1 + b2*x2. For b1, the standard bootstrap gave an s.e. of 0.028, while the block bootstrap (resampling blocks of 100 consecutive rows) gave 0.196.
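To make the comparison concrete, here is a minimal sketch in R of both bootstraps. It does not use the course data: the data frame `dat` is simulated with an AR(1) process as a stand-in, and the names `ar1`, `b1`, `se_rows` and `se_blocks`, the coefficients, and the number of resamples `B` are my own assumptions. Only the block length of 100 rows and the model y ~ x1 + x2 come from the text above, so the numbers it prints will not match 0.028 and 0.196.

```r
# Simulated stand-in for the course data (an assumption, not the real
# data set): 1000 rows with strong autocorrelation between neighbours.
set.seed(1)
n <- 1000
block_len <- 100

# AR(1) series: each value is rho times the previous one plus fresh noise.
ar1 <- function(n, rho) {
  as.numeric(stats::filter(rnorm(n), rho, method = "recursive"))
}

dat <- data.frame(x1 = ar1(n, 0.95), x2 = ar1(n, 0.95))
dat$y <- 1 + 0.5 * dat$x1 + 0.3 * dat$x2 + ar1(n, 0.95)

# One line per column, plotted against the row index.
matplot(dat[, c("y", "x1", "x2")], type = "l", lty = 1,
        col = c("black", "red", "blue"), xlab = "row", ylab = "value")

# Coefficient of x1 from a least-squares fit on the given rows.
b1 <- function(rows) coef(lm(y ~ x1 + x2, data = dat[rows, ]))[["x1"]]

B <- 500  # number of bootstrap resamples (arbitrary choice)

# Standard bootstrap: resample individual rows with replacement.
se_rows <- sd(replicate(B, b1(sample(n, n, replace = TRUE))))

# Block bootstrap: resample whole blocks of consecutive rows, which
# keeps the dependence inside each block intact.
starts <- seq(1, n, by = block_len)
se_blocks <- sd(replicate(B, {
  s <- sample(starts, length(starts), replace = TRUE)
  b1(unlist(lapply(s, function(i) i:(i + block_len - 1))))
}))

c(standard = se_rows, block = se_blocks)
```

On data like this, the row-wise bootstrap treats neighbouring rows as independent and so reports a smaller s.e. than the block version, which is the same pattern as in the class example.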
While I was satisfying my curiosity and was busy launching a new project at work (implementation of a new analytical tool for IEDB.org, to be released in summer), a lot has happened in the world of Big Data that I am eager to understand and report on.
The course has officially started. It is free for everyone and is available at https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/. It is taught in R. The authors' new book, An Introduction to Statistical Learning, with Applications in R (James, Witten, Hastie, Tibshirani – Springer 2013), is provided for free download on the course website.
Everyone who completes all problems by March 21 can get a Statement of Accomplishment. I am thinking about getting one. I am not new to the subject and took a somewhat similar graduate course at UCSD given by Charles Elkan, but I have found that setting the goal of earning a certificate makes it easier to keep up with a course.
Hastie and Tibshirani start the introduction (of the actual class, not the video above) by joking about how they were statisticians, then became machine learning guys, and now proudly call themselves Data Scientists. Watch this! I am looking forward to learning from these gurus. I am sure it will be fun!