R Basics: Introduction to 'R' Analytics Tool


What is it?
Learning R?

To some people R is just the 18th letter of the alphabet. To others, it’s the rating on racy movies, a measure of an attic’s insulation or what pirates in movies say.

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.
Read the entire article, published by the New York Times, here: (http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html)


Why 'R'?
R is a free, open-source statistical platform and programming language. It is built by academics and statisticians who continuously contribute libraries (packages of functions) for new and emerging statistical techniques.
  • Cutting-edge statistics with 'R' - Quite a few R packages have no equivalent in any other statistical tool currently on the market. Read about 'glmnet', a state-of-the-art modeling package that efficiently handles the prediction of interval and categorical dependent variables.
  • R has powerful graphics capabilities. A gallery of R's graphics possibilities can be seen here (http://addictedtor.free.fr/graphiques/thumbs.php?sort=votes)
  • R is already being used by some big names - Companies like Google and Facebook (to name a few) already use it for in-house analytics. At an event held some time back, representatives from Google and Facebook discussed their experiences with R. The video of this event can be found here (http://www.lecturemaker.com/2009/02/r-kickoff-video/)



Make 'R' work for you
  • Getting 'R' - R is available for Windows, Linux and Mac OS. The Windows version of R is available here (http://cran.cnr.berkeley.edu/bin/windows/base/R-2.9.0-win32.exe)
  • Using 'R' - R is an intuitive statistical tool, but a new user needs to spend some time (half a day to a day) at the beginning to understand its structure. Once this is done, subsequent use of R becomes much easier. Links to numerous well-written tutorials are listed at the end of this article.
  • 'R' is a great tool for creating complicated, publication-quality plots and charts. One example is a sales dashboard made using R graphics. For avid EXCEL users, a video tutorial shows how R can be of help.
  • 'R' can also import data from SAS, EXCEL, SPSS, STATA and SYSTAT, to name a few.
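
As a rough illustration of the points above, the following sketch imports a small table and draws a chart from it. The sales figures and file names are invented for this example, and the commented-out calls assume the 'foreign' package that ships with standard R distributions. Treat it as a starting point rather than a recommended workflow.

```r
# Hypothetical data: monthly sales figures invented for illustration.
sales <- data.frame(
  month = month.abb[1:6],
  units = c(120, 135, 150, 145, 170, 190)
)

# Round-trip through a CSV file, the simplest exchange format with Excel.
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Files from other tools can be read with the 'foreign' package, e.g.
#   foreign::read.spss("survey.sav", to.data.frame = TRUE)  # SPSS
#   foreign::read.dta("survey.dta")                          # Stata
#   foreign::read.xport("survey.xpt")                        # SAS transport file
#   foreign::read.systat("survey.sys")                       # SYSTAT
# (the file names above are placeholders, not real files)

# Quick numeric summary of the imported column.
print(summary(imported$units))

# Draw a bar chart and save it as a PDF file.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path, width = 6, height = 4)
barplot(imported$units,
        names.arg = imported$month,
        main = "Units sold per month",
        ylab = "Units")
invisible(dev.off())
```

In an interactive session the pdf() and dev.off() lines can be dropped, in which case the chart appears in R's graphics window instead of a file.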

Industry acceptance of  'R'
  • SAS GLOBAL FORUM, WASHINGTON  (Mar. 23, 2009)  –  SAS, the leader in business analytics, is expanding analytical options for its customers with a new interface to R open-source statistical software. SAS’ initial integration with R is included in SAS/IML Studio 3.2 (formerly SAS Stat Studio)
  • R is used in government-regulated clinical research in the US and is accepted by the FDA for such work. A SAS spokesman has this to say about R (http://blogs.sas.com/hls/index.php?/archives/25-The-Many-Faces-of-R.html)

What is so cool about 'R'?
  • R can be used to provide analytics on the Web - Modules for the Apache server are available that embed R on the server, so a user can interact with it via a web browser. An example of such a setup is here (http://data.vanderbilt.edu/~hornerj/brew/useR2007.rhtml)
  • R can be embedded in EXCEL - This may require intermediate to advanced familiarity with R and VBA, but for seasoned VBA users, embedding R in EXCEL provides a whole new range of statistical and graphics capabilities not available in current versions of EXCEL.
  • 'R' can be used to display animations - This site displays animations made using R to illustrate key statistical concepts (http://www.yihui.name/r/stat/multivariate_stat/kmeans/index.htm)
  • Community support for R is very mature, and most newcomers get a great deal of help by posting to the r-help list. Responses come from professors, statisticians, programmers and advanced users. This fosters more holistic learning, since newcomers are shown how to solve their R programming issues along with relevant insights into the statistics involved.

Essential 'R' Links
  • Home page: http://www.r-project.org/index.html
  • Packages in R are written for specific purposes. CRAN is the repository that hosts them. The packages are grouped by target domain (Machine Learning, Spatial Statistics, Genetics, Graphics, etc.). Get the full list of topics here (http://cran.cnr.berkeley.edu/web/views/)
  • 'R' code with examples for performing a host of basic statistics (regression, diagnostics, ANOVA) and advanced statistics (classification trees, random forests, neural networks). (http://www.statmethods.net/stats/index.html)
  • UCLA provides annotated R examples for many statistical techniques. (http://www.ats.ucla.edu/stat/r/dae/default.htm)
  • Revolution Computing is an enterprise R provider. They have created Parallel R, 64-bit R for Windows and Revolution R. Their blog discusses the latest developments in R, R usage in diverse industries and all the "cool" things that R is being used for (such as running an entire bank). This blog is mandatory reading. (http://blog.revolution-computing.com/)
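
To make the CRAN link above a little more concrete, here is a minimal sketch of checking for and installing a package. The helper name has_package is made up for this example. install.packages() and library() are standard R functions, and the install lines are commented out because they need internet access.

```r
# Names of the packages already installed in this R library.
installed <- rownames(installed.packages())

# Hypothetical helper: TRUE if the named package is installed.
has_package <- function(pkg) pkg %in% installed

# 'stats' ships with every R installation.
print(has_package("stats"))

# To fetch a package from CRAN and load it (requires internet access):
#   install.packages("glmnet")
#   library(glmnet)
```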








