Tutorial: Base R Graphics in AzureML

February 26, 2015

This experiment serves as a tutorial for creating R graphics inside of Azure ML Studio.
# Data

The dataset we are using is the diamonds example dataset packaged with the R library ggplot2. It contains 53,940 rows and 10 variables, where each row describes the attributes of a particular diamond. The variables are: price, carat weight, quality of cut, color, clarity, length, width, depth, total depth percentage, and width of the top of the diamond relative to its widest point.

# Set Up

We will use the traditional Azure ML workflow for this experiment. Although this is unconventional here because the data is built into an R package, using Azure ML’s workflow allows for easy dataset substitution and experiment expansion. Also, it’s great practice for future experiments! Here are our steps:

* Use an **Execute R Script** module to load the ggplot2 library. Save the diamonds dataset to a variable and then output it to Azure ML.
* Identify categorical attributes and cast them into categorical features using the **Metadata Editor** module. These attributes were cast into categorical values: color, clarity, cut.
* Drag in another **Execute R Script** module, which will contain our R code for graphing.

# R Graphs in Azure

A great feature of the **Execute R Script** module is its ability to render R graphics. Any graph made in the module will automatically be output to the bottom right node labeled “R Device”. These can be saved to your computer with only two clicks.

# Graphing in R

We created three different graphs: a [histogram](http://en.wikipedia.org/wiki/Histogram), a [box plot](http://en.wikipedia.org/wiki/Box_plot), and a [scatter plot](http://en.wikipedia.org/wiki/Scatter_plot).

To make a histogram in R, use the function [hist(x)](http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/hist.html) where x is a numeric vector, such as a column from a data frame. Adding a title and an x-label is easy: hist(x, main = "Title", xlab = "x-label"). The y-label defaults to Frequency but can be customized by adding ylab = "y-label".
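As a minimal sketch of these hist() arguments, the snippet below plots simulated values rather than the diamonds data, so it runs without ggplot2. The variable names x and h, the simulated data, and the choice of 20 bins are illustrative assumptions and are not part of the original experiment.

```r
# Illustrative data only: 200 simulated values, not the diamonds dataset.
set.seed(42)
x <- rexp(200, rate = 2)

# hist() draws the plot and invisibly returns the binning it used.
h <- hist(x,
          main = "Histogram of Simulated Values",   # plot title
          xlab = "Value",                           # x-axis label
          ylab = "Number of observations",          # replaces the default "Frequency"
          breaks = 20)                              # suggested (not exact) number of bins

# The bin counts add up to the number of values plotted.
total_counted <- sum(h$counts)
```

Capturing the return value is optional; calling hist() on its own is enough to draw the graph.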
We chose to make a histogram that displays the distribution of carat values. The histogram shows that the dataset contains far more small diamonds than large ones. Here is the code:

> hist(diamonds$carat, main = "Carat Histogram", xlab = "Carat")

[Box plots](http://en.wikipedia.org/wiki/Box_plot) are created using the function [boxplot(x)](https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/boxplot.html) and use the same syntax as histograms for titles and labels. We substituted price for carat in order to draw a box plot of the diamond prices using the code below. The box plot shows that most of the prices are lower than $5,000, and there are many outliers in the higher price range.

> boxplot(diamonds$price, main = "Boxplot of Diamond Price", ylab = "Price")

[Scatter plots](http://en.wikipedia.org/wiki/Scatter_plot) can be created using [plot(x, y)](https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html) and the same syntax for labels as the previous graphs. The only difference is that plot(x, y) takes two vectors or data frame columns. In our experiment, we used the formula notation plot(y ~ x), which graphs y (on the vertical axis) against x (on the horizontal axis). Our example graphs price against carat weight. As expected, price increases as carat weight increases.

> plot(price ~ carat, data = diamonds, main = "Price vs Carat")

# Related

1. [Tutorial: Using R package ggplot2 in Azure ML. Histograms, density plots and violin plots](https://gallery.azureml.net/Details/b1c26728eb6c4e4d80dddceae992d653)
2. [Tutorial: Building a classification model in Azure ML](https://gallery.azureml.net/Details/01b2765fa75147ce99679e18482d280f)
3. [Tutorial: Creating a random forest regression model in R and using it for scoring](https://gallery.azureml.net/Details/b729c21014a34955b20fa94dc13390e5)