Tutorial: ggplot2 - Layered Scatter Plots

February 28, 2015

This tutorial explains how to create colorful, faceted, multi-layered graphics with ggplot2 inside Azure ML.

# Data

The diamonds dataset is an example dataset packaged with the R library ggplot2. It contains 53,940 rows and 10 variables, where each row describes the attributes of one diamond. The variables are: price, carat weight, quality of cut, color, clarity, length, width, depth, total depth percentage, and width of the top of the diamond.

# Set Up

We will use the traditional Azure ML workflow for this experiment. This is unconventional, because the data is already built into an R package, but following Azure ML's workflow makes it easy to substitute other data later. It is also good practice for future experiments. Here are our steps:

* Use an Execute R Script module to load the ggplot2 library. Save the diamonds dataset to a variable and then output it to Azure ML.
* Identify categorical attributes and cast them into categorical features using the Metadata Editor module. These attributes were cast into categorical values: color, clarity, cut.
* Add multiple Execute R Script modules, each containing the R code for one of our ggplots.

# R Graphs in Azure

A great feature of the **Execute R Script** module is its ability to render R graphics. Any graph made in the module is automatically sent to the bottom right output node, labeled "R Device". Graphs can be saved to your computer by right-clicking them and selecting "Save as".

# Faceted Colorful Scatter Plot

There are a few ways to build ggplot2 graphics. This tutorial uses the ggplot() + layer() approach because it gives the most flexibility, which is useful for layering. There are three steps to creating a graph with this syntax.

First, we create a plot with the ggplot() function. Since all of our layers use the same dataset, we pass the diamonds dataset to ggplot() itself.

* ggplot(data=diamonds)

Next, we add features that apply to the whole plot. Here, we split the graph into one panel per diamond cut using facet_wrap(). We could also specify the scale, coordinate system, and color scheme, but the default settings work well in this case.

* facet_wrap(~cut)

The final step is adding our layer with the layer() function. We specify the variables and colors with mapping = aes(x=carat, y=price, color=color) and the type of graph with geom = "point". Since there are many overlapping data points, we use position_jitter() to spread them out and make the graph clearer.

* mapping=aes(x=carat, y=price, color=color),
* geom="point",
* position=position_jitter()

# Layering

Now we will add another layer to our existing plot. To do this, we simply call layer() again with the new specifications. In this example, we add a smoothed trend line with stat = "smooth" and make the line black with geom_params = list(color = "black"). Note that in ggplot2 2.0.0 and later, layer() no longer accepts geom_params; the same setting is passed through params, and geom, stat, and position must all be given explicitly.

* mapping=aes(x=carat, y=price),
* stat="smooth",
* geom_params=list(color="black")

# Related

1. [Tutorial: Using R package ggplot2 in Azure ML. Histograms, density plots and violin plots](https://gallery.azureml.net/Details/b1c26728eb6c4e4d80dddceae992d653)
2. [Tutorial: Building a classification model in Azure ML](https://gallery.azureml.net/Details/01b2765fa75147ce99679e18482d280f)
3. [Tutorial: Creating a random forest regression model in R and using it for scoring](https://gallery.azureml.net/Details/b729c21014a34955b20fa94dc13390e5)
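
# Complete Example

The fragments from the Faceted Colorful Scatter Plot and Layering sections can be combined into a single script. The following is a minimal sketch rather than the exact code from the original experiment. It assumes a recent ggplot2 release (2.0.0 or later), where layer() needs geom, stat, and position spelled out and takes fixed settings through params instead of geom_params. It also reads diamonds directly from the package rather than from an Azure ML input port, and the variable name p is only illustrative.

```r
library(ggplot2)  # also provides the diamonds data frame

# Assumption: diamonds comes straight from the ggplot2 package here.
# Inside an Execute R Script module the data frame would typically be
# read from the module's input port instead.

p <- ggplot(data = diamonds) +
  # One panel per quality of cut
  facet_wrap(~cut) +
  # Layer 1: jittered scatter plot, points colored by diamond color
  layer(
    mapping  = aes(x = carat, y = price, colour = color),
    geom     = "point",
    stat     = "identity",
    position = position_jitter()
  ) +
  # Layer 2: smoothed trend line drawn in black on top of the points
  layer(
    mapping  = aes(x = carat, y = price),
    geom     = "smooth",
    stat     = "smooth",
    position = "identity",
    params   = list(colour = "black")
  )

# Printing the plot renders it on the active graphics device
print(p)
```

Because each layer() call carries its own mapping, the smoothing layer can omit the color aesthetic and draw one trend line per panel, while the point layer stays colored by diamond color.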