The Microsoft Data Science Team invites you to an in-depth 3-day workshop on using Microsoft R. Location: München, Germany. December 19 - December 21, 2016.
# About the Course

The course is designed to help analysts integrate Microsoft R Server (MRS) into their data science toolbox and combine it with other tools in Azure and the Cortana Intelligence Suite. After completing the course, participants will be able to:

* Explore and visualize data with R
* Manipulate data that is too large to fit into memory with MRS
* Train and test statistical models with high-performance, parallel, external-memory algorithms
* Access data stored in Azure Blob Storage using MRS
* Deploy models as AzureML web services

In these sessions, you'll gain hands-on experience conducting scalable data analysis with Microsoft R Server. You will learn the fundamentals of R and understand how MRS addresses the major scalability and operationalization challenges associated with open source R.

# Prerequisites

There are a few things you will need in order to take full advantage of the course:

* An Azure subscription
* An SSH-capable terminal emulator (OpenSSH or bash), e.g., PuTTY, Cygwin, or MobaXterm
  * I use MobaXterm and the Ubuntu Bash subsystem within Windows
* An R IDE.
  Some reasonable choices:
  * RStudio
  * Visual Studio 2015 with R Tools for Visual Studio (RTVS); the Community Edition is sufficient
  * Jupyter/JupyterLab with IRKernel
* Microsoft R Server 8.0.3 or later
  * Installation instructions
* I will assume you have already taken the following courses, or have the background they provide:
  * Implementing Real-Time Analytics with Hadoop in Azure HDInsight (or general knowledge of the Apache Hadoop ecosystem and HDInsight)
  * Data Science and Machine Learning Essentials (or a general understanding of machine learning and predictive modeling)
  * Introduction to R for Data Science
* Intermediate knowledge of R would be ideal, at the level of either of the following two courses:
  * Programming with R for Data Science
  * R for SAS Users Course
* I will not assume any background knowledge of Microsoft R Server, but if you are eager to get a head start, you can find an online video series about MRS here:
  * Course Website
  * Video Lectures on Channel 9
  * Lab Exercises
* A useful overview and comparison of MRS and Microsoft R Open (MRO) is available here.

# Modules

Each training module guides you through a logical progression of hands-on tasks in do-verb form. Each day is broken up into 1-4 hour modules, in which you will learn the material and work through labs on your own. Some material that is out of scope for hands-on labs will instead be demonstrated in instructor-led labs. Participants will receive a copy of this lab material to try on their own, but are not required to run the analysis during the training time.

The modules, grouped into a general agenda, are as follows. The specific modules may bleed across sessions depending on the engagement of the audience.

* Part I - Functional-Object Based Computing with R
  * Day One - Morning Session
    * Overview of the R Project and CRAN
    * Exploring the Microsoft R Data Stack
    * Functional Programming for Data Manipulation with the dplyr Package
  * Day One - Afternoon Session
    * Understanding dplyr's Semantics and the magrittr Pipe
    * Data Visualization and Exploratory Data Analysis
    * Using the broom Package for Modeling and Summarization
* Part II - Breaking the Memory Barrier with RevoScaleR
  * Day Two - Morning Session
    * Overview of the Microsoft R Data Ecosystem
    * Modeling and Scoring with High-Performance ScaleR Algorithms
    * Data Manipulation with the dplyrXdf Package
  * Day Two - Afternoon Session
    * Summarizing Data with RevoScaleR
    * Performance Considerations with RevoScaleR
    * Parallel and Distributed Computing with Microsoft R Server
    * Deploying R and ScaleR Algorithms to Azure with the AzureML Package
* Part III - Microsoft R Server with Spark
  * Day Three - Morning Session
    * Overview of the Apache Spark Project
    * Ingesting Data into Azure Blob Storage
    * Creating Spark DataFrames and Spark Contexts
    * Manipulating HDFS Data with the sparklyr Package
  * Day Three - Afternoon Session
    * Creating Distributed eXternal DataFrames (XDF) in HDFS
    * Preparing Data for Modeling with Microsoft R Server
    * Training Statistical Models with Microsoft R Server and the Spark Compute Context
    * Scoring and Deploying Models
    * Performance Considerations on Hadoop

# Concepts Covered

* Functional-Object Based Computing with R
* Breaking the Memory Barrier with RevoScaleR
* Microsoft R Server with Spark

# Technologies Covered

* Microsoft R Server