In session three of this four-part series, Matt Parker introduces Microsoft R Server's key functions for the standard data management tasks: creating and modifying variables, sorting, subsetting, deduplicating, and merging datasets. It is essential viewing for understanding how Microsoft R Server (MRS) operates on chunks of data, and how to write R code that works even when only part of the data is visible at any given time.
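
To make the idea of chunk-aware code concrete, here is a minimal sketch in plain base R. It does not use any Microsoft R Server function: the `flights` data frame, its columns, and the chunk size are all made up for illustration, and the explicit loop only imitates the kind of chunking that MRS performs for you. It shows that a row-wise transformation is safe on any chunk, while a summary such as a mean has to be built up from per-chunk totals.

```r
# Hypothetical illustration in base R; no Microsoft R Server required.
# The data frame stands in for a file too large to load all at once.
set.seed(42)
flights <- data.frame(
  distance = runif(1000, min = 100, max = 3000),  # made-up distances in miles
  delay    = rnorm(1000, mean = 10, sd = 5)       # made-up delays in minutes
)

chunk_size <- 250
starts <- seq(1, nrow(flights), by = chunk_size)

# Running totals carried across chunks
total_km <- 0
n_rows   <- 0

for (s in starts) {
  e     <- min(s + chunk_size - 1, nrow(flights))
  chunk <- flights[s:e, ]

  # Row-wise transformation: needs only the rows in this chunk
  chunk$distance_km <- chunk$distance * 1.609344

  # Summary statistic: accumulate pieces instead of calling mean() on a chunk
  total_km <- total_km + sum(chunk$distance_km)
  n_rows   <- n_rows + nrow(chunk)
}

chunked_mean <- total_km / n_rows

# Check against the all-in-memory answer
full_mean <- mean(flights$distance * 1.609344)
print(all.equal(chunked_mean, full_mean))
```

Code that would silently go wrong under this model is anything that assumes the whole column is visible at once, for example centering a variable with `chunk$distance - mean(chunk$distance)`, which would subtract a different mean in every chunk.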

# About the Course

Get a quick start with open-source R and its powerful extension for big data, Microsoft R Server. In the first four-hour session, participants will learn the essential R workflow for importing and cleaning data, exploring and visualizing variables, and building predictive models. In the next session, participants will learn how to build the same workflow using the parallel processing and big-data capacity of Microsoft R Server.

# Prerequisites

There are a few things you will need in order to properly follow the course materials:

* A laptop
* Basic programming experience, in any language
* A basic understanding of statistics and the data analysis process

# Modules

The course is divided into the following modules:

1. Introduction to R and Microsoft R Server
2. Hands-on R tutorial

# Concepts Covered

1. Strengths and weaknesses of open-source R
2. How Microsoft R extends open-source R for big data
3. How to read data into open-source R
4. How to read data into the Microsoft R XDF file format
5. How to clean data with open-source R functions
6. How to clean data with MRS functions
7. How to build predictive models with open-source R
8. How to build predictive models with MRS

# Technologies Covered

1. Open-source R
2. Microsoft R Server

# Skills Taught

By the end of the course you will be able to:

1. Identify the strengths and weaknesses of open-source R.
2. Explain how Microsoft R Server extends R's capabilities.
3. Load data into R (in-memory) and into a Microsoft R XDF file.
4. Clean data with R and MRS functions.
5. Build a simple predictive model using R's lm() function and MRS's rxLinMod(), and understand the differences in the returned model objects.