This course takes a use-case-based approach to walk through the knowledge discovery and data mining process using MRS in a local compute context (on a single server). Location: Bangalore, India. December 1 - December 2, 2016.
# About the Course

Microsoft R Server (MRS) for Analysts is designed to help analysts familiar with other environments migrate their knowledge of data preparation and analysis to Microsoft R Server. This course takes a use-case-based approach to walk through the knowledge discovery and data mining process using MRS in a local compute context (on a single server). It assumes sufficient knowledge of fundamental concepts in R (as laid out by the course prerequisites), and it allows an experienced analyst (and intermediate R user) to transition to using MRS’s set of tools and capabilities for scalable big-data processing and analytics.

# Prerequisites

There are a few things you will need in order to properly follow the course materials:

* Solid understanding of R data structures (vectors, matrices, lists, data frames, environments): for example, students should be able to confidently tell the difference between a list and a data frame, and know what each object is generally a good representation for and how to subset it.
* Understanding of how to write R functions: for example, students should be able to write functions that process data in bulk (multiple columns), debug functions, and know how R deals with variables that are out of scope and how to use the ellipsis to pass arguments.
* Good understanding of data manipulation and data processing in R: students should be familiar with functions such as merge, transform, subset, cbind, rbind, lapply, and apply, and with how these functions can be used to work with a data.frame; familiarity with third-party packages such as dplyr is also helpful.
* Good understanding of control flow and other basic programming concepts: for example, students should know what loops are and how the apply family of functions can be used to rewrite them, and be familiar with functions such as do.call, assign, etc.

# Modules

The course is divided into the following modules:

1. Introduction
2. Business case discussion
3. MRS overview
4. Loading large datasets for analysis by MRS
5. Understanding how to choose between CSV and XDF
6. Importing data into MRS
7. Cleaning and preparing data for analysis using MRS
8. Basic data transformations (cleaning missing values, normalizing, rescaling)
9. Passing custom transformation functions to MRS to leverage existing R code
10. Visualizing, exploring, and summarizing data using MRS
11. Summarizing numeric data (five-number summary, correlations, histograms and line plots) and categorical data (cross-tabulations and barplots)
13. Benchmarking performance on different data types (XDF vs. CSV)
14. Estimating models using MRS
15. Linear and generalized linear models
16. Model tuning and cross-validation
17. k-means clustering
18. Deploying
19. Model scoring

# Concepts Covered

* Read from and write to large files using MRS (both flat files such as CSV and MRS’s XDF distributed data format)
* Prepare data for analysis
* Visualize, explore, and summarize data
* Estimate and tune basic statistical models
* Deploy models through scoring functions

# Technologies Covered

* Microsoft R Server

# Skills Taught

At the end of the course you will have acquired the following skills:

* Read from and write to large files using MRS (both flat files such as CSV and MRS’s XDF distributed data format)
* Prepare data for analysis
* Visualize, explore, and summarize data
* Estimate and tune basic statistical models
* Deploy models through scoring functions