# About the Course

Hadoop has proved itself to be a scalable solution for the enterprise, providing a large ecosystem of advanced analytics and big data tools in a unified framework. However, managing this diverse ecosystem and ensuring that its users can obtain maximum performance from its clusters is a difficult task. This course will be of primary interest to cluster administrators, but also to developers, architects, and even data scientists who want to ensure their applications glean the maximum performance and security from their clusters.

We will teach administrators the fundamentals of HDInsight's design and architecture and how to ensure their clusters are secure and meet the requirements of their users. We will discuss configuration, administration, command-line tools for debugging, and tips on achieving maximum performance in a variety of common Hadoop big data and advanced analytics workflows, particularly Spark and Hive. By the end of the course, participants will have a solid understanding of the behind-the-scenes administration mechanisms in Hadoop using Hadoop configuration files, and will know how to secure their clusters, enable and manage unique application workloads, and set groups and permissions for users and applications.
# Skills Taught

At the end of the course you will have acquired the following skills:

- Administering Hadoop Clusters
- Managing HDInsight Clusters
- Configuring Clusters
- Monitoring and Troubleshooting HDInsight Clusters
- Optimizing and Tuning Hadoop Applications in HDInsight

# Agenda

## Day One

- HDInsight and Hadoop Fundamentals
- HDInsight Cluster Options
- Programmatic Provisioning of HDInsight Hadoop Clusters
- Moving Data In and Out of Azure Storage
- Management and Configuration for Hadoop Services and HTTP Web Services

## Day Two

- Overview of Apache Ambari
- Manage Hadoop Applications with YARN CLI
- Managing Ambari Users and Groups
- Securing Hadoop with Apache Ranger
- Manage Alert Groups with Ambari
- Manage and Configure Storage
- Resource Allocation and Configuration
- Manage and Configure Queues, Capacity Scheduler, and Node Access

## Day Three

- Create and Modify YARN Node Labels Using Ambari
- Define YARN Containers
- Monitoring Hadoop Jobs and Applications
- Optimizing Hadoop Jobs and Troubleshooting
- Tuning Hive Jobs in Hadoop
- Spark Job Execution, Performance Tuning, Tracking and Debugging (time permitting)

# Technologies Covered

- HDInsight (Hadoop & Spark)
- Hadoop
- YARN
- Hive
- Spark

# Materials

- https://github.com/Azure/HDI-Admin-Config-Security