Overview
Audience:
This course is intended to demystify big data and Hadoop technology and to show that they are not difficult to understand.
Requirements
- Basic knowledge of the Linux file system
- Basic Java
- Knowledge of Apache Hadoop (recommended)
Course Outline
Big Data Overview:
- What is Big Data
- Why Big Data is gaining popularity
- Big Data Case Studies
- Big Data Characteristics
- Solutions for working with Big Data
Hadoop & Its Components:
- What is Hadoop and what are its components.
- Hadoop architecture and the characteristics of the data it can handle/process.
- A brief history of Hadoop, the companies using it, and why they adopted it.
- The Hadoop framework and its components, explained in detail.
- What HDFS is, and how reads and writes work in the Hadoop Distributed File System.
- How to set up a Hadoop cluster in different modes: standalone, pseudo-distributed, and multi-node.
(This includes setting up a Hadoop cluster in VirtualBox/KVM/VMware, the network configuration that needs careful attention, running the Hadoop daemons, and testing the cluster.)
- What the MapReduce framework is and how it works.
- Running MapReduce jobs on a Hadoop cluster.
- Understanding replication, mirroring, and rack awareness in the context of Hadoop clusters.
Hadoop Cluster Planning:
- How to plan your Hadoop cluster.
- Understanding the hardware and software considerations when planning your Hadoop cluster.
- Understanding workloads and planning the cluster to avoid failures and perform optimally.
What is MapR and why MapR:
- Overview of MapR and its architecture.
- Understanding and working with the MapR Control System, MapR volumes, snapshots, and mirrors.
- Planning a cluster in the context of MapR.
- Comparison of MapR with other distributions and Apache Hadoop.
- MapR installation and cluster deployment.
Cluster Setup & Administration:
- Managing services, nodes, snapshots, mirror volumes, and remote clusters.
- Understanding and managing Nodes.
- Understanding Hadoop components and installing them alongside MapR services.
- Accessing data on the cluster, including via NFS.
- Managing services and nodes.
- Managing data by using volumes; managing users and groups; managing and assigning roles to nodes; commissioning and decommissioning nodes; cluster administration and performance monitoring; configuring, analyzing, and monitoring performance metrics; and configuring and administering MapR security.
- Understanding and working with M7: native storage for MapR tables.
- Cluster configuration and tuning for optimum performance.
Cluster upgrade and integration with other setups:
- Upgrading the MapR software version, and the types of upgrade.
- Configuring a MapR cluster to access an HDFS cluster.
- Setting up a MapR cluster on Amazon Elastic MapReduce.
All of the above topics include demonstrations and practice sessions so that learners gain hands-on experience with the technology.