This four-day developer training course delivers the key concepts and expertise participants need to create robust data processing applications using Apache Hadoop.
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
- The internals of MapReduce and HDFS and how to write MapReduce code
- Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
- How to leverage Hive, Pig, Sqoop, Flume, Oozie, Mahout, and other Hadoop ecosystem projects
- Optimal hardware configurations and network considerations for integrating a Hadoop cluster with the data center
- Writing and executing joins to link data sets in MapReduce
- Advanced Hadoop API topics required for real-world data analysis
Who Should Attend
– Data Scientists
– Data Analysts
– Anyone who wants to learn Big Data and Hadoop
Prerequisite
Course Content
Day 1
Module 1: Hadoop Introduction
- Why we need Hadoop
- Why Hadoop is in such demand in today's market
- Where expensive SQL-based tools fall short
- Key points: why Hadoop is a leading tool in the current IT industry
- Definition of Big Data
- Hadoop nodes
- Introduction to Hadoop Release-1
- Hadoop Daemons in Hadoop Release-1
- Introduction to Hadoop Release-2
- Hadoop Daemons in Hadoop Release-2
- Hadoop Cluster and Racks
- Hadoop Cluster Demo
- New projects on Hadoop
- How open-source tools can run jobs in less time
- Hadoop storage: HDFS (Hadoop Distributed File System)
- Hadoop processing frameworks (MapReduce / YARN)
- Alternatives to MapReduce
- Why NoSQL is in such demand compared with SQL
- Distributed warehouse for HDFS
- The Hadoop ecosystem and its uses
- Data import/export tools
Module 2: Hadoop Installation and Hands-On Practice on a Hadoop Machine
- Hadoop installation
- Introduction to the Hadoop FS and processing environment UIs
- How to read and write files
- Basic Unix commands for Hadoop
- Hadoop FS shell
- Hands-on with Hadoop releases
- Hands-on with Hadoop daemons
Day 2
Module 3: ETL Tool (Pig) Introduction, Level 1
- Pig Introduction
- Why Pig when MapReduce is already available?
- How Pig differs from programming languages
- Introduction to Pig data flow
- How schema is optional in Pig
- Pig data types
- Pig commands: Load, Store, Describe, Dump
- MapReduce jobs started by Pig commands
- Execution plan
Module 4: ETL Tool (Pig), Level 2
- Pig UDFs
- Pig Use cases
- Pig Assignment
- Complex Use cases on Pig
- Real-time scenarios in Pig
- When we should use Pig
- When we shouldn’t use Pig
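To show the kind of dataflow Pig expresses, here is an illustrative sketch in plain Python of what a simple group-and-count Pig script does under the hood; the relation names and sample records are hypothetical, and real Pig compiles these steps into MapReduce jobs on a cluster.

```python
from collections import defaultdict

# Hypothetical sketch: the dataflow a Pig script such as
#   logs   = LOAD 'logs' AS (user, url);
#   by_usr = GROUP logs BY user;
#   counts = FOREACH by_usr GENERATE group, COUNT(logs);
# performs, written in plain Python for illustration only.

records = [("alice", "/home"), ("bob", "/about"), ("alice", "/cart")]

grouped = defaultdict(list)                # GROUP logs BY user
for user, url in records:
    grouped[user].append(url)

counts = {user: len(urls) for user, urls in grouped.items()}  # COUNT per group
print(counts)  # {'alice': 2, 'bob': 1}
```

The point of Pig is that the three script lines in the comment replace all of this hand-written grouping code.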
Day 3
Module 5: Hive Warehouse
- Hive Introduction
- Metastore and metadata storage
- Introduction to Derby Database
- Hive Data types
- HQL
- DDL, DML and sub languages of Hive
- Internal, external, and temporary tables in Hive
- Differences between SQL-based data warehouses and Hive
Module 6: Hive, Level 2
- Hive releases
- Why Hive is not the best solution for OLTP
- OLAP in Hive
- Partitioning
- Bucketing
- Hive Architecture
- Thrift Server
- Hue Interface for Hive
- How to analyze data using Hive scripts
- Differences between Hive and Impala
- UDFs in Hive
- Complex Use cases in Hive
- Hive Advanced Assignment
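As a taste of the bucketing topic above, the sketch below models how bucketing assigns rows: Hive hashes the bucketing column modulo the configured bucket count. This is an illustration only — Hive's actual hash function differs, and the table name and IDs are hypothetical.

```python
# Illustrative sketch of Hive bucketing for a table declared roughly as:
#   CREATE TABLE users (...) CLUSTERED BY (user_id) INTO 4 BUCKETS;
# For integer keys Hive effectively uses the value modulo the bucket count;
# other types go through Hive's own hash function (not reproduced here).
NUM_BUCKETS = 4

def bucket_for(user_id: int) -> int:
    """Return the bucket index a row with this integer key lands in."""
    return user_id % NUM_BUCKETS

rows = [101, 102, 103, 104, 105]
buckets = {uid: bucket_for(uid) for uid in rows}
print(buckets)  # {101: 1, 102: 2, 103: 3, 104: 0, 105: 1}
```

Because every row with the same key lands in the same bucket, Hive can sample single buckets and perform bucket-map joins without scanning the whole table.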
Module 7: Introduction to MapReduce
- How MapReduce works as a processing framework
- End-to-end execution flow of a MapReduce job
- The different tasks in a MapReduce job
- Why the Reducer is optional while the Mapper is mandatory
- Introduction to the Combiner
- Introduction to the Partitioner
- Programming languages for MapReduce
- Why Java is preferred for MapReduce programming
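The map, shuffle-and-sort, and reduce phases listed above can be sketched in a few lines of Python. This is a single-process illustration of the classic word-count example, not Hadoop's API: on a real cluster each phase runs distributed across many machines.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a <key, value> pair for every word."""
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle-and-sort phase: group all values by key, as the framework
    does between the map and reduce tasks."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: aggregate the values for one key."""
    return (key, sum(values))

lines = ["hello hadoop", "hello world"]
pairs = [p for line in lines for p in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

A Combiner would simply run the reducer logic on each mapper's local output before the shuffle, cutting network traffic; a Partitioner decides which reducer each key is routed to.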
Module 8: NoSQL Databases and Introduction to HBase
- Introduction to NoSQL
- Why NoSQL when SQL has been in the market for years
- NoSQL databases in the market
- CAP theorem
- ACID vs. CAP
- OLTP solutions with different capabilities
- Which NoSQL solution can handle which specific requirements
- Examples of companies that use NoSQL databases
- HBase architecture and column families
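To preview the column-family layout covered above, the snippet below models an HBase table as nested maps. This is a conceptual sketch only — the row key, family, and qualifier names are hypothetical, and real HBase cells also carry timestamps and versions, omitted here.

```python
# Conceptual model of HBase's layout: a table maps row keys to column
# families, and each family holds qualifier -> value cells. All names
# below are hypothetical; timestamps/versions are omitted for brevity.
table = {
    "row-001": {
        "info":  {"name": "alice", "city": "pune"},  # column family "info"
        "stats": {"visits": "12"},                   # column family "stats"
    },
}

# A cell lookup addresses row key, then family, then qualifier:
value = table["row-001"]["info"]["name"]
print(value)  # alice
```

Column families are declared when the table is created, while qualifiers within a family can vary freely from row to row — the flexibility that makes HBase "schemaless" at the column level.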
Day 4
Module 9: ZooKeeper and Sqoop
- Introduction to ZooKeeper
- How ZooKeeper helps the Hadoop ecosystem
- How to load data from relational storage into Hadoop
- Sqoop basics
- Sqoop practical implementation
- Sqoop alternatives
- Sqoop connectors
Module 10: Flume, Oozie, and YARN
- How to load streaming data without a fixed schema
- How to load unstructured and semi-structured data into Hadoop
- Introduction to Flume
- Hands-on with Flume
- How to load Twitter data into HDFS
- Introduction to Oozie
- How to schedule jobs using Oozie
- What kinds of jobs can be scheduled using Oozie
- How to schedule time-based jobs
- Hadoop releases
- From where to get Hadoop and other components to install
- Introduction to YARN
- Significance of YARN
Module 11: Apache Spark Basics
- Introduction to Spark
- Basic features of Spark and Scala available in Hue
- Why demand for Spark is increasing in the market
- How to use Spark with the Hadoop ecosystem
- Datasets for practice
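As a preview of the Spark style taught here, the sketch below mimics the chained RDD transformations (flatMap, map, reduceByKey) in plain Python so it runs without a cluster; the PySpark equivalents are noted in comments, and the sample lines are hypothetical.

```python
# Plain-Python sketch of the transformation chain Spark's RDD API uses
# for word count. Each step's PySpark counterpart is noted in a comment;
# in real Spark these transformations are lazy and run distributed.

lines = ["spark on hadoop", "spark basics"]

words = [w for line in lines for w in line.split()]  # rdd.flatMap(str.split)
pairs = [(w, 1) for w in words]                      # .map(lambda w: (w, 1))

def reduce_by_key(pairs):
    """Merge values per key, like rdd.reduceByKey(operator.add)."""
    out = {}
    for key, value in pairs:
        out[key] = out.get(key, 0) + value
    return out

print(reduce_by_key(pairs))  # {'spark': 2, 'on': 1, 'hadoop': 1, 'basics': 1}
```

Compared with hand-written MapReduce, Spark keeps intermediate results in memory between stages, which is a large part of why it is covered as the next step after classic MapReduce.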
Module 12: Emerging Trends in Big Data
- YARN
- Emerging Big Data technologies
- Emerging use cases, e.g., IoT, the Industrial Internet, and new applications
- Certifications and job opportunities