Big Data Hadoop

Course Info

Course Highlights:

  • All training is provided by industry experts who already work on the Big Data Hadoop platform.
  • Backup class in case you miss a session.
  • Theory and practical training, along with case studies, for a better understanding of the concepts.
  • Complete course material at no extra cost.
  • Free doubt-clearing session after completion of the training.
  • Resume building by experts.
  • Feedback forms filled in by candidates after every class to maintain the highest quality standards.

 

  • Introduction to Hadoop and Big Data
  • Introduction to Big Data
  • Introduction to Hadoop
  • Business problems / Challenges with Big Data
  • Scenarios where Hadoop is used
  • Overview of batch processing and real-time data analytics using Hadoop
  • Hadoop vendors – Apache, Cloudera, Hortonworks
  • Hadoop versions – Hadoop 1.x and Hadoop 2.x
  • Hadoop services – HDFS, MapReduce, YARN
  • Introduction to Hadoop ecosystem components (Hive, HBase, Pig, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark)

 

  • Cluster setup (Hadoop 1.x)
  • Linux VM installation on system for Hadoop cluster using Oracle VirtualBox
  • Preparing nodes for Hadoop and VM settings
  • Install Java and configure passwordless SSH across nodes
  • Basic Linux commands
  • Hadoop 1.x Single node deployment
  • Hadoop Daemons – NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
  • Hadoop configuration files and running
  • Important Web URLs and Logs for Hadoop
  • Run HDFS and Linux commands
  • Hadoop 1.x multi-node deployment
  • Run sample jobs in Hadoop single and multi-node clusters

 

  • HDFS Concepts
  • HDFS Design Goals
  • Understand Blocks and how to configure block size
  • Block replication and replication factor
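To illustrate the block size and replication factor topics in this module, here is a minimal Python sketch of the arithmetic involved. It assumes the Hadoop 2.x default block size of 128 MB (Hadoop 1.x defaults to 64 MB) and the default replication factor of 3. The function name `hdfs_block_usage` is hypothetical and is not part of any Hadoop API.

```python
MB = 1024 * 1024  # bytes in one mebibyte


def hdfs_block_usage(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate how HDFS would split and replicate a single file.

    Returns a tuple: (number of blocks, size of the last block in bytes,
    raw bytes stored across the cluster after replication).
    Hypothetical helper for illustration only.
    """
    if file_size_bytes == 0:
        return 0, 0, 0
    # Integer ceiling division: a partially filled last block still counts.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only occupies as much space as the data it holds.
    last_block = file_size_bytes - (num_blocks - 1) * block_size_bytes
    # Every byte of the file is stored once per replica.
    raw_bytes = file_size_bytes * replication
    return num_blocks, last_block, raw_bytes


if __name__ == "__main__":
    blocks, last, raw = hdfs_block_usage(300 * MB)
    print(blocks, last // MB, raw // MB)
```

With these assumptions, a 300 MB file occupies three blocks (128 MB, 128 MB and 44 MB) and consumes 900 MB of raw cluster storage.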

 

  • MapReduce Concepts
  • Introduction to MapReduce
  • MapReduce Architecture
  • Understanding the concept of Mappers & Reducers
  • Anatomy of MapReduce Program
  • Phases of a MapReduce program
  • Data-types in Hadoop MapReduce
  • Driver, Mapper and Reducer classes
  • InputSplit and RecordReader
  • InputFormat and OutputFormat in Hadoop
  • Concepts of Combiner and Partitioner
  • Running and Monitoring MapReduce jobs
  • Writing your own MapReduce job using MapReduce API
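As a rough illustration of the map, partition, shuffle/sort and reduce phases listed in this module, the following Python sketch simulates a word count in memory. It is not the Hadoop MapReduce API: all function names are hypothetical, and the byte-sum partitioner is only a simple stand-in for Hadoop's hash-based default partitioner.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key (illustrative stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: sum all counts collected for one key."""
    return key, sum(values)


def run_word_count(lines, num_reducers=2):
    """Simulate map -> partition -> shuffle/sort -> reduce in a single process."""
    # One bucket per reducer; each bucket groups the mapped values by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)][key].append(value)

    results = {}
    for bucket in partitions:
        # Each reducer receives its keys in sorted order.
        for key in sorted(bucket):
            word, total = reducer(key, bucket[key])
            results[word] = total
    return results


if __name__ == "__main__":
    print(run_word_count(["big data hadoop", "hadoop big cluster"]))
```

In a real job the same roles are played by the Driver, Mapper and Reducer classes, with the framework handling the shuffle between nodes.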

 

  • Cluster setup (Hadoop 2.x)
  • Hadoop 1.x Limitations
  • Design Goals for Hadoop 2.x
  • Introduction to Hadoop 2.x
  • Introduction to YARN
  • Components of YARN – ResourceManager, NodeManager, ApplicationMaster
  • Deprecated properties
  • Hadoop 2.x Single node deployment
  • Hadoop 2.x multi-node deployment

 

  • HDFS High Availability and Federation
  • Introduction to HDFS Federation
  • Understand Nameservice ID and block pools
  • Introduction to HDFS High Availability
  • Failover mechanisms in Hadoop 1.x
  • Concept of Active and Standby NameNode
  • Configuring JournalNodes and avoiding split-brain scenarios
  • Automatic and manual failover techniques in HA using ZooKeeper and ZKFC
  • HDFS haadmin commands

 

  • YARN – Yet Another Resource Negotiator
  • YARN Architecture
  • YARN Components – ResourceManager, NodeManager, JobHistoryServer, Application Timeline Server, MRApplicationMaster
  • YARN Application execution flow
  • Running and Monitoring YARN Applications
  • Understand and configure Capacity/Fair Schedulers in YARN
  • Define and configure Queues
  • JobHistoryServer / Application Timeline Server
  • YARN REST API
  • Writing and executing YARN applications

 

  • Apache ZooKeeper
  • Introduction to Apache ZooKeeper
  • ZooKeeper stand-alone installation
  • ZooKeeper clustered installation
  • Understand znodes and ephemeral nodes
  • Manage znodes using Java API
  • ZooKeeper four-letter-word commands

 

  • Apache Hive
  • Introduction to Hive
  • Hive Architecture
  • Components – Metastore, HiveServer2, Beeline, Hive CLI, Hive Web Interface
  • Installation and configuration
  • Metastore service
  • DDLs and DMLs
  • SQL – Select, Filter, Join, Group By
  • Hive partitions and buckets
  • Hive User Defined Functions
  • Introduction to HCatalog
  • Install and configure HCatalog services

 

  • Apache Pig
  • Introduction to Pig
  • Pig installation
  • Accessing Pig Grunt shell
  • Pig Data Types
  • Pig commands
  • Pig Relational Operators
  • Pig User Defined Functions
  • Configure Pig to use HCatalog

 

  • Apache Sqoop
  • Introduction to Sqoop
  • Sqoop Architecture and Installation
  • Import data into HDFS using Sqoop
  • Import all tables using Sqoop
  • Import tables directly into Hive
  • Export data from HDFS

 

  • Apache Flume
  • Introduction to Flume
  • Flume Architecture and Installation
  • Define Flume agent – Sink, Source and Channel
  • Flume Use Cases

 

  • Apache Oozie
  • Introduction to Oozie
  • Oozie Architecture
  • Oozie server installation and configurations
  • Design Workflows, Coordinator Jobs, Bundle Jobs in Oozie

 

  • Apache HBase
  • Introduction to HBase
  • HBase Architecture
  • HBase components – HBase Master and RegionServers
  • HBase installation and configurations
  • Create sample tables and queries on HBase

 

  • Apache Spark / Storm / Kafka
  • Real-time data Analytics
  • Introduction to Spark / Storm / Kafka

 

  • Cluster Monitoring and Management tools
  • Cloudera Manager
  • Apache Ambari
  • Ganglia
  • JMX monitoring and JConsole
  • Hadoop User Experience (HUE)