
Big Data and Hadoop Certification 

  • Offered by DataFlair

Overview

Duration

40 hours

Total fee

Free

Mode of learning

Online

Official Website

Explore Free Course

Credential

Certificate

Highlights

  • 40 Hrs of instructor-led sessions
  • 100+ Hrs of practicals and assignments
  • 5 real-time Big Data projects
  • Industry-renowned Big Data certification
  • Lifetime access to the course, with support
  • Job-oriented course with job assistance
  • Discussion forum for queries and interaction
  • Personalized career discussion with the trainer

Course details

Who should do this course?
  • Software developers, project managers, and architects
  • BI, ETL, and Data Warehousing professionals
  • Mainframe and testing professionals
  • Business analysts and analytics professionals
  • DBAs and DB professionals
  • Professionals willing to learn Data Science techniques
  • Any graduate aiming to build a career in Apache Spark and Scala
What are the course deliverables?
  • Shape your career as Big Data shapes the IT World
  • Grasp concepts of HDFS and MapReduce
  • Become adept in the latest version of Apache Hadoop
  • Develop a complex game-changing MapReduce application
  • Perform data analysis using Pig and Hive
  • Play with the NoSQL database Apache HBase
  • Acquire an understanding of the ZooKeeper service
  • Load data using Apache Sqoop and Flume
  • Enforce best practices for Hadoop development and deployment
  • Master handling of large datasets using the Hadoop ecosystem
  • Work on live Big Data projects for hands-on experience
  • Comprehend other Big Data technologies like Apache Spark
More about this course
  • The Certified Big Data and Hadoop course by DataFlair is a blend of in-depth theoretical knowledge and strong practical skills, built by implementing real-life projects, designed to give learners a head start and help them land top Big Data jobs in the industry.

Curriculum

The big picture of Big Data

What is Big Data

Necessity of Big Data and Hadoop in the industry

Paradigm shift - why the industry is shifting to Big Data tools

Different dimensions of Big Data

Data explosion in the Big Data industry

Various implementations of Big Data

Different technologies to handle Big Data

Traditional systems and associated problems

Future of Big Data in the IT industry

Demystifying Hadoop

Why Hadoop is at the heart of every Big Data solution

Introduction to the Big Data Hadoop framework

Hadoop architecture and design principles

Ingredients of Hadoop

Hadoop characteristics and data-flow

Components of the Hadoop ecosystem

Hadoop Flavors – Apache, Cloudera, Hortonworks, and more

Setup and Installation of Hadoop

Setup and installation of a single-node Hadoop cluster

Setup and installation of a multi-node Hadoop cluster

HDFS – The Storage Layer

What is HDFS (Hadoop Distributed File System)

HDFS daemons and architecture

HDFS data flow and storage mechanism

Hadoop HDFS characteristics and design principles

Responsibility of HDFS Master – NameNode

Storage mechanism of Hadoop meta-data

Work of HDFS Slaves – DataNodes

Data Blocks and distributed storage

Replication of blocks, reliability, and high availability

Rack-awareness, scalability, and other features

Different HDFS APIs and terminologies

Commissioning of nodes and addition of more nodes

Expanding clusters in real-time

Hadoop HDFS Web UI and HDFS explorer

HDFS best practices and hardware discussion
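The block-and-replica model covered above can be illustrated with a short plain-Python sketch (a simulation of the idea, not HDFS code; block size and the simplified placement policy mirror HDFS defaults, and the node/rack names are made up):

```python
# Illustrative sketch (not HDFS itself): splitting a file into fixed-size
# blocks and placing replicas with a simplified rack-awareness policy.
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(racks, replication=3):
    """Simplified placement: first replica on one rack, the remaining
    replicas on a different rack (mirrors HDFS's default policy of never
    putting all copies on a single rack)."""
    first_rack, second_rack = racks[0], racks[1]
    return [first_rack[0], second_rack[0], second_rack[1]][:replication]

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))                              # 3 blocks: 128 + 128 + 44 MB
print(place_replicas([["dn1", "dn2"], ["dn3", "dn4"]]))
```

Because the last block holds only the remainder (44 MB here), small files do not waste a full block of disk, only a full entry of NameNode metadata.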

A Deep Dive into MapReduce

What is MapReduce, the processing layer of Hadoop

The need for a distributed processing framework

Issues before MapReduce and its evolution

List processing concepts

Components of MapReduce – Mapper and Reducer

MapReduce terminologies – keys, values, lists, and more

Hadoop MapReduce execution flow

Mapping and reducing data based on keys

MapReduce word-count example to understand the flow

Execution of Map and Reduce together

Controlling the flow of mappers and reducers

Optimization of MapReduce Jobs

Fault-tolerance and data locality

Working with map-only jobs

Introduction to Combiners in MapReduce

How MapReduce jobs can be optimized using combiners
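The word-count flow listed above (including the combiner) can be sketched in plain Python — a simulation of the Map → Combine → Shuffle → Reduce stages, not actual Hadoop code:

```python
from collections import defaultdict
from itertools import groupby

def mapper(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def combiner(pairs):
    # Combine: pre-aggregate counts on the map side to cut shuffle traffic.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def shuffle_and_reduce(pairs):
    # Shuffle: sort by key so equal keys are adjacent; Reduce: sum per key.
    pairs.sort(key=lambda kv: kv[0])
    return {k: sum(n for _, n in grp)
            for k, grp in groupby(pairs, key=lambda kv: kv[0])}

lines = ["big data big future", "hadoop handles big data"]
shuffled = [kv for line in lines for kv in combiner(mapper(line))]
print(shuffle_and_reduce(shuffled))
# {'big': 3, 'data': 2, 'future': 1, 'hadoop': 1, 'handles': 1}
```

Without the combiner, "big" would cross the shuffle as two separate (big, 1) pairs from the first line; with it, the mapper's output is pre-summed to a single (big, 2).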

MapReduce - Advanced Concepts

Anatomy of MapReduce

Hadoop MapReduce data types

Developing custom data types using Writable & WritableComparable

InputFormats in MapReduce

InputSplit as a unit of work

How Partitioners partition data

Customization of RecordReader

Moving data from mapper to reducer – shuffling & sorting

Distributed cache and job chaining

Different Hadoop case-studies to customize each component

Job scheduling in MapReduce
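The partitioning step above decides which reducer receives each intermediate key. A minimal sketch of that idea in Python (Hadoop's default HashPartitioner works the same way — hash of the key modulo the reducer count; the fruit data is purely illustrative):

```python
import hashlib

def partition(key, num_reducers):
    # A stable hash (unlike Python's per-process-randomized hash()) so the
    # same key always lands on the same reducer across runs.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
buckets = {r: [] for r in range(3)}
for key, value in pairs:
    buckets[partition(key, 3)].append((key, value))

# Every occurrence of a given key lands in the same bucket, so one reducer
# sees all values for that key -- the property the Reduce phase depends on.
print(buckets)
```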

Hive – Data Analysis Tool

The need for an ad-hoc, SQL-based solution – Apache Hive

Introduction to and architecture of Hadoop Hive

Playing with the Hive shell and running HQL queries

Hive DDL and DML operations

Hive execution flow

Schema design and other Hive operations

Schema-on-Read vs Schema-on-Write in Hive

Meta-store management and the need for RDBMS

Limitations of the default meta-store

Using SerDe to handle different types of data

Optimization of performance using partitioning

Different Hive applications and use cases
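The partitioning optimization listed above works because Hive lays each partition out separately (one directory per partition value), so a query filtering on the partition column skips every other partition. A plain-Python sketch of that pruning effect (illustrative data, not the Hive API):

```python
# Illustrative sketch of Hive-style partition pruning: rows are stored per
# partition (here, per country), and a query with a partition filter only
# reads the matching partition instead of scanning every row.
table = {  # partition column: country
    "IN": [{"user": "asha", "amount": 120}, {"user": "ravi", "amount": 80}],
    "US": [{"user": "john", "amount": 200}],
}

def query(table, country=None):
    scanned = 0
    rows = []
    partitions = [country] if country else list(table)  # the pruning step
    for part in partitions:
        for row in table[part]:
            scanned += 1
            rows.append(row)
    return rows, scanned

rows, scanned = query(table, country="IN")
print(scanned)  # 2 rows scanned instead of 3 -- the US partition was skipped
```

The same trade-off applies as in real Hive: pruning only helps when queries actually filter on the partition column, and over-partitioning multiplies metadata.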

Pig – Data Analysis Tool

The need for a high-level query language - Apache Pig

How Pig complements Hadoop with a scripting language

What is Pig

Pig execution flow

Different Pig operations like filter and join

Compilation of Pig code into MapReduce

Comparison - Pig vs MapReduce

NoSQL Database - HBase

NoSQL databases and their need in the industry

Introduction to Apache HBase

Internals of the HBase architecture

The HBase Master and Slave Model

Column-oriented, 3-dimensional, schema-less datastores

Data modeling in Hadoop HBase

Storing multiple versions of data

Data high-availability and reliability

Comparison - HBase vs HDFS

Comparison - HBase vs RDBMS

Data access mechanisms

Work with HBase using the shell

Data Collection using Sqoop

The need for Apache Sqoop

Introduction and working of Sqoop

Importing data from RDBMS to HDFS

Exporting data to RDBMS from HDFS

Conversion of data import/export queries into MapReduce jobs

Data Collection using Flume

What is Apache Flume

Flume architecture and aggregation flow

Understanding Flume components like data Sources and Sinks

Flume channels to buffer events

Reliable & scalable data collection tools

Aggregating streams using Fan-in

Separating streams using Fan-out

Internals of the agent architecture

Production architecture of Flume

Collecting data from different sources to Hadoop HDFS

Multi-tier Flume flow for collection of volumes of data using AVRO
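The source → channel → sink flow described above, including fan-in, can be sketched with a plain Python queue (a toy model of the agent's buffering, not Flume itself; the log events are made up):

```python
# Illustrative sketch of Flume's flow: sources push events into a channel
# (a buffer), and a sink drains the channel. Fan-in means several sources
# feed one channel. A real Flume agent wires these up via configuration,
# not Python objects.
from queue import Queue

channel = Queue()  # the buffering channel

def source(events, channel):
    # A source emits events into the channel.
    for event in events:
        channel.put(event)

def sink(channel):
    # The sink drains whatever the channel has buffered.
    drained = []
    while not channel.empty():
        drained.append(channel.get())
    return drained

# Fan-in: two sources (e.g. two web servers' access logs) share one channel.
source(["GET /a", "GET /b"], channel)
source(["POST /c"], channel)
print(sink(channel))  # all three events, aggregated in arrival order
```

The channel is what gives Flume its reliability: events survive in the buffer if the sink (e.g. HDFS) is temporarily slow or unavailable.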

Apache YARN & advanced concepts in the latest version

The need for and the evolution of YARN

YARN and its eco-system

YARN daemon architecture

Master of YARN – Resource Manager

Slave of YARN – Node Manager

Requesting resources from the application master

Dynamic slots (containers)

Application execution flow

MapReduce version 2 applications over YARN

Hadoop Federation and NameNode HA

Processing data with Apache Spark

Introduction to Apache Spark

Comparison - Hadoop MapReduce vs Apache Spark

Spark key features

RDD and various RDD operations

RDD abstraction, interfacing, and creation of RDDs

Fault Tolerance in Spark

The Spark Programming Model

Data flow in Spark

The Spark Ecosystem, Hadoop compatibility, & integration

Installation & configuration of Spark

Processing Big Data using Spark
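A defining piece of the Spark programming model above is lazy evaluation: transformations such as map and filter only build up a lineage, and nothing runs until an action is called. A tiny plain-Python imitation of that idea (this mimics the concept with generators; it is not the PySpark API):

```python
# Illustrative sketch of Spark's RDD model: transformations (map, filter)
# are lazy and merely chain a pipeline; an action (collect) triggers the
# actual computation over the whole chain.
class MiniRDD:
    def __init__(self, data):
        self._iter = lambda: iter(data)

    def map(self, fn):        # transformation: lazy, returns a new "RDD"
        prev = self._iter
        rdd = MiniRDD([])
        rdd._iter = lambda: (fn(x) for x in prev())
        return rdd

    def filter(self, pred):   # transformation: lazy
        prev = self._iter
        rdd = MiniRDD([])
        rdd._iter = lambda: (x for x in prev() if pred(x))
        return rdd

    def collect(self):        # action: forces evaluation
        return list(self._iter())

rdd = MiniRDD(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16]
```

Keeping the lineage instead of materialized intermediate data is also what lets Spark recompute lost partitions for fault tolerance, rather than replicating every intermediate result as MapReduce does via HDFS.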

Real-Life Project on Big Data
