Big Data and Hadoop Certification
- Offered by DataFlair
Big Data and Hadoop Certification at DataFlair Overview
Duration | 40 hours |
Total fee | Free |
Mode of learning | Online |
Credential | Certificate |
Big Data and Hadoop Certification at DataFlair Highlights
- 40 hours of instructor-led sessions
- 100+ hours of practicals and assignments
- 5 real-time Big Data projects
- Industry-renowned Big Data certification
- Lifetime access to the course, with support
- Job-oriented course with job assistance
- Discussion forum for queries and interaction
- Personalized career discussion with the trainer
Big Data and Hadoop Certification at DataFlair Course details
- Software developers, project managers, and architects
- BI, ETL, and Data Warehousing professionals
- Mainframe and testing professionals
- Business analysts and analytics professionals
- DBAs and DB professionals
- Professionals willing to learn Data Science techniques
- Any graduate aiming to build a career in Apache Spark and Scala
- Shape your career as Big Data shapes the IT World
- Grasp concepts of HDFS and MapReduce
- Become adept in the latest version of Apache Hadoop
- Develop a complex game-changing MapReduce application
- Perform data analysis using Pig and Hive
- Play with the NoSQL database Apache HBase
- Acquire an understanding of the ZooKeeper service
- Load data using Apache Sqoop and Flume
- Enforce best practices for Hadoop development and deployment
- Master handling of large datasets using the Hadoop ecosystem
- Work on live Big Data projects for hands-on experience
- Comprehend other Big Data technologies like Apache Spark
- The Certified Big Data and Hadoop course by DataFlair blends in-depth theoretical knowledge with strong practical skills through the implementation of real-life projects, giving learners a head start and preparing them for top Big Data jobs in the industry.
Big Data and Hadoop Certification at DataFlair Curriculum
The big picture of Big Data
What is Big Data
Necessity of Big Data and Hadoop in the industry
Paradigm shift - why the industry is shifting to Big Data tools
Different dimensions of Big Data
Data explosion in the Big Data industry
Various implementations of Big Data
Different technologies to handle Big Data
Traditional systems and associated problems
Future of Big Data in the IT industry
Demystifying Hadoop
Why Hadoop is at the heart of every Big Data solution
Introduction to the Big Data Hadoop framework
Hadoop architecture and design principles
Ingredients of Hadoop
Hadoop characteristics and data-flow
Components of the Hadoop ecosystem
Hadoop Flavors – Apache, Cloudera, Hortonworks, and more
Setup and Installation of Hadoop
Setup and installation of a single-node Hadoop cluster
Setup and installation of a multi-node Hadoop cluster
HDFS – The Storage Layer
What is HDFS (Hadoop Distributed File System)
HDFS daemons and architecture
HDFS data flow and storage mechanism
Hadoop HDFS characteristics and design principles
Responsibility of HDFS Master – NameNode
Storage mechanism of Hadoop meta-data
Work of HDFS Slaves – DataNodes
Data Blocks and distributed storage
Replication of blocks, reliability, and high availability
Rack-awareness, scalability, and other features
Different HDFS APIs and terminologies (see the Java sketch after this module)
Commissioning of nodes and addition of more nodes
Expanding clusters in real-time
Hadoop HDFS Web UI and HDFS explorer
HDFS best practices and hardware discussion
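As a hands-on companion to the HDFS module above, the sketch below uses the Hadoop Java FileSystem API to copy a local file into HDFS and list a directory along with its replication factor. It is a minimal sketch, assuming a cluster whose core-site.xml is on the classpath; the file and directory paths are purely illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExplore {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS (both paths are illustrative)
        fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                             new Path("/user/dataflair/sample.txt"));

        // List the directory and print size and replication for each entry
        for (FileStatus status : fs.listStatus(new Path("/user/dataflair"))) {
            System.out.printf("%s  size=%d  replication=%d%n",
                    status.getPath(), status.getLen(), status.getReplication());
        }
        fs.close();
    }
}
```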
A Deep Dive into MapReduce
What is MapReduce, the processing layer of Hadoop
The need for a distributed processing framework
Issues before MapReduce and its evolution
List processing concepts
Components of MapReduce – Mapper and Reducer
MapReduce terminologies – keys, values, lists, and more
Hadoop MapReduce execution flow
Mapping and reducing data based on keys
MapReduce word-count example to understand the flow (sketched in Java after this module)
Execution of Map and Reduce together
Controlling the flow of mappers and reducers
Optimization of MapReduce Jobs
Fault-tolerance and data locality
Working with map-only jobs
Introduction to Combiners in MapReduce
How MR jobs can be optimized using a combiner
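The word-count example referenced in this module is the standard way to see the mapper, shuffle, and reducer stages end to end. Below is a minimal Java sketch against the org.apache.hadoop.mapreduce API; the input and output paths come from the command line, and the reducer is reused as a combiner to cut shuffle traffic, as discussed above.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word; also usable as a combiner
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner reduces shuffle volume
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```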
MapReduce - Advanced Concepts
Anatomy of MapReduce
Hadoop MapReduce data types
Developing custom data types using Writable & WritableComparable
InputFormats in MapReduce
InputSplit as a unit of work
How Partitioners partition data (see the custom Partitioner sketch after this module)
Customization of RecordReader
Moving data from mapper to reducer – shuffling & sorting
Distributed cache and job chaining
Different Hadoop case-studies to customize each component
Job scheduling in MapReduce
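To show how a Partitioner decides which reducer receives each key, here is a minimal custom Partitioner sketch. The two-bucket split on the first letter of the key is purely illustrative; the key and value types mirror the word-count example above.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Minimal custom Partitioner: keys starting with 'a'-'m' go to one reducer
// bucket, everything else to another (key/value types are illustrative).
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;                       // map-only job, nothing to partition
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        int bucket = (first <= 'm') ? 0 : 1;
        return bucket % numReduceTasks;     // never exceed the configured reducer count
    }
}

// Wired into a job with:
//   job.setPartitionerClass(AlphabetPartitioner.class);
//   job.setNumReduceTasks(2);
```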
Hive – Data Analysis Tool
The need for an ad-hoc SQL-based solution – Apache Hive
Introduction to and architecture of Hadoop Hive
Playing with the Hive shell and running HQL queries
Hive DDL and DML operations (example queries sketched after this module)
Hive execution flow
Schema design and other Hive operations
Schema-on-Read vs Schema-on-Write in Hive
Meta-store management and the need for RDBMS
Limitations of the default meta-store
Using SerDe to handle different types of data
Optimization of performance using partitioning
Different Hive applications and use cases
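The DDL, DML, and partitioning topics above can be exercised from the Hive shell or programmatically. The sketch below runs HQL through the HiveServer2 JDBC driver from Java; the connection URL, table name, and queries are illustrative and assume an unsecured HiveServer2 listening locally on port 10000.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Explicit registration; newer hive-jdbc versions auto-register the driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";   // illustrative URL
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {

            // DDL: a partitioned table so queries can prune by date
            stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
                    + "user_id STRING, url STRING) "
                    + "PARTITIONED BY (view_date STRING)");

            // Analysis query: aggregate hits per URL within one partition
            ResultSet rs = stmt.executeQuery(
                    "SELECT url, COUNT(*) AS hits "
                    + "FROM page_views WHERE view_date = '2024-01-01' "
                    + "GROUP BY url ORDER BY hits DESC LIMIT 10");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```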
Pig – Data Analysis Tool
The need for a high-level query language - Apache Pig
How Pig complements Hadoop with a scripting language
What is Pig
Pig execution flow
Different Pig operations like filter and join (see the sketch after this module)
Compilation of Pig code into MapReduce
Comparison - Pig vs MapReduce
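The filter and join operations mentioned above can be driven from Java through Pig's PigServer API as well as from the Grunt shell. The sketch below uses local mode so it needs no cluster; the file names, schemas, and filter condition are all illustrative.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigFilterJoinExample {
    public static void main(String[] args) throws Exception {
        // Local mode runs against the local filesystem; use ExecType.MAPREDUCE on a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Load two illustrative comma-separated datasets with declared schemas
        pig.registerQuery("users = LOAD 'users.csv' USING PigStorage(',') "
                + "AS (id:int, name:chararray, age:int);");
        pig.registerQuery("orders = LOAD 'orders.csv' USING PigStorage(',') "
                + "AS (order_id:int, user_id:int, amount:double);");

        // FILTER and JOIN, the operations called out in the module above;
        // each statement is compiled into one or more MapReduce stages
        pig.registerQuery("adults = FILTER users BY age >= 18;");
        pig.registerQuery("joined = JOIN adults BY id, orders BY user_id;");

        // STORE is the action that triggers execution of the whole pipeline
        pig.store("joined", "adult_orders_out");
        pig.shutdown();
    }
}
```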
NoSQL Database - HBase
NoSQL databases and their need in the industry
Introduction to Apache HBase
Internals of the HBase architecture
The HBase Master and Slave Model
Column-oriented, 3-dimensional, schema-less datastores
Data modeling in Hadoop HBase
Storing multiple versions of data
Data high-availability and reliability
Comparison - HBase vs HDFS
Comparison - HBase vs RDBMS
Data access mechanisms (see the Java client sketch after this module)
Work with HBase using the shell
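Beyond the HBase shell, the same data access mechanisms are available through the HBase Java client. The sketch below writes and reads back a single cell; the table name, column family, and row key are illustrative, and the code assumes an hbase-site.xml on the classpath pointing at a running cluster.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseAccessExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; table and column names are illustrative
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("user_profiles"))) {

            // Write one cell: row key "user1001", column family "info", qualifier "city"
            Put put = new Put(Bytes.toBytes("user1001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            table.put(put);

            // Read it back; HBase can retain multiple timestamped versions per cell
            Get get = new Get(Bytes.toBytes("user1001"));
            Result result = table.get(get);
            byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
            System.out.println("city = " + Bytes.toString(city));
        }
    }
}
```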
Data Collection using Sqoop
The need for Apache Sqoop
Introduction and working of Sqoop
Importing data from RDBMS to HDFS (see the import sketch after this module)
Exporting data to RDBMS from HDFS
Conversion of data import/export queries into MapReduce jobs
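An RDBMS-to-HDFS import like the one described above is normally launched from the sqoop command line. The Java sketch below simply assembles and runs that command with ProcessBuilder so the flags stay visible in one place; the JDBC URL, credentials, table, and target directory are illustrative, and sqoop is assumed to be on the PATH.

```java
import java.util.Arrays;
import java.util.List;

public class SqoopImportLauncher {
    public static void main(String[] args) throws Exception {
        // Equivalent to typing the command in a terminal; every value below is illustrative
        List<String> command = Arrays.asList(
                "sqoop", "import",
                "--connect", "jdbc:mysql://localhost:3306/retail_db",
                "--username", "retail_user",
                "--password-file", "/user/dataflair/sqoop.password", // avoids a plain-text password on the CLI
                "--table", "orders",
                "--target-dir", "/user/dataflair/orders",
                "--num-mappers", "4");                               // 4 parallel map-only import tasks

        Process process = new ProcessBuilder(command)
                .inheritIO()          // stream Sqoop's MapReduce progress output to this console
                .start();
        System.exit(process.waitFor());
    }
}
```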
Data Collection using Flume
What is Apache Flume
Flume architecture and aggregation flow
Understanding Flume components like data Sources and Sinks
Flume channels to buffer events
Reliable & scalable data collection tools
Aggregating streams using Fan-in
Separating streams using Fan-out
Internals of the agent architecture
Production architecture of Flume
Collecting data from different sources to Hadoop HDFS
Multi-tier Flume flow for collection of volumes of data using AVRO
Apache YARN & advanced concepts in the latest version
The need for and the evolution of YARN
YARN and its eco-system
YARN daemon architecture
Master of YARN – Resource Manager
Slave of YARN – Node Manager
Requesting resources from the application master
Dynamic slots (containers)
Application execution flow
MapReduce version 2 application over YARN
Hadoop Federation and Namenode HA
Processing data with Apache Spark
Introduction to Apache Spark
Comparison - Hadoop MapReduce vs Apache Spark
Spark key features
RDD and various RDD operations (see the sketch after this module)
RDD abstraction, interfacing, and creation of RDDs
Fault Tolerance in Spark
The Spark Programming Model
Data flow in Spark
The Spark Ecosystem, Hadoop compatibility, & integration
Installation & configuration of Spark
Processing Big Data using Spark
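To contrast the RDD model above with classic MapReduce, the sketch below redoes word count with Spark's Java RDD API, assuming Spark 2.x signatures and local mode so it runs without a cluster. The input path is illustrative; the same code works against HDFS paths when submitted to a YARN cluster.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkRddExample {
    public static void main(String[] args) {
        // Local master keeps the sketch self-contained; replace with a cluster master as needed
        SparkConf conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {

            // Create an RDD from a text file (illustrative path; HDFS URIs also work)
            JavaRDD<String> lines = sc.textFile("input.txt");

            // Transformations are lazy: nothing executes until an action is called
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .filter(word -> !word.isEmpty())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            // Action: bring a sample of results back to the driver and print them
            counts.take(10).forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        }
    }
}
```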
Real-Life Project on Big Data