Big Data and Hadoop Spark Developer Training
- Offered by Simplilearn
- Private Institute
- Estd. 2010
Big Data and Hadoop Spark Developer Training at Simplilearn Overview
Duration | 35 hours |
Mode of learning | Online |
Difficulty level | Intermediate |
Credential | Certificate |
Future job roles | CRUD, .Net, CSR, Credit risk, Senior Software Developer |
Big Data and Hadoop Spark Developer Training at Simplilearn Highlights
- Audio-video Lectures along with Chapter-level Quizzes
- Aligned to Cloudera CCA175 certification exam
- A great course for learning Big Data
- Certification Course
Big Data and Hadoop Spark Developer Training at Simplilearn Course details
- This Big Data Hadoop Certification course is designed to give you in-depth knowledge of the big data framework using Hadoop and Spark
- In this hands-on big data course, students will execute real-life, industry-based projects using Simplilearn's integrated labs
Big Data and Hadoop Spark Developer Training at Simplilearn Curriculum
Lesson 1 Course Introduction
Course Introduction
Accessing Practice Lab
Lesson 2 Introduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
Introduction to Big Data
Big Data Analytics
What is Big Data
Four Vs Of Big Data
Case Study: Royal Bank of Scotland
Challenges of Traditional System
Distributed Systems
Introduction to Hadoop
Components of Hadoop Ecosystem: Part One
Components of Hadoop Ecosystem: Part Two
Components of Hadoop Ecosystem: Part Three
Commercial Hadoop Distributions
Demo: Walkthrough of Simplilearn Cloudlab
Key Takeaways
Knowledge Check
Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN
Hadoop Architecture, Distributed Storage (HDFS) and YARN
What Is HDFS
Need for HDFS
Regular File System vs HDFS
Characteristics of HDFS
HDFS Architecture and Components
High Availability Cluster Implementations
HDFS Component File System Namespace
Data Block Split
Data Replication Topology
HDFS Command Line
Demo: Common HDFS Commands
HDFS Command Line
YARN Introduction
YARN Use Case
YARN and Its Architecture
Resource Manager
How Resource Manager Operates
Application Master
How YARN Runs an Application
Tools for YARN Developers
Demo: Walkthrough of Cluster Part One
Demo: Walkthrough of Cluster Part Two
Key Takeaways
Knowledge Check
Hadoop Architecture, Distributed Storage (HDFS) and YARN
Lesson 4 Data Ingestion into Big Data Systems and ETL
Data Ingestion into Big Data Systems and ETL
Data Ingestion Overview Part One
Data Ingestion
Apache Sqoop
Sqoop and Its Uses
Sqoop Processing
Sqoop Import Process
Assisted Practice: Import into Sqoop
Sqoop Connectors
Demo: Importing and Exporting Data from MySQL to HDFS
Apache Sqoop
Apache Flume
Flume Model
Scalability in Flume
Components in Flume's Architecture
Configuring Flume Components
Demo: Ingest Twitter Data
Apache Kafka
Aggregating User Activity Using Kafka
Kafka Data Model
Partitions
Apache Kafka Architecture
Producer Side API Example
Consumer Side API
Demo: Setup Kafka Cluster
Consumer Side API Example
Kafka Connect
Key Takeaways
Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer
Knowledge Check
Data Ingestion into Big Data Systems and ETL
Lesson 5 Distributed Processing - MapReduce Framework and Pig
Distributed Processing - MapReduce Framework and Pig
Distributed Processing in MapReduce
Word Count Example
Map Execution Phases
Map Execution Distributed Two Node Environment
MapReduce Jobs
Hadoop MapReduce Job Work Interaction
Setting Up the Environment for MapReduce Development
Set of Classes
Creating a New Project
Advanced MapReduce
Data Types in Hadoop
OutputFormats in MapReduce
Using Distributed Cache
Joins in MapReduce
Replicated Join
Introduction to Pig
Components of Pig
Pig Data Model
Pig Interactive Modes
Pig Operations
Various Relations Performed by Developers
Demo: Analyzing Web Log Data Using MapReduce
Demo: Analyzing Sales Data and Solving KPIs Using Pig
Apache Pig
Demo: Wordcount
Key Takeaways
Knowledge Check
Distributed Processing - MapReduce Framework and Pig
Lesson 6 Apache Hive
Apache Hive
Hive SQL over Hadoop MapReduce
Hive Architecture
Interfaces to Run Hive Queries
Running Beeline from Command Line
Hive Metastore
Hive DDL and DML
Creating New Table
Data Types
Validation of Data
File Format Types
Data Serialization
Hive Table and Avro Schema
Hive Optimization Partitioning Bucketing and Sampling
Non Partitioned Table
Data Insertion
Dynamic Partitioning in Hive
Bucketing
What Do Buckets Do
Hive Analytics UDF and UDAF
Assisted Practice: Synchronization
Other Functions of Hive
Demo: Real-Time Analysis and Data Filtration
Demo: Real-World Problem
Demo: Data Representation and Import using Hive
Key Takeaways
Knowledge Check
Apache Hive