Big Data Hadoop and Spark Developer

4.4/5

(5 Ratings)

Offered by Simplilearn
Private Institute
Estd. 2010

Big Data Hadoop and Spark Developer
at
Simplilearn
Overview

Big Data Hadoop and Spark development is one of the most sought-after skills today. According to Hadoop data analysts, there were 1.9 million jobs in the US in 2021.

Duration	11 hours
Total fee	Free
Mode of learning	Online
Difficulty level	Intermediate
Credential	Certificate

Highlights

Gain the potential to earn 3.4 L to 30 L per annum
Earn a certificate and get 90 days of access to this free course by Simplilearn
Big Data Hadoop and Spark Developer professionals are hired by companies like Amazon, Accenture, LinkedIn, Cognizant, and IBM

Course details

Who should do this course?

This course is for IT professionals, BI professionals, analytics professionals, software developers, senior IT professionals, project managers, and aspiring data scientists

What are the course deliverables?

Realtime data processing
Functional programming
Spark applications
Parallel processing
Spark RDD optimization techniques
Spark SQL

More about this course

Designed to give you in-depth knowledge of Spark basics
This Hadoop framework program prepares you for success in your role as a big data developer
Learn Hadoop to understand how the multiple elements of the Hadoop ecosystem fit into the big data processing cycle
Common careers for Big Data Hadoop and Spark Developer professionals include big data analytics, business analyst, big data engineer, analytics manager, and data architect

Curriculum

Lesson 1 Course Introduction

1.1 Course Introduction

1.2 Accessing Practice Lab

Lesson 2 Introduction to Big Data and Hadoop

1.1 Introduction to Big Data and Hadoop

1.2 Introduction to Big Data

1.3 Big Data Analytics

1.4 What is Big Data

1.5 Four Vs Of Big Data

1.6 Case Study: Royal Bank of Scotland

1.7 Challenges of Traditional System

1.8 Distributed Systems

1.9 Introduction to Hadoop

1.10 Components of Hadoop Ecosystem: Part One

1.11 Components of Hadoop Ecosystem: Part Two

1.12 Components of Hadoop Ecosystem: Part Three

1.13 Commercial Hadoop Distributions

1.14 Demo: Walkthrough of Simplilearn Cloudlab

1.15 Key Takeaways

Knowledge Check

Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN

2.1 Hadoop Architecture Distributed Storage (HDFS) and YARN

2.2 What Is HDFS

2.3 Need for HDFS

2.4 Regular File System vs HDFS

2.5 Characteristics of HDFS

2.6 HDFS Architecture and Components

2.7 High Availability Cluster Implementations

2.8 HDFS Component File System Namespace

2.9 Data Block Split

2.10 Data Replication Topology

2.11 HDFS Command Line

2.12 Demo: Common HDFS Commands

HDFS Command Line

2.13 YARN Introduction

2.14 YARN Use Case

2.15 YARN and Its Architecture

2.16 Resource Manager

2.17 How Resource Manager Operates

2.18 Application Master

2.19 How YARN Runs an Application

2.20 Tools for YARN Developers

2.21 Demo: Walkthrough of Cluster Part One

2.22 Demo: Walkthrough of Cluster Part Two

2.23 Key Takeaways

Knowledge Check

Hadoop Architecture, Distributed Storage (HDFS) and YARN

Lesson 4 Data Ingestion into Big Data Systems and ETL

3.1 Data Ingestion into Big Data Systems and ETL

3.2 Data Ingestion Overview Part One

3.3 Data Ingestion

3.4 Apache Sqoop

3.5 Sqoop and Its Uses

3.6 Sqoop Processing

3.7 Sqoop Import Process

Assisted Practice: Import into Sqoop

3.8 Sqoop Connectors

3.9 Demo: Importing and Exporting Data from MySQL to HDFS

Apache Sqoop

3.9 Apache Flume

3.10 Flume Model

3.11 Scalability in Flume

3.12 Components in Flume's Architecture

3.13 Configuring Flume Components

3.15 Demo: Ingest Twitter Data

3.14 Apache Kafka

3.15 Aggregating User Activity Using Kafka

3.16 Kafka Data Model

3.17 Partitions

3.18 Apache Kafka Architecture

3.19 Producer Side API Example

3.20 Consumer Side API

3.21 Demo: Setup Kafka Cluster

3.21 Consumer Side API Example

3.22 Kafka Connect

3.23 Key Takeaways

3.26 Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer

Knowledge Check

Data Ingestion into Big Data Systems and ETL

Lesson 5 Distributed Processing - MapReduce Framework and Pig

4.1 Distributed Processing MapReduce Framework and Pig

4.2 Distributed Processing in MapReduce

4.3 Word Count Example

4.4 Map Execution Phases

4.5 Map Execution Distributed Two Node Environment

4.6 MapReduce Jobs

4.7 Hadoop MapReduce Job Work Interaction

4.8 Setting Up the Environment for MapReduce Development

4.9 Set of Classes

4.10 Creating a New Project

4.11 Advanced MapReduce

4.12 Data Types in Hadoop

4.13 OutputFormats in MapReduce

4.14 Using Distributed Cache

4.15 Joins in MapReduce

4.16 Replicated Join

4.17 Introduction to Pig

4.18 Components of Pig

4.19 Pig Data Model

4.20 Pig Interactive Modes

4.21 Pig Operations

4.22 Various Relations Performed by Developers

4.23 Demo: Analyzing Web Log Data Using MapReduce

4.24 Demo: Analyzing Sales Data and Solving KPIs Using Pig

Apache Pig

4.25 Demo: Wordcount

4.26 Key Takeaways

Knowledge Check

Distributed Processing - MapReduce Framework and Pig

Lesson 6 Apache Hive

5.1 Apache Hive

5.2 Hive SQL over Hadoop MapReduce

5.3 Hive Architecture

5.4 Interfaces to Run Hive Queries

5.5 Running Beeline from Command Line

5.6 Hive Metastore

5.7 Hive DDL and DML

5.8 Creating New Table

5.9 Data Types

5.10 Validation of Data

5.11 File Format Types

5.12 Data Serialization

5.13 Hive Table and Avro Schema

5.14 Hive Optimization Partitioning Bucketing and Sampling

5.15 Non Partitioned Table

5.16 Data Insertion

5.17 Dynamic Partitioning in Hive

5.18 Bucketing

5.19 What Do Buckets Do

5.20 Hive Analytics UDF and UDAF

Assisted Practice: Synchronization

5.21 Other Functions of Hive

5.22 Demo: Real-Time Analysis and Data Filtration

5.23 Demo: Real-World Problem

5.24 Demo: Data Representation and Import using Hive

5.25 Key Takeaways

Knowledge Check

Apache Hive

Lesson 7 NoSQL Databases - HBase

6.1 NoSQL Databases HBase

6.2 NoSQL Introduction

Demo: YARN Tuning

6.3 HBase Overview

6.4 HBase Architecture

6.5 Data Model

6.6 Connecting to HBase

HBase Shell

6.7 Key Takeaways

Knowledge Check

NoSQL Databases - HBase

Lesson 8 Basics of Functional Programming and Scala

7.1 Basics of Functional Programming and Scala

7.2 Introduction to Scala

7.3 Demo: Scala Installation

7.3 Functional Programming

7.4 Programming with Scala

Demo: Basic Literals and Arithmetic Operators

Demo: Logical Operators

7.5 Type Inference Classes Objects and Functions in Scala

Demo: Type Inference Functions Anonymous Function and Class

7.6 Collections

7.7 Types of Collections

Demo: Five Types of Collections

Demo: Operations on List

7.8 Scala REPL

Assisted Practice: Scala REPL

Demo: Features of Scala REPL

7.9 Key Takeaways

Knowledge Check

Basics of Functional Programming and Scala

Lesson 9 Apache Spark Next Generation Big Data Framework

8.1 Apache Spark Next Generation Big Data Framework

8.2 History of Spark

8.3 Limitations of MapReduce in Hadoop

8.4 Introduction to Apache Spark

8.5 Components of Spark

8.6 Application of In-Memory Processing

8.7 Hadoop Ecosystem vs Spark

8.8 Advantages of Spark

8.9 Spark Architecture

8.10 Spark Cluster in Real World

8.11 Demo: Running Scala Programs in Spark Shell

8.12 Demo: Setting Up Execution Environment in IDE

8.13 Demo: Spark Web UI

8.11 Key Takeaways

Knowledge Check

Apache Spark Next Generation Big Data Framework

Lesson 10 Spark Core Processing RDD

9.1 Processing RDD

9.1 Introduction to Spark RDD

9.2 RDD in Spark

9.3 Creating Spark RDD

9.4 Pair RDD

9.5 RDD Operations

9.6 Demo: Spark Transformation Detailed Exploration Using Scala Examples

9.7 Demo: Spark Action Detailed Exploration Using Scala

9.8 Caching and Persistence

9.9 Storage Levels

9.10 Lineage and DAG

9.11 Need for DAG

9.12 Debugging in Spark

9.13 Partitioning in Spark

9.14 Scheduling in Spark

9.15 Shuffling in Spark

9.16 Sort Shuffle

9.17 Aggregating Data with Pair RDD

9.18 Demo: Spark Application with Data Written Back to HDFS and Spark UI

9.19 Demo: Changing Spark Application Parameters

9.20 Demo: Handling Different File Formats

9.21 Demo: Spark RDD with Real-World Application

9.22 Demo: Optimizing Spark Jobs

Assisted Practice: Changing Spark Application Params

9.23 Key Takeaways

Knowledge Check

Spark Core Processing RDD

Lesson 11 Spark SQL - Processing DataFrames

10.1 Spark SQL Processing DataFrames

10.2 Spark SQL Introduction

10.3 Spark SQL Architecture

10.4 DataFrames

10.5 Demo: Handling Various Data Formats

10.6 Demo: Implement Various DataFrame Operations

10.7 Demo: UDF and UDAF

10.8 Interoperating with RDDs

10.9 Demo: Process DataFrame Using SQL Query

10.10 RDD vs DataFrame vs Dataset

Processing DataFrames

10.11 Key Takeaways

Knowledge Check

Spark SQL - Processing DataFrames

Lesson 12 Spark MLlib - Modeling Big Data with Spark

11.1 Spark MLlib Modeling Big Data with Spark

11.2 Role of Data Scientist and Data Analyst in Big Data

11.3 Analytics in Spark

11.4 Machine Learning

11.5 Supervised Learning

11.6 Demo: Classification of Linear SVM

11.7 Demo: Linear Regression with Real World Case Studies

11.8 Unsupervised Learning

11.9 Demo: Unsupervised Clustering K-Means

Assisted Practice: Unsupervised Clustering K-means

11.10 Reinforcement Learning

11.11 Semi-Supervised Learning

11.12 Overview of MLlib

11.13 MLlib Pipelines

11.14 Key Takeaways

Knowledge Check

Spark MLlib - Modeling Big Data with Spark

Lesson 13 Stream Processing Frameworks and Spark Streaming

12.1 Stream Processing Frameworks and Spark Streaming

12.1 Streaming Overview

12.2 Real-Time Processing of Big Data

12.3 Data Processing Architectures

12.4 Demo: Real-Time Data Processing

12.5 Spark Streaming

12.6 Demo: Writing Spark Streaming Application

12.7 Introduction to DStreams

12.8 Transformations on DStreams

12.9 Design Patterns for Using ForeachRDD

12.10 State Operations

12.11 Windowing Operations

12.12 Join Operations: Stream-Dataset Join

12.13 Demo: Windowing of Real-Time Data Processing

12.14 Streaming Sources

12.15 Demo: Processing Twitter Streaming Data

12.16 Structured Spark Streaming

12.17 Use Case Banking Transactions

12.18 Structured Streaming Architecture Model and Its Components

12.19 Output Sinks

12.20 Structured Streaming APIs

12.21 Constructing Columns in Structured Streaming

12.22 Windowed Operations on Event-Time

12.23 Use Cases

12.24 Demo: Streaming Pipeline

Spark Streaming

12.25 Key Takeaways

Knowledge Check

Stream Processing Frameworks and Spark Streaming

Lesson 14 Spark GraphX

13.1 Spark GraphX

13.2 Introduction to Graph

13.3 GraphX in Spark

13.4 Graph Operators

13.5 Join Operators

13.6 Graph Parallel System

13.7 Algorithms in Spark

13.8 Pregel API

13.9 Use Case of GraphX

13.10 Demo: GraphX Vertex Predicate

13.11 Demo: Page Rank Algorithm

13.12 Key Takeaways

Knowledge Check

Spark GraphX

13.14 Project Assistance

Practice Projects

Car Insurance Analysis

Transactional Data Analysis

K-Means clustering for telecommunication domain

Faculty details

Ronald van Loon

Named by Onalytica as one of the three most influential people in Big Data, Ronald is also an author for a number of leading Big Data and Data Science websites, including Datafloq, Data Science Central, and The Guardian. He also speaks regularly at renowned events.

Entry Requirements

Eligibility criteria

Prerequisite: Knowledge of Core Java and SQL

Students Ratings & Reviews

4.4/5

5 Ratings

Rating range	Number of ratings
4-5	2
3-4	3

Hemanthkumar PC

Big Data Hadoop and Spark Developer

Offered by Simplilearn

Learning Experience: The course content is advanced and good, and the platform is virtual, which is great for learning from home. I got trained in Data Engineering, and this course made a great impact. It is good for beginners, with no cons.

Faculty: The faculty's approach is good, with hand gestures; the sound and video quality is good, and they are well versed in the subject. The course resources were useful PPTs, though they are not updated. The course is well structured, and we completed an assessment for the certificate.

Course Support: Being a mechanical engineer, I didn't have any IT knowledge, but I gained it through this course.

Reviewed on 4 Mar 2023

Prachi Shasane

Big Data Hadoop and Spark Developer

Offered by Simplilearn

Learning Experience: Excellent platform, good material and course content. The best part is that you get self-paced learning opportunities.

Faculty: Good support from the faculty, who are also knowledgeable. The course content was up to date with the current market, and assignments were provided. A project was also provided for practice.

Reviewed on 11 Dec 2022

Maheshkumar Shivraj Badmera

Big Data Hadoop and Spark Developer

Offered by Simplilearn

Learning Experience: It was great because of trainer Mr. Sarvesh Sir.

Faculty: Excellent. Thorough knowledge of big data. The course is properly structured and executed.

Course Support: None yet

Reviewed on 29 Jul 2022

SURYAKUMAR RAVI

Big Data Hadoop and Spark Developer

Offered by Simplilearn

Learning Experience: Big data tools like Spark, Hive, Sqoop, and HBase