Big Data Hadoop and Spark Developer
- Offered by Simplilearn
- Private Institute
- Estd. 2010
Big Data Hadoop and Spark Developer at Simplilearn Overview
Duration | 11 hours |
Total fee | Free |
Mode of learning | Online |
Difficulty level | Intermediate |
Credential | Certificate |
Big Data Hadoop and Spark Developer at Simplilearn Highlights
- Potential to earn 3.4L-30L per annum
- Earn a certificate and get 90 days of access to this free course by Simplilearn
- Big Data Hadoop and Spark Developer professionals are hired by companies like Amazon, Accenture, LinkedIn, Cognizant, and IBM
Big Data Hadoop and Spark Developer at Simplilearn Course details
- For IT Professionals, BI Professionals, Analytics Professionals, Software Developers, Senior IT Professionals, Project Managers and Aspiring Data Scientists
- Realtime data processing
- Functional programming
- Spark applications
- Parallel processing
- Spark RDD optimization techniques
- Spark SQL
- Designed to give you in-depth knowledge of Spark basics
- This Hadoop framework program prepares you for success in your role as a big data developer
- Learn Hadoop to understand how the multiple elements of the Hadoop ecosystem fit into the big data processing cycle
- Common careers for Big Data Hadoop and Spark Developer professionals are big data analyst, business analyst, big data engineer, analytics manager, and data architect
Big Data Hadoop and Spark Developer at Simplilearn Curriculum
Lesson 1 Course Introduction
1.1 Course Introduction
1.2 Accessing Practice Lab
Lesson 2 Introduction to Big Data and Hadoop
1.1 Introduction to Big Data and Hadoop
1.2 Introduction to Big Data
1.3 Big Data Analytics
1.4 What is Big Data
1.5 Four Vs Of Big Data
1.6 Case Study: Royal Bank of Scotland
1.7 Challenges of Traditional System
1.8 Distributed Systems
1.9 Introduction to Hadoop
1.10 Components of Hadoop Ecosystem: Part One
1.11 Components of Hadoop Ecosystem: Part Two
1.12 Components of Hadoop Ecosystem: Part Three
1.13 Commercial Hadoop Distributions
1.14 Demo: Walkthrough of Simplilearn Cloudlab
1.15 Key Takeaways
Knowledge Check
Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN
2.1 Hadoop Architecture Distributed Storage (HDFS) and YARN
2.2 What Is HDFS
2.3 Need for HDFS
2.4 Regular File System vs HDFS
2.5 Characteristics of HDFS
2.6 HDFS Architecture and Components
2.7 High Availability Cluster Implementations
2.8 HDFS Component File System Namespace
2.9 Data Block Split
2.10 Data Replication Topology
2.11 HDFS Command Line
2.12 Demo: Common HDFS Commands
HDFS Command Line
2.13 YARN Introduction
2.14 YARN Use Case
2.15 YARN and Its Architecture
2.16 Resource Manager
2.17 How Resource Manager Operates
2.18 Application Master
2.19 How YARN Runs an Application
2.20 Tools for YARN Developers
2.21 Demo: Walkthrough of Cluster Part One
2.22 Demo: Walkthrough of Cluster Part Two
2.23 Key Takeaways
Knowledge Check
Hadoop Architecture, Distributed Storage (HDFS) and YARN
Lesson 4 Data Ingestion into Big Data Systems and ETL
3.1 Data Ingestion into Big Data Systems and ETL
3.2 Data Ingestion Overview Part One
3.3 Data Ingestion
3.4 Apache Sqoop
3.5 Sqoop and Its Uses
3.6 Sqoop Processing
3.7 Sqoop Import Process
Assisted Practice: Import into Sqoop
3.8 Sqoop Connectors
3.9 Demo: Importing and Exporting Data from MySQL to HDFS
Apache Sqoop
3.9 Apache Flume
3.10 Flume Model
3.11 Scalability in Flume
3.12 Components in Flume's Architecture
3.13 Configuring Flume Components
3.15 Demo: Ingest Twitter Data
3.14 Apache Kafka
3.15 Aggregating User Activity Using Kafka
3.16 Kafka Data Model
3.17 Partitions
3.18 Apache Kafka Architecture
3.19 Producer Side API Example
3.20 Consumer Side API
3.21 Demo: Setup Kafka Cluster
3.21 Consumer Side API Example
3.22 Kafka Connect
3.23 Key Takeaways
3.26 Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer
Knowledge Check
Data Ingestion into Big Data Systems and ETL
Lesson 5 Distributed Processing - MapReduce Framework and Pig
4.1 Distributed Processing MapReduce Framework and Pig
4.2 Distributed Processing in MapReduce
4.3 Word Count Example
4.4 Map Execution Phases
4.5 Map Execution Distributed Two Node Environment
4.6 MapReduce Jobs
4.7 Hadoop MapReduce Job Work Interaction
4.8 Setting Up the Environment for MapReduce Development
4.9 Set of Classes
4.10 Creating a New Project
4.11 Advanced MapReduce
4.12 Data Types in Hadoop
4.13 OutputFormats in MapReduce
4.14 Using Distributed Cache
4.15 Joins in MapReduce
4.16 Replicated Join
4.17 Introduction to Pig
4.18 Components of Pig
4.19 Pig Data Model
4.20 Pig Interactive Modes
4.21 Pig Operations
4.22 Various Relations Performed by Developers
4.23 Demo: Analyzing Web Log Data Using MapReduce
4.24 Demo: Analyzing Sales Data and Solving KPIs Using Pig
Apache Pig
4.25 Demo: Wordcount
4.26 Key Takeaways
Knowledge Check
Distributed Processing - MapReduce Framework and Pig
Lesson 6 Apache Hive
5.1 Apache Hive
5.2 Hive SQL over Hadoop MapReduce
5.3 Hive Architecture
5.4 Interfaces to Run Hive Queries
5.5 Running Beeline from Command Line
5.6 Hive Metastore
5.7 Hive DDL and DML
5.8 Creating New Table
5.9 Data Types
5.10 Validation of Data
5.11 File Format Types
5.12 Data Serialization
5.13 Hive Table and Avro Schema
5.14 Hive Optimization Partitioning Bucketing and Sampling
5.15 Non Partitioned Table
5.16 Data Insertion
5.17 Dynamic Partitioning in Hive
5.18 Bucketing
5.19 What Do Buckets Do
5.20 Hive Analytics UDF and UDAF
Assisted Practice: Synchronization
5.21 Other Functions of Hive
5.22 Demo: Real-Time Analysis and Data Filtration
5.23 Demo: Real-World Problem
5.24 Demo: Data Representation and Import using Hive
5.25 Key Takeaways
Knowledge Check
Apache Hive
Lesson 7 NoSQL Databases - HBase
6.1 NoSQL Databases HBase
6.2 NoSQL Introduction
Demo: YARN Tuning
6.3 HBase Overview
6.4 HBase Architecture
6.5 Data Model
6.6 Connecting to HBase
HBase Shell
6.7 Key Takeaways
Knowledge Check
NoSQL Databases - HBase
Lesson 8 Basics of Functional Programming and Scala
7.1 Basics of Functional Programming and Scala
7.2 Introduction to Scala
7.3 Demo: Scala Installation
7.3 Functional Programming
7.4 Programming with Scala
Demo: Basic Literals and Arithmetic Operators
Demo: Logical Operators
7.5 Type Inference Classes Objects and Functions in Scala
Demo: Type Inference Functions Anonymous Function and Class
7.6 Collections
7.7 Types of Collections
Demo: Five Types of Collections
Demo: Operations on List
7.8 Scala REPL
Assisted Practice: Scala REPL
Demo: Features of Scala REPL
7.9 Key Takeaways
Knowledge Check
Basics of Functional Programming and Scala
Lesson 9 Apache Spark Next Generation Big Data Framework
8.1 Apache Spark Next Generation Big Data Framework
8.2 History of Spark
8.3 Limitations of MapReduce in Hadoop
8.4 Introduction to Apache Spark
8.5 Components of Spark
8.6 Application of In-Memory Processing
8.7 Hadoop Ecosystem vs Spark
8.8 Advantages of Spark
8.9 Spark Architecture
8.10 Spark Cluster in Real World
8.11 Demo: Running Scala Programs in Spark Shell
8.12 Demo: Setting Up Execution Environment in IDE
8.13 Demo: Spark Web UI
8.11 Key Takeaways
Knowledge Check
Apache Spark Next Generation Big Data Framework
Lesson 10 Spark Core Processing RDD
9.1 Processing RDD
9.1 Introduction to Spark RDD
9.2 RDD in Spark
9.3 Creating Spark RDD
9.4 Pair RDD
9.5 RDD Operations
9.6 Demo: Spark Transformation Detailed Exploration Using Scala Examples
9.7 Demo: Spark Action Detailed Exploration Using Scala
9.8 Caching and Persistence
9.9 Storage Levels
9.10 Lineage and DAG
9.11 Need for DAG
9.12 Debugging in Spark
9.13 Partitioning in Spark
9.14 Scheduling in Spark
9.15 Shuffling in Spark
9.16 Sort Shuffle
9.17 Aggregating Data with Pair RDD
9.18 Demo: Spark Application with Data Written Back to HDFS and Spark UI
9.19 Demo: Changing Spark Application Parameters
9.20 Demo: Handling Different File Formats
9.21 Demo: Spark RDD with Real-World Application
9.22 Demo: Optimizing Spark Jobs
Assisted Practice: Changing Spark Application Params
9.23 Key Takeaways
Knowledge Check
Spark Core Processing RDD
Lesson 11 Spark SQL - Processing DataFrames
10.1 Spark SQL Processing DataFrames
10.2 Spark SQL Introduction
10.3 Spark SQL Architecture
10.4 DataFrames
10.5 Demo: Handling Various Data Formats
10.6 Demo: Implement Various DataFrame Operations
10.7 Demo: UDF and UDAF
10.8 Interoperating with RDDs
10.9 Demo: Process DataFrame Using SQL Query
10.10 RDD vs DataFrame vs Dataset
Processing DataFrames
10.11 Key Takeaways
Knowledge Check
Spark SQL - Processing DataFrames
Lesson 12 Spark MLlib - Modeling Big Data with Spark
11.1 Spark MLlib Modeling Big Data with Spark
11.2 Role of Data Scientist and Data Analyst in Big Data
11.3 Analytics in Spark
11.4 Machine Learning
11.5 Supervised Learning
11.6 Demo: Classification of Linear SVM
11.7 Demo: Linear Regression with Real World Case Studies
11.8 Unsupervised Learning
11.9 Demo: Unsupervised Clustering K-Means
Assisted Practice: Unsupervised Clustering K-means
11.10 Reinforcement Learning
11.11 Semi-Supervised Learning
11.12 Overview of MLlib
11.13 MLlib Pipelines
11.14 Key Takeaways
Knowledge Check
Spark MLlib - Modeling Big Data with Spark
Lesson 13 Stream Processing Frameworks and Spark Streaming
12.1 Stream Processing Frameworks and Spark Streaming
12.1 Streaming Overview
12.2 Real-Time Processing of Big Data
12.3 Data Processing Architectures
12.4 Demo: Real-Time Data Processing
12.5 Spark Streaming
12.6 Demo: Writing Spark Streaming Application
12.7 Introduction to DStreams
12.8 Transformations on DStreams
12.9 Design Patterns for Using ForeachRDD
12.10 State Operations
12.11 Windowing Operations
12.12 Join Operations: Stream-Dataset Join
12.13 Demo: Windowing of Real-Time Data Processing
12.14 Streaming Sources
12.15 Demo: Processing Twitter Streaming Data
12.16 Structured Spark Streaming
12.17 Use Case Banking Transactions
12.18 Structured Streaming Architecture Model and Its Components
12.19 Output Sinks
12.20 Structured Streaming APIs
12.21 Constructing Columns in Structured Streaming
12.22 Windowed Operations on Event-Time
12.23 Use Cases
12.24 Demo: Streaming Pipeline
Spark Streaming
12.25 Key Takeaways
Knowledge Check
Stream Processing Frameworks and Spark Streaming
Lesson 14 Spark GraphX
13.1 Spark GraphX
13.2 Introduction to Graph
13.3 GraphX in Spark
13.4 Graph Operators
13.5 Join Operators
13.6 Graph Parallel System
13.7 Algorithms in Spark
13.8 Pregel API
13.9 Use Case of GraphX
13.10 Demo: GraphX Vertex Predicate
13.11 Demo: Page Rank Algorithm
13.12 Key Takeaways
Knowledge Check
Spark GraphX
13.14 Project Assistance
Practice Projects
Car Insurance Analysis
Transactional Data Analysis
K-Means Clustering for the Telecommunication Domain