Big Data Hadoop & Spark Developer
- Offered by Skillsoft
Big Data Hadoop & Spark Developer at Skillsoft Overview
Duration | 26 hours
Total fee | ₹15,425
Mode of learning | Online
Difficulty level | Intermediate
Credential | Certificate
Future job roles | FTA, CRUD, .Net, CSR, Senior Software Developer
Big Data Hadoop & Spark Developer at Skillsoft Highlights
- Certification from Naukri Learning; content aligned with most certifying bodies
- 400mn+ users; used by professionals in 70% of Fortune 500 companies
Big Data Hadoop & Spark Developer at Skillsoft Course details
- Intended for technical personnel with a background in Linux, SQL, and programming who plan to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis
- Unlimited access to online content for six months
- Globally recognized course-completion certificate
- 400mn+ users; world's No. 1; has trained 70% of Fortune 500 companies
- Career boost for students and professionals
- Spark Core provides basic I/O functionality, distributed task dispatching, and scheduling. Resilient Distributed Datasets (RDDs) are logical collections of data partitioned across machines; they can be created by referencing datasets in external storage systems or by applying transformations to existing RDDs. In this course, you will learn how to improve Spark's performance and how to work with DataFrames and Spark SQL.
- Spark Streaming leverages Spark's language-integrated API to perform streaming analytics. This design lets the same application code written for batch processing join streams against historical data or run ad hoc queries on stream state. In this course, you will learn how to work with different input streams, perform transformations on streams, and tune performance.
- MLlib is Spark's machine learning library, GraphX is Spark's API for graphs and graph-parallel computation, and SparkR exposes the Spark API so that users can run jobs from the R shell on a cluster. In this course, you will learn how to work with each of these libraries.
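To make the description above concrete, here is a minimal Scala sketch of the core ideas: lazily evaluated RDD transformations, an action that triggers the job, and a Spark SQL query over the same data as a DataFrame. The object name, local master, and toy data are illustrative assumptions, not taken from the course.

```scala
import org.apache.spark.sql.SparkSession

object RddAndSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real deployment would target a cluster master.
    val spark = SparkSession.builder()
      .appName("RddAndSqlSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD created from an in-memory collection; referencing external
    // storage (e.g. sc.textFile("hdfs://...")) works the same way.
    val numbers = sc.parallelize(1 to 100, numSlices = 4)

    // Transformations are lazily evaluated; nothing runs until an action.
    val squares = numbers.map(n => n * n)
    println(s"Sum of squares: ${squares.sum()}") // the action triggers the job

    // The same data as a DataFrame, queryable with Spark SQL.
    import spark.implicits._
    val df = squares.toDF("square")
    df.createOrReplaceTempView("squares")
    spark.sql("SELECT COUNT(*) AS big FROM squares WHERE square > 1000").show()

    spark.stop()
  }
}
```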
Big Data Hadoop & Spark Developer at Skillsoft Curriculum
Big Data & Hadoop
Ecosystem for Hadoop
Installation of Hadoop
Data Repository with HDFS and HBase
Data Repository with Flume
Data Refinery with YARN and MapReduce
Data Factory with Hive
Data Factory with Oozie and Hue
Data Flow for the Hadoop Ecosystem
Apache Spark
Spark Core
recall what is included in the Spark Stack
define lazy evaluation as it relates to Spark
recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and functions to compute
pre-partition an RDD for performance
store RDDs in serialized form
perform numeric operations on RDDs
create custom accumulators
use broadcast functionality for optimization (both sketched in the example after this list)
pipe to external applications
adjust garbage collection settings
perform batch import on a Spark cluster
determine memory consumption
tune data structures to reduce memory consumption
use Spark's different shuffle operations to minimize memory usage of reduce tasks
set the levels of parallelism for each operation
create DataFrames
interoperate with RDDs
describe the generic load and save functions
read and write Parquet files
use a JSON dataset as a DataFrame
read and write data in Hive tables
read and write data using JDBC
run the Thrift JDBC/ODBC server
show the different ways to tune Spark for better performance
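A hedged sketch of two of the objectives above: shipping a read-only lookup table to executors as a broadcast variable, and counting events with an accumulator. For brevity it uses Spark's built-in long accumulator rather than a custom AccumulatorV2 subclass; the object name and toy data are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

object BroadcastAccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastAccumulatorSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Broadcast: ship a read-only lookup table to every executor once,
    // instead of serializing it with every task closure.
    val countryNames = Map("IN" -> "India", "US" -> "United States")
    val lookup = sc.broadcast(countryNames)

    // Accumulator: a write-only counter that executors add to and the
    // driver reads back after an action completes.
    val unknownCodes: LongAccumulator = sc.longAccumulator("unknownCodes")

    val codes = sc.parallelize(Seq("IN", "US", "XX", "IN"))
    val resolved = codes.map { code =>
      lookup.value.get(code) match {
        case Some(name) => name
        case None       => unknownCodes.add(1); "unknown"
      }
    }

    println(resolved.collect().mkString(", ")) // the action runs the job
    println(s"Unresolved codes: ${unknownCodes.value}")
    spark.stop()
  }
}
```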
Spark Streaming
describe what a DStream is
recall how TCP socket input streams are ingested
describe how file input streams are read
recall how Akka Actor input streams are received
describe how Kafka input streams are consumed
recall how Flume input streams are ingested
set up Kinesis input streams
configure Twitter input streams
implement custom input streams
describe receiver reliability
use the updateStateByKey operation (see the sketch after this list)
perform transform operations
perform Window operations
perform join operations
use output operations on Streams
use DataFrame and SQL operations on streaming data
use learning algorithms with MLlib
persist stream data in memory
enable and configure checkpointing
deploy applications
monitor applications
reduce batch processing times
set the right batch interval
tune memory usage
describe fault tolerance semantics
perform transformations on DStreams
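The following sketch illustrates several of the streaming objectives above using the classic DStream API: a TCP socket input stream, a running per-word count with updateStateByKey, and a windowed count. The checkpoint path, host, port, and batch/window intervals are placeholder choices, not values prescribed by the course.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing.
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    // 5-second batch interval; choosing it well is one of the objectives above.
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-sketch") // required for stateful operations

    // TCP socket input stream (could be fed locally with `nc -lk 9999`).
    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // updateStateByKey keeps a running count per word across batches.
    val totals = pairs.updateStateByKey[Int] {
      (newCounts: Seq[Int], state: Option[Int]) =>
        Some(state.getOrElse(0) + newCounts.sum)
    }
    totals.print()

    // A windowed count over the last 30 seconds, sliding every 10 seconds.
    val windowed = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    windowed.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```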
MLlib, GraphX, and R
describe data types
recall the basic statistics
describe linear SVMs
perform logistic regression
use Naïve Bayes
create decision trees
use collaborative filtering with ALS
perform clustering with K-means (sketched after this list)
perform clustering with LDA
perform analysis with frequent pattern mining
describe the property graph
describe the graph operators
perform analytics with neighborhood aggregation
perform messaging with Pregel API
build graphs
describe vertex and edge RDDs
optimize representation through partitioning
measure vertices with PageRank
install SparkR
run SparkR
use existing R packages
expose RDDs as distributed lists
convert existing RDDs into DataFrames
read and write Parquet files
run SparkR on a cluster
use the algorithms and utilities in MLlib
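As a small illustration of the MLlib objectives above, here is a minimal K-means sketch against the RDD-based spark.mllib API; the toy points, the choice of k, and the iteration count are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy 2-D points forming two loose groups.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.1), Vectors.dense(0.2, 0.0), Vectors.dense(0.1, 0.2),
      Vectors.dense(9.0, 9.1), Vectors.dense(9.2, 9.0), Vectors.dense(9.1, 9.3)
    )).cache() // K-means iterates over the input, so caching helps

    // Train with k = 2 clusters and up to 20 iterations.
    val model = KMeans.train(points, k = 2, maxIterations = 20)

    model.clusterCenters.zipWithIndex.foreach { case (center, i) =>
      println(s"Cluster $i center: $center")
    }
    println(s"Within-set sum of squared errors: ${model.computeCost(points)}")
    spark.stop()
  }
}
```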