
Big Data Hadoop & Spark Developer 

  • Offered by Skillsoft

Overview

  • Duration: 26 hours
  • Total fee: 15,425
  • Mode of learning: Online
  • Difficulty level: Intermediate
  • Credential: Certificate
  • Future job roles: FTA, CRUD, .Net, CSR, Senior Software Developer

Highlights

  • Certification from Naukri Learning; content aligned with most certifying bodies
  • 400 million+ users; used by professionals in 70% of Fortune 500 companies
Course details

Who should do this course?
  • Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop Engineering team in roles such as Hadoop developer, data architect, or data engineer or roles related to technical project management, cluster operations, or data analysis
What are the course deliverables?
  • Unlimited Access to Online Content for six months
  • Course completion certificate, recognized globally
  • 400 million+ users; has trained professionals at 70% of Fortune 500 companies
  • Career boost for students and professionals
More about this course
  • Spark Core provides basic I/O functionality, distributed task dispatching, and scheduling. Resilient Distributed Datasets (RDDs) are logical collections of data partitioned across machines. RDDs can be created by referencing datasets in external storage systems or by applying transformations to existing RDDs. In this part of the course, you will learn how to improve Spark's performance and work with DataFrames and Spark SQL.
  • Spark Streaming leverages Spark's language-integrated API to perform streaming analytics. This design lets application code written for batch processing also be used to join streams against historical data or run ad hoc queries on stream state. You will learn how to work with different input streams, perform transformations on streams, and tune performance.
  • MLlib is Spark's machine learning library, GraphX is Spark's API for graphs and graph-parallel computation, and SparkR exposes the Spark API to R so that users can run jobs on a cluster from the R shell. You will learn how to work with each of these libraries.
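The RDD idea described above (a collection split into partitions, with new RDDs derived by applying transformations to existing ones) can be illustrated with a small pure-Python sketch. This is a toy model only, not Spark's API: the names ToyRDD and parallelize and the chunking rule are assumptions made for illustration, and nothing here runs on a cluster. It also shows lazy evaluation in miniature, since no element is processed until collect() is called.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class):
    a list of partitions plus a deferred per-partition compute function."""

    def __init__(self, partitions: List[list],
                 compute: Callable[[Iterable], Iterator] = iter):
        self.partitions = partitions
        self.compute = compute

    def map(self, fn):
        # Transformation: returns a new ToyRDD and does no work yet.
        parent = self.compute
        return ToyRDD(self.partitions,
                      lambda part: (fn(x) for x in parent(part)))

    def filter(self, pred):
        # Transformation: also deferred until an action runs.
        parent = self.compute
        return ToyRDD(self.partitions,
                      lambda part: (x for x in parent(part) if pred(x)))

    def collect(self):
        # Action: evaluates the chained functions partition by partition.
        out = []
        for part in self.partitions:
            out.extend(self.compute(part))
        return out


def parallelize(data, num_partitions):
    """Split data into contiguous chunks (an assumed chunking rule)."""
    items = list(data)
    size = max(1, -(-len(items) // num_partitions))  # ceiling division
    return ToyRDD([items[i:i + size] for i in range(0, len(items), size)])


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = parallelize(range(1, 11), 3)
even_squares = numbers.map(square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)  # still 0: nothing has been computed
result = even_squares.collect()   # the action triggers the computation

print(calls_before_action, result)
```

In real Spark the same pattern applies at cluster scale: transformations such as map and filter only describe the computation, and an action such as collect triggers it.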
Curriculum

Big Data & Hadoop

Ecosystem for Hadoop

Installation of Hadoop

Data Repository with HDFS and HBase

Data Repository with Flume

Data Refinery with YARN and MapReduce

Data Factory with Hive

Data Factory with Oozie and Hue

Data Flow for the Hadoop Ecosystem

Apache Spark

Spark Core

start the course

recall what is included in the Spark Stack

define lazy evaluation as it relates to Spark

recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and a function to compute each partition

pre-partition an RDD for performance

store RDDs in serialized form

perform numeric operations on RDDs

create custom accumulators

use broadcast functionality for optimization

pipe to external applications

adjust garbage collection settings

perform batch import on a Spark cluster

determine memory consumption

tune data structures to reduce memory consumption

use Spark's different shuffle operations to minimize memory usage of reduce tasks

set the levels of parallelism for each operation

create DataFrames

interoperate with RDDs

describe the generic load and save functions

read and write Parquet files

use JSON Dataset as a DataFrame

read and write data in Hive tables

read and write data using JDBC

run the Thrift JDBC/ODBC server

show the different ways to tune up Spark for better performance

Spark Streaming

start the course

describe what a DStream is

recall how TCP socket input streams are ingested

describe how file input streams are read

recall how Akka Actor input streams are received

describe how Kafka input streams are consumed

recall how Flume input streams are ingested

set up Kinesis input streams

configure Twitter input streams

implement custom input streams

describe receiver reliability

use the updateStateByKey operation

perform transform operations

perform Window operations

perform join operations

use output operations on Streams

use DataFrame and SQL operations on streaming data

use learning algorithms with MLlib

persist stream data in memory

enable and configure checkpointing

deploy applications

monitor applications

reduce batch processing times

set the right batch interval

tune memory usage

describe fault tolerance semantics

perform transformations on DStreams

MLlib, GraphX, and R

start the course

describe data types

recall the basic statistics

describe linear SVMs

perform logistic regression

use naive Bayes

create decision trees

use collaborative filtering with ALS

perform clustering with K-means

perform clustering with LDA

perform analysis with frequent pattern mining

describe the property graph

describe the graph operators

perform analytics with neighborhood aggregation

perform messaging with Pregel API

build graphs

describe vertex and edge RDDs

optimize representation through partitioning

measure vertices with PageRank

install SparkR

run SparkR

use existing R packages

expose RDDs as distributed lists

convert existing RDDs into DataFrames

read and write Parquet files

run SparkR on a cluster

use the algorithms and utilities in MLlib

Students Ratings & Reviews

5/5
3 Ratings
Ajay Singh
Rating: 5/5
Nice Course
Other: A crisp list for users looking out for Big Data courses.
Reviewed on 2 Apr 2019
Ashlesha Gupta
Rating: 5/5
Good Course, In-depth content
Other: Good courses on e-learning page. I found this page on Big Data courses quite helpful. I was able to do a Big Data Engineering course. The curriculum of the course is helpful as it lets you understand your needs and the deliverables and make a wise decision.
Reviewed on 2 Apr 2019
