
Big Data Hadoop Expert 

  • Offered by Skillsoft

Big Data Hadoop Expert at Skillsoft: Overview

Duration: 29 hours
Total fee: 21,796
Mode of learning: Online
Difficulty level: Intermediate
Credential: Certificate
Future job roles: CRUD, .Net, CSR, Credit risk, Senior Software Developer

Big Data Hadoop Expert at Skillsoft: Highlights

  • Multiple Certificates - Hadoop, Spark, Apache Kafka OR Apache Storm
  • 400mn+ users & used by Professionals in 70% of Fortune 500 companies

Big Data Hadoop Expert at Skillsoft: Course details

Who should do this course?
  • Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis
What are the course deliverables?
  • Unlimited Access to Online Content for six months
  • Globally recognised course completion certificate
  • 400mn+ users; world's No. 1 provider; has trained professionals at 70% of Fortune 500 companies
  • Career boost for students and professionals
More about this course
  • Spark Core provides basic I/O functionality, distributed task dispatching, and scheduling. Resilient Distributed Datasets (RDDs) are logical collections of data partitioned across machines. RDDs can be created by referencing datasets in external storage systems, or by applying transformations on existing RDDs. In this course, you will learn how to improve Spark's performance and work with DataFrames and Spark SQL. Spark Streaming leverages Spark's language-integrated API to perform streaming analytics. This design enables the same application code written for batch processing to join streams against historical data, or run ad-hoc queries on stream state. In this course, you will learn how to work with different input streams, perform transformations on streams, and tune performance. MLlib is Spark's machine learning library. GraphX is Spark's API for graphs and graph-parallel computation. SparkR exposes the Spark API and allows users to run jobs from the R shell on a cluster. In this course, you will learn how to work with each of these libraries.
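As a rough, non-authoritative illustration of the RDD ideas summarised above, the Scala sketch below creates an RDD by referencing a dataset in external storage, chains lazy transformations, and triggers evaluation with an action. The file path, application name, and local master setting are assumptions made for the example, not details from the course.

```scala
// Minimal Spark Core sketch: RDD from external storage, lazy transformations, one action.
// The HDFS path and local[*] master are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object RddBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-basics")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Create an RDD by referencing a dataset in an external storage system.
    val lines = sc.textFile("hdfs:///data/sample.txt")

    // Transformations are lazy: nothing executes until an action is called.
    val wordCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // The action triggers evaluation of the whole lineage.
    wordCounts.take(10).foreach(println)

    spark.stop()
  }
}
```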

Big Data Hadoop Expert at Skillsoft: Curriculum

Big Data & Hadoop

Ecosystem for Hadoop

Installation of Hadoop

Data Repository with HDFS and HBase

Data Repository with Flume

Data Refinery with YARN and MapReduce

Data Factory with Hive

Data Factory with Oozie and Hue

Data Flow for the Hadoop Ecosystem

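The modules above are titles only; as a small, hedged taste of the HDFS layer they cover, the sketch below writes and reads a file through the Hadoop FileSystem API. The NameNode address and file path are assumptions for illustration.

```scala
// Hedged HDFS sketch: write a file and read it back with the Hadoop FileSystem API.
// fs.defaultFS and the path are placeholders, not values taken from the course.
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:8020") // assumed cluster address
    val fs = FileSystem.get(conf)
    val path = new Path("/tmp/hello.txt")

    // Write a small text file into HDFS.
    val out = fs.create(path, true)
    out.write("hello hadoop\n".getBytes(StandardCharsets.UTF_8))
    out.close()

    // Read the file back and print its contents.
    val in = fs.open(path)
    val bytes = new Array[Byte](fs.getFileStatus(path).getLen.toInt)
    in.readFully(bytes)
    in.close()
    println(new String(bytes, StandardCharsets.UTF_8))

    fs.close()
  }
}
```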

Spark Core

start the course

recall what is included in the Spark Stack

define lazy evaluation as it relates to Spark

recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and functions to compute the partitions

pre-partition an RDD for performance

store RDDs in serialized form

perform numeric operations on RDDs

create custom accumulators

use broadcast functionality for optimization

pipe to external applications

adjust garbage collection settings

perform batch import on a Spark cluster

determine memory consumption

tune data structures to reduce memory consumption

use Spark's different shuffle operations to minimize memory usage of reduce tasks

set the levels of parallelism for each operation

create DataFrames

interoperate with RDDs

describe the generic load and save functions

read and write Parquet files

use JSON Dataset as a DataFrame

read and write data in Hive tables

read and write data using JDBC

run the Thrift JDBC/ODBC server

show the different ways to tune up Spark for better performance
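To make a few of the DataFrame and Spark SQL objectives above concrete, the hedged sketch below reads a JSON dataset as a DataFrame, queries it with Spark SQL, saves the result as Parquet, and persists a derived RDD in serialized form. The paths, column names, and the query itself are invented for illustration.

```scala
// Hedged DataFrame / Spark SQL sketch with serialized RDD persistence.
// Paths, column names, and the query are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-sketch")
      .master("local[*]")
      .getOrCreate()

    // Use a JSON dataset as a DataFrame (schema is inferred).
    val people = spark.read.json("hdfs:///data/people.json")
    people.createOrReplaceTempView("people")

    // Run a Spark SQL query against the temporary view.
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    // Generic save function: write the result out as Parquet.
    adults.write.mode("overwrite").parquet("hdfs:///output/adults.parquet")

    // Interoperate with RDDs and store the RDD in serialized form to save memory.
    val namesRdd = adults.rdd.map(row => row.getString(0))
    namesRdd.persist(StorageLevel.MEMORY_ONLY_SER)
    println(s"Adults: ${namesRdd.count()}")

    spark.stop()
  }
}
```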

Spark Streaming

start the course

describe what a DStream is

recall how TCP socket input streams are ingested

describe how file input streams are read

recall how Akka Actor input streams are received

describe how Kafka input streams are consumed

recall how Flume input streams are ingested

set up Kinesis input streams

configure Twitter input streams

implement custom input streams

describe receiver reliability

use the UpdateStateByKey operation

perform transform operations

perform Window operations

perform join operations

use output operations on Streams

use DataFrame and SQL operations on streaming data

use learning algorithms with MLlib

persist stream data in memory

enable and configure checkpointing

deploy applications

monitor applications

reduce batch processing times

set the right batch interval

tune memory usage

describe fault tolerance semantics

perform transformations on DStreams
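As a hedged illustration of several streaming objectives above, the sketch below ingests a TCP socket input stream as a DStream, applies a windowed reduce, enables checkpointing, and uses an output operation. The host, port, batch interval, window sizes, and checkpoint directory are all assumptions.

```scala
// Hedged Spark Streaming sketch: socket DStream, windowed reduce, checkpointing, output.
// Host, port, batch interval, and checkpoint directory are illustrative assumptions.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))       // 5-second batch interval
    ssc.checkpoint("hdfs:///checkpoints/streaming-sketch")  // enable checkpointing

    // TCP socket input stream: one DStream of text lines.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Windowed word counts: 30-second window, sliding every 10 seconds.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    // Output operation on the stream.
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```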

MLlib, GraphX, and R

start the course

describe data types

recall the basic statistics

describe linear SVMs

perform logistic regression

use Naïve Bayes

create decision trees

use collaborative filtering with ALS

perform clustering with K-means

perform clustering with LDA

perform analysis with frequent pattern mining

describe the property graph

describe the graph operators

perform analytics with neighborhood aggregation

perform messaging with Pregel API

build graphs

describe vertex and edge RDDs

optimize representation through partitioning

measure vertices with PageRank

install SparkR

run SparkR

use existing R packages

expose RDDs as distributed lists

convert existing RDDs into DataFrames

read and write Parquet files

run SparkR on a cluster

use the algorithms and utilities in MLlib
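To ground a couple of the MLlib and GraphX objectives above, the hedged sketch below clusters a handful of invented points with K-means (the RDD-based MLlib API) and runs PageRank over a tiny property graph. The sample data, cluster count, iteration count, and tolerance are all assumptions.

```scala
// Hedged sketch: K-means with the RDD-based MLlib API and PageRank with GraphX.
// The sample points, edges, and parameters are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.graphx.{Edge, Graph}

object MllibGraphxSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mllib-graphx-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // MLlib: K-means over a few 2-D points, k = 2, 20 iterations.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2),
      Vectors.dense(9.0, 9.5), Vectors.dense(8.8, 9.1)
    ))
    val model = KMeans.train(points, 2, 20)
    model.clusterCenters.foreach(c => println(s"center: $c"))

    // GraphX: PageRank over a tiny property graph.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
    val graph = Graph(vertices, edges)
    graph.pageRank(0.001).vertices.collect().foreach(println)

    spark.stop()
  }
}
```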

Apache Kafka

start the course

describe the function of Apache Kafka

describe the architecture of Apache Kafka

describe Apache Kafka topics

describe Apache Kafka partitions

describe Apache Kafka replicas

describe Apache Kafka producers

describe Apache Kafka consumers

describe Apache Kafka brokers

describe common hardware and OS specifications and their impact on Apache Kafka

describe the main options to deploy Apache Kafka

deploy Apache Kafka to Red Hat and CentOS

deploy Apache Kafka with Puppet

add and remove a broker in Apache Kafka

move data and partitions in Apache Kafka for performance purposes

add a new topic in Apache Kafka

scale a producer in Apache Kafka

scale a consumer in Apache Kafka

monitor Apache Kafka using the web console

monitor Apache Kafka using the offset monitor

monitor Apache Kafka using Graphite

monitor Apache Kafka using JMX

monitor Apache Kafka using the log files

tune the Linux kernel for Apache Kafka

tune Linux systems disk throughput for Apache Kafka

tune the Java VM for Apache Kafka

configure and manage Apache Kafka
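Most of the operations objectives above are performed with Kafka's command-line tools and monitoring integrations; as a hedged, programmatic stand-in for the "add a new topic" step, the sketch below uses Kafka's AdminClient API. The bootstrap address, topic name, partition count, and replication factor are assumptions.

```scala
// Hedged sketch: creating a topic with Kafka's AdminClient.
// Bootstrap server, topic name, partitions, and replication factor are assumptions.
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}
import scala.jdk.CollectionConverters._

object CreateTopicSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val admin = AdminClient.create(props)
    try {
      // 3 partitions, replication factor 2 (requires at least 2 brokers).
      val topic = new NewTopic("events", 3, 2.toShort)
      admin.createTopics(List(topic).asJava).all().get()
      println("topics now present: " + admin.listTopics().names().get())
    } finally {
      admin.close()
    }
  }
}
```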

Apache Kafka Development

start the course

describe the high-level consumer API for reading from Apache Kafka

describe the simple consumer API for reading from Apache Kafka

describe the Hadoop consumer API for reading from Apache Kafka

configure Apache Kafka brokers

configure Apache Kafka consumers

configure Apache Kafka producers

configure compression in Apache Kafka

describe the producer API in Apache Kafka

describe the SyncProducer API in Apache Kafka

describe the AsyncProducer API in Apache Kafka

configure message acknowledgement, or acking, in Apache Kafka

batch messages in Apache Kafka

specify keyed and non-keyed messages in Apache Kafka

configure broker discovery in Apache Kafka

use Apache Kafka test suites for testing

configure serialization and deserialization in Apache Kafka

build a custom serializer in Apache Kafka

configure a broker and create a producer and a consumer in Apache Kafka
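As a hedged companion to the development objectives above, the sketch below configures a producer and a consumer, sends one keyed message with acknowledgement enabled, and reads it back. It uses the current Java client API rather than the legacy SyncProducer/AsyncProducer classes named in the objectives, and the broker address, topic, and group id are assumptions.

```scala
// Hedged sketch: a minimal Kafka producer and consumer via the Java client API from Scala.
// Broker address, topic, and group id are illustrative assumptions.
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.jdk.CollectionConverters._

object ProducerConsumerSketch {
  def main(args: Array[String]): Unit = {
    // Producer: string serializers, acks=all for the strongest acknowledgement setting.
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", "localhost:9092")
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("acks", "all")
    val producer = new KafkaProducer[String, String](producerProps)
    // A keyed message: records with the same key land in the same partition.
    producer.send(new ProducerRecord[String, String]("events", "key-1", "hello kafka")).get()
    producer.close()

    // Consumer: join a consumer group, start from the earliest offset, poll once.
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", "localhost:9092")
    consumerProps.put("group.id", "sketch-group")
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(List("events").asJava)
    val records = consumer.poll(Duration.ofSeconds(5))
    records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```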

OR

Apache Storm Introduction

Architecture and Installation

start the course

describe, at a high level, Apache Storm and its characteristics

describe why Apache Storm is used

describe the Apache Storm Architecture

identify a tuple and a bolt and their use in Storm

identify a spout and its use in Storm

identify streams and their use in Storm

describe the different operation modes of Storm

identify Storm components and their functionality in the source code for an example Storm application

describe the setup process for an Integrated Storm development environment

use Maven to compile and run a Storm application

describe the installation and setup process for ZooKeeper as a standalone server

install and set up ZooKeeper on a development machine

deploy a ZooKeeper server in standalone mode and test it with a ZooKeeper client connection

describe the process for setting up and deploying a ZooKeeper cluster

demonstrate the process of setting up a production Storm cluster

describe the process of configuring the parallelism of a topology

configure the parallelism of spout and bolt components in a Storm topology

briefly describe stream groupings and their types

use stream groupings in a Storm topology

describe the Guaranteed Messaging Process

describe the fault-tolerant characteristics of Storm

briefly describe what Trident is and how it is used

describe Trident's data model and its use

describe several operations of Trident

test your knowledge of Apache Storm and the components of the system
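To give the Storm objectives above some shape, the hedged sketch below wires a spout and a bolt into a topology with a TopologyBuilder, sets their parallelism, applies a shuffle stream grouping, and runs the topology in local mode. It leans on Storm's bundled TestWordSpout for brevity; the component names, parallelism hints, and run duration are assumptions.

```scala
// Hedged Apache Storm sketch: a bolt, a topology with parallelism and a shuffle grouping,
// run in local mode. Component names and parallelism hints are illustrative assumptions.
import org.apache.storm.{Config, LocalCluster}
import org.apache.storm.testing.TestWordSpout
import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.tuple.{Fields, Tuple, Values}

// A bolt that appends "!" to each word it receives from the spout.
class ExclaimBolt extends BaseBasicBolt {
  override def execute(input: Tuple, collector: BasicOutputCollector): Unit =
    collector.emit(new Values(input.getString(0) + "!"))
  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("word"))
}

object StormSketch {
  def main(args: Array[String]): Unit = {
    val builder = new TopologyBuilder()
    // Spout with 2 executors; bolt with 4 executors fed via a shuffle grouping.
    builder.setSpout("words", new TestWordSpout(), 2)
    builder.setBolt("exclaim", new ExclaimBolt(), 4).shuffleGrouping("words")

    val conf = new Config()
    conf.setDebug(true)

    // Local mode: run briefly, then shut the in-process cluster down.
    val cluster = new LocalCluster()
    cluster.submitTopology("exclaim-topology", conf, builder.createTopology())
    Thread.sleep(10000)
    cluster.shutdown()
  }
}
```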

Other courses offered by Skillsoft

  • 5.03 K · 6 hours · Intermediate
  • 6.01 K · 3 hours · Intermediate
  • 6.01 K · 3 hours · Intermediate
  • 11.83 K · 1 hour · Intermediate
Skillsoft offers 249 other courses.

Big Data Hadoop Expert at Skillsoft: Students Ratings & Reviews

5/5
2 Ratings (Verified)
Aatish Inamdar
Big Data Hadoop Expert
Offered by Skillsoft
5
Great Content, Good Course
Other: While searching for big data courses, I landed here. The courses have extensive learning material, so I did not have to go anywhere else. The course content is great, and with the support of the team, completing the big data course was hassle-free.
Reviewed on 2 Apr 2019
Karan Jain
Big Data Hadoop Expert
Offered by Skillsoft
5
Nice Course, easy to understand
Other: I enrolled in the Big Data Hadoop program to gain insight into this topic. The course has advanced-level learning content that helped me understand the basics as well as giving me in-depth knowledge.
Reviewed on 2 Apr 2019