Big Data Hadoop Expert
- Offered by Skillsoft
Big Data Hadoop Expert at Skillsoft Overview
| Detail | Value |
|---|---|
| Duration | 29 hours |
| Total fee | ₹21,796 |
| Mode of learning | Online |
| Difficulty level | Intermediate |
| Credential | Certificate |
| Future job roles | CRUD, .Net, CSR, Credit risk, Senior Software Developer |
Big Data Hadoop Expert at Skillsoft Highlights
- Multiple Certificates - Hadoop, Spark, Apache Kafka OR Apache Storm
- 400 million+ users; used by professionals in 70% of Fortune 500 companies
Big Data Hadoop Expert at Skillsoft Course details
- Intended audience: technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis
- Unlimited access to online content for six months
- Globally recognized course completion certificate
- 400 million+ users; the world's No. 1 provider, having trained professionals at 70% of Fortune 500 companies
- Career boost for students and professionals
- Spark Core provides basic I/O functionality, distributed task dispatching, and scheduling. Resilient Distributed Datasets (RDDs) are logical collections of data partitioned across machines; they can be created by referencing datasets in external storage systems or by applying transformations to existing RDDs. In this course, you will learn how to improve Spark's performance and work with DataFrames and Spark SQL (a minimal Spark sketch follows this list).
- Spark Streaming leverages Spark's language-integrated API to perform streaming analytics. This design lets the same application code written for batch processing join streams against historical data or run ad-hoc queries on stream state. In this course, you will learn how to work with different input streams, perform transformations on streams, and tune performance.
- MLlib is Spark's machine learning library, GraphX is Spark's API for graphs and graph-parallel computation, and SparkR exposes the Spark API so that users can run jobs from the R shell on a cluster. In this course, you will learn how to work with each of these libraries.
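A minimal sketch of the Spark Core ideas above, assuming a local Spark 2.x or later installation and the Scala API; the app name, master URL, and data are illustrative, not taken from the course material:

```scala
// Sketch: a lazily evaluated RDD transformation, then the same data exposed
// as a DataFrame and queried with Spark SQL. Runs in local mode.
import org.apache.spark.sql.SparkSession

object SparkCoreSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-core-sketch")
      .master("local[*]")                       // local mode; a cluster URL would go here
      .getOrCreate()

    // RDD: a partitioned, lazily evaluated collection
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
    val squares = rdd.map(n => n * n)           // transformation: nothing executes yet
    println(squares.reduce(_ + _))              // action: triggers the computation

    // DataFrame and Spark SQL over the same data
    import spark.implicits._
    val df = squares.toDF("value")
    df.createOrReplaceTempView("squares")
    spark.sql("SELECT sum(value) AS total FROM squares").show()

    spark.stop()
  }
}
```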
Big Data Hadoop Expert at Skillsoft Curriculum
Big Data & Hadoop
Ecosystem for Hadoop
Installation of Hadoop
Data Repository with HDFS and HBase (see the HDFS sketch after this module list)
Data Repository with Flume
Data Refinery with YARN and MapReduce
Data Factory with Hive
Data Factory with Oozie and Hue
Data Flow for the Hadoop Ecosystem
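The "Data Repository with HDFS and HBase" module covers the Hadoop file system; the following is a small sketch of the HDFS client API, assuming a pseudo-distributed NameNode at hdfs://localhost:9000 (the address and paths are placeholders):

```scala
// Sketch: write a file into HDFS and list a directory with the FileSystem API.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumption: local pseudo-distributed cluster
    val fs = FileSystem.get(conf)

    // Write a small file into the data repository
    val out = fs.create(new Path("/user/demo/hello.txt"))
    out.write("hello hadoop\n".getBytes("UTF-8"))
    out.close()

    // List the directory contents
    fs.listStatus(new Path("/user/demo")).foreach(status => println(status.getPath))
    fs.close()
  }
}
```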
Spark Core
start the course
recall what is included in the Spark Stack
define lazy evaluation as it relates to Spark
recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and a function to compute the partitions
pre-partition an RDD for performance
store RDDs in serialized form
perform numeric operations on RDDs
create custom accumulators
use broadcast functionality for optimization
pipe to external applications
adjust garbage collection settings
perform batch import on a Spark cluster
determine memory consumption
tune data structures to reduce memory consumption
use Spark's different shuffle operations to minimize memory usage of reduce tasks
set the levels of parallelism for each operation
create DataFrames
interoperate with RDDs
describe the generic load and save functions
read and write Parquet files
use JSON Dataset as a DataFrame
read and write data in Hive tables
read and write data using JDBC
run the Thrift JDBC/ODBC server
show the different ways to tune Spark for better performance (several of these objectives are sketched below)
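A short sketch covering several of the objectives above (broadcast variables, accumulators, and the generic load/save functions with Parquet and JSON); it assumes a local Spark installation, and the paths and data are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object SparkCoreObjectivesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("objectives-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast: ship a read-only lookup table to every executor once
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Accumulator: count matches on the executors, read the total on the driver
    val matched = sc.longAccumulator("matched")

    val values = sc.parallelize(Seq("a", "b", "c", "a")).map { key =>
      val v = lookup.value.getOrElse(key, 0)
      if (v > 0) matched.add(1)
      v
    }
    println(s"sum = ${values.sum()}, matched = ${matched.value}")   // sum() is an action

    // Generic load/save functions with Parquet and JSON
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    df.write.mode("overwrite").parquet("/tmp/sketch.parquet")
    spark.read.parquet("/tmp/sketch.parquet").show()
    df.write.mode("overwrite").json("/tmp/sketch.json")

    spark.stop()
  }
}
```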
Spark Streaming
start the course
describe what a DStream is
recall how TCP socket input streams are ingested
describe how file input streams are read
recall how Akka Actor input streams are received
describe how Kafka input streams are consumed
recall how Flume input streams are ingested
set up Kinesis input streams
configure Twitter input streams
implement custom input streams
describe receiver reliability
use the UpdateStateByKey operation
perform transform operations
perform Window operations
perform join operations
use output operations on Streams
use DataFrame and SQL operations on streaming data
use learning algorithms with MLlib
persist stream data in memory
enable and configure checkpointing
deploy applications
monitor applications
reduce batch processing times
set the right batch interval
tune memory usage
describe fault tolerance semantics
perform transformations on DStreams (see the sketch after this module's objectives)
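A minimal Spark Streaming sketch of a few of the objectives above (a TCP socket input stream, a window operation, an output operation, and checkpointing), assuming text lines arrive on localhost:9999, for example from `nc -lk 9999`; all names and paths are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))      // batch interval: 5 seconds
    ssc.checkpoint("/tmp/streaming-checkpoint")           // enables stateful operations and recovery

    val lines = ssc.socketTextStream("localhost", 9999)   // TCP socket input stream
    val words = lines.flatMap(_.split("\\s+")).map((_, 1))

    // Window operation: counts over the last 30 s, recomputed every 10 s
    val counts = words.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    counts.print()                                        // output operation

    ssc.start()
    ssc.awaitTermination()
  }
}
```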
MLlib, GraphX, and R
start the course
describe data types
recall the basic statistics
describe linear SVMs
perform logistic regression
use naïve Bayes
create decision trees
use collaborative filtering with ALS
perform clustering with K-means
perform clustering with LDA
perform analysis with frequent pattern mining
describe the property graph
describe the graph operators
perform analytics with neighborhood aggregation
perform messaging with Pregel API
build graphs
describe vertex and edge RDDs
optimize representation through partitioning
measure vertices with PageRank
install SparkR
run SparkR
use existing R packages
expose RDDs as distributed lists
convert existing RDDs into DataFrames
read and write parquet files
run SparkR on a cluster
use the algorithms and utilities in MLlib (a K-means and PageRank sketch follows)
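A brief sketch of two items from the list above, K-means clustering with the RDD-based MLlib API and PageRank with GraphX, using tiny inline data for illustration; it assumes a local Spark installation:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MLlibGraphXSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mllib-graphx-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // K-means: cluster four 2-D points into two clusters
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
    val model = KMeans.train(points, 2, 10)               // k = 2, 10 iterations
    model.clusterCenters.foreach(println)

    // GraphX: a three-vertex property graph and its PageRank scores
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
    val graph = Graph(vertices, edges)
    graph.pageRank(0.001).vertices.collect().foreach(println)

    spark.stop()
  }
}
```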
Apache Kafka
start the course
describe the function of Apache Kafka
describe the architecture of Apache Kafka
describe Apache Kafka topics
describe Apache Kafka partitions
describe Apache Kafka replicas
describe Apache Kafka producers
describe Apache Kafka consumers
describe Apache Kafka brokers
describe common hardware and OS specifications and their impact on Apache Kafka
describe the main options to deploy Apache Kafka
deploy Apache Kafka on Red Hat and CentOS
deploy Apache Kafka using Puppet
add and remove a broker in Apache Kafka
move data and partitions in Apache Kafka for performance purposes
add a new topic in Apache Kafka
scale a producer in Apache Kafka
scale a consumer in Apache Kafka
monitor Apache Kafka using the web console
monitor Apache Kafka using the offset monitor
monitor Apache Kafka using Graphite
monitor Apache Kafka using JMX
monitor Apache Kafka using the log files
tune the Linux kernel for Apache Kafka
tune Linux systems disk throughput for Apache Kafka
tune the Java VM for Apache Kafka
configure and manage Apache Kafka (see the AdminClient sketch below)
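As a small illustration of the "add a new topic" objective, the sketch below uses Kafka's AdminClient from Scala; the broker address, topic name, and partition count are assumptions for a single-broker test cluster, not values from the course:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object KafkaAdminSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")      // assumption: local single-broker cluster

    val admin = AdminClient.create(props)
    // 3 partitions, replication factor 1 (only sensible on a one-broker test cluster)
    val topic = new NewTopic("demo-topic", 3, 1.toShort)
    admin.createTopics(Collections.singleton(topic)).all().get()   // block until the topic exists
    println(admin.listTopics().names().get())
    admin.close()
  }
}
```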
Apache Kafka Development
start the course
describe the high-level consumer API for reading from Apache Kafka
describe the simple consumer API for reading from Apache Kafka
describe the Hadoop consumer API for reading from Apache Kafka
configure Apache Kafka brokers
configure Apache Kafka consumers
configure Apache Kafka producers
configure compression in Apache Kafka
describe the producer API in Apache Kafka
describe the SyncProducer API in Apache Kafka
describe the AsyncProducer API in Apache Kafka
configure message acknowledgement, or acking, in Apache Kafka
batch messages in Apache Kafka
specify keyed and non-keyed messages in Apache Kafka
configure broker discovery in Apache Kafka
use Apache Kafka test suites for testing
configure serialization and deserialization in Apache Kafka
build a custom serializer in Apache Kafka
configure a broker and create a producer and a consumer in Apache Kafka (a producer/consumer sketch follows)
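A compact sketch of the final objective, a producer and a consumer written in Scala against the modern Kafka Java client (the course material itself also covers the older high-level and simple consumer APIs); the broker address, topic, and group id are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaClientSketch {
  def main(args: Array[String]): Unit = {
    // Producer: a keyed message, string serialization, acks from all replicas
    val pProps = new Properties()
    pProps.put("bootstrap.servers", "localhost:9092")
    pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    pProps.put("acks", "all")
    val producer = new KafkaProducer[String, String](pProps)
    producer.send(new ProducerRecord[String, String]("demo-topic", "key-1", "hello kafka")).get()
    producer.close()

    // Consumer: subscribe to the topic and poll one batch of records
    val cProps = new Properties()
    cProps.put("bootstrap.servers", "localhost:9092")
    cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    cProps.put("group.id", "demo-group")
    cProps.put("auto.offset.reset", "earliest")
    val consumer = new KafkaConsumer[String, String](cProps)
    consumer.subscribe(Collections.singleton("demo-topic"))
    consumer.poll(Duration.ofSeconds(5)).forEach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```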
OR
Apache Storm Introduction
Architecture and Installation
start the course
describe, at a high level, Apache Storm and its characteristics
describe why Apache Storm is used
describe the Apache Storm Architecture
identify a tuple and a bolt and their use in Storm
identify a spout and its use in Storm
identify streams and their use in Storm
describe the different operation modes of Storm
identify Storm components and their functionality in the source code for an example Storm application
describe the setup process for an Integrated Storm development environment
use Maven to compile and run a Storm application
describe the installation and setup process for ZooKeeper as a standalone server
install and set up ZooKeeper on a development machine
deploy a ZooKeeper server in standalone mode and test it with a ZooKeeper client connection
describe the process for setting up and deploying a ZooKeeper cluster
demonstrate the process of setting up a production Storm cluster
describe the process of configuring the parallelism of a topology
configure the parallelism of spout and bolt components in a Storm topology
briefly describe stream groupings and their types
use stream groupings in a Storm topology
describe the Guaranteed Messaging Process
describe the fault-tolerant characteristics of Storm
briefly describe what Trident is and how it's used
describe Trident's data model and its use
describe several operations of Trident
test your knowledge of Apache Storm and the components of the system (a minimal topology sketch follows)
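To ground the spout, bolt, stream grouping, and local-mode concepts listed above, here is a minimal topology sketch in Scala over the Storm 2.x Java API; the component names and emitted data are invented for illustration:

```scala
import java.util.{Map => JMap}
import org.apache.storm.{Config, LocalCluster}
import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.{OutputCollector, TopologyContext}
import org.apache.storm.topology.{OutputFieldsDeclarer, TopologyBuilder}
import org.apache.storm.topology.base.{BaseRichBolt, BaseRichSpout}
import org.apache.storm.tuple.{Fields, Tuple, Values}

// Spout: emits one sentence tuple per second
class SentenceSpout extends BaseRichSpout {
  private var collector: SpoutOutputCollector = _
  override def open(conf: JMap[String, AnyRef], ctx: TopologyContext,
                    out: SpoutOutputCollector): Unit = collector = out
  override def nextTuple(): Unit = {
    collector.emit(new Values("big data with storm"))
    Thread.sleep(1000)
  }
  override def declareOutputFields(d: OutputFieldsDeclarer): Unit =
    d.declare(new Fields("sentence"))
}

// Bolt: splits each sentence into words and acks the input tuple
class SplitBolt extends BaseRichBolt {
  private var collector: OutputCollector = _
  override def prepare(conf: JMap[String, AnyRef], ctx: TopologyContext,
                       out: OutputCollector): Unit = collector = out
  override def execute(t: Tuple): Unit = {
    t.getStringByField("sentence").split(" ").foreach(w => collector.emit(new Values(w)))
    collector.ack(t)                                  // acking feeds the guaranteed-messaging process
  }
  override def declareOutputFields(d: OutputFieldsDeclarer): Unit =
    d.declare(new Fields("word"))
}

object StormSketch {
  def main(args: Array[String]): Unit = {
    val builder = new TopologyBuilder
    builder.setSpout("sentences", new SentenceSpout, 1)
    builder.setBolt("split", new SplitBolt, 2)        // parallelism hint of 2 executors
      .shuffleGrouping("sentences")                   // stream grouping: shuffle
    val cluster = new LocalCluster                    // local operation mode
    cluster.submitTopology("sketch-topology", new Config, builder.createTopology())
  }
}
```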
Other courses offered by Skillsoft
Big Data Hadoop Expert at Skillsoft Students Ratings & Reviews