Big Data Hadoop & Spark Developer
- Offered by Skillsoft
Big Data Hadoop & Spark Developer at Skillsoft Overview
Duration | 26 hours
Total fee | ₹15,425
Mode of learning | Online
Difficulty level | Intermediate
Credential | Certificate
Future job roles | FTA, CRUD, .Net, CSR, Senior Software Developer
Big Data Hadoop & Spark Developer at Skillsoft Highlights
- Certification from Naukri Learning; content aligned with most certifying bodies
- 400mn+ users; used by professionals in 70% of Fortune 500 companies
Big Data Hadoop & Spark Developer at Skillsoft Course details
- Intended for technical personnel with a background in Linux, SQL, and programming who plan to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis
- Unlimited access to online content for six months
- Globally recognized course-completion certificate
- 400mn+ users; world's No. 1; has trained 70% of Fortune 500 companies
- Career boost for students and professionals
- Spark Core provides basic I/O functionality, distributed task dispatching, and scheduling. Resilient Distributed Datasets (RDDs) are logical collections of data partitioned across machines; they can be created by referencing datasets in external storage systems or by applying transformations to existing RDDs. In this course, you will learn how to improve Spark's performance and how to work with DataFrames and Spark SQL.
- Spark Streaming leverages Spark's language-integrated API to perform streaming analytics. This design lets the same application code written for batch processing join streams against historical data or run ad hoc queries on stream state. In this course, you will learn how to work with different input streams, perform transformations on streams, and tune performance.
- MLlib is Spark's machine learning library, GraphX is Spark's API for graphs and graph-parallel computation, and SparkR exposes the Spark API so that users can run jobs from the R shell on a cluster. In this course, you will learn how to work with each of these libraries.
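To make the description above concrete, here is a minimal Scala sketch of the core ideas: lazily evaluated RDD transformations, an action that triggers the job, and a Spark SQL query over the same data as a DataFrame. The object name, local master, and toy data are illustrative assumptions, not taken from the course.

```scala
import org.apache.spark.sql.SparkSession

object RddAndSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real deployment would target a cluster master.
    val spark = SparkSession.builder()
      .appName("RddAndSqlSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD created from an in-memory collection; referencing external
    // storage (e.g. sc.textFile("hdfs://...")) works the same way.
    val numbers = sc.parallelize(1 to 100, numSlices = 4)

    // Transformations are lazily evaluated; nothing runs until an action.
    val squares = numbers.map(n => n * n)
    println(s"Sum of squares: ${squares.sum()}") // the action triggers the job

    // The same data as a DataFrame, queryable with Spark SQL.
    import spark.implicits._
    val df = squares.toDF("square")
    df.createOrReplaceTempView("squares")
    spark.sql("SELECT COUNT(*) AS big FROM squares WHERE square > 1000").show()

    spark.stop()
  }
}
```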
Big Data Hadoop & Spark Developer at Skillsoft Curriculum
Big Data & Hadoop
Ecosystem for Hadoop
Installation of Hadoop
Data Repository with HDFS and HBase
Data Repository with Flume
Data Refinery with YARN and MapReduce
Data Factory with Hive
Data Factory with Oozie and Hue
Data Flow for the Hadoop Ecosystem
Apache Spark
Spark Core
recall what is included in the Spark Stack
define lazy evaluation as it relates to Spark
recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and functions to compute
pre-partition an RDD for performance
store RDDs in serialized form
perform numeric operations on RDDs
create custom accumulators
use broadcast functionality for optimization (both sketched in the example after this list)
pipe to external applications
adjust garbage collection settings
perform batch import on a Spark cluster
determine memory consumption
tune data structures to reduce memory consumption
use Spark's different shuffle operations to minimize memory usage of reduce tasks
set the levels of parallelism for each operation
create DataFrames
interoperate with RDDs
describe the generic load and save functions
read and write Parquet files
use a JSON dataset as a DataFrame
read and write data in Hive tables
read and write data using JDBC
run the Thrift JDBC/ODBC server
show the different ways to tune Spark for better performance
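A hedged sketch of two of the objectives above: shipping a read-only lookup table to executors as a broadcast variable, and counting events with an accumulator. For brevity it uses Spark's built-in long accumulator rather than a custom AccumulatorV2 subclass; the object name and toy data are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

object BroadcastAccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastAccumulatorSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Broadcast: ship a read-only lookup table to every executor once,
    // instead of serializing it with every task closure.
    val countryNames = Map("IN" -> "India", "US" -> "United States")
    val lookup = sc.broadcast(countryNames)

    // Accumulator: a write-only counter that executors add to and the
    // driver reads back after an action completes.
    val unknownCodes: LongAccumulator = sc.longAccumulator("unknownCodes")

    val codes = sc.parallelize(Seq("IN", "US", "XX", "IN"))
    val resolved = codes.map { code =>
      lookup.value.get(code) match {
        case Some(name) => name
        case None       => unknownCodes.add(1); "unknown"
      }
    }

    println(resolved.collect().mkString(", ")) // the action runs the job
    println(s"Unresolved codes: ${unknownCodes.value}")
    spark.stop()
  }
}
```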
Spark Streaming
describe what a DStream is
recall how TCP socket input streams are ingested
describe how file input streams are read
recall how Akka Actor input streams are received
describe how Kafka input streams are consumed
recall how Flume input streams are ingested
set up Kinesis input streams
configure Twitter input streams
implement custom input streams
describe receiver reliability
use the updateStateByKey operation (see the sketch after this list)
perform transform operations
perform Window operations
perform join operations
use output operations on Streams
use DataFrame and SQL operations on streaming data
use learning algorithms with MLlib
persist stream data in memory
enable and configure checkpointing
deploy applications
monitor applications
reduce batch processing times
set the right batch interval
tune memory usage
describe fault tolerance semantics
perform transformations on DStreams
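The following sketch illustrates several of the streaming objectives above using the classic DStream API: a TCP socket input stream, a running per-word count with updateStateByKey, and a windowed count. The checkpoint path, host, port, and batch/window intervals are placeholder choices, not values prescribed by the course.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing.
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    // 5-second batch interval; choosing it well is one of the objectives above.
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-sketch") // required for stateful operations

    // TCP socket input stream (could be fed locally with `nc -lk 9999`).
    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // updateStateByKey keeps a running count per word across batches.
    val totals = pairs.updateStateByKey[Int] {
      (newCounts: Seq[Int], state: Option[Int]) =>
        Some(state.getOrElse(0) + newCounts.sum)
    }
    totals.print()

    // A windowed count over the last 30 seconds, sliding every 10 seconds.
    val windowed = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    windowed.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```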
MLlib, GraphX, and R
describe data types
recall the basic statistics
describe linear SVMs
perform logistic regression
use Naïve Bayes
create decision trees
use collaborative filtering with ALS
perform clustering with K-means (sketched after this list)
perform clustering with LDA
perform analysis with frequent pattern mining
describe the property graph
describe the graph operators
perform analytics with neighborhood aggregation
perform messaging with Pregel API
build graphs
describe vertex and edge RDDs
optimize representation through partitioning
measure vertices with PageRank
install SparkR
run SparkR
use existing R packages
expose RDDs as distributed lists
convert existing RDDs into DataFrames
read and write Parquet files
run SparkR on a cluster
use the algorithms and utilities in MLlib
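As a small illustration of the MLlib objectives above, here is a minimal K-means sketch against the RDD-based spark.mllib API; the toy points, the choice of k, and the iteration count are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy 2-D points forming two loose groups.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.1), Vectors.dense(0.2, 0.0), Vectors.dense(0.1, 0.2),
      Vectors.dense(9.0, 9.1), Vectors.dense(9.2, 9.0), Vectors.dense(9.1, 9.3)
    )).cache() // K-means iterates over the input, so caching helps

    // Train with k = 2 clusters and up to 20 iterations.
    val model = KMeans.train(points, k = 2, maxIterations = 20)

    model.clusterCenters.zipWithIndex.foreach { case (center, i) =>
      println(s"Cluster $i center: $center")
    }
    println(s"Within-set sum of squared errors: ${model.computeCost(points)}")
    spark.stop()
  }
}
```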