
Apache Spark Getting Started 

  • Offered by Skillsoft

Apache Spark Getting Started at Skillsoft
Overview

A foundational understanding of how to leverage this powerful data processing engine for large-scale data analytics

Duration

1 hour

Mode of learning

Online

Difficulty level

Beginner

Official Website

Go to Website

Credential

Certificate

Future job roles

CRUD, .Net, CSR, Credit risk, Senior Software Developer

Apache Spark Getting Started at Skillsoft
Highlights

  • Earn a digital badge from Skillsoft after completing the course

Apache Spark Getting Started at Skillsoft
Course details

What are the course deliverables?
  • Recognize where Spark fits in with Hadoop and its components
  • Describe Spark RDDs and their characteristics, including what makes them resilient and distributed
  • Identify the types of operations permitted on an RDD and describe how RDD transformations are lazily evaluated
  • Distinguish between RDDs and DataFrames and describe the relationship between the two
  • List the crucial components of Spark and the relationships between them, and recognize the functions of the Spark session, master, and worker nodes
  • Install PySpark and initialize a Spark context
  • Create and load data into an RDD
  • Initialize a Spark DataFrame from the contents of an RDD
  • Work with Spark DataFrames containing both primitive and structured data types
  • Define the contents of a DataFrame using the SQLContext
  • Apply the map() function on an RDD to configure a DataFrame with column headers
  • Retrieve required data from within a DataFrame, and define and apply transformations on a DataFrame
  • Convert Spark DataFrames to pandas DataFrames and vice versa
  • Describe basic Spark concepts
More about this course

Explore the basics of Apache Spark, an analytics engine used for big data processing. It's an open-source, cluster computing framework built on top of Hadoop

Discover how it allows operations on data both through its own library methods and through SQL, while delivering strong performance

Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark session, and master and worker nodes. Then install PySpark and work hands-on

Apache Spark Getting Started at Skillsoft
Curriculum

Course Overview

Introduction to Spark and Hadoop

Resilient Distributed Datasets (RDDs)

RDD Operations

Spark DataFrames

Spark Architecture

Spark Installation

Working with RDDs

Creating DataFrames from RDDs

Contents of a DataFrame

The SQLContext

The map() Function of an RDD

Accessing the Contents of a DataFrame

DataFrames in Spark and Pandas

Exercise: Working with Spark
