Databricks - Apache Spark (TM) SQL for Data Analysts
- Offered by Coursera
Apache Spark (TM) SQL for Data Analysts at Coursera Overview
Duration | 14 hours |
Start from | Start Now |
Total fee | Free |
Mode of learning | Online |
Difficulty level | Intermediate |
Credential | Certificate |
Apache Spark (TM) SQL for Data Analysts at Coursera Highlights
- Earn a shareable certificate upon completion.
- Flexible deadlines according to your schedule.
Apache Spark (TM) SQL for Data Analysts at Coursera Course details
- Apache Spark is one of the most widely used technologies in big data analytics. In this course, you will learn how to leverage your existing SQL skills to start working with Spark immediately. You will also learn how to work with Delta Lake, a highly performant, open-source storage layer that brings reliability to data lakes. By the end of this course, you will be able to use Spark SQL and Delta Lake to ingest, transform, and query data to extract valuable insights that can be shared with your team.
Apache Spark (TM) SQL for Data Analysts at Coursera Curriculum
Welcome to Apache Spark SQL for Data Analysts
- Course goals
- Before you begin
- End of module knowledge check
Spark makes big data easy
- Introduction to module 2
- What is big data?
- Common struggles with big data
- Big Data Needs
- Apache Spark Intro
- Spark SQL
- Module 2 Concept Review
Using Spark SQL on Databricks
- Introduction to Module 3
- Signing up for Databricks Community Edition
- Preparing your workspace
- Working with notebooks
- Using course materials
- Basic queries with Spark SQL reading introduction
- Data Visualization on Databricks reading introduction
- Data visualization tools
- Exploratory Data Analysis lab introduction
- Course Materials
- Basic Queries reading activity
- Data Visualization reading activity
- Your turn! Exploratory Data Analysis lab
- Module 3 Concept Review
- 3.3 Exploratory Data Analysis Quiz
Spark Under the Hood
- Introduction to module 4
- Understanding optimizations
- The physical cluster
- The SparkUI and SQL tab
- Optimizing query logic
- Impact of Caching
- Optimizing with selective data loading
- Module 4 Concept Review
Complex Queries
- Introduction to module 5
- What is nested data?
- Introduction to managing nested data
- Introduction to Manipulating Data
- Introduction to Data Munging
- Managing Nested Data reading activity
- Manipulating Data reading activity
- 5.3 Data Munging Lab
- Module 5 Concept Review
- Lab 5.3 Quiz
Applied Spark SQL
- Introduction to module 6
- Complex data - common strategies
- About higher-order functions
- Higher-order functions introduction
- Introducing Aggregating and Summarizing Data
- Partitioning Tables Introduction
- Sharing Insights Lab Introduction
- Higher Order Functions reading activity
- Aggregating and Summarizing Data reading activity
- Partitioning Tables
- Sharing Insights
- Module 6 concept review
- Lab 6.4 Quiz
Data Storage and Optimization
- Introduction to module 7
- A quick refresher
- Introducing a new data management paradigm
- Introduction to the lesson
- What is Delta Lake
- Data Warehouses
- Data Lakes
- Data Lakes vs Data Warehouses
- The Lakehouse
Delta Lake with Spark SQL
- Introduction to the module
- Intro to Using Delta reading
- Managing Records in a Delta table
- Delta Engine Optimization Introduction
- Delta Lake Lab Introduction
- 8.1 Using Delta
- 8.2 Managing records
- 8.3 Optimizing Delta
- Delta Lab
- 8.4 Delta Lab
SQL Coding Challenges
- SQL coding challenges
- Final Exam