
Azure Databricks and Spark SQL (Python) 

  • Offered by UDEMY

Azure Databricks and Spark SQL (Python) at UDEMY
Overview

Master Azure Databricks with PySpark: Your Hands-On Guide to Advanced Data Engineering and Analysis (DP203)

Duration

13 hours

Total fee

449

Mode of learning

Online

Credential

Certificate

Highlights

  • 30-Day Money-Back Guarantee
  • Certificate of completion
  • Full lifetime access
  • Learn from 7 downloadable resources and 12 articles

Course details

What are the course deliverables?
  • Azure Databricks
  • Data Lakehouse
  • Delta Lake
  • Spark SQL
  • PySpark
  • Big Data
  • Real World Scenarios
  • CI/CD on Databricks
  • Source Control with Databricks Repos
More about this course
Databricks is one of the most in-demand big data tools around. It is a fast, easy, and collaborative Spark-based big data analytics service designed for data science, ML and data engineering workflows.

The course is packed with lectures, code-along videos and dedicated challenge sections. This should be more than enough to keep you engaged and learning! As an added bonus you will also have lifetime access to all the lectures, and I have provided detailed notebooks as a downloadable asset. The notebooks contain step-by-step documentation with additional resources and links.

I have ensured that the delivery of the course is engaging and concise; the curriculum is extensive yet delivered in an efficient way. The course will provide you with hands-on training utilising a variety of different data sets.

The course is aimed at teaching you PySpark, Spark SQL in Python and the Databricks Lakehouse Architecture. You will primarily be using Databricks on Microsoft Azure, in addition to other services such as Azure Data Lake Storage Gen 2, Azure Repos and Azure DevOps.

The course will cover a variety of areas including:
  • Set Up and Overview
  • Azure Databricks Notebooks
  • Spark SQL
  • Reading and Writing Data
  • Data Analysis and Transformation with Spark SQL in Python
  • Charts and Dashboards in Databricks Notebooks
  • Databricks Medallion Architecture
  • Accessing Data in Cloud Object Storage
  • Hive Metastore
  • Databases, Tables and Views in Databricks
  • Delta Lake / Databricks Lakehouse Architecture
  • Spark Structured Streaming
  • Delta Live Tables
  • Databricks Jobs
  • Access Control Lists (ACLs)
  • Databricks CLI
  • Source Control with Databricks Repos
  • CI/CD on Databricks

Curriculum

Course Overview / Introduction to Spark and Databricks

Course Introduction

Big Data

Hadoop, Spark and Databricks

Apache Spark Architecture

Spark vs Databricks Comparison

Resource: Comparing Apache Spark vs Databricks

Azure and Databricks Set Up

Azure Account Set Up

Azure UI Overview

Resource: Azure Resources

Creating your Databricks Service

Databricks UI Overview

Clusters

Resource: Pricing, Cluster Pools and Runtime Versions

How to use Databricks Notebooks

User Interface Changes

Mix Languages and add Markdown text in your Notebook

Databricks Utilities Module and FileStore Utilities

Resource: How to use Notebooks

IMPORTANT - Download Course Resource Notebooks

Cost Management and Cancelling your Subscription

Resource: Cancelling your Azure Subscription

Reading and Writing Data

Dataset Download

Databricks FileStore (DBFS Browser)

Resource: File Types

Reading Data

Writing Data

Parquet Files

Deleting Files and Folders

Data Analysis and Transformation with SparkSQL

Selecting and Renaming Columns

Adding New Columns

Changing Data Types

Math Functions and Simple Arithmetic

Sort Functions

String Functions

Datetime Functions

Filtering DataFrames

Conditional Statements

Using SQL Expressions with expr()

Removing Columns

Grouping your DataFrame

Pivot your DataFrame

Joining DataFrames

Union

Unpivot your DataFrame

Pandas

Utilising the Medallion Architecture in Databricks

Medallion Architecture

Resource: Medallion Architecture

Challenge Section: Customer Orders

Dataset Download and DBFS Upload

Assignment 1: Bronze to Silver

Assignment 1 Solutions Walkthrough

Assignment 2: Silver to Gold

Assignment 2 Solutions Walkthrough

Visualizations and Dashboards

Visualizations and Dashboards

Accessing Data from Azure Data Lake Storage (ADLS) with Databricks

Creating an ADLS Gen2 Account

(Optional) Storage Explorer

Accessing via Access Keys

Accessing via SAS Token

Mounting ADLS to DBFS Overview

Mounting ADLS to DBFS Demo

Secret Scopes

End to End Walkthrough Example

Hive Metastore, Databases, Tables and Views

Running SQL on DataFrames

Hive Metastore and Creating Databases

Managed Tables

Specifying a Location for your Underlying Managed Table Data

Unmanaged (External) Tables

Permanent Views

Challenge Section: Employees

Dataset Download and ADLS Upload

Assignment: Employees

Assignment Solutions Walkthrough

Databricks Data Lakehouse / Delta Lake

Databricks Data Lakehouse / Delta Lake Overview

Delta Lake Data Files

Deleting and Updating Records

Merge Into

Table Utility Commands

Modularize Code and Link Notebooks

Running a Notebook from another Notebook

Text Widgets

Defining Functions to Store Reusable Logic

Registering Functions for SQL

Python User Defined Functions (UDFs)

Limitations of UDFs

Challenge Section: Health Updates

Dataset Download and Overview

Assignment 1 Overview

Assignment 1 Solutions Walkthrough

Assignment 2 Overview (Difficult!)

Assignment 2 Solutions Walkthrough

Spark Structured Streaming and Auto Loader

Spark Structured Streaming Overview

ADLS Preparation for this Section

Streaming Dataset "Simulator" Notebook

Reading a Data Stream

Reminder to Manually Cancel your Data Streams

Writing to a Data Stream

Additional Options

Auto Loader

Delta Live Tables

Delta Live Overview

Databricks Premium Resource Creation

ADLS Preparation for this Section

Demo 1: Live Tables

Table Data and Pipeline Metadata

Demo 2: Data Quality Checks

Streaming Dataset "Simulator"

Demo 3: Streaming Live Tables

Demo 4: Additional Properties and Views

Databricks Jobs

Orchestrating Tasks with Databricks Jobs

Task Failures

Parameters and Task Values

Job Scheduling with CRON Syntax

Access Control Lists (ACLs)

Overview of ACLs

Adding a New User to our Workspace

Workspace Access Control

Cluster Access Control

Groups

Wrap Up

Resource: Access Control

Databricks CLI (Command Line Interface)

Databricks CLI Overview

Command Line Basics (Navigating Directories)

Installing, Setting Up and Authenticating the Databricks CLI

Demo 1 (Cluster CLI Commands)

Demo 2 (Workspace CLI Commands)

Resource: CLI Commands

Source Control with Databricks Repos and Azure DevOps

Source Control on Databricks

Azure DevOps Account Set Up

Parallelism Request

Connecting our Azure DevOps Repo to Databricks

Databricks Repos Demo 1

Demo 2

Demo 3

Running Jobs on Remote Git Notebooks

Resources: Additional Information

CI/CD with Databricks

CI/CD and Solution Overview

Reminder: Parallelism Request Form

Build Pipeline (CI)

Deploy to Test (CD)

Deploy to Production (CD)

Resources: Additional Information on CI CD

Continue learning with me!

BONUS: Check out my other courses

Faculty details

Malvik Vaghadia
Designation : Data and BI specialist

