Azure Databricks and Spark SQL (Python)
- Offered by UDEMY
Azure Databricks and Spark SQL (Python) at UDEMY Overview
Duration | 13 hours |
Total fee | ₹449 |
Mode of learning | Online |
Credential | Certificate |
Azure Databricks and Spark SQL (Python) at UDEMY Highlights
- 30-Day Money-Back Guarantee
- Certificate of completion
- Full lifetime access
- Learn from 7 downloadable resources and 12 articles
Azure Databricks and Spark SQL (Python) at UDEMY Course details
- Azure Databricks
- Data Lakehouse
- Delta Lake
- Spark SQL
- PySpark
- Big Data
- Real World Scenarios
- CI/CD on Databricks
- Source Control with Databricks Repos
Databricks is one of the most in-demand big data tools around. It is a fast, easy, and collaborative Spark-based big data analytics service designed for data science, ML and data engineering workflows.

The course is packed with lectures, code-along videos and dedicated challenge sections. This should be more than enough to keep you engaged and learning! As an added bonus, you will also have lifetime access to all the lectures, and I have provided detailed notebooks as a downloadable asset. The notebooks contain step-by-step documentation with additional resources and links.

I have ensured that the delivery of the course is engaging and concise, and that the curriculum is extensive yet delivered in an efficient way. The course will provide you with hands-on training utilising a variety of different data sets.

The course is aimed at teaching you PySpark, Spark SQL in Python and the Databricks Lakehouse Architecture.

You will primarily be using Databricks on Microsoft Azure, in addition to other services such as Azure Data Lake Storage Gen 2, Azure Repos and Azure DevOps.

The course will cover a variety of areas including:
- Set Up and Overview
- Azure Databricks Notebooks
- Spark SQL
- Reading and Writing Data
- Data Analysis and Transformation with Spark SQL in Python
- Charts and Dashboards in Databricks Notebooks
- Databricks Medallion Architecture
- Accessing Data in Cloud Object Storage
- Hive Metastore
- Databases, Tables and Views in Databricks
- Delta Lake / Databricks Lakehouse Architecture
- Spark Structured Streaming
- Delta Live Tables
- Databricks Jobs
- Access Control Lists (ACLs)
- Databricks CLI
- Source Control with Databricks Repos
- CI/CD on Databricks
Azure Databricks and Spark SQL (Python) at UDEMY Curriculum
Course Overview / Introduction to Spark and Databricks
Course Introduction
Big Data
Hadoop, Spark and Databricks
Apache Spark Architecture
Spark vs Databricks Comparison
Resource: Comparing Apache Spark vs Databricks
Azure and Databricks Set Up
Azure Account Set Up
Azure UI Overview
Resource: Azure Resources
Creating your Databricks Service
Databricks UI Overview
Clusters
Resource: Pricing, Cluster Pools and Runtime Versions
How to use Databricks Notebooks
User Interface Changes
Mix Languages and add Markdown text in your Notebook
Databricks Utilities Module and FileStore Utilities
Resource: How to use Notebooks
IMPORTANT - Download Course Resource Notebooks
Cost Management and Cancelling your Subscription
Resource: Cancelling your Azure Subscription
Reading and Writing Data
Dataset Download
Databricks FileStore (DBFS Browser)
Resource: File Types
Reading Data
Writing Data
Parquet Files
Deleting Files and Folders
Data Analysis and Transformation with Spark SQL
Selecting and Renaming Columns
Adding New Columns
Changing Data Types
Math Functions and Simple Arithmetic
Sort Functions
String Functions
Datetime Functions
Filtering DataFrames
Conditional Statements
Using SQL Expressions with expr()
Removing Columns
Grouping your DataFrame
Pivot your DataFrame
Joining DataFrames
Union
Unpivot your DataFrame
Pandas
Utilising the Medallion Architecture in Databricks
Medallion Architecture
Resource: Medallion Architecture
Challenge Section: Customer Orders
Dataset Download and DBFS Upload
Assignment 1: Bronze to Silver
Assignment 1 Solutions Walkthrough
Assignment 2: Silver to Gold
Assignment 2 Solutions Walkthrough
Visualizations and Dashboards
Visualizations and Dashboards
Accessing Data from Azure Data Lake Storage (ADLS) with Databricks
Creating an ADLS Gen2 Account
(Optional) Storage Explorer
Accessing via Access Keys
Accessing via SAS Token
Mounting ADLS to DBFS Overview
Mounting ADLS to DBFS Demo
Secret Scopes
End to End Walkthrough Example
Hive Metastore, Databases, Tables and Views
Running SQL on DataFrames
Hive Metastore and Creating Databases
Managed Tables
Specifying a Location for your Underlying Managed Table Data
Unmanaged (External) Tables
Permanent Views
Challenge Section: Employees
Dataset Download and ADLS Upload
Assignment: Employees
Assignment Solutions Walkthrough
Databricks Data Lakehouse / Delta Lake
Databricks Data Lakehouse / Delta Lake Overview
Delta Lake Data Files
Deleting and Updating Records
Merge Into
Table Utility Commands
Modularize Code and Link Notebooks
Running a Notebook from another Notebook
Text Widgets
Defining Functions to Store Reusable Logic
Registering Functions for SQL
Python User Defined Functions (UDFs)
Limitations of UDFs
Challenge Section: Health Updates
Dataset Download and Overview
Assignment 1 Overview
Assignment 1 Solutions Walkthrough
Assignment 2 Overview (Difficult!)
Assignment 2 Solutions Walkthrough
Spark Structured Streaming and Auto Loader
Spark Structured Streaming Overview
ADLS Preparation for this Section
Streaming Dataset "Simulator" Notebook
Reading a Data Stream
Reminder to Manually Cancel your Data Streams
Writing to a Data Stream
Additional Options
Auto Loader
Delta Live Tables
Delta Live Overview
Databricks Premium Resource Creation
ADLS Preparation for this Section
Demo 1: Live Tables
Table Data and Pipeline Metadata
Demo 2: Data Quality Checks
Streaming Dataset "Simulator"
Demo 3: Streaming Live Tables
Demo 4: Additional Properties and Views
Databricks Jobs
Orchestrating Tasks with Databricks Jobs
Task Failures
Parameters and Task Values
Job Scheduling with CRON Syntax
Access Control Lists (ACLs)
Overview of ACLs
Adding a New User to our Workspace
Workspace Access Control
Cluster Access Control
Groups
Wrap Up
Resource: Access Control
Databricks CLI (Command Line Interface)
Databricks CLI Overview
Command Line Basics (Navigating Directories)
Installing, Setting Up and Authenticating the Databricks CLI
Demo 1 (Cluster CLI Commands)
Demo 2 (Workspace CLI Commands)
Resource: CLI Commands
Source Control with Databricks Repos and Azure DevOps
Source Control on Databricks
Azure DevOps Account Set Up
Parallelism Request
Connecting our Azure DevOps Repo to Databricks
Databricks Repos Demo 1
Demo 2
Demo 3
Running Jobs on Remote Git Notebooks
Resources: Additional Information
CI/CD with Databricks
CI/CD and Solution Overview
Reminder: Parallelism Request Form
Build Pipeline (CI)
Deploy to Test (CD)
Deploy to Production (CD)
Resources: Additional Information on CI/CD
Continue learning with me!
BONUS: Check out my other courses