Microsoft

Data engineering with Azure Databricks 

  • Offered by Microsoft

Overview

Learn how to harness Apache Spark and powerful clusters running on the Azure Databricks platform.

Duration

10 hours

Total fee

Free

Mode of learning

Online

Schedule type

Self-paced

Difficulty level

Intermediate

Credential

Certificate

Highlights

  • Learn how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations
  • Learn how Structured Streaming helps you process streaming data in real time
  • Learn how to integrate Azure Databricks with Azure Synapse Analytics as part of your data architecture
  • Learn best practices for workspace administration, security, tools, integration, Databricks Runtime, HA/DR, and clusters in Azure Databricks

Course details

What are the course deliverables?
  • Describe Azure Databricks
  • Spark architecture fundamentals
  • Read and write data in Azure Databricks
  • Work with DataFrames in Azure Databricks
  • Describe lazy evaluation and other performance features in Azure Databricks
  • Work with DataFrame columns in Azure Databricks
  • Work with advanced DataFrame methods in Azure Databricks
  • Describe platform architecture, security, and data protection in Azure Databricks
  • Build and query a Delta Lake
  • Process streaming data with Azure Databricks Structured Streaming
  • Describe Azure Databricks Delta Lake architecture
  • Create production workloads on Azure Databricks with Azure Data Factory
  • Implement CI/CD with Azure DevOps
  • Integrate Azure Databricks with Azure Synapse
  • Describe Azure Databricks best practices
More about this course
  • Discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files
  • Understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark
  • Understand the architecture of an Azure Databricks Spark Cluster and Spark Jobs
  • Use advanced DataFrame functions to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks

Curriculum

MODULE:1

Explain Azure Databricks

Create an Azure Databricks workspace and cluster

Understand Azure Databricks Notebooks

Exercise: Work with Notebooks

MODULE:2

Understand the architecture of an Azure Databricks Spark cluster

Understand the architecture of a Spark job

Knowledge check

MODULE:3

Read data in CSV format

Read data in JSON format

Read data in Parquet format

Read data stored in tables and views

Write data

Exercises: Read and write data

MODULE:4

Describe a DataFrame

Use common DataFrame methods

Use the display function

Exercise: Distinct articles

MODULE:5

Describe the difference between eager and lazy execution

Describe the fundamentals of how the Catalyst Optimizer works

Define and identify actions and transformations

Describe performance enhancements enabled by shuffle operations and Tungsten
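
As a rough illustration of the eager versus lazy distinction covered in this module, the following plain-Python sketch uses generators as a stand-in for Spark transformations. It does not use the Spark API. The names `lazy_map`, `lazy_filter`, and `log` are hypothetical, and the analogy is only approximate: generator stages do no work until something consumes them, much as transformations do no work until an action runs.

```python
from typing import Callable, Iterable, Iterator, List

# Records each unit of work so we can see when computation really happens.
log: List[str] = []


def lazy_map(fn: Callable[[int], int], rows: Iterable[int]) -> Iterator[int]:
    """Hypothetical 'transformation': yields fn(row) only when consumed."""
    for row in rows:
        log.append(f"map {row}")
        yield fn(row)


def lazy_filter(pred: Callable[[int], bool], rows: Iterable[int]) -> Iterator[int]:
    """Hypothetical 'transformation': yields matching rows only when consumed."""
    for row in rows:
        log.append(f"filter {row}")
        if pred(row):
            yield row


# Building the pipeline is like chaining transformations: nothing runs yet.
pipeline = lazy_filter(lambda x: x > 10, lazy_map(lambda x: x * 10, [1, 2, 3]))
work_before_action = len(log)

# Consuming the pipeline plays the role of an action and triggers the work.
result = list(pipeline)
work_after_action = len(log)
```

Comparing `work_before_action` with `work_after_action` shows that the work is deferred until the result is requested, which is the behavior the module's "actions and transformations" unit describes for Spark.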

MODULE:6

Describe the column class

Work with column expressions

Exercise: Washingtons and Marthas

MODULE:7

Perform date and time manipulation

Use aggregate functions

Exercise: Deduplication of data
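
The deduplication exercise can be pictured with a minimal plain-Python sketch. It is not the course's solution and does not use the Spark API (in Spark, a comparable result is typically obtained with DataFrame methods such as `dropDuplicates`). The record fields, the sample data, and the helper names `normalize` and `deduplicate` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def normalize(record: Record) -> Tuple[str, str]:
    """Build a comparison key that ignores case and surrounding whitespace."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: the first two rows differ only in formatting.
people = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ADA LOVELACE ", "email": "Ada@Example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

unique_people = deduplicate(people)
```

The key design choice is to compare normalized values while returning the original, unmodified records, so formatting differences do not produce spurious duplicates.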

MODULE:8

Describe the Azure Databricks platform architecture

Perform data protection

Describe Azure Key Vault and Databricks security scopes

Secure access with Azure IAM and authentication

Describe security

Exercise: Access Azure Storage with Key Vault-backed secrets

MODULE:9

Describe the open source Delta Lake

Exercise: Work with basic Delta Lake functionality

Describe how Azure Databricks manages Delta Lake

Exercise: Use the Delta Lake Time Machine and perform optimization
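
The course highlights mention upserting data into Delta Lake tables. The semantics of an upsert (update rows whose key already exists, insert the rest) can be sketched in plain Python as below. This is only a conceptual model under stated assumptions: it does not use Delta Lake, where the operation is normally expressed as a `MERGE` statement, and the function name `upsert`, the `id` key, and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def upsert(table: List[Row], updates: List[Row], key: str) -> Tuple[List[Row], int, int]:
    """Return a new table with `updates` merged in by `key`.

    Rows whose key already exists are replaced; new keys are appended.
    Also returns how many rows were inserted and how many were updated.
    """
    merged = {row[key]: row for row in table}  # dicts preserve insertion order
    inserted = 0
    updated = 0
    for row in updates:
        if row[key] in merged:
            updated += 1
        else:
            inserted += 1
        merged[row[key]] = row
    return list(merged.values()), inserted, updated


# Hypothetical sample data.
table = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
updates = [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}]

merged_table, inserted_count, updated_count = upsert(table, updates, "id")
```

The function returns a new list rather than mutating `table`, which loosely mirrors how each write to a versioned table produces a new table state while earlier states remain readable.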

MODULE:10

Describe Azure Databricks Structured Streaming

Perform stream processing using Structured Streaming

Work with Time Windows (9 min)

Process data from Event Hubs with Structured Streaming
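
To make the "time windows" unit more concrete, here is a small plain-Python sketch of fixed (tumbling) windows: each event is assigned to the window that contains its timestamp, and events are counted per window. It does not use Structured Streaming, and the function names, the integer-second timestamps, and the sample events are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def window_start(timestamp: int, window_seconds: int) -> int:
    """Return the start of the fixed-size window containing `timestamp`."""
    return timestamp - (timestamp % window_seconds)


def count_by_window(events: Iterable[Tuple[int, str]], window_seconds: int) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start time."""
    counts: Counter = Counter()
    for timestamp, _payload in events:
        counts[window_start(timestamp, window_seconds)] += 1
    return dict(counts)


# Hypothetical events as (timestamp in seconds, payload) pairs.
events = [(0, "a"), (4, "b"), (10, "c"), (19, "d"), (20, "e")]

counts = count_by_window(events, window_seconds=10)
```

With 10-second windows, timestamps 0 and 4 fall in the window starting at 0, timestamps 10 and 19 in the window starting at 10, and timestamp 20 opens a new window. A real streaming engine additionally has to handle late and out-of-order events, which this sketch ignores.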
