Data engineering with Azure Databricks
- Offered by Microsoft
Data engineering with Azure Databricks at Microsoft Overview
| Detail | Value |
| --- | --- |
| Duration | 10 hours |
| Total fee | Free |
| Mode of learning | Online |
| Schedule type | Self paced |
| Difficulty level | Intermediate |
| Official Website | Explore Free Course |
| Credential | Certificate |
Data engineering with Azure Databricks at Microsoft Highlights
- Learn how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations
- Learn how Structured Streaming helps you process streaming data in real time
- Know how to integrate with Azure Synapse Analytics as part of your data architecture
- Learn best practices for workspace administration, security, tools, integration, Databricks Runtime, HA/DR, and clusters in Azure Databricks
Data engineering with Azure Databricks at Microsoft Course details
- Describe Azure Databricks
- Spark architecture fundamentals
- Read and write data in Azure Databricks
- Work with DataFrames in Azure Databricks
- Describe lazy evaluation and other performance features in Azure Databricks
- Work with DataFrames columns in Azure Databricks
- Work with DataFrames advanced methods in Azure Databricks
- Describe platform architecture, security, and data protection in Azure Databricks
- Build and query a Delta Lake
- Process streaming data with Azure Databricks structured streaming
- Describe Azure Databricks Delta Lake architecture
- Create production workloads on Azure Databricks with Azure Data Factory
- Implement CI/CD with Azure DevOps
- Integrate Azure Databricks with Azure Synapse
- Describe Azure Databricks best practices
- Discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files
- Understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark
- Understand the architecture of an Azure Databricks Spark Cluster and Spark Jobs
- Use advanced DataFrame functions and operations to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks
Data engineering with Azure Databricks at Microsoft Curriculum
MODULE:1
Explain Azure Databricks
Create an Azure Databricks workspace and cluster
Understand Azure Databricks Notebooks
Exercise: Work with Notebooks
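A minimal sketch of a first notebook cell for this module's exercise, assuming the notebook is attached to a running cluster (Azure Databricks pre-creates the `spark` session); the DataFrame and column names are illustrative:

```python
# The `spark` session and `dbutils` helpers are pre-created in Azure Databricks notebooks.
print(spark.version)

# Build a tiny DataFrame and inspect it.
df = spark.range(5).withColumnRenamed("id", "n")
df.show()
```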
MODULE:2
Understand the architecture of an Azure Databricks Spark cluster
Understand the architecture of a Spark job
Knowledge check
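To make the cluster and job architecture concrete, the hedged PySpark sketch below shows a wide (shuffle) transformation followed by an action that submits a job whose stages run as parallel tasks on the workers; the data and bucket count are illustrative:

```python
df = spark.range(0, 1_000_000)
print("partitions:", df.rdd.getNumPartitions())   # how the data is split across executor tasks

counts = df.groupBy((df.id % 10).alias("bucket")).count()   # transformation: only builds the plan
counts.collect()                                            # action: submits a Spark job with shuffle stages
```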
MODULE:3
Read data in CSV format
Read data in JSON format
Read data in Parquet format
Read data stored in tables and views
Write data
Exercises: Read and write data
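A minimal PySpark sketch of the read and write patterns this module covers, assuming a notebook attached to a running cluster; all paths and table names are illustrative placeholders:

```python
# Read data in several formats (illustrative DBFS paths; replace with your own mounts or ABFS URIs).
csv_df     = spark.read.option("header", True).option("inferSchema", True).csv("/mnt/raw/people.csv")
json_df    = spark.read.json("/mnt/raw/events.json")
parquet_df = spark.read.parquet("/mnt/raw/sales.parquet")
table_df   = spark.read.table("default.customers")   # tables and views registered in the metastore

# Write data back out as Parquet files and as a managed table.
csv_df.write.mode("overwrite").parquet("/mnt/curated/people")
csv_df.write.mode("overwrite").saveAsTable("default.people")
```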
MODULE:4
Describe a DataFrame
Use common DataFrame methods
Use the display function
Exercise: Distinct articles
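The sketch below illustrates common DataFrame methods and the notebook `display` helper, assuming an Azure Databricks notebook; the `articles` path and column names are illustrative:

```python
from pyspark.sql.functions import col

articles = spark.read.parquet("/mnt/curated/articles")   # illustrative path

# Common DataFrame methods: select, where, orderBy, limit, distinct, count.
recent = (articles
          .select("title", "author", "published")
          .where(col("published") >= "2021-01-01")
          .orderBy(col("published").desc())
          .limit(100))

distinct_titles = articles.select("title").distinct()
print(distinct_titles.count())

display(recent)   # Databricks notebook helper that renders a rich, sortable table
```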
MODULE:5
Describe the difference between eager and lazy execution
Describe the fundamentals of how the Catalyst Optimizer works
Define and identify actions and transformations
Describe performance enhancements enabled by shuffle operations and Tungsten
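A small sketch of lazy evaluation in practice: the transformations only build a logical plan, which the Catalyst Optimizer rewrites before anything executes when an action runs. Paths and column names are illustrative:

```python
df = spark.read.parquet("/mnt/curated/sales")   # illustrative path

# Transformations are lazy: they describe work but do not run it.
high_value = df.filter(df["amount"] > 1000).select("order_id", "amount")

# Inspect the logical and optimized physical plans produced by Catalyst.
high_value.explain(True)

# Actions force execution of the optimized plan.
print(high_value.count())
```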
MODULE:6
Describe the column class
Work with column expressions
Exercise: Washingtons and Marthas
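A hedged sketch of working with the Column class and column expressions; the table name, column names, and filter values are illustrative, not the exercise's actual dataset:

```python
from pyspark.sql.functions import col, lower, concat_ws

people = spark.read.table("default.people")   # illustrative table

# Column expressions: build, combine, rename, and filter on Column objects.
full_name = concat_ws(" ", col("firstName"), col("lastName")).alias("fullName")

selected = (people
            .select(full_name, col("birthYear"))
            .where(lower(col("firstName")).isin("george", "martha"))
            .orderBy(col("birthYear")))
selected.show()
```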
MODULE:7
Perform date and time manipulation
Use aggregate functions
Exercise: Deduplication of data
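A brief sketch combining date and time manipulation, aggregate functions, and deduplication with `dropDuplicates`; the `orders` dataset and its columns are illustrative:

```python
from pyspark.sql.functions import col, to_date, year, avg, count

orders = spark.read.parquet("/mnt/curated/orders")   # illustrative path

# Date/time manipulation plus aggregation per year.
by_year = (orders
           .withColumn("orderDate", to_date(col("orderTimestamp")))
           .groupBy(year(col("orderDate")).alias("orderYear"))
           .agg(count("*").alias("orders"), avg("amount").alias("avgAmount")))
by_year.show()

# Deduplication: keep one row per natural key.
deduped = orders.dropDuplicates(["orderId"])
```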
MODULE:8
Describe the Azure Databricks platform architecture
Perform data protection
Describe Azure Key Vault and Databricks security scopes
Secure access with Azure IAM and authentication
Describe security
Exercise: Access Azure Storage with Key Vault-backed secrets
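A minimal sketch of the exercise's pattern, assuming a Key Vault-backed secret scope has already been created; the scope, secret, storage account, and container names are illustrative placeholders:

```python
# Read a storage account key from an Azure Key Vault-backed secret scope.
storage_key = dbutils.secrets.get(scope="kv-backed-scope", key="storage-account-key")

# Configure the Spark session to authenticate to the storage account.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    storage_key)

# Read data from the secured container over abfss.
df = spark.read.parquet("abfss://data@mystorageacct.dfs.core.windows.net/curated/sales")
df.show()
```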
MODULE:9
Describe the open source Delta Lake
Exercise: Work with basic Delta Lake functionality
Describe how Azure Databricks manages Delta Lake
Exercise: Use the Delta Lake Time Machine and perform optimization
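A short sketch of the basic Delta Lake functionality covered here: create, append, and upsert (MERGE), then time travel and `OPTIMIZE`. Paths, file names, and the `eventId` join key are illustrative:

```python
from delta.tables import DeltaTable

events = spark.read.json("/mnt/raw/events.json")   # illustrative source

# Create and append to a Delta table.
events.write.format("delta").mode("overwrite").save("/mnt/delta/events")
events.write.format("delta").mode("append").save("/mnt/delta/events")

# Upsert (MERGE) incoming records into the table.
updates = spark.read.json("/mnt/raw/event_updates.json")
(DeltaTable.forPath(spark, "/mnt/delta/events").alias("t")
 .merge(updates.alias("u"), "t.eventId = u.eventId")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Time travel to an earlier version, then compact small files.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/delta/events")
spark.sql("OPTIMIZE delta.`/mnt/delta/events`")
```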
MODULE:10
Describe Azure Databricks structured streaming
Perform stream processing using structured streaming
Work with Time Windows
Process data from Event Hubs with structured streaming
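A hedged Structured Streaming sketch with a watermark and time windows. It streams from a Delta path for brevity; the Azure Event Hubs connector exposes the same `readStream` API via `format("eventhubs")` plus connection options. All paths and column names are illustrative:

```python
from pyspark.sql.functions import window, col

# Stream a Delta source for brevity; swap in the Event Hubs connector for the module's final unit.
raw = spark.readStream.format("delta").load("/mnt/delta/telemetry")   # illustrative path

# Count readings per device over 5-minute windows, tolerating 10 minutes of late data.
windowed = (raw
            .withWatermark("enqueuedTime", "10 minutes")
            .groupBy(window(col("enqueuedTime"), "5 minutes"), col("deviceId"))
            .count())

# Write the aggregated stream to a Delta sink with checkpointing for fault tolerance.
query = (windowed.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/mnt/checkpoints/telemetry_counts")
         .start("/mnt/delta/telemetry_counts"))
```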