Advanced Data Engineering
- Offered by Coursera
Advanced Data Engineering at Coursera Overview
Duration | 23 hours
Start from | Start Now
Total fee | Free
Mode of learning | Online
Credential | Certificate
Advanced Data Engineering at Coursera Highlights
- Earn a certificate from Coursera
- Add to your LinkedIn profile
- 14 quizzes
Advanced Data Engineering at Coursera Course details
- Create and manage data pipelines and their lifecycle
- Connect and work with message queues to manage data processing
- Use vector, graph, and key/value databases for data storage at scale
- In this advanced course, you will gain practical expertise in scaling data engineering systems using cutting-edge tools and techniques
- This course is designed for data scientists, data engineers, and anyone with a foundational understanding of data handling who wants to advance their skills to handle larger, more complex datasets efficiently
- Throughout the course, you'll learn to apply technologies such as Celery with RabbitMQ for scalable data consumption, Apache Airflow for workflow management, and vector and graph databases for data management at scale
- The course culminates in hands-on projects that offer real-world experience, where you'll put your acquired skills to the test by solving data engineering challenges
- You will learn not only to create scalable data systems but also to analyze their performance and make the adjustments needed for optimal results
- This invaluable experience in advanced data engineering techniques will prepare you for the demanding tasks of handling massive datasets, streamlining complex workflows, and optimizing data operations for businesses of any scale
Advanced Data Engineering at Coursera Curriculum
Queues and Databases-RabbitMQ and MySQL
Meet your instructor: Alfredo Deza
About this course
Introduction
Overview of Queues
What is Celery?
Use cases for RabbitMQ
Overview of a Flask and Celery application
Summary
Introduction
Configuring Celery with Flask
Connecting Celery with RabbitMQ
Defining a Celery task in Flask
Fire and forget task in Flask
Retrieve values from asynchronous tasks
Summary
MySQL Overview
MySQL from Terminal
Archive and Drop Database
Import external database Sakila
Modify database Sakila
Bash pipelines with MySQL
MySQL to Python Standard Library Web Server
Connect with your instructor
Meet your instructor: Noah Gift
Course structure and discussion etiquette
Key Terms
Introduction to Celery
Using RabbitMQ with Docker
External lab: Start RabbitMQ in a development environment
Key Terms
Build a web app by using Python and Flask
Background tasks with Celery
External lab: Add a new Celery task for RabbitMQ
Key Terms
Getting Started with MySQL
Lesson Reflection
Queues and Databases - Final week quiz
Introduction to RabbitMQ and Flask
RabbitMQ with Celery and Flask
Quiz-MySQL for Data Engineering
Meet and greet (optional)
Linux Hacking with MySQL
Optimizing Workflow Management at Scale with Apache Airflow
Introduction
What is Apache Airflow?
Installing Apache Airflow from PyPI
Using Apache Airflow with Docker
Exploring the Airflow UI
Introduction
Exploring directed acyclic graphs (DAG)
Creating a DAG
Running a backfill
Testing and validation
Summary
Introduction
Identifying a task to build a DAG
Retrieving remote data
Cleaning and normalizing data
Inspecting the UI for results
Summary
Key Terms
What is Apache Airflow
Exploring the Airflow User Interface
External lab: Install Apache Airflow
Lesson Reflection
Key Terms
External lab: Create a DAG
Architecture overview
Lesson Reflection
Key Terms
External Lab: Build a data pipeline for census data
Build Data Pipelines with Apache Airflow
Lesson Reflection
Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow
Quiz-Installing Apache Airflow
Quiz-Apache Airflow Fundamentals
Quiz-Creating a pipeline
Achieving Scalability with Vector, Graph, and Key/Value Databases
Picking the proper database
What are vector databases and how do they work
Implementing Semantic search
Quickstart Qdrant
Qdrant Rust Client
Vector Database Architectures
Hands-on lab: Enhance Semantic Search
Graph data models and database concepts
Introduction to Amazon Neptune
Graph algorithms: UFC graph centrality in Rust
Kosaraju Community Detection in Graphs
Shortest Path with Graphs
Key Components of Rust CLI Tool
Lab Walkthrough: Building a Rust Graph CLI Tool
Key Terms
What is a Vector Database?
External Lab: Run Quickstart of Qdrant
External Lab: Extend Semantic Search
Jaccard index
Lesson Reflection
Key Terms
Rust CLI with Clap
External Lab: Rust Graph CLI Tool
Amazon Neptune
Lesson Reflection
Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases
Quiz-Introduction to Vector Databases
Quiz-Introduction to Graph Databases
Social Media Recommender
Real-world Advanced Data Engineering Projects
Learn AWS CloudShell for Dynamo Development
Learn AWS CodeCatalyst for Dynamo Development
Leveraging AWS CodeWhisperer for Dynamo Development
Create a Table with CLI
Populate a Table With Batching Records
Query a Table with Records
Project Walkthrough
Introduction
Overview of pipeline requirements
Using SQLAlchemy with Pandas
Persisting data in a task
Reviewing the results
Summary
Key Terms
Amazon CodeCatalyst
Lesson Reflection
External Lab: Extended DynamoDB
Key Terms
Quick start for SQLAlchemy
Explore and analyze data with Python
Lesson Reflection
Recommended Next Steps
Final Quiz-Advanced Data Engineering
Quiz-Building a solution with DynamoDB with the AWS CLI
Quiz-Persisting data through a multi-task DAG with Pandas
Jupyter Sandbox
VS Code Sandbox