
Advanced Data Engineering 

  • Offered by Coursera

Advanced Data Engineering at Coursera
Overview

Equip participants with the skills to manage the increasing volume, velocity, and variety of data effectively

Duration

23 hours

Start from

Start Now

Total fee

Free

Mode of learning

Online

Official Website

Explore Free Course

Credential

Certificate

Highlights

  • Earn a certificate from Coursera
  • Add to your LinkedIn profile
  • 14 quizzes
Course details

What are the course deliverables?
  • Create and manage data pipelines and their lifecycle
  • Connect and work with message queues to manage data processing
  • Use vector, graph, and key/value databases for data storage at scale
More about this course
  • In this advanced course, you will gain practical expertise in scaling data engineering systems using cutting-edge tools and techniques
  • This course is designed for data scientists, data engineers, and anyone with a foundational understanding of data handling who wants to advance their skills to handle larger, more complex datasets efficiently
  • Throughout the course, you'll learn to apply technologies such as Celery with RabbitMQ for scalable data consumption, Apache Airflow for workflow management, and vector and graph databases for data management at scale
  • The course culminates in hands-on projects that offer real-world experience, where you'll put your skills to the test by solving data engineering challenges
  • You will not only learn to create scalable data systems but also to analyze their performance and make necessary adjustments for optimum results
  • This invaluable experience in advanced data engineering techniques will prepare you for the demanding tasks of handling massive datasets, streamlining complex workflows, and optimizing data operations for businesses of any scale
Curriculum

Queues and Databases-RabbitMQ and MySQL

Meet your instructor: Alfredo Deza

About this course

Introduction

Overview of Queues

What is Celery?

Use cases for RabbitMQ

Overview of a Flask and Celery application

Summary

Introduction

Configuring Celery with Flask

Connecting Celery with RabbitMQ

Defining a Celery task in Flask

Fire and forget task in Flask

Retrieve values from asynchronous tasks

Summary

MySQL Overview

MySQL from Terminal

Archive and Drop Database

Import external database Sakila

Modify database Sakila

Bash pipelines with MySQL

MySQL to Python Standard Library Web Server

Connect with your instructor

Meet your instructor: Noah Gift

Course structure and discussion etiquette

Key Terms

Introduction to Celery

Using RabbitMQ with Docker

External lab: Start RabbitMQ in a development environment

Key Terms

Build a web app by using Python and Flask

Background tasks with Celery

External lab: Add a new Celery task for RabbitMQ

Key Terms

Getting Started with MySQL

Lesson Reflection

Queues and Databases - Final week quiz

Introduction to RabbitMQ and Flask

RabbitMQ with Celery and Flask

Quiz-MySQL for Data Engineering

Meet and greet (optional)

Linux Hacking with MySQL

Optimizing Workflow Management at Scale with Apache Airflow

Introduction

What is Apache Airflow?

Installing Apache Airflow from PyPI

Using Apache Airflow with Docker

Exploring the Airflow UI

Introduction

Exploring directed acyclic graphs (DAG)

Creating a DAG

Running a backfill

Testing and validation

Summary

Introduction

Identifying a task to build a DAG

Retrieving remote data

Cleaning and normalizing data

Inspecting the UI for results

Summary

Key Terms

What is Apache Airflow

Exploring the Airflow User Interface

External lab: Install Apache Airflow

Lesson Reflection

Key Terms

External lab: Create a DAG

Architecture overview

Lesson Reflection

Key Terms

External Lab: Build a data pipeline for census data

Build Data Pipelines with Apache Airflow

Lesson Reflection

Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow

Quiz-Installing Apache Airflow

Quiz-Apache Airflow Fundamentals

Quiz-Creating a pipeline

Achieving Scalability with Vector, Graph, and Key/Value Databases

Picking the proper database

What vector databases are and how they work

Implementing Semantic search

Quickstart Qdrant

Qdrant Rust Client

Vector Database Architectures

Hands-on lab: Enhance Semantic Search

Graph data models and database concepts

Introduction to Amazon Neptune

Graph algorithms: UFC graph centrality in Rust

Kosaraju Community Detection in Graphs

Shortest Path with Graphs

Key Components of Rust CLI Tool

Lab Walkthrough: Building a Rust Graph CLI Tool

Key Terms

What is a Vector Database?

External Lab: Run Quickstart of Qdrant

External Lab: Extend Semantic Search

Jaccard index

Lesson Reflection

Key Terms

Rust CLI with Clap

External Lab: Rust Graph CLI Tool

Amazon Neptune

Lesson Reflection

Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases

Quiz-Introduction to Vector Databases

Quiz-Introduction to Graph Databases

Social Media Recommender

Real-world Advanced Data Engineering Projects

Learn AWS CloudShell for Dynamo Development

Learn AWS CodeCatalyst for Dynamo Development

Leveraging AWS CodeWhisperer for Dynamo Development

Create a Table with CLI

Populate a Table With Batching Records

Query a Table with Records

Project Walkthrough

Introduction

Overview of pipeline requirements

Using SQLAlchemy with Pandas

Persisting data in a task

Reviewing the results

Summary

Key Terms

Amazon CodeCatalyst

Lesson Reflection

External Lab: Extended DynamoDB

Key Terms

Quick start for SQLAlchemy

Explore and analyze data with Python

Lesson Reflection

Recommended Next Steps

Final Quiz-Advanced Data Engineering

Quiz-Building a solution with DynamoDB with the AWS CLI

Quiz-Persisting data through a multi-task DAG with Pandas

Jupyter Sandbox

VS Code Sandbox

Admission Process

    Important Dates

    May 25, 2024
    Course Commencement Date
