Spark, Hadoop, and Snowflake for Data Engineering

Offered by Coursera

Overview

Duration	29 hours
Start from	Start Now
Total fee	Free
Mode of learning	Online
Difficulty level	Advanced
Credential	Certificate

Highlights

Earn a certificate from Duke University
Add to your LinkedIn profile
21 quizzes

Course details

Skills you will learn

Spark, Python, Cloud Computing, Microsoft Azure, Hadoop, Data Science, Data Warehousing, Big Data

What are the course deliverables?

What you'll learn
Create scalable data pipelines (Hadoop, Spark, Snowflake, Databricks) for efficient data handling.
Optimize data engineering with clustering and scaling to boost performance and resource use.
Build ML solutions (PySpark, MLflow) on Databricks for seamless model development and deployment.
Implement DataOps and DevOps practices for continuous integration and deployment (CI/CD) of data-driven applications, including automating processes.

More about this course

This is primarily aimed at first- and second-year undergraduates interested in engineering or science, along with high school students and professionals with an interest in programming.
Gain the skills for building efficient and scalable data pipelines. Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize and manage them. Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks, while honing your Python data science skills with PySpark. Finally, discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks.
This course is designed for learners who want to pursue or advance their career in data science or data engineering, or for software developers or engineers who want to grow their data management skill set. In addition to the technologies, you will learn methodologies that help you hone your project management and workflow skills for data engineering, including Kaizen, DevOps, and DataOps methodologies and best practices.
With quizzes to test your knowledge throughout, this comprehensive course will help guide your learning journey to become a proficient data engineer, ready to tackle the challenges of today's data-driven world.

Curriculum

Overview and Introduction to PySpark

Meet your Co-Instructor: Kennedy Behrman

Meet your Co-Instructor: Noah Gift

Overview of Big Data Platforms

Getting Started with Hadoop

Getting Started with Spark

Introduction to Resilient Distributed Datasets (RDD)

Resilient Distributed Datasets (RDD) Demo

Introduction to Spark SQL

PySpark DataFrame Demo: Part 1

PySpark DataFrame Demo: Part 2

Welcome to Data Engineering Platforms with Python!

What is Apache Hadoop?

What is Apache Spark?

Use Apache Spark in Azure Databricks (optional)

Choosing between Hadoop and Spark

What are RDDs?

Getting Started: Creating RDDs with PySpark

Spark SQL, DataFrames and Datasets

PySpark and Spark SQL

Big Data Platforms

Apache Hadoop Concepts

Apache Spark Concepts

RDD Concepts

Spark SQL Concepts

PySpark DataFrame Concepts

PySpark

Meet and Greet (optional)

Let Us Know if Something's Not Working

Practice: Creating RDDs with PySpark

Practice: Reading Data into DataFrames

Snowflake

What is Snowflake?

Snowflake Layers

Snowflake Web UI

Navigating Snowflake

Creating a Table in Snowflake

Snowflake Warehouses

Writing to Snowflake

Reading from Snowflake

Accessing Snowflake

Detailed View Inside Snowflake

Snowsight: The Snowflake Web Interface

Working with Warehouses

Python Connector Documentation

Snowflake Architecture

Snowflake Layers

Navigating Snowflake

Creating a Table

Writing to Snowflake

Snowflake

Azure Databricks and MLflow

Accessing Databricks

Spark Notebooks with Databricks

Using Data with Databricks

Working with Workspaces in Databricks

Advanced Capabilities of Databricks

PySpark Introduction on Databricks

Exploring Databricks Azure Features

Using the DBFS to AutoML Workflow

Load, Register and Deploy ML Models

Databricks Model Registry

Model Serving on Databricks

What is MLOps?

Exploring Open-Source MLflow Frameworks

Running MLflow with Databricks

End to End Databricks MLflow

Databricks Autologging with MLflow

What is Azure Databricks?

Introduction to Databricks Machine Learning

What is the Databricks File System (DBFS)?

Serverless Compute with Databricks

MLOps Workflow on Azure Databricks

Run MLflow Projects on Azure Databricks

Databricks Autologging

PySpark SQL

PySpark DataFrames

MLflow with Databricks

Databricks

ETL-Part-1: Keyword Extractor Tool to HashTag Tool

DataOps and Operations Methodologies

Kaizen Methodology for Data

Introducing GitHub Codespaces

Compiling Python in GitHub Codespaces

Walking through SageMaker Studio Lab

Pytest Master Class (Optional)

What is DevOps?

DevOps Key Concepts

Continuous Integration Overview

Build an NLP in Cloud9 with Python

Build a Continuously Deployed Containerized FastAPI Microservice

Hugo Continuous Deploy on AWS

Container Based Continuous Delivery

What is DataOps?

DataOps and MLOps with Snowflake

Building Cloud Pipelines with Step Functions and Lambda

What is a Data Lake?

Data Warehouse vs. Feature Store

Big Data Challenges

Types of Big Data Processing

Real-World Data Engineering Pipeline

Data Feedback Loop

GitHub Codespaces Overview

Getting Started with Amazon SageMaker Studio Lab

Teaching MLOps at Scale with GitHub (Optional)

Getting Started with DevOps and Cloud Computing

Benefits of Serverless ETL Technologies

Kaizen Methodology

DevOps

DataOps

DataOps and Operations Methodologies

ETL-Part2: SQLite ETL Destination

Admission Process

Important Dates

Course Commencement Date	May 25, 2024

Other courses offered by Coursera

Databases and SQL for Data Science with Python

Offered by	IBM - Institute of Business Management
Credential	Certificate
Total fee	– / –
Duration	3 months
Difficulty level	Beginner

Databases and SQL for Data Science with Python

Offered by	IBM - Institute of Business Management
Credential	Certificate
Total fee	– / –
Duration	20 hours
Difficulty level	Beginner
Skills	Python, RDBMS

Learn SQL Basics for Data Science Specialization

Offered by	University of California, Davis
Credential	Certificate
Total fee	– / –
Duration	2 months
Difficulty level	Beginner
Skills	Data analysis, MySQL, Apache

Machine Learning for Marketing Specialization

Offered by	Coursera
Credential	Certificate
Total fee	– / –
Duration	3 months
Difficulty level	Beginner
Skills	Data analysis


