
Spark, Hadoop, and Snowflake for Data Engineering 

  • Offered by Coursera

Overview

Duration: 29 hours
Start from: Start Now
Total fee: Free
Mode of learning: Online
Difficulty level: Advanced
Official Website: Explore Free Course
Credential: Certificate

Highlights

  • Earn a certificate from Duke University
  • Add to your LinkedIn profile
  • 21 quizzes

Course details

What are the course deliverables?
What you'll learn
  • Create scalable data pipelines (Hadoop, Spark, Snowflake, Databricks) for efficient data handling.
  • Optimize data engineering with clustering and scaling to boost performance and resource use.
  • Build ML solutions (PySpark, MLflow) on Databricks for seamless model development and deployment.
  • Implement DataOps and DevOps practices for continuous integration and deployment (CI/CD) of data-driven applications, including automating processes.
More about this course
  • This is primarily aimed at first- and second-year undergraduates interested in engineering or science, along with high school students and professionals with an interest in programming. Gain the skills for building efficient and scalable data pipelines. Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize and manage them. Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks, while honing your Python data science skills with PySpark. Finally, discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks.
  • This course is designed for learners who want to pursue or advance a career in data science or data engineering, and for software developers or engineers who want to grow their data management skill set. In addition to the technologies, you will learn methodologies that help you hone your project management and workflow skills for data engineering, including Kaizen, DevOps, and DataOps methodologies and best practices.
  • With quizzes to test your knowledge throughout, this comprehensive course will help guide your learning journey to become a proficient data engineer, ready to tackle the challenges of today's data-driven world.

Curriculum

Overview and Introduction to PySpark

Meet your Co-Instructor: Kennedy Behrman

Meet your Co-Instructor: Noah Gift

Overview of Big Data Platforms

Getting Started with Hadoop

Getting Started with Spark

Introduction to Resilient Distributed Datasets (RDD)

Resilient Distributed Datasets (RDD) Demo

Introduction to Spark SQL

PySpark Dataframe Demo: Part 1

PySpark Dataframe Demo: Part 2

Welcome to Data Engineering Platforms with Python!

What is Apache Hadoop?

What is Apache Spark?

Use Apache Spark in Azure Databricks (optional)

Choosing between Hadoop and Spark

What are RDDs?

Getting Started: Creating RDD's with PySpark

Spark SQL, Dataframes and Datasets

PySpark and Spark SQL

Big Data Platforms

Apache Hadoop Concepts

Apache Spark Concepts

RDD Concepts

Spark SQL Concepts

PySpark Dataframe Concepts

PySpark

Meet and Greet (optional)

Let Us Know if Something's Not Working

Practice: Creating RDD's with PySpark

Practice: Reading Data into Dataframes

Snowflake

What is Snowflake?

Snowflake Layers

Snowflake Web UI

Navigating Snowflake

Creating a Table in Snowflake

Snowflake Warehouses

Writing to Snowflake

Reading from Snowflake

Accessing Snowflake

Detailed View Inside Snowflake

Snowsight: The Snowflake Web Interface

Working with Warehouses

Python Connector Documentation

Snowflake Architecture

Snowflake Layers

Navigating Snowflake

Creating a Table

Writing to Snowflake

Snowflake

Azure Databricks and MLFLow

Accessing Databricks

Spark Notebooks with Databricks

Using Data with Databricks

Working with Workspaces in Databricks

Advanced Capabilities of Databricks

PySpark Introduction on Databricks

Exploring Databricks Azure Features

Using the DBFS to AutoML Workflow

Load, Register and Deploy ML Models

Databricks Model Registry

Model Serving on Databricks

What is MLOps?

Exploring Open-Source MLFlow Frameworks

Running MLFlow with Databricks

End to End Databricks MLFlow

Databricks Autologging with MLFlow

What is Azure Databricks?

Introduction to Databricks Machine Learning

What is the Databricks File System (DBFS)?

Serverless Compute with Databricks

MLOps Workflow on Azure Databricks

Run MLFlow Projects on Azure Databricks

Databricks Autologging

PySpark SQL

PySpark DataFrames

MLFlow with Databricks

DataBricks

ETL-Part-1: Keyword Extractor Tool to HashTag Tool

DataOps and Operations Methodologies

Kaizen Methodology for Data

Introducing GitHub CodeSpaces

Compiling Python in GitHub Codespaces

Walking through Sagemaker Studio Lab

Pytest Master Class (Optional)

What is DevOps?

DevOps Key Concepts

Continuous Integration Overview

Build an NLP in Cloud9 with Python

Build a Continuously Deployed Containerized FastAPI Microservice

Hugo Continuous Deploy on AWS

Container Based Continuous Delivery

What is DataOps?

DataOps and MLOps with Snowflake

Building Cloud Pipelines with Step Functions and Lambda

What is a Data Lake?

Data Warehouse vs. Feature Store

Big Data Challenges

Types of Big Data Processing

Real-World Data Engineering Pipeline

Data Feedback Loop

GitHub Codespaces Overview

Getting Started with Amazon SageMaker Studio Lab

Teaching MLOps at Scale with GitHub (Optional)

Getting Started with DevOps and Cloud Computing

Benefits of Serverless ETL Technologies

Kaizen Methodology

DevOps

DataOps

DataOps and Operations Methodologies

ETL-Part2: SQLite ETL Destination

Admission Process

    Important Dates

    Course Commencement Date: May 25, 2024
