Coursera

University of California, Davis - Distributed Computing with Spark SQL 

  • Offered by Coursera

Distributed Computing with Spark SQL at Coursera
Overview

Duration: 13 hours
Start from: Start now
Total fee: Free
Mode of learning: Online
Difficulty level: Intermediate
Official website: Explore Free Course
Credential: Certificate

Highlights

  • Shareable certificate: earn a certificate upon completion.
  • 100% online: start instantly and learn on your own schedule.
  • Course 3 of 4 in the Learn SQL Basics for Data Science Specialization.
  • Flexible deadlines: reset deadlines in accordance with your schedule.
  • Intermediate level.
  • Approximately 13 hours to complete.
  • Language: English. Subtitles: Arabic, French, Portuguese (European), Italian, Vietnamese, German, Russian, English, Spanish.

Course details

More about this course
  • This course is for students who have SQL experience and now want to take the next step toward familiarity with distributed computing using Spark. Students will learn when to use Spark and how Spark, as an engine, uniquely combines data and AI technologies at scale. The four modules build on one another; by the end of the course, students will understand Spark architecture, Spark DataFrames, how to optimize reading and writing data, and how to build a machine learning model. The first module introduces Spark, including how Spark performs distributed computing and what Spark DataFrames are. Module 2 covers Spark's core concepts, such as storage vs. compute, caching, partitions, and the Spark UI. The third module covers engineering data pipelines: connecting to databases, schemas and types, file formats, and writing good data. The final module applies Spark to machine learning through a business use case, with a short introduction to what machine learning is, building and applying models, and a course conclusion. By learning when to use Spark, whether to scale out because the model or data is too large to process on a single machine or simply to get results faster, students will hone their SQL skills and become more adept data scientists.

Curriculum

Introduction to Spark

Course Introduction

Why Distributed Computing?

Spark DataFrames

The Databricks Environment

SQL in Notebooks

Import Data

A Note From UC Davis

Readings and Resources

Assignment #1 - Queries in Spark SQL

Assignment #1 Quiz - Queries in Spark SQL

Module 1 Quiz

Spark Core Concepts

Introduction to Spark Core Concepts

Spark Terminology

Caching

Shuffle Partitions

Spark UI

Broadcast Joins

Readings

Assignment #2 - Spark Internals

Assignment #2 Quiz - Spark Internals

Module 2 Quiz

Engineering Data Pipelines

Engineering Data Pipelines

Spark as a Connector

Accessing Data

File Formats

Schemas and Types

Writing Data

Managed and Unmanaged Tables

Readings

Assignment #3 - Engineering Data Pipelines

Assignment #3 Quiz - Engineering Data Pipelines

Module 3 Quiz

Machine Learning Applications of Spark

Machine Learning Applications of Spark

Applications of Machine Learning

Machine Learning Fundamentals

Linear Regression

Training Linear Regression Model

Applying Machine Learning with UDFs

Course Summary

Readings

Assignment #4 - Logistic Regression Classifier

Assignment #4 Quiz - Logistic Regression Classifier

Module 4 Quiz

Admission Process

Important Dates

Course commencement date: May 25, 2024
