UDEMY

Spark and Python for Big Data with PySpark 

  • Offered by UDEMY

Spark and Python for Big Data with PySpark at UDEMY
Overview

Duration

11 hours

Total fee

599

Mode of learning

Online

Difficulty level

Intermediate

Credential

Certificate

Highlights

  • Compatible with Mobile and TV
  • Earn a Certificate on successful completion
  • Get Full Lifetime Access
  • Course Instructor: Jose Portilla

Course details

Who should do this course?
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark
What are the course deliverables?
  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark's Gradient Boosted Trees
  • Use Spark's MLlib to create Powerful Machine Learning Models
  • Learn about the DataBricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!
More about this course
Learn the latest Big Data technology, Spark, and learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark. Top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems. Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill. Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market.

This course teaches the basics with a crash course in Python, then continues on to using Spark DataFrames with the latest Spark 2.0 syntax. After that, we go through how to use the MLlib machine learning library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem.

We also cover the latest Spark technologies, like Spark SQL and Spark Streaming, and advanced models like Gradient Boosted Trees. After you complete this course you will feel comfortable putting Spark and PySpark on your resume. This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion.

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Curriculum

Introduction to Course

Introduction

Course Overview

Frequently Asked Questions

What is Spark? Why Python?

Setting up Python with Spark

Set-up Overview

Note on Installation Sections

Local VirtualBox Set-up

Local Installation VirtualBox Part 1

Local Installation VirtualBox Part 2

Setting up PySpark

AWS EC2 PySpark Set-up

AWS EC2 Set-up Guide

Creating the EC2 Instance

SSH with Mac or Linux

Installations on EC2

Databricks Setup

Databricks Setup

AWS EMR Cluster Setup

AWS EMR Setup

Python Crash Course

Introduction to Python Crash Course

Jupyter Notebook Overview

Python Crash Course Part One

Python Crash Course Part Two

Python Crash Course Part Three

Python Crash Course Exercises

Python Crash Course Exercise Solutions

Spark DataFrame Basics

Introduction to Spark DataFrames

Spark DataFrame Basics

Spark DataFrame Basics Part Two

Spark DataFrame Basic Operations

Groupby and Aggregate Operations

Missing Data

Dates and Timestamps

Spark DataFrame Project Exercise

DataFrame Project Exercise

DataFrame Project Exercise Solutions

Introduction to Machine Learning with MLlib

Introduction to Machine Learning and ISLR

Machine Learning with Spark and Python with MLlib

Linear Regression

Linear Regression Theory and Reading

Linear Regression Documentation Example

Regression Evaluation

Linear Regression Example Code Along

Linear Regression Consulting Project

Linear Regression Consulting Project Solutions

Logistic Regression

Logistic Regression Theory and Reading

Logistic Regression Example Code Along

Logistic Regression Code Along

Logistic Regression Consulting Project

Logistic Regression Consulting Project Solutions

Decision Trees and Random Forests

Tree Methods Theory and Reading

Tree Methods Documentation Examples

Decision Trees and Random Forest Code Along Examples

Random Forest - Classification Consulting Project

Random Forest Classification Consulting Project Solutions

K-means Clustering

K-means Clustering Theory and Reading

KMeans Clustering Documentation Example

Clustering Example Code Along

Clustering Consulting Project

Clustering Consulting Project Solutions

Collaborative Filtering for Recommender Systems

Introduction to Recommender Systems

Recommender System - Code Along Project

Natural Language Processing

Introduction to Natural Language Processing

NLP Tools Part One

NLP Tools Part Two

Natural Language Processing Code Along Project

Spark Streaming with Python

Introduction to Streaming with Spark!

Spark Streaming Documentation Example

Spark Streaming Twitter Project - Part One

Spark Streaming Twitter Project - Part Two

Spark Streaming Twitter Project - Part Three

Bonus

Bonus Lecture:


Students Ratings & Reviews

4/5
3 Verified Ratings
Prabhat Kumar
Rating: 4/5
Learning Experience: The training material clearly focuses on the PySpark API; it shows how to create DataFrames.
Faculty: The course covers all the basic operations needed to prepare data for machine learning algorithms. The instructor was Jose Portilla. This course is sufficient to get started with PySpark; it also includes MLlib examples for machine learning.
Course Support: No
Reviewed on 22 Jul 2022
Suryavo Pal
Rating: 4/5
Other: This course is very helpful for beginners who want to get into big data. PySpark is a very helpful tool for working on clusters and distributed systems.
Reviewed on 30 Oct 2021
