Coursera

Data Analysis Using Pyspark 

  • Offered by Coursera

Data Analysis Using Pyspark at Coursera
Overview

Gain a comprehensive overview of data analysis principles and concepts

Duration

2 hours

Mode of learning

Online

Difficulty level

Intermediate

Credential

Certificate

Highlights

  • Earn a certificate upon completion
  • Receive training from industry experts
  • Gain hands-on experience solving real-world job tasks
  • Build confidence using the latest tools and technologies

Course details

What are the course deliverables?
  • Learn how to set up Google Colab for distributed data processing
  • Learn to apply different queries to your dataset to extract useful information
  • Learn how to visualize this information using matplotlib
More about this course
  • Distributed data processing is an important topic that every data analyst should be familiar with
  • As a data analyst, you should be able to apply different queries to your dataset to extract useful information from it
  • This is where distributed data processing and Spark come in handy
  • In this project, we work with the pyspark module in Python, using the Google Colab environment, to apply queries to a dataset from Last.fm, an online music service where users can listen to songs
  • We also learn how to visualize our query results using matplotlib
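
A minimal sketch of the kind of query this project describes, assuming PySpark is installed (for example via pip in a Colab cell) and a Java runtime is available. The column name user_id and the helper names build_spark_session and top_users_by_plays are hypothetical; the actual Last.fm schema used in the course is not listed here.

```python
def build_spark_session(app_name="lastfm-analysis"):
    """Create (or reuse) a local SparkSession, e.g. inside a Colab notebook."""
    # Imported lazily so this file can be loaded even where PySpark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")  # run locally, using all available cores
        .appName(app_name)
        .getOrCreate()
    )


def top_users_by_plays(listenings, n=10):
    """Return the n users with the most rows in the `listenings` DataFrame.

    Assumes a hypothetical `user_id` column; adjust to the real schema.
    """
    from pyspark.sql import functions as F

    return (
        listenings.groupBy("user_id")          # one group per user
        .agg(F.count("*").alias("plays"))      # number of listening records
        .orderBy(F.desc("plays"))              # most active users first
        .limit(n)
    )
```

In a notebook, one might call build_spark_session(), load or build a DataFrame that has a user_id column, and then call top_users_by_plays(df, n=5).show() to print the result.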

Curriculum

Preparing Google Colab for distributed data processing

Mounting our Google Drive into the Google Colab environment

Importing the first file of our dataset (1 GB) into a PySpark DataFrame

Applying some queries to extract useful information from our data

Importing the second file of our dataset (3 MB) into a PySpark DataFrame

Joining the two DataFrames and preparing them for more advanced queries

Visualizing our query results using matplotlib
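
The curriculum steps above could look roughly like the following sketch. The join key artist, the genre column, and all helper names are assumptions for illustration, not the course's actual code. The google.colab module is only importable inside a Colab runtime, and the PySpark and matplotlib imports are deferred so the helpers can be defined anywhere.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive so dataset files stored there become readable."""
    from google.colab import drive  # available only inside Google Colab

    drive.mount(mount_point)
    return mount_point


def load_csv(spark, path):
    """Read a CSV file with a header row into a PySpark DataFrame."""
    return spark.read.csv(path, header=True, inferSchema=True)


def join_and_count(listenings, genres, key="artist", group_col="genre"):
    """Join the two DataFrames on `key`, then count rows per `group_col`.

    `key` and `group_col` are hypothetical column names.
    """
    from pyspark.sql import functions as F

    joined = listenings.join(genres, on=key, how="inner")
    return (
        joined.groupBy(group_col)
        .agg(F.count("*").alias("plays"))
        .orderBy(F.desc("plays"))
    )


def rows_to_series(rows, label_key, value_key):
    """Split collected rows into parallel label and value lists for plotting.

    Works with anything indexable by column name, such as the Row objects
    returned by DataFrame.collect().
    """
    labels = [row[label_key] for row in rows]
    values = [row[value_key] for row in rows]
    return labels, values


def plot_bar(labels, values, title):
    """Draw a simple bar chart of a small query result and return the figure."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("plays")
    fig.tight_layout()
    return fig
```

A typical order of calls would be mount_drive(), then load_csv() once per file, then join_and_count(), and finally rows_to_series(result.limit(10).collect(), "genre", "plays") to feed plot_bar(). Limiting the result before collecting keeps only a small amount of data on the driver, which matters when the source file is around 1 GB.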

Faculty details

Ahmad Varasteh
Data Mining and Machine Learning Instructor
