Data Analysis Using Pyspark
5.0/5
- Offered by Coursera
Data Analysis Using Pyspark at Coursera Overview
Gain a comprehensive overview of data analysis principles and concepts
Duration: 2 hours
Mode of learning: Online
Difficulty level: Intermediate
Credential: Certificate
Data Analysis Using Pyspark at Coursera Highlights
- Earn a certificate upon completion
- Receive training from industry experts
- Gain hands-on experience solving real-world job tasks
- Build confidence using the latest tools and technologies
Data Analysis Using Pyspark at Coursera Course details
Skills you will learn
What are the course deliverables?
- Learn how to set up Google Colab for distributed data processing
- Learn to apply different queries to your dataset to extract useful information
- Learn how to visualize this information using matplotlib
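The Colab setup named in the first deliverable can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the course's own notebook: it assumes PySpark has already been installed in the notebook (for example with `!pip install pyspark`), that a single-machine `local[*]` session is good enough on a Colab VM, and the helper names `mount_drive` and `create_spark` as well as the app name are hypothetical.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive so files stored there are visible to the notebook.

    Works only inside a Google Colab runtime, where `google.colab` exists.
    """
    from google.colab import drive  # Colab-only module, imported lazily

    drive.mount(mount_point)
    return mount_point


def create_spark(app_name="lastfm-analysis"):
    """Start (or reuse) a local Spark session that uses all available cores.

    Assumption: `local[*]` runs Spark on the single Colab VM rather than
    on a real multi-node cluster.
    """
    from pyspark.sql import SparkSession  # requires `pip install pyspark`

    return (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .getOrCreate()
    )
```

In a Colab cell, calling `mount_drive()` followed by `spark = create_spark()` would give a session that can read files under the mounted Drive folder.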
More about this course
- Distributed data processing is an important topic that every data analyst should be familiar with
- As a data analyst, you should be able to apply different queries to your dataset to extract useful information from it
- This is where distributed data processing and Spark come in handy
- In this project, we work with the PySpark module in Python and use the Google Colab environment to query a dataset from Last.fm, an online music service where users can listen to different songs
- We also learn how to visualize our query results using matplotlib
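To make "applying queries to the dataset" concrete, here is a hedged sketch of two typical PySpark queries. The column names `user_id`, `artist`, and `track` are assumptions about what a listening log might contain, and the function names are hypothetical; the actual schema of the course dataset is not given on this page.

```python
def top_artists(listenings, n=10):
    """Return the n most-played artists as a Spark DataFrame.

    `listenings` is assumed to be a DataFrame with one row per play
    and an `artist` column (hypothetical schema).
    """
    from pyspark.sql import functions as F  # lazy import; needs pyspark

    return (
        listenings.groupBy("artist")
        .agg(F.count("*").alias("plays"))   # number of rows per artist
        .orderBy(F.desc("plays"))           # most played first
        .limit(n)
    )


def tracks_by_user(listenings, user_id):
    """Return the distinct (artist, track) pairs one user listened to.

    Assumes `user_id`, `artist`, and `track` columns (hypothetical schema).
    """
    from pyspark.sql import functions as F

    return (
        listenings.filter(F.col("user_id") == user_id)
        .select("artist", "track")
        .distinct()
    )
```

Both functions return lazy DataFrames; calling `.show()` or `.collect()` on the result is what triggers the actual computation.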
Data Analysis Using Pyspark at Coursera Curriculum
Prepare Google Colab for distributed data processing
Mount Google Drive into the Google Colab environment
Import the first file of the dataset (1 GB) into a PySpark DataFrame
Apply queries to extract useful information from the data
Import the second file of the dataset (3 MB) into a PySpark DataFrame
Join the two DataFrames and prepare the result for more advanced queries
Visualize the query results using matplotlib
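The import, join, and plotting steps in this curriculum could look something like the sketch below. It assumes the two files are CSVs with header rows, that they share an `artist` column, and that the second file carries a `genre` column; all of these, along with every function name, are illustrative assumptions rather than details confirmed by the course page.

```python
def load_csv(spark, path):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return spark.read.csv(path, header=True, inferSchema=True)


def plays_per_genre(listenings, genres, n=10):
    """Join plays with genres on `artist` and count plays per genre.

    Assumed (hypothetical) schema: `listenings` has an `artist` column,
    `genres` has `artist` and `genre` columns.
    """
    from pyspark.sql import functions as F  # lazy import; needs pyspark

    return (
        listenings.join(genres, on="artist", how="inner")
        .groupBy("genre")
        .agg(F.count("*").alias("plays"))
        .orderBy(F.desc("plays"))
        .limit(n)
    )


def rows_to_xy(rows, x_key, y_key):
    """Split collected rows (Spark Rows or dicts) into x and y lists."""
    xs = [row[x_key] for row in rows]
    ys = [row[y_key] for row in rows]
    return xs, ys


def plot_bar(rows, x_key, y_key, title):
    """Draw a bar chart of collected query results with matplotlib."""
    import matplotlib.pyplot as plt  # lazy import; needs matplotlib

    xs, ys = rows_to_xy(rows, x_key, y_key)
    fig, ax = plt.subplots()
    ax.bar(xs, ys)
    ax.set_xlabel(x_key)
    ax.set_ylabel(y_key)
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=45)  # keep long labels readable
    return fig
```

A possible usage, given a session `spark` and two loaded DataFrames, would be `plot_bar(plays_per_genre(listenings, genres).collect(), "genre", "plays", "Top genres")`. Only the small aggregated result is collected to the driver for plotting, not the full 1 GB file.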
Data Analysis Using Pyspark at Coursera Faculty details
Ahmad Varasteh
Data Mining and Machine Learning Instructor