University of London - Foundations of Data Science: K-Means Clustering in Python

  • Offered by Coursera

Foundations of Data Science: K-Means Clustering in Python at Coursera
Overview

Duration

29 hours

Start from

Start Now

Total fee

Free

Mode of learning

Online

Difficulty level

Beginner

Official Website

Explore Free Course

Credential

Certificate

Highlights

  • Earn a shareable certificate upon completion.
  • Flexible deadlines according to your schedule.
  • Earn a certificate from the University of London upon completion of the course.

Course details

More about this course
  • Organisations all around the world are using data to predict behaviours and extract valuable real-world insights to inform decisions. Managing and analysing big data has become an essential part of modern finance, retail, marketing, social science, development and research, medicine and government.
  • This MOOC, designed by an academic team from Goldsmiths, University of London, will quickly introduce you to the core concepts of Data Science to prepare you for intermediate and advanced Data Science courses. It focuses on the basic mathematics, statistics and programming skills that are necessary for typical data analysis tasks.
  • You will explore these fundamental concepts through an example data clustering task, and you will use this example to learn the basic programming skills needed to master Data Science techniques. During the course, you will complete a series of mathematical and programming exercises and a small data clustering project on a given dataset.
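
To illustrate the kind of clustering task described above, here is a minimal sketch of k-means in plain Python. It is not taken from the course materials: the function names (`mean_point`, `k_means`), the toy data, and the parameters are assumptions made for this example, and it only handles 2D points.

```python
import math
import random


def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def k_means(points, k, iterations=100, seed=0):
    """Cluster 2D points with Lloyd's algorithm; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two visibly separate groups of points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(centroids)
print(labels)
```

The two steps inside the loop (assign each point to the nearest centroid, then recompute each centroid as a mean) rely only on the mean and Euclidean distance. A library implementation such as `sklearn.cluster.KMeans` would normally be preferred for real datasets.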

Curriculum

Week 1: Foundations of Data Science: K-Means Clustering in Python

Welcome and Introduction

Introduction to Data Science

What is Data?

Types of Data

Machine Learning

Supervised vs Unsupervised Learning

K-Means Clustering

Preparing your Data

A Real World Dataset

Types of Data - Review Information

Supervised vs Unsupervised - Review Information

K-Means Clustering - Review Information

Week 1 Summative Assessment

Week 2: Means and Deviations in Mathematics and Python

2.0: Week 2 Introduction

2.1 Introduction to Mathematical Concepts of Data Clustering

2.2 Mean of One Dimensional Lists

2.3 Variance and Standard Deviation

2.4 Jupyter Notebooks

2.5 Variables

2.6 Lists

2.7 Computing the Mean

2.8 Better Lists: NumPy

2.9 Computing the Standard Deviation

Week 2 Conclusion

Population vs Sample, Bias

Variability, Standard Deviation and Bias

Python Style Guide

Numpy and Array Creation

Population vs Sample - Review Information

Mean of One Dimensional Lists - Review Information

Variance and Standard Deviation - Review Information

Jupyter Notebooks - Review Information

Variables - Review Information

Lists - Review Information

Computing the Mean - Review Information

Better Lists - Review Information

Computing the Standard Deviation - Review Information

Week 2 Summative Assessment

Week 3: Moving from One to Two Dimensional Data

Week 3 Introduction

3.1 Multidimensional Data Points and Features

3.2 Multidimensional Mean

3.3 Dispersion: Multidimensional Variables

3.4 Distance Metrics

3.5 Normalisation

3.6 Outliers

3.7 Basic Plotting

3.7a Storing 2D Coordinates in a Single Data Structure

3.8 Multidimensional Mean

3.9 Adding Graphical Overlays

3.10 Calculating the Distance to the Mean

3.11 List Comprehension

3.12 Normalisation in Python

3.13 Outliers and Plotting Normalised Data

Week 3 Conclusion

Multidimensional Data Points and Features Recap

Multidimensional Mean Recap

Multidimensional Variables Recap

Distance Metrics Recap

Normalisation Recap

Note on Matplotlib

Matplotlib Scatter Plot Documentation

Matplotlib Patches Documentation

List Comprehension Documentation

3.12 Errata

Multidimensional Data Points and Features - Review Information

Multidimensional Mean - Review Information

Dispersion: Multidimensional Variables - Review Information

Distance Metrics - Review Information

Normalisation - Review Information

Outliers - Review Information

Basic Plotting - Review Information

Storing 2D Coordinates - Review Information

Multidimensional Mean - Review Information

Adding Graphical Overlays - Review Information

Calculating Distance - Review Information

List Comprehension - Review Information

Normalisation in Python - Review Information

Outliers - Review Information

Week 3 Summative Assessment

Week 4: Introducing Pandas and Using K-Means to Analyse Data

Week 4 Introduction

4.1: Using the Pandas Library to Read csv Files

4.1a: Sorting and Filtering Data Using Pandas

4.1b: Labelling Points on a Graph

4.1c: Labelling all the Points on a Graph

4.2: Eyeballing the Data

4.3: Using K-Means to Interpret the Data

Week 4: Conclusion

Week 4 Code Resources

Pandas Read_CSV Function

More Pandas Library Documentation

The Pyplot Text Function

For Loops in Python

Documentation for sklearn.cluster.KMeans

Using the Pandas Library to Read csv Files - Review Information

Sorting and Filtering Data Using Pandas - Review Information

Labelling Points on a Graph - Review Information

Labelling all the Points on a Graph - Review Information

Eyeballing the Data - Review Information

Using K-Means to Interpret the Data - Review Information

Week 4 Summative Assessment

Week 5: A Data Clustering Project

Introduction to Week 5

5.1 Can a Machine Detect Fake Notes?

5.2 Working for a Client

5.3 How to Organize Work on Your Project

5.4 Dealing With Difficulties

5.5 No Data no Data Science: Introduction of the Dataset

5.6 Modelling

5.7 Presenting the Project Results

5.8 Concluding Remarks

Week 5 Code Resource - the Dataset for our Project

Saving plt.scatter Outputs as Figures

Additional Recommended Reading for Week 5

How Would You Help? - Review Information

Python - Review Information

Week 5 Summative Assessment

Admission Process

Important Dates

Course Commencement Date

May 25, 2024

Student Ratings & Reviews

4/5 (1 rating)

Harsha Veena
Rating: 4
Learning Experience: Explanation of the mathematical version of k-means clustering.
Faculty: Instructors taught well. Curriculum was relevant and comprehensive.
Course Support: No career support provided.
Reviewed on 21 May 2022