University of Washington - Machine Learning: Clustering & Retrieval
- Offered by Coursera
Machine Learning: Clustering & Retrieval at Coursera Overview
Duration | 17 hours
Start from | Start Now
Total fee | Free
Mode of learning | Online
Official Website | Explore Free Course
Credential | Certificate
Machine Learning: Clustering & Retrieval at Coursera Highlights
- Shareable Certificate: Earn a Certificate upon completion.
- 100% online: Start instantly and learn on your own schedule.
- Course 4 of 4 in the Machine Learning Specialization.
- Flexible deadlines: Reset deadlines in accordance with your schedule.
- Approx. 17 hours to complete.
- Language: English. Subtitles: Arabic, French, Portuguese (European), Italian, Vietnamese, Korean, German, Russian, English, Spanish.
Machine Learning: Clustering & Retrieval at Coursera Course details
- Case Studies: Finding Similar Documents
- A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?
- In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. You will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusters and see how to scale the methods using MapReduce. A minimal retrieval sketch follows the learning outcomes below.
- Learning Outcomes: By the end of this course, you will be able to:
  - Create a document retrieval system using k-nearest neighbors.
  - Identify various similarity metrics for text data.
  - Reduce computations in k-nearest neighbor search by using KD-trees.
  - Produce approximate nearest neighbors using locality sensitive hashing.
  - Compare and contrast supervised and unsupervised learning tasks.
  - Cluster documents by topic using k-means.
  - Describe how to parallelize k-means using MapReduce.
  - Examine probabilistic clustering approaches using mixture models.
  - Fit a mixture of Gaussians model using expectation maximization (EM).
  - Perform mixed membership modeling using latent Dirichlet allocation (LDA).
  - Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
  - Compare and contrast initialization techniques for non-convex optimization objectives.
  - Implement these techniques in Python.
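The first learning outcome, a document retrieval system built on k-nearest neighbors, can be sketched in a few lines. Below is a minimal illustration, assuming scikit-learn's TfidfVectorizer and NearestNeighbors rather than the course's own tooling; the toy corpus is invented for the example.

```python
# Minimal sketch of a k-NN document retrieval system,
# assuming scikit-learn; the toy corpus below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

corpus = [
    "the quarterback threw the winning touchdown",
    "the senate passed the new budget bill",
    "the team celebrated the championship win",
    "lawmakers debated the proposed tax bill",
]

# Represent each document as a TF-IDF vector.
X = TfidfVectorizer().fit_transform(corpus)

# Brute-force k-NN under cosine distance (1 - cosine similarity).
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
distances, neighbors = index.kneighbors(X[0])
print(neighbors[0], distances[0])  # nearest documents to corpus[0], itself included
```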
Machine Learning: Clustering & Retrieval at Coursera Curriculum
Welcome
Welcome and introduction to clustering and retrieval tasks
Course overview
Module-by-module topics covered
Assumed background
Important Update regarding the Machine Learning Specialization
Slides presented in this module
Software tools you'll need for this course
A big week ahead!
Nearest Neighbor Search
Retrieval as k-nearest neighbor search
1-NN algorithm
k-NN algorithm
Document representation
Distance metrics: Euclidean and scaled Euclidean
Writing (scaled) Euclidean distance using (weighted) inner products
Distance metrics: Cosine similarity
To normalize or not and other distance considerations
Complexity of brute force search
KD-tree representation
NN search with KD-trees
Complexity of NN search with KD-trees
Visualizing scaling behavior of KD-trees
Approximate k-NN search using KD-trees
Limitations of KD-trees
LSH as an alternative to KD-trees
Using random lines to partition points
Defining more bins
Searching neighboring bins
LSH in higher dimensions
(OPTIONAL) Improving efficiency through multiple tables
A brief recap
Slides presented in this module
Choosing features and metrics for nearest neighbor search
(OPTIONAL) A worked-out example for KD-trees
Implementing Locality Sensitive Hashing from scratch
Representations and metrics
Choosing features and metrics for nearest neighbor search
KD-trees
Locality Sensitive Hashing
Implementing Locality Sensitive Hashing from scratch
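The LSH lectures above (using random lines to partition points, defining bins, searching neighboring bins) reduce to a short routine in the cosine-similarity setting. A minimal sketch assuming NumPy; the dimension, bit count, and dictionary-based table are illustrative choices, not the course's reference implementation.

```python
# Minimal sketch of locality sensitive hashing for cosine similarity,
# assuming NumPy; hyperparameters here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_bits = 50, 16                       # data dimension, bits per hash
planes = rng.normal(size=(n_bits, d))    # one random hyperplane per bit

def lsh_bin(x):
    # Bit b is 1 when x lies on the positive side of hyperplane b;
    # packing the bits yields an integer bin index.
    bits = (planes @ x >= 0).astype(int)
    return int(bits @ (1 << np.arange(n_bits)))

# Build the table: bin index -> list of data point ids.
X = rng.normal(size=(1000, d))
table = {}
for i, x in enumerate(X):
    table.setdefault(lsh_bin(x), []).append(i)

# Query by searching the query's own bin; also probing neighboring bins
# (indices differing in a few bits) trades time for better recall.
candidates = table.get(lsh_bin(X[42]), [])
```

The optional multiple-tables lecture corresponds to repeating this with several independent sets of hyperplanes and taking the union of the candidate lists.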
Clustering with k-means
The goal of clustering
An unsupervised task
Hope for unsupervised learning, and some challenge cases
The k-means algorithm
k-means as coordinate descent
Smart initialization via k-means++
Assessing the quality and choosing the number of clusters
Motivating MapReduce
The general MapReduce abstraction
MapReduce execution overview and combiners
MapReduce for k-means
Other applications of clustering
A brief recap
Slides presented in this module
Clustering text data with k-means
k-means
Clustering text data with k-means
MapReduce for k-means
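The k-means and MapReduce material in this module composes naturally: the assignment step is a map over points and the centroid update is a per-cluster reduce. Below is a minimal single-machine sketch in that framing, assuming NumPy; the plain random initialization stands in for the k-means++ scheme covered above.

```python
# Minimal sketch of one Lloyd iteration for k-means, written to mirror
# the map (assign) / reduce (re-average) decomposition; assumes NumPy.
import numpy as np

def kmeans_step(X, centroids):
    # "Map": each point emits (nearest-centroid id, point).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # "Reduce": per cluster id, average the emitted points.
    new_centroids = np.array([
        X[labels == k].mean(axis=0) if (labels == k).any() else centroids[k]
        for k in range(len(centroids))
    ])
    return labels, new_centroids

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
centroids = X[rng.choice(len(X), size=3, replace=False)]  # random init; k-means++ is smarter
for _ in range(10):
    labels, centroids = kmeans_step(X, centroids)
```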
Mixture Models
Motivating probabilistic clustering models
Aggregating over unknown classes in an image dataset
Univariate Gaussian distributions
Bivariate and multivariate Gaussians
Mixture of Gaussians
Interpreting the mixture of Gaussian terms
Scaling mixtures of Gaussians for document clustering
Computing soft assignments from known cluster parameters
(OPTIONAL) Responsibilities as Bayes' rule
Estimating cluster parameters from known cluster assignments
Estimating cluster parameters from soft assignments
EM iterates in equations and pictures
Convergence, initialization, and overfitting of EM
Relationship to k-means
A brief recap
Slides presented in this module
(OPTIONAL) A worked-out example for EM
Implementing EM for Gaussian mixtures
Clustering text data with Gaussian mixtures
EM for Gaussian mixtures
Implementing EM for Gaussian mixtures
Clustering text data with Gaussian mixtures
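The E-step (computing soft assignments) and M-step (estimating cluster parameters from soft assignments) covered above can be written compactly for a one-dimensional mixture of Gaussians. A minimal sketch assuming NumPy and SciPy; the synthetic data and fixed iteration count are illustrative, and a real run would monitor the log likelihood for convergence.

```python
# Minimal sketch of EM for a 1-D mixture of Gaussians, assuming NumPy/SciPy.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1.0, 300), rng.normal(3, 0.5, 200)])

K = 2
pi = np.full(K, 1 / K)                     # mixture weights
mu = rng.choice(x, size=K, replace=False)  # initial means
sigma = np.full(K, x.std())                # initial standard deviations

for _ in range(50):
    # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i | mu_k, sigma_k).
    r = pi * norm.pdf(x[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft assignments.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
```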
Mixed Membership Modeling via Latent Dirichlet Allocation
Mixed membership models for documents
An alternative document clustering model
Components of latent Dirichlet allocation model
Goal of LDA inference
The need for Bayesian inference
Gibbs sampling from 10,000 feet
A standard Gibbs sampler for LDA
What is collapsed Gibbs sampling?
A worked example for LDA: Initial setup
A worked example for LDA: Deriving the resampling distribution
Using the output of collapsed Gibbs sampling
A brief recap
Slides presented in this module
Modeling text topics with Latent Dirichlet Allocation
Latent Dirichlet Allocation
Learning LDA model via Gibbs sampling
Modeling text topics with Latent Dirichlet Allocation
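The collapsed Gibbs sampler described above resamples one topic assignment at a time, using counts with the current word's own assignment removed. A minimal sketch assuming NumPy; the function name lda_gibbs, the symmetric priors, and the iteration count are illustrative, not from the course materials.

```python
# Minimal sketch of collapsed Gibbs sampling for LDA, assuming NumPy.
# docs: list of documents, each a list of word ids in a vocabulary of size V.
import numpy as np

def lda_gibbs(docs, V, K=5, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    # One topic assignment per word position, plus count tables:
    # doc-topic counts, topic-word counts, and per-topic totals.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this word's current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Resampling distribution:
                # p(k) proportional to (ndk + alpha) * (nkw + beta) / (nk + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Averaging these counts over later sweeps gives estimates of the
    # doc-topic and topic-word distributions (the sampler's "output").
    return ndk, nkw
```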
Hierarchical Clustering & Closing Remarks
Module 1 recap
Module 2 recap
Module 3 recap
Module 4 recap
Why hierarchical clustering?
Divisive clustering
Agglomerative clustering
The dendrogram
Agglomerative clustering details
Hidden Markov models
What we didn't cover
Thank you!
Slides presented in this module
Modeling text data with a hierarchy of clusters
Modeling text data with a hierarchy of clusters
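The divisive/agglomerative material and the dendrogram lecture in this closing module map directly onto SciPy's hierarchical clustering tools. A minimal sketch assuming scipy.cluster.hierarchy; the two-blob synthetic data is illustrative.

```python
# Minimal sketch of agglomerative clustering plus a dendrogram cut,
# assuming SciPy; the synthetic data below is illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# Agglomerative clustering: start with every point as its own cluster and
# repeatedly merge the closest pair (Ward's criterion here). Z encodes the
# full merge tree, i.e. the dendrogram.
Z = linkage(X, method="ward")

# Cut the dendrogram to obtain flat cluster labels (2 clusters here).
labels = fcluster(Z, t=2, criterion="maxclust")
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree (needs matplotlib).
```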