Mining Massive Data Sets
offered by Stanford University

Overview

Earn a certificate of completion from the Stanford School of Engineering on successfully completing the course
Instructors: Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman
An introduction to modern distributed file systems, MapReduce, and algorithms
Free. Add a Verified Certificate for ₹11,151

Who should do this course?

This course is designed for those who want to learn the concepts of modern distributed file systems and MapReduce.

What are the course deliverables?

Each week has about 2 hours of video, broken into small segments, along with automated homework assignments. The course ends with a final exam.

More about this course

The course introduces modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Participants will learn how Google's PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes. The course then covers locality-sensitive hashing, a technique for finding similar items in a set so large that comparing every pair is infeasible. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model it, but standard approaches do not scale well, so the course presents more efficient ones. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
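
To give a flavor of the locality-sensitive hashing idea mentioned above, here is a minimal Python sketch of MinHash signatures with banding. It is an illustration under stated assumptions, not course material: the function names (stable_hash, minhash_signature, candidate_pairs), the use of BLAKE2b as a stand-in for random hash functions, and the parameters (128 hashes, 32 bands) are all hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def stable_hash(seed, item):
    """Deterministic 64-bit hash of (seed, item).

    Assumption: BLAKE2b with a seed prefix stands in for a family of
    independent random hash functions.
    """
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=128):
    """One minimum hash value per hash function; items must be non-empty."""
    return [min(stable_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of positions where two signatures agree (estimates Jaccard)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=32):
    """Return pairs of keys whose signatures agree on all rows of some band.

    Only these pairs need an exact comparison, which avoids comparing
    every pair of items.
    """
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    pairs = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(name)
        for names in buckets.values():
            pairs.update(combinations(sorted(names), 2))
    return pairs


if __name__ == "__main__":
    docs = {
        "d1": set("the quick brown fox jumps over the lazy dog".split()),
        "d2": set("the quick brown fox leaps over the lazy dog".split()),
        "d3": set("mining massive data sets with mapreduce".split()),
    }
    sigs = {name: minhash_signature(words) for name, words in docs.items()}
    print("exact d1/d2:", jaccard(docs["d1"], docs["d2"]))
    print("estimated d1/d2:", estimate_similarity(sigs["d1"], sigs["d2"]))
    print("candidate pairs:", candidate_pairs(sigs))
```

The signature length trades accuracy for space, and the number of bands controls how similar two items must be before they are likely to share a bucket. Both values here are arbitrary and would be tuned for a real dataset.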

Week 1: MapReduce

Link Analysis -- PageRank

Week 2: Locality-Sensitive Hashing -- Basics + Applications

Distance Measures

Nearest Neighbors

Frequent Itemsets

Week 3: Data Stream Mining

Analysis of Large Graphs

Week 4: Recommender Systems

Dimensionality Reduction

Week 5: Clustering

Computational Advertising

Week 6: Support-Vector Machines

Decision Trees

MapReduce Algorithms

Week 7: More About Link Analysis - Topic-specific PageRank, Link Spam

More About Locality-Sensitive Hashing

Eligibility criteria

The course is intended for graduate students and advanced undergraduates in computer science. At a minimum, you should have taken courses in data structures, algorithms, database systems, linear algebra, multivariable calculus, and statistics.
