
Databricks - Applied Data Science for Data Analysts 

  • Offered by Coursera

Applied Data Science for Data Analysts at Coursera
Overview

Duration: 16 hours
Start: Start now
Total fee: Free
Mode of learning: Online
Difficulty level: Intermediate


Credential: Certificate

Highlights

  • Earn a shareable certificate upon completion.
  • Flexible deadlines to fit your schedule.

Course details

More about this course
  • In this course, you will develop your data science skills while solving real-world problems. You'll work through the data science process and use unsupervised learning to explore data, engineer and select meaningful features, and solve complex supervised learning problems using tree-based models. You will also learn to apply hyperparameter tuning and cross-validation strategies to improve model performance.
  • NOTE: This is the third and final course in the Data Science with Databricks for Data Analysts Coursera specialization. To be successful in this course, we highly recommend first taking the other two courses in that specialization: Apache Spark for Data Analysts and Data Science Fundamentals for Data Analysts.
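The description above mentions hyperparameter tuning and cross-validation. As a rough illustration only, and not material taken from the course, the following sketch combines k-fold cross-validation with a grid search over a single hyperparameter. To keep it runnable with the Python standard library alone, a 1-D k-nearest-neighbors regressor (tuned on `n_neighbors`) stands in for the tree-based models the course actually uses; every function name here is hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical stand-in model: average the targets of the nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return mean(train_y[i] for i in nearest[:n_neighbors])


def cv_mse(xs, ys, n_neighbors, k=5, seed=0):
    """Mean squared error averaged over k folds; each fold is held out exactly once."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        # Fit on the remaining folds, score on the held-out fold.
        fold_errors.append(mean(
            (knn_predict(train_x, train_y, xs[i], n_neighbors) - ys[i]) ** 2
            for i in held_out
        ))
    return mean(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Score every candidate value with k-fold CV and keep the one with the lowest error."""
    scores = {p: cv_mse(xs, ys, p, k) for p in grid}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(60)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]  # synthetic toy data, not from the course
    best, scores = grid_search(xs, ys, grid=[1, 3, 5, 15])
    for p, s in scores.items():
        print(f"n_neighbors={p}: CV MSE={s:.3f}")
    print("best n_neighbors:", best)
```

Replacing the stand-in model with a random forest would change only `knn_predict`; the fold loop and the grid loop stay the same. On Databricks, Spark MLlib provides `CrossValidator` and `ParamGridBuilder` in `pyspark.ml.tuning` for the same pattern at scale.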

Curriculum

Welcome to the Course

Course introduction

Review of Data Science

Review of Machine Learning

Data Science Process vs. Machine Learning Workflow

Introduction to Databricks (Optional)

Introduction to the Platform (Optional)

Introduction to Apache Spark (Optional)

Introduction to Delta Lake (Optional)

Before you begin

Hands-on with Databricks Lab (Optional)

Course Introduction and Prerequisites

Applied Unsupervised Learning

Lesson Introduction

Exploring Data

Visualizing Data

Introduction to K-means Clustering

Applied K-means Clustering

Identifying the Number of Clusters

Identifying the Number of Clusters Demo

Utilizing Clusters

Lesson Introduction

Feature Relationships

Correlation Matrix

Introduction to Principal Components Analysis

Applied Principal Components Analysis

PCA for Feature Relationships

PCA for Dimensionality Reduction

K-means Clustering Lab

Principal Components Analysis Lab

Exploring and Visualizing Data

K-means Clustering

K-means Clustering Lab Results

Feature Correlation

Principal Components Analysis

PCA Lab Results

Feature Engineering and Selection

Lesson Introduction

Introduction to Feature Engineering

Common Feature Improvements

Handling Missing Values

Imputing Missing Values

Feature Scaling

Converting Feature Types

Representing Categorical Features

One-hot Encoding

Lesson Introduction

Problems with High Dimensions and Dimensionality Reduction

A Review of Feature Importance

Linear Regression Coefficients and P-values

Introduction to Feature Selection

Regularization

Regularized Regression

Applied Regularized Regression

Feature Engineering Lab

Feature Selection Lab

Feature Engineering Concepts

Missing Values

Feature Engineering Lab Results

Dimensionality and Feature Importance

Feature Selection in Linear Regression

Feature Selection Lab Results

Applied Tree-based Models

Lesson Introduction

A Review of Decision Trees

Algorithm Selection

String Indexing Categorical Features

Decision Tree Pruning

Lesson Introduction

Introduction to Ensemble Modeling

Bootstrap Sampling Training Data

Applied Random Forest

Lesson Introduction

A Review of Classification Evaluation Metrics

A Review of Assigning Classes

Oversampling and Undersampling Classes

Weighting Classes in Random Forest

Feature Engineering in Decision Trees

Preventing Overfitting

Applied Decision Trees Lab

Aggregating Bootstrapped Results

Random Forest Algorithm

Applied Random Forest Lab

Problems with Class Imbalance

Label-based Bootstrap Sampling

Label-based Evaluation Weighting

Label Imbalance Lab

Algorithm Selection and Decision Trees

Categorical Features

Applied Decision Trees Lab Results

Tree-based Ensemble Modeling

Bootstrap Aggregation

Applied Random Forest Lab Results

Classification Evaluation

Label Imbalance and Sampling

Label Imbalance Lab Results

Model Optimization

Lesson Introduction

Introduction to Hyperparameters

Hyperparameters in Tree-based Models

Optimizing Hyperparameters

Grid Search for Hyperparameter Optimization

Validation Set

Grid-search for Random Forests

Lesson Introduction

A Review of Model Generalization

Validation Set Limitations

Introduction to Cross-Validation

K-fold Cross-Validation with Random Forest

Other Cross-Validation Strategies

Hyperparameter Search Lab

Cross-Validation Lab

Hyperparameters in Tree-based Models

Grid Search

Hyperparameter Search Lab Results

Model Generalization and Validation Set

Cross-Validation

Cross-Validation Lab Results

Admission Process

Important Dates

Course commencement date: May 25, 2024
