Databricks - Applied Data Science for Data Analysts
- Offered by Coursera
Applied Data Science for Data Analysts at Coursera Overview
Duration | 16 hours |
Start from | Start Now |
Total fee | Free |
Mode of learning | Online |
Difficulty level | Intermediate |
Credential | Certificate |
Applied Data Science for Data Analysts at Coursera Highlights
- Earn a shareable certificate upon completion.
- Flexible deadlines according to your schedule.
Applied Data Science for Data Analysts at Coursera Course details
- In this course, you will develop your data science skills while solving real-world problems. You'll work through the data science process and use unsupervised learning to explore data, engineer and select meaningful features, and solve complex supervised learning problems using tree-based models. You will also learn to apply hyperparameter tuning and cross-validation strategies to improve model performance.
- NOTE: This is the third and final course in the Data Science with Databricks for Data Analysts Coursera specialization. To be successful in this course, we highly recommend first taking the two earlier courses in that specialization: Apache Spark for Data Analysts and Data Science Fundamentals for Data Analysts.
Applied Data Science for Data Analysts at Coursera Curriculum
Welcome to the Course
Course introduction
Review of Data Science
Review of Machine Learning
Data Science Process vs. Machine Learning Workflow
Introduction to Databricks (Optional)
Introduction to the Platform (Optional)
Introduction to Apache Spark (Optional)
Introduction to Delta Lake (Optional)
Before you begin
Hands-on with Databricks Lab (Optional)
Course Introduction and Prerequisites
Applied Unsupervised Learning
Lesson Introduction
Exploring Data
Visualizing Data
Introduction to K-means Clustering
Applied K-means Clustering
Identifying the Number of Clusters
Identifying the Number of Clusters Demo
Utilizing Clusters
Lesson Introduction
Feature Relationships
Correlation Matrix
Introduction to Principal Components Analysis
Applied Principal Components Analysis
PCA for Feature Relationships
PCA for Dimensionality Reduction
K-means Clustering Lab
Principal Components Analysis Lab
Exploring and Visualizing Data
K-means Clustering
K-means Clustering Lab Results
Feature Correlation
Principal Components Analysis
PCA Lab Results
Feature Engineering and Selection
Lesson Introduction
Introduction to Feature Engineering
Common Feature Improvements
Handling Missing Values
Imputing Missing Values
Feature Scaling
Converting Feature Types
Representing Categorical Features
One-hot Encoding
Lesson Introduction
Problems with High Dimensions and Dimensionality Reduction
A Review of Feature Importance
Linear Regression Coefficients and P-values
Introduction to Feature Selection
Regularization
Regularized Regression
Applied Regularized Regression
Feature Engineering Lab
Feature Selection Lab
Feature Engineering Concepts
Missing Values
Feature Engineering Lab Results
Dimensionality and Feature Importance
Feature Selection in Linear Regression
Feature Selection Lab Results
Applied Tree-based Models
Lesson Introduction
A Review of Decision Trees
Algorithm Selection
String Indexing Categorical Features
Decision Tree Pruning
Lesson Introduction
Introduction to Ensemble Modeling
Bootstrap Sampling Training Data
Applied Random Forest
Lesson Introduction
A Review of Classification Evaluation Metrics
A Review of Assigning Classes
Oversampling and Undersampling Classes
Weighting Classes in Random Forest
Feature Engineering in Decision Trees
Preventing Overfitting
Applied Decision Trees Lab
Aggregating Bootstrapped Results
Random Forest Algorithm
Applied Random Forest Lab
Problems with Class Imbalance
Label-based Bootstrap Sampling
Label-based Evaluation Weighting
Label Imbalance Lab
Algorithm Selection and Decision Trees
Categorical Features
Applied Decision Trees Lab Results
Tree-based Ensemble Modeling
Bootstrap Aggregation
Applied Random Forest Lab Results
Classification Evaluation
Label Imbalance and Sampling
Label Imbalance Lab Results
Model Optimization
Lesson Introduction
Introduction to Hyperparameters
Hyperparameters in Tree-based Models
Optimizing Hyperparameters
Grid Search for Hyperparameter Optimization
Validation Set
Grid-search for Random Forests
Lesson Introduction
A Review of Model Generalization
Validation Set Limitations
Introduction to Cross-Validation
K-fold Cross-Validation with Random Forest
Other Cross-Validation Strategies
Hyperparameter Search Lab
Cross-Validation Lab
Hyperparameters in Tree-based Models
Grid Search
Hyperparameter Search Lab Results
Model Generalization and Validation Set
Cross-Validation
Cross-Validation Lab Results