Scalable Machine Learning with Apache Spark™
- Offered by Databricks
Scalable Machine Learning with Apache Spark™ at Databricks Overview
Duration | 16 hours |
Mode of learning | Online |
Credential | Certificate |
Scalable Machine Learning with Apache Spark™ at Databricks Highlights
- Earn a certificate from Databricks
- Learn from industry experts
Scalable Machine Learning with Apache Spark™ at Databricks Course details
This course teaches you how to scale ML pipelines with Spark, including distributed training, hyperparameter tuning, and inference.
You will build and tune ML models with SparkML while using MLflow to track, version, and manage these models.
This course covers the latest ML features in Apache Spark, such as pandas UDFs, pandas function APIs, and the pandas API on Spark, as well as the latest ML product offerings, such as Feature Store and AutoML.
Scalable Machine Learning with Apache Spark™ at Databricks Curriculum
Day 1
Spark / ML overview
Exploratory data analysis (EDA) and feature engineering with Spark
Linear regression with SparkML: transformers, estimators, pipelines, and evaluators
MLflow Tracking and Model Registry
Day 2
Tree-based models: hyperparameter tuning and parallelism
Hyperopt for distributed hyperparameter tuning
Databricks AutoML and Feature Store
Integrating third-party packages (distributed XGBoost)
Distributed inference of scikit-learn models with pandas UDFs
Distributed training with the pandas function API
Pandas API on Spark for data manipulation