Cloudera - Managing Big Data in Clusters and Cloud Storage
- Offered by Coursera
Managing Big Data in Clusters and Cloud Storage at Coursera Overview
| Duration | 20 hours |
| Mode of learning | Online |
| Difficulty level | Beginner |
| Credential | Certificate |
Managing Big Data in Clusters and Cloud Storage at Coursera Highlights
- Offered by Cloudera
- Requires effort of 6 hours per week
- Earn a certificate upon successful completion
- Learn in depth from a senior Cloudera instructor
Managing Big Data in Clusters and Cloud Storage at Coursera Course details
- Use different tools to browse existing databases and tables in big data systems
- Use different tools to explore files in distributed big data filesystems and cloud storage
- Create and manage big data databases and tables using Apache Hive and Apache Impala
- Describe and choose among different data types and file formats for big data systems
- In this course, you'll learn how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Apache Hive and Apache Impala. You'll learn how to choose the right data types, storage systems, and file formats based on which tools you'll use and what performance you need.
Managing Big Data in Clusters and Cloud Storage at Coursera Curriculum
Week 1: Orientation to Data in Clusters and Cloud Storage
- Welcome to the Course
- Browsing Tables with Hue
- Browsing Tables with SQL Utility Statements
- Browsing HDFS with the Hue File Browser
- Browsing HDFS from the Command Line
- Understanding S3 and Other Cloud Storage Platforms
- Browsing S3 Buckets from the Command Line
Week 2: Defining Databases, Tables, and Columns
- Week 2 Introduction
- Introduction to the CREATE TABLE Statement
- Using Different Schemas on the Same Data
- Specifying TBLPROPERTIES
- Examining, Modifying, and Removing Tables
- Hive and Impala Interoperability
- Impala Metadata Refresh
Week 3: Data Types and File Types
- Week 3 Introduction
- Overview of Data Types
- Choosing the Right Data Types
- Overview of File Types
- Choosing the Right File Types
Week 4: Managing Datasets in Clusters and Cloud Storage
- Week 4 Introduction
- Refresh Impala's Metadata Cache after Loading Data
- Loading Files into HDFS with Hue's Table Browser
- Loading Files into HDFS with Hue's File Browser
- Loading Files into HDFS from the Command Line
- Loading Files into S3 from the Command Line
- Using Hive and Impala to Load Data into Tables
- Conclusion
Week 5: Optimizing Hive and Impala (Honors)
- Week 5 Introduction
- What to Do When Queries Are Too Complex
- What to Do When Queries Take Too Long
- When to Use Table Partitioning
- When to Use Complex Columns
- File Systems versus Storage Engines