Coursera

Cloudera - Managing Big Data in Clusters and Cloud Storage 

  • Offered by Coursera

Managing Big Data in Clusters and Cloud Storage at Coursera
Overview

Duration

20 hours

Mode of learning

Online

Difficulty level

Beginner

Credential

Certificate

Highlights

  • Offered by Cloudera
  • Requires 6 hours of effort per week
  • Earn a certificate upon successful completion
  • Gain in-depth knowledge from a senior Cloudera instructor

Course details

Skills you will learn
What are the course deliverables?
  • Use different tools to browse existing databases and tables in big data systems
  • Use different tools to explore files in distributed big data filesystems and cloud storage
  • Create and manage big data databases and tables using Apache Hive and Apache Impala
  • Describe and choose among different data types and file formats for big data systems
More about this course
  • In this course, you'll learn how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Apache Hive and Apache Impala. You'll learn how to choose the right data types, storage systems, and file formats based on which tools you'll use and what performance you need.

Curriculum

Week 1: Orientation to Data in Clusters and Cloud Storage

Welcome to the Course

Browsing Tables with Hue

Browsing Tables with SQL Utility Statements

Browsing HDFS with the Hue File Browser

Browsing HDFS from the Command Line

Understanding S3 and Other Cloud Storage Platforms

Browsing S3 Buckets from the Command Line

Week 2: Defining Databases, Tables, and Columns

Week 2 Introduction

Introduction to the CREATE TABLE Statement

Using Different Schemas on the Same Data

Specifying TBLPROPERTIES

Examining, Modifying, and Removing Tables

Hive and Impala Interoperability

Impala Metadata Refresh

Week 3: Data Types and File Types

Week 3 Introduction

Overview of Data Types

Choosing the Right Data Types

Overview of File Types

Choosing the Right File Types

Week 4: Managing Datasets in Clusters and Cloud Storage

Week 4 Introduction

Refresh Impala's Metadata Cache after Loading Data

Loading Files into HDFS with Hue's Table Browser

Loading Files into HDFS with Hue's File Browser

Loading Files into HDFS from the Command Line

Loading Files into S3 from the Command Line

Using Hive and Impala to Load Data into Tables

Conclusion

Week 5: Optimizing Hive and Impala (Honors)

Week 5 Introduction

What to Do When Queries Are Too Complex

What to Do When Queries Take Too Long

When to Use Table Partitioning

When to Use Complex Columns

File Systems versus Storage Engines

Entry Requirements

Eligibility criteria
Conditional Offer
  • Not mentioned
