Top HBase Interview Questions and Answers [Updated]

Updated on Sep 20, 2021 16:36 IST

Apache HBase, commonly referred to as the ‘Hadoop Database’, is a column-oriented, non-relational, distributed database in the Hadoop ecosystem. It runs on top of the Hadoop Distributed File System (HDFS) and is modeled after Google's Bigtable. Designed to provide real-time read and write access to big data, it has become one of the most popular open-source NoSQL databases in the big data space.

If you are planning to prepare for your next HBase interview, then this post will give you a sneak peek into the frequently asked HBase interview questions and answers. 

These interview questions will help you in cracking your interview and acquiring your dream career as an HBase Developer. This article will also help you prepare for your Hadoop interviews related to HBase.

Top HBase Interview Questions and Answers

Here’s a curated list of HBase interview questions and answers that will help you gain in-depth knowledge of the subject to clear your HBase interview in the first attempt.

Q1. Name the commands used in HBase operations?

Ans. The commands used in HBase operations are Scan, Get, Increment, Put, and Delete.

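Q2. What is the role of ZooKeeper?

Ans. ZooKeeper is the distributed coordination service of an HBase cluster. It maintains configuration and server state, tracks which RegionServers are alive, coordinates the election of the active HMaster, and stores the location of the catalog (meta) table so that clients can find regions.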

Q3. What are the main components of HBase?

Ans. The main components of HBase are:

  • HBase Master
  • RegionServer
  • Catalog Tables
  • Region
  • ZooKeeper

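Q4. What is the ulimit of HBase?

Ans. ulimit is an operating-system setting that caps the resources a user or process may consume, such as the number of open files and processes. HBase keeps many files open at once, so the open-file limit (nofile) for the user running HBase usually has to be raised well above the common default of 1024.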

Q5. Name different types of blocks in HBase?

Ans. The different types of blocks in HBase are:

  • Meta
  • Bloom
  • Index
  • Data

Q6. Define HFile?

Ans. HFile is the underlying storage format for HBase. Each HFile is an immutable file of sorted key-value cells stored on HDFS.

Q7. What does S3 stand for?

Ans. S3 stands for simple storage service.

Q8. What are the benefits of using HBase?

Ans. The benefits of using HBase are:

  • Strong record-level (row-level) consistency
  • Flexible, column-based multidimensional map structure

Its main drawbacks are:

  • Querying is difficult, since there is no built-in SQL-like query language
  • Partial-key lookups are not fully supported

Q9. What are the three types of tombstone markers?

Ans. The three types of tombstone markers are:

  • Column delete marker
  • Family delete marker
  • Version delete marker

Q10. What happens when you modify the block size of a column family?

Ans. When you modify the block size, newly written data uses the new block size, while existing data keeps the old block size until a compaction rewrites the files that hold it.

Q11. What is the difference between HBase and relational Database?

Ans. Relational Database

  • A schema-based database
  • Contains thin tables
  • No built-in support for partitioning
  • Used to store normalized data

HBase

  • Schema-less
  • Used to store de-normalized data
  • Contains sparsely populated tables
  • Automatic partitioning

Q12. What are column-oriented databases other than HBase?

Ans. Cassandra, Google Bigtable, and Apache Accumulo are some of the column-oriented (wide-column) databases other than HBase. By contrast, CouchDB and MongoDB are document-oriented databases.

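Q13. What type of partitioning does HBase provide?

Ans. Automatic partitioning. A table is split into regions by row-key range, and a region that grows too large is split automatically.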
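One rough way to picture automatic partitioning is as a sorted list of row-key ranges, where a range that grows too large is cut in two. The sketch below is a toy model only: the `ToyTable` class, the `max_rows_per_region` threshold, and the middle-key split are assumptions made for illustration and do not reproduce HBase's real split policy.

```python
import bisect


class ToyTable:
    """Toy model: each region owns a contiguous, sorted range of row keys."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region   # hypothetical split threshold
        self.start_keys = [""]                # the first region starts at the empty key
        self.regions = [{}]                   # one dict of row_key -> value per region

    def _region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._region_index(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]            # split point: the middle row key
        left = {k: v for k, v in self.regions[i].items() if k < mid}
        right = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = left
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, right)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)
```

Writing ten sequential row keys into this toy table produces several regions without any action by the caller, and every key can still be found through the sorted start keys.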

Q14. Explain WAL?

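Ans. The Write Ahead Log (WAL) is similar to the MySQL binary log. Every change is recorded in the WAL before it is applied to the MemStore. If a RegionServer crashes before its MemStore is flushed to disk, the WAL is replayed to recover the lost edits, which ensures data integrity.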

Q15. Name some of the different types of filters?

Ans. Filters are used to fetch specific data from an HBase table rather than all the records. The main types are:

  • KeyValue Metadata filters
  • Column value filter
  • RowKey filters
  • Column Value Comparators

Q16. Which command is used to check whether a table is disabled?

Ans. hbase> is_disabled 'table_name'

Q17. What does YCSB stand for?

Ans. YCSB stands for Yahoo Cloud Serving Benchmark

Q18. What is the use of truncate command?

Ans. The truncate command disables, drops, and recreates the specified table.

Q19. When is tall and thin table design considered?

Ans. It is considered when there is a large number of rows and a small number of columns per row.

Q20. What is the role of HMaster?

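Ans. It is responsible for monitoring all RegionServer instances in the cluster. It also assigns regions to RegionServers, balances load across them, and handles schema changes such as creating and deleting tables.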

Q21. Explain when you should use HBase?

Ans. HBase is a good fit when:

  • You need random, real-time read/write access to your Big Data
  • Data size is huge
  • You are moving from an RDBMS and are prepared to redesign the application rather than port it as-is
  • You have enough hardware (a large enough cluster) for HBase to be useful
  • You need strong data consistency

Q22. What is meant by Deletion in HBase?

Ans. When a Delete is issued in HBase, the data is not actually deleted right away. Rather, a tombstone marker is written, making the deleted cells invisible to reads. The deleted cells and their markers are removed during major compactions.

Q23. How does HBase actually delete a row?

Ans. Whatever you write to HBase is first held in RAM (the MemStore) and then flushed to disk. The files written to disk are immutable and change only through compaction. A delete therefore does not remove anything in place; it writes a tombstone marker. Minor compactions keep the marker, while a major compaction removes both the marker and the data it masks.

If you delete data and then add more data with a timestamp at or before that of the tombstone, subsequent Gets are still masked by the tombstone, so the inserted value is not returned. Such puts start taking effect again only after a major compaction has removed the marker.
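The masking effect of a tombstone can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions: `ToyStore` is a hypothetical class, it tracks a single column per row, and its `major_compact` method stands in for what a real major compaction does across HFiles.

```python
class ToyStore:
    """Toy single-column store: versions and delete markers kept as (timestamp, value)."""

    TOMBSTONE = object()   # sentinel standing in for a delete marker

    def __init__(self):
        self.cells = {}    # row_key -> list of (timestamp, value or TOMBSTONE)

    def put(self, row, ts, value):
        self.cells.setdefault(row, []).append((ts, value))

    def delete(self, row, ts):
        # Nothing is removed here: a marker is appended like any other write.
        self.cells.setdefault(row, []).append((ts, self.TOMBSTONE))

    def _live(self, entries):
        # Versions at or before the newest tombstone timestamp are masked.
        tomb_ts = max((ts for ts, v in entries if v is self.TOMBSTONE), default=None)
        return [(ts, v) for ts, v in entries
                if v is not self.TOMBSTONE and (tomb_ts is None or ts > tomb_ts)]

    def get(self, row):
        live = self._live(self.cells.get(row, []))
        return max(live, key=lambda e: e[0])[1] if live else None

    def major_compact(self):
        # Drop the markers together with every version they mask.
        for row in list(self.cells):
            kept = self._live(self.cells[row])
            if kept:
                self.cells[row] = kept
            else:
                del self.cells[row]
```

In this model a put with a timestamp older than the tombstone is stored but never returned by `get`, and the same put succeeds once `major_compact` has cleared the marker.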

Q24. Explain HBaseFsck class?

Ans. hbck is a command-line tool in HBase, implemented by the HBaseFsck class. It provides several command-line switches that influence its behavior. It checks for region consistency and table integrity problems and can repair corruption. (In HBase 2.x the repair options of the original hbck are disabled, and repairs are done with the separate HBCK2 tool.) It works in two modes:

  • Read-only inconsistency identification mode
  • Read-write repair mode

Q25. What do you mean by the Bloom Filters?

Ans. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives but never false negatives. In HBase, Bloom filters let you test whether an HFile might contain a certain row or row-column cell, so reads can skip HFiles that definitely do not contain it. This improves the overall read throughput of the cluster.
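To make the idea concrete, here is a minimal Bloom filter sketch. It is a generic textbook version, not HBase's implementation: the `BloomFilter` class, the bit-array size, the number of hashes, and the use of SHA-256 with a seed prefix are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions per key in a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, key):
        # Derive k positions by hashing the key with k different seed prefixes.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A reader would build one such filter per file from its row keys and consult `might_contain` before opening the file. A `False` answer allows the file to be skipped safely, while a `True` answer still requires the actual lookup.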

Q26. What is the data model of HBase?

Ans. The data model of HBase comprises logical components such as tables, rows, column families, columns, cells, and versions:

  • Tables: It is a logical collection of rows stored in separate partitions called Regions. Each table consists of column families and rows.
  • Rows: A row is one instance of data in a table and is identified by a row key. A Row key acts as a Primary key in HBase.
  • Column Families: Data in a row is grouped into Column Families. Every Column Family has one or more Columns, each identified by a Column Qualifier that names an attribute of the object stored in the row.
  • Version: The data stored in a cell is versioned. The versions of data are identified by the timestamp.
  • Cells: A cell is identified by the unique combination of row key, Column Family, and Column Qualifier. It stores the data, with one value per version.
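The logical model above can be pictured as a nested, sorted map. The sketch below uses plain Python dictionaries for that picture. The helper names `new_table`, `put`, and `get` are hypothetical and only mimic the shape of the data, not the HBase client API.

```python
from collections import defaultdict


def new_table():
    # table[row_key][column_family][qualifier] -> {timestamp: value}
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, qualifier, value, ts):
    """Store one version of a cell under its four coordinates."""
    table[row][family][qualifier][ts] = value


def get(table, row, family, qualifier, ts=None):
    """Return the requested version, or the newest one when ts is omitted."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    if ts is None:
        ts = max(versions)
    return versions.get(ts)
```

For example, writing "Ann" at timestamp 1 and "Anna" at timestamp 2 to the same row, family, and qualifier keeps both versions. A plain `get` returns the newer one, and passing `ts=1` returns the older one.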

Q27. Explain the data manipulation commands of HBase?

Ans. The data manipulation commands enable us to perform modification on the table data, such as adding data into a table and retrieving data from a table. The data manipulation commands of HBase: 

  • Count: Returns the number of rows in a table
  • Put: Insert rows into a table and update an existing cell value of the table
  • Get: Read data from a table in HBase
  • Delete: Deletes a cell value in a table.
  • Deleteall: Deletes all the cells in a given row
  • Scan: Displays the entire content of the table.
  • Truncate: Disables, drops, and recreates a table

Q28. What is REST?

Ans. REST stands for Representational State Transfer. It defines semantics that let a protocol address remote resources in a generic way. REST supports different message formats, giving a client application a variety of options for communicating with the server. HBase ships with a REST server, so clients that cannot use the native Java API can still access it over HTTP.

Q29. What is Compaction? What are the different compaction types in HBase?

Ans. Compaction is the process by which HBase merges HFiles to reduce their number and clean up obsolete data. There are two types of compaction:

Major compaction: All HFiles of a store (column family) are merged into a single HFile. Deleted and expired cells are discarded in the process.

Minor compaction: Several adjacent small HFiles are merged into one larger HFile. The files to merge are chosen by the compaction policy, and delete markers are kept.
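The merge step at the heart of a compaction can be sketched as a k-way merge of sorted files. This is a simplified illustration: the `compact` function is hypothetical, each "HFile" is just a list of `(row_key, timestamp, value)` tuples already sorted by row key and then by newest timestamp first, and the `max_versions` cut-off is an assumed stand-in for the version limit of a column family.

```python
import heapq


def compact(hfiles, max_versions=1):
    """Merge sorted cell lists into one sorted list, keeping the newest versions per row."""
    # Each input must already be sorted by (row_key ascending, timestamp descending).
    merged = heapq.merge(*hfiles, key=lambda cell: (cell[0], -cell[1]))
    out, last_row, seen = [], None, 0
    for row, ts, value in merged:
        if row != last_row:
            last_row, seen = row, 0
        if seen < max_versions:
            out.append((row, ts, value))
            seen += 1
    return out
```

Because every input is already sorted, the merge reads each file once from start to end, which is why compaction produces a single sorted file without re-sorting all the data.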

Q30. Is it possible to perform iteration through the rows in HBase? 

Ans. Yes, it is possible to iterate through the rows in HBase with a scan, which returns rows in ascending row-key order by default.

Early versions of HBase did not support iterating in reverse order. The reason lies in the on-disk layout: cells are stored as variable-length records, with the length written before the bytes of the value, so a file can be read forward naturally but not backward without storing the lengths a second time. Since HBase 0.98, reverse scans are supported (Scan.setReversed(true) in the Java API), though they are generally slower than forward scans.

Q31. Explain HLog and HFile.

Ans. Also known as WAL, HLog is the write-ahead log file. Every Region Server has a Write-Ahead Log (called HLog) and multiple Regions.

HFile is the real data storage file. Every Region comprises one Store per column family, and each Store has a MemStore and multiple StoreFiles (HFiles).

Data is first written to the write-ahead log file and also written in MemStore. When the MemStore is full, the contents of the MemStore are flushed to the disk into HFiles. 
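The write path described here can be sketched in a few lines. The model below is deliberately simplified and rests on assumptions: `ToyRegionServer` is a hypothetical class, the flush threshold counts rows instead of bytes, and the log is cleared on flush, whereas a real server rolls and archives log files.

```python
class ToyRegionServer:
    """Toy write path: log first, buffer in memory, flush to immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []                          # append-only log of every edit
        self.memstore = {}                     # in-memory write buffer
        self.hfiles = []                       # immutable sorted files "on disk"
        self.flush_threshold = flush_threshold # hypothetical row-count limit

    def put(self, row, value):
        self.wal.append((row, value))          # 1. record the edit in the log
        self.memstore[row] = value             # 2. apply it to the in-memory buffer
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted file, then start a fresh buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal.clear()                       # simplification: flushed edits need no replay

    def crash_and_recover(self):
        self.memstore = {}                     # memory contents are lost in a crash
        for row, value in self.wal:            # replay the log to rebuild them
            self.memstore[row] = value
```

If the server "crashes" after two puts, replaying the log restores both rows to the buffer, and a third put then triggers a flush into a single sorted file.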

Q32. Which filter accepts the page size as the parameter in HBase?

Ans. PageFilter accepts the page size as its parameter. It takes the page size as its single argument and limits the scan to that number of rows. Because the filter is applied separately on each RegionServer, the client may receive more rows than the page size in total.
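Paging over sorted row keys usually means remembering the last row of one page and starting the next page just after it. The sketch below shows that pattern on a plain sorted list. The `scan_page` function and its `last_seen` parameter are hypothetical names for illustration and are not part of the HBase API.

```python
import bisect


def scan_page(row_keys, page_size, last_seen=None):
    """Return up to page_size keys from a sorted list, starting after last_seen."""
    start = 0 if last_seen is None else bisect.bisect_right(row_keys, last_seen)
    return row_keys[start:start + page_size]
```

A caller fetches the first page with no `last_seen`, then passes the final key of each page back in to get the next one, stopping when an empty page comes back.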

Q33. Why is HBase called a schema-less database?

Ans. HBase is called schema-less because it has no concept of a fixed column schema. Only column families are defined when a table is created; columns within a family can be added on the fly at write time. Thus, you need not define the columns ahead of time.

Q34. Name the three places where data will be reconciled before returning the value while reading the data from HBase. 

Ans. The three places where data will be reconciled before returning the value are: 

  1. MemStore: Check for any pending modification in the system.
  2. BlockCache: Verify if the block containing the row has been accessed recently.
  3. HFiles: Access the relevant HFiles on disk.
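The three lookups above can be sketched as one function that gathers candidate versions from each place and returns the newest. This is a toy model under stated assumptions: `read_row` is a hypothetical function, every source maps a row key to a single `(timestamp, value)` pair, and the cache is keyed per file and row instead of per block.

```python
def read_row(row, memstore, block_cache, hfiles):
    """Reconcile a row across MemStore, BlockCache, and HFiles; newest timestamp wins."""
    candidates = []
    if row in memstore:                        # edits not yet flushed to disk
        candidates.append(memstore[row])
    for i, hfile in enumerate(hfiles):
        key = (i, row)
        if key in block_cache:                 # recently read data, served from memory
            candidates.append(block_cache[key])
        elif row in hfile:                     # otherwise read the file "on disk"
            block_cache[key] = hfile[row]      # and cache it for the next read
            candidates.append(hfile[row])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

The point of the sketch is that no single place is authoritative: an older value on disk and a newer value in memory can coexist, and the read returns the one with the latest timestamp.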

Q35. What is an HBase Shell?

Ans. HBase Shell is a command-line interface, built on JRuby, that enables you to connect to HBase and run commands interactively. Some of the commands supported by HBase Shell are:

  • status: shows the status of HBase
  • version: provides the version of HBase being used
  • whoami: shows the current HBase user
  • table_help: offers help for table-reference commands


FAQs

What is HBase used for?

HBase is a column-oriented, non-relational, distributed database in the Hadoop ecosystem. It runs on top of the Hadoop Distributed File System (HDFS) and provides random, real-time read/write access to Big Data. It is designed to host very large tables on clusters of commodity hardware and is modeled after Google's Bigtable.

What does HBase stand for?

HBase stands for Hadoop Database. It is a non-relational, column-oriented, distributed database in the Hadoop ecosystem. It began as a sub-project of Apache Hadoop, is now a top-level Apache project, and is used to provide real-time read and write access to big data.

Why is HBase faster?

HBase has a four-dimensional data model: a value is addressed by row key, column family, column qualifier, and timestamp. Rows are stored sorted by row key and grouped by column family, which makes lookups fast. You can retrieve a specific column by specifying the table, row, and column family.

Which companies use HBase?

A lot of companies use HBase in their tech stacks. Some of them are:

  • Facebook
  • Twitter
  • Yahoo
  • Pinterest
  • Mozilla

Is HBase NoSQL database?

Yes, Apache HBase is a NoSQL database. It runs on top of the Hadoop Distributed File System (HDFS). HBase is an implementation of Google's Bigtable paper. It combines the scalability of Hadoop with real-time data access.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...