Using Partition By SQL Clause

Assistant Manager - Content

Updated on Aug 16, 2024 11:57 IST

The Partition By SQL clause is an optional subclause of the OVER clause, which accompanies every invocation of a window function such as MAX(), RANK() and AVG().

What is Partition By SQL Clause?
Usage
Examples
Applications
Types of Partition By SQL Clause

What is Partition By SQL Clause?

In SQL, the “PARTITION BY” clause is often used in the context of window functions. It is used to specify the columns by which the rows of a result set should be divided into partitions. Within each partition, the window function operates independently, treating the rows in each partition as if they were a separate group.

Let’s take an example. Consider a table that contains information about sales transactions, with one row per transaction. If we want to compute the running total of sales for each store, we can use a window function with a “PARTITION BY” clause that specifies the “store” column, as shown below:

SELECT store, sales, SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
FROM sales_table
ORDER BY store, transaction_date;

The query above returns a result set that includes a “running_total” column, which shows the cumulative total of sales for each store up to and including the current row. The “PARTITION BY” clause ensures that the running total restarts for each store, so that it only considers the sales of that particular store.
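The running total described above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first version with window functions); the sample rows are invented for illustration, while the table and column names follow the query above.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_table (store TEXT, transaction_date TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 100),
        ("A", "2024-01-02", 50),
        ("B", "2024-01-01", 200),
        ("A", "2024-01-03", 25),
        ("B", "2024-01-02", 10),
    ],
)

# Same query as in the article: the running total restarts for each store.
rows = conn.execute(
    """
    SELECT store, sales,
           SUM(sales) OVER (PARTITION BY store ORDER BY transaction_date) AS running_total
    FROM sales_table
    ORDER BY store, transaction_date
    """
).fetchall()

for row in rows:
    print(row)
```

With this data, store A accumulates 100, 150, 175, and store B starts again from 200.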

Usage of “Partition By” SQL Clause

The “PARTITION BY” clause in SQL is written inside the OVER clause of a window function and specifies the columns or expressions by which the rows of a query should be partitioned. Here are a few examples to illustrate its usage:

1. Partitioning by a single column:

SELECT department, salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;

2. Partitioning by multiple columns:

SELECT department, location, salary,
       AVG(salary) OVER (PARTITION BY department, location) AS avg_salary
FROM employees;

3. Partitioning by a calculated value:

-- Assumes hire_date is stored as a numeric day count
SELECT FLOOR(hire_date / 7) AS week, salary,
       AVG(salary) OVER (PARTITION BY FLOOR(hire_date / 7)) AS avg_salary
FROM employees;

In the above examples, the rows are partitioned based on the columns or expressions specified in the PARTITION BY clause, and the aggregate function (e.g., AVG, SUM) is then applied to each partition. Unlike GROUP BY, which collapses each group into a single row, PARTITION BY keeps every row and attaches the result for its partition to it. PARTITION BY cannot be appended after GROUP BY as a clause of its own; for window functions it belongs inside OVER.
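The difference between grouping and partitioning is easiest to see side by side. The following is a small sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the employee names and salaries are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 100), ("Bob", "IT", 200), ("Cy", "HR", 90)],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

# PARTITION BY keeps every employee row and adds the department average to it.
windowed = conn.execute(
    """
    SELECT name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_salary
    FROM employees
    ORDER BY name
    """
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns two rows (one per department), while the windowed query returns all three employees, each carrying the average of their own department.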

Examples of “Partition By” SQL Clause

Below are a few detailed examples of the PARTITION BY clause in SQL:

1. Partitioning sales data by year and product category:

SELECT year, category, sales,
       SUM(sales) OVER (PARTITION BY year, category) AS total_sales
FROM sales_data;

In this example, the “PARTITION BY” clause partitions the sales data by year and product category. The SUM function calculates the total sales for each partition and attaches that total to every row in it. No “GROUP BY” clause is needed, because the rows are not collapsed.

2. Partitioning employee data by hire date and department:

SELECT department, hire_date, salary,
       AVG(salary) OVER (PARTITION BY hire_date, department) AS avg_salary
FROM employees;

In this example, the PARTITION BY clause partitions the employee data by hire date and department. The AVG function calculates the average salary for each partition and returns it alongside each employee row.

3. Partitioning product data by month and product name:

SELECT product_name, MONTH(order_date) AS month, quantity,
       SUM(quantity) OVER (PARTITION BY MONTH(order_date), product_name) AS total_quantity
FROM product_orders;

In this example, the “PARTITION BY” clause partitions the product data by the month of the order date and the product name. The SUM function calculates the total quantity of each product ordered in each partition. The MONTH() function is available in databases such as MySQL and SQL Server; other systems use a different date function.

In each of these examples, the “PARTITION BY” clause computes an aggregate over subgroups of data without collapsing the rows, so each row can be shown next to the total or average of its own group. This is useful for comparing individual rows with their group and for organizing results in a specific way.

Applications of Partition By Clause

The “PARTITION BY” clause in SQL is used in various applications where you need to perform calculations based on subsets of data within a larger set. Here are a few common uses of the PARTITION BY clause:

Data Aggregation- The “PARTITION BY” clause can be used to perform aggregation (such as sum, average, count, etc.) on subsets of data based on specific columns. For example, you can calculate the total sales of each product in each quarter of the year.
Window Function- The “PARTITION BY” clause is used with window functions (such as ROW_NUMBER(), RANK(), DENSE_RANK(), etc.) to perform calculations based on subsets of data. For example, you can calculate the running total of sales for each product in a given time period.
Rank Calculation- The “PARTITION BY” clause can be used to calculate the rank of rows within a subset of data. For example, you can determine the rank of each employee within their department based on their salary.
Pivot Tables- The “PARTITION BY” clause can support pivot-style summaries in SQL, where data is presented in a compact format with rows and columns, although the pivoting itself is typically done with conditional aggregation or a vendor-specific PIVOT operator. For example, per-region totals computed with “PARTITION BY” can feed a summary that shows the total sales of each product by region.
Report Generation- The “PARTITION BY” clause can be used to generate reports that summarize data based on specific columns. For example, you can generate a report that shows the average salary of employees by department and year of hire.

In all of these applications, the “PARTITION BY” clause is used to divide a large set of data into smaller partitions, allowing you to perform more specific calculations on each partition and make more informed decisions based on the results.
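As an illustration of the rank calculation mentioned in the list above, here is a hedged sketch that ranks employees within their department by salary. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, and the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 300), ("Bob", "IT", 300), ("Cy", "IT", 200), ("Di", "HR", 150)],
)

# RANK() leaves a gap after ties, DENSE_RANK() does not.
# Both restart at 1 for every department because of PARTITION BY.
rows = conn.execute(
    """
    SELECT department, name, salary,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_salary_rank
    FROM employees
    ORDER BY department, salary_rank, name
    """
).fetchall()

for row in rows:
    print(row)
```

Ann and Bob tie for rank 1 in IT, so Cy receives rank 3 from RANK() and rank 2 from DENSE_RANK(), while Di is ranked 1 in HR independently of the IT rows.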

Types Of Partition By SQL Clause

As mentioned above, the “PARTITION BY” SQL clause is used in the context of window functions to divide the result set into partitions or groups. Each partition is processed independently, and the window function is applied to each partition.

The different types of partitioning that can be done using the “PARTITION BY” clause are illustrated below:

1. Partition By a Single Column

In this type of partitioning, the result set is divided into partitions based on the values of a single column. For example, you can partition the result set by the values of the “Department” column.

A practical query example is as follows:

SELECT 
    EmployeeID,
    Department,
    Salary,
    SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal
FROM 
    Employees;

In the above example, the result set is divided into partitions based on the values of the “Department” column. The SUM function calculates the running total of the salary for each department.
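One detail worth checking when a running total is ordered by a column with duplicate values, such as Salary: under the default window frame, rows with equal ordering values are treated as peers and receive the same cumulative value. The sketch below compares the default behaviour with an explicit ROWS frame and a tie-breaking column. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later; the rows and the RowByRowTotal column name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "IT", 100), (3, "IT", 200), (4, "HR", 50)],
)

rows = conn.execute(
    """
    SELECT EmployeeID, Department, Salary,
           -- Default frame: employees with equal salaries share one cumulative value.
           SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal,
           -- Explicit ROWS frame plus a tie-breaker: the total grows row by row.
           SUM(Salary) OVER (
               PARTITION BY Department
               ORDER BY Salary, EmployeeID
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS RowByRowTotal
    FROM Employees
    ORDER BY Department, Salary, EmployeeID
    """
).fetchall()

for row in rows:
    print(row)
```

Employees 1 and 2 both earn 100, so RunningTotal shows 200 for both of them, whereas RowByRowTotal shows 100 and then 200.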

2. Partition By Multiple Columns

In this type of partitioning, the result set is divided into partitions based on the values of multiple columns. For example, you can partition the result set by both the “Department” and “Designation” columns.

A practical query example is as follows:

SELECT 
    EmployeeID,
    Department,
    Designation,
    Salary,
    SUM(Salary) OVER (PARTITION BY Department, Designation ORDER BY Salary) AS RunningTotal
FROM 
    Employees;

In this example, the result set is divided into partitions based on the values of both the “Department” and “Designation” columns. The SUM function calculates the running total of the salary for each department and designation combination.

3. Partition By Expressions

In this type of partitioning, the window is defined with the result of a mathematical expression or user-defined function rather than a plain column. For example, you can compute a running total over an expression that calculates the total salary for each employee.

SELECT 
    EmployeeID,
    Department,
    Salary,
    (Salary + (Salary * 0.1)) AS TotalSalary,
    SUM(Salary + (Salary * 0.1)) OVER (PARTITION BY Department ORDER BY Salary + (Salary * 0.1)) AS RunningTotal
FROM 
    Employees;

In this example, the expression that calculates the total salary for each employee is repeated inside the window function, because an alias such as TotalSalary generally cannot be referenced elsewhere in the same SELECT list. The rows are partitioned by Department, and the SUM function calculates the running total of the total salary for each department. An expression can also be placed directly in the PARTITION BY list, as the CASE expressions in the next two types show.
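A column alias such as TotalSalary generally cannot be referenced by another item in the same SELECT list, so one way to reuse a computed value in a window function is to define it first in a common table expression. The following is a sketch of that approach, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the CTE name Totals and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "IT", 200), (3, "HR", 300)],
)

rows = conn.execute(
    """
    -- Compute the expression once, then refer to it by name in the window.
    WITH Totals AS (
        SELECT EmployeeID, Department, Salary + (Salary * 0.1) AS TotalSalary
        FROM Employees
    )
    SELECT EmployeeID, Department, TotalSalary,
           SUM(TotalSalary) OVER (PARTITION BY Department ORDER BY TotalSalary) AS RunningTotal
    FROM Totals
    ORDER BY Department, TotalSalary
    """
).fetchall()

for row in rows:
    print(row)
```

Because the expression involves a fractional factor, the results are floating-point values, so exact equality comparisons on them are best avoided.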

4. Partition By Range

In this type of partitioning, the result set is divided into partitions based on the values of a single column, where each partition represents a range of values. For example, you can partition the result set by the values of the “Age” column, where each partition represents a range of 5 years.

SELECT 
    EmployeeID,
    Age,
    Salary,
    SUM(Salary) OVER (PARTITION BY 
        CASE 
            WHEN Age BETWEEN 18 AND 22 THEN '18-22'
            WHEN Age BETWEEN 23 AND 27 THEN '23-27'
            ELSE '28+' 
        END
    ORDER BY Age) AS RunningTotal
FROM 
    Employees;

In this example, the result set is divided into partitions based on the range of values of the “Age” column. The SUM function calculates the running total of the salary for each age range.
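To see how the age ranges behave, the query above can be run against a few invented rows. This is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the employee data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 19, 100), (2, 21, 200), (3, 25, 300), (4, 35, 500), (5, 40, 400)],
)

# The CASE expression assigns each row to an age range; the running total
# restarts whenever the range changes.
rows = conn.execute(
    """
    SELECT EmployeeID, Age, Salary,
           SUM(Salary) OVER (PARTITION BY
               CASE
                   WHEN Age BETWEEN 18 AND 22 THEN '18-22'
                   WHEN Age BETWEEN 23 AND 27 THEN '23-27'
                   ELSE '28+'
               END
           ORDER BY Age) AS RunningTotal
    FROM Employees
    ORDER BY Age
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the ELSE branch also catches any age below 18 or a missing age, so a stricter query would add explicit conditions for those cases.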

5. Partition By List

In this type of partitioning, the result set is divided into partitions based on specific values of a single column. For example, you can partition the result set by the values of the “Department” column, where each partition represents a specific department.

SELECT 
    EmployeeID,
    Department,
    Salary,
    SUM(Salary) OVER (PARTITION BY 
        CASE Department
            WHEN 'IT' THEN 'IT'
            WHEN 'HR' THEN 'HR'
            ELSE 'OTHER' 
        END
    ORDER BY Department) AS RunningTotal
FROM 
    Employees;

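In this example, the result set is divided into partitions based on specific values of the “Department” column: one partition for IT, one for HR, and one for all other departments. The SUM function calculates the running total of the salary within each of these partitions. Because the window is ordered by Department, rows from the same department are peers under the default window frame and share the same cumulative value.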

Note that the “PARTITION BY” clause is optional, and if it is not specified, the entire result set is treated as a single partition.
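The effect of omitting PARTITION BY can be sketched by placing an empty OVER () next to a partitioned window in the same query. The example assumes Python's built-in sqlite3 module with SQLite 3.25 or later; the rows and the GrandTotal and DepartmentTotal column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "IT", 300), (3, "HR", 600)],
)

rows = conn.execute(
    """
    SELECT EmployeeID, Salary,
           SUM(Salary) OVER ()                        AS GrandTotal,      -- whole result set
           SUM(Salary) OVER (PARTITION BY Department) AS DepartmentTotal  -- per department
    FROM Employees
    ORDER BY EmployeeID
    """
).fetchall()

for row in rows:
    print(row)
```

Every row carries the same GrandTotal of 1000, while DepartmentTotal is 400 for the two IT rows and 600 for the HR row.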

Advantages Of Partition By SQL Clause

The “PARTITION BY” clause in SQL has several advantages, including:

Improved Performance: A window function with PARTITION BY can often replace a self-join or correlated subquery that computes the same per-group value, which may reduce the work the database has to do. The actual effect depends on the database engine, the indexes, and the data, so it is worth checking the query plan.
Better Organization: Partitioning the result set can help organize the data into meaningful groups, making it easier to understand and analyze.
Increased Flexibility: Different window functions in the same query can use different partitions, which allows more complex calculations in a single statement.
Improved Readability: A window function with PARTITION BY states the grouping next to the calculation it applies to, which can make the SQL code more readable than an equivalent subquery or join.
Easier Maintenance: Keeping the grouping and the calculation in one expression can make the SQL code easier to understand and modify.
Better Scalability: Because each partition is processed independently, some database engines can evaluate partitions in parallel, which can help the calculations scale as the data grows.

Conclusion

The “PARTITION BY” clause in SQL is used to divide a result set into partitions based on the values of one or more columns or expressions. Window functions then perform calculations, such as running totals, averages, or ranks, within each partition while keeping every row in the result. This can make the SQL code more readable and flexible, and in some cases faster than equivalent subqueries or self-joins.

The “PARTITION BY” SQL clause is an important tool for data analysis and reporting, allowing more complex and meaningful calculations to be performed on large result sets.

Contributed by: Nimisha

About the Author

Vikram Singh

Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...
