Last modified: November 07, 2024
This article is written in: 🇺🇸
Aggregate functions in SQL are powerful tools that allow you to perform calculations on a set of values to return a single scalar value. They are commonly used with the GROUP BY clause to group rows that share a common attribute and then perform calculations on each group. Aggregate functions are essential for data analysis, reporting, and generating insights from your data.
After reading the material, you should be able to answer the following questions:
GROUP BY clause?COUNT function work, and what is the difference between COUNT(*) and COUNT(column)?SUM, AVG, MIN, and MAX functions, and how do they operate on data?HAVING clause, and how does it differ from the WHERE clause when filtering aggregated data?Here are the most commonly used aggregate functions in SQL:
COUNT function calculates the number of rows in a dataset that meet a specified condition or the total number of rows when no condition is provided.SUM function adds together all the numeric values in a specified column.AVG function computes the average of all numeric values in a column by dividing the sum of the values by the number of non-NULL entries.MIN function identifies the smallest value in a dataset, including numeric, string, and date types.MAX function returns the largest value in a dataset, similar to MIN, and supports numeric, string, and date types.Suppose we have two tables: Employees and Departments.
Employees Table
| EmployeeID | FirstName | LastName | DepartmentID | Salary |
| 1 | John | Doe | 1 | 60000 |
| 2 | Jane | Smith | 1 | 65000 |
| 3 | Mike | Johnson | 2 | 70000 |
| 4 | Emily | Davis | 2 | 72000 |
| 5 | David | Wilson | 3 | 55000 |
Departments Table
| DepartmentID | DepartmentName |
| 1 | Human Resources |
| 2 | Engineering |
| 3 | Marketing |
The COUNT function returns the number of rows that match a specified condition.
SELECT COUNT(*) AS TotalEmployees
FROM Employees;
Result
| TotalEmployees |
| 5 |
SELECT DepartmentID, COUNT(*) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID;
Result
| DepartmentID | NumberOfEmployees |
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
GROUP BY clause groups the rows by DepartmentID.COUNT(*) function counts the number of employees in each department.The SUM function adds up all the values in a numeric column.
SELECT SUM(Salary) AS TotalSalary
FROM Employees;
Result
| TotalSalary |
| 322000 |
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
Result
| DepartmentID | TotalSalary |
| 1 | 125000 |
| 2 | 142000 |
| 3 | 55000 |
The SUM(Salary) function calculates the total salary for each department.
The AVG function calculates the average value of a numeric column.
SELECT AVG(Salary) AS AverageSalary
FROM Employees;
Result
| AverageSalary |
| 64400 |
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
Result
| DepartmentID | AverageSalary |
| 1 | 62500 |
| 2 | 71000 |
| 3 | 55000 |
The AVG(Salary) function computes the average salary for each department.
The MIN and MAX functions return the smallest and largest values in a set, respectively.
SELECT MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary
FROM Employees;
Result
| MinimumSalary | MaximumSalary |
| 55000 | 72000 |
SELECT DepartmentID, MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary
FROM Employees
GROUP BY DepartmentID;
Result
| DepartmentID | MinimumSalary | MaximumSalary |
| 1 | 60000 | 65000 |
| 2 | 70000 | 72000 |
| 3 | 55000 | 55000 |
The MIN(Salary) and MAX(Salary) functions find the lowest and highest salaries in each department.
The GROUP BY clause is used with aggregate functions to group the result set by one or more columns.
To make the results more readable, let's join the Employees and Departments tables.
SELECT d.DepartmentName, COUNT(*) AS NumberOfEmployees
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
GROUP BY d.DepartmentName;
Result
| DepartmentName | NumberOfEmployees |
| Human Resources | 2 |
| Engineering | 2 |
| Marketing | 1 |
JOIN clause combines the Employees and Departments tables.GROUP BY clause groups the results by DepartmentName.The HAVING clause is used to filter groups based on a condition, similar to how the WHERE clause filters rows.
SELECT DepartmentID, COUNT(*) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(*) > 1;
Result
| DepartmentID | NumberOfEmployees |
| 1 | 2 |
| 2 | 2 |
The HAVING clause filters groups where the count of employees is greater than one.
You can use multiple aggregate functions in a single query to get comprehensive insights.
SELECT
d.DepartmentName,
COUNT(*) AS NumberOfEmployees,
MIN(e.Salary) AS MinimumSalary,
MAX(e.Salary) AS MaximumSalary,
AVG(e.Salary) AS AverageSalary,
SUM(e.Salary) AS TotalSalary
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
GROUP BY d.DepartmentName;
Result
| DepartmentName | NumberOfEmployees | MinimumSalary | MaximumSalary | AverageSalary | TotalSalary |
| Human Resources | 2 | 60000 | 65000 | 62500 | 125000 |
| Engineering | 2 | 70000 | 72000 | 71000 | 142000 |
| Marketing | 1 | 55000 | 55000 | 55000 | 55000 |
The query provides a comprehensive overview of salary statistics for each department.
Aggregate functions generally ignore NULL values except for the COUNT(*) function.
Suppose we have an additional employee with a NULL salary.
Updated Employees Table
| EmployeeID | FirstName | LastName | DepartmentID | Salary |
| 6 | Susan | Miller | 1 | NULL |
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
Result
| DepartmentID | AverageSalary |
| 1 | 62500 |
| 2 | 71000 |
| 3 | 55000 |
AVG function ignores the NULL salary for Susan Miller.The DISTINCT keyword can be used inside aggregate functions to consider only unique values.
SELECT COUNT(DISTINCT Salary) AS UniqueSalaries
FROM Employees;
Result
| UniqueSalaries |
| 5 |
The COUNT(DISTINCT Salary) function counts the number of unique salary values, excluding NULL.
Aggregate functions can be used in subqueries to compare individual rows to aggregate values.
SELECT FirstName, LastName, Salary
FROM Employees
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
);
Result
| FirstName | LastName | Salary |
| Mike | Johnson | 70000 |
| Emily | Davis | 72000 |
In addition to aggregate functions, SQL supports window functions that perform calculations across a set of rows related to the current row.
SELECT
EmployeeID,
FirstName,
LastName,
Salary,
SUM(Salary) OVER (ORDER BY EmployeeID) AS RunningTotal
FROM Employees;
Result
| EmployeeID | FirstName | LastName | Salary | RunningTotal |
| 1 | John | Doe | 60000 | 60000 |
| 2 | Jane | Smith | 65000 | 125000 |
| 3 | Mike | Johnson | 70000 | 195000 |
| 4 | Emily | Davis | 72000 | 267000 |
| 5 | David | Wilson | 55000 | 322000 |
| 6 | Susan | Miller | NULL | NULL |
SUM(Salary) OVER (ORDER BY EmployeeID) calculates a running total of salaries.NULL values are handled according to the window function's rules.WHERE clause to filter rows before performing aggregation, ensuring only relevant data is included in calculations.SUM, AVG) ignore NULL values, which might lead to unexpected results.GROUP BY clause to only those essential for your analysis, as excessive grouping can increase query complexity and runtime.HAVING clause to filter groups after aggregation, allowing conditions based on aggregated results.