Last modified: November 07, 2024
This article is written in: 🇺🇸
Aggregate functions in SQL are powerful tools that allow you to perform calculations on a set of values to return a single scalar value. They are commonly used with the GROUP BY
clause to group rows that share a common attribute and then perform calculations on each group. Aggregate functions are essential for data analysis, reporting, and generating insights from your data.
After reading the material, you should be able to answer the following questions:
GROUP BY
clause?COUNT
function work, and what is the difference between COUNT(*)
and COUNT(column)
?SUM
, AVG
, MIN
, and MAX
functions, and how do they operate on data?HAVING
clause, and how does it differ from the WHERE
clause when filtering aggregated data?Here are the most commonly used aggregate functions in SQL:
COUNT
function calculates the number of rows in a dataset that meet a specified condition or the total number of rows when no condition is provided.SUM
function adds together all the numeric values in a specified column.AVG
function computes the average of all numeric values in a column by dividing the sum of the values by the number of non-NULL entries.MIN
function identifies the smallest value in a dataset, including numeric, string, and date types.MAX
function returns the largest value in a dataset, similar to MIN
, and supports numeric, string, and date types.Suppose we have two tables: Employees
and Departments
.
Employees Table
EmployeeID | FirstName | LastName | DepartmentID | Salary |
1 | John | Doe | 1 | 60000 |
2 | Jane | Smith | 1 | 65000 |
3 | Mike | Johnson | 2 | 70000 |
4 | Emily | Davis | 2 | 72000 |
5 | David | Wilson | 3 | 55000 |
Departments Table
DepartmentID | DepartmentName |
1 | Human Resources |
2 | Engineering |
3 | Marketing |
The COUNT
function returns the number of rows that match a specified condition.
SELECT COUNT(*) AS TotalEmployees
FROM Employees;
Result
TotalEmployees |
5 |
SELECT DepartmentID, COUNT(*) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID;
Result
DepartmentID | NumberOfEmployees |
1 | 2 |
2 | 2 |
3 | 1 |
GROUP BY
clause groups the rows by DepartmentID
.COUNT(*)
function counts the number of employees in each department.The SUM
function adds up all the values in a numeric column.
SELECT SUM(Salary) AS TotalSalary
FROM Employees;
Result
TotalSalary |
322000 |
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
Result
DepartmentID | TotalSalary |
1 | 125000 |
2 | 142000 |
3 | 55000 |
The SUM(Salary)
function calculates the total salary for each department.
The AVG
function calculates the average value of a numeric column.
SELECT AVG(Salary) AS AverageSalary
FROM Employees;
Result
AverageSalary |
64400 |
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
Result
DepartmentID | AverageSalary |
1 | 62500 |
2 | 71000 |
3 | 55000 |
The AVG(Salary)
function computes the average salary for each department.
The MIN
and MAX
functions return the smallest and largest values in a set, respectively.
SELECT MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary
FROM Employees;
Result
MinimumSalary | MaximumSalary |
55000 | 72000 |
SELECT DepartmentID, MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary
FROM Employees
GROUP BY DepartmentID;
Result
DepartmentID | MinimumSalary | MaximumSalary |
1 | 60000 | 65000 |
2 | 70000 | 72000 |
3 | 55000 | 55000 |
The MIN(Salary)
and MAX(Salary)
functions find the lowest and highest salaries in each department.
The GROUP BY
clause is used with aggregate functions to group the result set by one or more columns.
To make the results more readable, let's join the Employees
and Departments
tables.
SELECT d.DepartmentName, COUNT(*) AS NumberOfEmployees
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
GROUP BY d.DepartmentName;
Result
DepartmentName | NumberOfEmployees |
Human Resources | 2 |
Engineering | 2 |
Marketing | 1 |
JOIN
clause combines the Employees
and Departments
tables.GROUP BY
clause groups the results by DepartmentName
.The HAVING
clause is used to filter groups based on a condition, similar to how the WHERE
clause filters rows.
SELECT DepartmentID, COUNT(*) AS NumberOfEmployees
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(*) > 1;
Result
DepartmentID | NumberOfEmployees |
1 | 2 |
2 | 2 |
The HAVING
clause filters groups where the count of employees is greater than one.
You can use multiple aggregate functions in a single query to get comprehensive insights.
SELECT
d.DepartmentName,
COUNT(*) AS NumberOfEmployees,
MIN(e.Salary) AS MinimumSalary,
MAX(e.Salary) AS MaximumSalary,
AVG(e.Salary) AS AverageSalary,
SUM(e.Salary) AS TotalSalary
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
GROUP BY d.DepartmentName;
Result
DepartmentName | NumberOfEmployees | MinimumSalary | MaximumSalary | AverageSalary | TotalSalary |
Human Resources | 2 | 60000 | 65000 | 62500 | 125000 |
Engineering | 2 | 70000 | 72000 | 71000 | 142000 |
Marketing | 1 | 55000 | 55000 | 55000 | 55000 |
The query provides a comprehensive overview of salary statistics for each department.
Aggregate functions generally ignore NULL
values except for the COUNT(*)
function.
Suppose we have an additional employee with a NULL
salary.
Updated Employees Table
EmployeeID | FirstName | LastName | DepartmentID | Salary |
6 | Susan | Miller | 1 | NULL |
SELECT DepartmentID, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY DepartmentID;
Result
DepartmentID | AverageSalary |
1 | 62500 |
2 | 71000 |
3 | 55000 |
AVG
function ignores the NULL
salary for Susan Miller.The DISTINCT
keyword can be used inside aggregate functions to consider only unique values.
SELECT COUNT(DISTINCT Salary) AS UniqueSalaries
FROM Employees;
Result
UniqueSalaries |
5 |
The COUNT(DISTINCT Salary)
function counts the number of unique salary values, excluding NULL
.
Aggregate functions can be used in subqueries to compare individual rows to aggregate values.
SELECT FirstName, LastName, Salary
FROM Employees
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
);
Result
FirstName | LastName | Salary |
Mike | Johnson | 70000 |
Emily | Davis | 72000 |
In addition to aggregate functions, SQL supports window functions that perform calculations across a set of rows related to the current row.
SELECT
EmployeeID,
FirstName,
LastName,
Salary,
SUM(Salary) OVER (ORDER BY EmployeeID) AS RunningTotal
FROM Employees;
Result
EmployeeID | FirstName | LastName | Salary | RunningTotal |
1 | John | Doe | 60000 | 60000 |
2 | Jane | Smith | 65000 | 125000 |
3 | Mike | Johnson | 70000 | 195000 |
4 | Emily | Davis | 72000 | 267000 |
5 | David | Wilson | 55000 | 322000 |
6 | Susan | Miller | NULL | NULL |
SUM(Salary) OVER (ORDER BY EmployeeID)
calculates a running total of salaries.NULL
values are handled according to the window function's rules.WHERE
clause to filter rows before performing aggregation, ensuring only relevant data is included in calculations.SUM
, AVG
) ignore NULL
values, which might lead to unexpected results.GROUP BY
clause to only those essential for your analysis, as excessive grouping can increase query complexity and runtime.HAVING
clause to filter groups after aggregation, allowing conditions based on aggregated results.