Last modified: June 17, 2024
This article is written in: 🇺🇸
Welcome to the world of SQL, where you can communicate with databases using simple, yet powerful commands. SQL, which stands for Structured Query Language, is a standardized language designed specifically for managing and querying relational databases.
After reading the material, you should be able to answer the following questions:
At its core, SQL allows you to perform a variety of operations on data stored in relational databases. These databases organize data into tables, which consist of rows and columns, much like a spreadsheet. Each table represents a specific entity, such as customers or orders, and each column represents an attribute of that entity.
Imagine a simple table of customers:
+----+-----------+-----------+-------------------+
| ID | FirstName | LastName | Email |
+----+-----------+-----------+-------------------+
| 1 | Alice | Smith | alice@example.com |
| 2 | Bob | Johnson | bob@example.com |
| 3 | Carol | Williams | carol@example.com |
+----+-----------+-----------+-------------------+
This table, named Customers
, stores basic information about each customer. SQL allows you to interact with this table in various ways, such as retrieving all customers, finding a customer by email, or adding a new customer to the list.
Before diving into writing SQL commands, it's important to grasp some fundamental concepts that form the foundation of SQL and relational databases.
+------------------------------------------------------------+
| Schema |
| (Organizational Framework) |
| |
| +----------------------+ +-------------------------+ |
| | Table 1 | | Table 2 | |
| |----------------------| |-------------------------| |
| | ID | Name | | ProductID | ProductName | |
| |--------|-------------| |-----------|-------------| |
| | 1 | Alice | | 101 | Widget | |
| | 2 | Bob | | 102 | Gizmo | |
| | 3 | Charlie | | 103 | Thingamajig | |
| +----------------------+ +-------------------------+ |
| |
| +----------------------+ |
| | Table 3 | |
| |----------------------| |
| | OrderID | CustomerID | |
| |---------|------------| |
| | 5001 | 1 | |
| | 5002 | 2 | |
| | 5003 | 3 | |
| +----------------------+ |
| |
+------------------------------------------------------------+
Every column in a table has a data type, which defines the kind of data it can store. Common data types include:
n
characters.Understanding data types is crucial for defining tables and ensuring data integrity.
These keys are essential for maintaining relationships and enforcing data integrity across tables.
SQL queries are statements that retrieve data from one or more tables. The most common SQL command is SELECT
, which allows you to specify which columns of data you want to retrieve.
The basic syntax of a SELECT
statement is:
SELECT column1, column2 FROM table_name;
For example, to retrieve all first names and last names from the Customers
table:
SELECT FirstName, LastName FROM Customers;
Output:
+-----------+-----------+
| FirstName | LastName |
+-----------+-----------+
| Alice | Smith |
| Bob | Johnson |
| Carol | Williams |
+-----------+-----------+
The WHERE
clause filters records based on specified conditions.
SELECT * FROM Customers WHERE LastName = 'Johnson';
Output:
+----+-----------+----------+-----------------+
| ID | FirstName | LastName | Email |
+----+-----------+----------+-----------------+
| 2 | Bob | Johnson | bob@example.com |
+----+-----------+----------+-----------------+
You can sort the results using the ORDER BY
clause.
SELECT * FROM Customers ORDER BY LastName ASC;
This query retrieves all customers sorted alphabetically by last name.
To limit the number of records returned, use the LIMIT
clause (Note: Some SQL implementations use TOP
instead).
SELECT * FROM Customers LIMIT 2;
Output:
+----+-----------+-----------+-----------------+
| ID | FirstName | LastName | Email |
+----+-----------+-----------+-----------------+
| 1 | Alice | Smith | alice@example.com |
| 2 | Bob | Johnson | bob@example.com |
+----+-----------+-----------+-----------------+
One of the powerful features of SQL is the ability to combine data from multiple tables using joins. This is essential when dealing with normalized databases where related data is stored in separate tables.
NULL
.NULL
.NULL
in unmatched columns.Suppose we have an Orders
table:
+----+------------+-----------+
| ID | CustomerID | OrderDate |
+----+------------+-----------+
| 1 | 1 | 2023-01-15|
| 2 | 2 | 2023-01-17|
| 3 | 1 | 2023-01-20|
+----+------------+-----------+
To retrieve orders along with customer names:
SELECT Customers.FirstName, Customers.LastName, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.ID = Orders.CustomerID;
Output:
+-----------+----------+------------+
| FirstName | LastName | OrderDate |
+-----------+----------+------------+
| Alice | Smith | 2023-01-15 |
| Bob | Johnson | 2023-01-17 |
| Alice | Smith | 2023-01-20 |
+-----------+----------+------------+
This query joins the Customers
and Orders
tables on the matching ID
and CustomerID
fields, displaying a combined view of data from both tables.
SQL allows you to write queries within queries, known as subqueries, and to define temporary result sets using Common Table Expressions (CTEs).
A subquery is a query nested inside another SQL statement.
SELECT FirstName, LastName
FROM Customers
WHERE ID IN (SELECT CustomerID FROM Orders WHERE OrderDate > '2023-01-16');
This query retrieves customers who have placed orders after January 16, 2023.
CTEs provide a way to define temporary named result sets that can be referenced within the main query.
WITH RecentOrders AS (
SELECT CustomerID, OrderDate
FROM Orders
WHERE OrderDate > '2023-01-16'
)
SELECT Customers.FirstName, Customers.LastName, RecentOrders.OrderDate
FROM Customers
JOIN RecentOrders ON Customers.ID = RecentOrders.CustomerID;
This achieves the same result as the previous subquery but can be more readable, especially with complex queries.
SQL provides functions to perform calculations on sets of rows, such as counting records, calculating averages, and summing values.
Common aggregate functions include:
NULL
values based on the context.The GROUP BY
clause groups rows that have the same values in specified columns, allowing aggregate functions to be applied to each group.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID;
Output:
+------------+------------+
| CustomerID | OrderCount |
+------------+------------+
| 1 | 2 |
| 2 | 1 |
+------------+------------+
This query counts the number of orders placed by each customer.
In SQL, NULL
represents missing or unknown data. It's important to handle NULL
values properly to avoid unexpected results.
Use the IS NULL
and IS NOT NULL
operators.
SELECT * FROM Customers WHERE Email IS NULL;
This retrieves customers who do not have an email address on file.
When performing joins, NULL
values can affect the results. For example, a customer without orders may not appear in an INNER JOIN
. Using a LEFT JOIN
ensures all customers are included.
SELECT Customers.FirstName, Orders.OrderDate
FROM Customers
LEFT JOIN Orders ON Customers.ID = Orders.CustomerID;
This retrieves all customers, along with their orders if they have any.
SQL is standardized by ANSI (American National Standards Institute) and ISO (International Organization for Standardization), ensuring consistency across different database systems. However, each database system may have its own extensions and variations.
Open Source Databases:
Proprietary Databases:
Understanding the specific features and syntax variations of each database system is important when working in different environments.
While the SQL standard defines a set of data types, each database system may implement additional types or have variations.
Data Type | Description |
INTEGER | Whole numbers |
DECIMAL(p,s) | Fixed-point numbers with precision p and scale s |
VARCHAR(n) | Variable-length strings up to n characters |
DATE | Date values (year, month, day) |
TIMESTAMP | Date and time values |
BOOLEAN | Logical true or false |
TINYINT
for small integers, TEXT
for large text data, BLOB
for binary large objects, and ENUM
for predefined string values.SERIAL
for auto-incrementing integers, ARRAY
types to store arrays, and JSONB
for efficiently storing and querying JSON data.UNIQUEIDENTIFIER
for globally unique identifiers (GUIDs), MONEY
for precise currency values, and NVARCHAR
for storing Unicode strings.Beyond querying data, SQL provides commands to insert, update, and delete records.
INSERT INTO Customers (FirstName, LastName, Email)
VALUES ('David', 'Brown', 'david@example.com');
This adds a new customer to the Customers
table.
UPDATE Customers
SET Email = 'alice.smith@example.com'
WHERE ID = 1;
This updates Alice's email address in the Customers
table.
DELETE FROM Customers WHERE ID = 2;
This removes the customer with ID
2 from the table.
SQL also allows you to define the structure of your database using Data Definition Language (DDL) statements.
CREATE TABLE Products (
ID INT PRIMARY KEY,
Name VARCHAR(100),
Price DECIMAL(10,2),
Stock INT
);
This command creates a new table named Products
with specified columns and data types.
ALTER TABLE Products ADD Description TEXT;
This adds a new column Description
to the Products
table.
DROP TABLE Products;
This command deletes the Products
table and all of its data.
Stored procedures and functions are blocks of SQL code that can be saved and reused, allowing for more complex operations and logic.
CREATE PROCEDURE GetCustomerOrders(IN custID INT)
BEGIN
SELECT * FROM Orders WHERE CustomerID = custID;
END;
This procedure retrieves all orders for a given customer ID.
CALL GetCustomerOrders(1);
This executes the procedure with custID
equal to 1.
Transactions ensure that a series of SQL commands either all succeed or all fail, maintaining database integrity.
START TRANSACTION;
COMMIT;
ROLLBACK;
Transactions are essential for operations that involve multiple steps, ensuring that the database doesn't end up in an inconsistent state if an error occurs.
Indexes improve the speed of data retrieval by providing quick access paths to data within tables. By creating indexes on frequently searched columns, the database can locate and retrieve the desired rows more efficiently, reducing the need to scan entire tables. This optimization is crucial for enhancing the performance of queries, especially in large databases where full table scans can be time-consuming. Additionally, indexes can help enforce uniqueness and improve the performance of join operations by providing efficient pathways between related tables.
CREATE INDEX idx_lastname ON Customers(LastName);
This SQL command creates an index named idx_lastname
on the LastName
column of the Customers
table. By indexing the LastName
column, the database can quickly locate records based on last names without scanning the entire table. For example, when executing a query that searches for customers with a specific last name, the database will use the idx_lastname
index to find matching records more rapidly. The expected result of this command is the creation of the index, which can be confirmed by querying the database's metadata or using database management tools to view existing indexes on the Customers
table.
After executing the CREATE INDEX
command, you can verify the creation of the index by running the following command:
SHOW INDEX FROM Customers;
Sample Output:
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
Customers | 1 | idx_lastname | 1 | LastName | A | 1500 | NULL | NULL | YES | BTREE |
This output indicates that the idx_lastname
index has been successfully created on the LastName
column of the Customers
table. The Cardinality
value suggests the number of unique values in the index, which helps in assessing the index's effectiveness.
Most database systems provide tools to analyze how queries are executed, known as query plans. These plans detail the steps the database engine takes to execute a query, including which indexes are used, the order of operations, and the methods of data retrieval. By examining query plans, developers and database administrators can identify bottlenecks, such as unnecessary full table scans or inefficient join operations, and make informed decisions to optimize performance. Analyzing query plans is essential for understanding how changes to indexes or query structure can impact overall database performance.
EXPLAIN SELECT * FROM Customers WHERE LastName = 'Smith';
When this command is executed, the database generates a query plan that outlines how it will retrieve the requested data. The expected result is a detailed breakdown showing whether the idx_lastname
index is utilized to find records where the LastName
is 'Smith'. The query plan might indicate the use of an index scan, the estimated number of rows to be processed, and the cost associated with the operation. For instance, the output may show that the index is used to perform an efficient lookup, significantly reducing the query execution time compared to a full table scan. By analyzing this information, one can verify that the index is effectively enhancing query performance or identify areas where further optimization may be necessary.
After running the EXPLAIN
command, you might receive an output similar to the following:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | SIMPLE | Customers | ref | idx_lastname | idx_lastname | 767 | const | 10 | Using index |
This output indicates that the idx_lastname
index is being used to execute the query. The type
column shows a ref
type access, which is efficient and typically means the index is being utilized correctly. The rows
column estimates that only 10 rows need to be examined to satisfy the query, demonstrating a significant performance improvement over a full table scan. The Extra
column mentioning Using index
further confirms that the index is effectively optimizing the query execution.
Window functions perform calculations across sets of rows related to the current row.
SELECT
OrderID,
CustomerID,
OrderDate,
SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSpent
FROM Orders;
This query calculates the total amount each customer has spent across all their orders.
Partitioning divides large tables into smaller, more manageable pieces, improving performance and maintenance.
Replication involves copying data between databases to improve availability and reliability.
Understanding common issues in SQL can prevent errors and improve data integrity.