Introduction to SQL

Last modified: June 17, 2024

This article is written in: 🇺🇸

Introduction to SQL

Welcome to the world of SQL, where you can communicate with databases using simple, yet powerful commands. SQL, which stands for Structured Query Language, is a standardized language designed specifically for managing and querying relational databases.

After reading the material, you should be able to answer the following questions:

What are the fundamental components and data types in SQL, and how are they used to structure and manage data within relational databases?
How can you construct and execute basic SQL queries to retrieve, filter, sort, and limit data?
What are primary and foreign keys, and how do they enforce relationships and data integrity within a database?
How do joins, subqueries, and common table expressions (CTEs) function to retrieve and manipulate data across multiple tables?
What are the techniques for optimizing SQL queries, including the use of indexes, understanding query plans, and managing transactions and concurrency control?

Understanding the Basics

At its core, SQL allows you to perform a variety of operations on data stored in relational databases. These databases organize data into tables, which consist of rows and columns, much like a spreadsheet. Each table represents a specific entity, such as customers or orders, and each column represents an attribute of that entity.

Imagine a simple table of customers:

+----+-----------+-----------+-------------------+
| ID | FirstName | LastName  | Email             |
+----+-----------+-----------+-------------------+
| 1  | Alice     | Smith     | alice@example.com |
| 2  | Bob       | Johnson   | bob@example.com   |
| 3  | Carol     | Williams  | carol@example.com |
+----+-----------+-----------+-------------------+

This table, named Customers, stores basic information about each customer. SQL allows you to interact with this table in various ways, such as retrieving all customers, finding a customer by email, or adding a new customer to the list.

Concepts in SQL

Before diving into writing SQL commands, it's important to grasp some fundamental concepts that form the foundation of SQL and relational databases.

Tables and Schemas

Tables are structured collections of related data, organized into rows (records) and columns (attributes), serving as the primary storage units in a database.
Schemas are organizational frameworks within a database, grouping related tables, views, indexes, and other objects to manage and structure data effectively.

+------------------------------------------------------------+
|                      Schema                                |
|           (Organizational Framework)                       |
|                                                            |
|  +----------------------+      +-------------------------+ |
|  |        Table 1       |      |        Table 2          | |
|  |----------------------|      |-------------------------| |
|  | ID     | Name        |      | ProductID | ProductName | |
|  |--------|-------------|      |-----------|-------------| |
|  | 1      | Alice       |      | 101       | Widget      | |
|  | 2      | Bob         |      | 102       | Gizmo       | |
|  | 3      | Charlie     |      | 103       | Thingamajig | |
|  +----------------------+      +-------------------------+ |
|                                                            |
|  +----------------------+                                  |
|  |        Table 3       |                                  |
|  |----------------------|                                  |
|  | OrderID | CustomerID |                                  |
|  |---------|------------|                                  |
|  | 5001    | 1          |                                  |
|  | 5002    | 2          |                                  |
|  | 5003    | 3          |                                  |
|  +----------------------+                                  |
|                                                            |
+------------------------------------------------------------+

Data Types

Every column in a table has a data type, which defines the kind of data it can store. Common data types include:

INTEGER is used to store whole numbers without decimal points, suitable for numeric data like counts or IDs.
VARCHAR(n) stores variable-length character strings, allowing text with a maximum length of n characters.
DATE is designed to store calendar dates in the format year-month-day, enabling operations on date values.
BOOLEAN is used to represent true or false values, ideal for binary states or conditions.

Understanding data types is crucial for defining tables and ensuring data integrity.

Primary and Foreign Keys

Primary Key is a column or set of columns in a table that uniquely identifies each row, ensuring that no duplicate or null values exist for the key.
Foreign Key is a column in one table that references the primary key in another table, creating and enforcing a relationship between the two tables.

These keys are essential for maintaining relationships and enforcing data integrity across tables.

Writing Basic SQL Queries

SQL queries are statements that retrieve data from one or more tables. The most common SQL command is SELECT, which allows you to specify which columns of data you want to retrieve.

The SELECT Statement

The basic syntax of a SELECT statement is:

SELECT column1, column2 FROM table_name;

For example, to retrieve all first names and last names from the Customers table:

SELECT FirstName, LastName FROM Customers;

Output:

+-----------+-----------+
| FirstName | LastName  |
+-----------+-----------+
| Alice     | Smith     |
| Bob       | Johnson   |
| Carol     | Williams  |
+-----------+-----------+

Filtering Data with WHERE

The WHERE clause filters records based on specified conditions.

SELECT * FROM Customers WHERE LastName = 'Johnson';

Output:

+----+-----------+----------+-----------------+
| ID | FirstName | LastName | Email           |
+----+-----------+----------+-----------------+
| 2  | Bob       | Johnson  | bob@example.com |
+----+-----------+----------+-----------------+

Sorting Results with ORDER BY

You can sort the results using the ORDER BY clause.

SELECT * FROM Customers ORDER BY LastName ASC;

This query retrieves all customers sorted alphabetically by last name.

Limiting Results with LIMIT

To limit the number of records returned, use the LIMIT clause (Note: Some SQL implementations use TOP instead).

SELECT * FROM Customers LIMIT 2;

Output:

+----+-----------+-----------+-----------------+
| ID | FirstName | LastName  | Email           |
+----+-----------+-----------+-----------------+
| 1  | Alice     | Smith     | alice@example.com |
| 2  | Bob       | Johnson   | bob@example.com   |
+----+-----------+-----------+-----------------+

Joining Tables

One of the powerful features of SQL is the ability to combine data from multiple tables using joins. This is essential when dealing with normalized databases where related data is stored in separate tables.

Types of Joins

INNER JOIN returns only the records where there is a match between the specified columns in both tables.
LEFT JOIN includes all records from the left table and the matched records from the right table, with unmatched right table data appearing as NULL.
RIGHT JOIN includes all records from the right table and the matched records from the left table, with unmatched left table data appearing as NULL.
FULL OUTER JOIN combines the results of both left and right joins, returning all records from both tables, with NULL in unmatched columns.

Example of an INNER JOIN

Suppose we have an Orders table:

+----+------------+-----------+
| ID | CustomerID | OrderDate |
+----+------------+-----------+
| 1  | 1          | 2023-01-15|
| 2  | 2          | 2023-01-17|
| 3  | 1          | 2023-01-20|
+----+------------+-----------+

To retrieve orders along with customer names:

SELECT Customers.FirstName, Customers.LastName, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.ID = Orders.CustomerID;

Output:

+-----------+----------+------------+
| FirstName | LastName | OrderDate  |
+-----------+----------+------------+
| Alice     | Smith    | 2023-01-15 |
| Bob       | Johnson  | 2023-01-17 |
| Alice     | Smith    | 2023-01-20 |
+-----------+----------+------------+

This query joins the Customers and Orders tables on the matching ID and CustomerID fields, displaying a combined view of data from both tables.

Subqueries and Common Table Expressions

SQL allows you to write queries within queries, known as subqueries, and to define temporary result sets using Common Table Expressions (CTEs).

Subqueries

A subquery is a query nested inside another SQL statement.

SELECT FirstName, LastName
FROM Customers
WHERE ID IN (SELECT CustomerID FROM Orders WHERE OrderDate > '2023-01-16');

This query retrieves customers who have placed orders after January 16, 2023.

Common Table Expressions (CTEs)

CTEs provide a way to define temporary named result sets that can be referenced within the main query.

WITH RecentOrders AS (
    SELECT CustomerID, OrderDate
    FROM Orders
    WHERE OrderDate > '2023-01-16'
)
SELECT Customers.FirstName, Customers.LastName, RecentOrders.OrderDate
FROM Customers
JOIN RecentOrders ON Customers.ID = RecentOrders.CustomerID;

This achieves the same result as the previous subquery but can be more readable, especially with complex queries.

Aggregation and Grouping

SQL provides functions to perform calculations on sets of rows, such as counting records, calculating averages, and summing values.

Using Aggregate Functions

Common aggregate functions include:

COUNT() returns the total number of rows in a query, including or excluding NULL values based on the context.
SUM() computes the total sum of values in a numeric column, useful for aggregating data.
AVG() calculates the average value of a numeric column, providing the mean of the dataset.
MAX() and MIN() retrieve the highest and lowest values in a column, respectively, for comparison or range analysis.

Grouping Data with GROUP BY

The GROUP BY clause groups rows that have the same values in specified columns, allowing aggregate functions to be applied to each group.

SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID;

Output:

+------------+------------+
| CustomerID | OrderCount |
+------------+------------+
| 1          | 2          |
| 2          | 1          |
+------------+------------+

This query counts the number of orders placed by each customer.

Handling NULL Values

In SQL, NULL represents missing or unknown data. It's important to handle NULL values properly to avoid unexpected results.

Checking for NULL

Use the IS NULL and IS NOT NULL operators.

SELECT * FROM Customers WHERE Email IS NULL;

This retrieves customers who do not have an email address on file.

Dealing with NULL in Joins

When performing joins, NULL values can affect the results. For example, a customer without orders may not appear in an INNER JOIN. Using a LEFT JOIN ensures all customers are included.

SELECT Customers.FirstName, Orders.OrderDate
FROM Customers
LEFT JOIN Orders ON Customers.ID = Orders.CustomerID;

This retrieves all customers, along with their orders if they have any.

SQL Standardization and Implementations

SQL is standardized by ANSI (American National Standards Institute) and ISO (International Organization for Standardization), ensuring consistency across different database systems. However, each database system may have its own extensions and variations.

Popular SQL Implementations

Open Source Databases:

MySQL and MariaDB are popular relational databases commonly used in web applications due to their speed, reliability, and ease of integration with server-side technologies.
PostgreSQL is recognized for its standards compliance, robust support for advanced features like JSON, and strong focus on data integrity and extensibility.
SQLite is a lightweight, file-based database that is self-contained and widely used in mobile applications, embedded systems, and small-scale applications.

Proprietary Databases:

Oracle Database provides a comprehensive set of features tailored for enterprise environments, including high availability, scalability, and advanced security options.
Microsoft SQL Server is closely integrated with Microsoft's ecosystem, offering seamless compatibility with tools like Azure and Power BI, making it a preferred choice for businesses using Microsoft technologies.

Understanding the specific features and syntax variations of each database system is important when working in different environments.

Data Types Across SQL Implementations

While the SQL standard defines a set of data types, each database system may implement additional types or have variations.

Common Data Types

Data Type	Description
INTEGER	Whole numbers
DECIMAL(p,s)	Fixed-point numbers with precision `p` and scale `s`
VARCHAR(n)	Variable-length strings up to `n` characters
DATE	Date values (year, month, day)
TIMESTAMP	Date and time values
BOOLEAN	Logical true or false

Examples in Different Databases

MySQL supports data types like TINYINT for small integers, TEXT for large text data, BLOB for binary large objects, and ENUM for predefined string values.
PostgreSQL provides SERIAL for auto-incrementing integers, ARRAY types to store arrays, and JSONB for efficiently storing and querying JSON data.
SQL Server includes UNIQUEIDENTIFIER for globally unique identifiers (GUIDs), MONEY for precise currency values, and NVARCHAR for storing Unicode strings.

Writing Data Manipulation Statements

Beyond querying data, SQL provides commands to insert, update, and delete records.

Inserting Data

INSERT INTO Customers (FirstName, LastName, Email)
VALUES ('David', 'Brown', 'david@example.com');

This adds a new customer to the Customers table.

Updating Data

UPDATE Customers
SET Email = 'alice.smith@example.com'
WHERE ID = 1;

This updates Alice's email address in the Customers table.

Deleting Data

DELETE FROM Customers WHERE ID = 2;

This removes the customer with ID 2 from the table.

Creating and Modifying Tables

SQL also allows you to define the structure of your database using Data Definition Language (DDL) statements.

Creating a Table

CREATE TABLE Products (
    ID INT PRIMARY KEY,
    Name VARCHAR(100),
    Price DECIMAL(10,2),
    Stock INT
);

This command creates a new table named Products with specified columns and data types.

Altering a Table

ALTER TABLE Products ADD Description TEXT;

This adds a new column Description to the Products table.

Dropping a Table

DROP TABLE Products;

This command deletes the Products table and all of its data.

Stored Procedures and Functions

Stored procedures and functions are blocks of SQL code that can be saved and reused, allowing for more complex operations and logic.

Creating a Stored Procedure

CREATE PROCEDURE GetCustomerOrders(IN custID INT)
BEGIN
    SELECT * FROM Orders WHERE CustomerID = custID;
END;

This procedure retrieves all orders for a given customer ID.

Calling a Stored Procedure

CALL GetCustomerOrders(1);

This executes the procedure with custID equal to 1.

Transactions and Concurrency Control

Transactions ensure that a series of SQL commands either all succeed or all fail, maintaining database integrity.

Starting a Transaction

START TRANSACTION;

Committing a Transaction

COMMIT;

Rolling Back a Transaction

ROLLBACK;

Transactions are essential for operations that involve multiple steps, ensuring that the database doesn't end up in an inconsistent state if an error occurs.

Indexes and Query Optimization

Indexes improve the speed of data retrieval by providing quick access paths to data within tables. By creating indexes on frequently searched columns, the database can locate and retrieve the desired rows more efficiently, reducing the need to scan entire tables. This optimization is crucial for enhancing the performance of queries, especially in large databases where full table scans can be time-consuming. Additionally, indexes can help enforce uniqueness and improve the performance of join operations by providing efficient pathways between related tables.

Creating an Index

CREATE INDEX idx_lastname ON Customers(LastName);

This SQL command creates an index named idx_lastname on the LastName column of the Customers table. By indexing the LastName column, the database can quickly locate records based on last names without scanning the entire table. For example, when executing a query that searches for customers with a specific last name, the database will use the idx_lastname index to find matching records more rapidly. The expected result of this command is the creation of the index, which can be confirmed by querying the database's metadata or using database management tools to view existing indexes on the Customers table.

After executing the CREATE INDEX command, you can verify the creation of the index by running the following command:

SHOW INDEX FROM Customers;

Sample Output:

Table

Non_unique

Key_name

Seq_in_index

Column_name

Collation

Cardinality

Sub_part

Packed

Null

Index_type

Comment

Index_comment

Customers

idx_lastname

LastName

1500

NULL

YES

BTREE

This output indicates that the idx_lastname index has been successfully created on the LastName column of the Customers table. The Cardinality value suggests the number of unique values in the index, which helps in assessing the index's effectiveness.

Understanding Query Plans

Most database systems provide tools to analyze how queries are executed, known as query plans. These plans detail the steps the database engine takes to execute a query, including which indexes are used, the order of operations, and the methods of data retrieval. By examining query plans, developers and database administrators can identify bottlenecks, such as unnecessary full table scans or inefficient join operations, and make informed decisions to optimize performance. Analyzing query plans is essential for understanding how changes to indexes or query structure can impact overall database performance.

EXPLAIN SELECT * FROM Customers WHERE LastName = 'Smith';

When this command is executed, the database generates a query plan that outlines how it will retrieve the requested data. The expected result is a detailed breakdown showing whether the idx_lastname index is utilized to find records where the LastName is 'Smith'. The query plan might indicate the use of an index scan, the estimated number of rows to be processed, and the cost associated with the operation. For instance, the output may show that the index is used to perform an efficient lookup, significantly reducing the query execution time compared to a full table scan. By analyzing this information, one can verify that the index is effectively enhancing query performance or identify areas where further optimization may be necessary.

After running the EXPLAIN command, you might receive an output similar to the following:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	Customers	ref	idx_lastname	idx_lastname	767	const	10	Using index

This output indicates that the idx_lastname index is being used to execute the query. The type column shows a ref type access, which is efficient and typically means the index is being utilized correctly. The rows column estimates that only 10 rows need to be examined to satisfy the query, demonstrating a significant performance improvement over a full table scan. The Extra column mentioning Using index further confirms that the index is effectively optimizing the query execution.

Advanced Topics

Window Functions

Window functions perform calculations across sets of rows related to the current row.

SELECT
    OrderID,
    CustomerID,
    OrderDate,
    SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSpent
FROM Orders;

This query calculates the total amount each customer has spent across all their orders.

Partitioning and Replication

Partitioning divides large tables into smaller, more manageable pieces, improving performance and maintenance.

Replication involves copying data between databases to improve availability and reliability.

Handling Pitfalls

Understanding common issues in SQL can prevent errors and improve data integrity.

NULL Values require careful handling in SQL as they can affect comparisons, aggregations, and conditional logic, often leading to unexpected results if not accounted for properly.
Data Type Mismatches can cause errors or inaccurate results when performing operations, so ensuring compatibility between data types is essential.
SQL Injection poses a serious security risk, which can be mitigated by using prepared statements and parameterized queries to avoid executing malicious input as part of SQL commands.