Last modified: April 16, 2023

This article is written in: 🇺🇸

Introduction to SQL

Welcome to the world of SQL, where you can communicate with databases using simple, yet powerful commands. SQL, which stands for Structured Query Language, is a standardized language designed specifically for managing and querying relational databases.

After reading the material, you should be able to answer the following questions:

  1. What are the fundamental components and data types in SQL, and how are they used to structure and manage data within relational databases?
  2. How can you construct and execute basic SQL queries to retrieve, filter, sort, and limit data?
  3. What are primary and foreign keys, and how do they enforce relationships and data integrity within a database?
  4. How do joins, subqueries, and common table expressions (CTEs) function to retrieve and manipulate data across multiple tables?
  5. What are the techniques for optimizing SQL queries, including the use of indexes, understanding query plans, and managing transactions and concurrency control?

Understanding the Basics

At its core, SQL allows you to perform a variety of operations on data stored in relational databases. These databases organize data into tables, which consist of rows and columns, much like a spreadsheet. Each table represents a specific entity, such as customers or orders, and each column represents an attribute of that entity.

Imagine a simple table of customers:

+----+-----------+-----------+-------------------+
| ID | FirstName | LastName  | Email             |
+----+-----------+-----------+-------------------+
| 1  | Alice     | Smith     | alice@example.com |
| 2  | Bob       | Johnson   | bob@example.com   |
| 3  | Carol     | Williams  | carol@example.com |
+----+-----------+-----------+-------------------+

This table, named Customers, stores basic information about each customer. SQL allows you to interact with this table in various ways, such as retrieving all customers, finding a customer by email, or adding a new customer to the list.

Concepts in SQL

Before diving into writing SQL commands, it's important to grasp some fundamental concepts that form the foundation of SQL and relational databases.

Tables and Schemas

+------------------------------------------------------------+
|                      Schema                                |
|           (Organizational Framework)                       |
|                                                            |
|  +----------------------+      +-------------------------+ |
|  |        Table 1       |      |        Table 2          | |
|  |----------------------|      |-------------------------| |
|  | ID     | Name        |      | ProductID | ProductName | |
|  |--------|-------------|      |-----------|-------------| |
|  | 1      | Alice       |      | 101       | Widget      | |
|  | 2      | Bob         |      | 102       | Gizmo       | |
|  | 3      | Charlie     |      | 103       | Thingamajig | |
|  +----------------------+      +-------------------------+ |
|                                                            |
|  +----------------------+                                  |
|  |        Table 3       |                                  |
|  |----------------------|                                  |
|  | OrderID | CustomerID |                                  |
|  |---------|------------|                                  |
|  | 5001    | 1          |                                  |
|  | 5002    | 2          |                                  |
|  | 5003    | 3          |                                  |
|  +----------------------+                                  |
|                                                            |
+------------------------------------------------------------+

Data Types

Every column in a table has a data type, which defines the kind of data it can store. Common data types include:

Understanding data types is crucial for defining tables and ensuring data integrity.

Primary and Foreign Keys

These keys are essential for maintaining relationships and enforcing data integrity across tables.

Writing Basic SQL Queries

SQL queries are statements that retrieve data from one or more tables. The most common SQL command is SELECT, which allows you to specify which columns of data you want to retrieve.

The SELECT Statement

The basic syntax of a SELECT statement is:

SELECT column1, column2 FROM table_name;

For example, to retrieve all first names and last names from the Customers table:

SELECT FirstName, LastName FROM Customers;

Output:

+-----------+-----------+
| FirstName | LastName  |
+-----------+-----------+
| Alice     | Smith     |
| Bob       | Johnson   |
| Carol     | Williams  |
+-----------+-----------+

Filtering Data with WHERE

The WHERE clause filters records based on specified conditions.

SELECT * FROM Customers WHERE LastName = 'Johnson';

Output:

+----+-----------+----------+-----------------+
| ID | FirstName | LastName | Email           |
+----+-----------+----------+-----------------+
| 2  | Bob       | Johnson  | bob@example.com |
+----+-----------+----------+-----------------+

Sorting Results with ORDER BY

You can sort the results using the ORDER BY clause.

SELECT * FROM Customers ORDER BY LastName ASC;

This query retrieves all customers sorted alphabetically by last name.

Limiting Results with LIMIT

To limit the number of records returned, use the LIMIT clause (Note: Some SQL implementations use TOP instead).

SELECT * FROM Customers LIMIT 2;

Output:

+----+-----------+-----------+-----------------+
| ID | FirstName | LastName  | Email           |
+----+-----------+-----------+-----------------+
| 1  | Alice     | Smith     | alice@example.com |
| 2  | Bob       | Johnson   | bob@example.com   |
+----+-----------+-----------+-----------------+

Joining Tables

One of the powerful features of SQL is the ability to combine data from multiple tables using joins. This is essential when dealing with normalized databases where related data is stored in separate tables.

Types of Joins

Example of an INNER JOIN

Suppose we have an Orders table:

+----+------------+-----------+
| ID | CustomerID | OrderDate |
+----+------------+-----------+
| 1  | 1          | 2023-01-15|
| 2  | 2          | 2023-01-17|
| 3  | 1          | 2023-01-20|
+----+------------+-----------+

To retrieve orders along with customer names:

SELECT Customers.FirstName, Customers.LastName, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.ID = Orders.CustomerID;

Output:

+-----------+----------+------------+
| FirstName | LastName | OrderDate  |
+-----------+----------+------------+
| Alice     | Smith    | 2023-01-15 |
| Bob       | Johnson  | 2023-01-17 |
| Alice     | Smith    | 2023-01-20 |
+-----------+----------+------------+

This query joins the Customers and Orders tables on the matching ID and CustomerID fields, displaying a combined view of data from both tables.

Subqueries and Common Table Expressions

SQL allows you to write queries within queries, known as subqueries, and to define temporary result sets using Common Table Expressions (CTEs).

Subqueries

A subquery is a query nested inside another SQL statement.

SELECT FirstName, LastName
FROM Customers
WHERE ID IN (SELECT CustomerID FROM Orders WHERE OrderDate > '2023-01-16');

This query retrieves customers who have placed orders after January 16, 2023.

Common Table Expressions (CTEs)

CTEs provide a way to define temporary named result sets that can be referenced within the main query.

WITH RecentOrders AS (
    SELECT CustomerID, OrderDate
    FROM Orders
    WHERE OrderDate > '2023-01-16'
)
SELECT Customers.FirstName, Customers.LastName, RecentOrders.OrderDate
FROM Customers
JOIN RecentOrders ON Customers.ID = RecentOrders.CustomerID;

This achieves the same result as the previous subquery but can be more readable, especially with complex queries.

Aggregation and Grouping

SQL provides functions to perform calculations on sets of rows, such as counting records, calculating averages, and summing values.

Using Aggregate Functions

Common aggregate functions include:

Grouping Data with GROUP BY

The GROUP BY clause groups rows that have the same values in specified columns, allowing aggregate functions to be applied to each group.

SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID;

Output:

+------------+------------+
| CustomerID | OrderCount |
+------------+------------+
| 1          | 2          |
| 2          | 1          |
+------------+------------+

This query counts the number of orders placed by each customer.

Handling NULL Values

In SQL, NULL represents missing or unknown data. It's important to handle NULL values properly to avoid unexpected results.

Checking for NULL

Use the IS NULL and IS NOT NULL operators.

SELECT * FROM Customers WHERE Email IS NULL;

This retrieves customers who do not have an email address on file.

Dealing with NULL in Joins

When performing joins, NULL values can affect the results. For example, a customer without orders may not appear in an INNER JOIN. Using a LEFT JOIN ensures all customers are included.

SELECT Customers.FirstName, Orders.OrderDate
FROM Customers
LEFT JOIN Orders ON Customers.ID = Orders.CustomerID;

This retrieves all customers, along with their orders if they have any.

SQL Standardization and Implementations

SQL is standardized by ANSI (American National Standards Institute) and ISO (International Organization for Standardization), ensuring consistency across different database systems. However, each database system may have its own extensions and variations.

Open Source Databases:

Proprietary Databases:

Understanding the specific features and syntax variations of each database system is important when working in different environments.

Data Types Across SQL Implementations

While the SQL standard defines a set of data types, each database system may implement additional types or have variations.

Common Data Types

Data Type Description
INTEGER Whole numbers
DECIMAL(p,s) Fixed-point numbers with precision p and scale s
VARCHAR(n) Variable-length strings up to n characters
DATE Date values (year, month, day)
TIMESTAMP Date and time values
BOOLEAN Logical true or false

Examples in Different Databases

Writing Data Manipulation Statements

Beyond querying data, SQL provides commands to insert, update, and delete records.

Inserting Data

INSERT INTO Customers (FirstName, LastName, Email)
VALUES ('David', 'Brown', 'david@example.com');

This adds a new customer to the Customers table.

Updating Data

UPDATE Customers
SET Email = 'alice.smith@example.com'
WHERE ID = 1;

This updates Alice's email address in the Customers table.

Deleting Data

DELETE FROM Customers WHERE ID = 2;

This removes the customer with ID 2 from the table.

Creating and Modifying Tables

SQL also allows you to define the structure of your database using Data Definition Language (DDL) statements.

Creating a Table

CREATE TABLE Products (
    ID INT PRIMARY KEY,
    Name VARCHAR(100),
    Price DECIMAL(10,2),
    Stock INT
);

This command creates a new table named Products with specified columns and data types.

Altering a Table

ALTER TABLE Products ADD Description TEXT;

This adds a new column Description to the Products table.

Dropping a Table

DROP TABLE Products;

This command deletes the Products table and all of its data.

Stored Procedures and Functions

Stored procedures and functions are blocks of SQL code that can be saved and reused, allowing for more complex operations and logic.

Creating a Stored Procedure

CREATE PROCEDURE GetCustomerOrders(IN custID INT)
BEGIN
    SELECT * FROM Orders WHERE CustomerID = custID;
END;

This procedure retrieves all orders for a given customer ID.

Calling a Stored Procedure

CALL GetCustomerOrders(1);

This executes the procedure with custID equal to 1.

Transactions and Concurrency Control

Transactions ensure that a series of SQL commands either all succeed or all fail, maintaining database integrity.

Starting a Transaction

START TRANSACTION;

Committing a Transaction

COMMIT;

Rolling Back a Transaction

ROLLBACK;

Transactions are essential for operations that involve multiple steps, ensuring that the database doesn't end up in an inconsistent state if an error occurs.

Indexes and Query Optimization

Indexes improve the speed of data retrieval by providing quick access paths to data within tables. By creating indexes on frequently searched columns, the database can locate and retrieve the desired rows more efficiently, reducing the need to scan entire tables. This optimization is crucial for enhancing the performance of queries, especially in large databases where full table scans can be time-consuming. Additionally, indexes can help enforce uniqueness and improve the performance of join operations by providing efficient pathways between related tables.

Creating an Index

CREATE INDEX idx_lastname ON Customers(LastName);

This SQL command creates an index named idx_lastname on the LastName column of the Customers table. By indexing the LastName column, the database can quickly locate records based on last names without scanning the entire table. For example, when executing a query that searches for customers with a specific last name, the database will use the idx_lastname index to find matching records more rapidly. The expected result of this command is the creation of the index, which can be confirmed by querying the database's metadata or using database management tools to view existing indexes on the Customers table.

After executing the CREATE INDEX command, you can verify the creation of the index by running the following command:

SHOW INDEX FROM Customers;

Sample Output:

Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
Customers 1 idx_lastname 1 LastName A 1500 NULL NULL YES BTREE

This output indicates that the idx_lastname index has been successfully created on the LastName column of the Customers table. The Cardinality value suggests the number of unique values in the index, which helps in assessing the index's effectiveness.

Understanding Query Plans

Most database systems provide tools to analyze how queries are executed, known as query plans. These plans detail the steps the database engine takes to execute a query, including which indexes are used, the order of operations, and the methods of data retrieval. By examining query plans, developers and database administrators can identify bottlenecks, such as unnecessary full table scans or inefficient join operations, and make informed decisions to optimize performance. Analyzing query plans is essential for understanding how changes to indexes or query structure can impact overall database performance.

EXPLAIN SELECT * FROM Customers WHERE LastName = 'Smith';

When this command is executed, the database generates a query plan that outlines how it will retrieve the requested data. The expected result is a detailed breakdown showing whether the idx_lastname index is utilized to find records where the LastName is 'Smith'. The query plan might indicate the use of an index scan, the estimated number of rows to be processed, and the cost associated with the operation. For instance, the output may show that the index is used to perform an efficient lookup, significantly reducing the query execution time compared to a full table scan. By analyzing this information, one can verify that the index is effectively enhancing query performance or identify areas where further optimization may be necessary.

After running the EXPLAIN command, you might receive an output similar to the following:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Customers ref idx_lastname idx_lastname 767 const 10 Using index

This output indicates that the idx_lastname index is being used to execute the query. The type column shows a ref type access, which is efficient and typically means the index is being utilized correctly. The rows column estimates that only 10 rows need to be examined to satisfy the query, demonstrating a significant performance improvement over a full table scan. The Extra column mentioning Using index further confirms that the index is effectively optimizing the query execution.

Advanced Topics

Window Functions

Window functions perform calculations across sets of rows related to the current row.

SELECT
    OrderID,
    CustomerID,
    OrderDate,
    SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSpent
FROM Orders;

This query calculates the total amount each customer has spent across all their orders.

Partitioning and Replication

Partitioning divides large tables into smaller, more manageable pieces, improving performance and maintenance.

Replication involves copying data between databases to improve availability and reliability.

Handling Pitfalls

Understanding common issues in SQL can prevent errors and improve data integrity.

Table of Contents

    Introduction to SQL
    1. Understanding the Basics
    2. Concepts in SQL
      1. Tables and Schemas
      2. Data Types
      3. Primary and Foreign Keys
    3. Writing Basic SQL Queries
      1. The SELECT Statement
      2. Filtering Data with WHERE
      3. Sorting Results with ORDER BY
      4. Limiting Results with LIMIT
    4. Joining Tables
      1. Types of Joins
      2. Example of an INNER JOIN
    5. Subqueries and Common Table Expressions
      1. Subqueries
      2. Common Table Expressions (CTEs)
    6. Aggregation and Grouping
      1. Using Aggregate Functions
      2. Grouping Data with GROUP BY
    7. Handling NULL Values
      1. Checking for NULL
      2. Dealing with NULL in Joins
    8. SQL Standardization and Implementations
      1. Popular SQL Implementations
    9. Data Types Across SQL Implementations
      1. Common Data Types
      2. Examples in Different Databases
    10. Writing Data Manipulation Statements
      1. Inserting Data
      2. Updating Data
      3. Deleting Data
    11. Creating and Modifying Tables
      1. Creating a Table
      2. Altering a Table
      3. Dropping a Table
    12. Stored Procedures and Functions
    13. Creating a Stored Procedure
      1. Calling a Stored Procedure
    14. Transactions and Concurrency Control
      1. Starting a Transaction
      2. Committing a Transaction
      3. Rolling Back a Transaction
    15. Indexes and Query Optimization
      1. Creating an Index
      2. Understanding Query Plans
    16. Advanced Topics
      1. Window Functions
      2. Partitioning and Replication
      3. Handling Pitfalls