Last modified: December 29, 2023

This article is written in: 🇺🇸

Understanding Crash Recovery in Databases

Crash recovery is a important component of database systems that ensures data consistency and durability despite unexpected events like power outages, hardware failures, or software crashes. By design, databases must be capable of returning to a reliable state after a failure occurs. This is largely accomplished through mechanisms like the Write-Ahead Log (WAL), which records changes before they are committed to the actual data files on disk.

After reading the material, you should be able to answer the following questions:

The Basics of Crash Recovery

Databases typically cache data in memory (often called the buffer pool) for speed. When a change is made, such as adding a row or updating an existing row, it is applied first in memory. Only later is this modified data flushed to disk. If a crash or power loss happens mid-write, data could end up corrupted or partially written. Crash recovery techniques help the database detect and correct any inconsistencies by replaying or discarding in-flight changes.

The Role of the Write-Ahead Log (WAL)

The WAL, sometimes called the redo log, keeps track of all modifications. Every time data is changed in memory, a record of that change is written to the WAL on disk before the database eventually writes the changed data pages to disk.

+----------------+          +----------------+          +----------------+
|                |          |                |          |                |
|  Application   |  --->    |   Write-Ahead  |  --->    |   Data Store   |
|    Transaction |          |      Log (WAL) |          |   (Main DB)    |
|                |          |                |          |                |
+----------------+          +----------------+          +----------------+
        |                           |                           |
        |                           |                           |
        |                           v                           |
        |                   +---------------+                   |
        |                   | Append Log    |                   |
        |                   | Records to    |                   |
        |                   | WAL on Disk   |                   |
        |                   +---------------+                   |
        |                           |                           |
        |                           v                           |
        |                   +---------------+                   |
        |                   | Sync WAL to   |                   |
        |                   | Disk          |                   |
        |                   +---------------+                   |
        |                           |                           |
        |                           v                           |
        |                   +---------------+                   |
        |                   | Apply Changes |                   |
        |                   | to Data Store |                   |
        |                   +---------------+                   |
        |                                                       |
        +-------------------------------------------------------+

WAL and Transaction States

Databases manage transactions to make sure atomicity (all or nothing). The WAL is directly tied to these transaction guarantees:

Checkpointing

A checkpoint operation flushes all in-memory data pages to disk and writes a special checkpoint record to the WAL. This makes the on-disk data more up-to-date and reduces the amount of log replay needed if a crash occurs.

Crash Recovery Steps

When a database restarts after a crash, it goes through a sequence of steps to make sure a consistent state:

  1. The database checks the latest checkpoint in the WAL to identify the last checkpoint.
  2. Committed transactions after the checkpoint are applied to the data files to bring them up to date.
  3. Any uncommitted transactions in the WAL are discarded or rolled back so they do not appear as valid changes.
  4. The database finishes replaying WAL records and transitions back to handling regular queries.

Flushing the WAL

Some databases offer configuration options for controlling how often the WAL is physically written and synchronized to disk:

+===========================+
|       Database System     |
+===========================+
             |
             | Initiate Transaction
             v
+---------------------------+
|        Client/System      |
+---------------------------+
             |
             | Generate Log Record
             v
+---------------------------+
|      WAL Buffer (RAM)     |
|  -----------------------  |
|  | Log Record 1        |  |
|  | Log Record 2        |  |
|  | Log Record 3        |  |
|  | ...                 |  |
|  -----------------------  |
+---------------------------+
             |
             | Sequential Write (Flush)
             v
+---------------------------+
|     Stable Storage (Disk) |
|  -----------------------  |
|  | Log Record 1        |  |
|  | Log Record 2        |  |
|  | Log Record 3        |  |
|  | ...                 |  |
|  -----------------------  |
+---------------------------+
             |
             | Apply Changes to Database
             v
+---------------------------+
|       Database Files      |
|  -----------------------  |
|  | Data Page A         |  |
|  | Data Page B         |  |
|  | ...                 |  |
|  -----------------------  |
+---------------------------+

Benefits of WAL-Based Recovery

Drawbacks and Trade-Offs

Practical Example

Consider an orders table:

OrderID CustomerID Status
1 1001 Pending
2 1002 Shipped
3 1003 Delivered

Suppose a user updates OrderID = 1 from Pending to Shipped.

I. The database modifies the in-memory page representing OrderID = 1.

II. A corresponding record showing the old and new values (Pending -> Shipped) is appended to the WAL on disk.

III. The data file containing the orders table may not be updated immediately.

IV. If the database crashes at this point, the WAL can be replayed to recover the change.

V. After restart, the database replays the WAL entries for all committed transactions, ensuring OrderID = 1 is set to Shipped in the data file.

Visualizing Crash Recovery

#
               +-----------------+
Changes in --> |    Memory       |
the database   | (Buffer Pool)   |
               +--------+--------+
                        |
                WAL Record Written
                        |
                        v
               +-----------------+
               | Write-Ahead Log |
               |   (Redo Log)    |
               +--------+--------+
                        |
  Checkpoint ---------->+  
     (Flush data pages) .
                        .
                        .
                        v
               +------------------+
               |  Data Files on   |
               |       Disk       |
               +------------------+

Best Practices

I. Find the right interval to minimize both I/O spikes and recovery time.

II. Make sure adequate storage capacity and regularly archive or clean old WAL files.

III. Configure fsync to make sure that WAL data truly resides on stable media.

IV. Combine periodic full backups with continuous WAL archiving for point-in-time recovery.

V. Validate recovery settings in staging environments to confirm that the database can recover from abrupt failures.

Table of Contents

    Understanding Crash Recovery in Databases
    1. The Basics of Crash Recovery
    2. The Role of the Write-Ahead Log (WAL)
    3. WAL and Transaction States
    4. Checkpointing
    5. Crash Recovery Steps
    6. Flushing the WAL
    7. Benefits of WAL-Based Recovery
    8. Drawbacks and Trade-Offs
    9. Practical Example
    10. Visualizing Crash Recovery
    11. Best Practices