Last modified: May 11, 2025
This article is written in: 🇺🇸
Crash recovery is a important component of database systems that ensures data consistency and durability despite unexpected events like power outages, hardware failures, or software crashes. By design, databases must be capable of returning to a reliable state after a failure occurs. This is largely accomplished through mechanisms like the Write-Ahead Log (WAL), which records changes before they are committed to the actual data files on disk.
After reading the material, you should be able to answer the following questions:
Databases typically cache data in memory (often called the buffer pool) for speed. When a change is made, such as adding a row or updating an existing row, it is applied first in memory. Only later is this modified data flushed to disk. If a crash or power loss happens mid-write, data could end up corrupted or partially written. Crash recovery techniques help the database detect and correct any inconsistencies by replaying or discarding in-flight changes.
The WAL, sometimes called the redo log, keeps track of all modifications. Every time data is changed in memory, a record of that change is written to the WAL on disk before the database eventually writes the changed data pages to disk.
WAL:
+----------------+
| Application |
| Transaction |
+--------+-------+
|
| 1. Emit transaction
v
+--------+-------+
| Write-Ahead |
| Log (WAL) |
+--------+-------+
|
+-------------------------------+-------------------------------+
| |
| 2a. Append record to WAL buffer |
| |
| +----------------+ +----------------+ |
| | WAL In-Memory | --(write)--> | WAL On-Disk | |
| | Buffer | | (log file) | |
| +----------------+ +----------------+ |
| | | |
| | 2b. Fsync (sync to durable storage) | |
| +------------------------------------>+ |
| |
+---------------------------------------------------------------+
|
| 3. Apply logged changes
v
+--------+-------+
| Data Store |
| (Main DB) |
+----------------+
Databases manage transactions to make sure atomicity (all or nothing). The WAL is directly tied to these transaction guarantees:
A checkpoint operation flushes all in-memory data pages to disk and writes a special checkpoint record to the WAL. This makes the on-disk data more up-to-date and reduces the amount of log replay needed if a crash occurs.
When a database restarts after a crash, it goes through a sequence of steps to make sure a consistent state:
Some databases offer configuration options for controlling how often the WAL is physically written and synchronized to disk:
+===========================+
| Database System |
+===========================+
|
| Initiate Transaction
v
+---------------------------+
| Client/System |
+---------------------------+
|
| Generate Log Record
v
+---------------------------+
| WAL Buffer (RAM) |
| ----------------------- |
| | Log Record 1 | |
| | Log Record 2 | |
| | Log Record 3 | |
| | ... | |
| ----------------------- |
+---------------------------+
|
| Sequential Write (Flush)
v
+---------------------------+
| Stable Storage (Disk) |
| ----------------------- |
| | Log Record 1 | |
| | Log Record 2 | |
| | Log Record 3 | |
| | ... | |
| ----------------------- |
+---------------------------+
|
| Apply Changes to Database
v
+---------------------------+
| Database Files |
| ----------------------- |
| | Data Page A | |
| | Data Page B | |
| | ... | |
| ----------------------- |
+---------------------------+
Consider an orders
table:
OrderID | CustomerID | Status |
1 | 1001 | Pending |
2 | 1002 | Shipped |
3 | 1003 | Delivered |
Suppose a user updates OrderID = 1
from Pending
to Shipped
.
I. The database modifies the in-memory page representing OrderID = 1
.
II. A corresponding record showing the old and new values (Pending
-> Shipped
) is appended to the WAL on disk.
III. The data file containing the orders
table may not be updated immediately.
IV. If the database crashes at this point, the WAL can be replayed to recover the change.
V. After restart, the database replays the WAL entries for all committed transactions, ensuring OrderID = 1
is set to Shipped
in the data file.
#
+-----------------+
Changes in --> | Memory |
the database | (Buffer Pool) |
+--------+--------+
|
WAL Record Written
|
v
+-----------------+
| Write-Ahead Log |
| (Redo Log) |
+--------+--------+
|
Checkpoint ---------->+
(Flush data pages) .
.
.
v
+------------------+
| Data Files on |
| Disk |
+------------------+
I. Find the right interval to minimize both I/O spikes and recovery time.
II. Make sure adequate storage capacity and regularly archive or clean old WAL files.
III. Configure fsync to make sure that WAL data truly resides on stable media.
IV. Combine periodic full backups with continuous WAL archiving for point-in-time recovery.
V. Validate recovery settings in staging environments to confirm that the database can recover from abrupt failures.