Last modified: May 18, 2025

This article is written in: πŸ‡ΊπŸ‡Έ

Storage of Tables and Indexes on Disk

Exploring how databases store tables and indexes on disk can provide valuable insights into optimizing performance and managing data efficiently. Let's delve into the fundamental concepts of disk storage in relational databases, focusing on the structures and mechanisms that underlie data organization.

Operating-System Files (β€œDatafiles”)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ tablespace_sales                                                     β”‚
β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚β”‚ sales01.dbf   β”‚β”‚ sales02.dbf   β”‚β”‚ sales03.dbf   β”‚  ← ordinary files β”‚
β”‚β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Extents & Pages (Allocation Units)

DATAFILE  (sales01.dbf)
β”‚
β”œβ”€ Extent #17  (8 pages @ 8 KB = 64 KB)
β”‚  β”œβ”€ Page 0  (heap rows)
β”‚  β”œβ”€ Page 1  (heap rows)
β”‚  β”œβ”€ Page 2  (B-tree branch)
β”‚  β”œβ”€ Page 3  (free)
β”‚  β”œβ”€ Page 4  (B-tree leaf)
β”‚  β”œβ”€ Page 5  (row-overflow chain)
β”‚  β”œβ”€ Page 6  (heap rows)
β”‚  └─ Page 7  (free)
└─ Extent #18  …

Anatomy of a Data Page

8 KB PAGE (heap)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 0x0000
β”‚ Page Header      β”‚ ← checksum, LSN, page-type flag, free-space pointers
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Item Slot Array  β”‚ ← 2-byte offsets, newest rows grow this downward
β”‚ ─┬─┬─┬─┬─┬─┬─┬─  β”‚
β”œβ”€β†“β”€β†“β”€β†“β”€β†“β”€β†“β”€β†“β”€β†“β”€β†“β”€β”€β”€
β”‚                  β”‚
β”‚   Tuple Data     β”‚ ← rows grow upward
β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 0x2000 (end)

Heap Table Lifecycle

INSERT PATH
Client β†’ Buffer Pool β†’ find page w/ β‰₯ row_size free β†’ write row β†’ mark page dirty
                                                             ↓
                                            checkpoint writes dirty page back to disk

Clustered (Index-Organized) Tables

CLUSTERED B-TREE
           (root p500)
        /               \
   (p510)               (p560)   ← branch nodes
   /   \                 /   \
p512  p518           p562   p570 ← leaf nodes = full rows

Secondary (Non-Clustered) Index Walk

SELECT … WHERE sku = 'A42';
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  index root β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚  (binary search)
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  index leaf β”‚  contains (sku, page#, slot#)
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚  (single pointer hop)
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  heap page  β”‚  contains actual row
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Other Index Types

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ HASH ────────────┐   equality only
β”‚ key β†’ hash β†’ bucket page     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ BITMAP ────────────┐   low-cardinality columns
β”‚ value1  1010011001           β”‚
β”‚ value2  0101100100           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€ GIN / INVERTED ─────────┐   arrays, JSON, full-text
β”‚ term β†’ posting list of RIDs  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Free-Space & Extent Maps

FREE SPACE MAP (FSM)
page_id β”‚ free_bytes
────────┼─────────────
   100  β”‚  800
   101  β”‚    0
   102  β”‚ 1600
…

Write-Ahead Log (WAL) Relationship

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   (1) change row
β”‚ client query β”‚ ───────────────────────────────────┐
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β”‚
                 (2) append redo record to WAL file β”‚ (sequential)
                                                    β–Ό
                                             (3) commit-ack
                                                    β”‚
                             (4) later: flush dirty page to datafile

Implications for Performance

Understanding how tables and indexes are stored can have significant implications for database performance and optimization strategies.

Data Locality

Storing related data close together on disk improves data locality, which enhances performance by reducing the number of disk I/O operations required to retrieve data.

For example, when a query requests several rows that are stored sequentially on the same page or extent, the database can read them all in a single disk operation, speeding up the retrieval process.

Choosing the Right Storage Model

Selecting between row-oriented and column-oriented storage depends on the workload and query patterns.

Effective Indexing Strategies

Indexes improve query performance by allowing the database to locate data without scanning every row in a table. However, they also consume disk space and can slow down write operations because the index must be updated whenever data is modified.

Balancing the number and types of indexes is crucial. Over-indexing can lead to unnecessary overhead, while under-indexing can result in slow query performance.

Practical Examples and Commands

Managing and optimizing database storage is crucial for maintaining performance and ensuring efficient use of resources. Below are practical commands for PostgreSQL and MySQL that help inspect and manage storage aspects. Each command includes its purpose, when to use it, expected output, and how to interpret the results.

PostgreSQL Commands

I. Check Table and Index Sizes

SELECT
 pg_size_pretty(pg_relation_size('your_table')) AS table_size,
 pg_size_pretty(pg_indexes_size('your_table')) AS indexes_size,
 pg_size_pretty(pg_total_relation_size('your_table')) AS total_size;

Expected Output:

table_size indexes_size total_size
120 MB 30 MB 150 MB

II. Analyze Query Execution with Buffer Usage

EXPLAIN (ANALYZE, BUFFERS) your_query;

Expected Output:

Seq Scan on your_table  (cost=0.00..431.00 rows=21000 width=...)
 Buffers: shared hit=123 read=45
 ...

III. Reclaim Dead Space with VACUUM

VACUUM your_table;

Expected Output:

VACUUM

Successful execution indicates that dead space has been reclaimed. Use VACUUM VERBOSE your_table; for detailed output.

IV. Check Space Usage with pgstattuple

SELECT * FROM pgstattuple('your_table');

Expected Output:

table_len           |  1000000
  tuple_count         |  20000
  tuple_len           |  800000
  dead_tuple_count    |  5000
  dead_tuple_len      |  200000
  free_space          |  0

V. Enable Page-Level Checksums

initdb --data-checksums -D /path/to/data

Expected Output:

Data page checksums are enabled.

Ensures that any corrupted data pages are identified, enhancing reliability and data integrity.

VI. Monitor I/O Statistics with pg_stat_io

SELECT * FROM pg_stat_io;

Expected Output:

query_id | read_pages | write_pages
---------|------------|------------
1        |     500    |     20
2        |    1500    |     50

VII. Optimize Memory Usage with shared_buffers

shared_buffers = 2GB

VIII. Update Statistics with ANALYZE

ANALYZE your_table;

Expected Output:

ANALYZE

Successful execution means statistics are updated. Use EXPLAIN to see how updated statistics affect query plans.

IX. Reorder Table Data with CLUSTER

CLUSTER your_table USING your_index;

Expected Output:

CLUSTER

The table is physically reordered, which can lead to faster sequential scans and better cache utilization for indexed queries.

MySQL Commands

I. Show Table Status

SHOW TABLE STATUS LIKE 'your_table';

Expected Output:

Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length ...
your_table InnoDB 10 Compact 20000 100 2000000 ... 500000 ...

II. Monitor InnoDB Buffer Pool Metrics

Used to determine how effectively the buffer pool is caching data and to identify if more memory allocation is needed.

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

Expected Output:

+---------------------------------+---------+
| Variable_name                   | Value   |
+---------------------------------+---------+
| Innodb_buffer_pool_read_requests | 100000 |
+---------------------------------+---------+

+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 500   |
+--------------------------+-------+

III. Adjust Buffer Pool Size

innodb_buffer_pool_size = 4G

Other Considerations

Table of Contents

    Storage of Tables and Indexes on Disk
    1. Operating-System Files (β€œDatafiles”)
    2. Extents & Pages (Allocation Units)
    3. Anatomy of a Data Page
    4. Heap Table Lifecycle
    5. Clustered (Index-Organized) Tables
    6. Secondary (Non-Clustered) Index Walk
    7. Other Index Types
    8. Free-Space & Extent Maps
    9. Write-Ahead Log (WAL) Relationship
    10. Implications for Performance
      1. Data Locality
      2. Choosing the Right Storage Model
      3. Effective Indexing Strategies
    11. Practical Examples and Commands
      1. PostgreSQL Commands
      2. MySQL Commands
    12. Other Considerations