Last modified: December 03, 2024

Understanding the Storage of Tables and Indexes on Disk

Knowing how a database lays out tables and indexes on disk helps explain why some queries are fast and others slow, and it informs decisions about schema design, indexing, and storage configuration. This article covers the fundamental concepts of disk storage in relational databases, focusing on the structures and mechanisms that underlie data organization.

Storage Structures

Databases organize data on disk in a hierarchy of fixed-size units so that any piece of data can be located, read, and modified efficiently.

Pages and Extents

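At the core of data storage are pages, sometimes called blocks. A page is a fixed-size chunk of data, commonly 4 KB, 8 KB, or 16 KB depending on the database system (PostgreSQL and SQL Server use 8 KB pages by default, MySQL's InnoDB uses 16 KB). Pages are the basic unit for reading from and writing to disk: even when a query needs a single row, the whole page containing it is read.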

To manage storage more effectively, pages are grouped into extents. An extent is a collection of contiguous pages, which helps reduce fragmentation and improves read/write performance by allowing larger chunks of data to be processed in a single operation.

Here's a simple illustration of pages grouped into extents:

+-------------------+
|      Extent       |
| +-----+  +-----+  |
| |Page1|  |Page2|  |
| +-----+  +-----+  |
| +-----+  +-----+  |
| |Page3|  |Page4|  |
| +-----+  +-----+  |
+-------------------+

In this diagram, we see an extent containing four pages, each holding part of the table's data.
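The page and extent arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not any database's actual code: it assumes 8 KB pages and 8-page extents (the extent size SQL Server uses), and the helper names `locate` and `reads_needed` are hypothetical.

```python
PAGE_SIZE = 8 * 1024      # assumption: 8 KB pages
PAGES_PER_EXTENT = 8      # assumption: 8 contiguous pages per extent (64 KB)


def locate(byte_offset: int) -> tuple[int, int, int]:
    """Map a byte offset in a data file to (extent number, page number, offset within page)."""
    page_no = byte_offset // PAGE_SIZE
    extent_no = page_no // PAGES_PER_EXTENT
    offset_in_page = byte_offset % PAGE_SIZE
    return extent_no, page_no, offset_in_page


def reads_needed(num_pages: int, pages_per_read: int) -> int:
    """Read operations needed when each read fetches `pages_per_read` contiguous pages."""
    return -(-num_pages // pages_per_read)  # ceiling division


print(locate(100_000))                      # (1, 12, 1696)
print(reads_needed(64, 1))                  # 64 reads, one page at a time
print(reads_needed(64, PAGES_PER_EXTENT))   # 8 reads, one extent at a time
```

Reading the same 64 pages an extent at a time cuts the number of I/O requests by a factor of eight, which is where contiguous allocation pays off.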

Table Storage Models

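Tables are stored on disk as collections of pages, but the way data is organized within these pages can vary. In row-oriented storage, used by most transactional databases such as PostgreSQL and MySQL, all the column values of a row are stored together. In column-oriented storage, used by analytical databases such as ClickHouse and Amazon Redshift, all the values of a single column are stored together.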

Here's how a page might look in row-oriented storage:

+-----------------------------------+
| Row1: [Col1, Col2, Col3, Col4]    |
| Row2: [Col1, Col2, Col3, Col4]    |
| Row3: [Col1, Col2, Col3, Col4]    |
+-----------------------------------+

And in column-oriented storage:

+--------------------+
| Column1 Data       |
| [Value1, Value2,   |
|  Value3, ...]      |
+--------------------+
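To make the difference concrete, here is a minimal Python sketch that lays out the same three hypothetical rows both ways, using flat lists as stand-ins for pages, and sums one column from each layout. The table and function names are illustrative only.

```python
# Hypothetical table: (id, name, age, salary)
rows = [
    (1, "alice", 30, 5000),
    (2, "bob", 25, 4200),
    (3, "carol", 35, 6100),
]
NUM_COLS = 4
SALARY = 3  # column index of salary

# Row-oriented: the values of each row are stored next to each other.
row_store = [value for row in rows for value in row]

# Column-oriented: the values of each column are stored next to each other.
column_store = [list(column) for column in zip(*rows)]


def sum_column_row_store(store, num_cols, col_idx):
    """Step through every row, picking one value out of each; the other
    columns sit in between and, on disk, are read along with it."""
    return sum(store[i] for i in range(col_idx, len(store), num_cols))


def sum_column_column_store(store, col_idx):
    """Read a single contiguous column; the other columns are never touched."""
    return sum(store[col_idx])


print(row_store)     # [1, 'alice', 30, 5000, 2, 'bob', 25, 4200, 3, 'carol', 35, 6100]
print(column_store)  # [[1, 2, 3], ['alice', 'bob', 'carol'], [30, 25, 35], [5000, 4200, 6100]]
print(sum_column_row_store(row_store, NUM_COLS, SALARY))  # 15300
print(sum_column_column_store(column_store, SALARY))      # 15300
```

Both layouts give the same answer; what differs is how much unrelated data sits between the values the query actually needs.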

Indexes

Indexes are data structures that improve the speed of data retrieval operations on a database table. They are essential for efficient querying, especially in large databases.

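The most widely used index structure in relational databases is the B-tree (in practice usually its B+tree variant), which is the default index type in both PostgreSQL and MySQL's InnoDB.

A simplified illustration of a B-tree index might look like this: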

Tree:
        [50]
       /    \
    [25]    [75]
   /   \    /   \
[10] [30] [60] [90]

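Each node in the tree represents a page. A lookup starts at the root and, at each node, follows the child whose key range contains the search key, so only one page per level is read. Real B-tree pages hold hundreds of keys rather than one, which keeps the tree shallow: a few levels are enough to index millions of rows.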
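A minimal Python sketch of this lookup, using the keys from the diagram. It models a classic B-tree in which a key can match at any level (a B+tree would keep row pointers only in the leaves), and the names `Node`, `search`, and `levels_needed` are hypothetical. The fanout of 300 keys per page in the last call is an assumed figure, not a measured one.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; in a real database each node occupies one page."""
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # empty for leaf nodes


def search(node: Node, key: int) -> tuple[bool, int]:
    """Descend from the root; return (found, number of pages visited)."""
    pages_visited = 0
    while True:
        pages_visited += 1
        i = bisect_left(node.keys, key)  # position of first key >= search key
        if i < len(node.keys) and node.keys[i] == key:
            return True, pages_visited
        if not node.children:            # reached a leaf without a match
            return False, pages_visited
        node = node.children[i]          # child covering the key's range


def levels_needed(num_keys: int, fanout: int) -> int:
    """Approximate tree height if every page branches `fanout` ways."""
    levels, reachable = 1, fanout
    while reachable < num_keys:
        reachable *= fanout
        levels += 1
    return levels


# The tree from the diagram above.
root = Node([50], [
    Node([25], [Node([10]), Node([30])]),
    Node([75], [Node([60]), Node([90])]),
])

print(search(root, 60))               # (True, 3)
print(search(root, 42))               # (False, 3)
print(levels_needed(1_000_000, 300))  # 3
```

With about 300 keys per page, three levels can reach up to 27 million entries, so a lookup among a million rows reads only about three index pages.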

Implications for Performance

How tables and indexes are stored has direct consequences for database performance and for which optimization strategies are effective.

Data Locality

Storing related data close together on disk improves data locality, which enhances performance by reducing the number of disk I/O operations required to retrieve data.

For example, when a query requests several rows that are stored sequentially on the same page or extent, the database can read them all in a single disk operation, speeding up the retrieval process.
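The effect of locality can be measured by counting pages. In this Python sketch, which assumes every page read is one I/O and that 100 rows fit on a page, the same 100-row range query runs against two hypothetical layouts of a 10,000-row table.

```python
ROWS_PER_PAGE = 100   # assumption: 100 rows fit on one page
NUM_ROWS = 10_000
NUM_PAGES = NUM_ROWS // ROWS_PER_PAGE


def pages_touched(page_of_row: dict[int, int], wanted_rows) -> int:
    """Distinct pages that must be read to fetch `wanted_rows`."""
    return len({page_of_row[row_id] for row_id in wanted_rows})


# Clustered layout: rows stored in id order, so neighboring ids share a page.
clustered = {row_id: row_id // ROWS_PER_PAGE for row_id in range(NUM_ROWS)}

# Scattered layout: consecutive ids land on different pages.
scattered = {row_id: row_id % NUM_PAGES for row_id in range(NUM_ROWS)}

wanted = range(500, 600)  # a query for 100 consecutive ids

print(pages_touched(clustered, wanted))  # 1
print(pages_touched(scattered, wanted))  # 100
```

The clustered layout needs one page read and the scattered one needs a hundred, even though both return the same rows.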

Choosing the Right Storage Model

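Selecting between row-oriented and column-oriented storage depends on the workload and query patterns. Row-oriented storage suits transactional (OLTP) workloads that read and write whole rows, such as fetching or updating a single order, because one page read returns every column of the row. Column-oriented storage suits analytical (OLAP) queries that scan many rows but only a few columns, such as summing sales amounts over a year of data, because only the needed columns are read, and values of the same type stored together also compress well.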
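A back-of-the-envelope Python sketch of why analytical scans favor columnar storage. It assumes fixed-width columns, 8 KB pages, and no compression or page headers; the `sales` table, its column widths, and `pages_scanned` are hypothetical.

```python
import math

PAGE_SIZE = 8192  # assumption: 8 KB pages

# Hypothetical "sales" table: fixed column widths in bytes.
COLUMN_WIDTHS = {"id": 8, "customer": 32, "region": 16, "amount": 8}


def pages_scanned(num_rows: int, needed_columns: list[str], layout: str) -> int:
    """Estimate pages read by a full scan that only needs `needed_columns`."""
    if layout == "row":
        # Each page holds whole rows, so every column is read.
        bytes_read = num_rows * sum(COLUMN_WIDTHS.values())
    elif layout == "column":
        # Only the pages of the needed columns are read.
        bytes_read = num_rows * sum(COLUMN_WIDTHS[c] for c in needed_columns)
    else:
        raise ValueError(f"unknown layout: {layout}")
    return math.ceil(bytes_read / PAGE_SIZE)


# Models SELECT SUM(amount) FROM sales; over one million rows.
print(pages_scanned(1_000_000, ["amount"], "row"))     # 7813
print(pages_scanned(1_000_000, ["amount"], "column"))  # 977
```

The columnar layout reads roughly one eighth of the pages for this query. The balance flips for a lookup of one complete row: the row store finds all four values on a single page, while the column store has to visit one page per column.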

Effective Indexing Strategies

Indexes improve query performance by allowing the database to locate data without scanning every row in a table. However, they also consume disk space and can slow down write operations because the index must be updated whenever data is modified.

Balancing the number and types of indexes is crucial. Over-indexing can lead to unnecessary overhead, while under-indexing can result in slow query performance.
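The trade-off shows up even in a toy Python model where each index is a sorted list of `(value, row_id)` pairs, a simplified stand-in for a real B-tree. The `Table` class and its methods are hypothetical and only count work; they do not model pages.

```python
from bisect import bisect_left, insort


class Table:
    """Toy table with optional sorted-list indexes (hypothetical, for illustration)."""

    def __init__(self, indexed_columns):
        self.rows = []
        # One sorted list of (value, row_id) pairs per indexed column.
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0

    def insert(self, row: dict) -> int:
        row_id = len(self.rows)
        self.rows.append(row)
        for col, index in self.indexes.items():
            insort(index, (row[col], row_id))  # every index must be updated on write
            self.index_writes += 1
        return row_id

    def find(self, col, value):
        """Return (matching row ids, number of table rows read)."""
        if col in self.indexes:
            index = self.indexes[col]
            i = bisect_left(index, (value, -1))  # first entry with this value
            ids = []
            while i < len(index) and index[i][0] == value:
                ids.append(index[i][1])
                i += 1
            return ids, len(ids)  # only matching rows are fetched
        ids = [rid for rid, row in enumerate(self.rows) if row[col] == value]
        return ids, len(self.rows)  # full scan reads every row


t = Table(indexed_columns=["email"])
for n in range(1000):
    t.insert({"email": f"user{n}@example.com", "country": "US" if n % 2 else "DE"})

print(t.find("email", "user42@example.com"))  # ([42], 1): index lookup
print(t.find("country", "DE")[1])             # 1000: no index, full scan
print(t.index_writes)                         # 1000: one index update per insert
```

Each additional index adds one more update to every insert, while lookups on unindexed columns fall back to reading the whole table.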

Practical Examples and Commands

PostgreSQL provides built-in functions for inspecting how much disk space a table and its indexes occupy.

To check the size of a table and its indexes:

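-- Replace your_table with the name of the table to inspect.
SELECT
  pg_size_pretty(pg_relation_size('your_table')) AS table_size,        -- main table data only
  pg_size_pretty(pg_indexes_size('your_table')) AS indexes_size,       -- all indexes on the table
  pg_size_pretty(pg_total_relation_size('your_table')) AS total_size;  -- data + indexes + TOAST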

This query reports each size in a human-readable format: pg_size_pretty converts a byte count into units such as kB, MB, or GB.

Example output:

 table_size | indexes_size | total_size
------------+--------------+------------
 120 MB     | 30 MB        | 150 MB

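Interpreting the results: the table's own data occupies 120 MB and its indexes another 30 MB, so the indexes add 25% on top of the table size. Here the total of 150 MB is exactly the sum of the two. In general, total_size can exceed table_size plus indexes_size, because it also counts TOAST storage for large values and the table's auxiliary files (the free space map and visibility map).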
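The arithmetic behind reading these three numbers can be sketched in Python. The sketch assumes the raw byte counts are available (for example, by running the same query without pg_size_pretty); the helper name `size_breakdown` is hypothetical, and the inputs are the example figures above.

```python
MB = 1024 * 1024  # pg_size_pretty uses 1024-based units


def size_breakdown(table_bytes: int, indexes_bytes: int, total_bytes: int) -> dict:
    """Split a table's total footprint into main data, indexes, and everything else."""
    # Whatever is left over is TOAST data plus the free space map and visibility map.
    other_bytes = total_bytes - table_bytes - indexes_bytes
    return {
        "index_to_table_ratio": indexes_bytes / table_bytes,
        "index_share_of_total": indexes_bytes / total_bytes,
        "other_bytes": other_bytes,
    }


breakdown = size_breakdown(120 * MB, 30 * MB, 150 * MB)
print(breakdown)
```

A large `other_bytes` value points at TOAST-heavy columns, while a high index-to-table ratio is a prompt to check whether every index is actually used.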
