Last modified: July 06, 2018
This article is written in: πΊπΈ
Exploring how databases store tables and indexes on disk can provide valuable insights into optimizing performance and managing data efficiently. Let's delve into the fundamental concepts of disk storage in relational databases, focusing on the structures and mechanisms that underlie data organization.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β tablespace_sales β
ββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββ sales01.dbf ββ sales02.dbf ββ sales03.dbf β β ordinary files β
ββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
DATAFILE (sales01.dbf)
β
ββ Extent #17 (8 pages @ 8 KB = 64 KB)
β ββ Page 0 (heap rows)
β ββ Page 1 (heap rows)
β ββ Page 2 (B-tree branch)
β ββ Page 3 (free)
β ββ Page 4 (B-tree leaf)
β ββ Page 5 (row-overflow chain)
β ββ Page 6 (heap rows)
β ββ Page 7 (free)
ββ Extent #18 β¦
8 KB PAGE (heap)
ββββββββββββββββββββ 0x0000
β Page Header β β checksum, LSN, page-type flag, free-space pointers
ββββββββββββββββββββ€
β Item Slot Array β β 2-byte offsets, newest rows grow this downward
β ββ¬ββ¬ββ¬ββ¬ββ¬ββ¬ββ¬β β
ββββββββββββββββββββ€
β β
β Tuple Data β β rows grow upward
β β
ββββββββββββββββββββ 0x2000 (end)
INSERT PATH
Client β Buffer Pool β find page w/ β₯ row_size free β write row β mark page dirty
β
checkpoint writes dirty page back to disk
CLUSTERED B-TREE
(root p500)
/ \
(p510) (p560) β branch nodes
/ \ / \
p512 p518 p562 p570 β leaf nodes = full rows
BETWEEN, >=) read consecutive leaf pages with almost no random I/O.SELECT β¦ WHERE sku = 'A42';
βββββββββββββββ
β index root β
ββββββββ¬βββββββ
β (binary search)
βΌ
βββββββββββββββ
β index leaf β contains (sku, page#, slot#)
ββββββββ¬βββββββ
β (single pointer hop)
βΌ
βββββββββββββββ
β heap page β contains actual row
βββββββββββββββ
SELECT list, the engine performs an index-only scan and never touches the heap page.βββββββββββββ HASH βββββββββββββ equality only
β key β hash β bucket page β
ββββββββββββββββββββββββββββββββ
βββββββββββ BITMAP βββββββββββββ low-cardinality columns
β value1 1010011001 β
β value2 0101100100 β
ββββββββββββββββββββββββββββββββ
ββββββ GIN / INVERTED ββββββββββ arrays, JSON, full-text
β term β posting list of RIDs β
ββββββββββββββββββββββββββββββββ
WHERE clause; rows outside predicate are invisible to the index.FREE SPACE MAP (FSM)
page_id β free_bytes
βββββββββΌβββββββββββββ
100 β 800
101 β 0
102 β 1600
β¦
ββββββββββββββββ (1) change row
β client query β ββββββββββββββββββββββββββββββββββββ
ββββββββββββββββ β
(2) append redo record to WAL file β (sequential)
βΌ
(3) commit-ack
β
(4) later: flush dirty page to datafile
Understanding how tables and indexes are stored can have significant implications for database performance and optimization strategies.
Storing related data close together on disk improves data locality, which enhances performance by reducing the number of disk I/O operations required to retrieve data.
For example, when a query requests several rows that are stored sequentially on the same page or extent, the database can read them all in a single disk operation, speeding up the retrieval process.
Selecting between row-oriented and column-oriented storage depends on the workload and query patterns.
Indexes improve query performance by allowing the database to locate data without scanning every row in a table. However, they also consume disk space and can slow down write operations because the index must be updated whenever data is modified.
Balancing the number and types of indexes is crucial. Over-indexing can lead to unnecessary overhead, while under-indexing can result in slow query performance.
Managing and optimizing database storage is crucial for maintaining performance and ensuring efficient use of resources. Below are practical commands for PostgreSQL and MySQL that help inspect and manage storage aspects. Each command includes its purpose, when to use it, expected output, and how to interpret the results.
I. Check Table and Index Sizes
SELECT
pg_size_pretty(pg_relation_size('your_table')) AS table_size,
pg_size_pretty(pg_indexes_size('your_table')) AS indexes_size,
pg_size_pretty(pg_total_relation_size('your_table')) AS total_size;
Expected Output:
| table_size | indexes_size | total_size |
| 120 MB | 30 MB | 150 MB |
II. Analyze Query Execution with Buffer Usage
EXPLAIN (ANALYZE, BUFFERS) your_query;
Expected Output:
Seq Scan on your_table (cost=0.00..431.00 rows=21000 width=...)
Buffers: shared hit=123 read=45
...
III. Reclaim Dead Space with VACUUM
VACUUM your_table;
Expected Output:
Successful execution indicates that dead space has been reclaimed. Use VACUUM VERBOSE your_table; for detailed output.
IV. Check Space Usage with pgstattuple
VACUUM.SELECT * FROM pgstattuple('your_table');
Expected Output:
table_len | 1000000
tuple_count | 20000
tuple_len | 800000
dead_tuple_count | 5000
dead_tuple_len | 200000
free_space | 0
V. Enable Page-Level Checksums
initdb --data-checksums -D /path/to/data
Expected Output:
Data page checksums are enabled.
Ensures that any corrupted data pages are identified, enhancing reliability and data integrity.
VI. Monitor I/O Statistics with pg_stat_io
SELECT * FROM pg_stat_io;
Expected Output:
query_id | read_pages | write_pages
---------|------------|------------
1 | 500 | 20
2 | 1500 | 50
VII. Optimize Memory Usage with shared_buffers
shared_buffers = 2GB
shared_buffers can improve performance by reducing disk reads.VIII. Update Statistics with ANALYZE
ANALYZE your_table;
Expected Output:
Successful execution means statistics are updated. Use EXPLAIN to see how updated statistics affect query plans.
IX. Reorder Table Data with CLUSTER
CLUSTER your_table USING your_index;
Expected Output:
The table is physically reordered, which can lead to faster sequential scans and better cache utilization for indexed queries.
I. Show Table Status
SHOW TABLE STATUS LIKE 'your_table';
Expected Output:
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | ... |
| your_table | InnoDB | 10 | Compact | 20000 | 100 | 2000000 | ... | 500000 | ... |
Data_length or Index_length may indicate the need for optimization.II. Monitor InnoDB Buffer Pool Metrics
Used to determine how effectively the buffer pool is caching data and to identify if more memory allocation is needed.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
Expected Output:
| Variable_name | Value |
+---------------------------------+---------+
| Innodb_buffer_pool_read_requests | 100000 |
+---------------------------------+---------+
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 500 |
+--------------------------+-------+
Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requestsIII. Adjust Buffer Pool Size
innodb_buffer_pool_size = 4G
innodb_buffer_pool_size can reduce disk reads.VACUUM FULL tables that show > 20 % dead rows or 2Γ bloat.