Last modified: July 08, 2022

This article is written in: πŸ‡ΊπŸ‡Έ

Understanding Indexing in Databases

Indexes serve as a roadmap for the database engine, allowing it to find data swiftly based on the values of one or more columns. They are important for speeding up query execution, enforcing unique constraints on columns, and enabling quick information retrieval. Different types of indexes are available, such as B-tree indexes, bitmap indexes, and hash indexes, each suited to specific data types and query patterns. The choice of index depends on the nature of the data, the type of queries executed, and the database management system (DBMS) in use.

After reading this material, you should be able to answer the following questions:

Types of Indexes

Indexes are essential in databases to enhance query performance and data retrieval speed. There are several types of indexes, each designed for specific use cases and database systems. Below are detailed explanations of commonly used index types and their applications across various database platforms.

Clustered Indexes

A clustered index determines the physical order of data in a table. Essentially, the table's rows are stored on disk in the same sequence as the index.

Advantages:

Limitations:

Example: In SQL Server, when a primary key is defined, a clustered index is created by default. Similarly, in MySQL's InnoDB engine, the primary key is stored as a clustered index.

Non-Clustered Indexes

A non-clustered index creates a separate structure that contains the indexed columns and pointers (row locators) to the actual data rows. Unlike clustered indexes, they do not alter the physical order of the data on disk.

Advantages:

Limitations:

Example: In databases like PostgreSQL and SQL Server, you can create multiple non-clustered indexes to improve performance on frequently queried columns.

Other Types of Indexes

I. Unique Index:

II. Bitmap Index:

III. Full-Text Index:

IV. Spatial Index:

V. XML and JSON Indexes:

VI. Clustered vs. Non-Clustered Index Comparison:

When to Use Which Index

Choosing the right type of index depends on the database workload and query patterns:

Index Type Physical Data Order Number of Indexes Allowed Best for Queries Limitations
Clustered Index Matches the physical order of the table. 1 per table Range-based queries (e.g., BETWEEN, ORDER BY), primary key lookups. Only one per table; updating data can be slow if frequent reordering is needed.
Non-Clustered Index Does not alter physical data order. Multiple per table Filtering with WHERE, JOIN, GROUP BY; specific column lookups. Slower for range queries; requires additional storage and lookup steps.
Unique Index Can enforce unique constraints. Multiple per table Columns requiring uniqueness (e.g., email, username). Cannot handle duplicate values; limited to specific use cases.
Bitmap Index Does not alter physical data order. Multiple per table Low-cardinality columns (e.g., gender, status) in analytics workloads. Not suitable for high-cardinality columns; updates can be resource-intensive.
Full-Text Index Does not alter physical data order. Depends on database system Text search queries (CONTAINS, MATCH) in large text fields. Limited to text-based columns; not ideal for frequent updates.
Spatial Index Does not alter physical data order. Multiple per table Geographical or spatial queries (e.g., location-based filtering). Specialized to spatial data types; less useful for general-purpose queries.
XML/JSON Index Does not alter physical data order. Multiple per table Queries on semi-structured data in XML or JSON format. Specific to XML/JSON; performance depends on the database's native support for these formats.

Visualizing Index Concepts

Visual representations of index structures, such as B-trees, provide valuable insights into how data is organized and accessed efficiently. This section delves deeper into the B-tree index structure, elucidating its components, operational mechanics, and advantages in database management.

B-Tree Index Structure

A fundamental and widely used index type is the B-tree (Balanced Tree), which organizes data in a hierarchical, balanced tree structure. This design facilitates rapid data retrieval, insertion, and deletion operations, making it highly effective for databases that handle large volumes of data with frequent read and write operations.

# B-Tree Structure:
          [M]
         /   \
     [G]       [T]
    /  \       /  \
[A-F][H-L] [N-S][U-Z]

Operational Mechanics:

Advantages of B-Tree Indexes:

Hash Index Structure

Hash Indexes use a hash table where keys are hashed to determine their storage location. They are optimized for exact-match queries.

# Hash Index Structure:
Hash Function: h(key) = hash_value

Key 'A' β†’ h(A) β†’ Bucket 1
Key 'B' β†’ h(B) β†’ Bucket 2
Key 'C' β†’ h(C) β†’ Bucket 3
...

Operational Mechanics

Advantages

Disadvantages

Bitmap Indexes use bitmaps (arrays of bits) to represent the presence or absence of a value in a dataset. They are highly efficient for columns with low cardinality.

# Bitmap Index Structure for Gender Column:
Value 'M': 101010
Value 'F': 010101

Operational Mechanics

Advantages

Disadvantages

GiST (Generalized Search Tree) Index Structure

GiST (Generalized Search Tree) indexes are flexible, supporting a variety of data types and query operations. They extend the B-Tree structure to accommodate complex data types.

# GiST Index Structure for Geospatial Data:
          [BBox1]
         /       \
    [BBox2]      [BBox3]
    /    \        /    \
[A-F] [G-L]  [M-S]  [T-Z]

Operational Mechanics

Advantages

Disadvantages

Use Cases:

GIN (Generalized Inverted Index) Structure

GIN (Generalized Inverted Index) is optimized for indexing composite values, such as arrays and full-text search data. It efficiently handles multiple keys per row.

# GIN Index Structure for Tags Column:
Tags: ['SQL', 'Database']
Tags: ['Index', 'Performance']
Tags: ['SQL', 'Optimization']

Operational Mechanics

Advantages

Disadvantages

Use Cases:

R-Tree Index Structure

R-Tree (Rectangle Tree) indexes are designed for spatial access methods, efficiently handling multi-dimensional data such as geographical coordinates.

# R-Tree Structure for Spatial Data:
          [Rect1]
         /       \
    [Rect2]      [Rect3]
    /    \        /    \
[ShapeA] [ShapeB] [ShapeC] [ShapeD]

Operational Mechanics

Advantages

Disadvantages

Use Cases:

Full-Text Index Structure

Full-Text Indexes are specialized indexes designed to facilitate efficient searching of text data, supporting complex query patterns like phrase searches and relevance ranking.

# Full-Text Index Structure for Articles:
Term 'SQL' β†’ [Article1, Article3]
Term 'Database' β†’ [Article1, Article2]
Term 'Optimization' β†’ [Article3]
...

Operational Mechanics

Advantages

Disadvantages

Use Cases:

Bitmap Join Index

Bitmap Join Indexes combine the functionalities of bitmap indexes with join operations, optimizing queries that involve joining multiple tables based on low-cardinality columns.

# Bitmap Join Index Structure:
Table A: [ID, Category]
Table B: [ID, Description]

Bitmap for Category 'Electronics' in Table A:
1 β†’ 1
2 β†’ 0
3 β†’ 1
...

Use Cases:

Spatial Indexes

Spatial Indexes are tailored for indexing spatial data types, enabling efficient querying of geographical and geometric information.

# Spatial Index Structure for Locations:
          [MBR1]
         /      \
    [MBR2]        [MBR3]
    /    \          /    \
[LocA] [LocB]    [LocC] [LocD]

Use Cases:

Comparative Summary of Index Structures

To provide a clear overview of the various index types and their characteristics, the following table summarizes the key features, advantages, and suitable use cases for each index structure.

Index Type Structure Key Features Advantages Disadvantages Use Cases
B-Tree Balanced Tree Hierarchical, sorted keys, supports range queries Versatile, efficient for a wide range of queries Not optimized for exact-match only Primary keys, general-purpose indexing
Hash Hash Table Exact-match lookups, hash function mapping keys to buckets Extremely fast for exact matches Ineffective for range queries, potential collisions Unique identifier lookups
Bitmap Bitmaps per Value Low cardinality, uses bits to represent presence Efficient for categorical data, fast for multi-condition queries Not suitable for high cardinality, storage overhead Data warehousing, analytical queries
GiST Generalized Search Tree Supports various data types, bounding boxes Highly flexible, efficient for complex and multi-dimensional queries More complex to implement and maintain Geospatial data, full-text search
GIN Inverted Index Multi-key support, handles composite values Excellent for multi-value columns and full-text search High storage requirements, complex updates Full-text search, array indexing
R-Tree Rectangle Tree Optimized for spatial data, bounding rectangles Efficient for spatial queries, handles multi-dimensional data Overlapping bounding boxes can increase search paths GIS, location-based services
Full-Text Inverted Index Tokenization, supports phrase and relevance-based searches Advanced text search capabilities, relevance ranking High storage and maintenance overhead Search engines, content management systems
Bitmap Join Bitmap with Joins Combines bitmap indexing with join operations Optimizes complex joins on categorical data Limited to specific scenarios, maintenance complexity Data warehousing, star schema queries
Spatial Spatial Hierarchical Minimum Bounding Rectangles, optimized for geographical data Efficient spatial querying, supports proximity and containment queries Specialized for spatial data, not suitable for general use GIS, mapping applications

Managing Indexes

Managing indexes is a critical aspect of database optimization and performance tuning. This section delves into the comprehensive range of actions involved in handling indexes, including their creation, usage, monitoring, and removal. Additionally, it highlights how these operations may vary across different database systems and outlines key considerations to ensure effective index management.

Effective index management encompasses several activities:

  1. Creation involves designing and implementing indexes to optimize database query performance for specific workloads and query patterns.
  2. Usage focuses on leveraging existing indexes to improve the speed and efficiency of data retrieval operations during query execution.
  3. Monitoring includes tracking the performance and health of indexes to ensure they remain effective, typically by using database tools or analyzing query execution plans.
  4. Maintenance refers to actions like rebuilding or reorganizing indexes to address issues such as fragmentation, which can degrade performance over time.
  5. Dropping is the process of removing indexes that are unnecessary or detrimental, either due to lack of use or negative impact on system performance, such as added maintenance overhead.

Index Creation

Indexes can be created using SQL commands, specifying the type of index and the columns to include. The choice of index type and the columns selected play a pivotal role in query optimization. Properly designed indexes can drastically reduce query execution time, while poorly designed ones can have the opposite effect.

Creating a Clustered Index

A clustered index determines the physical order of data in a table. Since data rows are stored in the order of the clustered index, there can only be one clustered index per table. This type of index is typically applied to primary key columns to ensure data is stored in a logical and efficient manner.

CREATE CLUSTERED INDEX index_name ON table_name(column_name);

Example:

To create a clustered index on the EmployeeID column of the employees table:

CREATE CLUSTERED INDEX idx_employee_id ON employees(EmployeeID);

Creating a Non-Clustered Index

A non-clustered index does not alter the physical order of the data. Instead, it creates a separate structure that references the data rows, allowing multiple non-clustered indexes per table. This flexibility enables optimized access paths for various query patterns.

CREATE NONCLUSTERED INDEX index_name ON table_name(column_name);

Example:

To create a non-clustered index on the Department column of the employees table:

CREATE NONCLUSTERED INDEX idx_department ON employees(Department);

Explanation:

Creating Composite Indexes

Composite indexes involve multiple columns and are beneficial for queries that filter or sort based on multiple fields. They allow the database engine to efficiently handle complex query conditions by leveraging the combined index.

CREATE NONCLUSTERED INDEX index_name ON table_name(column1, column2);

Example:

To create a composite index on LastName and FirstName:

CREATE NONCLUSTERED INDEX idx_name ON employees(LastName, FirstName);

Database-Specific Index Creation

Different database systems may have unique syntax or additional index types. Understanding these differences is crucial for effective index management. Below is a comparison of index creation across popular databases:

Feature MySQL PostgreSQL Oracle SQL Server
Basic Syntax CREATE INDEX index_name ON table_name(column_name); CREATE INDEX index_name ON table_name USING btree(column_name); CREATE INDEX index_name ON table_name(column_name); CREATE NONCLUSTERED INDEX index_name ON table_name(column_name);
Clustered Index Supported via InnoDB's primary key. Not directly supported; similar behavior through table organization. Supported using Index-Organized Tables (IOT). CREATE CLUSTERED INDEX index_name ON table_name(column_name);
Index Types B-tree, Hash, Full-text, Spatial B-tree, Hash, GiST, SP-GiST, GIN, BRIN B-tree, Bitmap, Function-based, Reverse key Clustered, Non-Clustered, Unique, Filtered, Columnstore, XML
Advanced Options Partial indexes via generated columns, Spatial indexes for geospatial data. Partial indexes, Expression indexes, Multi-column indexes. Function-based indexes, Bitmap indexes for data warehousing scenarios. Included columns (INCLUDE), Online index operations, Index compression, Filtered indexes.
Unique Index CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE NONCLUSTERED INDEX index_name ON table_name(column_name);
Example CREATE INDEX idx_department ON employees(Department); CREATE INDEX idx_department ON employees USING btree(Department); CREATE INDEX idx_department ON employees(Department); CREATE NONCLUSTERED INDEX idx_department ON employees(Department) INCLUDE (AnotherColumn);

Index Usage

Indexes enhance query performance by allowing the database engine to locate and access data more efficiently. Proper usage involves:

SELECT FirstName, LastName FROM employees WHERE Department = 'Sales';

Explanation:

With an index on Department, the database can quickly locate relevant rows without scanning the entire table, significantly improving query performance.

SELECT e.FirstName, d.DepartmentName
  FROM employees e
  JOIN departments d ON e.DepartmentID = d.DepartmentID;

Explanation:

Indexes on employees.DepartmentID and departments.DepartmentID expedite the join process by allowing rapid matching of related records.

SELECT Department, COUNT(*) FROM employees GROUP BY Department;

Explanation:

An index on Department allows the database to quickly group records, enhancing the efficiency of the aggregation.

Additional Usage Scenarios:

CREATE NONCLUSTERED INDEX idx_covering ON employees(Department) INCLUDE (FirstName, LastName);
  
  SELECT FirstName, LastName FROM employees WHERE Department = 'Sales';

Explanation:

CREATE NONCLUSTERED INDEX idx_active_employees ON employees(EmployeeStatus) WHERE EmployeeStatus = 'Active';

Explanation:

Monitoring Indexes

Regular monitoring ensures that indexes remain effective and do not degrade performance over time. Key monitoring activities include:

Index Fragmentation

Fragmentation occurs when the physical order of pages within an index becomes disorganized, leading to inefficient data access. High fragmentation can slow down query performance and increase I/O operations.

SELECT
      OBJECT_NAME(ps.object_id) AS TableName,
      i.name AS IndexName,
      ps.avg_fragmentation_in_percent
    FROM
      sys.dm_db_index_physical_stats(DB_ID(), 'table_name', NULL, NULL, 'LIMITED') ps
      INNER JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
    WHERE
      i.type_desc IN ('CLUSTERED', 'NONCLUSTERED');

**Explanation:**

- **sys.dm_db_index_physical_stats**: Provides physical statistics about indexes, including fragmentation levels.
- **avg_fragmentation_in_percent**: Indicates the percentage of fragmentation; values above 30% typically warrant maintenance.

SELECT
      indexrelid::regclass AS index_name,
      pg_stat_user_indexes.idx_scan,
      pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM
      pg_stat_user_indexes
    WHERE
      schemaname = 'public';

**Explanation:**

- **pg_stat_user_indexes**: Provides statistics about index usage.
- **pg_relation_size**: Returns the size of the index, helping identify potential bloat.

ALTER INDEX index_name ON table_name REBUILD;

**Explanation:**

- **ALTER INDEX idx_department ON employees REBUILD;**: Reconstructs the idx_department index on the employees table, eliminating fragmentation.

ALTER INDEX index_name ON table_name REORGANIZE;

**Explanation:**

- **ALTER INDEX idx_department ON employees REORGANIZE;**: Defragments the idx_department index without fully rebuilding it, which is faster and less resource-intensive.
Index Usage Statistics

Understanding how often indexes are used helps identify unused or rarely used indexes that may be candidates for removal, thereby reducing maintenance overhead and storage costs.

SELECT
    OBJECT_NAME(s.[object_id]) AS TableName,
    i.name AS IndexName,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
  FROM
    sys.dm_db_index_usage_stats s
    INNER JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
  WHERE
    OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
    AND s.database_id = DB_ID();

Query pg_stat_user_indexes and pg_index for usage patterns.

SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
  FROM
    pg_stat_user_indexes
  WHERE
    schemaname = 'public';

Explanation:

Interpretation: - Active Indexes: High idx_scan and idx_tup_fetch values indicate active usage. - Unused Indexes: Low or zero values suggest the index may be redundant.

Action Steps:

Index Bloat

Excessive unused space within indexes, known as index bloat, can lead to increased storage costs and reduced performance. Regularly identifying and addressing index bloat is essential for maintaining efficient database operations.

SELECT
      relname AS index_name,
      pg_size_pretty(pg_relation_size(indexrelid)) AS size
    FROM
      pg_stat_user_indexes
      JOIN pg_index ON pg_stat_user_indexes.indexrelid = pg_index.indexrelid
    WHERE
      pg_stat_user_indexes.idx_scan = 0;

**Explanation:**

- **pg_relation_size(indexrelid)**: Retrieves the size of the index.
- **idx_scan = 0**: Filters for indexes that have never been scanned, indicating potential bloat.

Regularly rebuild or drop and recreate bloated indexes to reclaim space and improve performance.

ALTER INDEX index_name ON table_name REBUILD;

Explanation:

Alternatively, if the index is deemed unnecessary:

DROP INDEX table_name.index_name;

Explanation:

Index Maintenance

Maintaining indexes involves regular tasks to ensure they remain efficient and do not negatively impact database performance. Proper maintenance helps in preventing fragmentation, optimizing storage, and ensuring indexes continue to serve their intended purpose.

Rebuilding Indexes

Rebuilding indexes reorganizes the data and removes fragmentation, leading to improved performance. This process recreates the index from scratch, ensuring that data pages are contiguous and optimally ordered.

ALTER INDEX index_name ON table_name REBUILD;

Example:

To rebuild the idx_department index on the employees table:

ALTER INDEX idx_department ON employees REBUILD;

Benefits:

Considerations:

Automated Maintenance:

Reorganizing Indexes

Reorganizing indexes is a lighter operation compared to rebuilding and is suitable for addressing minor fragmentation. This process defragments the index pages without fully rebuilding the index structure.

ALTER INDEX index_name ON table_name REORGANIZE;

Example:

To reorganize the idx_department index on the employees table:

ALTER INDEX idx_department ON employees REORGANIZE;

Detailed Explanation: - ALTER INDEX idx_department: Specifies the index to be altered. - ON employees: Indicates the table on which the index exists. - REORGANIZE: Initiates the reorganize operation, which: - Defragments the index pages. - Compactly orders the index without a full rebuild. - Minimizes locking and resource usage.

Benefits: - Low Impact: Less resource-intensive and allows more concurrent operations compared to rebuilding. - Continuous Availability: Often performed online without significant downtime. - Incremental: Suitable for addressing fragmentation gradually.

When to Use: - Low to Moderate Fragmentation: Ideal when fragmentation levels are between 10% and 30%. - Frequent Updates: Helps maintain index health without the overhead of full rebuilds.

Updating Statistics

Accurate statistics help the query optimizer make informed decisions about index usage. Statistics provide information about data distribution within indexed columns, enabling the optimizer to choose the most efficient query execution plans.

UPDATE STATISTICS table_name index_name;

Example:

To update statistics for the idx_department index on the employees table:

UPDATE STATISTICS employees idx_department;

Detailed Explanation: - UPDATE STATISTICS employees idx_department;: Refreshes the statistics for the specified index, ensuring the optimizer has the latest data distribution information.

Benefits: - Optimized Query Plans: Ensures the query optimizer can generate efficient execution plans based on current data distributions. - Performance Stability: Prevents performance degradation due to outdated or inaccurate statistics.

Dropping Indexes

If an index is no longer needed or is adversely affecting performance, it can be removed to reclaim resources and streamline operations. Dropping unnecessary indexes reduces storage overhead and minimizes maintenance tasks, particularly for write-heavy tables.

DROP INDEX table_name.index_name;

Example:

To drop the idx_department index from the employees table:

DROP INDEX employees.idx_department;

Detailed Explanation: - DROP INDEX employees.idx_department;: Removes the idx_department index from the employees table.

Considerations: - Impact Analysis: - Query Dependencies: Assess which queries rely on the index to prevent unintended performance degradation. - Application Dependencies: Ensure that no application components (e.g., ORM mappings, stored procedures) depend on the index.

Example Scenario:

Suppose the idx_department index on the employees table is rarely used and has a high maintenance overhead due to frequent updates. After conducting an impact analysis and confirming that no critical queries depend on it, you decide to drop the index:

DROP INDEX employees.idx_department;

Post-Drop Actions: - Monitor Queries: Ensure that queries previously relying on idx_department still perform adequately, potentially adjusting query structures or adding alternative indexes if necessary. - Update Documentation: Remove references to the dropped index from database documentation and maintenance plans.

Database-Specific Considerations

Different database systems offer various index types and management features. Understanding these differences is crucial for effective index management. The following table summarizes key index features across popular databases:

Feature MySQL PostgreSQL Oracle SQL Server
Basic Index Creation CREATE INDEX index_name ON table_name(column_name); CREATE INDEX index_name ON table_name USING btree(column_name); CREATE INDEX index_name ON table_name(column_name); CREATE NONCLUSTERED INDEX index_name ON table_name(column_name);
Clustered Index Supported via InnoDB's primary key. Not directly supported; similar behavior through table organization. Supported using Index-Organized Tables (IOT). CREATE CLUSTERED INDEX index_name ON table_name(column_name);
Index Types B-tree, Hash, Full-text, Spatial B-tree, Hash, GiST, SP-GiST, GIN, BRIN B-tree, Bitmap, Function-based, Reverse key Clustered, Non-Clustered, Unique, Filtered, Columnstore, XML
Advanced Features Partial indexes via generated columns, Spatial indexes for geospatial data. Partial indexes, Expression indexes, Multi-column indexes. Function-based indexes, Bitmap indexes for data warehousing scenarios. Included columns (INCLUDE), Online index operations, Index compression, Filtered indexes.
Unique Index CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE INDEX index_name ON table_name(column_name); CREATE UNIQUE NONCLUSTERED INDEX index_name ON table_name(column_name);
Full-Text Index CREATE FULLTEXT INDEX index_name ON table_name(column_name); CREATE INDEX index_name ON table_name USING gin(to_tsvector('english', column_name)); CREATE INDEX index_name ON table_name(column_name) INDEXTYPE IS CTXSYS.CONTEXT; CREATE FULLTEXT INDEX ON table_name(column_name) KEY INDEX pk_index;
Spatial Index CREATE SPATIAL INDEX index_name ON table_name(geometry_column); CREATE INDEX index_name ON table_name USING gist(geometry_column); CREATE INDEX index_name ON table_name(geometry_column) INDEXTYPE IS MDSYS.SPATIAL_INDEX; CREATE SPATIAL INDEX index_name ON table_name(geometry_column) USING GEOMETRY_AUTO_GRID;
Index Compression Not natively supported; relies on storage engine capabilities. Not natively supported; relies on table-level compression settings. Supports compressed indexes to reduce storage footprint. CREATE INDEX index_name ON table_name(column_name) WITH (DATA_COMPRESSION = PAGE);
Maintenance Tools MySQL Workbench provides graphical tools for index management. pgAdmin offers tools for index management; extensions like pg_repack assist in maintenance. Oracle Enterprise Manager offers robust index management capabilities. SQL Server Management Studio (SSMS) provides comprehensive graphical tools for index management, monitoring, and maintenance.

Key vs. Non-Key Column Indexing

Application of indexes differs depending on whether they are applied to key columns or non-key columns. Understanding when and how to use these types of indexing is important for efficient database design.

Key Column Indexing

Key column indexing involves creating indexes on columns used as primary keys or unique constraints.

Non-Key Column Indexing

Non-key column indexing focuses on indexing columns that are not unique but are used frequently in query filtering, sorting, or grouping.

Avoid Indexing Low-Cardinality Columns

Be cautious when indexing columns with low cardinality (e.g., boolean fields, columns with only a few distinct values like "Yes/No" or "Active/Inactive"):

Index Scan vs. Index-Only Scan

Understanding how indexes are utilized during query execution can help optimize performance.

Index Scan

Index-Only Scan

Benefits of Index-Only Scan

Trade-offs of Index-Only Scan

Practical Example

Consider an employees table:

EmployeeID FirstName LastName Department
1 John Doe HR
2 Jane Smith IT
3 Michael Brown Finance
4 Emily White IT
5 Robert Green HR

Imagine you need to filter employees by their Department. Without optimization, every query would require scanning the entire table, which becomes inefficient as the number of rows increases. To address this, we can create a non-clustered index on the Department column:

CREATE NONCLUSTERED INDEX idx_department ON employees(Department);

This index works by creating an additional data structure that maps each department to its corresponding rows. Here's an example representation of how the index organizes the data:

Department Index:
+------------+------------------+
| Department | Row Pointer      |
+------------+------------------+
| Finance    | Points to Row 3  |
| HR         | Points to Row 1  |
| HR         | Points to Row 5  |
| IT         | Points to Row 2  |
| IT         | Points to Row 4  |
+------------+------------------+

Before creating the index:

After creating the index:

The graph below illustrates the resource consumption before and after applying the fix:

Resource Usage Impact

  1. High Disk IO and CPU utilization are seen in the first part of the graph, representing inefficient queries before the index was applied.
  2. The green vertical line marks the point where the index was introduced.
  3. Low Disk IO and CPU utilization follow, demonstrating the effectiveness of the index in reducing resource usage.

Benefits of Indexing

Drawbacks of Indexing

Best Practices for Indexing

Table of Contents

    Understanding Indexing in Databases
    1. Types of Indexes
      1. Clustered Indexes
      2. Non-Clustered Indexes
      3. Other Types of Indexes
      4. When to Use Which Index
    2. Visualizing Index Concepts
      1. B-Tree Index Structure
      2. Hash Index Structure
      3. GiST (Generalized Search Tree) Index Structure
      4. GIN (Generalized Inverted Index) Structure
      5. R-Tree Index Structure
      6. Full-Text Index Structure
      7. Bitmap Join Index
      8. Spatial Indexes
    3. Comparative Summary of Index Structures
    4. Managing Indexes
      1. Index Creation
      2. Index Usage
      3. Monitoring Indexes
      4. Index Maintenance
      5. Dropping Indexes
      6. Database-Specific Considerations
    5. Key vs. Non-Key Column Indexing
      1. Key Column Indexing
      2. Non-Key Column Indexing
      3. Avoid Indexing Low-Cardinality Columns
    6. Index Scan vs. Index-Only Scan
      1. Index Scan
      2. Index-Only Scan
      3. Benefits of Index-Only Scan
      4. Trade-offs of Index-Only Scan
    7. Practical Example
    8. Benefits of Indexing
    9. Drawbacks of Indexing
    10. Best Practices for Indexing