Database Indexing Overview

8 min readdatabase, indexing, performance, backend, sql
Database Indexing Overview

Databases store data in pages (commonly 8 KB each) containing multiple rows. Without indexes, finding a specific record can require scanning many pages sequentially, which is inefficient and slow. Indexes act as maps, guiding the query engine directly to the required data, dramatically reducing lookup times.

The Indexing Problem & Its Solution

The Problem Without Indexing

When a database query is executed without an index, the system loads pages sequentially into memory, scanning each for the target data. For large tables, this can mean reading hundreds of thousands or even millions of pages, resulting in high latency.

Without Index - Full Table Scan:

Loading diagram...

How Indexes Solve the Problem

Indexes are specialized data structures stored on disk. They provide a pointer (or a set of pointers) to the disk pages where the actual data resides. This means the system can quickly locate and load only the relevant page(s), bypassing a full table scan.

With Index - Direct Lookup:

Loading diagram...

Index Types and Visualizations

B-Tree Index

B-Tree indexes are the most common type. They maintain sorted order and support both exact match and range queries efficiently.

How It Works:

  1. Load the Root Node: The system reads the root of the B-Tree
  2. Traverse Internal Nodes: Navigate down based on key comparisons
  3. Reach the Leaf Node: Contains pointers to data pages

B-Tree Structure:

Loading diagram...

Hash Index

Hash indexes use a hash function to convert a key into a hash value, which then maps directly to a data page. They provide O(1) lookups for exact matches but do not support range queries well.

Hash Index Flow:

Loading diagram...

Note: Hash indexes are commonly used in in-memory data stores rather than on-disk databases.

Geospatial Indexes

Geospatial indexes are optimized for two-dimensional data (e.g., latitude and longitude). The three popular types include:

  • Geohashing: Converts 2D coordinates into 1D strings while preserving spatial locality
  • Quad Trees: Recursively partitions the space into quadrants, splitting cells only where data density is high
  • R-Trees / Archeries: A dynamic variant that supports overlapping regions, optimizing spatial queries

Geospatial Indexing - Quad Tree Structure:

Loading diagram...

Inverted Index

Inverted indexes are designed for full-text search. They map each term (or token) to the documents or data pages in which they appear.

Inverted Index Structure:

Loading diagram...

Indexing Decision Flowchart

The following flowchart helps determine the appropriate indexing strategy based on the query characteristics and data type.

Loading diagram...

Quick Reference Table:

ScenarioRecommended IndexWhy
Primary key lookupsB-TreeBalanced performance, supports range queries
Full-text searchInverted IndexOptimized for text token matching
Location-based queriesGeospatialEfficient 2D coordinate lookups
Exact match cache lookupsHash IndexO(1) constant time access
Range queries (BETWEEN, >, <)B-TreeSorted structure enables ranges
Join operationsB-Tree on FK columnsFast lookups during joins

Additional Considerations

Performance

Indexes reduce disk I/O, leading to faster query responses.

Performance Comparison:

Table SizeWithout IndexWith B-Tree IndexSpeedup
1,000 rows10 ms2 ms5x
100,000 rows500 ms3 ms167x
10,000,000 rows50,000 ms4 ms12,500x

Trade-Offs

While indexes speed up data retrieval, they add overhead to write operations (inserts, updates, and deletes).

Write Operation Impact:

OperationWithout IndexWith 3 IndexesOverhead
INSERT1 unit3.5 units3.5x slower
UPDATE (indexed columns)1 unit4 units4x slower
UPDATE (non-indexed)1 unit1.2 units1.2x slower
DELETE1 unit3.2 units3.2x slower

Best Practice: Only index columns frequently used in WHERE, JOIN, and ORDER BY clauses.

Real-World Usage

Index TypeCommon Use CasesExample Technologies
B-TreesRelational databases for general-purpose indexingPostgreSQL, MySQL, SQL Server
Geospatial IndexesMapping or location-based searchesPostGIS, MongoDB 2dsphere
Inverted IndexesFull-text search enginesElasticsearch, PostgreSQL FTS
Hash IndexesIn-memory stores or specific exact-match use casesRedis, Memcached
Bitmap IndexesData warehousing with low cardinality columnsOracle, PostgreSQL
GiST/GINComplex data types (arrays, JSON, ranges)PostgreSQL

Conclusion

Effective database indexing is essential for optimizing query performance and building scalable systems. By understanding the strengths and limitations of each index type, you can make informed decisions on which strategy to implement for a given application scenario.

Key Takeaways:

  1. Start with B-Tree indexes - They handle 95% of use cases effectively
  2. Index only what you query - Every index adds write overhead
  3. Monitor query performance - Use EXPLAIN to verify index usage
  4. Choose specialized indexes for specialized data - Geospatial for locations, inverted for text
  5. Keep indexes maintained - Rebuild periodically to prevent bloat

Remember: The best index is the one that gets used by your query planner!