Database Indexing Overview

Databases store data in pages (commonly 8 KB each) containing multiple rows. Without indexes, finding a specific record can require scanning many pages sequentially, which is inefficient and slow. Indexes act as maps, guiding the query engine directly to the required data, dramatically reducing lookup times.

The Indexing Problem & Its Solution

The Problem Without Indexing

When a database query is executed without an index, the system loads pages sequentially into memory, scanning each for the target data. For large tables, this can mean reading hundreds of thousands or even millions of pages, resulting in high latency.

[Diagram: Without Index - Full Table Scan]
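To make the cost concrete, here is a minimal Python sketch of a full table scan. The page size of 4 rows and the helper names (`build_pages`, `full_scan`) are assumptions chosen for illustration, not how any particular engine lays out its data.

```python
ROWS_PER_PAGE = 4  # hypothetical; a real page holds as many rows as fit in its size


def build_pages(rows, rows_per_page=ROWS_PER_PAGE):
    """Split a list of row dicts into fixed-size 'pages'."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def full_scan(pages, key, value):
    """Read pages in order until a matching row is found.

    Returns (row, pages_read); row is None if nothing matches.
    """
    pages_read = 0
    for page in pages:
        pages_read += 1  # every page up to the match has to be loaded
        for row in page:
            if row[key] == value:
                return row, pages_read
    return None, pages_read


rows = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
pages = build_pages(rows)
row, pages_read = full_scan(pages, "id", 98)
print(row, pages_read)
```

A row near the end of the table forces nearly every page to be read, and a value that does not exist forces all of them to be read.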

How Indexes Solve the Problem

Indexes are specialized data structures stored on disk. They provide a pointer (or a set of pointers) to the disk pages where the actual data resides. This means the system can quickly locate and load only the relevant page(s), bypassing a full table scan.

[Diagram: With Index - Direct Lookup]
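The pointer idea can be sketched in a few lines of Python. This version uses a plain dictionary as the index purely for brevity, which is an assumption of the sketch; on-disk indexes typically use the tree structures described later. The names `build_index` and `indexed_lookup` are hypothetical.

```python
def build_pages(rows, rows_per_page=4):
    """Split a list of row dicts into fixed-size 'pages' (4 rows is arbitrary)."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def build_index(pages, key):
    """Map each key value to the number of the page that holds its row."""
    index = {}
    for page_no, page in enumerate(pages):
        for row in page:
            index[row[key]] = page_no
    return index


def indexed_lookup(pages, index, key, value):
    """Follow the index pointer and read only the one page it names.

    Returns (row, pages_read); row is None if the value is not indexed.
    """
    page_no = index.get(value)
    if page_no is None:
        return None, 0  # the index alone proves the value is absent
    for row in pages[page_no]:
        if row[key] == value:
            return row, 1
    return None, 1


rows = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
pages = build_pages(rows)
index = build_index(pages, "id")
row, pages_read = indexed_lookup(pages, index, "id", 98)
print(row, pages_read)
```

The lookup reads one data page no matter where the row sits in the table, at the cost of building and storing the index.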

Index Types and Visualizations

B-Tree Index

B-Tree indexes are the most common type. They maintain sorted order and support both exact match and range queries efficiently.

How It Works:

Load the Root Node: The system reads the root of the B-Tree
Traverse Internal Nodes: Navigate down based on key comparisons
Reach the Leaf Node: The leaf holds the matching key together with a pointer to the data page, which is then loaded

[Diagram: B-Tree Structure]
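The traversal steps above can be sketched as follows. This is a simplified, read-only tree built by hand (no insertion or node splitting), and the `Node` layout and page ids are assumptions made for the example rather than any engine's real format.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # set on internal nodes
    page_ids: list = field(default_factory=list)  # set on leaves, one per key

    @property
    def is_leaf(self):
        return not self.children


def search(root, key):
    """Return (page_id, nodes_visited); page_id is None if the key is absent."""
    node, visited = root, 1  # step 1: load the root
    while not node.is_leaf:
        # step 2: keys[i] is the smallest key reachable through children[i + 1]
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    # step 3: the leaf stores a data-page pointer next to each key
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.page_ids[i], visited
    return None, visited


# A hand-built three-level tree with made-up page ids.
leaf_a = Node(keys=[5, 10], page_ids=[101, 102])
leaf_b = Node(keys=[20, 30], page_ids=[103, 104])
leaf_c = Node(keys=[40, 50], page_ids=[105, 106])
leaf_d = Node(keys=[60, 70], page_ids=[107, 108])
left = Node(keys=[20], children=[leaf_a, leaf_b])
right = Node(keys=[60], children=[leaf_c, leaf_d])
root = Node(keys=[40], children=[left, right])

print(search(root, 30))
```

Every lookup visits exactly one node per level, so the number of node reads equals the height of the tree rather than the number of keys.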

Hash Index

Hash indexes apply a hash function to a key to produce a hash value, which maps to a bucket that points at the data page. They provide O(1) average-case lookups for exact matches but cannot serve range queries efficiently, because hashing does not preserve key order.

[Diagram: Hash Index Flow]

Note: Hash indexes are commonly used in in-memory data stores rather than on-disk databases.
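A minimal sketch of the idea, assuming a fixed number of buckets with chaining for collisions (real implementations also resize and handle concurrency). The `HashIndex` class and its method names are hypothetical.

```python
class HashIndex:
    """Toy hash index mapping a key to the id of the page holding its row."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, page_id):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, page_id)  # overwrite an existing entry
                return
        bucket.append((key, page_id))

    def get(self, key):
        """Exact match: only one bucket is examined."""
        for existing_key, page_id in self._bucket(key):
            if existing_key == key:
                return page_id
        return None

    def range(self, low, high):
        """Range query: neighbouring keys land in unrelated buckets,
        so every entry in every bucket has to be checked."""
        return sorted(
            (key, page_id)
            for bucket in self.buckets
            for key, page_id in bucket
            if low <= key <= high
        )


index = HashIndex()
for user_id in range(1, 21):
    index.put(user_id, 1000 + user_id)  # made-up page ids

print(index.get(7))
print(index.range(5, 8))
```

The `range` method works, but only by visiting the whole structure, which is the reason a sorted structure such as a B-Tree is preferred for range predicates.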

Geospatial Indexes

Geospatial indexes are optimized for two-dimensional data (e.g., latitude and longitude). Three popular types are:

Geohashing: Converts 2D coordinates into 1D strings while preserving spatial locality
Quad Trees: Recursively partitions the space into quadrants, splitting cells only where data density is high
R-Trees: Group nearby objects into bounding rectangles that may overlap, arranged in a balanced tree that is adjusted dynamically as data changes, optimizing spatial queries

[Diagram: Geospatial Indexing - Quad Tree Structure]
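As an illustration of the quad tree approach, the sketch below splits a cell into four quadrants only once it holds more points than its capacity, so dense areas end up with deeper subdivisions than sparse ones. The class, the capacity of 2 used in the example, and the `max_depth` guard are assumptions of this sketch.

```python
class QuadTree:
    """Point quad tree over the half-open box [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # guards against endless splits on duplicates
        self.points = []
        self.children = None  # None while this cell is a leaf

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()  # cell is full: subdivide, then fall through
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        for px, py in self.points:  # push existing points down one level
            for child in self.children:
                if child.insert(px, py):
                    break
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points inside the inclusive query box."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # query box misses this cell: skip the whole subtree
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        found = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found

    def leaf_depth(self, x, y):
        """Depth of the leaf cell covering (x, y); the root is depth 0."""
        if self.children is None:
            return self.depth
        for child in self.children:
            if child._contains(x, y):
                return child.leaf_depth(x, y)
        raise ValueError("point lies outside this cell")


tree = QuadTree(0, 0, 100, 100, capacity=2)
for point in [(10, 10), (12, 12), (11, 14), (13, 11), (80, 80)]:
    tree.insert(*point)

print(sorted(tree.query(0, 0, 20, 20)))
print(tree.leaf_depth(10, 10), tree.leaf_depth(80, 80))
```

The four clustered points force three levels of subdivision around them, while the isolated point at (80, 80) stays in a cell one level below the root.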

Inverted Index

Inverted indexes are designed for full-text search. They map each term (or token) to the documents or data pages in which it appears.

[Diagram: Inverted Index Structure]
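A small sketch of the term-to-documents mapping, assuming a naive tokenizer that lowercases text and splits on non-alphanumeric characters. Search engines add stemming, stop words, and ranking on top of this; none of that is modelled here, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token in the query (AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


docs = {
    1: "Indexes speed up database queries",
    2: "A hash index supports exact match queries",
    3: "Full-text search uses an inverted index",
}
index = build_inverted_index(docs)

print(search(index, "index"))
print(search(index, "hash queries"))
```

Without stemming, "index" and "indexes" are distinct tokens, so document 1 does not match a search for "index" in this sketch.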

Indexing Decision Flowchart

The following flowchart helps determine the appropriate indexing strategy based on the query characteristics and data type.

[Diagram: Indexing Decision Flowchart]

Quick Reference Table:

Scenario	Recommended Index	Why
Primary key lookups	B-Tree	Balanced performance, supports range queries
Full-text search	Inverted Index	Optimized for text token matching
Location-based queries	Geospatial	Efficient 2D coordinate lookups
Exact match cache lookups	Hash Index	O(1) average-case access
Range queries (BETWEEN, >, <)	B-Tree	Sorted structure enables ranges
Join operations	B-Tree on FK columns	Fast lookups during joins
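The quick reference table can be encoded as a small helper. The function below is a hypothetical sketch: its name, its flags, and the order in which it checks them are assumptions, and a real decision would also weigh data volume and write load.

```python
def recommend_index(*, full_text=False, geospatial=False,
                    needs_range=False, in_memory_exact_match=False):
    """Return the index type the quick reference table suggests."""
    if full_text:
        return "Inverted Index"  # text token matching
    if geospatial:
        return "Geospatial"      # 2D coordinate lookups
    if needs_range:
        return "B-Tree"          # sorted structure enables ranges
    if in_memory_exact_match:
        return "Hash Index"      # constant-time exact matches
    return "B-Tree"              # general-purpose default (keys, joins)


print(recommend_index(full_text=True))
print(recommend_index(needs_range=True, in_memory_exact_match=True))
print(recommend_index())
```

The range check comes before the hash check on purpose: a workload that needs both exact matches and ranges is better served by a single B-Tree.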

Additional Considerations

Performance

Indexes reduce disk I/O, leading to faster query responses.

Performance Comparison (illustrative figures; actual timings depend on hardware, caching, and row size):

Table Size	Without Index	With B-Tree Index	Speedup
1,000 rows	10 ms	2 ms	5x
100,000 rows	500 ms	3 ms	167x
10,000,000 rows	50,000 ms	4 ms	12,500x
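The shape of these numbers follows from simple arithmetic: a full scan reads a number of pages proportional to the row count, while a B-Tree lookup reads roughly one page per tree level. The sketch below assumes 100 rows per data page and 200 keys per index node; both values are made up for illustration, and the output is page reads, not milliseconds.

```python
import math

ROWS_PER_PAGE = 100  # assumed rows per data page
FANOUT = 200         # assumed keys per B-Tree node


def full_scan_pages(num_rows, rows_per_page=ROWS_PER_PAGE):
    """Worst-case pages read by a sequential scan."""
    return math.ceil(num_rows / rows_per_page)


def btree_height(num_keys, fanout=FANOUT):
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


def indexed_lookup_pages(num_rows, fanout=FANOUT):
    """One page per index level plus one read for the data page itself."""
    return btree_height(num_rows, fanout) + 1


for n in (1_000, 100_000, 10_000_000):
    print(n, full_scan_pages(n), indexed_lookup_pages(n))
```

Under these assumptions, growing the table by a factor of 10,000 multiplies the scan cost by 10,000 but adds only two page reads to the indexed lookup.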

Trade-Offs

While indexes speed up data retrieval, they add overhead to write operations (inserts, updates, and deletes).

Write Operation Impact (illustrative relative cost):

Operation	Without Index	With 3 Indexes	Overhead
INSERT	1 unit	3.5 units	3.5x slower
UPDATE (indexed columns)	1 unit	4 units	4x slower
UPDATE (non-indexed)	1 unit	1.2 units	1.2x slower
DELETE	1 unit	3.2 units	3.2x slower

Best Practice: Only index columns frequently used in WHERE, JOIN, and ORDER BY clauses.
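One way to see where the write overhead comes from is to count how many structures each write has to touch. The toy `Table` class below is an assumption made for this sketch: it keeps rows in a dictionary and each index as a sorted list. It counts structure updates rather than measuring time, so it does not reproduce the exact ratios in the table above.

```python
from bisect import insort


class Table:
    """Toy table that counts how many structures each write modifies."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row_id -> row dict (the base table)
        # one sorted list of (value, row_id) per indexed column
        self.indexes = {col: [] for col in indexed_columns}
        self.structure_writes = 0

    def insert(self, row_id, row):
        self.rows[row_id] = row
        self.structure_writes += 1  # the base table
        for col, entries in self.indexes.items():
            insort(entries, (row[col], row_id))
            self.structure_writes += 1  # plus every index

    def update(self, row_id, col, value):
        old = self.rows[row_id][col]
        self.rows[row_id][col] = value
        self.structure_writes += 1  # the base table
        if col in self.indexes:
            entries = self.indexes[col]
            entries.remove((old, row_id))
            insort(entries, (value, row_id))
            self.structure_writes += 1  # only the affected index, in this model


table = Table(indexed_columns=["email", "name", "age"])
table.insert(1, {"email": "a@example.com", "name": "Ada", "age": 30, "bio": ""})
after_insert = table.structure_writes

table.update(1, "bio", "hello")  # non-indexed column
after_plain_update = table.structure_writes

table.update(1, "age", 31)  # indexed column
after_indexed_update = table.structure_writes

print(after_insert, after_plain_update, after_indexed_update)
```

With three indexes, a single insert modifies four structures in this model, which is why unused indexes are pure cost on the write path.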

Real-World Usage

Index Type	Common Use Cases	Example Technologies
B-Trees	Relational databases for general-purpose indexing	PostgreSQL, MySQL, SQL Server
Geospatial Indexes	Mapping or location-based searches	PostGIS, MongoDB 2dsphere
Inverted Indexes	Full-text search engines	Elasticsearch, PostgreSQL FTS
Hash Indexes	In-memory stores or specific exact-match use cases	Redis, Memcached
Bitmap Indexes	Data warehousing with low cardinality columns	Oracle (PostgreSQL builds in-memory bitmaps at query time but has no on-disk bitmap index type)
GiST/GIN	Complex data types (arrays, JSON, ranges)	PostgreSQL

Conclusion

Effective database indexing is essential for optimizing query performance and building scalable systems. By understanding the strengths and limitations of each index type, you can make informed decisions on which strategy to implement for a given application scenario.

Key Takeaways:

Start with B-Tree indexes - They handle the large majority of use cases effectively
Index only what you query - Every index adds write overhead
Monitor query performance - Use EXPLAIN to verify index usage
Choose specialized indexes for specialized data - Geospatial for locations, inverted for text
Keep indexes maintained - Rebuild periodically to prevent bloat
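As a sketch of the "monitor query performance" takeaway, the example below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement to compare the plan for the same query before and after creating an index. SQLite is chosen here only because it needs no setup; the `users` table, the index name, and the `query_plan` helper are made up for the example, and other databases expose their own EXPLAIN syntax and output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # the human-readable description is the last column of each plan row
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

sql = "SELECT * FROM users WHERE email = ?"
params = ("user42@example.com",)

plan_before = query_plan(conn, sql, params)  # no index on email yet
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, sql, params)   # planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan of `users` and the second a search using `idx_users_email`.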

Remember: The best index is the one that gets used by your query planner!