Last modified: May 16, 2025
This article is written in: ๐บ๐ธ
Choosing Database
Choosing the right database can significantly influence your project's success. It requires careful evaluation of factors such as the data model, performance requirements, scalability, availability, and cost. Understanding your specific use case and its limitations helps ensure that your choice supports both immediate needs and future growth.
Data Models
Each model handles specific workloads and data shapes, so understanding their strengths helps you make informed architectural decisions.
Choose the model that matches the shape of your data and the way it will be queriedโthe engine comes later.
Relational (Row-Store SQL)
Relational databases structure data into tables with fixed schemas, enforcing strong consistency and rich relational integrity through ACID transactions.
Structured rows, strong consistency, rich joins.
Best for | Typical engines | Key traits |
Financial ledgers, inventory, order processing | PostgreSQL, MySQL, SQL Server | ACID, referential integrity, mature tooling |
+----+----------+----------+
| id | customer | balance |
+----+----------+----------+
| 1 | Omar | 1 250.00|
| 2 | Layla | 93.70 |
+----+----------+----------+
Document & Key-Value (NoSQL)
Document and key-value stores offer flexible schema designs, storing JSON-like documents or simple key-value pairs for scalable and agile development.
Flexible JSON blobs or simple keys, line-out scalability.
Best for | Typical engines | Key traits |
Content feeds, user profiles, catalogues | MongoDB, DynamoDB, Couchbase | Schema-optional, sharding built-in, eventual or tunable consistency |
{
"user": "Omar",
"posts": [
{"id": 1, "text": "Hello"},
{"id": 2, "text": "World"}
]
}
Time-Series
Time-series databases are optimized for storing and querying time-stamped or sequential data, enabling efficient analysis of trends and temporal patterns.
Append-only metrics indexed by timestamp.
Best for | Typical engines | Key traits |
IoT telemetry, server metrics, market ticks | TimescaleDB, InfluxDB, Amazon Timestream | High ingest rate, roll-ups, window queries |
value
20 โค *
18 โค *
16 โค *
14 โค *
12 โค *
10 โค *
8 โค *
0 โผโโโโโโโโโโโโโโโโโโโโโโโโโโ time
Graph
Graph databases represent data as nodes and edges, enabling fast traversals and complex relationship queries for connected data scenarios.
Nodes and edges with millisecond traversals.
Best for | Typical engines | Key traits |
Social graphs, fraud rings, recommendations | Neo4j, Amazon Neptune, ArangoDB | Index-free adjacency, path finding, pattern matching |
(Omar)โโโknowsโโโ(Layla)
โ โฒ โ
โ โโโknowsโโโโโ โ
โผ โผ โผ
(Zayn) (Noura)
In-Memory / Cache
In-memory and caching systems load data directly into RAM, delivering sub-millisecond access times for high-performance and transient workloads.
Entire datasetโor the hot subsetโin RAM.
Best for | Typical engines | Key traits |
Session state, leaderboard counters, low-latency look-ups | Redis, Memcached | ยตs read/write, TTL eviction, optional persistence |
Client โ [Redis] โ (TTL 60 s) โ Primary DB
Scalability
Scalability involves deciding between horizontal scaling (adding nodes) or vertical scaling (enhancing existing nodes). Managed services often offer auto-scaling to adjust capacity dynamically based on demand, supporting seamless growth without manual intervention.
Scale-Up (Vertical)
Vertical scaling, or scaling up, increases the capacity of a single node by adding more CPU, RAM, or IOPS, simplifying operations at the expense of hitting hardware limits.
Best for | How it scales | Trade-offs |
Early MVPs, small OLTP apps | Add CPU / RAM / IOPS to one box | Simpler ops, but hard ceiling and cost curve is exponential |
+------------+
| 4 vCPU, 8G |
+------------+
โ upgrade
โผ
+------------+
| 32 vCPU, |
| 256 G RAM |
+------------+
Read Replication
Read replication distributes query load by creating copies of the primary database, improving read throughput while maintaining a single write master.
Best for | How it scales | Trade-offs |
Read-heavy workloads, dashboards | Fans out read traffic to N replicas | Write master can still bottleneck; replica lag |
Clients
โ
โโโบ Replica 1 (SELECT)
โโโบ Replica 2 (SELECT)
โโโบ Primary (RW)
Sharding / Partitioning
Sharding splits data across multiple nodes based on key ranges or hash functions, enabling vast datasets to be partitioned for parallel processing.
Best for | How it scales | Trade-offs |
Large OLTP sets, multitenant SaaS | Split data by key range or hash across nodes | Cross-shard joins/txns become complex |
Shard Key = user_id % 3
โโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโ
โ Shard 0 โ Shard 1 โ Shard 2 โ
โ users 0,3โฆ โ users 1,4โฆ โ users 2,5โฆ โ
โโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโ
Geo-Distributed / Active-Active
Geo-distributed active-active configurations place writable nodes across regions to reduce latency and enhance resilience, requiring careful consensus and CAP trade-offs.
Best for | How it scales | Trade-offs |
Low-latency global apps, BCDR | Writes land in nearest region; consensus syncs across sites | Network partitions โ CAP decisions; higher cost |
โญโโ us-east โโโฎ โญโโ eu-west โโฎ
โ RW node โโโโโโโโโถโ RW node โ
โฐโโโโโโโโโโโโโโฏ โฐโโโโโโโโโโโโโฏ
Serverless / Elastic Autoscaling
Serverless and elastic autoscaling platforms automatically adjust compute resources in response to traffic spikes, offering rapid scaling at the cost of cold start delays.
Best for | How it scales | Trade-offs |
Spiky, unpredictable traffic | Capacity grows & shrinks in seconds | Cold starts, price premium per request |
Load
โ
โโโโโ low โโโบ 1 container
โโโโโ burst โบ 50 containers
Dedicated Cache Layer
Adding a dedicated caching layer offloads frequent reads to in-memory stores, accelerating data access but introducing extra hops and eviction complexities.
Best for | How it scales | Trade-offs |
Hot-read amplification, session state | Offload reads to in-memory tier; add nodes linearly | Extra hop, eviction strategy required |
Client โ Redis Cluster โ Primary DB
(1 ms) (10 ms)
Availability and Reliability
Maintaining uptime and reliability requires features like Multi-AZ deployments, read replicas, and robust backup and recovery options. Managed services with built-in failover capabilities can ensure your database remains accessible during hardware failures or other disruptions.
Resiliency is architecture, not luckโchoose a pattern that meets your SLA and your budget.*
Single-Node + Local Backup
Local backups provide a basic level of protection by dumping data to the same infrastructure, suitable for non-critical environments with relaxed recovery objectives.
Best for | How it protects | Trade-offs |
Dev, POCs, throw-away demos | Nightly dump to same data center | RTO hours, RPO โค 24 h; hardware failure = downtime |
App โโโบ DB โโโโบ local backup.gz
โ
same rack
Synchronous Failover (Same Zone)
Synchronous failover replicates data in real time within the same availability zone, providing fast recovery but remaining vulnerable to zone-level outages.
Best for | How it protects | Trade-offs |
Small prod apps, cost-sensitive OLTP | Sync replica in same AZ; auto-promote on failure | AZ outage brings both down; write latency โ โข |
Primary โโโบ Sync Replica
โ 1-RTT write ack โฒ
โโโโโ health checkโโ
Cross-Zone Async Replication
Cross-zone asynchronous replication ships updates to a standby in another zone, balancing recovery speed with geographic fault tolerance at the cost of potential data lag.
Best for | How it protects | Trade-offs |
Standard web & mobile backends | Replica in different AZ; failover script | RPO seconds-minutes (async); split-brain risk if promote too early |
AZ-A (RW) โโโasyncโโโบ AZ-B (Stand-by)
Multi-Region Active-Passive (Warm Stand-by)
Warm stand-by setups continuously ship logs to a remote region, enabling fast DNS-based failover while controlling costs compared to fully active designs.
Best for | How it protects | Trade-offs |
Compliance-driven, 99.95 % SLA | Continuous log shipping to remote region; DNS cut-over | RTO 1-15 min; extra 50โ100 % cost; app tier must also fail over |
us-east-1 โโlogsโโโบ eu-west-1
RW Warm Stand-by
Multi-Region Active-Active (Consensus / Global Txn)
Global active-active architectures accept writes in all regions, coordinating via consensus protocols to maintain consistency across geographies at scale.
Best for | How it protects | Trade-offs |
24ร7 global products, <100 ms writes | All regions accept writes; paxos/raft ensures quorum | Higher write latency (N/2+1), complex conflict resolution, premium pricing |
โญโus-eastโโโฎ โ Paxos โ โญโeu-westโโโฎ
โ RW node โ โ RW node โ
โฐโโโโโโโโโโโฏ โฐโโโโโโโโโโโฏ
โ
ap-southeast
Periodic Immutable Backups + Point-in-Time Restore
Immutable backups combined with WAL-based point-in-time restore protect against data corruption and human error by ensuring snapshots cannot be altered.
Best for | How it protects | Trade-offs |
Ransomware, operator error | Snapshots to object store; PITR via WAL | No live HA; downtime during restore |
DB WAL โบ S3 Glacier (immutability)
Chaos & Observability Layer
Integrating chaos testing and observability into your stack validates resilience by proactively injecting failures and monitoring key signals against SLOs.
Best for | How it protects | Trade-offs |
Enforced SLOs, audits | Fault-injection + golden signals (latency, error %, saturation) | Requires culture & tooling investment |
[Chaos Monkey] โ inject fault
โ
โผ
Alerts (SLO burn) โโบ PagerDuty
Decision-Making Trees
Quick-start: decide which branch to follow before you think about engines or vendors. Answer the questions in this one diagramโthen jump straight to the matching decision tree.
#
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ What does your application store? โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโ
โผ โผ โผ
Rows / columns fit a rigid JSON, XML, key-value with Raw binaries, logs,
schema that rarely evolving fields and audio-video, images,
changes? (Yes) partial structure? (Yes) sensor dumps?
โ โ โ
โผ โผ โผ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ STRUCTURED โ โ SEMI-STRUCTURED โ โ UNSTRUCTURED โ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ โ โ
โ Go to the **Structured** โ Go to the **Semi-** โ Go to the **Un-**
decision tree decision tree decision tree
Rule of thumb: if even 10 % of your data canโt live in a fixed table, treat the set as semi-structured; if you mostly store files or streams, treat it as unstructured.
Structured Data
#
โโโโโโโโโโโโโโโโโ
โ Structured โ
โโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโ
โผ โผ
OLTP / ACID Analytics / OLAP
(high-QPS, row lookups) (large scans, aggregates)
โ โ
โโโโโโโโโโโโโดโโโโโโโโโโโโ โโโโโโโโโโโดโโโโโโโโโโ
โผ โผ โผ โผ
< 1 TB single node > 1 TB or global Column-store Serverless
MySQL, Postgres NewSQL (TiDB, (Snowflake, column-store
+ read replicas CockroachDB, Redshift, (BigQuery,
or partitions Spanner) ClickHouse) Athena)
Rule of thumb: start simple (row-store) until analytics queries dominate or data tops \~1 TB.
Unstructured Data
#
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Unstructured โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโ
โผ โผ โผ
POSIXโlike FS? Huge blobs? ยตs-latency blocks?
โ Yes โ Yes โ Yes
โผ โผ โผ
Distributed File FS Object Store (S3, GCS) Block Store (EBS)
โ No โ No โ No
โโโถ Need global CDN? โโถ CDN โโถ otherwise FS / OS
Translation: choose by access semantics (files vs. objects vs. blocks) and latency vs. size constraints.
Semi-Structured Data
#
โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Semi-Structured โ
โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโ
โผ โผ โผ
Document-style? Simple KV / cache? Graph relations?
โ Yes โ Yes โ Yes
โผ โผ โผ
Document DB (Mongo, In-Memory KV (Redis) Graph DB (Neo4j,
Couchbase) 10 GB-10 TB ยตs latency, TTL keys Neptune)
โ No โ No โ No
โโโโโโโโโโโโโโโฌโโโโโโโโโโโดโโโโโโโโโโโโฌโโโโโโโโโโโโโโ
โผ โผ
Wide/sparse cols? Full-text search?
โ Yes โ Yes
โผ โผ
Wide-Column (Cassandra) Search DB (Elastic,
Time-series OK OpenSearch)
โ No
โผ
Time-ordered writes?
โ Yes
โผ
Time-Series DB (InfluxDB,
TimescaleDB)
Shortcut: pick by dominant access patternโJSON blobs, key-value, graph traversals, etc.
How to Use These Trees
- Locate your data shape (structured, semi-, unstructured).
- Walk the branch that matches latency, size, and query style.
- Prototype quicklyโrun real workloads for a day.
- Re-check every 6-12 months as traffic and cloud services evolve.
Stick to the diagram path and youโll land on a database that fits today without painting yourself into a corner tomorrow.