About this article
This article is the sixth deep dive in the “System Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering the overall placement strategy for datastores.
In 2026, combining RDBMS, KVS, search engines, time-series DBs, vector DBs, and object storage by use case is the default. “What goes where” decides operating cost and latency. This article focuses on the system-wide placement strategy — category overviews, Polyglot Persistence, scaling, and backup design (per-app deep selection lives in the “Data Architecture” category).
What is a datastore in the first place
A datastore is, roughly speaking, “the collective term for mechanisms that store and retrieve data for an application.”
Imagine a library. Books go on shelves (RDBMS) in orderly rows, frequently referenced dictionaries sit on the counter (cache), and posters and photos go in a separate warehouse (object storage). Stuffing everything in one place makes it overflow and unsearchable — choosing the right storage location per use is what datastore placement strategy is about.
Scope of this article
Datastore selection happens in two stages: “the system-architecture-level overall strategy” and “the per-application detailed selection in application architecture.” This article focuses on the former: system-wide what-goes-where.
| Article | Scope |
|---|---|
| This article (system-architecture overall strategy) | Overall placement, category overviews, Polyglot Persistence, scaling |
| Per-app detailed selection (separate category) | Per-app detailed selection, decision flow, volume-based steps |
| Data modeling (separate category) | Table design, normalization, primary keys, indexes |
| Data platform (separate category) | DWH, data lake, BI integration |
This article’s stance is the overall map for “how to combine RDBMS, NoSQL, cache, search, and object storage at system-architecture design time.” Concrete selection flows by data volume and functional requirements live in the “Data Architecture” category.
The question this article answers is “which datastore types to combine.” Specific selection flows go deeper in another category.
The least reversible decision
Datastore selection is the most irreversible judgment in system architecture. Application code can be rewritten, but migrating a DB after schemas and a large volume of data have accumulated is orders of magnitude heavier, sometimes forcing days to weeks of downtime.
Covering everything with the familiar RDBMS alone is no longer the modern way. The current default combines RDBMS, NoSQL, cache, search, and object storage. “What you picked here” dictates the next 5-10 years.
This is a domain where “think about it later” doesn’t work. The first decision matters.
The four major categories
| Category | Use | Examples |
|---|---|---|
| RDBMS | Structured data, integrity priority | PostgreSQL, MySQL |
| NoSQL | Scale priority, flexible schema | DynamoDB, MongoDB |
| Cache | High-speed access, transient data | Redis, Memcached |
| Search engine | Full-text search, aggregation | Elasticsearch, OpenSearch |
Plus object storage (S3, etc.) for images, videos, attachments. The design sense of “which data goes where” decides app-wide performance and operating cost.
RDBMS
RDBMS stores data in tables and operates on it via SQL. ACID transactions (Atomicity, Consistency, Isolation, Durability) guarantee strong consistency, with overwhelming track record in finance and business apps where data correctness is non-negotiable.
| Product | Trait | Fits |
|---|---|---|
| PostgreSQL | OSS standard, JSON / full-text / GIS support | First choice for new development |
| MySQL | Widely adopted, lightweight, abundant documentation | Web services, legacy compatibility |
| MariaDB | MySQL fork, fully OSS | MySQL alternative, commercial avoidance |
| Oracle Database | Enterprise, feature-rich | Finance, core systems |
| SQL Server | Tight Microsoft integration | .NET / Windows |
| Amazon Aurora | MySQL/PG compatible, cloud-optimized | New AWS builds |
For new projects, PostgreSQL is the default unless there’s a specific reason. JSON type, full-text search, GIS make it stand out in OSS for capability and extensibility. MySQL still works on legacy web services but new selections almost always favor PostgreSQL.
The first choice for new is PostgreSQL. When in doubt, start here.
The four NoSQL families
NoSQL (“Not Only SQL”) covers everything beyond RDBMS. The core traits are scale and flexible structure, splitting into four families:
| Family | Examples | Strong at |
|---|---|---|
| KVS (key-value) | Redis / DynamoDB / Memcached | Cache, sessions |
| Document | MongoDB / DocumentDB / Firestore | JSON, hierarchical data |
| Wide-column | Cassandra / HBase / BigTable | Time-series, write-heavy |
| Graph | Neo4j / Neptune | Social, relationship analysis |
NoSQL’s basic philosophy: “sacrifice RDBMS-grade integrity to scale.” Workloads like e-commerce catalogs, social-feed history, IoT logs — “large volumes written fast” — are where NoSQL shines. Conversely, picking it for an integrity-critical business app is a bad fit.
DynamoDB and Redis
DynamoDB (AWS) and Redis are the two most-used NoSQL products in real work. Both are KVS but with clearly different strong suits.
DynamoDB is a fully managed distributed KVS — “write without thinking about scale” is the headline. It scales theoretically without limit and pairs well with serverless, often picked for high-traffic APIs. Joins and aggregation are weak; the mental model flips from RDBMS — “decide access patterns first, design tables second.”
Redis is the de facto in-memory cache. Sub-millisecond responses from in-memory data; broad utility for sessions, page cache, rate limiting, ranking (ZSET), Pub/Sub.
| Use | Recommended |
|---|---|
| Fixed-pattern, large-scale writes | DynamoDB |
| Sessions, page cache | Redis / ElastiCache |
| Real-time leaderboards / rankings | Redis (ZSET) |
DynamoDB for scale, Redis for cache. Different roles.
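The ranking use case above can be felt with a tiny in-memory stand-in for Redis sorted-set (ZSET) semantics. This is a sketch of what `ZINCRBY` / `ZREVRANGE` do, not the Redis client API, and needs no server.

```python
# In-memory sketch of Redis sorted-set (ZSET) leaderboard semantics:
# a stand-in for ZINCRBY / ZREVRANGE, no Redis server required.

class Leaderboard:
    def __init__(self):
        self.scores = {}  # member -> score, like one ZSET key

    def zincrby(self, member, delta):
        """Add delta to a member's score, creating it at 0 if absent."""
        self.scores[member] = self.scores.get(member, 0) + delta
        return self.scores[member]

    def zrevrange(self, start, stop):
        """Members ordered by score, highest first (stop inclusive, as in Redis)."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

lb = Leaderboard()
lb.zincrby("alice", 120)
lb.zincrby("bob", 90)
lb.zincrby("alice", 30)   # alice is now at 150
top = lb.zrevrange(0, 1)  # top two players
```

In real Redis the sorted set keeps this ordering incrementally, which is why leaderboard reads stay fast even with millions of members.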
Search engines
Search engines are specialized DBs for fast searches across text content (articles, product descriptions). An RDBMS `LIKE '%keyword%'` query works, but becomes unusably slow past tens of thousands of rows; the standard move is to bring in a dedicated search engine.
| Product | Trait | Fits |
|---|---|---|
| Elasticsearch / OpenSearch | Full-text de facto, also log analytics | Serious search, log aggregation |
| Meilisearch / Typesense | Lightweight, modern, easy setup | Small/mid apps |
| PostgreSQL full-text | No extra DB needed; pg_bigm and similar | Small scale, easy start |
| Algolia | Managed SaaS, rich UI | Speed-priority services |
Japanese full-text search hinges on morphological-analyzer (e.g. Kuromoji) configuration. Without proper dictionaries, a word like 東京都 (Tokyo Metropolis) can be tokenized as 東京 (Tokyo) + 都, cascading into missed and false matches.
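A related pitfall is easy to demonstrate with a tiny character-bigram sketch (the approach pg_bigm-style indexes use). This is illustrative only, not the actual pg_bigm implementation: naive bigrams make a search for 京都 (Kyoto) match inside 東京都 (Tokyo Metropolis), which is exactly the class of false hit that morphological analysis exists to prevent.

```python
# Why naive character bigrams need care with Japanese: an illustrative
# sketch, not the actual pg_bigm implementation.

def bigrams(text):
    """All consecutive 2-character substrings of the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def bigram_match(query, document):
    """A document is a candidate hit if it contains every bigram of the query."""
    return bigrams(query) <= bigrams(document)

# 東京都 (Tokyo Metropolis) contains the bigram 京都 (Kyoto),
# so a bigram search for Kyoto false-matches Tokyo documents.
hit = bigram_match("京都", "東京都に住む")
```

Real engines layer ranking and dictionary-driven tokenization on top to filter out such candidates; the sketch shows why that layer is not optional for Japanese.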
Object storage
Object storage holds images, videos, PDFs, etc. Putting images directly into RDBMS bloats the DB and tanks performance — “metadata in RDBMS, blobs in object storage” is the basic split. Not following this is a textbook landmine.
| Service | Provider | Trait |
|---|---|---|
| Amazon S3 | AWS | Industry de facto, rich integrations |
| Azure Blob Storage | Azure | Strong tiered storage |
| Google Cloud Storage | GCP | Strong BigQuery integration |
| Cloudflare R2 | Cloudflare | Free egress, low cost |
Combined with CDN (CloudFront, etc.), delivery speed and traffic-cost reduction work together. Cloudflare R2 is S3-compatible without bandwidth charges, drawing attention for bandwidth-dominated workloads like video.
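The "metadata in RDBMS, blobs in object storage" split described above can be sketched as follows. Dicts stand in for the DB table and the bucket, and the content-addressed key scheme is one illustrative choice; in production the bucket write would be an S3 (or R2) put and the row an INSERT.

```python
# Sketch of the "metadata in RDBMS, blob in object storage" split.
# Dicts stand in for the DB table and the bucket; names are illustrative.
import hashlib

db_rows = []   # stand-in for an RDBMS table: one small metadata row per upload
bucket = {}    # stand-in for an object store: key -> raw bytes

def save_upload(filename, data):
    """Store the blob in the bucket, keep only a pointer + metadata in the DB."""
    key = f"uploads/{hashlib.sha256(data).hexdigest()}"  # content-addressed key
    bucket[key] = data                                   # big blob -> object storage
    db_rows.append({"filename": filename, "key": key, "size": len(data)})
    return key

key = save_upload("cat.jpg", b"\xff\xd8fake-jpeg-bytes")
```

The DB row stays a few hundred bytes regardless of blob size, which is what keeps backups, replication, and vacuuming cheap as media volume grows.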
Selection by use
Combining datastores by role is the modern default. Typical configurations:
| Use | Recommended |
|---|---|
| Transaction-centric apps | PostgreSQL / Aurora |
| Fast sessions / cache | Redis / ElastiCache |
| Fixed-key ultra-scale writes | DynamoDB |
| Product search / log aggregation | OpenSearch / Elasticsearch |
| Time-series metrics | Timestream / InfluxDB |
| Images, videos, attachments | S3 / R2 (object storage) |
| Analytics / BI (Business Intelligence) | BigQuery / Redshift / Snowflake |
Combining the right DB per use is called Polyglot Persistence.
Polyglot Persistence
Polyglot Persistence is the design philosophy of using multiple datastores in one system. “One DB for everything” looks tidy on paper, but matching the right DB to each use beats it on performance, cost, and ops.
```mermaid
flowchart LR
APP([App])
PG[(PostgreSQL<br/>Master, orders, users)]
RD[(Redis<br/>Sessions, cache)]
OS[(OpenSearch<br/>Product search, log aggregation)]
S3[(S3<br/>Images, videos, attachments)]
DDB[(DynamoDB<br/>High-throughput logs)]
APP -->|RDB| PG
APP -->|KVS| RD
APP -->|Full-text| OS
APP -->|Files| S3
APP -->|Time-series| DDB
classDef app fill:#fef3c7,stroke:#d97706;
classDef rdb fill:#dbeafe,stroke:#2563eb;
classDef kvs fill:#fee2e2,stroke:#dc2626;
classDef search fill:#fae8ff,stroke:#a21caf;
classDef obj fill:#f0f9ff,stroke:#0369a1;
class APP app;
class PG rdb;
class RD kvs;
class OS search;
class S3,DDB obj;
```
For e-commerce sites and SaaS products this kind of role-split is essentially standard. Connect the datastores via async message queues (SQS / Kafka) rather than synchronous calls — better for failure isolation. Synchronous chains cascade outages when one DB drops.
“One DB for everything” is past tense. Use the right tool per role.
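The queue-based decoupling recommended above can be sketched with an in-memory queue standing in for SQS/Kafka. The names and data are illustrative: the app writes the order to the primary DB synchronously and only publishes an event; a separate consumer updates the search index, so a search-engine outage never blocks order writes.

```python
# Sketch of queue-based decoupling between the primary DB and the search index.
# In production the queue would be SQS or Kafka; here a deque stands in.
from collections import deque

orders_db = {}     # stand-in for PostgreSQL (source of truth)
search_index = {}  # stand-in for OpenSearch
queue = deque()    # stand-in for SQS / Kafka

def place_order(order_id, item):
    orders_db[order_id] = item                 # synchronous write to the primary DB
    queue.append(("order_created", order_id))  # fire-and-forget event

def run_indexer():
    """Async consumer: drains events and updates the search index."""
    while queue:
        _event, order_id = queue.popleft()
        search_index[order_id] = orders_db[order_id]

place_order("o1", "keyboard")
# Even while the indexer is down, the order is safely in orders_db;
# the index catches up once the consumer runs.
run_indexer()
```

The trade-off is eventual consistency: the index lags the DB by the queue's processing delay, which is acceptable for search but not for, say, inventory decrement.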
Scaling and replication
As data and traffic grow, a single server can’t keep up. Replication and sharding are the basic scaling strategies.
| Method | Mechanism | Use |
|---|---|---|
| Read replica | Add read-only copies | Read-heavy apps |
| Sharding | Distribute data across servers | Distribute writes too |
| Multi-AZ | Availability-zone redundancy | High-availability |
| Multi-region | Geographic redundancy / DR | Disaster recovery, global delivery |
Cloud-managed DBs (Aurora, Cloud SQL, Cosmos DB) let you enable these complex configurations with a few clicks. The operational-cost gap versus self-hosting is overwhelming; unless you have serious operational capacity, DIY sharding is best left alone.
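In application terms, a read replica boils down to routing: writes go to the primary, reads fan out across replicas. A minimal sketch, with plain strings standing in for connection handles and a deliberately crude SQL classifier:

```python
# Sketch of primary/replica routing: writes hit the primary,
# reads round-robin across read replicas. Strings stand in for connections.
import itertools

class Router:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._rr = itertools.cycle(replicas)  # simple round-robin balancing

    def conn(self, sql):
        # Crude classification: anything that isn't a SELECT goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._rr)
        return self.primary

router = Router("primary", ["replica-1", "replica-2"])
targets = [router.conn(q) for q in
           ["SELECT * FROM users", "INSERT INTO users VALUES (1)", "SELECT 1"]]
```

One caveat the sketch hides: replicas lag the primary, so flows that must read their own writes (e.g. showing a form right after submission) should be pinned to the primary.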
Backup and restore
Backup strategy is the last line against incidents, mistakes, and ransomware. The point is “can it be restored,” not “is it being taken.” Without periodic restore drills, recovery failures still happen repeatedly.
| Aspect | Substance |
|---|---|
| RPO (Recovery Point Objective) | How far back can you restore (allowable data loss) |
| RTO (Recovery Time Objective) | How fast you can be back up (allowable downtime) |
| PITR (Point-in-Time Recovery) | Restore to any past instant |
| Cross-region backup | Stored in another region for disaster cover |
| Retention | Meeting legal requirements (e.g. tax: 7 years) |
Cloud-managed DBs ship PITR as a standard feature, restorable to any point in the last 5-35 days. Annual / monthly long-term retention often needs separate exports.
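The RPO definition above lends itself to a back-of-envelope check. A hedged sketch of the reasoning: with periodic backups alone, the worst-case loss window is roughly one backup interval; continuous log archiving (the mechanism behind PITR) shrinks it to the archive-shipping interval.

```python
# Back-of-envelope RPO check (illustrative simplification: ignores
# replication and assumes the last backup/archive is always restorable).

def worst_case_rpo_minutes(backup_interval_min, wal_archive_min=None):
    """Smallest worst-case data-loss window the current setup can promise."""
    if wal_archive_min is not None:
        return wal_archive_min       # continuous log archiving dominates
    return backup_interval_min       # periodic backups only

def meets_rpo(backup_interval_min, target_rpo_min, wal_archive_min=None):
    return worst_case_rpo_minutes(backup_interval_min, wal_archive_min) <= target_rpo_min

# Daily snapshots alone cannot meet a 15-minute RPO...
daily_only = meets_rpo(24 * 60, 15)
# ...but daily snapshots plus 5-minute log archiving can.
with_pitr = meets_rpo(24 * 60, 15, wal_archive_min=5)
```

RTO is the analogous check on restore time: snapshot restore plus log replay must fit inside the allowable downtime, which is exactly what restore drills measure.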
DB selection by data volume / traffic
“PostgreSQL by default” is a sound rule of thumb, but the optimum shifts with data volume and traffic. Rough guidelines as of April 2026:
| Volume | Write RPS | Read RPS | Recommended main DB | Typical companion |
|---|---|---|---|---|
| Up to 100k rows | up to 10 | up to 100 | PostgreSQL (single) | None |
| Up to 10M rows | up to 100 | up to 1k | PostgreSQL + 1 read replica | Redis (sessions) |
| Up to 1B rows | up to 1k | up to 10k | Aurora / Cloud SQL + multiple replicas | Redis + OpenSearch |
| Up to 10B rows | up to 10k | up to 100k | Aurora + sharding or DynamoDB | Redis + Kafka + ClickHouse |
| 10B rows+ | 10k+ | 100k+ | DynamoDB / Spanner / Cassandra | Kafka + data lake |
Past 10M rows per table, plan index design and partitioning; past 100M rows, consider sharding; past 1B rows, NoSQL becomes a realistic option. Adding cache (Redis) first is often more effective than swapping the DB; verify that cache solves the issue before changing DBs.
When in doubt: PostgreSQL single -> add cache -> add replicas -> shard. Jumping straight to NoSQL is a bad fit.
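“Add cache first” in practice usually means the cache-aside pattern: check the cache, and on a miss read the DB and populate the cache. A minimal sketch, with dicts standing in for Redis and PostgreSQL and the TTL/eviction story omitted:

```python
# Cache-aside sketch: read through the cache, fall back to the DB on a miss.
# Dicts stand in for Redis and PostgreSQL; TTL and eviction are omitted.

db = {"user:1": {"name": "alice"}}    # stand-in for PostgreSQL
cache = {}                            # stand-in for Redis
stats = {"hits": 0, "misses": 0}

def get_user(key):
    if key in cache:                  # cache hit: no DB round-trip
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = db[key]                   # cache miss: read the DB...
    cache[key] = value                # ...and populate for next time
    return value

get_user("user:1")   # miss, fills the cache
get_user("user:1")   # hit
```

The design obligation that comes with this pattern: every write path must invalidate or overwrite the cached entry, or readers will see stale data for the TTL's duration.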
Schema-change / migration traps
Schema changes are where “DB can’t be changed later” plays out in production. Mistakes here drop production for hours; avoid the forbidden moves rigorously.
| Forbidden move | Why |
|---|---|
| Rename a column in one migration | Old code on running instances breaks instantly. Use expand/contract: add new column -> dual write -> switch readers -> drop old column |
| Add NOT NULL column to a 10M+ row table immediately | Existing rows violate the constraint; migration hangs for hours |
| CREATE INDEX (non-concurrent) on a hot table | Whole-table lock, writes stop. Use CREATE INDEX CONCURRENTLY (PostgreSQL) or pt-online-schema-change / gh-ost (MySQL) |
| Apply migrations in production without a backup | The moment something goes wrong, restore is gone |
| DBA running raw DDL via GUI | No change history, no audit, no replay, no rollback |
| Bundling the schema migration and the code deploy as one unit | On rollback the schema can't follow the code back, or old and new code briefly run against one schema and break |
| Hand-written SQL without a migration tool (Flyway / Liquibase / Prisma Migrate) | No version control, no CI verification, prod and dev schemas drift |
| Picking NoSQL as an RDBMS substitute | Joins become necessary and force a major redesign. When transactions are involved, default to RDBMS |
| Running backups without restore drills | "Can't restore" surfaces during a production incident. Drills are part of ops, not optional |
| Deploying to production with no-auth / default config | The 2017 MongoDB / Redis / Elasticsearch ransomware attacks encrypted or wiped tens of thousands of nodes with default public exposure |
The GitLab DB deletion of January 31, 2017 (an engineer ran rm -rf on the wrong env, all 5 backup methods failed) demonstrates that beyond backup strategy, DB-operation permission and dual-confirmation also matter.
Schema changes always blow up if you skip “migration tool + PR review + staged application.”
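The expand/contract rename from the table above hinges on the dual-write phase, which can be sketched in a few lines. This is an illustrative simplification (rows as dicts, a hypothetical `fullname` -> `full_name` rename), not a migration tool:

```python
# Sketch of the dual-write phase of an expand/contract column rename.
# Hypothetical rename "fullname" -> "full_name"; rows are dicts standing
# in for table rows.

OLD, NEW = "fullname", "full_name"

def write_user(row, value):
    """Expand phase: every write lands in BOTH columns, so old code
    (reading OLD) and new code (reading NEW) both see fresh data."""
    row[OLD] = value
    row[NEW] = value
    return row

def read_user_new_code(row):
    """New code prefers the new column, falling back during the backfill."""
    return row.get(NEW, row.get(OLD))

row = write_user({}, "Ada Lovelace")
# Only after a backfill completes and all readers use NEW does the
# contract step drop the OLD column.
```

The whole point is that no single deploy ever assumes the other side has already switched, which is what makes each step individually rollback-safe.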
AI decision axes
With AI-driven development as the assumption, the deciding axes become “can the schema be managed in code?” and “can AI generate queries / migrations accurately?”
| AI-era favorable | AI-era unfavorable |
|---|---|
| PostgreSQL + ORM (Prisma/Drizzle/TypeORM) | NoSQL with custom queries (AI struggles) |
| SQL-based, near-standard | Vendor GUI data manipulation |
| Migrations managed in code | Schema kept outside code |
| DB config managed via Infrastructure as Code | Manual console-based DB config |
- Filter categories by data character (strong / eventual consistency / cache).
- Add use-specialized DBs by scale and access pattern (start with normalized DB).
- Code-managed schema as an AI-era requirement.
- Backup + restore drills as the operational baseline.
The MongoDB “by feel” incident (industry case)
A startup picked MongoDB “to change schema freely,” then six months later started aggregating sales reports from order history — and joins and transactions were suddenly needed everywhere, halting work. Some teams ended up migrating to PostgreSQL and burning three months on the move alone.
Around 2018 I watched a similar case: a startup on MongoDB hit aggregation requirements for order data and couldn't express the queries; it ended up adding a daily sync into PostgreSQL just to feed its BI tools.
The lesson: “DB selection isn’t about today’s ease — it’s about a year from now’s requirements.” Especially for transaction-touching apps, default to RDBMS when in doubt. Add NoSQL only when you actually need it. Reversing the order hurts.
“Schemaless is easy” holds until aggregation starts.
What you must decide — what’s your project’s answer?
Articulate your project’s answer in 1-2 sentences for each:
- Main DB (PostgreSQL / MySQL / Aurora / DynamoDB)
- Cache strategy (Redis / Memcached / in-app / CDN)
- Search platform (OpenSearch / PostgreSQL full-text / SaaS)
- File store (S3 / R2 / Blob Storage)
- Backup policy (RPO / RTO / retention)
- Encryption (at-rest, in-transit, key management)
- Scaling strategy (replicas / sharding / multi-region)
How to record decision reasons
Datastore selection has long-term impact across the entire system. Record why you picked a specific product as an ADR (Architecture Decision Record).
| Field | Content |
|---|---|
| Title | Adopt PostgreSQL as main DB |
| Status | Approved |
| Context | Need to select the main DB for an order-management system. Requirements: transactional consistency + full-text search |
| Decision | PostgreSQL 16 (Aurora PostgreSQL-compatible) |
| Rationale | ACID-compliant for monetary calculations; JSON / full-text / GIS built in, reducing the need for extra DBs; Aurora-compatible for easy read-replica scale-out |
| Rejected alternatives | MySQL: JSON features and partitioning more limited than PostgreSQL. DynamoDB: business requirements need aggregation queries and JOINs |
| Outcome | Start with PostgreSQL full-text search; add OpenSearch when search volume grows |
Months or years later, someone will ask “why PostgreSQL instead of MySQL?” An ADR that answers at a glance is its greatest value.
Summary
This article covered system-architecture-level datastore placement strategy — Polyglot Persistence, scaling, schema-change traps.
For new builds: PostgreSQL-centric, with use-specialized additions only as needed (Redis for cache, OpenSearch for search, S3 for blobs). Adopting every layer from day one overwhelms operations; the realistic order is to add components as you outgrow what you have.
The next article covers network (VPC, subnets, CIDR design).
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (11/89)