About this article
As the third installment in the “Case Studies” category of the series “Architecture Crash Course for the Generative-AI Era,” this article covers the small-to-mid-size SaaS case.
This is the phase of 5-30 engineers with paying customers, where downtime directly affects revenue: the turning point of graduating from MVP and switching to a design that protects revenue. The article covers the three pillars of managed services + IaC + observability, plus SLA design, SOC 2 compliance, multi-tenant design, and AI utilization.
What is small-mid SaaS architecture in the first place
Imagine a small restaurant expanding into a chain. During the food-stall days the owner ran everything solo, but once paying customers arrived and “the shop being closed = zero revenue” became reality, new challenges emerged: stable operations, hygiene management, and staff training.
Small-to-mid-size SaaS architecture is the design for the turning point of graduating from MVP and switching to “protect revenue.” By leveraging managed services, codifying infrastructure with IaC, and embedding monitoring, you build a system that a small team can safely run in production.
If you keep growing on the startup-era setup, downtime turns directly into lost revenue, eroding customer trust and accelerating churn.
Why small-mid SaaS-specific design is necessary
Because this is the phase where downtime directly hits revenue
In the MVP era, an outage could be smoothed over with an apology. With paying customers, even one hour of downtime directly causes churn and reputational damage. You need to define an SLO (Service Level Objective) and design for 99.9% availability (roughly 43 minutes of downtime per month or less).
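The 99.9% figure translates into a concrete downtime budget. A minimal TypeScript sketch of the arithmetic, assuming the common 30-day-month convention:

```typescript
// Convert an availability SLO (%) into a monthly downtime budget in minutes.
// Assumes a 30-day month (43,200 minutes), the convention behind "99.9% ≈ 43 min".
function monthlyDowntimeBudgetMinutes(sloPercent: number): number {
  const minutesPerMonth = 30 * 24 * 60; // 43,200
  return minutesPerMonth * (1 - sloPercent / 100);
}

console.log(monthlyDowntimeBudgetMinutes(99.9).toFixed(1));  // ≈ 43.2
console.log(monthlyDowntimeBudgetMinutes(99.99).toFixed(1)); // ≈ 4.3
```

Each extra nine shrinks the budget tenfold, which is why tighter SLOs cost disproportionately more to build.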
Because startup-era composition can no longer hold up
A Vercel + Supabase setup hits walls of cost, performance, and multi-tenant isolation once paying customers number in the hundreds. But importing full enterprise tooling overwhelms a small ops team. Achieving “production quality a small team can run” through the three pillars of managed services + IaC + observability is the challenge unique to this phase.
Because security certifications like SOC 2 become sales weapons
In B2B SaaS you must pass prospective customers’ security reviews, and SOC 2 or ISO 27001 certification is increasingly a deal prerequisite. Building in logging, access control, and encryption from the design stage smooths certification and becomes a powerful sales differentiator.
Selection basic policy
The core of this phase is the three pillars of managed services + IaC + observability. Codify infrastructure with IaC (Infrastructure as Code), lean thoroughly on AWS/GCP/Azure managed services, and create a state where system behavior can be understood through monitoring, logs, and traces.
| Prioritize | Postpone |
|---|---|
| Availability 99.9% (43 min monthly down) | Availability 99.99% (4 min monthly down) |
| Managed-centric, light operations | Self-built clusters, full in-house |
| Infrastructure as code (Terraform / CDK) | GUI manual operation, Excel management |
| SLO (Service Level Objective) / error-budget operations | An SLA in the contract with no operational follow-through |
Representative profiles are business SaaS, B2B tools, marketplaces, and industry-specific vertical SaaS. The rational answer is to “lean on managed services and run with a small team” rather than “do everything yourself.”
Recommended stack (overall picture)
The mainstream approach is to lean on a single cloud (AWS or GCP), manage it with IaC, run on containers, and embed observability from the start. This is not an extension of personal development: build a system inspectable from an SRE perspective from day one.
```mermaid
flowchart TB
    USER([Customer])
    CDN[CloudFront / CDN]
    LB[ALB]
    APP[ECS Fargate / Cloud Run<br/>TypeScript / Go / Python]
    DB[(Aurora PostgreSQL<br/>managed RDB)]
    REDIS[(Redis<br/>ElastiCache)]
    AUTH[Auth0 / Cognito<br/>SAML/OIDC support]
    MON[Datadog / New Relic<br/>integrated monitoring]
    IAC[Terraform / CDK<br/>all infra as code]
    USER --> CDN --> LB --> APP
    APP --> DB
    APP --> REDIS
    APP --> AUTH
    APP -. metrics/logs/traces .-> MON
    IAC -. provision .-> CDN
    IAC -.-> LB
    IAC -.-> APP
    IAC -.-> DB
    classDef user fill:#fef3c7,stroke:#d97706;
    classDef edge fill:#dbeafe,stroke:#2563eb;
    classDef app fill:#dcfce7,stroke:#16a34a,stroke-width:2px;
    classDef data fill:#fae8ff,stroke:#a21caf;
    classDef ops fill:#f0f9ff,stroke:#0369a1;
    class USER user;
    class CDN,LB edge;
    class APP app;
    class DB,REDIS,AUTH data;
    class MON,IAC ops;
```
| Area | Recommended | Reason |
|---|---|---|
| Cloud | AWS or GCP (single) | IAM / IaC / monitoring consistent |
| Execution env | ECS Fargate / Cloud Run | No K8s needed, managed |
| Language | TypeScript / Go / Python | Abundant documentation and AI training data |
| DB | Aurora Postgres / Cloud SQL | Managed RDB standard |
| Auth | Auth0 / Cognito / Firebase Auth | For B2B, confirm SAML (industry standard for enterprise SSO) / OIDC (OpenID Connect) support |
| Monitoring | Datadog / New Relic | Integrated view, alert linkage |
| IaC | Terraform / CDK / Pulumi | All infra code-managed |
K8s is unneeded unless there is a clear reason; the going rate at this scale is that ECS Fargate / Cloud Run is enough.
System / deploy choices
The standard is a single public cloud (AWS or GCP) running container-based workloads. Kubernetes carries heavy operational overhead and becomes a liability at this scale without dedicated SREs. ECS Fargate (AWS) and Cloud Run (GCP) are managed services where you hand over a container and it runs, with scaling, health checks, and load balancing handled for you.
Build CI/CD with GitHub Actions or GitLab CI so that merging to the main branch deploys automatically. Maintain at least three environments (dev / staging / production), and use Blue/Green or Canary deploys to avoid downtime when changes reach production.
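As an illustration, a minimal GitHub Actions workflow for this merge-to-main flow might look like the following. The workflow name, image tag, and deploy script are placeholders, not a prescribed setup:

```yaml
# Hypothetical workflow: build and deploy on merge to main.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build container image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Deploy to production (placeholder script)
        run: ./scripts/deploy.sh production ${{ github.sha }}
```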
| Choose | Avoid |
|---|---|
| ECS Fargate / Cloud Run | EKS / GKE manual operation |
| Codify with Terraform / CDK | Creating resources manually in the console |
| GitHub Actions / GitLab CI | Self-hosted Jenkins |
| Multi-AZ composition (single region OK) | Multi-region (excessive cost) |
It is worth asking yourself “do we really need K8s?” before choosing. Most SaaS get by fine with Fargate / Cloud Run.
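To make “codify with Terraform / CDK” concrete, here is a minimal Terraform sketch of an ECS Fargate service. Resource names, counts, and the referenced resources (cluster, task definition, subnets, target group) are illustrative assumptions, not a complete module:

```hcl
# Illustrative only: an ECS Fargate service wired to an ALB target group.
resource "aws_ecs_service" "app" {
  name            = "saas-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.app.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }
}
```

Because the service is plain text in version control, every infrastructure change can be reviewed and diffed with `terraform plan` before it ships.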
Software / data choices
A monolith or modular monolith is the first choice; microservices become a topic only after the organization exceeds roughly 50 people. Premature splitting just multiplies network boundaries, driving up failure points, latency, and operational cost: premature design for small-to-mid-size SaaS.
Base the DB on PostgreSQL (Aurora / Cloud SQL), adding Redis (caching), S3 (object storage), and Elasticsearch (full-text search) per use case. When analytics use cases emerge, the standard move is to build an ELT pipeline into BigQuery / Snowflake / Redshift.
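The Redis role here is typically the cache-aside pattern: check the cache, fall through to Postgres on a miss, then populate the cache. A minimal TypeScript sketch, with an in-memory Map standing in for Redis and a stubbed `loadFromDb` in place of a real query (both are illustrative placeholders):

```typescript
// Cache-aside sketch: a Map stands in for Redis; loadFromDb stubs a Postgres read.
const cache = new Map<string, { value: string; expiresAt: number }>();
const TTL_MS = 60_000; // expire entries after 60 s

async function loadFromDb(key: string): Promise<string> {
  return `value-for-${key}`; // placeholder for the real DB query
}

async function getWithCache(key: string): Promise<string> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
  const value = await loadFromDb(key);                     // miss: go to the DB
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```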
| Choose | Avoid |
|---|---|
| Monolith / modular monolith | Microservices (premature splitting) |
| PostgreSQL (Aurora / Cloud SQL) | Operating with schemaless DB only |
| Redis (cache) + S3 (objects) | Cramming everything in RDB |
| Analytics to separate DWH via ELT | Run analytics queries on production DB |
Microservices are a prescription for when team size becomes the bottleneck, not before.
Frontend / auth choices
For the frontend, the small-to-mid-size SaaS default is Next.js (App Router) + TypeScript + Tailwind for the main web app. Optionally insert a thin BFF (Backend for Frontend), and unify mobile apps on React Native / Flutter.
The standard is to delegate auth to Auth0 / Cognito / Firebase Auth; here too, rolling your own is forbidden. For B2B services, SAML / OIDC support surfaces early as an enterprise-customer requirement, so choosing an IdP that supports them is required. Clerk is a great fit at small scale, but since its SAML support is limited to upper-tier plans, Auth0 is the safe pick from mid-size up.
| Choose | Avoid |
|---|---|
| Next.js App Router + TS | Old Pages Router, hand-written SSR |
| Tailwind + shadcn/ui + design tokens | Fully custom CSS design (hard to hire for) |
| Auth0 / Cognito (SAML support) | Self-built auth, Clerk free plan only |
| MFA required + Passkey support | Password auth only |
For B2B SaaS, SAML support tends to be a contract condition. Always confirm it when selecting the auth foundation.
Data / analytics choices
The iron rule from this scale onward is early separation of the transactional side (OLTP = PostgreSQL) and the analytical side (OLAP = DWH). Running analytical queries against the production DB steadily degrades the response times customers actually feel. Choose one of BigQuery / Snowflake / Redshift and sync on a daily-to-hourly cadence via ELT (Fivetran / Airbyte / dbt).
Building in data quality and metadata management from the start pays off later in AI utilization. Decide on schema definitions, table-naming conventions, and a PII (personally identifiable information) masking policy, and turn them into automated tests with dbt or Great Expectations; that is the modern standard equipment.
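As an illustration of dbt’s built-in generic tests, a `schema.yml` for a hypothetical `orders` model might look like this (model and column names are assumptions):

```yaml
# Hypothetical dbt schema.yml: uniqueness and not-null checks on an orders model.
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
```

Running `dbt test` then fails the pipeline whenever warehouse data violates these constraints, instead of letting bad rows silently pollute dashboards.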
| Choose | Avoid |
|---|---|
| OLTP (Postgres) + OLAP (BigQuery etc.) separation | Analytical queries on production DB |
| ELT (Fivetran / Airbyte + dbt) | Hand-written ETL scripts |
| dbt tests / Great Expectations | No data-quality checks |
| Articulate a PII-masking policy | Loading personal data into the analytics DB as-is |
Postpone the analytics foundation and you will end up paying for it in customer-perceived performance.
Security / monitoring choices
At this scale, security is standard equipment; nothing can be skipped. Authentication (MFA required), authorization (RBAC or ABAC), encryption (TLS everywhere, encryption at rest), logging, and audit trails all belong in the baseline. A WAF (Cloudflare / AWS WAF) and DDoS countermeasures are likewise required at production launch.
For monitoring, install an integrated view with Datadog or New Relic so that metrics, logs, traces, and RUM (Real User Monitoring) appear on one dashboard. Define an SLO (e.g., 99.9% availability) and an error budget, wire alerts into Slack / PagerDuty, and build a regime where someone actually gets paged at night.
| Choose | Avoid |
|---|---|
| MFA required, Passkey support | Password only |
| WAF (Cloudflare / AWS WAF) | Direct origin-server exposure |
| Datadog / New Relic integrated monitoring | Fragmented operations across multiple tools |
| SLO definition + PagerDuty on-call | Monitoring exists but no one looks |
This scale is the appropriate timing to start SLO / error-budget operations.
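Error-budget operation reduces to one number: how much of the month’s allowable downtime has been burned. A minimal TypeScript sketch, using the same 30-day-month assumption as the SLO arithmetic:

```typescript
// Fraction of the monthly error budget consumed. 1.0 means the budget is gone
// and, under error-budget policy, releases pause in favor of reliability work.
function errorBudgetBurn(sloPercent: number, downtimeMinutes: number): number {
  const budgetMinutes = 30 * 24 * 60 * (1 - sloPercent / 100); // 43.2 min at 99.9%
  return downtimeMinutes / budgetMinutes;
}

console.log(errorBudgetBurn(99.9, 21.6).toFixed(2)); // ≈ 0.50: half the budget used
```

Alerting on the burn rate of this number, rather than on individual outages, is what keeps paging volume sane for a 2-3 person on-call rotation.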
Small-mid SaaS numerical gates
Note: industry baseline values as of April 2026. They will go stale as technology and the talent market shift, so revisit them periodically.
The standard in the small-to-mid-size SaaS phase is disciplining operations by the numbers. Below are industry-standard benchmarks.
| Metric | Recommended | Graduation guideline |
|---|---|---|
| Availability SLO | 99.9% (43 min monthly down) | Promote to 99.95% at scale |
| Engineer count | 5-30 | Consider microservices beyond 30 |
| Monthly infrastructure cost | Hundreds of thousands to millions of yen | Dedicated FinOps at tens of millions |
| Paid customers | Hundreds to thousands | Dedicated SRE at tens of thousands |
| Response time (P95) | Within 500ms | Enterprise-class at under 300ms |
| Pentest frequency | Annual | Quarterly with enterprise customers |
| MFA required | All users | - |
| SAML support | Upper plans | Required for B2B contracts |
| Deploy frequency | Several times a week to daily | Aim for Elite level |
| On-call regime | 2-3 engineers sharing on-call alongside other duties | Move to dedicated SREs |
The graduation line from small-to-mid-size SaaS is more than 30 engineers or monthly infrastructure costs in the tens of millions of yen. Beyond that, regimes closer to large-enterprise core systems become necessary. Note that a 99.95% SLO costs several times more to build than 99.9%; 99.9% is enough at this phase.
Pitfalls and forbidden moves
Typical accident patterns in the small-to-mid-size SaaS phase. They share a common cause: copying big enterprises, or carrying on in startup mode.
| Forbidden move | Why it’s bad |
|---|---|
| Premature microservices-ization | Same outcome as Segment (2017); stay on a monolith or modular monolith under 30 people |
| Adopt K8s (EKS / GKE manual operation) | Without dedicated SRE, eats operational time, move to ECS Fargate / Cloud Run |
| Multi-cloud composition (using 2 clouds) | IAM / monitoring / IaC duplicated, doubled ops load |
| No analytics DB; aggregating on the production DB | Customer-perceived performance drops and causes churn; OLTP / OLAP separation is required |
| Don’t define SLO | Can’t discuss quality numerically, ends with “we’ll try” |
| Underestimate SAML support in B2B | Can’t get enterprise contracts, growth stalls |
| No data-quality tests | Inconsistent data pollutes analytics, dbt tests required |
| Skip security standards (MFA / WAF / encryption) | SOC 2 / ISO 27001 acquisition impossible, can’t meet contract requirements |
| 15 people running EKS + 12 services | Triaging inter-service incidents eats a full day every week; the textbook case for consolidating into a Fargate monolith |
| Inject without PII masking to analytics DB | GDPR / Personal Information Protection Act violation risk |
The contrast between Shopify and Basecamp continuing on monoliths (Shopify reaching hundreds of millions of users on Ruby on Rails; Basecamp running a single Rails app for 20+ years with a dozen-plus engineers) and Segment consolidating 140 services back into a monolith is the classic lesson: ask whether splitting is truly needed before you split.
“Splitting can be done anytime, going back is hell.” Carefully split after revenue grows on monolith.
| Tempting claim | Reality check |
|---|---|
| “We should split into microservices” | Under 30 people, operational load exceeds the division-of-labor benefit; a monolith is enough |
| “We should adopt K8s” | Without dedicated SREs it eats operational time; Fargate / Cloud Run is enough |
AI decision axes
| AI-favored | AI-disfavored |
|---|---|
| Single cloud + IaC + containers | Multi-cloud + custom PaaS |
| Schema-defined RDB + DWH | Schemaless DB only |
| OpenAPI / GraphQL Schema explicit | Hand-written REST, no type definitions |
| pgvector / RAG-premised design | Classic RDB only |
- Single cloud + managed services: keep operations light
- Infrastructure as code: manage all infra with IaC
- SLO / observability: run quality by the numbers
- Security as standard equipment: MFA / WAF / encryption, all of it
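For the pgvector / RAG row in the table above, here is a sketch of what that looks like in PostgreSQL. Table and column names and the embedding dimension (1536, common for current embedding models) are illustrative assumptions:

```sql
-- Illustrative pgvector setup: embeddings live next to tenant data in Postgres.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  tenant_id bigint NOT NULL,
  content   text   NOT NULL,
  embedding vector(1536)
);

-- Top-5 nearest documents for one tenant by cosine distance (<=> operator).
SELECT id, content
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 5;
```

Keeping embeddings in the same RDB keeps tenant isolation (the `tenant_id` filter) in one place instead of duplicating it in a separate vector store.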
Author’s note — cases of “going back to monolith” from microservices
The data-infrastructure SaaS Segment published the blog post “Goodbye Microservices” in 2017, shocking the industry by documenting the reintegration of 140+ microservices back into a monolith. The causes (complex test-environment setup, the cost of maintaining service boundaries, and the difficulty of debugging) boil down to “premature splitting ate our operations,” and the post is still cited as the canonical case. As the outcome of a small team rushing to split out of idealism, it is one of the most famous pieces of published reflection in small-to-mid-size SaaS circles.
In contrast, Shopify reached a scale of hundreds of millions of users on a Ruby on Rails monolith, moving to a modular monolith in phases only where needed. Basecamp (37signals) has likewise run a single Rails app with a dozen-plus engineers for more than 20 years since its founding. Ask whether splitting is truly needed before you split: that is the lesson retold at this scale, and a living illustration of the maxim that distributed systems do not solve your problems; they add the problems of distributed systems.
What to decide — what is your project’s answer?
For each item below, try to articulate your project’s answer in one or two sentences. Start work while these are still vague, and the question “why did we decide this again?” is guaranteed to come back later.
When launching small-to-mid-size SaaS, it is appropriate to take one to two weeks to decide these carefully. Many of the items stay unchanged for years once set, so the time is well spent.
- Cloud vendor (AWS / GCP single selection)
- Execution env (ECS Fargate / Cloud Run / EKS if needed)
- Language / FW (TypeScript + Next.js etc.)
- DB composition (OLTP: Postgres / OLAP: BigQuery etc.)
- Auth foundation (Auth0 / Cognito, confirm SAML support)
- Monitoring tool (Datadog / New Relic)
- IaC (Terraform / CDK)
- SLO definition (availability 99.9% etc.)
Summary
This article covered the small-to-mid-size SaaS case: single cloud + managed services + IaC + SLO, staying on a monolith, security as standard equipment, and an AI-friendly architecture.
Lean on managed services, manage with IaC, discipline with SLOs, and grow revenue on a monolith. That is the practical answer for small-to-mid-size SaaS design in 2026.
Next time we will cover the “large-enterprise core systems” case: governance-focused architecture under the premises of 1000+ engineers, mandatory audit response, and long-term operation, plus the practice of phased migration that avoids big-bang rewrites.
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (82/89)