Case Studies

Small-Mid SaaS - Lean on Managed and Run with Few People

About this article

As the third installment of the “Case Studies” category in the series “Architecture Crash Course for the Generative-AI Era,” this article covers the small-mid SaaS case.

This is the phase of 5-30 engineers with paying customers, where downtime directly affects revenue: the turning point of graduating from MVP and switching to design that protects revenue. The article covers the 3 pillars of managed + IaC + observability, along with SLA design, SOC 2 compliance, multi-tenant design, and AI utilization.

What is small-mid SaaS architecture in the first place

Imagine a small restaurant expanding into a chain. During the food-stall days the owner ran everything solo, but once paying customers arrived and a closed shop meant zero revenue, new challenges emerged: stable operations, hygiene management, and staff training.

Small-mid SaaS architecture is the design for this turning point: graduating from MVP and switching to protecting revenue. By leveraging managed services, codifying infrastructure with IaC, and embedding monitoring, you build a setup that can withstand production even with a small team.

If you keep growing on the startup-era setup, downtime turns directly into revenue loss, eroding customer trust and accelerating churn.

Why small-mid SaaS-specific design is necessary

Because this is the phase where downtime directly hits revenue

In the MVP era, downtime could be handled with an apology. With paying customers, even one hour of outage directly causes churn and reputation damage. You need to define an SLO (Service Level Objective) and design, for example, to guarantee 99.9% availability (under 43 minutes of downtime per month).
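The downtime budget implied by an availability SLO is simple arithmetic worth keeping at hand. A minimal sketch (function name and the 30-day month are illustrative choices):

```typescript
// Convert an availability SLO percentage into an allowed-downtime budget.
// 99.9% over a 30-day month works out to roughly 43 minutes.
function allowedDowntimeMinutes(sloPercent: number, days: number = 30): number {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - sloPercent / 100);
}

console.log(allowedDowntimeMinutes(99.9).toFixed(1));  // ~43.2 min/month
console.log(allowedDowntimeMinutes(99.99).toFixed(1)); // ~4.3 min/month
```

This also makes concrete why the jump from 99.9% to 99.99% is so expensive: the budget shrinks by a factor of ten.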

Because startup-era composition can no longer hold up

A Vercel + Supabase stack hits walls of cost, performance, and multi-tenant isolation once paying customers exceed a few hundred. But importing full enterprise tooling overwhelms a small ops team. Achieving production quality that a few people can run, via the 3 pillars of managed + IaC + observability, is the challenge unique to this phase.

Because security certifications like SOC 2 become sales weapons

In B2B SaaS you must pass customer companies’ security reviews, and SOC 2 or ISO 27001 certification is increasingly a deal prerequisite. Building in logging, access control, and encryption from the design stage smooths certification and becomes a powerful sales differentiator.

Selection basic policy

The core of this phase is the 3 pillars of managed + IaC + observability. Codify infrastructure with IaC (Infrastructure as Code), make thorough use of AWS/GCP/Azure managed services, and create a state where system behavior can be understood through monitoring, logs, and traces.

| Prioritize | Postpone |
| --- | --- |
| Availability 99.9% (43 min monthly downtime) | Availability 99.99% (4 min monthly downtime) |
| Managed-centric, light operations | Self-built clusters, full in-house |
| IaC (Terraform / CDK) codification | Manual GUI operation, Excel management |
| SLO (Service Level Objective) / error-budget operation | An SLA contract with no operational practice behind it |

Representative profiles are business SaaS, B2B tools, marketplaces, and industry-specific vertical SaaS. The rational answer is to lean on managed services and run with a small team, rather than do everything yourself.

The mainstream setup leans on a single cloud (AWS or GCP), manages it with IaC, runs on containers, and embeds observability from the start. This is not an extension of personal development: create a state that is inspectable from an SRE perspective from day one.

```mermaid
flowchart TB
    USER([Customer])
    CDN[CloudFront / CDN]
    LB[ALB]
    APP[ECS Fargate / Cloud Run<br/>TypeScript / Go / Python]
    DB[(Aurora PostgreSQL<br/>managed RDB)]
    REDIS[(Redis<br/>ElastiCache)]
    AUTH[Auth0 / Cognito<br/>SAML/OIDC support]
    MON[Datadog / New Relic<br/>integrated monitoring]
    IAC[Terraform / CDK<br/>all infra as code]
    USER --> CDN --> LB --> APP
    APP --> DB
    APP --> REDIS
    APP --> AUTH
    APP -. metrics/logs/traces .-> MON
    IAC -.->|provision| CDN
    IAC -.-> LB
    IAC -.-> APP
    IAC -.-> DB
    classDef user fill:#fef3c7,stroke:#d97706;
    classDef edge fill:#dbeafe,stroke:#2563eb;
    classDef app fill:#dcfce7,stroke:#16a34a,stroke-width:2px;
    classDef data fill:#fae8ff,stroke:#a21caf;
    classDef ops fill:#f0f9ff,stroke:#0369a1;
    class USER user;
    class CDN,LB edge;
    class APP app;
    class DB,REDIS,AUTH data;
    class MON,IAC ops;
```

| Area | Recommended | Reason |
| --- | --- | --- |
| Cloud | AWS or GCP (single) | Consistent IAM / IaC / monitoring |
| Execution env | ECS Fargate / Cloud Run | No K8s needed, fully managed |
| Language | TypeScript / Go / Python | Abundant information and AI training data |
| DB | Aurora PostgreSQL / Cloud SQL | The managed-RDB standard |
| Auth | Auth0 / Cognito / Firebase Auth | For B2B, confirm SAML (the industry standard for enterprise SSO) / OIDC (OpenID Connect) support |
| Monitoring | Datadog / New Relic | Integrated view, alert integration |
| IaC | Terraform / CDK / Pulumi | All infrastructure managed as code |

Unless there is a clear reason for it, K8s is unnecessary. The going rate at this scale is that ECS Fargate / Cloud Run is enough.

System / deploy choices

The standard is a single public cloud (AWS or GCP), running container-based workloads. K8s carries heavy operational overhead and becomes a liability without dedicated SREs at this scale. ECS Fargate (AWS) and Cloud Run (GCP) are managed services where you throw a container at them and it runs, handling scaling, health checks, and load balancing internally.

Build CI/CD with GitHub Actions or GitLab CI, creating a flow that auto-deploys on merge to the main branch. Maintain at minimum three environments (dev / staging / production), and use Blue/Green or Canary deploys to avoid downtime when releasing to production.
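As a hedged sketch of that flow, a minimal GitHub Actions workflow that runs on merge to main might look like the following. It is deliberately incomplete: the registry push and service-update steps depend on your platform, and all names are placeholders.

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the container image tagged with the commit SHA.
      - run: docker build -t app:${{ github.sha }} .
      # Pushing to the registry and updating the ECS service / Cloud Run
      # revision would follow here. Blue/Green traffic shifting itself is
      # handled by the platform (e.g. CodeDeploy), not by this workflow.
```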

| Choose | Avoid |
| --- | --- |
| ECS Fargate / Cloud Run | Manually operated EKS / GKE |
| Codify with Terraform / CDK | Manual resource creation in the GUI |
| GitHub Actions / GitLab CI | Self-hosted Jenkins |
| Multi-AZ setup (single region is fine) | Multi-region (excessive cost) |

It’s worth asking yourself “is K8s really needed?” before choosing. Most SaaS get by fine with Fargate / Cloud Run.

Software / data choices

A monolith or modular monolith is the first choice; microservices becomes a topic only after the organization exceeds 50 people. Premature splitting just multiplies network boundaries, ramping up failure points, latency, and operational cost. For small-mid SaaS, it is premature design.

Base the DB on PostgreSQL (Aurora / Cloud SQL), combining Redis (cache), S3 (object storage), and Elasticsearch (full-text search) per use case. When analytics use cases emerge, the standard is to build an ELT pipeline into BigQuery / Snowflake / Redshift.

| Choose | Avoid |
| --- | --- |
| Monolith / modular monolith | Microservices (premature splitting) |
| PostgreSQL (Aurora / Cloud SQL) | Operating on a schemaless DB only |
| Redis (cache) + S3 (objects) | Cramming everything into the RDB |
| Analytics in a separate DWH via ELT | Analytics queries on the production DB |
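The “Redis (cache) + RDB” combination usually means the cache-aside pattern. A minimal sketch follows; a `Map` stands in for Redis and the DB call is stubbed, so everything here is illustrative (in production you would use a Redis client with a TTL on writes):

```typescript
// Cache-aside: check the cache first, fall back to the DB on a miss,
// then populate the cache for subsequent reads.
const cache = new Map<string, string>();

async function fetchFromDb(id: string): Promise<string> {
  // Placeholder for an Aurora / Cloud SQL query.
  return `user:${id}`;
}

async function getUser(id: string): Promise<string> {
  const hit = cache.get(id);
  if (hit !== undefined) return hit;   // cache hit: no DB round trip
  const value = await fetchFromDb(id); // cache miss: read from the RDB
  cache.set(id, value);                // populate the cache
  return value;
}
```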

Microservices is a prescription for when team size becomes the bottleneck.

Frontend / auth choices

For the frontend, the small-mid SaaS default is Next.js (App Router) + TypeScript + Tailwind for the main web app. Optionally insert a thin BFF (Backend for Frontend), and unify mobile apps on React Native / Flutter.

The standard is to delegate auth to Auth0 / Cognito / Firebase Auth; self-implementation is forbidden here too. For B2B services, SAML / OIDC support emerges early as an enterprise-customer requirement, so choosing an IdP that supports them is required. Clerk is a good fit at small scale, but since its SAML support is limited to upper plans, Auth0 is the safe choice from mid-size up.
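For intuition on what a delegated IdP hands back: OIDC tokens are JWTs whose claims (such as `exp`) can be decoded as below. This sketch deliberately skips signature verification, which must be handled by the IdP’s SDK in production; it only illustrates the token structure.

```typescript
// Decode a JWT payload (the middle dot-separated segment) to read claims.
// NOT a security check on its own: no signature verification happens here.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const payload = token.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

function isExpired(token: string, nowSeconds: number = Date.now() / 1000): boolean {
  const { exp } = decodeJwtPayload(token) as { exp?: number };
  return typeof exp === "number" && exp < nowSeconds;
}
```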

| Choose | Avoid |
| --- | --- |
| Next.js App Router + TS | Legacy Pages Router, hand-written SSR |
| Tailwind + shadcn/ui + design tokens | Fully custom CSS design (hard to hire for) |
| Auth0 / Cognito (SAML support) | Self-built auth, Clerk free plan only |
| MFA required + Passkey support | Password auth only |

For B2B SaaS, SAML support tends to be a contract condition. Always confirm it when selecting the auth foundation.

Data / analytics choices

The iron rule from this scale on is early separation of the business workload (OLTP = PostgreSQL) and analytics (OLAP = DWH). Running analytical queries against the production DB continuously degrades the speed customers feel. Choose one of BigQuery / Snowflake / Redshift and sync daily to hourly via ELT (Fivetran / Airbyte / dbt).
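The incremental pulls that ELT tools perform can be pictured as a watermark filter: extract only rows changed since the last sync, keeping the load on the production DB light. The `Row` shape and field names below are hypothetical.

```typescript
// Watermark-based incremental extraction, the core idea behind
// daily-to-hourly ELT syncs into a DWH.
interface Row { id: number; updatedAt: number }

function extractIncrement(
  rows: Row[],
  watermark: number
): { batch: Row[]; nextWatermark: number } {
  // Only rows modified after the previous watermark are extracted.
  const batch = rows.filter((r) => r.updatedAt > watermark);
  // Advance the watermark to the newest row seen in this batch.
  const nextWatermark = batch.reduce((m, r) => Math.max(m, r.updatedAt), watermark);
  return { batch, nextWatermark };
}
```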

Building data quality and metadata management from the start pays off later in AI utilization. Decide schema definitions, table-naming conventions, and a PII (personal information) masking policy, and turn them into automated tests with dbt or Great Expectations: the modern standard equipment.
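A masking policy is ultimately a small, testable function applied before rows reach the analytics DWH. A sketch with an illustrative keep-first-character-plus-domain rule (the policy itself is an assumption, not from this article):

```typescript
// Mask an email address before it is loaded into the analytics DB.
// Keeps the first character of the local part and the domain.
function maskEmail(email: string): string {
  const [local, domain] = email.split("@");
  if (!domain || local.length === 0) return "***"; // not a usable email: mask fully
  return `${local[0]}***@${domain}`;
}
```

The same shape works for any PII field; the point is that the policy lives in code, where dbt-style tests can assert that no unmasked values slip through.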

| Choose | Avoid |
| --- | --- |
| OLTP (Postgres) + OLAP (BigQuery etc.) separation | Analytical queries on the production DB |
| ELT (Fivetran / Airbyte + dbt) | Hand-written ETL scripts |
| dbt tests / Great Expectations | No data-quality checks |
| An explicit PII-masking policy | Personal info loaded into the analytics DB as-is |

Postpone the analytics foundation and you will end up in tears over the speed customers feel.

Security / monitoring choices

At this scale, security is standard equipment with no skipping allowed. Authentication (MFA required), authorization (RBAC or ABAC), encryption (TLS required, encryption at rest), logs, and audit trails are all table stakes. A WAF (Cloudflare / AWS WAF) plus DDoS countermeasures is also required at production launch.
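RBAC at this scale can start as a plain role-to-permission lookup before any dedicated authorization service is needed. The role and permission names below are hypothetical:

```typescript
// Minimal RBAC sketch: each role maps to a set of permission strings,
// and an authorization check is a set-membership lookup.
const rolePermissions: Record<string, Set<string>> = {
  admin: new Set(["billing:read", "billing:write", "users:manage"]),
  member: new Set(["billing:read"]),
};

function can(role: string, permission: string): boolean {
  // Unknown roles get no permissions.
  return rolePermissions[role]?.has(permission) ?? false;
}
```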

For monitoring, install an integrated view such as Datadog or New Relic, creating a state where metrics, logs, traces, and RUM (Real User Monitoring) are visible on one dashboard. Define SLOs (e.g., 99.9% availability) and an error budget, link alerts to Slack / PagerDuty, and build a regime where someone actually gets paged at night.
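Error-budget alerting typically pages on burn rate rather than raw error counts. The arithmetic is small enough to sketch (the paging threshold mentioned in the comment is a common industry convention, not a prescription from this article):

```typescript
// Burn rate = observed error ratio / error-budget ratio.
// With a 99.9% SLO the budget is 0.1% of requests; a burn rate of 1 means
// the budget is being consumed exactly at the rate that exhausts it by
// the end of the window. Sustained burn well above 1 (e.g. >14 over an
// hour) is a common fast-burn paging condition.
function burnRate(errorCount: number, totalCount: number, sloPercent: number): number {
  const budget = 1 - sloPercent / 100; // e.g. 0.001 for 99.9%
  return errorCount / totalCount / budget;
}
```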

| Choose | Avoid |
| --- | --- |
| MFA required, Passkey support | Password only |
| WAF (Cloudflare / AWS WAF) | Origin server exposed directly |
| Datadog / New Relic integrated monitoring | Fragmented operations across multiple tools |
| SLO definition + PagerDuty on-call | Monitoring exists but no one looks at it |

This scale is the right timing to start SLO / error-budget operations.

Small-mid SaaS numerical gates

Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.

The standard for the small-mid SaaS phase is disciplining operations by the numbers. Below are the industry-standard metrics.

| Metric | Recommended | Graduation guideline |
| --- | --- | --- |
| Availability SLO | 99.9% (43 min monthly downtime) | Promote to 99.95% at scale |
| Engineer count | 5-30 | Consider microservices beyond 30 |
| Monthly infrastructure cost | Hundreds of thousands to millions of yen | Dedicated FinOps at tens of millions |
| Paying customers | Hundreds to thousands | Dedicated SRE at tens of thousands |
| Response time (P95) | Within 500 ms | Enterprise-class at under 300 ms |
| Pentest frequency | Annual | Quarterly with enterprise customers |
| MFA required | All users | - |
| SAML support | Upper plans | Required for B2B contracts |
| Deploy frequency | Several per week to daily | Aim for Elite level |
| On-call regime | 2-3 people alongside other duties | Move to dedicated SREs |

The graduation guideline from the small-mid SaaS phase is over 30 engineers or tens of millions of yen in monthly infrastructure cost. Beyond that, regimes closer to large-enterprise core systems become necessary. Note that 99.9% and 99.95% SLOs differ in build cost by a factor of several times; 99.9% is enough at this phase.

Pitfalls and forbidden moves

Below are typical accident patterns of the small-mid SaaS phase. All share a common cause: copying big enterprises, or carrying on in startup mode.

| Forbidden move | Why it’s bad |
| --- | --- |
| Premature microservices split | Same outcome as Segment in 2017; monolith or modular monolith under 30 people |
| Adopting K8s (manually operated EKS / GKE) | Eats operational time without dedicated SREs; move to ECS Fargate / Cloud Run |
| Multi-cloud setup (using 2 clouds) | IAM / monitoring / IaC duplicated, doubled ops load |
| No analytics DB, aggregating on the production DB | Customer-felt speed drops, causing churn; OLTP / OLAP separation required |
| Not defining SLOs | Quality can’t be discussed numerically; ends with “we’ll try harder” |
| Underestimating SAML support in B2B | Enterprise contracts slip away, growth stalls |
| No data-quality tests | Inconsistent data pollutes analytics; dbt tests required |
| Skipping security basics (MFA / WAF / encryption) | SOC 2 / ISO 27001 out of reach, contract requirements unmet |
| 15 people running EKS + 12 services | Inter-service incident triage eats a day per week; the classic case for consolidating into a Fargate monolith |
| Loading the analytics DB without PII masking | GDPR / Personal Information Protection Act violation risk |

The contrast between Shopify’s and Basecamp’s continued monoliths (Shopify reaching hundreds of millions of users on Ruby on Rails; Basecamp running 20+ years on a single Rails app with a dozen-plus engineers) and Segment’s consolidation of 140 services back into a monolith is the typical lesson: ask whether splitting is truly needed before doing it.

Splitting can be done at any time; going back is hell. Grow revenue on the monolith first, and split carefully later.


AI decision axes

| AI-favored | AI-disfavored |
| --- | --- |
| Single cloud + IaC + containers | Multi-cloud + custom PaaS |
| Schema-defined RDB + DWH | Schemaless DB only |
| Explicit OpenAPI / GraphQL schema | Hand-written REST, no type definitions |
| pgvector / RAG-ready design | Classic RDB only |

  1. Single cloud + managed: light operations
  2. IaC everywhere: manage infrastructure as code
  3. SLO / observability: run quality by the numbers
  4. Security as standard equipment: MFA / WAF / encryption, all of it

Author’s note — cases of “going back to monolith” from microservices

Data-infrastructure SaaS Segment published the blog post Goodbye Microservices in 2017, shocking the industry by documenting its path of reintegrating 140+ microservices back into a monolith. The causes were the complexity of building test environments, the cost of maintaining service boundaries, and the difficulty of debugging: premature splitting ate up operations, a case still retold today. As the outcome of a small team rushing into splitting on idealism, it remains one of the most famous pieces of reflective writing in small-mid SaaS circles.

In contrast, Shopify reached hundreds-of-millions-of-users scale on a Ruby on Rails monolith, splitting in phases toward a modular monolith only when needed. Basecamp (37signals), too, has run a single Rails app with a dozen-plus engineers for 20+ years since founding. Ask whether splitting is truly needed before doing it: that is the lesson told at this scale, and a living case of the maxim that distributed systems don’t solve your problems, they add the problems of distributed systems.

What to decide — what is your project’s answer?

For each of the following, try to articulate your project’s answer in one or two sentences. Starting work while these remain vague always invites later questions like “why did we decide this again?”

When launching a small-mid SaaS, it is appropriate to take one to two weeks to decide these carefully. Many of these items stay unchanged for years once decided, so the time is well spent.

  • Cloud vendor (AWS / GCP single selection)
  • Execution env (ECS Fargate / Cloud Run / EKS if needed)
  • Language / FW (TypeScript + Next.js etc.)
  • DB composition (OLTP: Postgres / OLAP: BigQuery etc.)
  • Auth foundation (Auth0 / Cognito, confirm SAML support)
  • Monitoring tool (Datadog / New Relic)
  • IaC (Terraform / CDK)
  • SLO definition (availability 99.9% etc.)

Summary

This article covered the small-mid SaaS case: single cloud + managed + IaC + SLO, staying on the monolith, security as standard equipment, and an AI-friendly setup.

Lean on managed services, manage with IaC, discipline with SLOs, and grow revenue on the monolith. That is the practical answer for small-mid SaaS design in 2026.

Next time we’ll cover the “large-enterprise core systems” case. The plan is to dig into governance-focused setups with 1000+ engineers, mandatory audit response, and a long-term-operation premise, plus the practice of phased migration that avoids big-bang rewrites.

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.