Best-Practice Catalog — When in Doubt, Lean on These

About this article

This article is the second installment of the “Appendix” category in the Architecture Crash Course for the Generative-AI Era series, covering the best-practice catalog.

Where the anti-pattern catalog was a reverse lookup of “landmines you must not step on”, this article is the forward-lookup catalog of “when in doubt, start here.” Each domain gets a one-page distillation of the boring but reliable standard stack. Use it as the skeleton for new projects, the final check before reviews, or the supporting material when explaining choices to other teams.

```mermaid
flowchart TB
    NEW([New project<br/>= when in doubt, lean on standards])
    A[Architecture overall<br/>YAGNI / Modular monolith]
    I[Infra<br/>Managed / IaC / Single cloud]
    D[Data<br/>PostgreSQL / 3NF normalization]
    APP[App<br/>SOLID / Small classes]
    F[Frontend<br/>TypeScript+Next.js+Tailwind]
    S[Security<br/>IDaaS / Passkey / Secrets Manager]
    O[Monitoring & ops<br/>SLO / Datadog / On-call]
    P[Process<br/>GitHub Flow / Squash / Small PRs]
    AI[AI era<br/>Standard FW / Type safety / Rich training data]
    NEW --> A
    NEW --> I
    NEW --> D
    NEW --> APP
    NEW --> F
    NEW --> S
    NEW --> O
    NEW --> P
    NEW --> AI
    classDef new fill:#fef3c7,stroke:#d97706,stroke-width:2px;
    classDef good fill:#dcfce7,stroke:#16a34a;
    class NEW new;
    class A,I,D,APP,F,S,O,P,AI good;
```

Architecture-wide standards

Before any specific technology choice comes the posture toward design itself: a handful of principles that have proven effective across the industry. None are flashy, but following them alone keeps you off the “burning project” list.

Practice	Content	Rationale
YAGNI (only what you need now)	Don’t build layers / abstractions of uncertain use	Unused code is the prime suspect for technical debt
Choose Boring Technology	Prefer options with 2-3+ years of track record	Abundant documentation and wide adoption reduce AI hallucinations
Leave reasoning in ADRs	One-pager per decision	Your largest readers are your future self and your successor
Standard libraries / SaaS first	Avoid reinventing the wheel	Custom implementation breeds vulnerabilities and maintenance cost
Always measure before choosing	“Feels faster” is not evidence	Perceived and measured performance routinely differ by ~30%

A documented technical compromise is stronger in long-term operations than an undocumented technically correct answer.
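The “always measure before choosing” row can be made concrete with a small timing harness. This is a minimal sketch using the standard-library `timeit` module; the two candidate functions and the names `compare` and `relative_difference` are hypothetical stand-ins for whatever options a team is actually weighing.

```python
import timeit


def build_with_append():
    """Hypothetical candidate A: build a list with an explicit loop."""
    out = []
    for i in range(200):
        out.append(i * 2)
    return out


def build_with_comprehension():
    """Hypothetical candidate B: build the same list with a comprehension."""
    return [i * 2 for i in range(200)]


def compare(candidates, repeat=5, number=1000):
    """Time each zero-argument callable; return best-of-`repeat` seconds per call."""
    results = {}
    for name, fn in candidates.items():
        runs = timeit.repeat(fn, repeat=repeat, number=number)
        # The minimum is the least noisy estimate of the achievable time.
        results[name] = min(runs) / number
    return results


def relative_difference(results, baseline, challenger):
    """Fraction by which the challenger is slower (+) or faster (-) than the baseline."""
    return (results[challenger] - results[baseline]) / results[baseline]


if __name__ == "__main__":
    timings = compare({
        "append": build_with_append,
        "comprehension": build_with_comprehension,
    })
    diff = relative_difference(timings, "append", "comprehension")
    print(f"comprehension vs append: {diff:+.1%}")
```

Recording the measured number in the ADR, next to the decision it supported, keeps the evidence with the reasoning.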

Infrastructure / deployment standards

These are the cloud and runtime defaults that work for startups and mid-sized teams alike about 90% of the time. Stretching to K8s or multi-cloud only becomes necessary when revenue and team size grow substantially.

Practice	Content	Phase
Lean on a single cloud	One of AWS / GCP / Azure	All phases (up to ~$100M revenue)
ECS Fargate / Cloud Run	Standard for container ops	MVP-to-mid before reaching for K8s
Manage all resources via Terraform / CDK	Ban manual setup completely	From engineer #1
RDS in private subnet	Never put DBs on public networks	No exceptions
2 AZ, RTO 1h / RPO 15min	Minimum availability bar	Minimum target for business systems

Phase by phase in practice: an MVP runs on single-AZ ECS Fargate plus an RDS db.t4g.small instance, at roughly $30/month. The growth phase (DAU 100k+) adds a second AZ, Auto Scaling, and CloudFront. The enterprise phase (internal business systems, regulated industries) layers on Multi-AZ, VPC endpoints, and AWS Control Tower.

The default play is single cloud + managed services. Distributed and DIY are too early for 90% of teams.
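The “RTO 1h / RPO 15min” bar from the table above is easier to enforce when a drill can be checked against it mechanically. The sketch below assumes you can list backup timestamps and know when an incident started and when service was restored; the function names are hypothetical.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # maximum tolerated data loss
RTO = timedelta(hours=1)     # maximum tolerated time to restore service


def data_loss_window(backup_times, incident_at):
    """Time between the last backup taken before the incident and the incident."""
    earlier = [t for t in backup_times if t <= incident_at]
    if not earlier:
        raise ValueError("no backup exists before the incident")
    return incident_at - max(earlier)


def meets_targets(backup_times, incident_at, restored_at):
    """Compare one incident (or one drill) against the RPO and RTO targets."""
    loss = data_loss_window(backup_times, incident_at)
    downtime = restored_at - incident_at
    return {
        "data_loss": loss,
        "downtime": downtime,
        "rpo_ok": loss <= RPO,
        "rto_ok": downtime <= RTO,
    }


if __name__ == "__main__":
    backups = [datetime(2026, 1, 1, 10, 0), datetime(2026, 1, 1, 10, 10)]
    report = meets_targets(
        backups,
        incident_at=datetime(2026, 1, 1, 10, 20),
        restored_at=datetime(2026, 1, 1, 11, 0),
    )
    print(report)
```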

Data standards

Because data, unlike applications, cannot be rebuilt, the first choice ripples for five years. The current industry default is RDB + strict schema definitions at the core, and AI-era assumptions don’t change that.

Practice	Content	Why
PostgreSQL as first choice	Schema, JSONB, pgvector, extensibility — all present	Closes off the schemaless escape hatch
Separate OLTP and OLAP early	Don’t mix operational and analytical DBs	Analytical queries on production are dangerous
History tables or Event Sourcing	Don’t overwrite-update; keep history	Pays off for audit, AI, incident analysis
dbt tests / Great Expectations	Automate data quality checks	By the time you notice, inconsistencies are in the tens of thousands
Backups must run restore drills	Quarterly recovery rehearsal	Having backups doesn’t mean recovery works

Numeric gates: tables over 10M rows need partitioning; traffic above 10k RPS calls for streaming (Kafka / Kinesis); for the DWH, choose among Redshift / BigQuery / Snowflake.

Data architecture failures are 5x heavier than application architecture failures. Compromises here echo for five years.
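The “history tables” row above means appending a new row for every change instead of overwriting the old value. Here is a minimal sketch of that pattern. It uses the standard-library `sqlite3` module purely so the example runs anywhere; the article's actual recommendation is PostgreSQL, and the `price_history` table and helper names are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with an append-only history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id INTEGER NOT NULL,
            price_cents INTEGER NOT NULL,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def set_price(conn, product_id, price_cents):
    """Record a price change by inserting a row; nothing is ever UPDATE-d."""
    conn.execute(
        "INSERT INTO price_history (product_id, price_cents) VALUES (?, ?)",
        (product_id, price_cents),
    )


def current_price(conn, product_id):
    """The current value is simply the most recent history row."""
    row = conn.execute(
        "SELECT price_cents FROM price_history "
        "WHERE product_id = ? ORDER BY id DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    return row[0] if row else None


def price_history(conn, product_id):
    """Every past value stays queryable for audits and incident analysis."""
    rows = conn.execute(
        "SELECT price_cents FROM price_history WHERE product_id = ? ORDER BY id",
        (product_id,),
    )
    return [r[0] for r in rows]
```

The trade-off is more rows and a slightly more involved read query, in exchange for an audit trail that does not have to be reconstructed after the fact.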

Application standards

Code design boils down to “≤300 lines per file, ≤50 lines per method, max 3 levels of nesting.” Sticking to those simple numeric limits prevents most maintainability problems, and in practice it is more effective than DDD or Clean Architecture theatrics.

Practice	Content	Threshold
Single Responsibility Principle splits	One class / file / method = one responsibility	≤300 lines/file, ≤50 lines/method
Business logic in the app	Don’t push it into stored procedures	Preserves DB migration optionality
Don’t swallow errors	catch -> log + rethrow	Swallowing is fatal for incident detection
Domain-term naming	Avoid `data`, `manager`, `util`	Names that make intent readable
Constructor injection	Avoid all-static	Foundation for testability
Optional / Result types	Replace null with type-encoded states	Eliminates missed null checks
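Two rows of the table above, “don't swallow errors” and “constructor injection”, fit naturally into one small example. This is an illustrative sketch only: `PaymentGateway`, `OrderService`, and `FailingGateway` are hypothetical names, not part of any real library.

```python
import logging

logger = logging.getLogger("orders")


class PaymentGateway:
    """Hypothetical dependency; a real one would call a payment provider."""

    def charge(self, order_id, amount_cents):
        raise NotImplementedError


class OrderService:
    def __init__(self, gateway):
        # Constructor injection: the dependency is passed in, not created
        # here or reached through a static, so tests can supply a fake.
        self._gateway = gateway

    def pay(self, order_id, amount_cents):
        try:
            return self._gateway.charge(order_id, amount_cents)
        except Exception:
            # Log with the traceback, then re-raise so callers and
            # monitoring still see the failure.
            logger.exception("payment failed for order %s", order_id)
            raise


class FailingGateway(PaymentGateway):
    """Test double that simulates an outage."""

    def charge(self, order_id, amount_cents):
        raise ConnectionError("gateway unreachable")
```

Because the gateway is injected, a test can pass `FailingGateway()` and assert that the `ConnectionError` still reaches the caller.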

Teams that quietly stick to numeric upper bounds tend to produce more in five years than teams flexing complex theory.
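Numeric limits only hold if something checks them. As a rough sketch, the standard-library `ast` module can flag Python files over 300 lines and functions over 50 lines; the function name `check_source` is hypothetical, and nesting depth is deliberately left out to keep the example short.

```python
import ast

MAX_FILE_LINES = 300
MAX_FUNCTION_LINES = 50


def check_source(source, filename="<string>"):
    """Return human-readable violations of the file and function size limits."""
    problems = []

    line_count = len(source.splitlines())
    if line_count > MAX_FILE_LINES:
        problems.append(
            f"{filename}: {line_count} lines (limit {MAX_FILE_LINES})"
        )

    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                problems.append(
                    f"{filename}:{node.lineno} {node.name}: "
                    f"{length} lines (limit {MAX_FUNCTION_LINES})"
                )
    return problems
```

Run from CI, a check like this turns the thresholds into a gate rather than a guideline. In practice an existing linter with equivalent rules is usually the better choice than a custom script.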

Frontend standards

The current industry standard is the three-piece set: meta-framework + utility CSS + managed authentication. Hand-rolled auth, in-house CSS systems, and raw React routing rarely produce returns proportional to their effort.

Practice	Content	Examples
Use a meta-framework	Don’t hand-roll routing, SSR, build	Next.js / Astro / Remix
JWT in HttpOnly Cookie + BFF	localStorage storage banned	Hide tokens behind a BFF
Auth via SaaS	Clerk / Auth.js / Auth0 / Cognito	DIY auth is a vulnerability factory
Tailwind + shadcn/ui	Don’t build a custom CSS design system	Optimal for hiring and learning cost
Images via CDN transform + WebP / AVIF	Target LCP under 2.5s	Core Web Vitals work
JS bundle ≤ 170KB (after gzip)	Code-split for staged delivery	Initial render under 3s on 3G
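For the “JWT in HttpOnly Cookie” row above, the essential part is the set of cookie attributes the BFF sends. The sketch below builds such a `Set-Cookie` value with Python's standard-library `http.cookies`; the cookie name `session`, the `SameSite=Lax` choice, and the one-hour lifetime are assumptions for illustration, not values from the article.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token, max_age_seconds=3600):
    """Build a Set-Cookie value that keeps the token out of reach of page scripts."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["httponly"] = True   # not readable via document.cookie
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # assumed policy; tighten to Strict if possible
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()


if __name__ == "__main__":
    print("Set-Cookie:", session_cookie_header("abc.def.ghi"))
```

With the token in an HttpOnly cookie, client-side JavaScript never handles it directly, which is the point of banning localStorage storage.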

For SEO-critical pages: SSG / ISR. For dashboards: CSR + API. For content plus interactivity: SSR + RSC (React Server Components, which render on the server and ship minimal JS to the client). That is the current rule of thumb.

“Raw React with hand-rolled routing” is, in 2026, a poor choice. Riding a meta-framework is the standard.

Security standards

Security is cheapest when standardized from day one; bolting it on later can cost on the order of 100x more. The industry default is delegation, defense in depth, and least privilege. Avoid in-house implementations wherever you can.

Practice	Content	Required level
Auth delegated to IDaaS	Auth0 / Cognito / Clerk / Okta	Day 1 of new services
MFA mandatory for all users	TOTP / Passkey over SMS	No exception for admins
Standardize on Passkey	FIDO2-based passwordless	Standard for new services today
TLS 1.2 minimum, 1.3 preferred	Disable 1.0 / 1.1	Enforced on all traffic
Secrets in Vault / Secret Manager	Detect Git contamination via pre-commit hook	Day 1 of development
Zero Trust assumed	Don’t trust the inside of the VPN	Authenticate every request
PII masking in logs	Don’t log raw personal data	Data protection law compliance
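The “PII masking in logs” row can be applied at the logging layer so individual call sites cannot forget it. The following is a minimal sketch with the standard-library `logging` module. It only masks email addresses, the regular expression is a simplification, and `PiiMaskingFilter` is a hypothetical name; real systems typically need more patterns (phone numbers, IDs, and so on).

```python
import logging
import re

# Simplified email pattern; an assumption for illustration, not a full RFC match.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_pii(text):
    """Replace anything that looks like an email address."""
    return EMAIL_RE.sub("[email redacted]", text)


class PiiMaskingFilter(logging.Filter):
    """Mask PII in the fully rendered message before any handler sees it."""

    def filter(self, record):
        # Render msg % args first so values passed as arguments are masked too.
        record.msg = mask_pii(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it


def build_logger(name):
    """Attach the filter to the handler so every record passes through it."""
    handler = logging.StreamHandler()
    handler.addFilter(PiiMaskingFilter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```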

“Don’t build it, delegate it” is the iron rule. The number of organizations in any country where in-house security implementation is justified can be counted on one hand.
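The TLS row in the table above sets 1.2 as the floor with 1.0 / 1.1 disabled. On the client side, that policy is a one-line setting in Python's standard-library `ssl` module. This is a sketch for outbound connections only; server-side enforcement usually lives in the load balancer or managed service configuration instead.

```python
import ssl


def strict_client_context():
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() already enables certificate and hostname checks.
    ctx = ssl.create_default_context()
    # TLS 1.3 is negotiated when both sides support it; 1.0 and 1.1 are rejected.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```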

Monitoring / operations standards

“Running” and “observable” are different things. Build the visualization stack from the start and run operations on numbers, not gut feel — the SRE-style standard configuration.

Practice	Content	Target
Standardize structured logs (JSON)	Format that assumes search and aggregation	From day 1
SLO / SLI / Error budget	Discuss availability numerically	99.9% (43 min downtime per month)
Three pillars assembled	Metrics, logs, traces unified	Datadog / New Relic / Grafana Stack
On-call + PagerDuty	Reliably reach a human with alerts	Night shifts always rotate
Runbook maintenance	Document procedures for major incidents	Granular enough for new hires to handle nights
Mandatory postmortems	Record cause and countermeasure post-incident	Recurrence prevention, not blame
Production changes only via CI/CD	Ban production SSH	Reproducibility, audit-ability guaranteed
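For the “structured logs (JSON)” row, one JSON object per line is the shape most log platforms can search and aggregate. Below is a minimal sketch using only the standard library; the field names (`ts`, `level`, `logger`, `message`) are an assumed convention, not a standard, and `JsonFormatter` is a hypothetical name.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def build_json_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate plain-text output via the root logger
    return logger
```

Usage is the same as any logger, for example `build_json_logger("api").info("order %s shipped", 42)`.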

“What’s not visualized may as well not exist” is the operations community’s shared assumption.
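The “99.9% (43 min downtime per month)” figure in the SLO row follows directly from the definition of an error budget: the share of the window in which the service is allowed to be unavailable. A small sketch of that arithmetic, assuming a 30-day window (the function names are hypothetical):

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Allowed downtime in minutes for a given availability SLO and window."""
    total_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent, downtime_minutes, window_days=30):
    """Minutes of budget left; a negative result means the SLO is already missed."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


if __name__ == "__main__":
    for slo in (99.0, 99.9, 99.99):
        print(f"{slo}% -> {error_budget_minutes(slo):.1f} min per 30 days")
```

For 99.9% this works out to roughly 43.2 minutes per 30 days, and each additional nine divides the budget by ten.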

Process / organization standards

The most common project failure mode is winning on tech and losing on process. The three things that determine long-term operational success: decision records, phased migration, buyer-side understanding.

Practice	Content	Effect
Leave decisions in ADRs	One-pager per “why”	Prevents the “nobody can answer in 3 years” problem
Strangler Fig phased migration	Avoid big-bang rewrites	Avoids years and millions in burning projects
PoC -> implementation order	Always validate uncertain tech first	Estimation accuracy moves by 2x+
Mix architects and implementers	Avoid isolated idealism	Designs aligned with the field
Use Conway’s Law in reverse	Shape team structure to match the system boundaries you want	Align team and API boundaries
Buyer also understands architecture	Don’t outsource without understanding	Operations handoff stays possible

A state where “the buyer doesn’t understand what’s running their tech” is an explosion waiting to happen. This is industry common knowledge.

AI-era standards

When AI-driven development is assumed, “can AI fluently write or read this?” moves to the center of the selection axis. Four things determine AI-compatibility: a mainstream framework, type safety, declarative configuration, and CLI operability.

Practice	Content	AI-era effect
Lean on mainstream frameworks	Next.js / Django / Rails / FastAPI	Volume of training data drives productivity
Make types and schemas explicit	TypeScript / Pydantic / dbt models	Suppresses AI hallucinations
Tools operable via CLI / API / IaC	Avoid GUI-only tools	Lets AI take over operations
Build data catalogs and metadata	Document descriptions, tags, relationships	RAG and AI agents jump in accuracy
AI-generated code goes through normal review	No skip-the-checks deploys	Prevents vulnerabilities from reaching production
Design assuming pgvector / Pinecone	Don’t bolt vector search on later	Cuts cost of adding RAG features

The current design principle centers on “can AI fluently write and read this?” Choosing along that axis ends up producing human-friendly designs as a side effect.
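“Make types and schemas explicit” is the most directly codeable row in the table above. The article names Pydantic for this; the sketch below uses a standard-library `dataclass` as a dependency-free stand-in that shows the same idea. The `CreateUserRequest` fields and their validation rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateUserRequest:
    """Explicit request schema: field names and types are visible in one place."""

    email: str
    display_name: str
    age: int

    def __post_init__(self):
        # Dataclasses do not enforce annotations at runtime, so validate here.
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError("email must be a string containing '@'")
        if not isinstance(self.display_name, str) or not self.display_name.strip():
            raise ValueError("display_name must be a non-empty string")
        is_int = isinstance(self.age, int) and not isinstance(self.age, bool)
        if not is_int or not 0 <= self.age <= 150:
            raise ValueError("age must be an integer between 0 and 150")


def parse_create_user(payload):
    """Turn an untyped dict (for example, decoded JSON) into a validated object."""
    return CreateUserRequest(
        email=payload.get("email"),
        display_name=payload.get("display_name"),
        age=payload.get("age"),
    )
```

Whether the reader is a human reviewer or an AI assistant, the schema class answers “what does this endpoint accept?” without tracing through handler code.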

Combinations that win with “boring tech”

Stack Overflow serving 100M+ monthly hits from .NET + SQL Server + Redis on 9 web servers is the canonical “winner who picked boring tech” story. Conversely, Uber’s consolidation into DOMA (Domain-Oriented Microservice Architecture) after growing to roughly 2,200 microservices is frequently cited as “the post-mortem on going to maximum decomposition.”

The current no-fail default combination is below. For new projects, pull from this stack and only swap the elements you really need to.

Layer	Default
Cloud	AWS (Tokyo region)
Runtime	ECS Fargate
DB	PostgreSQL + pgvector
Backend	Python (FastAPI / Django) or Go
Frontend	Next.js + Tailwind + shadcn/ui
Auth	Clerk or Auth.js
Monitoring	Datadog or Grafana Cloud
IaC	Terraform
CI/CD	GitHub Actions

Teams that can keep a boring stack running straight for five years are, in the end, the strongest teams.

Author’s note — picking “the standard” is a fight against feeling lame

The unexpectedly hard part of architecting, repeatedly mentioned in industry circles, is shaking off the three temptations of “latest,” “cool,” “educational.” Conference-spotlight cases assume “substantial scale, substantial team, substantial budget” — copying them with a 10-person team usually means the core feature work doesn’t happen.

Shopify still running Ruby on Rails as a monolith at massive scale, Basecamp deliberately building HEY’s email service on “boring tech,” Amazon still using C++ and an in-house RPC behind S3 after 20 years — all of these point to the courage to accept being boring as a common trait of winners.

Teams that swallowed the lameness and leaned on standards keep humming five years later. The thing an architect should be proud of is not “how new the technology is” but “the product has been running for five years without stopping.” That, more than anything, is what people who keep doing this work for a long time keep in mind.

Self-check checklist

Confirm whether the standards are in place. Failing 3+ items is a red zone; revisit the relevant references.

Production decisions are recorded in ADRs.
All resources managed via Terraform / CDK (no manual setup).
RDS / DBs always in private subnets.
Auth delegated to IDaaS (Auth0 / Cognito / Clerk / etc.).
MFA mandatory for all users.
Logs output as structured JSON.
SLOs explicitly defined numerically (e.g., 99.9%).
Production changes always via CI/CD.
Runbooks documented for major incidents.
Backup-restore drills run regularly.
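The “failing 3+ items is a red zone” rule above is simple enough to encode, which makes it usable in a review template or a CI report. This is a minimal sketch; the item wording is copied from the checklist, while the `assess` function and its return shape are hypothetical.

```python
CHECKLIST = [
    "Production decisions are recorded in ADRs",
    "All resources managed via Terraform / CDK",
    "RDS / DBs always in private subnets",
    "Auth delegated to IDaaS",
    "MFA mandatory for all users",
    "Logs output as structured JSON",
    "SLOs explicitly defined numerically",
    "Production changes always via CI/CD",
    "Runbooks documented for major incidents",
    "Backup-restore drills run regularly",
]

RED_ZONE_FAILURES = 3  # threshold stated in the checklist introduction


def assess(answers):
    """`answers` maps checklist items to True/False; missing items count as failed."""
    failed = [item for item in CHECKLIST if not answers.get(item, False)]
    return {"failed": failed, "red_zone": len(failed) >= RED_ZONE_FAILURES}
```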

How to make the final call

The essence of best practices is adopting the combination most field teams have survived with as your opening move. Other locally superior options exist, but options where information density, talent availability, and operational track record all line up are typically narrowed to one or two at any given moment.

In the AI era this trend deepens. Frameworks AI writes fluently in produce multi-fold productivity boosts, widening the gap with niche options. Boring but standard directly translates into designs friendly to humans and AI alike.

Selection priority

Standard / majority — information density and hire-ability are long-term winners.
Managed / SaaS — wins on vulnerabilities and operational cost over DIY.
Type-safe / declarative — boosts both AI accuracy and maintainability.
Phased migration possible — avoid big-bang rewrites, leave room to swap.

“Choose the standard, and nobody can blame you in 3 years.” Eccentric selections, even when successful, become tribal knowledge; when they fail, they isolate you.

Summary

This article covered the best-practice catalog end-to-end — domain-by-domain defaults, boring tech, the AI-era standard stack, and the design posture for systems that hum quietly for five years.

Lean on the standard, delegate to managed, fasten with types, migrate in phases. That is the realistic answer for best practices in 2026.

The next article covers the “major incident catalog” — Knight Capital, Equifax, SolarWinds, CrowdStrike, and other cases where the industry paid hundreds of millions of dollars. A practical reference for learning from those bills.

I hope you’ll read the next article as well.