About this article
This article is the second installment of the “Appendix” category in the Architecture Crash Course for the Generative-AI Era series, covering the best-practice catalog.
Where the anti-pattern catalog was a reverse lookup of “landmines you must not step on”, this article is the forward-lookup catalog of “when in doubt, start here.” Each domain gets a one-page distillation of the boring but reliable standard stack. Use it as the skeleton for new projects, the final check before reviews, or the foundation for explaining choices to other teams.
What are best practices, anyway?
Think of safe-driving basics. “Keep your distance,” “check your mirrors,” “obey the speed limit” — nothing flashy, but sticking to these alone prevents the vast majority of accidents. Advanced techniques only matter once the basics are down.
Best practices are the architecture version of safe-driving rules. They are the boring-but-reliable standard configurations that serve as each domain's “when in doubt, start here” default.
If you go your own way without knowing best practices, you reinvent wheels over and over, wasting time on problems others have already solved. The professional path is to nail down the standard first and then customize for your situation.
Why you need to learn best practices
They minimize “time spent wavering”
The biggest time sink in new project design is having too many options and not knowing which to pick. Best practices give you a “when in doubt, start here” starting point. You only need to reason about departures from the standard, so decision speed jumps dramatically.
They prevent reinventing the wheel
Without knowing the standard configurations predecessors arrived at through trial and error, you attack already-solved problems with a home-grown approach, burning time and budget. Custom implementations also breed vulnerabilities and maintenance costs, so riding the standard wherever possible is the rational move.
They create a baseline for explaining “why deviate”
Knowing best practices lets you articulate “the standard is X, but Y fits our context better because…” when you deliberately depart. Without that baseline, every choice becomes “just because”, and neither retrospective review nor improvement is possible.
Architecture-wide standards
Before any specific technology choice, adopt these design principles, which are effective across most of the industry. None are flashy, but following them alone keeps you off the “burning project” list.
| Practice | Content | Rationale |
|---|---|---|
| YAGNI (only what you need now) | Don’t build layers / abstractions of uncertain use | Unused code is the prime suspect for technical debt |
| Choose Boring Technology | Prefer options with 2-3+ years of track record | Abundant documentation and wide adoption reduce AI hallucinations |
| Leave reasoning in ADRs | One-pager per decision | Your largest readers are your future self and your successor |
| Standard libraries / SaaS first | Avoid reinventing the wheel | Custom implementation breeds vulnerabilities and maintenance cost |
| Always measure before choosing | “Feels faster” is not evidence | Perceived and measured performance routinely differ by ~30% |
A documented technical compromise is stronger in long-term operations than an undocumented technically correct answer.
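The “Always measure before choosing” row can be as cheap as a `timeit` run. Below is a minimal sketch, assuming two hypothetical candidate implementations of the same membership check; swap in the real contenders and realistic inputs before drawing conclusions.

```python
import timeit
from typing import Callable

# Two hypothetical candidates for the same job: membership checks.
ITEMS = list(range(10_000))
ITEM_SET = set(ITEMS)


def lookup_list(x: int) -> bool:
    return x in ITEMS  # linear scan


def lookup_set(x: int) -> bool:
    return x in ITEM_SET  # hash lookup


def measure(fn: Callable[[int], bool], arg: int, number: int = 1_000, repeat: int = 5) -> float:
    """Best-of-`repeat` time in seconds for `number` calls of fn(arg)."""
    # Taking the minimum filters out noise from other processes.
    return min(timeit.repeat(lambda: fn(arg), number=number, repeat=repeat))


if __name__ == "__main__":
    t_list = measure(lookup_list, 9_999)
    t_set = measure(lookup_set, 9_999)
    print(f"list: {t_list:.4f}s  set: {t_set:.4f}s  ratio: {t_list / t_set:.1f}x")
```

For services, the same principle applies with load tests and production metrics rather than a micro-benchmark.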
Infrastructure / deployment standards
These cloud and runtime defaults fit startups and mid-sized teams alike about 90% of the time. Stretching to K8s or multi-cloud only becomes necessary once revenue and team size grow substantially.
| Practice | Content | Phase |
|---|---|---|
| Lean on a single cloud | One of AWS / GCP / Azure | All phases (up to ~$100M revenue) |
| ECS Fargate / Cloud Run | Standard for container ops | MVP through mid-scale, before reaching for K8s |
| Manage all resources via Terraform / CDK | Ban manual setup completely | From engineer #1 |
| RDS in private subnet | Never put DBs on public networks | No exceptions |
| 2 AZ, RTO 1h / RPO 15min | Minimum availability bar | Minimum target for business systems |
Phase by phase in practice: an MVP runs on single-AZ ECS Fargate with RDS db.t4g.small for roughly $30/month. The growth phase (100k+ DAU) adds 2 AZ, Auto Scaling, and CloudFront. Enterprise (internal business systems, regulated industries) layers on Multi-AZ, VPC endpoints, and AWS Control Tower.
The default play is single cloud + managed services. Distributed and DIY are too early for 90% of teams.
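The “RTO 1h / RPO 15min” bar in the table is easy to state and easy to miss. Here is a minimal sketch that checks a backup setup against it, assuming a hypothetical `RecoveryPlan` record whose numbers come from your own configuration and restore drills. It shows, for example, that a nightly snapshot alone cannot meet a 15-minute RPO however fast the restore is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPlan:
    """Hypothetical description of a backup setup; all values in minutes."""
    backup_interval_min: float     # how often a recoverable copy (snapshot or log) is taken
    restore_duration_min: float    # measured in a restore drill, not guessed
    failover_detection_min: float  # time until recovery actually starts


def meets_targets(plan: RecoveryPlan, rpo_min: float = 15, rto_min: float = 60) -> dict[str, bool]:
    # Worst-case data loss: everything written since the last recoverable copy.
    worst_case_loss = plan.backup_interval_min
    # Worst-case downtime: time to notice plus time to restore.
    worst_case_downtime = plan.failover_detection_min + plan.restore_duration_min
    return {"rpo": worst_case_loss <= rpo_min, "rto": worst_case_downtime <= rto_min}


nightly_only = RecoveryPlan(backup_interval_min=1440, restore_duration_min=30, failover_detection_min=10)
log_shipping = RecoveryPlan(backup_interval_min=5, restore_duration_min=40, failover_detection_min=10)
```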
Data standards
Because data, unlike applications, cannot be rebuilt, the first choice ripples for five years. The current industry default is RDB + strict schema definitions at the core, and AI-era assumptions don’t change that.
| Practice | Content | Why |
|---|---|---|
| PostgreSQL as first choice | Schema, JSONB, pgvector, and extensibility in one engine | JSONB removes the need for a separate schemaless store |
| Separate OLTP and OLAP early | Don’t mix operational and analytical DBs | Analytical queries on production are dangerous |
| History tables or Event Sourcing | Don’t overwrite-update; keep history | Pays off for audit, AI, incident analysis |
| dbt tests / Great Expectations | Automate data quality checks | By the time you notice, inconsistencies are in the tens of thousands |
| Backups must run restore drills | Quarterly recovery rehearsal | Having backups doesn’t mean recovery works |
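The “History tables” row can be sketched as an append-only table: every change is a new row and the current value is simply the latest one. This minimal example uses SQLite only to stay self-contained (the table above recommends PostgreSQL for real systems), and the `price_history` table and helper names are hypothetical. It is a plain history table, not full Event Sourcing.

```python
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Append-only: rows are only ever INSERTed, never UPDATEd in place.
    conn.execute(
        """CREATE TABLE price_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               product_id TEXT NOT NULL,
               price INTEGER NOT NULL,
               changed_by TEXT NOT NULL,
               changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def set_price(conn: sqlite3.Connection, product_id: str, price: int, changed_by: str) -> None:
    conn.execute(
        "INSERT INTO price_history (product_id, price, changed_by) VALUES (?, ?, ?)",
        (product_id, price, changed_by),
    )


def current_price(conn: sqlite3.Connection, product_id: str) -> Optional[int]:
    # The "current" value is the newest row; older rows remain for audits and analysis.
    row = conn.execute(
        "SELECT price FROM price_history WHERE product_id = ? ORDER BY id DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, product_id: str) -> list[tuple[int, str]]:
    return conn.execute(
        "SELECT price, changed_by FROM price_history WHERE product_id = ? ORDER BY id",
        (product_id,),
    ).fetchall()
```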
Numeric gates: tables over 10M rows need partitioning; above 10k RPS, move to streaming (Kafka / Kinesis); choose the DWH from Redshift / BigQuery / Snowflake.
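The numeric gates lend themselves to a periodic automated check. Below is a minimal sketch, assuming a hypothetical `TableStats` record filled from your own metrics (in PostgreSQL, for instance, row estimates are available from `pg_class.reltuples`). The thresholds are the ones quoted above and should be treated as starting points.

```python
from dataclasses import dataclass

# Thresholds from the numeric gates above.
PARTITION_ROW_THRESHOLD = 10_000_000
STREAMING_RPS_THRESHOLD = 10_000


@dataclass(frozen=True)
class TableStats:
    """Hypothetical snapshot of one table's size and request rate."""
    name: str
    row_count: int
    rps: float


def gate_actions(stats: TableStats) -> list[str]:
    """Return the gate-triggered actions for one table (empty list if none)."""
    actions = []
    if stats.row_count > PARTITION_ROW_THRESHOLD:
        actions.append(f"{stats.name}: partition the table")
    if stats.rps > STREAMING_RPS_THRESHOLD:
        actions.append(f"{stats.name}: move ingestion to streaming (Kafka / Kinesis)")
    return actions
```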
Data architecture failures are about 5x costlier to recover from than application architecture failures. Compromises here echo for five years.
Application standards
Code design boils down to “≤300 lines per file, ≤50 lines per method, max 3 levels of nesting.” Sticking to those primitive numerics prevents most maintainability problems. More effective than DDD or Clean Architecture theatrics.
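These three numeric limits are easy to enforce mechanically. Here is a minimal sketch using Python's standard `ast` module with the thresholds from the text. In a real project an existing linter rule would usually do this job, so treat this as an illustration of what such a rule measures.

```python
import ast

MAX_FILE_LINES = 300
MAX_FUNC_LINES = 50
MAX_NESTING = 3

# Statements that open a new nesting level.
# Note: `elif` is parsed as an `If` nested in `orelse`, so this sketch over-counts elif chains.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try, ast.AsyncFor, ast.AsyncWith)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Deepest nesting level of block statements below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def check_source(source: str) -> list[str]:
    """Report violations of the file-length, function-length, and nesting limits."""
    problems = []
    if len(source.splitlines()) > MAX_FILE_LINES:
        problems.append(f"file has more than {MAX_FILE_LINES} lines")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNC_LINES:
                problems.append(f"{node.name}: {length} lines (max {MAX_FUNC_LINES})")
            depth = max_nesting(node)
            if depth > MAX_NESTING:
                problems.append(f"{node.name}: nesting depth {depth} (max {MAX_NESTING})")
    return problems
```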
| Practice | Content | Threshold |
|---|---|---|
| Single Responsibility Principle splits | One class / file / method = one responsibility | ≤300 lines/file, ≤50 lines/method |
| Business logic in the app | Don’t push it into stored procedures | Preserves DB migration optionality |
| Don’t swallow errors | catch -> log + rethrow | Swallowing is fatal for incident detection |
| Domain-term naming | Avoid data, manager, util | Names that make intent readable |
| Constructor injection | Avoid all-static | Foundation for testability |
| Optional / Result types | Replace null with type-encoded states | Eliminates missed null checks |
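Here is a minimal Python sketch of two rows from the table, “Don’t swallow errors” and “Optional / Result types.” `PaymentGatewayError`, `charge`, and the discount table are hypothetical. Python has no built-in Result type, so `Optional` stands in for the idea of encoding “not found” in the return type.

```python
import logging
from typing import Optional

logger = logging.getLogger("billing")


class PaymentGatewayError(Exception):
    """Hypothetical error raised by a payment client."""


def charge(amount: int) -> str:
    # Stand-in for a real gateway call; fails for non-positive amounts.
    if amount <= 0:
        raise PaymentGatewayError(f"invalid amount: {amount}")
    return f"txn-{amount}"


def charge_order(order_id: str, amount: int) -> str:
    try:
        return charge(amount)
    except PaymentGatewayError:
        # Log with context and traceback, then re-raise so the caller still sees the failure.
        logger.exception("charge failed for order %s", order_id)
        raise


def find_discount(code: str, table: dict[str, int]) -> Optional[int]:
    # "Not found" is part of the return type, so callers must handle None explicitly.
    return table.get(code)
```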
Teams that quietly stick to numeric upper bounds tend to produce more in five years than teams flexing complex theory.
Frontend standards
The current industry standard is the three-piece set: meta-framework + utility CSS + managed authentication. Hand-rolled auth, in-house CSS systems, and raw React routing rarely produce returns proportional to their effort.
| Practice | Content | Examples |
|---|---|---|
| Use a meta-framework | Don’t hand-roll routing, SSR, build | Next.js / Astro / Remix |
| JWT in HttpOnly Cookie + BFF | localStorage storage banned | Hide tokens behind a BFF |
| Auth via SaaS or a proven library | Clerk / Auth.js / Auth0 / Cognito | DIY auth is a vulnerability factory |
| Tailwind + shadcn/ui | Don’t build a custom CSS design system | Optimal for hiring and learning cost |
| Images via CDN transform + WebP / AVIF | Target LCP under 2.5s | Core Web Vitals work |
| JS bundle ≤ 170KB (after gzip) | Code-split for staged delivery | Initial render under 3s on 3G |
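For the “JWT in HttpOnly Cookie + BFF” row, this is roughly what the BFF sends so the token never touches `localStorage`. It is a minimal sketch using the standard `http.cookies` module. The cookie name `session` is an assumption, and real frameworks usually set these attributes through their own response APIs.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_token: str, max_age_s: int = 3600) -> str:
    """Build a Set-Cookie header value for a BFF-issued session token."""
    cookie = SimpleCookie()
    cookie["session"] = session_token
    morsel = cookie["session"]
    morsel["httponly"] = True   # not readable from JavaScript, so XSS cannot read the token
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # basic CSRF mitigation
    morsel["path"] = "/"
    morsel["max-age"] = max_age_s
    return morsel.OutputString()
```

The browser attaches the cookie automatically on requests to the BFF, which then calls downstream APIs with the token on the server side.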
For SEO-critical pages: SSG / ISR. Dashboards: CSR + API. Content + interactivity: SSR + RSC. That is the current norm.
“Raw React with hand-rolled routing” is, in 2026, a poor choice. Riding a meta-framework is the standard.
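The bundle-size row in the table above can be enforced as a CI gate. Here is a minimal sketch, assuming “170KB” means 170,000 bytes after gzip and that you pass in the JS files loaded on initial render. Compression level 9 is an assumption too; your server or CDN may use another level or Brotli.

```python
import gzip
from pathlib import Path

BUDGET_BYTES = 170 * 1000  # assumption: "170KB" read as 170,000 bytes after gzip


def gzipped_size(data: bytes) -> int:
    # Level 9 is an assumption; match whatever your server or CDN actually uses.
    return len(gzip.compress(data, compresslevel=9))


def check_bundle(paths: list[Path], budget: int = BUDGET_BYTES) -> tuple[int, bool]:
    """Total gzipped size of the initial-load JS files, and whether it fits the budget."""
    total = sum(gzipped_size(p.read_bytes()) for p in paths)
    return total, total <= budget
```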
Security standards
Security is cheapest when standardized from day one; bolting it on later costs 100x more. The industry default is delegation, defense in depth, and least privilege. Avoid in-house implementations wherever you can.
| Practice | Content | Required level |
|---|---|---|
| Auth delegated to IDaaS | Auth0 / Cognito / Clerk / Okta | Day 1 of new services |
| MFA mandatory for all users | Prefer TOTP / Passkey over SMS | No exception for admins |
| Standardize on Passkey | FIDO2-based passwordless | Standard for new services today |
| TLS 1.3 preferred, 1.2 minimum | Disable 1.0 / 1.1 | Enforced on all traffic |
| Secrets in Vault / Secret Manager | Detect Git contamination via pre-commit hook | Day 1 of development |
| Zero Trust assumed | Don’t trust the inside of the VPN | Authenticate every request |
| PII masking in logs | Don’t log raw personal data | Data protection law compliance |
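The “PII masking in logs” row can be implemented as a logging filter that rewrites each record before it is emitted. This is a minimal sketch with the standard `logging` and `re` modules. Both patterns are illustrative assumptions (e-mail addresses, and phone numbers written as 10 to 11 digits with no separators) and would need tuning for real data.

```python
import logging
import re

# Illustrative patterns only; real PII detection needs rules tuned to your data.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{10,11}\b")


class PiiMaskingFilter(logging.Filter):
    """Masks e-mail addresses and phone numbers before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge %-style args into the message first
        message = EMAIL_RE.sub("[EMAIL]", message)
        message = PHONE_RE.sub("[PHONE]", message)
        record.msg, record.args = message, None
        return True  # keep the record, just masked
```

Attaching the filter to a handler (`handler.addFilter(PiiMaskingFilter())`) applies it to every logger routed through that handler.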
“Don’t build it, delegate it” is the iron rule. The number of organizations in any country where in-house security implementation is justified can be counted on one hand.
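On the client side, the “TLS 1.2 minimum” rule from the security table is a single setting in Python's standard `ssl` module. This minimal sketch covers outbound connections only; server-side enforcement belongs in your load balancer or CDN configuration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname verification enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # TLS 1.0 / 1.1 handshakes are rejected
    # TLS 1.3 is negotiated automatically when both sides support it.
    return ctx
```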
Monitoring / operations standards
“Running” and “observable” are different things. Build the visualization stack from the start and run operations on numbers, not gut feel — the SRE-style standard configuration.
| Practice | Content | Target |
|---|---|---|
| Standardize structured logs (JSON) | Format that assumes search and aggregation | From day 1 |
| SLO / SLI / Error budget | Discuss availability numerically | 99.9% (43 min downtime per month) |
| Three pillars assembled | Metrics, logs, traces unified | Datadog / New Relic / Grafana Stack |
| On-call + PagerDuty | Reliably reach a human with alerts | Night shifts always rotate |
| Runbook maintenance | Document procedures for major incidents | Granular enough for new hires to handle nights |
| Mandatory postmortems | Record cause and countermeasure post-incident | Recurrence prevention, not blame |
| Production changes only via CI/CD | Ban production SSH | Reproducibility, audit-ability guaranteed |
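For the “Standardize structured logs (JSON)” row, here is a minimal sketch of a `logging.Formatter` that emits one JSON object per line. Passing extra fields through `extra={"fields": {...}}` is a hypothetical convention chosen for this example, not a standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Renders each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)
```

Usage: attach it with `handler.setFormatter(JsonFormatter())`, then log with `logger.warning("slow request", extra={"fields": {"route": "/orders", "latency_ms": 812}})` so every field stays searchable.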
“What’s not visualized may as well not exist” is the operations community’s shared assumption.
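The 43-minute figure in the SLO row is plain arithmetic over a 30-day month: 30 × 24 × 60 = 43,200 minutes, and 0.1% of that is 43.2 minutes. Below is a minimal sketch of the error-budget calculation; the function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo)


def budget_remaining(slo: float, downtime_so_far_min: float, window_days: int = 30) -> float:
    """Minutes of downtime still allowed in the current window (negative means overspent)."""
    return error_budget_minutes(slo, window_days) - downtime_so_far_min
```

At 99.99% the same month allows only about 4.3 minutes, which is why each extra nine needs a qualitatively different operating model.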
Process / organization standards
The most common project failure mode is winning on tech and losing on process. The three things that determine long-term operational success: decision records, phased migration, buyer-side understanding.
| Practice | Content | Effect |
|---|---|---|
| Leave decisions in ADRs | One-pager per “why” | Prevents the “nobody can answer in 3 years” problem |
| Strangler Fig phased migration | Avoid big-bang rewrites | Avoids years and millions in burning projects |
| PoC -> implementation order | Always validate uncertain tech first | Estimation accuracy can improve 2x or more |
| Mix architects and implementers | Avoid isolated idealism | Designs aligned with the field |
| Use Conway’s Law in reverse | Shape team structure around the system boundaries you want | Align team and API boundaries |
| Buyer also understands architecture | Don’t outsource without understanding | Operations handoff stays possible |
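The core of the Strangler Fig row is a routing facade. Route prefixes move to the new system one at a time, and each move can be reversed. In production the facade is usually a reverse proxy or API gateway; this in-process sketch with hypothetical handlers only illustrates the routing idea.

```python
from typing import Callable

Handler = Callable[[str], str]


def legacy_app(path: str) -> str:
    return f"legacy handled {path}"


def new_service(path: str) -> str:
    return f"new handled {path}"


class StranglerRouter:
    """Facade: migrated route prefixes go to the new system, everything else to legacy."""

    def __init__(self, legacy: Handler, modern: Handler) -> None:
        self.legacy = legacy
        self.modern = modern
        self.migrated_prefixes: set[str] = set()

    def migrate(self, prefix: str) -> None:
        # Each migration step is one small, reversible change.
        self.migrated_prefixes.add(prefix)

    def rollback(self, prefix: str) -> None:
        self.migrated_prefixes.discard(prefix)

    def handle(self, path: str) -> str:
        if any(path.startswith(p) for p in self.migrated_prefixes):
            return self.modern(path)
        return self.legacy(path)
```

Because rollback is just removing a prefix, a bad migration step costs minutes rather than the months a failed big-bang rewrite can cost.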
A state where “the buyer doesn’t understand what’s running their tech” is a future explosion. This is industry common knowledge.
AI-era standards
When AI-driven development is assumed, “can AI fluently write or read this?” moves to the center of the selection axis. The four things that determine AI-compatibility: mainstream framework, type safety, declarative, CLI-operable.
| Practice | Content | AI-era effect |
|---|---|---|
| Lean on mainstream frameworks | Next.js / Django / Rails / FastAPI | Volume of training data drives productivity |
| Make types and schemas explicit | TypeScript / Pydantic / dbt models | Suppresses AI hallucinations |
| Tools operable via CLI / API / IaC | Avoid GUI-only tools | Lets AI take over operations |
| Build data catalogs and metadata | Document descriptions, tags, relationships | RAG and AI agent accuracy improves sharply |
| AI-generated code goes through normal review | No skip-the-checks deploys | Prevents vulnerabilities from reaching production |
| Design assuming pgvector / Pinecone | Don’t bolt vector search on later | Cuts cost of adding RAG features |
The current design principle centers on “can AI fluently write and read this?” Choosing along that axis ends up producing human-friendly designs as a side effect.
Combinations that win with “boring tech”
Stack Overflow serving 100M+ monthly hits with .NET + SQL Server + Redis on a famously small fleet (nine web servers) is the canonical “winner who picked boring tech” story. Conversely, Uber’s consolidation of some 2,200 microservices into DOMA (Domain-Oriented Microservice Architecture) is frequently cited as “the post-mortem on going to maximum decomposition.”
The current no-fail default combination is below. For new projects, pull from this stack and only swap the elements you really need to.
| Layer | Default |
|---|---|
| Cloud | AWS (Tokyo region) |
| Runtime | ECS Fargate |
| DB | PostgreSQL + pgvector |
| Backend | Python (FastAPI / Django) or Go |
| Frontend | Next.js + Tailwind + shadcn/ui |
| Auth | Clerk or Auth.js |
| Monitoring | Datadog or Grafana Cloud |
| IaC | Terraform |
| CI/CD | GitHub Actions |
Teams that can keep a boring stack running straight for five years are, in the end, the strongest teams.
| Temptation | Why it fails |
|---|---|
| Copying “conference-spotlight cases” with a 10-person team | Scale and budget differ from the premise; core feature work stops getting done |
| “Build it ourselves to learn” in production | Vulnerability and maintenance-cost breeding ground; delegate to SaaS is the rule |
AI decision axes
| AI-favorable | AI-unfavorable |
|---|---|
| Mainstream frameworks (Next.js / Django / Rails) | Niche frameworks, custom languages |
| Explicit types and schemas (TypeScript / Pydantic) | No types, schemaless |
| Managed + IaC (Terraform / CDK) | Manual setup, GUI operations |
| Design assuming pgvector / Pinecone | No vector DB support |
- Standard / majority — information density and hire-ability are long-term winners.
- Managed / SaaS — wins on vulnerabilities and operational cost over DIY.
- Type-safe / declarative — boosts both AI accuracy and maintainability.
- Phased migration possible — avoid big-bang rewrites, leave room to swap.
Standard stacks auto-maximize AI benefits
The principle of “choose standard / majority” existed before, but in the AI era its effect has doubled, because AI’s generation accuracy rises with the volume of training data available for a technology.
Next.js, Django, Rails, Spring Boot — these mainstream frameworks have massive amounts of publicly available code on GitHub from developers worldwide, giving AI a higher probability of generating accurate code. Simply following best practices faithfully lets you enjoy AI benefits at zero additional cost.
Conversely, if you adopt a niche framework out of technical curiosity, you end up developing with almost no AI assistance. Five years ago the main risk was “difficulty hiring talent”; in 2026, the cost of “missing out on AI benefits” has been added on top.
Type-safe + declarative is the AI-era best practice core
TypeScript, Pydantic, Zod, Prisma, Terraform — what these share is a “declaratively define types and schemas” design philosophy. This design philosophy has become the core of AI-era best practices.
With declarative definitions, AI can infer “what correct output looks like” from type information, which reduces type errors in generated code. In code without type definitions (raw Python dicts, plain JavaScript, etc.), humans must verify the type safety of AI-returned code every time, which narrows the productivity gains.
Adding types to an existing project after the fact carries significant cost, so choosing type-safe technologies from the start on new projects has become a long-term best practice.
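To make “declaratively define types and schemas” concrete, here is a minimal sketch using only the standard library's `dataclasses` and `enum`, so it runs without dependencies. In a real project, Pydantic (or Zod on the TypeScript side) would fill this role and derive the validation from the annotations. The `SignupRequest` schema is hypothetical.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Plan(str, Enum):
    FREE = "free"
    PRO = "pro"


@dataclass(frozen=True)
class SignupRequest:
    """Declarative schema: the field names and types are the contract."""
    email: str
    plan: Plan
    seats: int = 1

    def __post_init__(self) -> None:
        # Minimal runtime checks that a schema library would normally handle for you.
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        if not isinstance(self.plan, Plan):
            raise TypeError("plan must be a Plan")
        if self.seats < 1:
            raise ValueError("seats must be >= 1")


def parse_signup(raw: dict) -> SignupRequest:
    # Unknown keys are rejected instead of silently ignored.
    allowed = {f.name for f in fields(SignupRequest)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return SignupRequest(email=raw["email"], plan=Plan(raw["plan"]), seats=int(raw.get("seats", 1)))
```

Whether the reader is a human or an AI, the class definition alone says what a valid request looks like, and invalid input fails at the boundary instead of deep inside business logic.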
Author’s note — picking “the standard” is a fight against feeling lame
The unexpectedly hard part of architecting, repeatedly mentioned in industry circles, is shaking off the three temptations of “latest,” “cool,” “educational.” Conference-spotlight cases assume “substantial scale, substantial team, substantial budget” — copying them with a 10-person team usually means the core feature work doesn’t happen.
Shopify still running Ruby on Rails as a monolith at massive scale, Basecamp deliberately building HEY’s email service on “boring tech,” Amazon still using C++ and an in-house RPC behind S3 after 20 years — all of these point to the courage to accept being boring as a common trait of winners.
Teams that swallowed the lameness and leaned on standards keep humming five years later. The thing an architect should be proud of is not “how new the technology is” but “the product has been running for five years without stopping.” That, more than anything, is what people who keep doing this work for a long time keep in mind.
Self-check checklist
Confirm whether the standards are in place. Failing 3+ items is a red zone; revisit the relevant references.
- Production decisions are recorded in ADRs.
- All resources managed via Terraform / CDK (no manual setup).
- RDS / DBs always in private subnets.
- Auth delegated to IDaaS (Auth0 / Cognito / Clerk / etc.).
- MFA mandatory for all users.
- Logs output as structured JSON.
- SLOs explicitly defined numerically (e.g., 99.9%).
- Production changes always via CI/CD.
- Runbooks documented for major incidents.
- Backup-restore drills run regularly.
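If you run this check across several teams, the red-zone rule is simple to automate. Below is a minimal sketch with the checklist items copied from the list above and the “3+ failures” threshold from the text. Treating unanswered items as failures is an assumption of this example.

```python
CHECKLIST = [
    "Production decisions are recorded in ADRs",
    "All resources managed via Terraform / CDK",
    "RDS / DBs always in private subnets",
    "Auth delegated to IDaaS",
    "MFA mandatory for all users",
    "Logs output as structured JSON",
    "SLOs explicitly defined numerically",
    "Production changes always via CI/CD",
    "Runbooks documented for major incidents",
    "Backup-restore drills run regularly",
]

RED_ZONE_FAILURES = 3


def assess(answers: dict[str, bool]) -> tuple[str, list[str]]:
    """Return ("red" or "ok", failed items) for a team's checklist answers."""
    # Assumption: unanswered items count as failures ("we don't know" is not a pass).
    failed = [item for item in CHECKLIST if not answers.get(item, False)]
    status = "red" if len(failed) >= RED_ZONE_FAILURES else "ok"
    return status, failed
```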
Summary
This article covered the best-practice catalog end-to-end — domain-by-domain defaults, boring tech, the AI-era standard stack, and the design posture for systems that hum quietly for five years.
Lean on the standard, delegate to managed, fasten with types, migrate in phases. That is the realistic answer for best practices in 2026.
The next article covers the “major incident catalog” — Knight Capital, Equifax, SolarWinds, CrowdStrike, and other cases where the industry paid hundreds of millions of dollars. A practical reference for learning from those bills.
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (88/89)