About this article
This article is the second installment of the “Appendix” category in the Architecture Crash Course for the Generative-AI Era series, covering the best-practice catalog.
Where the anti-pattern catalog was a reverse lookup of “landmines you must not step on”, this article is the forward-lookup catalog of “when in doubt, start here.” Each domain gets a one-page distillation of the boring but reliable standard stack. Use it as the skeleton for new projects, the final check before reviews, or the underlay for explaining choices to other teams.
```mermaid
flowchart TB
NEW([New project<br/>= when in doubt, lean on standards])
A[Architecture overall<br/>YAGNI / Modular monolith]
I[Infra<br/>Managed / IaC / Single cloud]
D[Data<br/>PostgreSQL / 3NF normalization]
APP[App<br/>SOLID / Small classes]
F[Frontend<br/>TypeScript+Next.js+Tailwind]
S[Security<br/>IDaaS / Passkey / Secrets Manager]
O[Monitoring & ops<br/>SLO / Datadog / On-call]
P[Process<br/>GitHub Flow / Squash / Small PRs]
AI[AI era<br/>Standard FW / Type safety / Rich training data]
NEW --> A
NEW --> I
NEW --> D
NEW --> APP
NEW --> F
NEW --> S
NEW --> O
NEW --> P
NEW --> AI
classDef new fill:#fef3c7,stroke:#d97706,stroke-width:2px;
classDef good fill:#dcfce7,stroke:#16a34a;
class NEW new;
class A,I,D,APP,F,S,O,P,AI good;
```
Architecture-wide standards
Before any specific technology choice comes your posture: the design principles below are widely effective across the industry. None of them are flashy, but following them alone keeps you off the "burning project" list.
| Practice | Content | Rationale |
|---|---|---|
| YAGNI (only what you need now) | Don’t build layers / abstractions of uncertain use | Unused code is the prime suspect for technical debt |
| Choose Boring Technology | Prefer options with 2-3+ years of track record | Information density and adoption suppress hallucinations |
| Leave reasoning in ADRs | One-pager per decision | Your biggest readers are your future self and your successor |
| Standard libraries / SaaS first | Avoid reinventing the wheel | Custom implementation breeds vulnerabilities and maintenance cost |
| Always measure before choosing | “Feels faster” is not evidence | Perceived and measured performance routinely differ by ~30% |
A documented technical compromise is stronger in long-term operations than an undocumented technically correct answer.
Infrastructure / deployment standards
The cloud / runtime defaults that work for startups and mid-sized teams alike 90% of the time. Stretching to K8s or multi-cloud only becomes necessary when revenue and team size grow substantially.
| Practice | Content | Phase |
|---|---|---|
| Lean on a single cloud | One of AWS / GCP / Azure | All phases (up to ~$100M revenue) |
| ECS Fargate / Cloud Run | Standard for container ops | MVP-to-mid before reaching for K8s |
| Manage all resources via Terraform / CDK | Ban manual setup completely | From engineer #1 |
| RDS in private subnet | Never put DBs on public networks | No exceptions |
| 2 AZ, RTO 1h / RPO 15min | Minimum availability bar | Minimum target for business systems |
Phase-by-phase in practice: MVP runs on ECS Fargate single-AZ, RDS t4g.small, ~$30/month. Growth phase (DAU 100k+) adds 2 AZ + Auto Scaling + CloudFront. Enterprise (internal business, regulated) layers on Multi-AZ + VPC endpoints + AWS Control Tower.
The default play is single cloud + managed services. Distributed and DIY are too early for 90% of teams.
Data standards
Because data, unlike applications, cannot be rebuilt, the first choice ripples for five years. The current industry default is RDB + strict schema definitions at the core, and AI-era assumptions don’t change that.
| Practice | Content | Why |
|---|---|---|
| PostgreSQL as first choice | Schema, JSONB, pgvector, extensibility — all present | Closes off the schemaless escape hatch |
| Separate OLTP and OLAP early | Don’t mix operational and analytical DBs | Analytical queries on production are dangerous |
| History tables or Event Sourcing | Don’t overwrite-update; keep history | Pays off for audit, AI, incident analysis |
| dbt tests / Great Expectations | Automate data quality checks | By the time you notice, inconsistencies are in the tens of thousands |
| Backups must run restore drills | Quarterly recovery rehearsal | Having backups doesn’t mean recovery works |
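The history-table row above can be sketched in a few lines. Here is a minimal illustration using SQLite; the table and column names are hypothetical, and a real setup would typically add triggers or lean on a framework's audit facilities:

```python
import sqlite3

# Illustrative schema: instead of UPDATE-ing a price in place,
# every change appends a new row with a timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_price_history (
        product_id INTEGER NOT NULL,
        price      INTEGER NOT NULL,
        valid_from TEXT    NOT NULL DEFAULT (datetime('now'))
    )
""")

def set_price(product_id: int, price: int) -> None:
    # Append-only: the full history survives for audit, AI, and incident analysis.
    conn.execute(
        "INSERT INTO product_price_history (product_id, price) VALUES (?, ?)",
        (product_id, price),
    )

def current_price(product_id: int) -> int:
    # The latest row is the current state.
    row = conn.execute(
        "SELECT price FROM product_price_history "
        "WHERE product_id = ? ORDER BY rowid DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    return row[0]

set_price(1, 1000)
set_price(1, 1200)       # a "price change" is just another INSERT
print(current_price(1))  # -> 1200
history = conn.execute(
    "SELECT price FROM product_price_history WHERE product_id = 1"
).fetchall()
print(len(history))      # -> 2: the old price is still there
```

The same shape scales up to proper event sourcing; the point is that nothing is ever destroyed by an update.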
Numeric gates: tables over 10M rows need partitioning; >10k RPS triggers streaming (Kafka / Kinesis); DWH selection from Redshift / BigQuery / Snowflake.
Data architecture failures are 5x heavier than application architecture failures. Compromises here echo for five years.
Application standards
Code design boils down to “≤300 lines per file, ≤50 lines per method, max 3 levels of nesting.” Sticking to those primitive numeric limits prevents most maintainability problems, and is more effective than DDD or Clean Architecture theatrics.
| Practice | Content | Threshold |
|---|---|---|
| Single Responsibility Principle splits | One class / file / method = one responsibility | ≤300 lines/file, ≤50 lines/method |
| Business logic in the app | Don’t push it into stored procedures | Preserves DB migration optionality |
| Don’t swallow errors | catch -> log + rethrow | Swallowing is fatal for incident detection |
| Domain-term naming | Avoid data, manager, util | Names that make intent readable |
| Constructor injection | Avoid all-static | Foundation for testability |
| Optional / Result types | Replace null with type-encoded states | Eliminates missed null checks |
Teams that quietly stick to numeric upper bounds tend to produce more in five years than teams flexing complex theory.
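Those numeric upper bounds are easy to enforce mechanically. A rough sketch of a checker for the ≤50-lines-per-method rule using the stdlib `ast` module (the constant and function names are illustrative; in practice a linter rule does this for you):

```python
import ast

MAX_FUNC_LINES = 50  # the "<=50 lines per method" threshold from the text

def oversized_functions(source: str) -> list[tuple[str, int]]:
    """Return (name, line_count) for every function exceeding the threshold."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNC_LINES:
                offenders.append((node.name, length))
    return offenders

# A 60-line dummy function trips the check; a short one does not.
code = "def ok():\n    pass\n\ndef too_long():\n" + "    x = 1\n" * 60
print(oversized_functions(code))  # -> [('too_long', 61)]
```

Wired into CI, a check like this turns the numeric rule from a team agreement into an enforced invariant.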
Frontend standards
The current industry standard is the three-piece set: meta-framework + utility CSS + managed authentication. Hand-rolled auth, in-house CSS systems, and raw React routing rarely produce returns proportional to their effort.
| Practice | Content | Examples |
|---|---|---|
| Use a meta-framework | Don’t hand-roll routing, SSR, build | Next.js / Astro / Remix |
| JWT in HttpOnly Cookie + BFF | localStorage storage banned | Hide tokens behind a BFF |
| Auth via SaaS | Clerk / Auth.js / Auth0 / Cognito | DIY auth is a vulnerability factory |
| Tailwind + shadcn/ui | Don’t build a custom CSS design system | Optimal for hiring and learning cost |
| Images via CDN transform + WebP / AVIF | Target LCP under 2.5s | Core Web Vitals work |
| JS bundle ≤ 170KB (after gzip) | Code-split for staged delivery | Initial render under 3s on 3G |
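The HttpOnly-cookie row can be sketched with Python's stdlib `http.cookies` as a stand-in for whatever BFF framework you actually use; the cookie name and token value are placeholders:

```python
from http.cookies import SimpleCookie

# A BFF endpoint would set the session token like this instead of
# handing the raw JWT to browser JavaScript.
cookie = SimpleCookie()
cookie["session"] = "eyJhbGciOi..."       # opaque to frontend JS
cookie["session"]["httponly"] = True      # not readable via document.cookie
cookie["session"]["secure"] = True        # HTTPS only
cookie["session"]["samesite"] = "Lax"     # basic CSRF mitigation
cookie["session"]["path"] = "/"

header = cookie.output(header="Set-Cookie:")
print(header)
# e.g. Set-Cookie: session=eyJhbGciOi...; HttpOnly; Path=/; SameSite=Lax; Secure
```

With the token confined to an HttpOnly cookie, an XSS bug can no longer exfiltrate it through `localStorage`.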
For SEO-critical sites: SSG / ISR. Dashboards: CSR + API. Content plus interactivity: SSR + RSC (React Server Components: server-rendered, shipping HTML and minimal JS to the client). That’s the current standard split.
“Raw React with hand-rolled routing” is, in 2026, a poor choice. Riding a meta-framework is the standard.
Security standards
Security is cheapest when standardized from day one; bolting it on later costs 100x more. The industry default is delegate, defense-in-depth, least privilege. Avoid in-house implementation thoroughly.
| Practice | Content | Required level |
|---|---|---|
| Auth delegated to IDaaS | Auth0 / Cognito / Clerk / Okta | Day 1 of new services |
| MFA mandatory for all users | TOTP / Passkey over SMS | No exception for admins |
| Standardize on Passkey | FIDO2-based passwordless | Standard for new services today |
| TLS 1.3 mandatory, 1.2 minimum | Disable 1.0 / 1.1 | Enforced on all traffic |
| Secrets in Vault / Secret Manager | Detect Git contamination via pre-commit hook | Day 1 of development |
| Zero Trust assumed | Don’t trust the inside of the VPN | Authenticate every request |
| PII masking in logs | Don’t log raw personal data | Data protection law compliance |
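The PII-masking row, as a minimal sketch; the regex patterns here are illustrative only, and real masking must cover the PII formats your own logs actually contain (addresses, national IDs, and so on):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{2,4}-\d{2,4}-\d{3,4}\b")

def mask_pii(line: str) -> str:
    # Replace matches with fixed tokens before the line reaches any log sink.
    line = EMAIL.sub("[EMAIL]", line)
    line = PHONE.sub("[PHONE]", line)
    return line

print(mask_pii("user taro@example.com called from 090-1234-5678"))
# -> user [EMAIL] called from [PHONE]
```

Running this as a logging filter (rather than at each call site) ensures no code path can forget to mask.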
“Don’t build it, delegate it” is the iron rule. The organizations in any given country where an in-house security implementation is justified can be counted on one hand.
Monitoring / operations standards
“Running” and “observable” are different things. Build the visualization stack from the start and run operations on numbers, not gut feel — the SRE-style standard configuration.
| Practice | Content | Target |
|---|---|---|
| Standardize structured logs (JSON) | Format that assumes search and aggregation | From day 1 |
| SLO / SLI / Error budget | Discuss availability numerically | 99.9% (~43 min of downtime per month) |
| Three pillars assembled | Metrics, logs, traces unified | Datadog / New Relic / Grafana Stack |
| On-call + PagerDuty | Reliably reach a human with alerts | Night shifts always rotate |
| Runbook maintenance | Document procedures for major incidents | Granular enough for new hires to handle nights |
| Mandatory postmortems | Record cause and countermeasure post-incident | Recurrence prevention, not blame |
| Production changes only via CI/CD | Ban production SSH | Reproducibility, audit-ability guaranteed |
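Structured JSON logging from the first row needs nothing beyond the stdlib. A minimal formatter sketch; the field names are a common convention, not a mandated schema:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Minimal structured-log formatter: one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created order_id=%s", 1234)
# emits e.g. {"ts": "...", "level": "INFO", "logger": "app", "msg": "order created order_id=1234"}
```

One JSON object per line is exactly the format Datadog, New Relic, and the Grafana stack ingest without custom parsing.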
“What’s not visualized may as well not exist” is the operations community’s shared assumption.
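The arithmetic behind the SLO row's "99.9% = ~43 min per month" figure is worth internalizing; a two-line sketch:

```python
def monthly_error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed downtime per month at a given availability SLO."""
    return (1 - slo) * days * 24 * 60

print(round(monthly_error_budget_minutes(0.999), 1))   # -> 43.2
print(round(monthly_error_budget_minutes(0.9999), 2))  # -> 4.32
```

Note the step from three nines to four nines shrinks the budget tenfold, which is why each extra nine costs disproportionately more to operate.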
Process / organization standards
The most common project failure mode is winning on tech and losing on process. The three things that determine long-term operational success: decision records, phased migration, buyer-side understanding.
| Practice | Content | Effect |
|---|---|---|
| Leave decisions in ADRs | One-pager per “why” | Prevents the “nobody can answer in 3 years” problem |
| Strangler Fig phased migration | Avoid big-bang rewrites | Avoids years and millions in burning projects |
| PoC -> implementation order | Always validate uncertain tech first | Improves estimation accuracy by 2x+ |
| Mix architects and implementers | Avoid isolated idealism | Designs aligned with the field |
| Use Conway’s Law in reverse | Reverse-engineer system boundaries from team structure | Align team and API boundaries |
| Buyer also understands architecture | Don’t outsource without understanding | Operations handoff stays possible |
A state where the buyer doesn’t understand the technology running their own business is a time bomb. This is industry common knowledge.
AI-era standards
When AI-driven development is assumed, “can AI fluently write or read this?” moves to the center of the selection axis. The four things that determine AI-compatibility: mainstream framework, type safety, declarative, CLI-operable.
| Practice | Content | AI-era effect |
|---|---|---|
| Lean on mainstream frameworks | Next.js / Django / Rails / FastAPI | Volume of training data drives productivity |
| Make types and schemas explicit | TypeScript / Pydantic / dbt models | Suppresses AI hallucinations |
| Tools operable via CLI / API / IaC | Avoid GUI-only tools | Lets AI take over operations |
| Build data catalogs and metadata | Document descriptions, tags, relationships | RAG and AI agents jump in accuracy |
| AI-generated code goes through normal review | No skip-the-checks deploys | Prevents vulnerabilities from reaching production |
| Design assuming pgvector / Pinecone | Don’t bolt vector search on later | Cuts cost of adding RAG features |
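The types-and-schemas row names Pydantic; as a dependency-free sketch of the same idea, here is a stdlib dataclass validated at the boundary. The model and field names are illustrative:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Order:
    order_id: int
    amount: int
    currency: str

def from_payload(payload: dict) -> Order:
    # Reject missing keys and wrong types at the boundary, so the rest
    # of the code (and an AI reading it) can trust the shape of the data.
    kwargs = {}
    for f in fields(Order):
        if f.name not in payload:
            raise ValueError(f"missing field: {f.name}")
        value = payload[f.name]
        # f.type is the actual class here (no `from __future__ import annotations`)
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
        kwargs[f.name] = value
    return Order(**kwargs)

print(from_payload({"order_id": 1, "amount": 4200, "currency": "JPY"}))
# -> Order(order_id=1, amount=4200, currency='JPY')
```

Pydantic gives you this plus coercion and JSON Schema export, but the principle is identical: one explicit, machine-checkable definition of every shape that crosses a boundary.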
The current design principle centers on “can AI fluently write and read this?” Choosing along that axis ends up producing human-friendly designs as a side effect.
Combinations that win with “boring tech”
Stack Overflow running .NET + SQL Server + Redis on 9 servers for 100M+ monthly hits is the canonical “winner who picked boring tech” story. Conversely, Uber’s post-2,200-microservice consolidation into DOMA (Domain-Oriented Microservice Architecture) is frequently cited as “the post-mortem on going to maximum decomposition.”
The current no-fail default combination is below. For new projects, pull from this stack and only swap the elements you really need to.
| Layer | Default |
|---|---|
| Cloud | AWS (Tokyo region) |
| Runtime | ECS Fargate |
| DB | PostgreSQL + pgvector |
| Backend | Python (FastAPI / Django) or Go |
| Frontend | Next.js + Tailwind + shadcn/ui |
| Auth | Clerk or Auth.js |
| Monitoring | Datadog or Grafana Cloud |
| IaC | Terraform |
| CI/CD | GitHub Actions |
Teams that can keep a boring stack running straight for five years are, in the end, the strongest teams.
Author’s note — picking “the standard” is a fight against feeling lame
The unexpectedly hard part of architecting, repeatedly mentioned in industry circles, is shaking off the three temptations of “latest,” “cool,” and “educational.” Conference-spotlight cases assume substantial scale, team, and budget; copying them with a 10-person team usually means the core feature work doesn’t happen.
Shopify still running Ruby on Rails as a monolith at massive scale, Basecamp deliberately building HEY’s email service on “boring tech,” Amazon still using C++ and an in-house RPC behind S3 after 20 years — all of these point to the courage to accept being boring as a common trait of winners.
Teams that swallowed the lameness and leaned on standards keep humming five years later. The thing an architect should be proud of is not “how new the technology is” but “the product has been running for five years without stopping.” That, more than anything, is what people who keep doing this work for a long time keep in mind.
Self-check checklist
Confirm whether the standards are in place. Failing 3+ items is a red zone; revisit the relevant references.
- Production decisions are recorded in ADRs.
- All resources managed via Terraform / CDK (no manual setup).
- RDS / DBs always in private subnets.
- Auth delegated to IDaaS (Auth0 / Cognito / Clerk / etc.).
- MFA mandatory for all users.
- Logs output as structured JSON.
- SLOs explicitly defined numerically (e.g., 99.9%).
- Production changes always via CI/CD.
- Runbooks documented for major incidents.
- Backup-restore drills run regularly.
How to make the final call
The essence of best practices is adopting the combination most field teams have survived with as your opening move. Other locally superior options exist, but options where information density, talent availability, and operational track record all line up are typically narrowed to one or two at any given moment.
In the AI era this trend deepens. Frameworks AI writes fluently in produce multi-fold productivity boosts, widening the gap with niche options. Boring but standard directly translates into designs friendly to humans and AI alike.
Selection priority
- Standard / majority — information density and hire-ability are long-term winners.
- Managed / SaaS — wins on vulnerabilities and operational cost over DIY.
- Type-safe / declarative — boosts both AI accuracy and maintainability.
- Phased migration possible — avoid big-bang rewrites, leave room to swap.
“Choose the standard, and nobody can blame you in 3 years.” Eccentric selections, even when successful, become tribal knowledge; when they fail, they isolate you.
Summary
This article covered the best-practice catalog end-to-end — domain-by-domain defaults, boring tech, the AI-era standard stack, and the design posture for systems that hum quietly for five years.
Lean on the standard, delegate to managed, fasten with types, migrate in phases. That is the realistic answer for best practices in 2026.
The next article covers the “major incident catalog” — Knight Capital, Equifax, SolarWinds, CrowdStrike, and other cases where the industry paid hundreds of millions of dollars. A practical reference for learning from those bills.
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (88/89)