About this article
This article is the 11th and final installment in the System Architecture category of the Architecture Crash Course for the Generative-AI Era series, covering cloud cost management (FinOps).
With cloud's pay-as-you-go model, a single decision can make the difference between a $1k and a $100k monthly bill. This article covers what makes bills balloon, the three phases of FinOps (the continuous practice of visualizing and optimizing cloud cost), pricing models, tag strategy, management levels by scale, and the new cost lenses of the LLM era.
More articles in this category
The day your finger trembles opening the invoice
FinOps is the umbrella term for continuously visualizing and optimizing cloud usage cost. It is not mere cost-cutting but investment optimization, framed as "pay where it matters, cut the waste." Cheap does not mean you win; whether spend matches business value is what FinOps keeps asking.
Bills arriving at 3-10x the estimate are not unusual: excessive logging, instances left running, the "behind-the-scenes calls" of managed services. Most of these are preventable at design time; the usual root cause is the simple omission of not reading the pricing model during design.
Cost is a non-functional requirement that hits monthly once operations start. Defending at design time is the rule.
Why think about it first
1. 3-10x billing accidents are common
Too much logging, idle instances, managed-service "behind-the-scenes calls": bills coming in at 3-10x the estimate are a recurring event. LLM APIs in particular scale to thousands of dollars a month quickly; forgetting prompt caching alone can spike the bill.
2. Retroactive fixes are expensive
Reorganizing an architecture to optimize cost is close to a One-way Door. Skip it at initial design and you will regret it: "we'll think about it after launch" almost always produces months of wasted spend.
3. Pair design with approval authority
Cost is an executive-decision domain. Up to $1k a month is a field decision; $10k+ needs executive approval. This "spending governance" must be agreed at design time; leaving it vague at launch produces month-end fights between executives and the field.
"Common pitfalls" that explode bills
The typical causes of bill explosions repeat the same patterns surprisingly often, and they share one trait: all of them could have been caught at design time.
| Pitfall | What happens |
|---|---|
| Forgetting to stop dev environments | Charged 24h including nights and weekends — hundreds of dollars/month |
| Excessive log output | CloudWatch Logs cost exceeds production-app cost |
| Communication via NAT Gateway | Data transfer costs balloon unexpectedly |
| Over-spec managed DB | "Just in case, go bigger" can cost 10x |
| S3 old objects left | Without deletion policy, charges accumulate |
| Data egress | Cross-region / cross-AZ — hard to see |
Egress (traffic leaving the cloud) is the most overlooked billing item.
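Since egress is so easy to overlook, it helps to put numbers on it early. The sketch below is a minimal back-of-envelope estimator; the per-GB rates are hypothetical placeholders, not any provider's actual list prices, so substitute the figures from your provider's pricing page.

```python
# Rough data-transfer cost estimator. The rates below are ILLUSTRATIVE
# placeholders; check your cloud provider's current pricing page.
PRICE_PER_GB = {
    "internet_egress": 0.09,   # typical cloud -> internet rate, USD/GB
    "cross_region": 0.02,      # inter-region transfer
    "cross_az": 0.01,          # inter-AZ transfer (often billed both directions)
    "nat_gateway": 0.045,      # per-GB processing, on top of the hourly charge
}

def transfer_cost(gb_per_month: float, kind: str) -> float:
    """Monthly USD for one transfer type."""
    return gb_per_month * PRICE_PER_GB[kind]

def monthly_transfer_bill(usage: dict[str, float]) -> float:
    """Sum the monthly bill for a {kind: GB} usage profile."""
    return sum(transfer_cost(gb, kind) for kind, gb in usage.items())

# e.g. 5 TB to the internet plus 2 TB through a NAT gateway per month:
bill = monthly_transfer_bill({"internet_egress": 5000, "nat_gateway": 2000})
```

Even with placeholder rates, running your expected traffic profile through a model like this at design time surfaces the "hard to see" line items before the first invoice does.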
FinOps in 3 phases
FinOps cycles through "Inform (visualize) -> Optimize -> Operate (operationalize)." Many companies skip even the "visualize" step and jump straight to savings tactics that don't stick.
```mermaid
flowchart LR
    INFORM["Inform<br/>(visualize)<br/>Who, what, how much"]
    OPT["Optimize<br/>(optimize)<br/>Delete / size / discounts"]
    OPER["Operate<br/>(operationalize)<br/>Budget governance, continuous improvement"]
    INFORM --> OPT --> OPER
    OPER -.continuous improvement loop.-> INFORM
    BAD["Common failure:<br/>jump to Optimize<br/>regress quickly"]
    BAD -.->|untagged| INFORM
    classDef inform fill:#dbeafe,stroke:#2563eb;
    classDef opt fill:#fef3c7,stroke:#d97706;
    classDef oper fill:#fae8ff,stroke:#a21caf;
    classDef bad fill:#fee2e2,stroke:#dc2626;
    class INFORM inform;
    class OPT opt;
    class OPER oper;
    class BAD bad;
```
| Phase | Substance | Tools |
|---|---|---|
| Inform | Visualize who / what / how much | AWS Cost Explorer, GCP Billing Reports, Azure Cost Management |
| Optimize | Delete unused, right-size, apply discounts | AWS Compute Optimizer, Trusted Advisor |
| Operate | Governance, budget caps, continuous improvement organization | AWS Budgets, tag conventions, weekly reviews |
Unless you go through "visualize -> optimize -> operationalize" in order, short-term savings don't stick and costs revert. Skipping the Operate phase, the organization-level mechanism that keeps the cycle going, leaves the whole effort as a one-time event.
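The Inform phase can also be driven through the Cost Explorer API rather than the console. The sketch below, using boto3, groups monthly cost by a `Project` tag; the tag key is an assumption from the tag-convention section, and the response parsing is split into a pure function so it can be checked without AWS credentials.

```python
# Sketch of the Inform phase: monthly cost grouped by a Project tag via
# the AWS Cost Explorer API (boto3). Parsing is a pure function so it can
# be exercised offline; fetch_monthly_costs needs real AWS credentials.
def costs_by_group(response: dict) -> dict[str, float]:
    """Flatten a GetCostAndUsage response into {group_key: USD}."""
    totals: dict[str, float] = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key] = totals.get(key, 0.0) + amount
    return totals

def fetch_monthly_costs(start: str, end: str) -> dict[str, float]:
    import boto3  # requires AWS credentials; not executed in this sketch
    ce = boto3.client("ce")
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},   # "YYYY-MM-DD"
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "Project"}],  # assumes a Project tag
    )
    return costs_by_group(resp)
```

A weekly script like this, posting the per-project totals to chat, is often the cheapest way to make "who, what, how much" visible to the whole team.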
Major pricing models
Cloud’s base pricing is pay-as-you-go, but long-term commitments offer significant discounts. Understanding models is mandatory because choices can halve the bill.
| Model | Substance | Discount |
|---|---|---|
| On-demand | Pay as you use | Baseline (list price) |
| Reserved Instances (RI) | 1-3 year commit | Up to 70% off |
| Savings Plans | Usage commit (AWS) | Up to 70% off |
| Spot / Preemptible | Interruptible / spare capacity | Up to 90% off |
| Committed Use Discounts | Long-term discount (GCP) | Up to 57% off |
Reserved for steady-load production, Savings Plans for variable load, Spot for batch is the standard pattern. An RI is a loss if it goes unused, so monitoring coverage and utilization is part of operations.
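The "loss if unused" risk can be quantified before purchase with a break-even calculation. The sketch below uses hypothetical hourly rates; plug in the real on-demand and RI-effective prices for your instance type.

```python
# Break-even sketch: when does a 1-year Reserved Instance beat on-demand?
# Rates below are hypothetical; use the actual prices for your instance type.
def breakeven_hours(on_demand_hourly: float, ri_effective_hourly: float,
                    hours_in_term: int = 8760) -> float:
    """Minimum utilized hours in the term for the RI to be cheaper.
    The RI commitment is paid for the full term regardless of usage."""
    ri_total = ri_effective_hourly * hours_in_term
    return ri_total / on_demand_hourly

# e.g. on-demand $0.10/h vs RI-effective $0.06/h over one year (8760 h):
hours = breakeven_hours(0.10, 0.06)
utilization = hours / 8760  # below this utilization, the RI loses money
```

If your workload's expected utilization sits below that break-even fraction, on-demand or a Savings Plan is the safer choice; this is exactly the coverage/utilization math that RI monitoring automates.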
Visualization basics: tag strategy
The premise for cost visualization is “a tagging convention covering all resources.” Untagged resources show up as “unknown cost” on the invoice; nobody knows who used them or why.
| Required tag | Examples |
|---|---|
| Environment | prod / staging / dev |
| Project | project-alpha |
| Owner | team-a / foo@example.com |
| CostCenter | Department code |
| Service | web / api / batch |
Combining IaC with a mechanism that enforces tagging (auto-stopping untagged resources, etc.) is the modern standard. "We'll tag it later" never happens.
Decide the tag convention at the start of design. Bolting on after launch is hell.
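A tag convention is only as good as its enforcement check. The sketch below validates resources against the required-tag list from the table above; the tag names and the resource-map shape are assumptions for illustration, and in practice the same logic would run against tags fetched via your cloud's API or an IaC policy check.

```python
# Minimal tag-convention compliance check. Required tag keys follow the
# convention table above; adapt the set to your own convention.
REQUIRED_TAGS = {"Environment", "Project", "Owner", "CostCenter", "Service"}

def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Required tag keys that are absent or empty on one resource."""
    present = {k for k, v in resource_tags.items() if v}
    return REQUIRED_TAGS - present

def violations(resources: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map resource id -> missing tags, only for non-compliant resources."""
    out = {rid: missing_tags(tags) for rid, tags in resources.items()}
    return {rid: miss for rid, miss in out.items() if miss}
```

Running a check like this in CI (or as a scheduled job that alerts, then auto-stops) turns "enforce tagging" from a policy document into a mechanism.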
Decision criteria — scale and phase
The strictness of cost management changes by monthly spend and business phase.
| Phase | Monthly target | Management level |
|---|---|---|
| MVP / validation | up to $1k | Budget alerts only |
| Early growth | $1k-10k | Tags + monthly review |
| Scale | $10k-100k | Dedicated FinOps, RI/SP usage |
| Enterprise | $100k+ | Specialized team, automated optimization |
Excessive optimization at MVP is a waste of time, but past around $5k/month, “running cost reviews weekly” starts paying off.
Costs decided by architecture characteristics
Architecture selection itself decisively impacts cost.
| Design decision | Cost impact |
|---|---|
| Going serverless | Free when idle; effective at low/mid traffic |
| Containerization | Higher density; multiple services on one machine |
| Managed-service dependence | Operational cost down, unit cost higher; watch egress |
| Multi-region | Availability up, but cost premise is double |
| Data transfer volume | Cross-region, cross-AZ, CDN — be aware |
“Serverless vs container” especially produces 2x+ differences depending on request patterns; evaluate carefully.
Selection by case
Startup MVP
Set a monthly cap alert via AWS Budgets. Tags can be just Environment / Owner. Default to serverless (Lambda, Vercel, Cloud Run) at zero fixed cost; revisit when growing. Supabase / Neon free tiers are more rational than managed RDBs.
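A monthly cap with an alert can be set up in a few lines via the AWS Budgets API. The sketch below is a minimal example with an 80% email notification; the budget name, amount, and address are placeholders, and the part that actually calls AWS is separated out since it needs credentials.

```python
# Sketch of a monthly cost cap with an 80% email alert via AWS Budgets.
# Name, amount, and address below are PLACEHOLDERS for illustration.
def monthly_budget(name: str, limit_usd: float, email: str) -> dict:
    """Build the kwargs for budgets.create_budget (minus AccountId)."""
    return {
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,            # alert at 80% of the cap
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }],
    }

def apply_budget(account_id: str, kwargs: dict) -> None:
    import boto3  # requires AWS credentials; not executed in this sketch
    boto3.client("budgets").create_budget(AccountId=account_id, **kwargs)
```

For an MVP this one budget plus the serverless defaults above is often the entire FinOps setup, and that is appropriate at this scale.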
Mid-sized B2B SaaS
Monthly review via Cost Explorer, mandate 5 tag fields, get 30-50% off via RI / Savings Plans. Shorten CloudWatch Logs retention to 7 days, auto-stop unused dev environments at night. At this scale, manual ops hits a ceiling and writing automation scripts becomes worth it.
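The nightly auto-stop mentioned above is one of those automation scripts worth writing. Below is a minimal sketch using boto3: the instance-selection logic is a pure function over the `describe_instances` response shape so it is testable offline, and the `Environment` tag values are assumptions from the tag table.

```python
# Sketch of nightly auto-stop for dev/staging: select running instances by
# Environment tag, then stop them. Run from a scheduler (EventBridge/cron).
STOP_ENVIRONMENTS = {"dev", "staging"}  # assumed tag values; adapt to yours

def stoppable_instance_ids(response: dict) -> list[str]:
    """Running instances whose Environment tag is in STOP_ENVIRONMENTS."""
    ids = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst["State"]["Name"] != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get("Environment") in STOP_ENVIRONMENTS:
                ids.append(inst["InstanceId"])
    return ids

def nightly_stop() -> None:
    import boto3  # requires AWS credentials; not executed in this sketch
    ec2 = boto3.client("ec2")
    ids = stoppable_instance_ids(ec2.describe_instances())
    if ids:
        ec2.stop_instances(InstanceIds=ids)
```

Note that this relies on the tag convention being enforced: an untagged dev instance silently escapes the stop list, which is another reason tag enforcement comes first.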
Large enterprise
Dedicated FinOps team, AWS Organizations for per-department account separation, auto-detect / auto-stop untagged resources. Apply AWS Compute Optimizer recommendations monthly, maintain RI coverage at 80%+. Multi-cloud management tools (CloudHealth, Apptio) come into view.
Data platforms / AI utilization
BigQuery bills by bytes scanned, so "SELECT *" is fatal. Snowflake's warehouse size hits the bill directly. Selecting columns explicitly, partition pruning, and query caching are the high-impact moves.
LLM (Large Language Model) usage scales to tens of thousands of dollars a month instantly via tokens × requests. Prompt caching can cut input cost by up to 10x, and model selection (Haiku → Sonnet → Opus) spans another 10x in price. Per-user upper bounds and abuse prevention are mandatory.
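The tokens × requests math is worth modeling before launch. The sketch below is a back-of-envelope cost model showing why prompt caching matters; the per-million-token prices and the 90% cache discount are hypothetical placeholders, not any vendor's actual rates.

```python
# Back-of-envelope LLM API cost model illustrating the prompt-caching lever.
# All prices and the cache discount are HYPOTHETICAL placeholders.
def monthly_llm_cost(requests: int, input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float,
                     cached_fraction: float = 0.0,
                     cache_discount: float = 0.9) -> float:
    """USD/month; cached input tokens are billed at (1 - cache_discount)."""
    in_cost = requests * input_tokens / 1e6 * usd_per_m_input
    in_cost *= (1 - cached_fraction) + cached_fraction * (1 - cache_discount)
    out_cost = requests * output_tokens / 1e6 * usd_per_m_output
    return in_cost + out_cost

# 100k requests/month, 4k-token prompts, 500-token replies, $3/$15 per Mtok:
no_cache = monthly_llm_cost(100_000, 4000, 500, 3.0, 15.0)
cached = monthly_llm_cost(100_000, 4000, 500, 3.0, 15.0, cached_fraction=0.8)
```

With a mostly static system prompt, `cached_fraction` can be high, and the input side of the bill drops sharply; the same function also makes the 10x spread between model tiers concrete by swapping the per-Mtok prices.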
FinOps operational traps
Cost is the area where doing nothing can double your bill before you notice. 90% of the outcome is decided by the initial operational rules.
| Forbidden move | Why |
|---|---|
| Decide tag conventions after creating resources | Retroactive tagging never finishes; unknown cost exceeds 50% |
| Buy RI / Savings Plans without coverage monitoring | "Dead-bought" capacity expires unused; 1-3 year commitments lock in losses |
| Don’t auto-stop dev, run 24/7 | Sum of dev/staging frequently exceeds production |
| DEBUG logs in production | CloudWatch Logs bills $10k+/month — filter to INFO+ and shorten retention |
| Adding NAT GWs unconsciously | One = $30-60/month + traffic; charges accumulate even in unused environments |
| No S3 lifecycle policies | Years-old videos / images charged at Standard class indefinitely |
| Multi-region without estimating egress | Cross-cloud / cross-region / cross-AZ traffic comes in at 10x the estimate |
| First notice of cost anomaly is the month-end invoice | Loss already locked in for the month — set up AWS Cost Anomaly Detection |
| Provide LLM API without per-user cap | Malicious users’ prompt loops can hit $10k overnight |
| Adopt managed services without reading the back-side billing | The worst contracts come from not reading the docs. Weekly invoice review is the only defense |
Even at Amazon, the Prime Video team publicized in 2023 that moving part of its service from microservices back to a monolith cut its cost by 90%. Architecture selection drives cost; not reading pricing models at design time is fatal.
Cost incidents come from “forgot to stop.” Defend with auto-stop, not human discipline.
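The S3 lifecycle trap from the table is also fixable with a few lines. The sketch below builds one lifecycle rule in the shape `put_bucket_lifecycle_configuration` expects; the prefix, day thresholds, and storage classes are illustrative assumptions to adapt to your retention policy.

```python
# Sketch of an S3 lifecycle rule fixing the "old objects left forever" trap:
# transition to colder storage classes, then expire. Day counts below are
# ILLUSTRATIVE; set them from your actual retention requirements.
def lifecycle_rule(prefix: str, to_ia_days: int = 30,
                   to_glacier_days: int = 90, expire_days: int = 365) -> dict:
    """One rule in the shape put_bucket_lifecycle_configuration expects."""
    return {
        "ID": f"tiering-{prefix or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": to_ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": to_glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }

def apply_lifecycle(bucket: str, rules: list[dict]) -> None:
    import boto3  # requires AWS credentials; not executed in this sketch
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

In an IaC setup the same rule would live in Terraform or CloudFormation; either way, the point is that the deletion policy exists in code from day one rather than as a cleanup task that never happens.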
The AI-era lens
With AI-driven development and AI utilization as the assumption, cost management moves from “humans optimize” to “AI suggests and auto-applies continuously.” AWS Cost Anomaly Detection and Compute Optimizer are already AI-based, and “anomaly detect -> root cause -> recommend -> auto-apply” is becoming standard.
| AI-era favorable | AI-era unfavorable |
|---|---|
| Tag conventions and IaC | Manual provisioning |
| Standard service configurations | Unique exotic configurations |
| Auto-collected usage metrics | Monitoring not in place |
| LLM token management | Unlimited prompts |
Meanwhile, AI’s own usage cost becomes a new management target. LLM APIs scale to tens of thousands of dollars monthly fast; prompt caching, model selection, and per-user upper bounds are the new pillars of FinOps.
In the AI era, “AI’s own usage cost” becomes FinOps’s central topic.
Common misreadings
- "Optimize after launch" -> The architecture can't be changed later; folding pricing models into design is the rule. This is the classic naive assumption.
- “RI buying is savings” -> Loss if unused. Mass-buying without coverage and utilization monitoring leads to repeated waste.
- “Managed services are expensive” -> Unit cost is higher, but TCO including operational labor and incident response is often lower. Decide including labor.
- "Dev-environment cost is negligible" -> It is not rare for the sum of dev / staging to exceed production. Auto-stop and auto-deletion are mandatory.
"Year-end / New Year billing" (industry case)
A team rushing through year-end release work spun up GPU instances for verification, planned to “stop them later,” and went into the new-year break. On the first day back, the cost dashboard showed the monthly bill at over 10x normal, with the culprits being those forgotten GPU instances.
Similar episodes exist among individual developers: an r5.2xlarge spun up for verification and left running over the weekend burned hundreds of dollars. Hundreds of dollars makes a funny story; tens of thousands on GPUs does not.
The lesson: "stop it later" depends on human will, and human will almost always breaks down. Embed tag-driven auto-stop scripts for nights and weekends from day one. Defend cost with design, not willpower.
Don't scold people for forgotten stops. Build the system so it stops on its own.
What you must decide — what’s your project’s answer?
Articulate your project’s answer in 1-2 sentences for each:
- Tag convention (Environment / Project / Owner / CostCenter)
- Budget caps and alerts (monthly / weekly)
- Pricing-model selection (on-demand / RI / Savings Plans)
- Dev-environment auto-stop policy
- Log retention and data-deletion policy
- Cost-review cadence (weekly / monthly)
- LLM / AI-service usage caps
Common failure patterns
- Operating without tags, no idea who’s using what -> Without tag conventions at the start, no improvement starting point.
- Mass-bought RIs unused, dead spend -> Coverage and utilization monitoring are the precondition.
- DEBUG-log streaming, $10k+/month -> Filtering to INFO alone produces large reductions.
- NAT GWs accumulating per environment as hidden cost -> Delete unused environment NAT GWs immediately.
- No LLM API cap -> prompt loop hits $10k overnight -> Per-user caps are mandatory from the start.
How to make the final call
The substance of cost management is the framing “fold the pricing model into design and visualize continuously.” Cloud bills usage on a per-use basis, so design decisions hit the bill every month directly.
Decide the tag convention at the start, then run Inform (visualize) -> Optimize -> Operate (operationalize) as three phases. Untagged resources hide in the invoice as "unknown cost," leaving not even a starting point for improvement. Choosing structural optimization over short-term savings is the architect's job.
The decisive axis is the demand “put AI’s own usage cost at the center of FinOps.” LLM APIs are a new line item that can hit thousands of dollars overnight. Prompt caching, model selection, and per-user upper bounds become operational basics.
On the infrastructure side, AI-based anomaly detection like AWS Cost Anomaly Detection is standardizing; FinOps is moving from manual ops to automated optimization.
Selection priority:
- Tag convention and visualization first — without sight, nothing improves.
- Pick pricing models at design time — retroactive fixes are huge cost.
- Management level matched to scale — dedicated FinOps for MVP is excess; mandatory at scale.
- AI usage cost as the new central axis — LLM and AI-agent spending become the focus.
“Fold in at design, polish in operations.” FinOps’s substance is investment optimization, not savings.
Summary
This article covered cloud cost management (FinOps) — the patterns of bill explosion, 3-phase operation, tag strategy, management levels by scale, LLM-era topics.
Set the tag convention first, fold pricing models into design at design time, operationalize by scale. In this order, you avoid the “open the invoice and turn pale” incident.
This concludes the System Architecture category’s 11 articles. The next article opens the Software Architecture category — monolith vs microservices, language selection, API design, and other selection axes for software’s internal structure.
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (16/89)