About this article
This article is the third deep dive in the “System Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering how to choose a cloud vendor.
A casual “AWS, I guess” ends up deciding hiring, monthly cost, incident response, and regulatory handling for the next 10 years. Choosing among AWS, Azure, and Google Cloud is one of the highest-exit-cost decisions in software; migrating later is essentially a system rebuild. This article covers each vendor’s strengths and weaknesses, world share, major services, and recommendations by scale and industry.
Where do you put the infrastructure?
Cloud vendors rent servers, storage, DBs, and app platforms over the Internet. AWS, Azure, and Google Cloud together hold ~70% of the world cloud market. Which vendor you pick fixes your tech stack, talent strategy, and cost structure for a decade.
Vendor selection is hard to undo. Once operations begin, hundreds of services become entangled, and migrating to another vendor becomes nearly impossible. This is vendor lock-in.
Designing to “keep it ready to migrate at any time” almost always degenerates into over-engineering. Lock-in is not something to avoid but something to accept and exploit deeply; that is the realistic stance in 2026.
Vendor selection is hard to change once made. Decide carefully, with lock-in as the assumption.
Amazon Web Services (AWS)
AWS launched in March 2006 and pioneered cloud computing. It has held the #1 world share into 2026. It began as Amazon’s internal e-commerce infrastructure opened to external customers, and its technical reliability and scale are overwhelming.
| Strengths | Weaknesses |
|---|---|
| 250+ services — the largest catalog | Too many services; specialty knowledge needed |
| Rich community, info, tooling | Complex pricing makes cost management hard |
| Most regions; resilient to outages | High support fees |
| Easier to hire engineers | Distinctive UI requires getting used to |
On information density, adoption, and ease of hiring, AWS surpasses its competitors, which makes it the lowest-regret pick. “Default new projects to AWS unless specified otherwise” is the rule. The downside: the service catalog itself is a learning wall.
AWS’s strength is “being the de facto standard.” Counts of books, blog posts, and Stack Overflow answers; Terraform module counts on GitHub; new-grad and mid-career hires with AWS experience: on each of these, AWS runs at roughly 2-3x the #2 vendor (Azure). In the AI era this gap deepens: AI’s accuracy on AWS code is one tier above other vendors.
Default new builds to AWS. Deviate only when you can write the reason.
Microsoft Azure
Microsoft Azure launched in February 2010, an enterprise-strong cloud. World #2, closing on AWS without slowing.
| Strengths | Weaknesses |
|---|---|
| Easy integration with Microsoft products | Some operational rough edges vs AWS / GCP |
| 200+ services, hybrid via Azure Arc | Non-Windows support tends to lag |
| Strong enterprise contracts and identity | New features often slower to roll out than AWS |
| Strong compliance posture for finance / public sector | Individual-developer UX is mediocre |
For companies on Office 365 / Microsoft 365, Azure is the natural place to consolidate. SSO (Single Sign-On) and permission management anchored on Active Directory, Microsoft’s integrated identity service, work seamlessly, which is a direct reason many enterprises pick Azure. Industries with strict compliance requirements (finance, healthcare, government) also lean toward Azure.
Azure’s growth post-2023 is also driven by the exclusive partnership with OpenAI. ChatGPT / GPT-4 / GPT-5 family models offered to enterprises via Azure OpenAI Service produced a new market, “Azure for embedding generative AI in operations.” AWS Bedrock and GCP Vertex AI have caught up substantially, but the OpenAI exclusive remains an Azure strength as of 2026.
Microsoft-centric companies and compliance-heavy industries: Azure is the favorite.
Google Cloud (GCP)
Google Cloud (GCP) launched in April 2008 and is the third of the big three. It began as Google’s own infrastructure (YouTube, Gmail, search) opened to external customers. It has overwhelming technical leadership in containers, Kubernetes, and AI/ML.
| Strengths | Weaknesses |
|---|---|
| Google-services integration (YouTube/Maps, etc.) | Smallest catalog (~150) |
| Strong on Kubernetes / containers | Service deprecation / changes more frequent |
| BigQuery, Gemini, etc., excellent AI | Fewer large-scale enterprise references |
| Sustained-use discounts tend to lower costs | Engineer hiring is harder |
Kubernetes was originally developed by Google, and GKE (Google Kubernetes Engine) is the industry’s most polished managed offering. BigQuery and Gemini sit at a level that competitors haven’t matched.
On the other hand, GCP carries a historical reputation for “shutting things down.” The Cloud IoT Core wind-down (announced 2022, ended 2023) and a string of consumer-side closures (Reader, Hangouts, Inbox, Wave) left enterprises with the lingering doubt “Google shuts things down when they get bored.” Even on equal feature comparisons, this “personality” difference matters in long-term operations.
For analytics-, AI-, or Kubernetes-centric projects, the top choice. The long-term-reliability concern needs to be evaluated separately.
World share comparison
World cloud market share, 2025 Q3. The big three together hold ~70% — the “three giants” era continues.
Figure (pie chart): World cloud market share (2025 Q3): AWS 29%, Azure 25%, Google Cloud 13%, Others (Alibaba/Oracle/IBM/etc.) 33%.
| Vendor | Share | Trend |
|---|---|---|
| AWS | ~29% | Flat, holding #1 |
| Azure | ~25% | Continually growing, closing on AWS |
| Google Cloud | ~13% | Growing, presence in AI |
Azure’s catch-up comes from migrating Microsoft 365 customers and OpenAI-driven generative-AI adoption. GCP grows on AI-domain presence but the gap to #2 remains large.
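The combined share and the gaps between vendors can be checked directly from the figures above. A minimal sketch in Python, using only the percentages quoted in this section (the variable names are illustrative):

```python
# Market share figures (percent) as quoted in this section for 2025 Q3.
shares = {
    "AWS": 29,
    "Azure": 25,
    "Google Cloud": 13,
    "Others": 33,
}

big_three = ["AWS", "Azure", "Google Cloud"]

# Combined share of the big three.
big_three_total = sum(shares[v] for v in big_three)

# Gap between each vendor and the one ranked just above it.
gap_aws_azure = shares["AWS"] - shares["Azure"]
gap_azure_gcp = shares["Azure"] - shares["Google Cloud"]

print(f"Big three combined: {big_three_total}%")
print(f"AWS lead over Azure: {gap_aws_azure} points")
print(f"Azure lead over Google Cloud: {gap_azure_gcp} points")
```

With these figures the big three sum to 67%, which is the number behind the “~70%” shorthand, and Google Cloud’s gap to Azure (12 points) is three times Azure’s gap to AWS (4 points).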
Major services compared
Functions are largely equivalent across vendors despite different names. Each vendor races to match competitor features, so basic system construction is covered everywhere.
Compute / storage
| Category | AWS | Azure | GCP |
|---|---|---|---|
| Virtual machines | EC2 | Virtual Machines | Compute Engine |
| Container management | ECS / EKS | AKS | GKE |
| Serverless | Lambda | Azure Functions | Cloud Functions |
| Object storage | S3 | Blob Storage | Cloud Storage |
The impression that “AWS has more features” applies to niches. Basics are covered by all three; differentiation lands at “ML -> GCP, identity -> Azure, overall -> AWS.”
DB / CDN / AI
| Category | AWS | Azure | GCP |
|---|---|---|---|
| Managed DB | RDS | Azure SQL | Cloud SQL |
| NoSQL | DynamoDB | Cosmos DB | Firestore / Bigtable |
| CDN | CloudFront | Azure CDN | Cloud CDN |
| AI / ML | SageMaker / Bedrock | Azure AI / OpenAI Service | Vertex AI / Gemini |
In NoSQL, AWS DynamoDB leads on stability and large-scale references. Azure Cosmos DB supports multi-model (document, graph, key-value, etc.); GCP Firestore is strong for mobile-app integration.
Basics line up everywhere. The difference is maturity and fit of individual services.
Selection criteria
There’s no “absolute right answer” — every vendor has more than enough features. Choose by fit with your tech assets and constraints. That’s it.
| Situation | Recommended |
|---|---|
| No special preference / restriction | AWS (safest) |
| Heavy use of Microsoft products | Azure |
| Strict compliance (finance, public sector) | Azure (rich identity) |
| Need Google-services integration | GCP |
| Analytics-/AI-centric | GCP (BigQuery, Gemini) |
| Kubernetes-centric | GCP (GKE) |
For new startups and individuals, info density, free tiers, and engineer availability make AWS the safest. With clear reasons (AI-forward -> GCP, existing Microsoft stack -> Azure), other choices are worthwhile.
Recommendations by org size × industry
Starting “AWS by default” can backfire for Microsoft-centric companies or BigQuery-fitting domains. The better classifier is the intersection of scale and industry.
| Situation | Org size | Recommended | Reason |
|---|---|---|---|
| New B2B/B2C web SaaS | up to 100 | AWS | Info density, managed services, easy hiring |
| Microsoft 365-core large enterprise | 1000+ | Azure | Entra ID / AD integration, Office, Teams |
| Analytics / AI core | Any | GCP | BigQuery, Gemini, Vertex AI |
| Finance / insurance (core systems) | Large | AWS or Azure (FISC-certified) | Compliance certifications |
| Public sector | — | Government Cloud-certified vendor | Procurement requirement |
| Kubernetes core | up to mid | GCP (GKE) | Maturity as the origin of Kubernetes |
| Already running on AWS | — | Continue on AWS | Migration cost too high to justify |
Judge on both the current state and the 5-year outlook. Avoiding Azure at a Microsoft 365-dependent company is a bad fit; the cost of integrating with the existing platform can outweigh even AWS’s information-density advantage.
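The table above can be read as a first-match rule list. The sketch below is one possible encoding in Python, not a definitive decision procedure: the `Project` fields are hypothetical, org size is left out for brevity, and the order in which the rules are checked is an assumption, since the table itself does not rank its rows.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project description; field names are illustrative."""
    already_on_aws: bool = False
    public_sector: bool = False
    finance_core: bool = False
    microsoft_365_core: bool = False
    analytics_ai_core: bool = False
    kubernetes_core: bool = False


def recommend(p: Project) -> str:
    """Return the table's recommendation for the first rule that matches.

    Rule order is an assumption of this sketch: existing platform and
    procurement constraints are checked before technical fit.
    """
    if p.already_on_aws:
        return "Continue on AWS"
    if p.public_sector:
        return "Government Cloud-certified vendor"
    if p.finance_core:
        return "AWS or Azure (FISC-certified)"
    if p.microsoft_365_core:
        return "Azure"
    if p.analytics_ai_core:
        return "GCP"
    if p.kubernetes_core:
        return "GCP (GKE)"
    # No special constraint: the article's default.
    return "AWS"


print(recommend(Project(microsoft_365_core=True)))
print(recommend(Project(analytics_ai_core=True, kubernetes_core=True)))
print(recommend(Project()))
```

Checking existing assets and regulatory constraints before technical preferences mirrors the article’s point that fit with existing assets outweighs feature comparison.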
The era of choosing on features is over. “Fit with existing assets” is the strongest axis now.
Domestic clouds in Japan
Beyond the big three, domestic clouds and specialty clouds exist. Demand from data sovereignty (data on Japanese soil) and yen-denominated billing supports the segment.
| Vendor | Trait |
|---|---|
| Sakura Internet | Domestic, transparent pricing, used by government statistics |
| IDCFrontier | Domestic, SoftBank-affiliated |
| Oracle Cloud | Generous free tier, Oracle DB compatibility |
| Alibaba Cloud | Strong China-market support |
In public-sector settings, “Government Cloud”-certified vendors (AWS, Azure, GCP, Oracle, Sakura) become eligible — a policy dimension layered onto cloud selection.
Without special requirements, the big three are sufficient. Domestic options come into play for “data must be in Japan” mandates.
Is multi-cloud effective?
Multi-cloud sometimes gets considered for “vendor lock-in avoidance,” but in practice the operational-complexity downside is too large to recommend casually. Effective only when there’s a clear reason:
- Regulatory requirements forbid certain data on specific vendors.
- Post-M&A systems scattered across vendors.
- Want a specific AI feature (GCP Gemini, etc.) for one piece only.
- BCP requires availability during one vendor’s outage.
Picking multi-cloud “because lock-in is uncomfortable” doubles cost and operational difficulty (specialists for each cloud). Leaning to a primary vendor is overwhelmingly more efficient.
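The “clear reason” gate can be written down as a small check. This is a sketch under the assumption that the four reasons listed above are the only accepted ones; the enum and function names are illustrative.

```python
from enum import Enum


class MultiCloudReason(Enum):
    """The four reasons listed above; the enum names are illustrative."""
    REGULATORY = "regulation forbids certain data on specific vendors"
    POST_MERGER = "post-M&A systems are scattered across vendors"
    SPECIFIC_AI_FEATURE = "one component needs a specific AI feature"
    BCP = "BCP requires availability during one vendor's outage"


def multi_cloud_justified(reasons) -> bool:
    """True only if at least one stated reason is one of the listed ones."""
    return any(isinstance(r, MultiCloudReason) for r in reasons)


# A vague motivation does not pass the gate; a listed reason does.
print(multi_cloud_justified(["lock-in is uncomfortable"]))
print(multi_cloud_justified([MultiCloudReason.BCP]))
```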
Multi-cloud is only when a clear need exists. Don’t default to it.
Vendor migration / lock-in escape traps
Aiming for “abstraction so you can move anytime” almost certainly fails. The moment migration touches vendor-specific managed services, costs balloon to 3x the estimate.
| Forbidden move | Why |
|---|---|
| DIY abstraction layer “for future migration” | The maintenance burden of the abstraction code itself outweighs the benefit, and the migration never happens — canonical over-engineering |
| Planning migration with vendor-specific managed services (DynamoDB / BigQuery / Cosmos DB) at the core | Data model changes too — not just code rewrite, design redo |
| Big-bang migration switch | No retreat path if something breaks; 3-6 months parallel run is the floor |
| Skipping data-egress estimation | At multi-TB to PB scale, transfer fees alone run from thousands to tens of thousands of dollars |
| Starting migration before redoing IAM | Permission models differ per vendor; permission drift causes incidents during migration |
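The data-egress row can be estimated up front with simple arithmetic. The sketch below assumes a flat placeholder rate of 0.09 USD per GB; that rate is an assumption for illustration, not a quoted price, and real pricing is tiered and differs by vendor and region.

```python
def egress_cost_usd(data_tb: float, usd_per_gb: float = 0.09) -> float:
    """Estimate the one-off transfer-out cost for moving `data_tb` terabytes.

    usd_per_gb is a placeholder flat rate, not a quoted price; real pricing
    is tiered and differs per vendor and region. 1 TB is treated as 1000 GB.
    """
    if data_tb < 0 or usd_per_gb < 0:
        raise ValueError("data_tb and usd_per_gb must be non-negative")
    return data_tb * 1000 * usd_per_gb


# 1000 TB = 1 PB
for size_tb in (5, 50, 1000):
    print(f"{size_tb:>5} TB -> ${egress_cost_usd(size_tb):,.0f}")
```

Replace the placeholder rate with the current figures from your vendor’s pricing page before using the result in a migration estimate.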
A migration should pass the test “migration cost < 3 years of lock-in cost” before kickoff; if moving costs more than three years of staying, it is not worth starting. The AI era helps in some areas (“AI can write code for the new vendor”), but data movement and managed-feature replacement remain heavy.
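One reading of this test: migration is worth starting only when its one-off cost is lower than what staying would cost over three years. The sketch below encodes that reading. All inputs are hypothetical estimates you would supply yourself, “lock-in cost” is taken to mean the extra yearly cost of staying, and `overrun_factor` models the earlier warning that costs can balloon to 3x the estimate.

```python
def migration_passes_test(
    migration_estimate: float,
    annual_lock_in_cost: float,
    horizon_years: int = 3,
    overrun_factor: float = 1.0,
) -> bool:
    """Return True when the (possibly inflated) migration cost is below
    the lock-in cost accumulated over `horizon_years`.

    All parameters are hypothetical inputs you would estimate yourself.
    """
    expected_migration_cost = migration_estimate * overrun_factor
    lock_in_cost = annual_lock_in_cost * horizon_years
    return expected_migration_cost < lock_in_cost


# Illustrative numbers only: a 200k estimate against 100k/year of lock-in cost.
print(migration_passes_test(200_000, 100_000))                      # 200k vs 300k
print(migration_passes_test(200_000, 100_000, overrun_factor=3.0))  # 600k vs 300k
```

The same estimate that passes at face value fails once the 3x overrun is applied, which is why the overrun belongs in the calculation before kickoff rather than after.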
Lock-in is accepted, not avoided. The portion you can escape with abstraction is smaller than it looks.
The AI-era lens
With AI-driven development as the assumption, the selection axis pivots to “how well does AI know this vendor?”
AWS has overwhelming information density, official docs, and sample code, so AI’s Terraform / CDK accuracy is highest there. Azure and GCP also have enough information, but niche domestic and mid-tier vendors have thin training data. For those vendors, generated code that doesn’t run or that calls non-existent APIs (hallucination) becomes more frequent.
| AI-era favorable | AI-era unfavorable |
|---|---|
| AWS (max info, max samples) | Domestic / minor clouds (sparse training data) |
| Heavy Terraform / CDK support | Custom consoles only |
| Mature APIs, rich docs | Closed specs, oral-tradition know-how |
| Multi-region in code | Manual-build legacy designs |
The lock-in problem is not solved by AI. Migration to other vendors remains heavy construction. “The weight of the first selection hasn’t changed.” Don’t overrate AI’s help here.
The AI era favors “the vendor AI knows best.” AWS’s lead widens further.
“Things that might get killed”: the fear (industry case)
Google Cloud IoT Core’s wind-down was announced August 2022 and ended August 2023. Companies running it in production scrambled to pick a replacement and rewrite. Google has a history of closing consumer services (Reader, Hangouts, Inbox, Wave), and the doubt “Google shuts things down when they get bored” doesn’t go away — a frequent enterprise complaint.
AWS strongly emphasizes “once shipped, generally doesn’t get killed” as a stance, and the difference shows up in long-term-operations confidence. Even with feature parity, layering in “will this service exist in 10 years?” changes the picture. After IoT Core, “will this still be here in 5 years?” became unavoidable in GCP service selection.
Vendor selection means looking past the feature table to the vendor’s “personality.” Features look similar, but corporate culture and operating posture differ.
Cloud selection is a judgment about “personality,” not “features.” Vendors that kill things vs vendors that don’t.
What you must decide — what’s your project’s answer?
Articulate your project’s answer in 1-2 sentences for each:
- Primary cloud vendor (AWS / Azure / GCP)
- Region (Tokyo / Osaka / overseas)
- Existing-system integration
- Compliance (finance, healthcare, public)
- Whether multi-cloud is on the table
- Domestic-data-sovereignty requirement
- Engineer-supply realism
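The checklist above can be kept as a structured record so that unanswered items are visible. A minimal sketch in Python; the class and field names are illustrative, and the sample answers are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class VendorDecision:
    """One short answer per checklist item; empty string means undecided."""
    primary_vendor: str = ""
    region: str = ""
    existing_system_integration: str = ""
    compliance: str = ""
    multi_cloud: str = ""
    data_sovereignty: str = ""
    engineer_supply: str = ""


def unanswered(decision: VendorDecision) -> list:
    """Return the names of checklist items that still have no answer."""
    return [
        f.name for f in fields(decision)
        if not getattr(decision, f.name).strip()
    ]


# Placeholder answers for two of the seven items.
draft = VendorDecision(
    primary_vendor="AWS, because there is no existing platform constraint.",
    region="Tokyo.",
)
print(unanswered(draft))
```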
Common failure patterns
- “Pick AWS for max features” at a Microsoft-stack company: the existing environment goes unchecked, and integration cost wipes out the savings. These are cases where Azure was the right answer.
- Complex multi-cloud from day one: the ops team exhausts itself and the system becomes unmanageable. A canonical bad fit.
- Skipping cost projection: real billing comes in at multiples of the estimate and blows the budget.
- Abstracting “for future vendor change”: over-engineering tanks development speed, and the migration never happens.
- Region in the US / Europe, violating regulation: this happens in industries that mandate domestic storage of personal data.
How to make the final call
Vendor selection is decided more by “balance of existing assets and future lock-in” than by technical correctness. All three cover the basics; the era of feature-based selection is over.
What’s effective now: affinity with existing platforms (Microsoft 365 -> Azure, Google Workspace -> GCP) and the depth of internal hiring / training capacity. Two axes.
Lock-in is healthier accepted as a premise than avoided. “Anytime-migrate abstraction” tends to over-engineer, and migration doesn’t happen anyway. Leaning on one improves both operations and AI compatibility.
From the AI-driven-development angle, AWS’s information-density lead translates directly into more accurate generated code, so default to AWS unless there’s a special reason.
Selection priority:
- Affinity with existing assets (Microsoft / Google / none).
- Engineer hiring / training feasibility (AWS is most favorable).
- AI fluency in the vendor as the final differentiator.
- Multi-cloud only with a clear reason.
“The courage to lean on one” reconciles ops cost and AI productivity. Keep abstractions minimal.
Summary
This article covered how to choose a cloud vendor — strengths of the big three, recommendations by scale and industry, the lock-in posture, and AI-era judgment.
The era of choosing on features is over; affinity with existing assets and AI’s information density now decide it. New, no constraint -> AWS. Microsoft stack -> Azure. AI / data -> GCP. Lean on one and use it deeply — the realistic answer for 2026.
The next article covers the major decision after vendor: the runtime (VM / container / serverless / Wasm).
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (8/89)