About this article
This article is the third deep dive in the “System Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering how to choose a cloud vendor.
A casual “AWS, I guess” decides hiring, monthly cost, incident response, and regulatory handling for the next 10 years. AWS / Azure / Google Cloud are among the highest-exit-cost selections in software; later migration is essentially a system rebuild. This article covers each vendor’s strengths and weaknesses, world share, major services, and recommendations by scale and industry.
What is a cloud vendor in the first place
A cloud vendor is, roughly speaking, “a company that rents you servers and databases over the Internet.”
Imagine a rental apartment management company. Instead of buying land and building your own office (on-premises), you rent rooms (servers) prepared by the management company and pay utilities (pay-as-you-go). Floor-plan changes (configuration) are fairly flexible, but you follow the building’s rules (vendor specifications). The top three are AWS, Azure, and Google Cloud, together holding about 70% of the world cloud market.
Why cloud vendor selection matters
Once operations begin, hundreds of services interlock and migration to another vendor becomes nearly impossible — this is vendor lock-in. The abstraction of “always keeping migration possible” almost always ends as over-engineering. Vendor selection fixes your tech stack, talent strategy, and cost structure for a decade. Lock-in is not something to avoid but to accept and leverage deeply. That’s why the initial selection is critical.
Where do you put the infrastructure?
Cloud vendors rent servers, storage, DBs, and app platforms over the Internet. AWS, Azure, and Google Cloud together hold ~70% of the world cloud market. Which vendor you pick fixes your tech stack, talent strategy, and cost structure for a decade.
Vendor selection is hard to undo. Once operations begin, hundreds of services become entangled and migration to another vendor becomes nearly impossible. This is vendor lock-in.
Building abstractions to “keep it ready to migrate at any time” almost always degenerates into over-engineering. The realistic stance in 2026 is to accept lock-in and exploit it deeply rather than try to avoid it.
Vendor selection is hard to change once made. Decide carefully, with lock-in as the assumption.
Amazon Web Services (AWS)
AWS launched in March 2006 as the cloud-computing pioneer and has held the #1 world share into 2026. It began as Amazon’s internal e-commerce infrastructure, later released externally, and its technical reliability and scale are hard to match.
| Strengths | Weaknesses |
|---|---|
| 250+ services — the largest catalog | Too many services; specialty knowledge needed |
| Rich community, info, tooling | Complex pricing makes cost management hard |
| Most regions; resilient to outages | High support fees |
| Easier to hire engineers | Distinctive UI requires getting used to |
Information density, adoption, ease of hiring — all surpass competitors. The lowest-regret pick. “Default new projects to AWS unless specified otherwise” is the rule. The downside: the service catalog itself is a learning wall.
AWS’s strength is “being the de facto standard.” The number of books, blog posts, and Stack Overflow answers, the number of Terraform modules on GitHub, and the number of new-grad and mid-career hires with AWS experience are all roughly 2-3x those of #2 (Azure). In the AI era this gap deepens: AI-generated code for AWS tends to be noticeably more accurate than for other vendors.
Default new builds to AWS. Deviate only when you can write the reason.
Microsoft Azure
Microsoft Azure launched in February 2010, an enterprise-strong cloud. World #2, closing on AWS without slowing.
| Strengths | Weaknesses |
|---|---|
| Easy integration with Microsoft products | Some operational rough edges vs AWS / GCP |
| 200+ services, hybrid via Azure Arc | Non-Windows support tends to lag |
| Strong enterprise contracts and identity | New features often slower to roll out than AWS |
| Strong compliance posture for finance / public sector | Individual-developer UX is mediocre |
For companies on Office 365 / Microsoft 365, Azure is the natural unification target. SSO and permission management anchored on Active Directory / Entra ID (Microsoft’s integrated identity platform) work seamlessly, which is a direct reason many enterprises pick Azure. Industries with strict compliance (finance, healthcare, government) also lean here.
Azure’s growth post-2023 is also driven by Microsoft’s close partnership with OpenAI. Offering the GPT-4 / GPT-5 family of models to enterprises via Azure OpenAI Service produced a new market: “Azure for embedding generative AI in operations.” AWS Bedrock and GCP Vertex AI have caught up substantially, but privileged access to OpenAI models remains an Azure strength as of 2026.
Microsoft-centric companies and compliance-heavy industries: Azure is the favorite.
Google Cloud (GCP)
Google Cloud (GCP) launched in April 2008 and is the third of the big three. It began as Google’s own infrastructure (YouTube, Gmail, search) opened externally, and it has strong technical leadership in containers, Kubernetes, and AI/ML.
| Strengths | Weaknesses |
|---|---|
| Google-services integration (YouTube/Maps, etc.) | Smallest catalog (~150) |
| Strong on Kubernetes / containers | Service deprecation / changes more frequent |
| BigQuery, Gemini, etc., excellent AI | Fewer large-scale enterprise references |
| Sustained-use discounts tend to lower the bill | Engineer hiring is harder |
Kubernetes was originally developed by Google, and GKE (Google Kubernetes Engine) is widely regarded as the industry’s most polished managed Kubernetes offering. BigQuery and Gemini sit at a level that competitors haven’t matched.
On the other hand, GCP carries a historical reputation for “shutting things down.” The Cloud IoT Core wind-down (announced 2022, ended 2023) and a string of consumer-side closures (Reader, Hangouts, Inbox, Wave) left enterprises with the lingering doubt “Google shuts things down when they get bored.” Even on equal feature comparisons, this “personality” difference matters in long-term operations.
For analytics-, AI-, or Kubernetes-centric projects, the top choice. The long-term-reliability concern needs to be evaluated separately.
World share comparison
World cloud market share, 2025 Q3. The big three together hold ~70% — the “three giants” era continues.
| Vendor | Share | Trend |
|---|---|---|
| AWS | ~29% | Flat, holding #1 |
| Azure | ~25% | Continually growing, closing on AWS |
| Google Cloud | ~13% | Growing, presence in AI |
Azure’s catch-up comes from migrating Microsoft 365 customers and OpenAI-driven generative-AI adoption. GCP grows on AI-domain presence but the gap to #2 remains large.
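As a quick sanity check on the “~70%” figure, the table’s approximate shares can be added up. This is only arithmetic on the numbers quoted above, not an independent data source.

```python
# Approximate shares from the table above (percent of world cloud market, 2025 Q3).
shares = {"AWS": 29, "Azure": 25, "Google Cloud": 13}

# Combined share of the big three.
big_three_total = sum(shares.values())

# How far Google Cloud sits behind #2, in percentage points.
gap_to_second = shares["Azure"] - shares["Google Cloud"]

print(f"Big three combined: about {big_three_total}%")
print(f"Azure minus Google Cloud: {gap_to_second} points")
```

The total comes to about 67%, which the article rounds to “~70%,” and the 12-point gap is what “the gap to #2 remains large” refers to.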
Major services compared
Functions are largely equivalent across vendors despite different names. Each vendor races to match competitor features, so basic system construction is covered everywhere.
Compute / storage
| Category | AWS | Azure | GCP |
|---|---|---|---|
| Virtual machines | EC2 | Virtual Machines | Compute Engine |
| Container management | ECS / EKS | AKS | GKE |
| Serverless | Lambda | Azure Functions | Cloud Functions |
| Object storage | S3 | Blob Storage | Cloud Storage |
The impression that “AWS has more features” applies to niches. Basics are covered by all three; differentiation lands at “ML -> GCP, identity -> Azure, overall -> AWS.”
DB / CDN / AI
| Category | AWS | Azure | GCP |
|---|---|---|---|
| Managed DB | RDS | Azure SQL | Cloud SQL |
| NoSQL | DynamoDB | Cosmos DB | Firestore / Bigtable |
| CDN | CloudFront | Azure CDN | Cloud CDN |
| AI / ML | SageMaker / Bedrock | Azure AI / OpenAI Service | Vertex AI / Gemini |
In NoSQL, AWS DynamoDB leads on stability and large-scale references. Azure Cosmos DB supports multi-model (document, graph, key-value, etc.); GCP Firestore is strong for mobile-app integration.
Basics line up everywhere. The difference is maturity and fit of individual services.
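When reading documentation or job postings written for another vendor, a small lookup table helps translate names. The sketch below simply encodes the two comparison tables; the category keys (`virtual_machines`, `nosql`, and so on) and the function names are hypothetical identifiers chosen for illustration.

```python
# Service names copied from the two comparison tables above.
# The category keys are hypothetical identifiers for this sketch.
SERVICE_MAP = {
    "virtual_machines": {"aws": "EC2", "azure": "Virtual Machines", "gcp": "Compute Engine"},
    "container_management": {"aws": "ECS / EKS", "azure": "AKS", "gcp": "GKE"},
    "serverless": {"aws": "Lambda", "azure": "Azure Functions", "gcp": "Cloud Functions"},
    "object_storage": {"aws": "S3", "azure": "Blob Storage", "gcp": "Cloud Storage"},
    "managed_db": {"aws": "RDS", "azure": "Azure SQL", "gcp": "Cloud SQL"},
    "nosql": {"aws": "DynamoDB", "azure": "Cosmos DB", "gcp": "Firestore / Bigtable"},
    "cdn": {"aws": "CloudFront", "azure": "Azure CDN", "gcp": "Cloud CDN"},
    "ai_ml": {
        "aws": "SageMaker / Bedrock",
        "azure": "Azure AI / OpenAI Service",
        "gcp": "Vertex AI / Gemini",
    },
}


def equivalent_service(category: str, vendor: str) -> str:
    """Return the named vendor's service for a category from the tables."""
    try:
        return SERVICE_MAP[category][vendor.lower()]
    except KeyError:
        raise KeyError(f"unknown category/vendor: {category!r}/{vendor!r}") from None


def describe(category: str) -> str:
    """Render one table row as a single line of text."""
    row = SERVICE_MAP[category]
    return " | ".join(f"{vendor.upper()}: {name}" for vendor, name in row.items())


print(describe("object_storage"))
print(equivalent_service("nosql", "Azure"))
```

A name match does not imply feature parity. As noted above, the real difference is the maturity and fit of each individual service.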
Selection criteria
There’s no “absolute right answer” — every vendor has more than enough features. Choose by fit with your tech assets and constraints. That’s it.
| Situation | Recommended |
|---|---|
| No special preference / restriction | AWS (safest) |
| Heavy use of Microsoft products | Azure |
| Strict compliance (finance, public sector) | Azure (rich identity) |
| Need Google-services integration | GCP |
| Analytics-/AI-centric | GCP (BigQuery, Gemini) |
| Kubernetes-centric | GCP (GKE) |
For new startups and individuals, info density, free tiers, and engineer availability make AWS the safest. With clear reasons (AI-forward -> GCP, existing Microsoft stack -> Azure), other choices are worthwhile.
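The selection table can be read as a small rule list with AWS as the fallback. The following is a minimal sketch of that reading; the `Situation` fields and the `recommend` function are hypothetical names, and returning every matching row (rather than picking one) is a design assumption so that conflicts stay visible to a human.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Flags mirroring the rows of the selection table (names are illustrative)."""
    heavy_microsoft_use: bool = False
    strict_compliance: bool = False
    needs_google_integration: bool = False
    analytics_or_ai_centric: bool = False
    kubernetes_centric: bool = False


# (flag, recommendation) pairs in the same order as the table.
RULES = [
    ("heavy_microsoft_use", "Azure"),
    ("strict_compliance", "Azure (rich identity)"),
    ("needs_google_integration", "GCP"),
    ("analytics_or_ai_centric", "GCP (BigQuery, Gemini)"),
    ("kubernetes_centric", "GCP (GKE)"),
]


def recommend(situation: Situation) -> list[str]:
    """Return every matching recommendation, or the AWS default if none match."""
    matches = [rec for flag, rec in RULES if getattr(situation, flag)]
    return matches or ["AWS (safest)"]


print(recommend(Situation()))
print(recommend(Situation(heavy_microsoft_use=True, kubernetes_centric=True)))
```

When more than one row matches, as in the second call, the table alone does not decide. That is where fit with existing assets has to break the tie.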
Recommendations by org size × industry
Starting from “AWS by default” can backfire for Microsoft-centric companies or for domains that fit BigQuery well. The better classifier is the intersection of scale and industry.
| Situation | Org size | Recommended | Reason |
|---|---|---|---|
| New B2B/B2C web SaaS | up to 100 | AWS | Info density, managed services, easy hiring |
| Microsoft 365-core large enterprise | 1000+ | Azure | Entra ID / AD integration, Office, Teams |
| Analytics / AI core | Any | GCP | BigQuery, Gemini, Vertex AI |
| Finance / insurance (core systems) | Large | AWS or Azure (aligned with FISC guidelines) | Compliance certifications |
| Public sector | — | Government Cloud-certified vendor | Procurement requirement |
| Kubernetes core | up to mid | GCP (GKE) | Origin Kubernetes maturity |
| Already running on AWS | — | Continue on AWS | Migration cost too high to justify |
Judge on both the current state and where you expect to be in five years. Avoiding Azure at a Microsoft 365-dependent company is a bad fit; the cost of integrating with the existing platform can outweigh even AWS’s information-density advantage.
The era of choosing on features is over. “Fit with existing assets” is the strongest axis now.
Domestic clouds in Japan
Beyond the big three, domestic clouds and specialty clouds exist. Demand from data sovereignty (data on Japanese soil) and yen-denominated billing supports the segment.
| Vendor | Trait |
|---|---|
| Sakura Internet | Domestic, transparent pricing, used by government statistics |
| IDCFrontier | Domestic, SoftBank-affiliated |
| Oracle Cloud | Generous free tier, Oracle DB compatibility |
| Alibaba Cloud | Strong China-market support |
In public-sector settings, “Government Cloud”-certified vendors (AWS, Azure, GCP, Oracle, Sakura) become eligible — a policy dimension layered onto cloud selection.
Without special requirements, the big three are sufficient. Domestic options come into play for “data must be in Japan” mandates.
Is multi-cloud effective?
Multi-cloud sometimes gets considered for “vendor lock-in avoidance,” but in practice the operational-complexity downside is too large to recommend casually. Effective only when there’s a clear reason:
- Regulatory requirements forbid certain data on specific vendors.
- Post-M&A systems scattered across vendors.
- Want a specific AI feature (GCP Gemini, etc.) for one piece only.
- BCP requires availability during one vendor’s outage.
Picking multi-cloud “because lock-in is uncomfortable” doubles cost and operational difficulty (needing specialists for each cloud). Leaning to a primary vendor is overwhelmingly more efficient.
Multi-cloud is only when a clear need exists. Don’t default to it.
Vendor migration / lock-in escape traps
Aiming for “abstraction so you can move anytime” almost certainly fails. The moment migration touches vendor-specific managed services, costs balloon to 3x the estimate.
| Forbidden move | Why |
|---|---|
| DIY abstraction layer “for future migration” | The maintenance burden of the abstraction code itself outweighs the benefit, and the migration never happens — canonical over-engineering |
| Planning migration with vendor-specific managed services (DynamoDB / BigQuery / Cosmos DB) at the core | Data model changes too — not just code rewrite, design redo |
| Big-bang migration switch | No retreat path if something breaks; 3-6 months parallel run is the floor |
| Skipping data-egress estimation | Transfer fees alone can reach thousands of dollars at tens of TB and tens of thousands of dollars at PB scale |
| Starting migration before redoing IAM | Permission models differ per vendor; permission drift causes incidents during migration |
| Picking a vendor “for max features” while ignoring existing-platform fit | A Microsoft-stack company choosing AWS gets torpedoed by integration cost. Existing-asset affinity comes first |
| Underestimating cost and migrating to production | Real billing comes in at multiples of the estimate. Egress, NAT, and EIP hidden costs sting |
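The data-egress row can be turned into a rough estimate before any migration plan is drawn up. The following is a minimal sketch: the flat per-GB rate is a placeholder assumption rather than a quoted price, real pricing is tiered and differs by vendor and region, and the function name is hypothetical.

```python
def estimate_egress_cost_usd(data_tb: float, usd_per_gb: float) -> float:
    """Rough one-time fee for transferring `data_tb` terabytes out of a vendor.

    Assumes a flat per-GB rate and decimal units (1 TB = 1000 GB).
    """
    if data_tb < 0 or usd_per_gb < 0:
        raise ValueError("data_tb and usd_per_gb must be non-negative")
    return round(data_tb * 1000 * usd_per_gb, 2)


# Hypothetical rate for illustration only; check the vendor's current price list.
ASSUMED_USD_PER_GB = 0.09

for size_tb in (10, 100, 1000):  # 1000 TB = 1 PB
    cost = estimate_egress_cost_usd(size_tb, ASSUMED_USD_PER_GB)
    print(f"{size_tb:>5} TB -> ${cost:,.0f}")
```

Under the assumed rate, 10 TB comes to about $900 and 1 PB to about $90,000, so the result is very sensitive to data volume and belongs in the estimate from day one.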
Before kickoff, a migration proposal should pass one test: the migration cost must be lower than roughly three years of lock-in cost. If it is not, staying put is the better call. The AI era helps in some areas (“AI can write code for the new vendor”), but data movement and managed-feature replacement remain heavy.
Lock-in is accepted, not avoided. The portion you can escape with abstraction is smaller than it looks.
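The comparison between migration cost and three years of lock-in cost can be sketched as a simple break-even check. This sketch reads the test as “migrate only when the migration costs less than staying locked in would cost over the horizon.” The parameter `annual_lock_in_premium_usd` (the extra amount paid per year by staying) is a hypothetical input you would have to estimate yourself, and the default `overrun_factor` of 3 borrows the “3x the estimate” warning from this section.

```python
def padded_migration_cost(estimate_usd: float, overrun_factor: float = 3.0) -> float:
    """Inflate the raw migration estimate by the expected overrun."""
    return estimate_usd * overrun_factor


def migration_is_worth_it(
    migration_estimate_usd: float,
    annual_lock_in_premium_usd: float,
    years: int = 3,
    overrun_factor: float = 3.0,
) -> bool:
    """True only if the padded migration cost is below the lock-in cost over `years`."""
    lock_in_cost = annual_lock_in_premium_usd * years
    return padded_migration_cost(migration_estimate_usd, overrun_factor) < lock_in_cost


# Illustrative numbers only.
print(migration_is_worth_it(100_000, 50_000))   # 300k padded vs 150k lock-in
print(migration_is_worth_it(100_000, 150_000))  # 300k padded vs 450k lock-in
```

With the overrun padding applied, many proposals that look attractive on the raw estimate fail the check, which is consistent with the advice to accept lock-in rather than plan around escaping it.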
AI decision axes
With AI-driven development as the assumption, the selection axis pivots to “how well does AI know this vendor?”
| AI-era favorable | AI-era unfavorable |
|---|---|
| AWS (max info, max samples) | Domestic / minor clouds (sparse training data) |
| Heavy Terraform / CDK support | Custom consoles only |
| Mature APIs, rich docs | Closed specs, oral-tradition know-how |
| Multi-region in code | Manual-build legacy designs |
Niche domestic / mid-tier vendors have thin training data. Generated code that doesn’t run or calls non-existent APIs (hallucination) becomes more frequent. The lock-in problem is not solved by AI: “the weight of the first selection hasn’t changed.” The decision axes come down to:
- Affinity with existing assets (Microsoft / Google / none).
- Engineer hiring / training feasibility (AWS is most favorable).
- AI fluency in the vendor as the final differentiator.
- Multi-cloud only with a clear reason.
AI training-data volume differs significantly between vendors
When you have AI write Terraform code, AWS resource generation accuracy is clearly higher than Azure or GCP. This is because the absolute volume of AWS-related code in public GitHub repositories is 3-5x that of Azure or GCP.
Specifically, instructing AI to “write VPC + ALB + ECS Fargate in Terraform” produces nearly-working code for AWS in one shot. The equivalent Azure configuration (Virtual Network + Application Gateway + Container Apps) sees higher frequency of parameter combination mistakes and deprecated API usage.
This gap may narrow in the future, but as of 2026, AWS is overwhelmingly favorable.
The meaning of vendor lock-in changed in the AI era
Traditional lock-in discussion focused on “can we technically migrate to another target.” In the AI era, “can AI accurately write for the migration-target vendor” becomes a new risk axis.
When migrating from AWS to GCP, having AI rewrite Terraform is technically possible. However, GCP-specific IAM design (Workload Identity Federation, etc.) and network design (Shared VPC) best practices have sparse training data, creating risk that post-migration operational quality degrades.
“Things that might get killed”: the fear (industry case)
Google Cloud IoT Core’s wind-down was announced August 2022 and ended August 2023. Companies running it in production scrambled to pick a replacement and rewrite. Google has a history of closing consumer services (Reader, Hangouts, Inbox, Wave), and the doubt “Google shuts things down when they get bored” doesn’t go away — a frequent enterprise complaint.
AWS strongly emphasizes “once shipped, generally doesn’t get killed” as a stance, and the difference shows up in long-term-operations confidence. Even with feature parity, layering in “will this service exist in 10 years?” changes the picture. After IoT Core, “will this still be here in 5 years?” became unavoidable in GCP service selection.
Vendor selection means looking past the feature table to the vendor’s “personality.” Features look similar, but corporate culture and operating posture differ.
Cloud selection is a judgment about “personality,” not “features.” Vendors that kill things vs vendors that don’t.
What you must decide — what’s your project’s answer?
Articulate your project’s answer in 1-2 sentences for each:
- Primary cloud vendor (AWS / Azure / GCP)
- Region (Tokyo / Osaka / overseas)
- Existing-system integration
- Compliance (finance, healthcare, public)
- Whether multi-cloud is on the table
- Domestic-data-sovereignty requirement
- Engineer-supply realism
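One way to keep this checklist honest is to store the answers next to the project and check that none are missing or overlong. The sketch below is an assumption about how that might look: the item labels are shortened from the list above, the function name is hypothetical, and the sentence count is a naive split on end punctuation.

```python
import re

# Shortened labels for the seven decisions listed above.
DECISION_ITEMS = [
    "Primary cloud vendor",
    "Region",
    "Existing-system integration",
    "Compliance",
    "Multi-cloud",
    "Domestic data sovereignty",
    "Engineer supply",
]


def review_answers(answers: dict[str, str], max_sentences: int = 2) -> list[str]:
    """Return a list of problems: unanswered items or answers that run too long."""
    problems = []
    for item in DECISION_ITEMS:
        text = answers.get(item, "").strip()
        if not text:
            problems.append(f"{item}: no answer yet")
            continue
        # Naive sentence count: split on ., ! or ? followed by whitespace or end of text.
        sentences = [s for s in re.split(r"[.!?](?:\s+|$)", text) if s.strip()]
        if len(sentences) > max_sentences:
            problems.append(f"{item}: {len(sentences)} sentences, trim to {max_sentences}")
    return problems


draft = {
    "Primary cloud vendor": "AWS. Most of the team already has experience with it.",
    "Region": "Tokyo as primary.",
}
for problem in review_answers(draft):
    print(problem)
```

An empty result means every item has a short answer. It says nothing about whether the answers are good, which still needs review by people.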
Recording the decision rationale
Cloud vendor selection ripples through infrastructure cost, operations structure, and staffing requirements, so recording the rationale as an ADR (Architecture Decision Record) is essential. Here is a concrete example:
| Item | Content |
|---|---|
| Title | Adopt AWS as the cloud vendor |
| Status | Accepted |
| Context | Selecting a cloud platform for a new product. 6 of 8 engineers have AWS experience. Low latency in the Tokyo region is a mandatory requirement |
| Decision | Adopt AWS as the single vendor, with Tokyo region (ap-northeast-1) as primary |
| Rationale | (1) Team’s deep AWS experience ensures fast ramp-up and operational quality; (2) Tokyo region has the highest number of available services among the big three; (3) High AWS adoption rate among domestic SIers and partners makes external support accessible |
| Rejected alternatives | Azure: no existing Microsoft assets, weak adoption rationale. GCP: BigQuery and data platform are attractive, but no team experience and high learning cost |
| Outcome | IaC via Terraform to leave room for future vendor migration. Osaka region reserved for DR |
Without a documented “why we chose AWS,” vendor-lock-in debates and cost reviews devolve into gut-feel arguments. Having “why we chose this” visible at a glance later is the greatest value of an ADR.
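If ADRs live in the repository, the table above can be kept as structured data and rendered to text. The following is a minimal sketch, assuming a plain Markdown layout. The `ADR` class, its field names, and the heading format are illustrative choices, not a standard template.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    """One architecture decision record; fields mirror the example table."""
    title: str
    status: str
    context: str
    decision: str
    rationale: list[str] = field(default_factory=list)
    rejected_alternatives: dict[str, str] = field(default_factory=dict)
    outcome: str = ""

    def to_markdown(self) -> str:
        """Render the record as a short Markdown document."""
        lines = [
            f"# ADR: {self.title}",
            f"Status: {self.status}",
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Rationale",
        ]
        lines += [f"- {reason}" for reason in self.rationale]
        lines += ["", "## Rejected alternatives"]
        lines += [f"- {name}: {why}" for name, why in self.rejected_alternatives.items()]
        lines += ["", "## Outcome", self.outcome]
        return "\n".join(lines)


adr = ADR(
    title="Adopt AWS as the cloud vendor",
    status="Accepted",
    context="Selecting a cloud platform for a new product. 6 of 8 engineers have AWS experience.",
    decision="Adopt AWS as the single vendor, with Tokyo region (ap-northeast-1) as primary",
    rationale=[
        "Team's deep AWS experience ensures fast ramp-up and operational quality",
        "Tokyo region has the highest number of available services among the big three",
        "High AWS adoption among domestic SIers and partners makes external support accessible",
    ],
    rejected_alternatives={
        "Azure": "no existing Microsoft assets, weak adoption rationale",
        "GCP": "attractive data platform, but no team experience and high learning cost",
    },
    outcome="IaC via Terraform. Osaka region reserved for DR",
)
print(adr.to_markdown())
```

Keeping the rejected alternatives as their own field makes it harder to skip them, and they are the part that later cost reviews tend to ask about.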
Related Articles
- https://en.senkohome.com/arch-intro-system-bcp/
- https://en.senkohome.com/arch-intro-system-network/
- https://en.senkohome.com/arch-intro-system-security/
Summary
This article covered how to choose a cloud vendor — strengths of the big three, recommendations by scale and industry, the lock-in posture, and AI-era judgment.
The era of choosing on features is over; affinity with existing assets and AI’s information density now decide it. New, no constraint -> AWS. Microsoft stack -> Azure. AI / data -> GCP. Lean on one and use it deeply — the realistic answer for 2026.
The next article covers the major decision after vendor: the runtime (VM / container / serverless / Wasm).
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (8/89)