System Architecture

Choosing a Cloud Vendor — AWS / Azure / GCP

About this article

This article is the third deep dive in the “System Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering how to choose a cloud vendor.

A casual “AWS, I guess” decides hiring, monthly cost, incident response, and regulatory handling for the next 10 years. AWS / Azure / Google Cloud are among the highest-exit-cost selections in software; later migration is essentially a system rebuild. This article covers each vendor’s strengths and weaknesses, world share, major services, and recommendations by scale and industry.

Where do you put the infrastructure?

Cloud vendors rent servers, storage, DBs, and app platforms over the Internet. AWS, Azure, and Google Cloud together hold ~70% of the world cloud market. Which vendor you pick fixes your tech stack, talent strategy, and cost structure for a decade.

Vendor selection has the trait of being hard to undo. Hundreds of services entangle once operations begin, and migration to another vendor becomes nearly impossible. This is vendor lock-in.

The abstraction of "keep it ready to migrate at any time" almost always degenerates into over-engineering. Lock-in is not something to avoid; accepting it and exploiting the chosen vendor deeply is the realistic stance in 2026.

Vendor selection is hard to change once made. Decide carefully, with lock-in as the assumption.

Amazon Web Services (AWS)

AWS launched in March 2006, the cloud-computing pioneer. It has held #1 world share into 2026. Originally Amazon’s internal e-commerce infrastructure released externally — technical reliability and scale are overwhelming.

| Strengths | Weaknesses |
| --- | --- |
| 250+ services — the largest catalog | Too many services; specialty knowledge needed |
| Rich community, info, tooling | Complex pricing makes cost management hard |
| Most regions; resilient to outages | High support fees |
| Easier to hire engineers | Distinctive UI requires getting used to |

Information density, adoption, ease of hiring — all surpass competitors. The lowest-regret pick. “Default new projects to AWS unless specified otherwise” is the rule. The downside: the service catalog itself is a learning wall.

AWS’s strength is “being the de facto standard.” Article counts on books, blogs, Stack Overflow; Terraform module counts on GitHub; new-grad and mid-career hires with AWS experience — all 2-3x the #2 (Azure). In the AI era this gap deepens: AI’s accuracy on AWS code is one tier above other vendors.

Default new builds to AWS. Deviate only when you can write the reason.

Microsoft Azure

Microsoft Azure launched in February 2010, an enterprise-strong cloud. World #2, closing on AWS without slowing.

| Strengths | Weaknesses |
| --- | --- |
| Easy integration with Microsoft products | Some operational rough edges vs AWS / GCP |
| 200+ services, hybrid via Azure Arc | Non-Windows support tends to lag |
| Strong enterprise contracts and identity | New features often slower to roll out than on AWS |
| Strong compliance posture for finance / public sector | Individual-developer UX is mediocre |

For companies on Office 365 / Microsoft 365, Azure is the natural unification target. SSO (Single Sign-On) and permission management anchored in Active Directory (Microsoft's integrated identity platform) work seamlessly — a direct reason many enterprises pick Azure. Industries with strict compliance (finance, healthcare, government) also lean this way.

Azure’s growth post-2023 is also driven by the exclusive partnership with OpenAI. ChatGPT / GPT-4 / GPT-5 family models offered to enterprises via Azure OpenAI Service produced a new market, “Azure for embedding generative AI in operations.” AWS Bedrock and GCP Vertex AI have caught up substantially, but the OpenAI exclusive remains an Azure strength as of 2026.

Microsoft-centric companies and compliance-heavy industries: Azure is the favorite.

Google Cloud (GCP)

Google Cloud (GCP) launched in April 2008, the third of the big three. It began as Google's own infrastructure (YouTube, Gmail, search) opened to outside customers, and it holds clear technical leadership in containers, Kubernetes, and AI/ML.

| Strengths | Weaknesses |
| --- | --- |
| Google-services integration (YouTube / Maps, etc.) | Smallest catalog (~150 services) |
| Strong on Kubernetes / containers | Service deprecations / changes more frequent |
| Excellent AI (BigQuery, Gemini, etc.) | Fewer large-scale enterprise references |
| Sustained-use discounts; pricing trends lower | Engineer hiring is harder |

Kubernetes was originally developed at Google, and GKE (Google Kubernetes Engine) is the industry's most polished managed Kubernetes offering. BigQuery and Gemini sit at a level competitors haven't matched.

On the other hand, GCP carries a historical reputation for “shutting things down.” The Cloud IoT Core wind-down (announced 2022, ended 2023) and a string of consumer-side closures (Reader, Hangouts, Inbox, Wave) left enterprises with the lingering doubt “Google shuts things down when they get bored.” Even on equal feature comparisons, this “personality” difference matters in long-term operations.

For analytics-, AI-, or Kubernetes-centric projects, the top choice. The long-term-reliability concern needs to be evaluated separately.

World share comparison

World cloud market share, 2025 Q3. The big three together hold ~70% — the “three giants” era continues.

```mermaid
pie showData
    title World cloud market share (2025 Q3)
    "AWS" : 29
    "Azure" : 25
    "Google Cloud" : 13
    "Others (Alibaba/Oracle/IBM/etc.)" : 33
```
| Vendor | Share | Trend |
| --- | --- | --- |
| AWS | ~29% | Flat, holding #1 |
| Azure | ~25% | Continually growing, closing on AWS |
| Google Cloud | ~13% | Growing, with presence in AI |

Azure’s catch-up comes from migrating Microsoft 365 customers and OpenAI-driven generative-AI adoption. GCP grows on AI-domain presence but the gap to #2 remains large.

Major services compared

Functions are largely equivalent across vendors despite different names. Each vendor races to match competitor features, so basic system construction is covered everywhere.

Compute / storage

| Category | AWS | Azure | GCP |
| --- | --- | --- | --- |
| Virtual machines | EC2 | Virtual Machines | Compute Engine |
| Container management | ECS / EKS | AKS | GKE |
| Serverless | Lambda | Azure Functions | Cloud Functions |
| Object storage | S3 | Blob Storage | Cloud Storage |

The impression that “AWS has more features” applies to niches. Basics are covered by all three; differentiation lands at “ML -> GCP, identity -> Azure, overall -> AWS.”

DB / CDN / AI

| Category | AWS | Azure | GCP |
| --- | --- | --- | --- |
| Managed DB | RDS | Azure SQL | Cloud SQL |
| NoSQL | DynamoDB | Cosmos DB | Firestore / Bigtable |
| CDN | CloudFront | Azure CDN | Cloud CDN |
| AI / ML | SageMaker / Bedrock | Azure AI / OpenAI Service | Vertex AI / Gemini |

In NoSQL, AWS DynamoDB leads on stability and large-scale references. Azure Cosmos DB supports multi-model (document, graph, key-value, etc.); GCP Firestore is strong for mobile-app integration.
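The name-equivalence in the two tables above can be captured as a plain lookup table. A minimal sketch — the mapping below is transcribed from this article's tables, not an official taxonomy, and the category keys are made up for illustration:

```python
# Cross-vendor service-name equivalence, transcribed from the tables above.
# Illustrative only: category keys are this sketch's own naming.
SERVICE_MAP = {
    "virtual_machines": {"aws": "EC2", "azure": "Virtual Machines", "gcp": "Compute Engine"},
    "containers":       {"aws": "ECS / EKS", "azure": "AKS", "gcp": "GKE"},
    "serverless":       {"aws": "Lambda", "azure": "Azure Functions", "gcp": "Cloud Functions"},
    "object_storage":   {"aws": "S3", "azure": "Blob Storage", "gcp": "Cloud Storage"},
    "managed_db":       {"aws": "RDS", "azure": "Azure SQL", "gcp": "Cloud SQL"},
    "nosql":            {"aws": "DynamoDB", "azure": "Cosmos DB", "gcp": "Firestore / Bigtable"},
    "cdn":              {"aws": "CloudFront", "azure": "Azure CDN", "gcp": "Cloud CDN"},
}

def equivalent(category: str, vendor: str) -> str:
    """Look up a vendor's service name for a functional category."""
    return SERVICE_MAP[category][vendor]

print(equivalent("object_storage", "gcp"))  # Cloud Storage
```

The point of writing it this way is the shape of the data itself: every category has a cell for every vendor, which is exactly the "basics are covered everywhere" claim.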

Basics line up everywhere. The difference is maturity and fit of individual services.

Selection criteria

There’s no “absolute right answer” — every vendor has more than enough features. Choose by fit with your tech assets and constraints. That’s it.

| Situation | Recommended |
| --- | --- |
| No special preference / restriction | AWS (safest) |
| Heavy use of Microsoft products | Azure |
| Strict compliance (finance, public sector) | Azure (rich identity) |
| Need Google-services integration | GCP |
| Analytics-/AI-centric | GCP (BigQuery, Gemini) |
| Kubernetes-centric | GCP (GKE) |

For new startups and individuals, info density, free tiers, and engineer availability make AWS the safest. With clear reasons (AI-forward -> GCP, existing Microsoft stack -> Azure), other choices are worthwhile.
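The decision table above can be written down as a small rule function. This is a sketch of this article's heuristics only, not a universal policy; the flag names are invented for illustration, and the rule order encodes "constraints first, AWS as the default":

```python
def recommend_vendor(
    microsoft_centric: bool = False,    # heavy Microsoft 365 / AD use
    strict_compliance: bool = False,    # finance / public-sector constraints
    google_integration: bool = False,   # needs Google-services integration
    ai_analytics_centric: bool = False, # BigQuery / Gemini workloads
    kubernetes_centric: bool = False,   # GKE-first architecture
) -> str:
    """Map the article's selection criteria to a recommended vendor.

    Rules are checked in order; with no special constraint, AWS is
    the default ("safest") answer, matching the table above.
    """
    if microsoft_centric or strict_compliance:
        return "Azure"
    if google_integration or ai_analytics_centric or kubernetes_centric:
        return "GCP"
    return "AWS"

print(recommend_vendor())  # AWS
```

A real decision has more inputs (hiring, existing contracts, region), but encoding the table this way makes the priority order explicit and reviewable.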

Recommendations by org size × industry

Starting “AWS by default” can backfire for Microsoft-centric companies or BigQuery-fitting domains. The better classifier is the intersection of scale and industry.

| Situation | Org size | Recommended | Reason |
| --- | --- | --- | --- |
| New B2B/B2C web SaaS | Up to 100 | AWS | Info density, managed services, easy hiring |
| Microsoft 365-core large enterprise | 1000+ | Azure | Entra ID / AD integration, Office, Teams |
| Analytics / AI core | Any | GCP | BigQuery, Gemini, Vertex AI |
| Finance / insurance (core systems) | Large | AWS or Azure (FISC-certified) | Compliance certifications |
| Public sector | | Government Cloud-certified vendor | Procurement requirement |
| Kubernetes core | Up to mid-size | GCP (GKE) | GKE maturity as Kubernetes' birthplace |
| Already running on AWS | | Continue on AWS | Migration cost too high to justify |

Judge on both the current state and the five-year outlook. For a Microsoft 365-dependent company, steering away from Azure is a poor fit: the cost of integrating with the existing platform can outweigh even AWS's information-density advantage.

The era of choosing on features is over. “Fit with existing assets” is the strongest axis now.

Domestic clouds in Japan

Beyond the big three, domestic clouds and specialty clouds exist. Demand from data sovereignty (data on Japanese soil) and yen-denominated billing supports the segment.

| Vendor | Trait |
| --- | --- |
| Sakura Internet | Domestic, transparent pricing, used by government statistics |
| IDCFrontier | Domestic, SoftBank-affiliated |
| Oracle Cloud | Generous free tier, Oracle DB compatibility |
| Alibaba Cloud | Strong China-market support |

In public-sector settings, “Government Cloud”-certified vendors (AWS, Azure, GCP, Oracle, Sakura) become eligible — a policy dimension layered onto cloud selection.

Without special requirements, the big three are sufficient. Domestic options come into play for “data must be in Japan” mandates.

Is multi-cloud effective?

Multi-cloud sometimes gets considered for “vendor lock-in avoidance,” but in practice the operational-complexity downside is too large to recommend casually. Effective only when there’s a clear reason:

  • Regulatory requirements forbid certain data on specific vendors.
  • Post-M&A systems scattered across vendors.
  • Want a specific AI feature (GCP Gemini, etc.) for one piece only.
  • BCP requires availability during one vendor’s outage.

Picking multi-cloud “because lock-in is uncomfortable” doubles cost and operational difficulty (specialists for each cloud). Leaning to a primary vendor is overwhelmingly more efficient.

Multi-cloud is only when a clear need exists. Don’t default to it.

Vendor migration / lock-in escape traps

Aiming for “abstraction so you can move anytime” almost certainly fails. The moment migration touches vendor-specific managed services, costs balloon to 3x the estimate.

| Forbidden move | Why |
| --- | --- |
| DIY abstraction layer "for future migration" | Maintaining the abstraction code outweighs the benefit, and the migration never happens — canonical over-engineering |
| Planning migration with vendor-specific managed services (DynamoDB / BigQuery / Cosmos DB) at the core | The data model changes too — not just a code rewrite but a design redo |
| Big-bang migration switch | No retreat path if something breaks; 3-6 months of parallel running is the floor |
| Skipping data-egress estimation | Multi-TB to PB transfer fees alone reach thousands of dollars |
| Starting migration before redoing IAM | Permission models differ per vendor; permission drift causes incidents mid-migration |
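The egress row is easy to sanity-check with arithmetic. A back-of-envelope sketch — the $0.09/GB rate is an assumed ballpark for internet egress, not a quote; real pricing is tiered and vendor-specific:

```python
def egress_cost_usd(terabytes: float, price_per_gb: float = 0.09) -> float:
    """Rough internet-egress cost: TB converted to GB at an assumed
    flat $/GB rate (real cloud pricing is tiered; this is a ballpark)."""
    return terabytes * 1024 * price_per_gb

# 50 TB at the assumed rate is already thousands of dollars,
# before any compute or re-architecture cost.
print(round(egress_cost_usd(50)))  # 4608
```

At PB scale the same formula lands in the hundreds of thousands, which is why skipping this estimate is listed as a forbidden move.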

A migration decision should clear the test "migration cost < 3 years of lock-in cost" before kickoff; if the one-time migration cost exceeds what lock-in would cost over three years, stay put. The AI era helps in some areas ("AI can write code for the new vendor"), but data movement and managed-feature replacement remain heavy.
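That go/no-go test can be made explicit in code. A sketch — both cost figures must come from your own estimates, and the numbers in the usage line are hypothetical:

```python
def migration_justified(migration_cost: float,
                        annual_lockin_cost: float,
                        horizon_years: int = 3) -> bool:
    """Go/no-go test for a vendor migration: proceed only if the
    one-time migration cost is less than the cumulative lock-in
    cost over the horizon (3 years by default)."""
    return migration_cost < annual_lockin_cost * horizon_years

# Hypothetical numbers: a $900k migration vs a $200k/year lock-in
# premium fails the 3-year test (900k > 600k), so stay put.
print(migration_justified(900_000, 200_000))  # False
```

Note that the estimate on the left-hand side is the one that balloons: once vendor-specific managed services are involved, the original article's observation is that it runs to about 3x the initial figure.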

Lock-in is accepted, not avoided. The portion you can escape with abstraction is smaller than it looks.

The AI-era lens

With AI-driven development as the assumption, the selection axis pivots to “how well does AI know this vendor?”

AWS has overwhelming info density, official docs, and sample code; AI’s Terraform / CDK accuracy is highest there. Azure and GCP also have enough info, but niche domestic / mid-tier vendors have thin training data. Generated code that doesn’t run or calls non-existent APIs (hallucination) becomes more frequent.

| AI-era favorable | AI-era unfavorable |
| --- | --- |
| AWS (max info, max samples) | Domestic / minor clouds (sparse training data) |
| Heavy Terraform / CDK support | Custom consoles only |
| Mature APIs, rich docs | Closed specs, oral-tradition know-how |
| Multi-region defined in code | Manually built legacy designs |

The lock-in problem is not solved by AI. Migration to other vendors remains heavy construction. “The weight of the first selection hasn’t changed.” Don’t overrate AI’s help here.

The AI era favors “the vendor AI knows best.” AWS’s lead widens further.

“Things that might get killed” — the fear (industry case)

Google Cloud IoT Core’s wind-down was announced August 2022 and ended August 2023. Companies running it in production scrambled to pick a replacement and rewrite. Google has a history of closing consumer services (Reader, Hangouts, Inbox, Wave), and the doubt “Google shuts things down when they get bored” doesn’t go away — a frequent enterprise complaint.

AWS strongly emphasizes “once shipped, generally doesn’t get killed” as a stance, and the difference shows up in long-term-operations confidence. Even with feature parity, layering in “will this service exist in 10 years?” changes the picture. After IoT Core, “will this still be here in 5 years?” became unavoidable in GCP service selection.

Vendor selection means looking past the feature table to the vendor’s “personality.” Features look similar, but corporate culture and operating posture differ.

Cloud selection is a judgment about “personality,” not “features.” Vendors that kill things vs vendors that don’t.

What you must decide — what’s your project’s answer?

Articulate your project’s answer in 1-2 sentences for each:

  • Primary cloud vendor (AWS / Azure / GCP)
  • Region (Tokyo / Osaka / overseas)
  • Existing-system integration
  • Compliance (finance, healthcare, public)
  • Whether multi-cloud is on the table
  • Domestic-data-sovereignty requirement
  • Engineer-supply realism

Common failure patterns

  • “Pick AWS for max features” at a Microsoft-stack company — Without checking the existing environment, integration cost vaporizes the savings; often Azure was the right answer.
  • Complex multi-cloud from day one — Ops team exhausts itself, becomes unmanageable. Canonical bad fit.
  • Skipping cost projection — Real billing comes in at multiples of the estimate, blowing the budget.
  • Abstracting “for future vendor change” — Over-engineering tanks development speed; the migration never happens.
  • Region in the US / Europe, violating regulation — In industries that mandate domestic storage of personal data.

How to make the final call

Vendor selection is decided more by “balance of existing assets and future lock-in” than by technical correctness. All three cover the basics; the era of feature-based selection is over.

What’s effective now: affinity with existing platforms (Microsoft 365 -> Azure, Google Workspace -> GCP) and the depth of internal hiring / training capacity. Two axes.

Lock-in is healthier accepted as a premise than avoided. “Anytime-migrate abstraction” tends to over-engineer, and migration doesn’t happen anyway. Leaning on one improves both operations and AI compatibility.

From the AI-driven-development angle, AWS's information-density lead yields the best one-shot code-generation accuracy, so default to AWS unless there is a specific reason not to.

Selection priority:

  1. Affinity with existing assets (Microsoft / Google / none).
  2. Engineer hiring / training feasibility (AWS is most favorable).
  3. AI fluency in the vendor as the final differentiator.
  4. Multi-cloud only with a clear reason.

“The courage to lean on one” reconciles ops cost and AI productivity. Keep abstractions minimal.

Summary

This article covered how to choose a cloud vendor — strengths of the big three, recommendations by scale and industry, the lock-in posture, and AI-era judgment.

The era of choosing on features is over; affinity with existing assets and AI’s information density now decide it. New, no constraint -> AWS. Microsoft stack -> Azure. AI / data -> GCP. Lean on one and use it deeply — the realistic answer for 2026.

The next article covers the major decision after vendor: the runtime (VM / container / serverless / Wasm).

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.