About this article
As the third installment of the "Enterprise Architecture" category in the series "Architecture Crash Course for the Generative-AI Era," this article explains Data Architecture (DA) from the Enterprise Architecture (EA) perspective.
While the Data Architecture chapter (the 40 series) dealt with the implementation of individual systems, this article deals with cross-enterprise consistency. For example, "centralizing the customer master" belongs to this article, while "which DB to place it on" is the 40 series's job. This article covers MDM, data governance, company-wide data flow, and the CDO and data steward roles, written for CDOs and heads of data departments.
What is EA-perspective Data Architecture in the first place
Imagine a library's classification system. If each branch organized its books using its own classification scheme, no one could instantly answer "which branch has this book?" Only with a shared classification and search system across all branches can someone at any branch find the book they need.
EA-perspective Data Architecture (DA) is the discipline of systematically organizing the entire enterprise's data assets. While individual-system DB design (the 40 series) handles "that operation's data," EA-DA draws a company-wide map of "which data exists where and how it flows."
Without DA, the same customer data exists in different formats across departments, making company-wide analytics and AI utilization impossible.
Companies where "this month's revenue" differs by department have broken DA
The second EA layer (DA) systematically designs the company's data assets. Its viewpoint differs from the "DB selection / data foundation" handled in the data-architecture chapter: the goal is to organize, at company level, the types, relationships, flows, and owners of the data the whole organization handles.
While individual-system data architecture handles "data for running that operation," EA's data architecture treats all data as the company's strategic asset. It draws, on a single company-wide picture, which system holds which data, where the original lives, and where it flows.
Individual DB design is tactics; EA's DA is strategy. The viewpoint sits one level higher.
Why DA is needed
Integrating siloed data
When departments hold data in different systems, you end up with the same customer registered under 3 IDs. The data needs to be organized from a company-wide viewpoint.
Foundation for data-driven management
To use company-wide data for management decisions, you need a map of what is where. Companies without DA frequently run into "the numbers don't match" problems in management meetings.
A commonly reported scenario: in management meetings, "this month's revenue" from finance and "this month's revenue" from sales differ by nearly 5%, and the debate starts from there every month. When the cause is traced, the treatment of returns, discounts, consumption tax, and accounting timing turns out to differ subtly per department, so everyone's number is correct by their own department's definition. It is a typical example showing that failing to align word definitions company-wide is a more serious problem than the data itself.
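To make the mechanism concrete, here is a minimal Python sketch of how two departments can both be "correct" and still disagree. The order data, the 10% tax rate, and both revenue definitions are hypothetical illustrations, not figures from any real company, and the gap is deliberately exaggerated compared with the roughly 5% mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    net_amount: int   # price before tax (hypothetical figures)
    discount: int     # discount granted on the order
    returned: bool    # whether the order was later returned
    shipped: bool     # whether the order shipped within the month


TAX_RATE_PERCENT = 10  # assumed consumption tax rate, for illustration only

ORDERS = [
    Order(net_amount=10_000, discount=0, returned=False, shipped=True),
    Order(net_amount=20_000, discount=1_000, returned=False, shipped=True),
    Order(net_amount=5_000, discount=0, returned=True, shipped=True),
    Order(net_amount=8_000, discount=0, returned=False, shipped=False),
]


def revenue_sales(orders):
    """Assumed sales definition: counted at order time, tax included,
    discounts and returns ignored."""
    return sum(o.net_amount * (100 + TAX_RATE_PERCENT) // 100 for o in orders)


def revenue_finance(orders):
    """Assumed finance definition: counted at shipping, tax excluded,
    net of discounts, returned orders removed."""
    return sum(
        o.net_amount - o.discount
        for o in orders
        if o.shipped and not o.returned
    )


print(revenue_sales(ORDERS), revenue_finance(ORDERS))
```

Both functions read the same four orders, yet they return different totals because each encodes a different answer to the tax, discount, return, and timing questions. Agreeing on one definition is a business decision, not a query-tuning problem.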
Regulatory / privacy compliance
For compliance with GDPR (General Data Protection Regulation) and the Personal Information Protection Act, a map of where personal information resides is required. Having DA in place is the prerequisite for audit response.
Main DA components
EA's data architecture captures company-wide data from multiple viewpoints. Beyond mere ER diagrams, it includes the strategy, operations, and technology viewpoints.
| Element | Content |
|---|---|
| Conceptual data model | Major company-wide entities |
| Logical data model | Relationship / attribute details |
| Physical data model | Actual DB design |
| Data flow diagram | Inter-system data movement |
| Data catalog | Catalog of all data |
| Master Data Management | Uniqueness of core data |
| Data governance | Management regime / rules |
Conceptual data model
The conceptual data model draws the major entities handled company-wide. It expresses the core "things" of corporate activity, such as customer, product, order, employee, and partner, in about 10-30 items. What matters is a granularity that business departments can understand.
[Customer] -- purchase -- [Product]
| |
| |
+-- delivery to --[Address]-- inventory -- [Warehouse]
The iron rule for conceptual models is to draw them in business language. Write "customer," not "user_account," so the model becomes the common language of business and tech.
Data domains
Data domains are areas that group related data, such as the "customer domain," "product domain," and "finance domain." Splitting by business function and placing a data owner on each is the modern approach.
| Domain | Major data |
|---|---|
| Customer | Customer master, behavior history, segments |
| Product | Product master, categories, prices |
| Transactions | Orders, deliveries, returns |
| Finance | Accounting, budget, actuals |
| HR | Employees, salary, evaluation |
| Partners | Business partners, contracts |
In Data Mesh thinking, each domain holds ownership of and responsibility for its data, and provides high-quality data that is complete within the domain to other domains.
Master Data Management (MDM)
MDM is the mechanism for centrally managing core data company-wide. When master data such as "customer ID," "product code," and "partner code" differs per department, company-wide analysis becomes impossible. MDM creates the single source of truth.
flowchart TB
subgraph BEFORE["Without MDM (typical failure)"]
SYS1[CRM<br/>customer ID=A001] -.-> SYS2[ERP<br/>customer ID=12345]
SYS2 -.-> SYS3[Accounting<br/>customer ID=Tokyo Taro]
SYS3 -.-> Q1[Company-wide analysis<br/>impossible]
end
subgraph AFTER["Coexistence MDM"]
MDM[(MDM<br/>customer master)]
SYS4[CRM] <-->|bidirectional sync| MDM
SYS5[ERP] <-->|bidirectional sync| MDM
SYS6[Accounting] <-->|bidirectional sync| MDM
MDM --> ANALYTICS[Company-wide analysis<br/>possible]
end
classDef bad fill:#fee2e2,stroke:#dc2626;
classDef good fill:#dcfce7,stroke:#16a34a;
classDef mdm fill:#dbeafe,stroke:#2563eb,stroke-width:2px;
class BEFORE,SYS1,SYS2,SYS3,Q1 bad;
class AFTER,SYS4,SYS5,SYS6,ANALYTICS good;
class MDM mdm;
| MDM construction method | Content |
|---|---|
| Registry | Each system's data left as is, only IDs integrated |
| Consolidation | Read-only integrated data |
| Coexistence | Bidirectional sync with each system |
| Centralized | Aggregated to a single master system |
Realistically, Coexistence is chosen most often: it achieves consistency in phases without breaking existing systems.
The reason Coexistence is chosen over the others is clear. Centralized is the ideal, but the migration cost of stopping the existing core systems, CRM, and ERP and consolidating them into a single master is huge, and few companies can complete this without halting a running business. Consolidation is read-only, so updates remain in each system and dual management continues. Registry is the lightweight method that only links IDs, so it cannot resolve attribute-value inconsistencies (the same customer with different addresses, for example). Coexistence keeps updates in existing systems alive while tidying up the master via bidirectional sync. It offers three things together: it does not break existing assets, it holds down initial cost, and it avoids the failure risk of full integration. That fits most real enterprises, which have to introduce MDM in phases.
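The Registry limitation described above can be sketched in a few lines of Python. This is a minimal illustration, not a real MDM product: the local IDs come from the diagram earlier in this section, while the global ID `CUST-0001`, the addresses, and the function names are hypothetical.

```python
# Registry-style MDM: only a cross-reference from (system, local ID) to a global ID.
CROSS_REFERENCE = {
    ("CRM", "A001"): "CUST-0001",
    ("ERP", "12345"): "CUST-0001",
    ("Accounting", "Tokyo Taro"): "CUST-0001",
}

# Each system keeps its own record untouched (attribute values are made up).
SYSTEM_RECORDS = {
    ("CRM", "A001"): {"name": "Tokyo Taro", "address": "1-1 Chiyoda, Tokyo"},
    ("ERP", "12345"): {"name": "Tokyo Taro", "address": "2-3 Minato, Tokyo"},
    ("Accounting", "Tokyo Taro"): {"name": "Tokyo Taro", "address": "1-1 Chiyoda, Tokyo"},
}


def resolve_global_id(system, local_id):
    """Look up the global ID for a system-local ID, or None if unregistered."""
    return CROSS_REFERENCE.get((system, local_id))


def conflicting_attributes(global_id):
    """Return attributes whose values disagree across systems for one global ID."""
    values = {}
    for key, gid in CROSS_REFERENCE.items():
        if gid != global_id:
            continue
        for attr, value in SYSTEM_RECORDS[key].items():
            values.setdefault(attr, set()).add(value)
    return {attr: sorted(vals) for attr, vals in values.items() if len(vals) > 1}


print(resolve_global_id("ERP", "12345"))
print(conflicting_attributes("CUST-0001"))
```

The registry can tell that the three local IDs are the same customer, but it can only report that the addresses disagree. Deciding which address wins and writing it back to each system is exactly the step that Coexistence adds.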
Company-wide data flow diagram
Visualize inter-system data movement at the company-wide level. Drawing "which system receives data from where and sends it where" reveals data dependencies.
[Core system] --orders--> [Inventory mgmt]
| |
| v
+--customer info--> [CRM] --analysis--> [DWH]
| |
v v
[Email delivery] [BI]
With diagrams at this level, "what is the impact range when a system stops?" is visible at a glance. This connects directly to incident response as well.
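Once the flow diagram is held as data, the impact range can be computed rather than eyeballed. The following Python sketch assumes one possible reading of the diagram above as a directed graph (in particular, the edge from inventory management to the DWH is an interpretation); the function name is hypothetical.

```python
from collections import deque

# Downstream edges: system -> systems that receive data from it.
# This edge list is one reading of the diagram, not an authoritative model.
DATA_FLOWS = {
    "Core system": ["Inventory mgmt", "CRM"],
    "Inventory mgmt": ["DWH"],
    "CRM": ["DWH", "Email delivery"],
    "DWH": ["BI"],
    "Email delivery": [],
    "BI": [],
}


def impacted_systems(stopped, flows=DATA_FLOWS):
    """Breadth-first search over downstream edges.

    Returns every system that directly or indirectly receives data
    from `stopped`, in the order the search discovers them.
    """
    seen = set()
    order = []
    queue = deque([stopped])
    while queue:
        current = queue.popleft()
        for downstream in flows.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                order.append(downstream)
                queue.append(downstream)
    return order


print(impacted_systems("CRM"))
```

Keeping the edges in a machine-readable form also means the same data can feed incident runbooks and the data catalog, instead of living only in a drawing.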
Enterprise-level data catalog
This is the data catalog handled in the data-architecture chapter, deployed company-wide at EA level. It centrally manages data metadata, owners, and usage, realizing a "Google Search for data."
| Tool | Characteristics |
|---|---|
| Collibra | Commercial, enterprise |
| DataHub | LinkedIn OSS |
| Alation | AI-equipped, commercial |
| Apache Atlas | Hadoop-system OSS |
| Informatica EDC | Integrated suite |
Integrating departmental catalogs is the EA-level challenge; it requires a mechanism for bringing disparate catalogs together.
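As a rough picture of what a catalog entry carries, the Python sketch below models metadata, ownership, keyword search, and a staleness check. It does not use the API of any tool in the table; the `CatalogEntry` fields, the sample entries, and the 180-day threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CatalogEntry:
    dataset: str
    domain: str
    owner: str            # data owner or steward responsible for the entry
    description: str
    last_reviewed: date   # when the metadata was last confirmed


# Hypothetical entries merged from departmental catalogs.
CATALOG = [
    CatalogEntry("customer_master", "Customer", "crm-team",
                 "Golden customer records", date(2025, 1, 10)),
    CatalogEntry("orders", "Transactions", "sales-ops",
                 "Orders placed by each customer", date(2025, 6, 1)),
    CatalogEntry("product_master", "Product", "merch-team",
                 "Product codes and prices", date(2024, 9, 1)),
]


def search(catalog, keyword):
    """Case-insensitive keyword match on dataset name and description."""
    needle = keyword.lower()
    return [
        e.dataset for e in catalog
        if needle in e.dataset.lower() or needle in e.description.lower()
    ]


def stale_entries(catalog, today, max_age_days=180):
    """Entries whose metadata has not been reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.dataset for e in catalog if e.last_reviewed < cutoff]


print(search(CATALOG, "customer"))
print(stale_entries(CATALOG, today=date(2025, 7, 1)))
```

The staleness check is the part that needs an owner: a catalog only stays useful if someone is responsible for acting on the entries it flags.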
Data-governance regime
This is the organizational regime for managing company-wide data. Beyond technology, the design of roles and authority matters, and establishing a data governance committee is common.
| Role | Responsibility |
|---|---|
| Chief Data Officer (CDO) | Company-wide data strategy |
| Data governance committee | Rules, priorities |
| Data owner | Responsible for a domain |
| Data steward | Daily management |
| Data user | Uses data, obliged to comply with the rules |
Establishing a CDO has been a trend since 2015; it is an essential position at companies that treat data as a management asset.
Data and cloud
Modern EA's DA is designed on the premise of cloud DWH (Data Warehouse), data lake, and lakehouse. The design paradigm has shifted from "aggregating internal DBs" to "an integrated data foundation in the cloud."
| Role | Major tool |
|---|---|
| DWH | Snowflake, BigQuery |
| Data lake | S3, GCS, ADLS |
| Lakehouse | Databricks, BigLake |
| Streaming | Kafka, Kinesis |
| ETL / ELT | Fivetran, dbt |
| Catalog | DataHub, Collibra |
Redrawing EA's DA on the premise of cloud is becoming the work of enterprise architects in the 2020s.
Data security and privacy
EA's DA also includes data-confidentiality classification. Labeling data as "public / internal / confidential / top secret" governs who handles which data and how.
| Class | Target | Handling |
|---|---|---|
| Public | Web pages, IR info | Free |
| Internal | Employee-facing info | In-house only |
| Confidential | Sales plans, contract info | Access restricted |
| Top secret | Personal info, financial secrets | Strong encryption, audit logs |
A personal-info location map (PII Inventory, the catalog of Personally Identifiable Information) is a required output for GDPR compliance, and it cannot be created without EA's DA in place.
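The classification table and the PII Inventory can be expressed together as a small Python sketch. The four classes and their handling rules follow the table above; the dataset list, field names, and helper functions are hypothetical, and a real inventory would be generated from the catalog rather than typed by hand.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    TOP_SECRET = 3


# Handling rule per class, taken from the table above.
HANDLING = {
    Classification.PUBLIC: "Free",
    Classification.INTERNAL: "In-house only",
    Classification.CONFIDENTIAL: "Access restricted",
    Classification.TOP_SECRET: "Strong encryption, audit logs",
}

# Hypothetical dataset register; in practice this comes from the data catalog.
DATASETS = [
    {"system": "Web", "name": "ir_reports",
     "classification": Classification.PUBLIC, "contains_pii": False},
    {"system": "CRM", "name": "customer_master",
     "classification": Classification.TOP_SECRET, "contains_pii": True},
    {"system": "Email delivery", "name": "mailing_list",
     "classification": Classification.INTERNAL, "contains_pii": True},
    {"system": "ERP", "name": "sales_plan",
     "classification": Classification.CONFIDENTIAL, "contains_pii": False},
]


def pii_inventory(datasets):
    """PII Inventory: where personal information lives, as (system, dataset) pairs."""
    return sorted((d["system"], d["name"]) for d in datasets if d["contains_pii"])


def misclassified_pii(datasets):
    """Datasets holding PII that are labeled below TOP_SECRET."""
    return [
        d["name"] for d in datasets
        if d["contains_pii"] and d["classification"] < Classification.TOP_SECRET
    ]


print(pii_inventory(DATASETS))
print(misclassified_pii(DATASETS))
```

The second function shows why the inventory and the classification belong in the same model: a mailing list that contains personal information but carries only an "internal" label is a gap that neither list reveals on its own.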
Decision criterion 1: data-utilization strategy
The more a company uses data as a management asset, the more important EA's DA becomes. For companies that see data only as operational logs, detailed DA is excessive.
| Strategy | Recommended |
|---|---|
| Data as mere records | DA at minimum |
| Decisions via BI (Business Intelligence) | Conceptual model + catalog |
| Auto-judgment via AI | Full DA + governance |
| Data itself is product | CDO + dedicated org |
Decision criterion 2: org scale and complexity
The more complex the organization, the higher the cost of setting up DA, but the return on that investment is also larger. The depth of DA needed differs between single-product small enterprises and diversified large enterprises.
| Org | Recommended |
|---|---|
| Single business | Conceptual model + main DB design |
| Multiple businesses | Domain split + MDM |
| M&A in progress | Master alignment premising integration |
| Global | Per-region / per-regulation design |
How to choose by case
Startup / single business
Conceptual model + BigQuery / Snowflake + dbt. A dedicated CDO is unnecessary; the engineering manager can hold the role concurrently. dbt docs is enough as a data catalog, and master integration can start when it is needed.
Mid-size enterprise / BI-driven management
Domain split + DataHub / Alation + data-steward placement. Split into 3-5 domains and place a steward, who may hold the role concurrently, on each. Use Coexistence MDM for phased integration, and deliver data to decision-makers via BI tools (Tableau / Looker).
Large enterprise / diversified businesses
Establish a CDO + Collibra / Informatica + a dedicated MDM team. Place the data-governance committee directly under management and run master-integration projects for M&A on a standing basis. Manage per-region and per-regulation DA in ArchiMate, and auto-generate the PII Inventory for GDPR and the Personal Information Protection Act.
Companies where data is product (advertising, finance, SaaS)
Data Mesh + semantic layer (dbt semantic layer / Cube.js) + AI Ready design. Domains turn their data into products and provide them to other departments and customers, with AI agents querying autonomously via the semantic layer. Attach freshness and quality SLAs to all data.
Phased MDM-integration practical matrix
MDM breaks down when it aims for "perfect centralization," so phased integration that does not break existing systems is the realistic answer.
| Phase | Period | Coverage | Investment guideline |
|---|---|---|---|
| 1. Current inventory | 1-3 months | Grasp ID systems of major masters (customer, product) | Millions |
| 2. Registry integration | 6-12 months | Make IDs of each system mutually referenceable | Tens of millions |
| 3. Coexistence bidirectional sync | 1-2 years | Bidirectional sync with each system, attribute-value unification | Tens of millions to hundreds of millions |
| 4. Golden Record establishment | 3-5 years | Establish single authoritative data | Hundreds of millions |
| 5. Centralized (ideal) | Long-term | Fully consolidate into single master | Practically impossible at many companies |
The practical lower bound for MDM investment is mid-size enterprises and up. At startups and small SaaS companies, MDM is excessive; PostgreSQL master tables plus common ID-naming conventions are enough. Uber's 2014 "dashboard wars" (the same "weekly rides" metric coexisting in 3-5 versions, with CEO and field numbers diverging) is the typical case showing the necessity of central MDM.
MDM proceeds by phased integration via Coexistence. Aiming for perfect centralization always fails.
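Phase 4 in the matrix above, establishing a Golden Record, can be sketched in Python as a merge with a survivorship rule. The rule used here (the most recently updated non-empty value wins) is only one common choice and is an assumption of this sketch, as are the sample records and the function name.

```python
from datetime import date

# Hypothetical records for one customer, already linked by a shared global ID.
RECORDS = [
    {"system": "CRM", "updated": date(2025, 3, 1),
     "attributes": {"name": "Tokyo Taro", "address": "1-1 Chiyoda, Tokyo", "email": ""}},
    {"system": "ERP", "updated": date(2025, 5, 20),
     "attributes": {"name": "Tokyo Taro", "address": "2-3 Minato, Tokyo", "email": ""}},
    {"system": "Accounting", "updated": date(2024, 12, 1),
     "attributes": {"name": "Taro Tokyo", "address": "", "email": "taro@example.com"}},
]


def build_golden_record(records):
    """Merge per-system records into one golden record.

    Assumed survivorship rule: for each attribute, the most recently
    updated non-empty value wins. Empty values never overwrite.
    """
    golden = {}
    # Apply oldest first so that newer non-empty values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for attr, value in record["attributes"].items():
            if value:
                golden[attr] = value
    return golden


print(build_golden_record(RECORDS))
```

In a Coexistence setup, the merged record would then be synchronized back to each system. Real projects usually need per-attribute rules (for example, trusting accounting for billing addresses), which is part of why this phase takes years rather than months.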
EA-perspective DA pitfalls and forbidden moves
These are typical accident patterns in EA's DA. All of them become causes of "the same customer registered with 3 IDs" and "numbers diverging at management meetings."
| Forbidden move | Why it's bad |
|---|---|
| Not aligning term definitions company-wide | "This month's revenue" diverges 3-8% by department, and debates run on parallel tracks |
| Aiming for Centralized master integration all at once | Migration requires stopping existing core systems, with business-stop risk |
| Installing a data catalog and then abandoning it | With no stewards, metadata is not updated and the catalog rots |
| Splitting data domains by org name | Ownership disappears on reorganization; split by capability units instead |
| Not creating a PII Inventory | GDPR compliance becomes impossible; the same kind of risk as Meta's EUR 1.2B fine |
| Pointing AI directly at the DB without a semantic layer | AI misunderstands "revenue" and mass-produces hallucinations |
| Discussing company-wide data strategy without a CDO | It never reaches the management agenda and stalls in interdepartmental conflict |
| Not operating data classification (public / internal / confidential / top secret) | Personal-info handling stays vague, leading to regulatory violations |
| Introducing MDM by stopping existing systems | Business stops and a major crisis follows; use phased integration via Coexistence |
| Managing metadata in PDF / Excel | AI can't read it, it is not continuously updated, and it becomes outdated |
| Assuming EA's DA is unneeded because DB design exists | Individual DB design and the company-wide viewpoint are different things; domain splitting and master alignment lie outside DB design |
| Assuming that buying a data catalog completes DA | Tools are a means; the regime, rules, and operations are the substance. Installation alone leads to neglect and rot |
| Assuming a CDO is only for large enterprises | Recently mid-size companies also appoint CDOs; the question is whether data strategy is put on the management agenda |
| Assuming master-data integration solves all problems | Integration is itself a hard project; the key to success is proceeding in phases and realistically |
Uber's 2014 dashboard wars are told as a success case in which the in-house Michelangelo (ML platform) and Querybuilder (semantic layer) rooted a culture of "metric definitions agreed via GitHub PRs," converting metric debates into engineering work.
EA-perspective DA is "word definitions before technology." Aligning terms company-wide is the first step.
What to decide - what is your project's answer?
For each of the following, try to articulate your project's answer in 1-2 sentences. Starting work while these are still vague always invites later questions like "why did we decide this again?"
- Conceptual data model (major 10-30 entities)
- Data-domain split (who owns what)
- Master-data strategy (integration method)
- Data catalog (tool, operation)
- Governance regime (CDO, committee)
- Data-classification policy (public / internal / confidential / top secret)
- Cloud DWH strategy (Snowflake / BigQuery etc.)
Author's note - the "numbers don't match" problem that stopped a new project
The real danger of disparate company-wide data definitions surfaces not during incidents but at decision-making time.
A story repeatedly told as a standard talking point: a DX project to "create a company-wide revenue dashboard" started at a mid-size retailer, and aggregating revenue data from the finance, sales, and e-commerce DBs revealed that the 3 numbers diverged by 3-8% every month. The cause was per-system differences in "is revenue counted at order or at shipping?", "with or without consumption tax?", and "when are returns reflected?" Investigation and agreement on definitions took over half a year, and the management-meeting dashboard came online 1.5 years late.
Another famous case is Uber's 2014 "dashboard wars." During rapid growth, Uber built independent data pipelines per team, so the same "weekly rides" metric coexisted in 3-5 versions on internal dashboards, and the CEO's numbers diverged from the field's numbers. Eventually Uber developed Michelangelo (ML platform) and Querybuilder (semantic layer) in-house, switching to a mechanism in which metrics are defined once company-wide and reused. After that, a culture of "metric definitions agreed via GitHub PRs" took root inside Uber, and metric debates were converted into engineering work.
Both cases drive home the decisive value of "aligning word definitions company-wide before technology." At companies without EA's DA, the moment an AI agent is asked "what's this month's revenue?", it returns 3 different answers.
How to choose
The core of EA-perspective DA is the viewpoint of designing not individual DBs but company-wide data as a strategic asset. Siloed data, the same customer registered with 3 IDs, numbers diverging in management meetings: these are not DB technology problems but flaws in the enterprise-level data system. The work of EA-level DA is to split domains and place data owners, secure the uniqueness of core data via MDM, and create a company-wide map via conceptual models, data flows, and catalogs. The realistic approach is phased integration via Coexistence MDM; aiming for perfect centralization breaks down.
Another decisive axis is building a data space that AI agents can understand autonomously. For an AI that hears "revenue" to reach the correct aggregation logic, semantic layers (dbt semantic layer, Cube.js) are required. Only companies where, under Data Mesh, domains provide data products to other departments and to AI, with API-referenceable and continuously updated catalogs in place, can stay competitive in AI usage.
AI-era decision axes
When AI-driven development (vibe coding) and AI usage are the premise, EA's DA is redesigned as a data space that AI agents can access. In an era when AI seeks out data autonomously, whether company-wide data is visible to AI decides competitiveness.
| Favored in the AI era | Disfavored in the AI era |
|---|---|
| Data Mesh (domain ownership) | Centralized silos |
| API-referenceable data | Excel, files |
| Semantic layer (term definitions) | Undefined column names |
| Continuously-updated catalog | Half-year-old snapshots |
As an AI Ready data architecture, setting up semantic layers (dbt semantic layer, Cube.js, etc.) is drawing attention. The design must let an AI that hears "revenue" reach the correct aggregation logic.
AI-era DA is designed in vocabulary AI understands. The semantic layer is key.
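The idea of a semantic layer can be illustrated with a minimal Python sketch: a business term is defined once and compiled into a query, so that every caller, human or AI agent, gets the same aggregation logic. This is not the dbt semantic layer or Cube.js API; the metric registry, synonyms, table and column names, and `compile_metric` are hypothetical.

```python
# One agreed definition per metric, owned by a domain (all names are assumed).
METRICS = {
    "revenue": {
        "owner": "Finance",
        "table": "orders",
        "expression": "SUM(net_amount - discount)",
        "filters": ["shipped = TRUE", "returned = FALSE"],
    },
}

# Business vocabulary that should resolve to the same metric.
SYNONYMS = {"sales": "revenue", "turnover": "revenue"}


def compile_metric(term, month):
    """Resolve a business term to its single agreed SQL definition.

    Illustration only: the month is interpolated as text here, whereas
    production code should pass it as a bound query parameter.
    """
    name = SYNONYMS.get(term.lower(), term.lower())
    if name not in METRICS:
        raise KeyError(f"No agreed definition for metric: {term}")
    metric = METRICS[name]
    conditions = metric["filters"] + [f"order_month = '{month}'"]
    return (
        f"SELECT {metric['expression']} AS {name} "
        f"FROM {metric['table']} WHERE " + " AND ".join(conditions)
    )


print(compile_metric("Sales", "2025-06"))
```

Because the definition lives in one reviewable place, changing what "revenue" means becomes a change to `METRICS` rather than a hunt through every dashboard and prompt. A term with no agreed definition raises an error instead of letting the caller guess.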
Selection priorities
- Domain splitting and data owners - clarify ownership, governance foundation
- MDM via phased Coexistence - perfect centralization fails, don't break existing systems
- Data classification for privacy compliance - public / internal / confidential / top secret, PII Inventory setup
- Semantic layer to hand AI vocabulary - dbt semantic layer / Cube.js, AI Ready design
"Design data in vocabulary AI understands." Domain split + MDM + semantic layer is the core.
Summary
This article covered EA-perspective Data Architecture, including conceptual models, domains, MDM, catalog, PII Inventory, semantic layer, and AI Ready design.
Clarify ownership via domain split and data owners, introduce MDM in phases via Coexistence, classify data for privacy, and hand AI its vocabulary via the semantic layer. That is the practical answer for EA-perspective DA in 2026.
Next time we'll cover Application Architecture (AA) (system portfolio, integration patterns).
I hope you'll read the next article as well.
Series: Architecture Crash Course for the Generative-AI Era (71/89)