About this article
Thank you for visiting this site. This article is a guide listing all 7 articles in the “Data Architecture” category of the Architecture Crash Course for the Generative-AI Era series.
Data architecture is the layer that determines how data is stored, how it flows, and how it is governed. In the AI era, data is the fuel for AI: without well-organized data, AI has little to work with. This category systematically covers design decisions about data, from datastore selection to data governance.
Article index
1. Overview — Data Readiness as the Prerequisite for the AI Era
Covers OLTP/OLAP separation, the best storage target for each data type, a data-volume × freshness tier table, and how data tidiness sets the ceiling for AI adoption. This article gives the big picture of data architecture, so read it first as the entry point to the category.
2. Datastore Selection — RDB-centric + Purpose-specific Polyglot
Compares the strengths and weaknesses of RDB, KVS, document DB, columnar, time-series, search engine, and vector DB. Learn how to judge the optimal combination of datastores per application using a data-volume × use-case tier table.
3. Data Modeling — Schemas Readable by Both AI and Humans
Covers the three-stage modeling process (conceptual/logical/physical), third normal form and denormalization, UUID v7, index design, and schema change strategy: data modeling practices meant to stay relevant for a decade. Also covers soft-delete and history management patterns from a practical standpoint.
4. Data Platforms — DWH / Data Lake / Lakehouse
Compares the three options of DWH, data lake, and lakehouse, with BI tool integration and a tier table by scale. Learn how to avoid data swamps, where “collecting” becomes the goal in itself, and how to build a data platform that’s actually usable, including catalog operations.
5. ETL / ELT — Fivetran + dbt + DWH Is the Modern Default
Explains the difference between ETL and ELT, the typical Fivetran/dbt/Airflow stack, data quality testing, and lineage. Also covers a tier table by scale and why GUI ETL tools are structurally headed for technical debt in the AI era.
6. Streaming — Question Whether You Really Need It First
Covers how to choose among Kafka, Kinesis, Pub/Sub, Flink, and ksqlDB, plus the basics of exactly-once semantics and window processing. As the title suggests, this article provides selection criteria for avoiding over-investment in real-time processing by balancing freshness requirements against operational cost.
7. Data Governance — Building a Dictionary for AI
Covers data catalogs, metadata, lineage, quality management, stewards, and access control, with a phased roadmap by scale and regulation. Data governance in the AI era amounts to building a dictionary for AI, and this article shows how to put that into practice.
Summary
This article listed all 7 articles in the Data Architecture category of the Architecture Crash Course for the Generative-AI Era series.
Data architecture is the domain whose value is increasing the most in the AI era. If you want to leverage AI, you need to start by getting your data in order, and this category covers the design decisions for doing so systematically.
For the overall series structure and other categories, see the master series index.
I hope you’ll check out the next article as well.