[DevOps Architecture] Documentation - Move README, ADR, and OpenAPI into Git

About this article

As the fourteenth installment of the “DevOps Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains documentation.

With documentation, deciding where it lives comes before writing it: documentation placed in the wrong location loses its value no matter how well it is written. This article covers ADRs, READMEs, API docs, and docs-as-code as practices for keeping documentation in a state where what is written gets read, AI can read it, and it has not rotted half a year later.

What is documentation, anyway?

Imagine leaving handover notes for your family: “Trash goes out Monday and Thursday,” “press this button for the water heater.” You are leaving behind the information needed to keep things running without you. Without the notes, nobody knows the correct procedures while you’re away.

Software documentation works the same way. It’s the activity of recording why this design was chosen, how to start the system, and how to use the API so that your future self or new team members won’t be lost six months later.

Without documentation, design rationale vanishes into the heads of people who have left, and onboarding new members degrades into an oral tradition of “ask the person next to you.”

Why documentation matters

The “why” of design vanishes in six months

Reading code tells you “what it does,” but not “why this design was chosen.” Without recording the reasoning, even you yourself will wonder “why did I do this?” six months later.

Don’t let onboarding depend on individuals

In teams without documentation, new hires have no option but to ask the person sitting next to them, which tanks the mentor’s productivity. A well-maintained README dramatically shortens the time to self-sufficiency.

AI now reads documentation to make decisions

When ADRs (Architecture Decision Records) and OpenAPI specs live in Git, AI agents can generate code with the design intent available as context. Documentation now has machine readers, not just human ones.

Distinguish 4 types of documents

What gets called “documentation” actually mixes 4 types that differ in purpose, lifespan, and update frequency. Without distinguishing them, “put everything in Confluence” degrades searchability and updatability at the same time.

flowchart TB
    DOCS([Documentation])
    subgraph GIT["Git repo (docs-as-code)"]
        README[README<br/>entry, startup steps]
        ADR[ADR<br/>design-decision record]
        API[API docs<br/>OpenAPI/TypeDoc]
    end
    subgraph WIKI["Confluence/Notion/Wiki"]
        BIZ[Business knowledge / ops procedures<br/>org knowledge]
    end
    DOCS --> GIT
    DOCS --> WIKI
    GIT -.-|"PR/review/history/<br/>AI can read"| L1[unified with code]
    WIKI -.-|"short-mid lifespan<br/>easily outdated"| L2[business knowledge only]
    classDef root fill:#fef3c7,stroke:#d97706;
    classDef git fill:#dcfce7,stroke:#16a34a,stroke-width:2px;
    classDef wiki fill:#dbeafe,stroke:#2563eb;
    class DOCS root;
    class GIT,README,ADR,API git;
    class WIKI,BIZ wiki;

| Type | Purpose | Lifespan | Place |
| --- | --- | --- | --- |
| README (repo intro / startup) | Entry signposting | Mid (same as repo) | Repo root |
| ADR (design decision record) | Why that choice was made | Permanent (append-only) | Repo docs/adr/ |
| API docs | Machine-readable spec definitions | Code-synced | OpenAPI YAML / TypeDoc |
| Business knowledge / ops procedures | Accumulating org knowledge | Short-mid (easily outdated) | Confluence / Notion / Wiki |

The modern standard is to commit the first 3 types to the Git repo, where they go through PR, review, and history management together with the code, and where AI can read them too. Confluence / Notion is then narrowed to “business knowledge / ops procedures” only, a design that balances searchability and updatability.

ADR - design-decision record

ADR (Architecture Decision Record) is the format Michael Nygard proposed in 2011, recording “why this technology was chosen” in one file per decision. Today it is a de facto standard format for design records across OSS, SaaS, and enterprise projects.

# ADR-0007: Adopt PostgreSQL as main DB

## Status
Accepted (2026-03-15)

## Context
SQLite can't meet our concurrent-connection and read-replica requirements.

## Decision
Adopt PostgreSQL 16. Compared with MySQL and MongoDB, we valued
its combination of JSON types and relational design.

## Consequences
- Gains extensibility for full-text search and geospatial queries
- Specialized horizontal distribution will require separate products

The trick is to keep each ADR short (within 1 page) and append-only: never overwrite a past decision, but set its Status to “Superseded by ADR-XX.” Because the decision history remains, new hires half a year later can reconstruct “why it’s like this” from the ADRs and their Git history.
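The append-only rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the file-name pattern `NNNN-title.md`, the helper names (`next_adr_number`, `new_adr`, `supersede`), and the exact section layout are hypothetical choices modeled on the sample ADR, not a standard tool.

```python
import re
from datetime import date

# Hypothetical naming rule: NNNN-title.md, e.g. 0007-adopt-postgresql.md
ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def next_adr_number(existing_files):
    """Return the next sequential ADR number from existing file names."""
    numbers = [int(m.group(1)) for f in existing_files if (m := ADR_NAME.match(f))]
    return max(numbers, default=0) + 1


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def new_adr(existing_files, title, context, decision, consequences, today=None):
    """Build (file_name, markdown) for a new ADR; existing files are not touched."""
    number = next_adr_number(existing_files)
    day = (today or date.today()).isoformat()
    lines = [
        f"# ADR-{number:04d}: {title}", "",
        "## Status", f"Accepted ({day})", "",
        "## Context", context, "",
        "## Decision", decision, "",
        "## Consequences",
        *[f"- {item}" for item in consequences],
    ]
    return f"{number:04d}-{slugify(title)}.md", "\n".join(lines) + "\n"


def supersede(old_markdown, new_number):
    """Return the old ADR with only its Status line replaced."""
    lines = old_markdown.splitlines()
    status_at = lines.index("## Status")
    lines[status_at + 1] = f"Superseded by ADR-{new_number:04d}"
    return "\n".join(lines) + "\n"
```

The point of the design is that a reversed decision produces two small diffs: one new file, and a one-line Status change in the old file whose Context and Decision stay readable.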

When to write ADR - One-way Door judgment

ADRs are not meant for every design decision. Write too many and they go stale and nobody reads them. In practice, narrow them to One-way Door decisions (decisions that are hard to reverse once made, a term from Amazon’s decision-making framework).

| Decisions to write | Decisions not to write |
| --- | --- |
| DB selection (PostgreSQL vs MongoDB) | Switching from library A to B |
| Adding language / framework | Changing function arg types |
| Auth method decision (OAuth / Passkey) | Changing endpoint names |
| Microservice splitting | Refactoring individual APIs |
| Cloud-vendor selection | Changing EC2 instance size |
| Fundamental data-model change | Adding a column |

The guideline for where to draw the ADR line is “redoing it would take 3+ months.” For anything lighter, a PR description suffices. Conversely, if One-way Door decisions are not left in ADRs, no one can explain “why we did this” 3 years later, which is a common entry point for technical debt in enterprises.

README - the entry signpost

The README is the signpost that a person opening the repo sees first. In many teams, READMEs polarize into “just a project name and a 1-line description” or “everything in one file that no one reads.” The standard is to narrow it to the steps needed to start the project in 5 minutes plus the minimum necessary context.

| Should include | Shouldn’t include |
| --- | --- |
| Project purpose (1 paragraph) | Detailed design explanations (-> ADR / docs/) |
| Startup steps (copy-paste-runnable) | All-features usage |
| Dev-env premises (runtime version requirements etc.) | API reference (-> separate file) |
| License / contact | Personal TODO memos |
| Related links (CONTRIBUTING / docs/) | Incident history |

Most important is “startup by copy-paste.” The README should guarantee that pasting its commands in order works, from git clone to localhost:3000. A README left with old commands or changed dependencies frustrates new hires within their first 30 minutes.
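One way to keep “startup by copy-paste” honest is to have CI run the README's own commands. The sketch below assumes the README marks its startup steps as fenced blocks tagged `bash`, `sh`, or `shell`; the function names and that tagging convention are assumptions for illustration, not an established tool.

```python
import subprocess
from pathlib import Path

FENCE_MARK = "`" * 3  # built this way so the example can sit inside a fenced block
SHELL_LANGS = {"bash", "sh", "shell"}


def extract_commands(readme_text):
    """Collect command lines from shell-tagged fenced blocks, in document order."""
    commands = []
    in_shell_block = False
    in_other_block = False
    for raw in readme_text.splitlines():
        line = raw.strip()
        if line.startswith(FENCE_MARK):
            if in_shell_block or in_other_block:
                # Closing fence of whichever block we were in.
                in_shell_block = in_other_block = False
            else:
                lang = line[len(FENCE_MARK):].strip().lower()
                in_shell_block = lang in SHELL_LANGS
                in_other_block = not in_shell_block
            continue
        if in_shell_block and line and not line.startswith("#"):
            # Drop a leading "$ " prompt so the line is directly runnable.
            commands.append(line[2:] if line.startswith("$ ") else line)
    return commands


def run_readme(path="README.md"):
    """Run each extracted command; raises CalledProcessError on the first failure."""
    for command in extract_commands(Path(path).read_text(encoding="utf-8")):
        subprocess.run(command, shell=True, check=True)
```

Calling `run_readme()` from a CI job on a clean checkout would turn a stale command into a failing build instead of a frustrated new hire. Long-running commands such as a dev server would need separate handling, which this sketch does not attempt.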

docs-as-code - lean toward Markdown + Git

docs-as-code is the approach of managing documentation with the same mechanisms as code (Markdown + Git + PR review). The mainstream trend today is a shift from GUI-centric tools like Confluence and Notion toward in-repo Markdown.

| Viewpoint | docs-as-code (Markdown + Git) | Confluence / Notion |
| --- | --- | --- |
| Version control | Complete via Git history | Limited (edit history only) |
| Review | PR with same workflow as code | Comment feature only |
| Searchability | Instant via grep / IDE | Search-accuracy issues |
| AI-readable | Excellent (standard format) | Marginal (custom API, scraping difficult) |
| Diagrams | Written as code with Mermaid / PlantUML | Embedded images (manual update) |
| Learning cost | Markdown knowledge only | Tool-specific UI |

Confluence / Notion suits business knowledge and org info, but placing documents that are tightly coupled to code there degrades searchability and reviewability. The modern standard is to move API specs, design decisions, and READMEs all into the repo.

Stages for writing documentation - phased practice

The vague question of “when to write docs” can also be formalized. In practice, split timing and granularity by stage, and decide what to leave where at each stage.

| Stage | When to write | What to write | Where to place |
| --- | --- | --- | --- |
| 1. Design consideration | Before implementation | ADR (chosen / alternatives / reasons) | docs/adr/NNNN-title.md |
| 2. PR submission | Just after implementation | PR description (changes / verification) | GitHub PR |
| 3. After merge | As needed | README update / API doc generation | In repo |
| 4. Release | At feature publication | CHANGELOG / release notes | CHANGELOG.md (auto-generated) |
| 5. Incident | Within 24h of recovery | Postmortem | docs/postmortems/ |

A PR description should aim for “the level where the full context is understandable from that change alone.” The standard is to write 3 points in the body: “why this change is needed,” “whether alternatives were considered,” and “verification steps.” Doing just this thoroughly lets people tracing the code with git blame 3 years later reconstruct the context quickly.
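The 3-point rule is easy to check mechanically. A minimal sketch, assuming the team's PR template uses the three `##` headings below (the heading text and the function name are hypothetical, chosen to mirror the three points):

```python
# Hypothetical template headings that mirror the three required points.
REQUIRED_SECTIONS = (
    "## Why",
    "## Alternatives considered",
    "## Verification steps",
)


def missing_sections(pr_body):
    """Return required headings that are absent or have no text under them."""
    sections = {}
    current = None
    for line in pr_body.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return [heading for heading in REQUIRED_SECTIONS if not sections.get(heading)]


example = """## Why
Login fails under load.

## Alternatives considered

## Verification steps
1. Run the load test
"""
print(missing_sections(example))  # ['## Alternatives considered']
```

A CI step could fail the PR when the returned list is non-empty. The check only proves that something was written under each heading, so review still has to judge whether it is useful.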

Documentation pitfalls - rotted docs are misinformation

The biggest documentation trap is that docs are not updated after they are written. Documentation with outdated info is worse than no documentation because it steers readers in the wrong direction.

| Pitfall | Why it happens / why it’s bad |
| --- | --- |
| Old commands remain in README | New hires get stuck on day one. Make commands CI-runnable |
| 3-year-old architecture diagrams in Confluence | Diverges from “running code.” Images rot |
| Distribute API specs as Word | Doesn’t sync with code, becomes a lump of misinformation in half a year |
| Cram all-feature usage into README | Bloated, no one reads or updates |
| “Design docs in Excel” operations | Unsearchable, undiffable, AI-unparseable |
| “More detailed is always better” | Bloated, no one reads, never updated, rots |
| “Confluence is searchable so it’s fine” | Search accuracy is poor, AI-unreadable, inappropriate for code-tightly-coupled info |

The core countermeasure is a system that places docs in the same place as code and forces updates via PR. Auto-generate everything that can be generated from code, such as OpenAPI / TypeDoc output, and minimize the docs that humans update by hand. That is the modern line of defense.
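“Forcing updates via PR” can start as a coarse check on the list of changed files. The sketch below is a heuristic under assumptions: the directory names in `CODE_DIRS` and `DOC_PATHS` are hypothetical, and a real team would likely add an escape hatch for changes that need no doc update.

```python
from pathlib import PurePosixPath

# Hypothetical repo layout; adjust to the actual project.
CODE_DIRS = ("src",)
DOC_PATHS = ("docs", "README.md", "openapi.yaml")


def _under(path, roots):
    """True if the first path component is one of the given roots."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in roots


def needs_doc_update(changed_files):
    """True when code changed in this PR but no documentation path did."""
    touches_code = any(_under(p, CODE_DIRS) for p in changed_files)
    touches_docs = any(_under(p, DOC_PATHS) for p in changed_files)
    return touches_code and not touches_docs
```

In CI, `changed_files` could come from the output of `git diff --name-only` against the base branch, and a `True` result could post a reminder rather than block the merge outright.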

Old documentation is worse than no documentation. The problem is design where the writer doesn’t take update responsibility.

AI decision axes

| AI-favored | AI-disfavored |
| --- | --- |
| Markdown + Git documentation | Confluence, Notion, verbal tradition |
| Culture of leaving “why” in ADRs | “Working code is documentation”-ism |
| Machine-readable specs via OpenAPI / TypeDoc | Spec distribution in Word / Excel |
| Mermaid / PlantUML (text diagrams) | PNG / JPG diagrams (un-updatable) |
| README covering startup steps | Thin README, relying on verbal tradition |

  1. Move documentation into Git repos; keep Confluence for org info only
  2. Always leave One-way Door decisions in ADRs
  3. Sync code and spec via OpenAPI / TypeDoc / Mermaid
  4. Force “why / alternatives / verification steps” via a PR description template

OpenAPI / TypeDoc - auto-generation from code

Hand-written API specs inevitably rot. With OpenAPI (formerly Swagger, a machine-readable spec format for REST APIs) or TypeDoc (which generates docs from TypeScript comments), code and spec can be kept in sync.

| Tool | Target | Characteristics |
| --- | --- | --- |
| OpenAPI (YAML / JSON) | REST API | De facto standard, visualized via Swagger UI |
| GraphQL Schema (SDL) | GraphQL | Types are the spec |
| gRPC + Protocol Buffers | gRPC | .proto is the spec |
| TypeDoc | TypeScript libraries | HTML generated from comments |
| Sphinx + autodoc | Python | HTML generated from docstrings |
| rustdoc | Rust | Built-in standard |

The modern standard is a design where comments and types in the code become the documentation as is. With OpenAPI, you can generate server stubs, client SDKs, a mock server, and HTML docs from one YAML file. Manage the OpenAPI YAML in Git and verify in CI that code and YAML stay in sync. That check is the biggest line of defense against API-doc decay.
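The CI sync check can be as simple as comparing two sets. The sketch below assumes the OpenAPI document has already been parsed into a dict (for example with `json.load`; a YAML file would need a third-party parser) and that the application can list its routes as `(METHOD, path)` pairs. The function names and the sample routes are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def spec_operations(spec):
    """Collect (METHOD, path) pairs from the 'paths' object of an OpenAPI dict."""
    operations = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items also hold non-method keys such as 'parameters'.
            if key.lower() in HTTP_METHODS:
                operations.add((key.upper(), path))
    return operations


def sync_report(spec, implemented_routes):
    """Compare documented operations with the routes the code actually serves."""
    documented = spec_operations(spec)
    implemented = set(implemented_routes)
    return {
        "undocumented": sorted(implemented - documented),
        "unimplemented": sorted(documented - implemented),
    }


spec = {
    "openapi": "3.0.3",
    "paths": {
        "/auth/login": {"post": {}},
        "/users/{id}": {"get": {}, "parameters": []},
    },
}
routes = [("POST", "/auth/login"), ("GET", "/health")]
print(sync_report(spec, routes))
```

Failing the build when either list is non-empty catches both directions of drift: endpoints added without a spec entry, and spec entries that outlived their implementation. It compares only method and path, so request and response schemas would need a deeper check.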

Mermaid / PlantUML - diagrams as code too

Diagrams are a prime example of things that rot easily. Pasting PNGs created in Lucidchart or draw.io into repos frequently leads to accidents where the source files go missing and the diagram can no longer be updated. Mermaid and PlantUML define diagrams as text, and they pair especially well with docs-as-code since they can be written directly in Markdown.

sequenceDiagram
  User->>Frontend: login request
  Frontend->>AuthAPI: POST /auth/login
  AuthAPI->>DB: user authentication
  DB-->>AuthAPI: auth result
  AuthAPI-->>Frontend: JWT issued

GitHub, GitLab, and Notion render Mermaid natively, and VS Code does so with a Markdown preview extension. Being text gives diagrams overwhelming strength on 3 points: diffs are readable, changes are reviewable in PRs, and AI can understand them. The cost gap between “drawing a diagram” and “updating a diagram” shrinks dramatically, which is what keeps diagrams from rotting.
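To make “diffs readable” concrete, the sketch below diffs two versions of the login diagram with Python's standard `difflib`. The added `Redis` step is a hypothetical change invented for the illustration, and `diagram_diff` is an illustrative helper name.

```python
import difflib

BEFORE = """sequenceDiagram
  User->>Frontend: login request
  Frontend->>AuthAPI: POST /auth/login
  AuthAPI->>DB: user authentication
  DB-->>AuthAPI: auth result
  AuthAPI-->>Frontend: JWT issued
"""

# Hypothetical edit: one extra step inserted before the JWT is returned.
AFTER = BEFORE.replace(
    "  AuthAPI-->>Frontend",
    "  AuthAPI->>Redis: store session\n  AuthAPI-->>Frontend",
)


def diagram_diff(before, after):
    """Return only the added and removed lines between two diagram sources."""
    diff = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="", n=0)
    )
    # Skip the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


print(diagram_diff(BEFORE, AFTER))  # ['+  AuthAPI->>Redis: store session']
```

A reviewer sees exactly one added line, which is the same experience a PR gives for Mermaid sources. A regenerated PNG would instead show up as an opaque binary change.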

Postmortems - mechanism for learning from incidents

Postmortems (literally “post-death examination”; in the IT industry, incident review documents) are a core SRE practice. They are written within 24-48 hours of an incident and aim to improve mechanisms, not to blame individuals.

# Postmortem: 2026-03-20 auth incident (45 minutes)

## Impact
All users couldn't log in / impact 14:23-15:08 JST / opportunity loss about 1.2M yen

## Timeline
- 14:23 alert fired
- 14:38 cause identified (Redis connection-pool exhaustion)
- 15:08 normalization confirmed

## Cause (5 Whys analysis)

## Action items
- [ ] Add connection-pool monitoring to Grafana
- [ ] Add this case to load-test scenarios

Blameless (no blame) is the iron rule. Writing “the mechanism could not prevent the mistake” instead of “Person A made a mistake” shifts recurrence prevention from depending on individuals to improving mechanisms. What is written is published to all internal staff so that the learning becomes organizational knowledge, which is the operational style of Google SRE.
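Because a postmortem timeline is plain text, its headline numbers can be derived rather than typed by hand. A small sketch using the three timestamps from the sample timeline; the dictionary keys and the function name are hypothetical, and all times are assumed to fall on the same day.

```python
from datetime import datetime


def minutes_between(start, end):
    """Minutes from start to end, both given as 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times copied from the sample timeline; the key names are hypothetical.
timeline = {"alert_fired": "14:23", "cause_identified": "14:38", "recovered": "15:08"}

time_to_identify = minutes_between(timeline["alert_fired"], timeline["cause_identified"])
time_to_recover = minutes_between(timeline["alert_fired"], timeline["recovered"])
print(time_to_identify, time_to_recover)  # 15 45
```

The 45-minute result matches the duration stated in the sample title, and the 15-minute time to identify the cause is the kind of figure that action items such as better monitoring aim to shrink.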

Documentation antipatterns

Even in a culture that writes documentation, the following antipatterns drastically reduce its effect.

| Antipattern | Why it’s bad |
| --- | --- |
| Throw everything into Confluence | Unsearchable, no PR review, AI-unreadable |
| Only the “responsible person” updates docs | Becomes a bottleneck, instantly outdated when they leave |
| Evaluate only “writing code,” not “writing docs” | A culture where no one writes docs takes hold |
| Cram all-feature usage into 1 file | Bloated, not read |
| Write important info in Slack and share via DM | Unsearchable, new hires can’t access |
| Treat docs as confidential and partially disclose | Reviews don’t work unless everyone sees the same info |

Design decisions shared in Slack DMs effectively do not exist for the org. Slack logs are searchable, but they get buried in the flow of conversation and become undiscoverable, so draw the line: “Slack is for discussion, conclusions go to Git.”

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in 1-2 sentences. Starting work while these are vague always invites later questions like “why did we decide this again?”

  • Documentation placement (repo Markdown / Confluence / Notion)
  • ADR operation (write targets / template / placement)
  • README minimum line (whether startup steps work via copy-paste)
  • API-spec management (OpenAPI / GraphQL SDL / .proto)
  • Diagram management (Mermaid / PlantUML / images)
  • PR description template
  • Postmortem operation (writing timing / publication scope)
  • Doc-update responsibility rule (simultaneous update in change PRs)

Author’s note - “Confluence graveyard” that killed a migration project

There is a case of a mid-size SaaS company where 3 years of architecture decisions, ops procedures, and incident responses had all accumulated in Confluence. The problem was that new hires could not find the info they needed. Searches hit 10-year-old drafts, old design proposals from other teams, and duplicate pages that had been copied and abandoned, and taking hours to find the correct info became normal.

This team migrated from Confluence to in-repo Markdown in phases, ultimately settling on “all code-tightly-coupled info in Git, only org info in Confluence.” Beyond drastically shortening new-hire ramp-up time, AI agents could now read the docs in Git and generate code, which raised development speed as well. We are entering an era where doc placement shapes organizational competitiveness.

Confluence easily becomes the graveyard of information. Move code-tightly-coupled info to Git.

Summary

This article covered documentation, including 4-type distinction, ADR, README, docs-as-code, OpenAPI, Mermaid, postmortems, and AI-era placement.

Move documentation into Git repos, leave One-way Door decisions in ADRs, auto-sync code and spec, and force intent into PR descriptions. That is the practical answer for documentation design in 2026.

Next time we’ll cover ticket / project management (Issue, Kanban, WIP limits).

I hope you’ll read the next article as well.