Solution Architecture

PoC Design - PoCs Ending With 'Sort of Worked' Are All Failures

About this article

As the fifth and final installment of the “Solution Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains PoC design.

A PoC is an investment made to produce a decision - a PoC that ends without an answer is a failure. This article covers pre-defined Go/No-Go criteria, period setting (within 3 months), the differences from an MVP, AI-PoC specifics (accuracy, hallucination rate), and weekly PoC cycles - in short, PoC design that doesn’t end with “so what?”

What is a PoC in the first place?

Think of a tasting session. Before officially adding a new dish to the menu, you prepare a small batch to verify the taste, cost, and operations. The purpose is to test small before committing to full investment — “is it really good?” “does it justify the cost?”

PoC (Proof of Concept) is the IT version of a tasting session. Before investing tens of millions to hundreds of millions into serious development, you verify technical and business feasibility with a small prototype and obtain Go/No-Go decision material.

Without a PoC, jumping straight into serious development means the entire investment is wasted the moment technical impossibility is discovered. PoC is the mechanism for keeping failures small.

Why PoC is needed

Lower risk before serious development

Before committing tens to hundreds of millions of yen to serious development, verifying with a budget of a few million yen minimizes the loss on failure.

Reduce uncertainty

New technology, new operations, AI usage - many elements cannot be known without trying them. Getting reliable information through a PoC is the rational move.

Build basis for decisions

When convincing management requires a demonstration, a small prototype that actually runs is more eloquent than anything else.

PoC vs prototype vs MVP

PoC, prototype, and MVP are similar but different. Because their purposes differ, so do their design policies.

| Type | Purpose | Users |
| --- | --- | --- |
| PoC (Proof of Concept) | Verify “feasibility” | Internal stakeholders |
| Prototype | Verify “usability” | Some users |
| MVP (Minimum Viable Product) | Minimum form for market launch | Real users |

A PoC is an internal experiment for a Go/No-Go judgment; an MVP is a product that measures whether value emerges in the market. Confusing them breaks the design.

What to verify in PoC

A PoC does not verify everything - the iron rule is to narrow it down to the most uncertain parts. Define “what, once proven, lets us proceed to serious development.”

| Verification target | Example |
| --- | --- |
| Tech feasibility | Whether it really works with this technology |
| Performance achievability | Whether processing speed meets requirements |
| Business fit | Whether it will actually be used in the field |
| Data quality | Whether expected results emerge from the available data |
| Cost validity | Whether it can be built at the expected cost |
| Vendor capability | Whether candidate vendors can really deliver |

A PoC that verifies something already known is a waste. Choose only the unknown and uncertain parts.

What not to verify in PoC

A PoC should also clarify what is out of scope. Vagueness here makes the PoC bloat until it is indistinguishable from serious development.

| Shouldn’t verify | Reason |
| --- | --- |
| Fine UI design | Handled in the serious phase after the PoC |
| Scalability | Hard to judge at small scale |
| Full production data | Samples are enough |
| Already-verified tech | No point in PoC-ing it |
| All-feature implementation | Scope explosion |

Go/No-Go judgment criteria

The most important element of PoC design is the Go/No-Go judgment criteria. Pre-deciding “if this number is achieved, Go; if not, No-Go” prevents emotional disputes after the PoC ends.

```mermaid
flowchart TB
    START([PoC start<br/>document criteria upfront])
    EXEC[PoC execution<br/>1-3 months]
    M1{Processing time<br/>P95 <= 500ms?}
    M2{Accuracy >= 90%?}
    M3{Cost<br/>within 1.5x of expected?}
    GO[Go: serious-dev approval]
    PIVOT[Pivot: scope change]
    NOGO[No-Go: stop<br/>consider alternatives]
    START --> EXEC --> M1
    M1 -->|Yes| M2
    M1 -->|No| NOGO
    M2 -->|Yes| M3
    M2 -->|partial| PIVOT
    M2 -->|No| NOGO
    M3 -->|Yes| GO
    M3 -->|No| PIVOT
    classDef start fill:#fef3c7,stroke:#d97706;
    classDef step fill:#dbeafe,stroke:#2563eb;
    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px;
    classDef pivot fill:#fae8ff,stroke:#a21caf;
    classDef bad fill:#fee2e2,stroke:#dc2626;
    class START,EXEC start;
    class M1,M2,M3 step;
    class GO good;
    class PIVOT pivot;
    class NOGO bad;
```

Starting a PoC without pre-deciding the judgment criteria is the worst pattern: after it finishes, you end up disputing whether it was a success or a failure.
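The flow above can be expressed as a small, mechanically checkable judgment function. The sketch below is illustrative, not a standard tool: the thresholds mirror the flowchart, and the 80% floor that separates Pivot from No-Go on accuracy is an assumption added to model the “partial” branch.

```python
from dataclasses import dataclass

@dataclass
class PocResult:
    p95_latency_ms: float  # measured P95 processing time
    accuracy: float        # 0.0-1.0
    cost_ratio: float      # actual cost / expected cost

def judge(result: PocResult) -> str:
    """Return 'Go', 'Pivot', or 'No-Go' from pre-agreed numeric criteria."""
    if result.p95_latency_ms > 500:
        return "No-Go"   # performance gate failed outright
    if result.accuracy < 0.80:
        return "No-Go"   # far below the 90% target (assumed floor)
    if result.accuracy < 0.90:
        return "Pivot"   # partial success: rethink the scope
    if result.cost_ratio > 1.5:
        return "Pivot"   # works, but too expensive as designed
    return "Go"

print(judge(PocResult(p95_latency_ms=420, accuracy=0.93, cost_ratio=1.2)))  # → Go
```

The point is not the code itself but that the judgment is a pure function of measured numbers, so the report meeting has nothing left to debate.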

PoC period design

The principle for a PoC is short, with a clear deadline. Keep it within 3 months at most - if it needs longer, the scope is too wide.

| Period | Suited PoC |
| --- | --- |
| 1-2 weeks | Tech selection, vendor evaluation |
| 1 month | Single-feature feasibility |
| 2-3 months | Verification including operations |
| 3+ months | Closer to serious development than a PoC |

The classic PoC trap is continuing forever because no deadline was set. The iron rule is to time-box the work and produce an answer.

PoC team structure

The principle for a PoC is a small team working in a short, concentrated burst. A large organization cannot move quickly, and decisions get delayed.

| Role | Headcount guideline |
| --- | --- |
| Architect (lead) | 1 |
| Engineer | 1-3 |
| Business expert | 1 |
| Project manager | 0.5 |
| External vendor (when needed) | 1-2 |

The ideal is 5 people or fewer, minimizing communication cost. In large-scale PoCs, management cost exceeds the benefit.

Gap between PoC and serious development

A working PoC does not mean serious development will succeed. A PoC only shows that something “can be done”; scale, operations, and maintenance are separate issues.

| Areas a PoC leaves unverified | Content |
| --- | --- |
| Large-scale data | 100x, 1000x scale |
| Concurrent users | Behavior under simultaneous use |
| Production operations | 24/7 ops, incident response |
| Security | Production-level countermeasures |
| Governance | Permissions, audits |
| Other-system integration | Real-environment connections |

PoC success != project success. Designing how to leverage PoC results in serious development is also important.

AI PoC specifics

AI / ML PoCs need verification axes different from conventional ones. Beyond “it runs,” what matters is whether it “produces business value” and whether “accuracy can be maintained continuously.”

| Special verification item | Content |
| --- | --- |
| Data quality | Whether training data is sufficient |
| Accuracy / recall | A level usable in business |
| Hallucination | Wrong-answer rate for LLMs (Large Language Models) |
| Continuous learning | Accuracy degradation over time |
| Explainability | Transparency of judgment reasons |
| Cost | Actual inference costs |

In LLM PoCs, results may significantly exceed expectations on a small sample while accuracy drops at production scale - careful evaluation is needed.
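A minimal evaluation harness for these axes might look like the sketch below. Everything here is hypothetical (the `evaluate` function, the sample fields, the 1,000-calls-per-day cost assumption); in a real PoC the `hallucinated` flag would come from human review or an LLM-as-judge step, not a field set by hand.

```python
# Hypothetical LLM-PoC evaluation sketch: computes accuracy, hallucination
# rate, and an estimated monthly inference cost from labeled sample runs.

def evaluate(samples: list, cost_per_call_jpy: float) -> dict:
    n = len(samples)
    correct = sum(1 for s in samples if s["model_answer"] == s["expected"])
    hallucinated = sum(1 for s in samples if s.get("hallucinated", False))
    return {
        "accuracy": correct / n,
        "hallucination_rate": hallucinated / n,
        # assumed volume: 1,000 calls/day x 30 days
        "est_monthly_cost_jpy": cost_per_call_jpy * 1_000 * 30,
    }

samples = [
    {"model_answer": "A", "expected": "A"},
    {"model_answer": "B", "expected": "A", "hallucinated": True},
    {"model_answer": "C", "expected": "C"},
    {"model_answer": "D", "expected": "D"},
]
print(evaluate(samples, cost_per_call_jpy=2.0))
# → accuracy 0.75, hallucination_rate 0.25, est_monthly_cost_jpy 60000.0
```

Numbers like these map directly onto a Go judgment of the form “accuracy X%+ and monthly inference cost within JPY Y.”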

Choices after PoC

After a PoC, choose one of three options. Ending in No-Go is also a perfectly fine PoC outcome.

| Option | Content |
| --- | --- |
| Go (serious development) | Goal achieved; serious investment starts |
| No-Go (stop) | Hard to realize; consider an alternative approach |
| Pivot (direction change) | Partial success; reconsider with a changed scope |

A culture that does not treat No-Go as a shame is important. A “failed” PoC is a success that prevented failure at serious-investment scale. Some organizations even reward failed PoCs.

PoC outputs

PoC outputs are not just running code - judgment documents are included too. The iron rule is to leave them in a form management can read.

| Output | Content |
| --- | --- |
| Working prototype | Verification code |
| Evaluation report | Measurement results and judgment |
| Go / No-Go recommendation | Recommended next action |
| Risk list | Notes for serious development |
| Estimate (refined) | Recalculated ROI for serious development |
| Demo video | For management |

Working code plus a one-page summary is the most effective combination for management.

Decision criterion 1: uncertainty level

The higher a project’s uncertainty, the higher a PoC’s value. With known technology and experience from similar projects, a PoC is unnecessary.

| Uncertainty | PoC needed? |
| --- | --- |
| Known tech, known operations | Not needed |
| New tech, known operations | Tech PoC recommended |
| Known tech, new operations | Business PoC recommended |
| New tech, new operations | Multiple PoCs required |
| Research element | Exploration PoC + R&D (Research and Development) |

Decision criterion 2: investment scale

The bigger the serious investment, the higher the PoC’s value. A full-scale PoC for a small project is excessive.

| Serious investment | Recommended PoC |
| --- | --- |
| ~JPY 5M | No PoC; go straight to serious development |
| JPY 5-30M | Lightweight PoC (1 month) |
| JPY 30-100M | Full PoC (2-3 months) |
| JPY 100M+ | Multiple PoCs + phased approach |

How to choose by case

New-tech selection / vendor comparison

A 1-2 week tech PoC plus a quantitative comparison table. Run the products of 3 candidate vendors through the same scenario and compare performance, usability, and cost. Dedicate 1-2 engineers, and pre-agree the judgment criteria with clear thresholds for performance numbers and licensing fees.

AI / LLM utilization projects

A 1-4 week AI PoC plus accuracy / cost / hallucination-rate evaluation. Verify with real-data samples drawn from internal data, prototype with Dify / LangChain, and make the Go judgment against “business-usable accuracy of X% or more, and monthly inference cost within JPY Y.” Note the risk of accuracy degrading at production scale.

Operations reform / RPA / workflow

A 2-3 month business PoC with field-user participation. Select 5-10 pilot users from the business departments, have them use it in real operations for 1 month, and measure the reduction in time and mistakes. The judgment criteria are “achieving a monthly reduction of X hours” and the user-satisfaction score.

Large-scale core reform / JPY 100M+

Multiple PoCs in parallel plus phased decision gates. Run a tech PoC, a data PoC, and a business PoC in parallel, hold a Go/No-Go judgment meeting after each, and approve serious development only after all of them pass. Select vendors via PoC as well, to measure actual capability.

PoC scale / period numerical gates

Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.

The iron rule for PoC is short and clear. Below are industry-standard guidelines.

| PoC scale | Period | People | Budget guideline | Judgment criteria |
| --- | --- | --- | --- | --- |
| Tech-selection PoC | 1-2 weeks | 1-2 | ~JPY 1M | Performance numbers + licensing fees |
| Single-feature feasibility PoC | 1 month | 2-3 | JPY 1-5M | Whether it technically runs |
| AI/LLM PoC | 1-4 weeks | 2-3 | JPY 1-5M | Accuracy + cost + hallucination rate |
| Business-included PoC | 2-3 months | 3-5 | JPY 5-20M | Achieving the business-time-reduction goal |
| Pre-serious-investment PoC | Within 3 months | 5 or fewer | 5-10% of serious investment | Multiple criteria achieved simultaneously |

A PoC that runs past 3 months is close to serious development - a sign to review the scope. 5-10% of the serious investment is the guideline for PoC budget: for a JPY 100M project, a JPY 5-10M PoC budget is appropriate. In the AI era, a 1-week PoC has become realistic.

A PoC is within 3 months, with 5 people or fewer, and numeric criteria are required. Miss any of these and you fall into the “PoC hell” of being unable to judge.

PoC-design pitfalls and forbidden moves

Typical accident patterns in PoCs. All of them end in noncommittal reports of “it sort of worked.”

| Forbidden move | Why it’s bad |
| --- | --- |
| Starting a PoC without deciding Go/No-Go criteria | Post-hoc disputes over success vs. failure; report meetings drift |
| No upper bound on the PoC period | It continues forever; “we almost have an answer” extends it by 6 months |
| Trying to verify all features | Scope explosion; becomes indistinguishable from serious development |
| PoC-ing known tech / known operations | Nothing to verify; a waste of resources |
| Running with a large team (10+) | Management cost exceeds the benefit; keep to 5 or fewer |
| Injecting PoC code into production as-is | Quality is not production-ready; a rewrite is required |
| Treating No-Go as a shame | Loses the value of preventing a serious failure; forcing a Go causes firestorms |
| Running 14 AI PoCs in parallel with zero conclusions | Organizational exhaustion; management loses its expectations for AI |
| Reporting PoC results with just running code | Management needs a 1-page summary + demo video to land the message |
| Confusing MVP / prototype / PoC in practice | Design breaks down from purpose mismatch; internal judgment and market launch are different things |

Netflix’s “Test and Learn” culture (hundreds to thousands of A/B tests a year, each with its success condition, failure condition, and period pre-coded, and statistical significance judged automatically) is a success case that systematically eliminates un-judgable PoCs. In contrast, the “14 PoCs in parallel” hell case (each department running its own, no judgment criteria, zero conclusions a year later) shows the cost of vague purposes and judgment criteria.

Go/No-Go criteria are insurance for the PoC and for human relations. Write them on a single sheet of A4 paper and have everyone sign it.

| Misconception | Reality |
| --- | --- |
| “A PoC must not fail” - fearing failure | No-Go is a kind of success; it has the value of preventing a serious failure |
| “Extending the PoC period brings success” - dragging it on | A PoC without an answer won’t produce one with more time; a reset is needed |

AI decision axes

| AI-era favorable | AI-era unfavorable |
| --- | --- |
| 1-week PoC, high-frequency verification | 3-month-fixed PoC plan |
| Multiple-case parallel verification | Verifying only 1 case |
| AI-premised business design | Conventional-business PoC |
| Continuous small PoCs | One-shot large PoC |
  1. Pre-decide Go/No-Go criteria numerically — seal disputes from vague judgment
  2. Narrow to most uncertain parts only — PoCs verifying known tech are waste
  3. Few people, within 3 months — continues forever without deadline
  4. Make weekly via AI utilization — high frequency, multiple-case parallel, fail fast learn fast

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in one or two sentences. Starting work while these are still vague invites later questions like “why did we decide this again?”

  • Verification purpose (what to prove)
  • Judgment criteria (Go/No-Go numbers)
  • Period (usually within 1-3 months)
  • Team structure (few people, clear lead)
  • Verification scope (do, don’t do)
  • Outputs (code, report, demo)
  • Post-PoC path (Go to serious development / stop / pivot)

Author’s note - cases of “PoC hell” wasting a year

Stories of PoCs with vague purposes and judgment criteria dragging on and exhausting organizations are told again and again.

A commonly reported case: a large enterprise, under a policy of “utilize generative AI in operations,” had each department launch its own generative-AI PoC. A year later, 14 PoCs were running in parallel, all of them “sort of worked” with no conclusions, and zero had reached serious deployment. A PoC without judgment criteria becomes “content for report meetings” rather than a success or a failure; field engineers burn out, management loses its expectations for AI, and the organization falls into a vicious circle.

In contrast, Netflix’s “Test and Learn” culture is cited as a PoC-design success case. Netflix runs hundreds to thousands of feature A/B tests a year, but for each test it pre-declares the success condition, failure condition, and period in code, with mechanisms that judge automatically once the results become statistically significant. Go/No-Go is decided mechanically without waiting for human judgment, systematically eliminating “un-judgable PoCs.”

The two cases show the same truth from opposite sides: the value of a PoC lies in producing a decision, and a PoC that produces no decision is nothing but ongoing cost. Go/No-Go conditions are insurance for the PoC and for human relations.
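The mechanism of “criteria declared in code, judged automatically” can be sketched with nothing more than a two-proportion z-test. This is an illustrative reconstruction in the spirit of Test and Learn, not Netflix’s actual tooling; the numbers in `CRITERIA` and the sample counts are made up.

```python
import math

# Success condition and significance level are declared BEFORE the test runs.
CRITERIA = {"min_lift": 0.02, "alpha": 0.05}  # illustrative numbers

def auto_judge(conv_a: int, n_a: int, conv_b: int, n_b: int) -> str:
    """Two-proportion z-test; judges Go/No-Go mechanically from the data."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    lift = p_b - p_a
    if p_value < CRITERIA["alpha"] and lift >= CRITERIA["min_lift"]:
        return "Go"          # significant AND big enough to matter
    if p_value < CRITERIA["alpha"]:
        return "No-Go"       # significant, but not the lift we required
    return "Keep running"    # not yet statistically significant

print(auto_judge(conv_a=500, n_a=10_000, conv_b=720, n_b=10_000))  # → Go
```

Because the thresholds are fixed before any data arrives, there is no room to move the goalposts afterwards - which is exactly the property this article asks of Go/No-Go criteria.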

Summary

This article covered PoC design, including Go/No-Go criteria, period, regime, differences from MVP, AI-PoC specifics, and weekly-cycle-ization.

Pre-decide Go/No-Go, narrow down to the uncertain parts, cap it at 3 months, and run weekly cycles. That is the practical answer for PoC design in 2026.

And this was the final installment of the “Solution Architecture” category. Next time we’ll start a new category (Case Studies), digging into how the judgment axes learned across all the categories so far combine in the field, through comparisons of real cases by scale and phase.

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.