Solution Architecture

PoC Design - PoCs Ending With 'Sort of Worked' Are All Failures

About this article

As the fifth and final installment of the “Solution Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains PoC design.

A PoC is an investment made to produce a decision - a PoC that ends without an answer is a failure. This article covers pre-defined Go/No-Go criteria, period setting (within 3 months), the differences from an MVP, AI-PoC specifics (accuracy, hallucination rate), and weekly PoC cycles - in short, PoC design that doesn’t end with “so what?”

What is a PoC in the first place?

Think of a tasting session. Before officially adding a new dish to the menu, you prepare a small batch to verify the taste, cost, and operations. The purpose is to test small before committing to full investment — “is it really good?” “does it justify the cost?”

PoC (Proof of Concept) is the IT version of a tasting session. Before investing tens of millions to hundreds of millions into serious development, you verify technical and business feasibility with a small prototype and obtain Go/No-Go decision material.

Without a PoC, jumping straight into serious development means the entire investment is wasted the moment technical impossibility is discovered. PoC is the mechanism for keeping failures small.

Why PoC is needed

Lower risk before serious development

Before committing tens to hundreds of millions of yen to serious development, verifying with a budget of a few million yen minimizes the loss on failure.

Reduce uncertainty

New technology, new operations, AI usage - many elements cannot be known without trying them. Getting reliable information through a PoC is the rational move.

Build basis for decisions

When convincing management requires a demonstration, a small prototype that actually runs is more eloquent than anything else.

PoC vs prototype vs MVP

PoC, prototype, and MVP are similar but different. Because their purposes differ, so do their design policies.

| Type | Purpose | Users |
| --- | --- | --- |
| PoC (Proof of Concept) | Verify “feasibility” | Internal stakeholders |
| Prototype | Verify “usability” | Some users |
| MVP (Minimum Viable Product) | Minimum form for market launch | Real users |

A PoC is an internal experiment for a Go/No-Go judgment; an MVP is a product that measures whether value emerges in the market. Confusing them breaks the design.

What to verify in PoC

A PoC does not verify everything - the iron rule is to narrow it down to the most uncertain parts. Define “what, once proven, lets us proceed to serious development.”

| Verification target | Example |
| --- | --- |
| Tech feasibility | Whether it really works with this technology |
| Performance achievability | Whether processing speed meets requirements |
| Business fit | Whether it will actually be used in the field |
| Data quality | Whether expected results emerge from the available data |
| Cost validity | Whether it can be built at the expected cost |
| Vendor capability | Whether candidate vendors can really deliver |

A PoC that verifies something already known is a waste. Choose only the unknown and uncertain parts.

What not to verify in PoC

A PoC should also clarify what is out of scope. Vagueness here makes the PoC bloat until it is indistinguishable from serious development.

| Shouldn’t verify | Reason |
| --- | --- |
| Fine UI design | Handled in the serious phase after the PoC |
| Scalability | Hard to judge at small scale |
| Full production data | Samples are enough |
| Already-verified tech | No point in PoC-ing it |
| All-feature implementation | Scope explosion |

Go/No-Go judgment criteria

The most important element of PoC design is the Go/No-Go judgment criteria. Pre-deciding “if this number is achieved, Go; if not, No-Go” prevents emotional disputes after the PoC ends.

```mermaid
flowchart TB
    START([PoC start<br/>document criteria upfront])
    EXEC[PoC execution<br/>1-3 months]
    M1{Processing time<br/>P95 <= 500ms?}
    M2{Accuracy >= 90%?}
    M3{Cost<br/>within 1.5x of expected?}
    GO[Go: serious-dev approval]
    PIVOT[Pivot: scope change]
    NOGO[No-Go: stop<br/>consider alternatives]
    START --> EXEC --> M1
    M1 -->|Yes| M2
    M1 -->|No| NOGO
    M2 -->|Yes| M3
    M2 -->|partial| PIVOT
    M2 -->|No| NOGO
    M3 -->|Yes| GO
    M3 -->|No| PIVOT
    classDef start fill:#fef3c7,stroke:#d97706;
    classDef step fill:#dbeafe,stroke:#2563eb;
    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px;
    classDef pivot fill:#fae8ff,stroke:#a21caf;
    classDef bad fill:#fee2e2,stroke:#dc2626;
    class START,EXEC start;
    class M1,M2,M3 step;
    class GO good;
    class PIVOT pivot;
    class NOGO bad;
```

Starting a PoC without pre-deciding the judgment criteria is the worst pattern: after it finishes, you end up disputing whether it was a success or a failure.
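The flow above can be expressed as a small, mechanically checkable judgment function. The sketch below is illustrative, not a standard tool: the thresholds mirror the flowchart, and the 80% floor that separates Pivot from No-Go on accuracy is an assumption added to model the “partial” branch.

```python
from dataclasses import dataclass

@dataclass
class PocResult:
    p95_latency_ms: float  # measured P95 processing time
    accuracy: float        # 0.0-1.0
    cost_ratio: float      # actual cost / expected cost

def judge(result: PocResult) -> str:
    """Return 'Go', 'Pivot', or 'No-Go' from pre-agreed numeric criteria."""
    if result.p95_latency_ms > 500:
        return "No-Go"   # performance gate failed outright
    if result.accuracy < 0.80:
        return "No-Go"   # far below the 90% target (assumed floor)
    if result.accuracy < 0.90:
        return "Pivot"   # partial success: rethink the scope
    if result.cost_ratio > 1.5:
        return "Pivot"   # works, but too expensive as designed
    return "Go"

print(judge(PocResult(p95_latency_ms=420, accuracy=0.93, cost_ratio=1.2)))  # → Go
```

The point is not the code itself but that the judgment is a pure function of measured numbers, so the report meeting has nothing left to debate.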

PoC period design

The principle for a PoC is short, with a clear deadline. Keep it within 3 months at most - if it needs longer, the scope is too wide.

| Period | Suited PoC |
| --- | --- |
| 1-2 weeks | Tech selection, vendor evaluation |
| 1 month | Single-feature feasibility |
| 2-3 months | Verification including operations |
| 3+ months | Closer to serious development than a PoC |

The classic PoC trap is continuing forever because no deadline was set. The iron rule is to time-box the work and produce an answer.

PoC team structure

The principle for a PoC is a small team working in a short, concentrated burst. A large organization cannot move quickly, and decisions get delayed.

| Role | Headcount guideline |
| --- | --- |
| Architect (lead) | 1 |
| Engineer | 1-3 |
| Business expert | 1 |
| Project manager | 0.5 |
| External vendor (when needed) | 1-2 |

The ideal is 5 people or fewer, minimizing communication cost. In large-scale PoCs, management cost exceeds the benefit.

Gap between PoC and serious development

A working PoC does not mean serious development will succeed. A PoC only shows that something “can be done”; scale, operations, and maintenance are separate issues.

| Areas a PoC leaves unverified | Content |
| --- | --- |
| Large-scale data | 100x, 1000x scale |
| Concurrent users | Behavior under simultaneous use |
| Production operations | 24/7 ops, incident response |
| Security | Production-level countermeasures |
| Governance | Permissions, audits |
| Other-system integration | Real-environment connections |

PoC success != project success. Designing how to leverage PoC results in serious development is also important.

AI PoC specifics

AI / ML PoCs need verification axes different from conventional ones. Beyond “it runs,” what matters is whether it “produces business value” and whether “accuracy can be maintained continuously.”

| Special verification item | Content |
| --- | --- |
| Data quality | Whether training data is sufficient |
| Accuracy / recall | A level usable in business |
| Hallucination | Wrong-answer rate for LLMs (Large Language Models) |
| Continuous learning | Accuracy degradation over time |
| Explainability | Transparency of judgment reasons |
| Cost | Actual inference costs |

In LLM PoCs, results may significantly exceed expectations on a small sample while accuracy drops at production scale - careful evaluation is needed.
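A minimal evaluation harness for these axes might look like the sketch below. Everything here is hypothetical (the `evaluate` function, the sample fields, the 1,000-calls-per-day cost assumption); in a real PoC the `hallucinated` flag would come from human review or an LLM-as-judge step, not a field set by hand.

```python
# Hypothetical LLM-PoC evaluation sketch: computes accuracy, hallucination
# rate, and an estimated monthly inference cost from labeled sample runs.

def evaluate(samples: list, cost_per_call_jpy: float) -> dict:
    n = len(samples)
    correct = sum(1 for s in samples if s["model_answer"] == s["expected"])
    hallucinated = sum(1 for s in samples if s.get("hallucinated", False))
    return {
        "accuracy": correct / n,
        "hallucination_rate": hallucinated / n,
        # assumed volume: 1,000 calls/day x 30 days
        "est_monthly_cost_jpy": cost_per_call_jpy * 1_000 * 30,
    }

samples = [
    {"model_answer": "A", "expected": "A"},
    {"model_answer": "B", "expected": "A", "hallucinated": True},
    {"model_answer": "C", "expected": "C"},
    {"model_answer": "D", "expected": "D"},
]
print(evaluate(samples, cost_per_call_jpy=2.0))
# → accuracy 0.75, hallucination_rate 0.25, est_monthly_cost_jpy 60000.0
```

Numbers like these map directly onto a Go judgment of the form “accuracy X%+ and monthly inference cost within JPY Y.”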

Choices after PoC

After a PoC, choose one of three options. Ending in No-Go is also a perfectly fine PoC outcome.

| Option | Content |
| --- | --- |
| Go (serious development) | Goal achieved; serious investment starts |
| No-Go (stop) | Hard to realize; consider an alternative approach |
| Pivot (direction change) | Partial success; reconsider with a changed scope |

A culture that does not treat No-Go as a shame is important. A “failed” PoC is a success that prevented failure at serious-investment scale. Some organizations even reward failed PoCs.

PoC outputs

PoC outputs are not just running code - judgment documents are included too. The iron rule is to leave them in a form management can read.

| Output | Content |
| --- | --- |
| Working prototype | Verification code |
| Evaluation report | Measurement results and judgment |
| Go / No-Go recommendation | Recommended next action |
| Risk list | Notes for serious development |
| Estimate (refined) | Recalculated ROI for serious development |
| Demo video | For management |

Working code plus a one-page summary is the most effective combination for management.

Decision criterion 1: uncertainty level

The higher a project’s uncertainty, the higher a PoC’s value. With known technology and experience from similar projects, a PoC is unnecessary.

| Uncertainty | PoC needed? |
| --- | --- |
| Known tech, known operations | Not needed |
| New tech, known operations | Tech PoC recommended |
| Known tech, new operations | Business PoC recommended |
| New tech, new operations | Multiple PoCs required |
| Research element | Exploration PoC + R&D (Research and Development) |

Decision criterion 2: investment scale

The bigger the serious investment, the higher the PoC’s value. A full-scale PoC for a small project is excessive.

| Serious investment | Recommended PoC |
| --- | --- |
| ~JPY 5M | No PoC; go straight to serious development |
| JPY 5-30M | Lightweight PoC (1 month) |
| JPY 30-100M | Full PoC (2-3 months) |
| JPY 100M+ | Multiple PoCs + phased approach |

How to choose by case

New-tech selection / vendor comparison

A 1-2 week tech PoC plus a quantitative comparison table. Run the products of 3 candidate vendors through the same scenario and compare performance, usability, and cost. Dedicate 1-2 engineers, and pre-agree the judgment criteria with clear thresholds for performance numbers and licensing fees.

AI / LLM utilization projects

A 1-4 week AI PoC plus accuracy / cost / hallucination-rate evaluation. Verify with real-data samples drawn from internal data, prototype with Dify / LangChain, and make the Go judgment against “business-usable accuracy of X% or more, and monthly inference cost within JPY Y.” Note the risk of accuracy degrading at production scale.

Operations reform / RPA / workflow

A 2-3 month business PoC with field-user participation. Select 5-10 pilot users from the business departments, have them use it in real operations for 1 month, and measure the reduction in time and mistakes. The judgment criteria are “achieving a monthly reduction of X hours” and the user-satisfaction score.

Large-scale core reform / JPY 100M+

Multiple PoCs in parallel plus phased decision gates. Run a tech PoC, a data PoC, and a business PoC in parallel, hold a Go/No-Go judgment meeting after each, and approve serious development only after all of them pass. Select vendors via PoC as well, to measure actual capability.

PoC scale / period numerical gates

Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.

The iron rule for PoC is short and clear. Below are industry-standard guidelines.

| PoC scale | Period | People | Budget guideline | Judgment criteria |
| --- | --- | --- | --- | --- |
| Tech-selection PoC | 1-2 weeks | 1-2 | ~JPY 1M | Performance numbers + licensing fees |
| Single-feature feasibility PoC | 1 month | 2-3 | JPY 1-5M | Whether it technically runs |
| AI/LLM PoC | 1-4 weeks | 2-3 | JPY 1-5M | Accuracy + cost + hallucination rate |
| Business-included PoC | 2-3 months | 3-5 | JPY 5-20M | Achieving the business-time-reduction goal |
| Pre-serious-investment PoC | Within 3 months | 5 or fewer | 5-10% of serious investment | Multiple criteria achieved simultaneously |

A PoC that runs past 3 months is close to serious development - a sign to review the scope. 5-10% of the serious investment is the guideline for PoC budget: for a JPY 100M project, a JPY 5-10M PoC budget is appropriate. In the AI era, a 1-week PoC has become realistic.

A PoC is within 3 months, with 5 people or fewer, and numeric criteria are required. Miss any of these and you fall into the “PoC hell” of being unable to judge.

PoC-design pitfalls and forbidden moves

Typical accident patterns in PoCs. All of them end in noncommittal reports of “it sort of worked.”

| Forbidden move | Why it’s bad |
| --- | --- |
| Starting a PoC without deciding Go/No-Go criteria | Post-hoc disputes over success vs. failure; report meetings drift |
| No upper bound on the PoC period | It continues forever; “we almost have an answer” extends it by 6 months |
| Trying to verify all features | Scope explosion; becomes indistinguishable from serious development |
| PoC-ing known tech / known operations | Nothing to verify; a waste of resources |
| Running with a large team (10+) | Management cost exceeds the benefit; keep to 5 or fewer |
| Injecting PoC code into production as-is | Quality is not production-ready; a rewrite is required |
| Treating No-Go as a shame | Loses the value of preventing a serious failure; forcing a Go causes firestorms |
| Running 14 AI PoCs in parallel with zero conclusions | Organizational exhaustion; management loses its expectations for AI |
| Reporting PoC results with just running code | Management needs a 1-page summary + demo video to land the message |
| Confusing MVP / prototype / PoC in practice | Design breaks down from purpose mismatch; internal judgment and market launch are different things |

Netflix’s “Test and Learn” culture (hundreds to thousands of A/B tests a year, each with its success condition, failure condition, and period pre-coded, and statistical significance judged automatically) is a success case that systematically eliminates un-judgable PoCs. In contrast, the “14 PoCs in parallel” hell case (each department running its own, no judgment criteria, zero conclusions a year later) shows the cost of vague purposes and judgment criteria.

Go/No-Go criteria are insurance for the PoC and for human relations. Write them on a single sheet of A4 paper and have everyone sign it.

| Misconception | Reality |
| --- | --- |
| “A PoC must not fail” - fearing failure | No-Go is a kind of success; it has the value of preventing a serious failure |
| “Extending the PoC period brings success” - dragging it on | A PoC without an answer won’t produce one with more time; a reset is needed |

AI decision axes

| AI-era favorable | AI-era unfavorable |
| --- | --- |
| 1-week PoC, high-frequency verification | 3-month-fixed PoC plan |
| Multiple-case parallel verification | Verifying only 1 case |
| AI-premised business design | Conventional-business PoC |
| Continuous small PoCs | One-shot large PoC |
  1. Pre-decide Go/No-Go criteria numerically — seal disputes from vague judgment
  2. Narrow to most uncertain parts only — PoCs verifying known tech are waste
  3. Few people, within 3 months — continues forever without deadline
  4. Make weekly via AI utilization — high frequency, multiple-case parallel, fail fast learn fast

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in one or two sentences. Starting work while these are still vague invites later questions like “why did we decide this again?”

  • Verification purpose (what to prove)
  • Judgment criteria (Go/No-Go numbers)
  • Period (usually within 1-3 months)
  • Team structure (few people, clear lead)
  • Verification scope (do, don’t do)
  • Outputs (code, report, demo)
  • Post-PoC path (Go to serious development / stop / pivot)

Author’s note - cases of “PoC hell” wasting a year

Stories of PoCs with vague purposes and judgment criteria dragging on and exhausting organizations are told again and again.

A commonly reported case: a large enterprise, under a policy of “utilize generative AI in operations,” had each department launch its own generative-AI PoC. A year later, 14 PoCs were running in parallel, all of them “sort of worked” with no conclusions, and zero had reached serious deployment. A PoC without judgment criteria becomes “content for report meetings” rather than a success or a failure; field engineers burn out, management loses its expectations for AI, and the organization falls into a vicious circle.

In contrast, Netflix’s “Test and Learn” culture is cited as a PoC-design success case. Netflix runs hundreds to thousands of feature A/B tests a year, but for each test it pre-declares the success condition, failure condition, and period in code, with mechanisms that judge automatically once the results become statistically significant. Go/No-Go is decided mechanically without waiting for human judgment, systematically eliminating “un-judgable PoCs.”

The two cases show the same truth from opposite sides: the value of a PoC lies in producing a decision, and a PoC that produces no decision is nothing but ongoing cost. Go/No-Go conditions are insurance for the PoC and for human relations.
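The mechanism of “criteria declared in code, judged automatically” can be sketched with nothing more than a two-proportion z-test. This is an illustrative reconstruction in the spirit of Test and Learn, not Netflix’s actual tooling; the numbers in `CRITERIA` and the sample counts are made up.

```python
import math

# Success condition and significance level are declared BEFORE the test runs.
CRITERIA = {"min_lift": 0.02, "alpha": 0.05}  # illustrative numbers

def auto_judge(conv_a: int, n_a: int, conv_b: int, n_b: int) -> str:
    """Two-proportion z-test; judges Go/No-Go mechanically from the data."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    lift = p_b - p_a
    if p_value < CRITERIA["alpha"] and lift >= CRITERIA["min_lift"]:
        return "Go"          # significant AND big enough to matter
    if p_value < CRITERIA["alpha"]:
        return "No-Go"       # significant, but not the lift we required
    return "Keep running"    # not yet statistically significant

print(auto_judge(conv_a=500, n_a=10_000, conv_b=720, n_b=10_000))  # → Go
```

Because the thresholds are fixed before any data arrives, there is no room to move the goalposts afterwards - which is exactly the property this article asks of Go/No-Go criteria.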

Summary

This article covered PoC design, including Go/No-Go criteria, period, regime, differences from MVP, AI-PoC specifics, and weekly-cycle-ization.

Pre-decide Go/No-Go, narrow down to the uncertain parts, cap it at 3 months, and run weekly cycles. That is the practical answer for PoC design in 2026.

And this was the final installment of the “Solution Architecture” category. Next time we’ll start a new category (Case Studies), digging into how the judgment axes learned across all the categories so far combine in the field, through comparisons of real cases by scale and phase.

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.