About this article
As the third installment of the “Solution Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains non-functional requirements.
Functional requirements can be written by the business side; non-functional requirements cannot be written without specialists. Vague non-functional requirements lead to post-completion firestorms of "it runs, but it's slow, it stops, and operations are hell." This article covers quantifying performance, availability, security, and operability in numbers; the IPA non-functional requirement grades; and automating non-functional tests in the AI era.
What are non-functional requirements in the first place
In a nutshell, non-functional requirements are “rules that define not ‘what the system does’ but ‘how well it runs.’”
Think of earthquake resistance and insulation specs when building a house. The floor plan (functional requirements) can be decided by the residents, but “can it withstand a magnitude 6 earthquake?” “can it maintain winter room temperature at a certain degree?” — only specialists can design these. And raising the seismic rating after construction is essentially a rebuild. Software is the same: without settling quality standards like “respond within 1 second” or “maintain 99.9% monthly uptime” upfront, you end up with the post-completion firestorm of “runs but slow, stops, ops is hell.”
Why non-functional requirements are needed
Prevent “completed but unusable”
Even with perfect features, a system that takes 10 seconds to respond goes unused. Without agreed numbers, disputes arise at acceptance.
Becomes basis for cost estimates
"99.9% uptime" and "99.99% uptime" can differ in build cost by a factor of five. Estimates only emerge once the numbers are settled.
Alignment with regulatory requirements
Finance, medical, and personal-information systems often have non-functional requirement levels mandated by law; without clarifying them early, the risk of violations emerges.
Main NFR categories
The non-functional requirement grades published by IPA (Information-technology Promotion Agency) are the standard classification in Japan, organizing the field comprehensively into six major categories.
```mermaid
flowchart TB
    NFR([Non-functional requirements])
    AVAIL[Availability<br/>uptime 99.9% etc.]
    PERF[Performance / scalability<br/>response, TPS]
    OPS[Operability / maintainability<br/>monitoring/backup]
    MIG[Migratability<br/>data/env migration]
    SEC[Security<br/>authentication/encryption]
    ENV[System environment<br/>OS/browser premise]
    NFR --> AVAIL
    NFR --> PERF
    NFR --> OPS
    NFR --> MIG
    NFR --> SEC
    NFR --> ENV
    BAD[Making non-functionals vague<br/>= post-completion firestorm]
    BAD -.->|common failure| NFR
    classDef root fill:#fef3c7,stroke:#d97706,stroke-width:2px;
    classDef cat fill:#dbeafe,stroke:#2563eb;
    classDef bad fill:#fee2e2,stroke:#dc2626;
    class NFR root;
    class AVAIL,PERF,OPS,MIG,SEC,ENV cat;
    class BAD bad;
```
| Category | Content |
|---|---|
| Availability | Degree of not stopping |
| Performance / scalability | Speed, scale |
| Operability / maintainability | Ease of operations |
| Migratability | Ease of migration |
| Security | Whether protected |
| System environment | Premise environment, requirements |
IPA’s non-functional-requirement grades are a free-to-use template, widely used in Japanese companies.
Availability
Define the degree to which the system does not stop. This is the same viewpoint as an SLO; numerical targets are required.
| Metric | Content | Typical |
|---|---|---|
| Uptime | What % running monthly | 99.9% (43 min monthly down) |
| RTO | Recovery target time on failure | 1 hour |
| RPO | Allowed data-loss time | 15 min |
| MTBF | Mean time between failures | 30 days |
| MTTR | Mean time to repair | 30 min |
Promising 99.99% is an extremely strict level that allows only 4.3 minutes of downtime per month. Costs spike accordingly, so choose a level that matches the business requirements.
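As a sanity check, the downtime figures quoted above can be derived directly from the availability percentage. A minimal sketch, assuming a 30-day month:

```python
# Convert an availability target into an allowed monthly downtime budget.
# Assumes a 30-day month (43,200 minutes), matching the figures above.

def monthly_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Allowed downtime (minutes per month) for a given availability %."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {monthly_downtime_minutes(target):.1f} min/month")
```

This is where the "43 min vs 4.3 min per month" gap between 99.9% and 99.99% comes from.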
Performance
Define how fast and how much the system can process. For each type of business operation, quantify response time and throughput in concrete numbers.
| Metric | Content | Typical example |
|---|---|---|
| Response time | Per-request processing time | Within 300ms at P95 |
| Throughput | Per-unit-time processing count | 1000 req/sec |
| Concurrent connections | Parallel users | 10,000 |
| Peak multiplier | Peak-time load | 10x normal |
| Latency | Network delay | Under 50ms |
For "response within 3 seconds," clarify whether that means the average or the maximum. The modern practice is to define it at P95 / P99 (the 95th / 99th percentile).
Scalability
Define whether the system can respond to future growth. Design beyond launch-time scale, factoring in growth projections for several years ahead.
| Metric | Content |
|---|---|
| Horizontal | Can add servers to handle |
| Vertical | Can boost CPU / memory to handle |
| Data | DB-capacity growth |
| User | 10x / 100x growth |
| Geographic | Overseas expansion |
Designing all from the start is excessive, but scenarios for phased expansion need consideration.
Operability / maintainability
Define ease of operations. Weakness here makes ops-team load explode, increasing incidents.
| Item | Content |
|---|---|
| Backup | Frequency, retention, restoration test |
| Monitoring | What to monitor at what frequency |
| Log retention | Period, capacity |
| Deploy | Frequency, downtime |
| Documentation | Operational manual setup |
| On-call | 24/7 response regime |
If operations are premised on outsourcing, define them at a level that can actually be outsourced.
Security
Define levels to protect. Levels vary with handled-data sensitivity and law/regulation.
| Item | Content |
|---|---|
| Authentication | MFA (Multi-Factor Authentication) required, password strength |
| Authorization | Permission design, least privilege |
| Encryption | Communication, storage, key management |
| Audit logs | Retention, tamper-prevention |
| Vulnerability response | Patch-application SLA |
| Penetration testing | Frequency, scope |
Always weave in regulatory requirements like Personal Information Protection Act, GDPR, and PCI DSS.
Migratability
Define the ease of migration from existing systems. Many projects break down because migration planning was neglected, so this should not be underestimated.
| Item | Content |
|---|---|
| Data-migration method | Bulk / phased |
| Parallel operation | New-old coexistence period |
| Rollback | Reversion procedure, conditions |
| System-stop time | At cutover |
| User training | Education plan |
| Business-stop impact | Business-department coordination |
Countermeasures against NFR gaps
NFRs contain many easily forgotten items. Use a comprehensive checklist such as IPA's non-functional requirement grades to eliminate gaps.
| Easily-forgotten items | Content |
|---|---|
| Browser-support scope | IE11? Latest Chrome only? |
| Character encoding | UTF-8, emoji support |
| Timezone | UTC, JST, multiple regions |
| Multilingual support | i18n (internationalization), L10n (localization) |
| Accessibility | WCAG (Web Content Accessibility Guidelines) 2.1 compliance |
| Disaster countermeasure | DR (Disaster Recovery), geo-distribution |
| Log retention | Legal requirements |
These are the items that easily turn into post-completion firestorms of "we didn't know we had to support that." Define them from the start.
Relationship with SLA / SLO
NFRs are closely linked to SLAs and SLOs. The SLA is the external contractual promise, the SLO the internal operational target, and the NFR the design-time target value.
| | NFR | SLO | SLA |
|---|---|---|---|
| Phase | At design | At operation | At contract |
| Nature | Target | Internal target | External contract |
| On violation | Design change | Improvement investment | Penalty / reduction |
For NFRs that affect the SLA (availability, performance), the iron rule is to set them stricter than the SLA.
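One common way to make "stricter than the SLA" concrete is to give the internal target only a fraction of the SLA's error budget. A minimal sketch; the 50% margin is an illustrative assumption, not a standard:

```python
# Derive an internal SLO stricter than the contractual SLA by keeping
# only a fraction (`margin`) of the SLA's error budget for internal use.

def stricter_slo(sla_pct: float, margin: float = 0.5) -> float:
    sla_error_budget = 100.0 - sla_pct        # e.g. SLA 99.9% leaves 0.1%
    return 100.0 - sla_error_budget * margin  # internal SLO uses half of it

print(stricter_slo(99.9))    # internal SLO derived from SLA 99.9%
print(stricter_slo(99.95))   # internal SLO derived from SLA 99.95%
```

With a 50% margin, an SLA of 99.9% yields an internal SLO of 99.95%, leaving headroom before a contractual breach triggers penalties.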
Decision criterion 1: system nature
The strictness of NFRs varies with business importance and how widely the system is exposed.
| System nature | Availability guideline |
|---|---|
| Internal tools | 99% |
| General B2C services | 99.9% |
| B2B SaaS | 99.95% |
| Finance / payments | 99.99% |
| Power / telecom | 99.999% |
Decision criterion 2: org regime
The achievability of NFRs depends on the scale of the ops team. Without a 24/7 regime, 99.99% cannot be kept.
| Ops regime | Possible availability |
|---|---|
| Business hours only | 99% |
| Extended hours | 99.5% |
| 24/7 on-call | 99.9% |
| 24/7 SRE (Site Reliability Engineering) dedicated | 99.95% |
| Multi-region / Follow-the-Sun | 99.99%+ |
How to choose by case
In-house tools / business-hours use
Availability 99% + response within 3 seconds + daily backup. Equivalent to IPA non-functional grade "Model System 1." No SLA needed; RTO 24 hours / RPO 1 day is enough. For security, internal ID federation plus TLS covers the minimum.
General B2C web service
Availability 99.9% + P95 500ms + 24/7 on-call + automated backup. IPA "Model 2"; introduce SLO management, PII (Personally Identifiable Information) masking for the Personal Information Protection Act, and an annual pentest. Optimize cost with AWS / GCP managed services.
B2B SaaS / enterprise customers
Availability 99.95% + SLA contract + 7-year audit logs + SOC 2 (a US audit standard for service organizations' security and availability) compliance. Equivalent to IPA "Model 3"; individual SLA agreements per customer, RTO / RPO stated explicitly in the contract, with ISO 27001 certification in view. Include multi-tenant isolation design in the NFRs.
Finance / payments / medical
Availability 99.99%+ + multi-region DR + FISC / PCI DSS / HIPAA compliance. IPA "Model 4"; 24/7 dedicated SRE, annual pentest and quarterly vulnerability scans, encryption with FIPS 140-2-certified HSMs (Hardware Security Modules), tamper-proof audit logs. Here the NFRs are integrated with the regulations themselves.
Service-type x NFR numerical gates
Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.
NFRs are an area where real discussion only starts once numbers are on the table. Below is a correspondence table of industry-standard values.
| Service type | Availability | RTO | RPO | Response time (P95) | Monthly-cost guideline |
|---|---|---|---|---|---|
| Internal tools | 99% | 24 hours | 1 day | 3 sec | Tens of thousands of yen |
| General B2C web | 99.9% | 1 hour | 15 min | 500ms | Hundreds of thousands |
| B2B SaaS | 99.95% | 30 min | 5 min | 300ms | Hundreds of thousands to millions |
| Finance / payments | 99.99% | 5 min | 1 min | 100ms | Millions+ |
| Telecom / power | 99.999% | 30 sec | 10 sec | 50ms | Tens of millions+ |
The empirical rule: 99.9% and 99.99% differ in build cost by several times. Even when the business side asks for a "system that never stops," presenting the numbers often gets the answer "99.9% is enough." IPA's non-functional requirement grades are Japan's standard checklist, comprehensively covering easily forgotten items (timezone, browser support, i18n, WCAG, etc.).
The "don't let it stop" discussion only starts once numbers are presented. In words alone, agreement is never reached.
NFR-design pitfalls and forbidden moves
Typical accident patterns in NFR design. All of them produce systems that run but cannot be used.
| Forbidden move | Why it’s bad |
|---|---|
| Decide NFR later | The Knight Capital incident (phased deployment / auto-rollback missing, $440M loss in 45 min) |
| Vaguely agree availability “as high as possible” | Without numbers, design and quote impossible |
| Apply “somehow 99.99%” to all systems | Several-times cost difference 99.9% vs 99.99%, over-investment |
| Define response time by average | Slow 1% of users invisible, measure with P95 / P99 |
| Promise 99.99% without ops regime | Can’t keep without 24/7 dedicated SRE |
| Add disaster countermeasures (DR) at the end | After-fitting costs 10x, design from start |
| Release with undefined browser-support scope | Pointed out “not supporting IE11” after completion, major revision |
| Forget timezone / character encoding | Fatal bugs in overseas deployment / emoji |
| Ignore WCAG (accessibility) | Risk of violating revised Act for Eliminating Discrimination against Persons with Disabilities, effective April 2024 |
| Don’t test-ize NFR | Just written in design doc, no one verifies, surfaces in production |
In the 2012 Knight Capital incident, the lethal blow was the complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring (details in the appendix "Critical Incident Cases"). The case of a major EC site that released a new UI for the year-end shopping season, suffered 30-second delays and an SNS firestorm, and emergency-rolled back to the old UI four hours later likewise shows the cost of an undefined response-time NFR.
NFR is insurance preventing “running but unusable”. Define numerically from the start.
| Forbidden move | Why it's bad |
|---|---|
| "NFRs can be decided later" (postponing) | Later additions cost 10x; deciding first is the iron rule |
| "99.9% and 99.99% aren't much different" (being casual) | 43 min/month vs 4.3 min/month of downtime and a several-times build-cost gap; choose the level matching business requirements |
AI decision axes
| AI-era favorable | AI-era unfavorable |
|---|---|
| Numerically-quantified NFR | Words like “fast” “safe” |
| Automated load tests | One-time manual tests |
| Automated security scans | Manual review only |
| SLO-based monitoring | Threshold-based |
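The "SLO-based monitoring" row can be made concrete with an error-budget burn rate, the mechanism that replaces static per-metric thresholds. A minimal sketch with illustrative numbers:

```python
# Error-budget burn rate: how fast the current error ratio consumes the
# SLO's error budget. 1.0 means exactly on budget; alerting on a high
# burn rate replaces fixed thresholds like "alert at 500 errors/min".

def burn_rate(error_ratio: float, slo_pct: float) -> float:
    error_budget = 1.0 - slo_pct / 100.0  # e.g. SLO 99.9% -> 0.1% budget
    return error_ratio / error_budget

# An SLO of 99.9% leaves a 0.1% error budget; a 0.5% error ratio
# burns that budget 5x faster than allowed.
print(round(burn_rate(0.005, 99.9), 2))
```

An alert on, say, burn rate > 10 over a short window fires only when the SLO itself is genuinely at risk, which is what distinguishes SLO-based monitoring from a raw threshold.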
- Decide numerically first — postponement is 10x cost, vagueness is fatal
- Comprehensive check via IPA grades — even easily-forgotten items (timezone, browser, etc.) without gaps
- Align with ops regime — can’t keep 99.99% without 24/7, levels matching capability
- Auto-test-ize in CI/CD — continuously verify performance / security of AI-generated code
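The last bullet, turning NFRs into automated tests, can be as simple as a pytest-style gate in CI. In this sketch `measure_p95_ms` is a hypothetical stand-in for a real load-test harness (for example, parsing k6 or Locust output); the budget value comes straight from the NFR sheet:

```python
# A performance NFR expressed as a CI test gate. The measurement
# function below is a placeholder; in a real pipeline it would run
# the load test and return the observed P95 response time.

P95_BUDGET_MS = 300.0  # from the NFR: "within 300ms at P95"

def measure_p95_ms() -> float:
    # Hypothetical stub standing in for an actual load-test run.
    return 240.0

def test_response_time_p95() -> None:
    measured = measure_p95_ms()
    assert measured <= P95_BUDGET_MS, (
        f"P95 {measured}ms exceeds the {P95_BUDGET_MS}ms NFR"
    )

test_response_time_p95()  # in CI, pytest would collect and run this
```

Once the NFR lives in the test suite rather than only in the design doc, every deploy re-verifies it, including AI-generated code.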
What to decide - what is your project’s answer?
For each of the following, try to articulate your project's answer in one or two sentences. Starting work while these remain vague invariably invites later questions like "why did we decide this again?"
- Availability target (99.X%, RTO, RPO)
- Performance target (response time, throughput)
- Scalability (growth scenarios)
- Operational requirements (monitoring, backup, on-call)
- Security level (authentication, encryption, audit)
- Migration plan (parallel operation, rollback)
- Comprehensive check (IPA non-functional-requirement grade, etc.)
Author’s note - cases of “no NFR” producing firestorms
Stories of postponed NFRs ending in firestorms are told again and again in the SI industry.
The 2012 Knight Capital incident is symbolic of what underestimating NFRs (especially deployment safety) leads to: the subsequent investigation concluded that the complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring was the lethal blow (details in the appendix "Critical Incident Cases").
Another often-cited case is the outage on Amazon's first Prime Day. Sloppy estimation of the performance NFR for peak traffic left checkout down for hours, with billion-dollar-class opportunity loss estimated. Since then, Amazon has updated its performance requirements quarterly based on actuals and built in mechanisms that verify them automatically with chaos engineering.
Domestically too, the story is still told of a major EC site that released a new UI for the year-end shopping season: with no response-time NFR defined, responses slipped past 30 seconds at peak, SNS caught fire, and the old UI was emergency-restored four hours later. "Even with complete features, undefined non-functionals make the system unusable": this reality has been hammered home repeatedly in the history of NFR underestimation.
Summary
This article covered non-functional requirements design, including availability, performance, operations, security, IPA grades, SLA/SLO relationship, and AI-era auto-test-ization.
Decide the numbers first, check comprehensively with the IPA grades, align with the ops regime, and turn the NFRs into automated tests. That is the practical answer for NFR design in 2026.
Next time we’ll cover “estimation and ROI.” Plan to dig into the practice of 3-point estimation, buffers, 3-year ROI, break-even points, and how to build numbers to pass approvals.
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (77/89)