Solution Architecture

Non-Functional Requirements - 'Don't Stop' Has No Price Tag

About this article

As the third installment of the “Solution Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains non-functional requirements.

Functional requirements can be written by the business side; non-functional requirements cannot be written without specialists. Vague non-functional requirements lead to post-completion firestorms of “it runs, but it’s slow, it stops, and operations are hell.” This article covers quantifying performance, availability, security, and operability with numbers; the IPA non-functional requirement grades; and automating non-functional tests in the AI era.

What are non-functional requirements in the first place

In a nutshell, non-functional requirements are “rules that define not ‘what the system does’ but ‘how well it runs.’”

Think of earthquake resistance and insulation specs when building a house. The floor plan (the functional requirements) can be decided by the residents, but “can it withstand a magnitude-6 earthquake?” and “can it keep rooms at a set temperature in winter?” can only be designed by specialists. And raising the seismic rating after construction is essentially a rebuild. Software is the same: without settling quality standards like “respond within 1 second” or “maintain 99.9% monthly uptime” up front, you end up in the post-completion firestorm of “it runs, but it’s slow, it stops, and operations are hell.”

Why non-functional requirements are needed

Prevent “completed but unusable”

Even with perfect features, a system that takes 10 seconds to respond will not be used. Unless the targets are settled numerically, disputes arise at acceptance.

Becomes basis for cost estimates

“99.9% uptime” and “99.99% uptime” sometimes differ in build cost by a factor of five. A reliable estimate only emerges once the numbers are settled.

Alignment with regulatory requirements

In finance, healthcare, and personal-data domains, non-functional requirement levels are often dictated by law; without early clarification, you risk violating regulations.

Main NFR categories

The non-functional requirement grades published by IPA (Information-technology Promotion Agency) are the standard classification in Japan, organizing requirements comprehensively into six major categories.

```mermaid
flowchart TB
    NFR([Non-functional requirements])
    AVAIL[Availability<br/>uptime 99.9% etc.]
    PERF[Performance / scalability<br/>response, TPS]
    OPS[Operability / maintainability<br/>monitoring/backup]
    MIG[Migratability<br/>data/env migration]
    SEC[Security<br/>authentication/encryption]
    ENV[System environment<br/>OS/browser premise]
    NFR --> AVAIL
    NFR --> PERF
    NFR --> OPS
    NFR --> MIG
    NFR --> SEC
    NFR --> ENV
    BAD[Make non-functional vague<br/>= post-completion firestorm]
    BAD -.|common failure| NFR
    classDef root fill:#fef3c7,stroke:#d97706,stroke-width:2px;
    classDef cat fill:#dbeafe,stroke:#2563eb;
    classDef bad fill:#fee2e2,stroke:#dc2626;
    class NFR root;
    class AVAIL,PERF,OPS,MIG,SEC,ENV cat;
    class BAD bad;
```
| Category | Content |
| --- | --- |
| Availability | Degree of not stopping |
| Performance / scalability | Speed and scale |
| Operability / maintainability | Ease of operations |
| Migratability | Ease of migration |
| Security | Whether data is protected |
| System environment | Assumed environment and requirements |

IPA’s non-functional-requirement grades are a free-to-use template, widely used in Japanese companies.

Availability

Define how much the system is allowed to stop. The viewpoint is the same as for an SLO: quantification in numbers is required.

| Metric | Content | Typical value |
| --- | --- | --- |
| Uptime | Percentage of the month the system is running | 99.9% (43 min down per month) |
| RTO | Target time to recover from a failure | 1 hour |
| RPO | Allowed window of data loss | 15 min |
| MTBF | Mean time between failures | 30 days |
| MTTR | Mean time to repair | 30 min |

Promising 99.99% means an extremely strict level that allows only 4.3 minutes of downtime per month. Cost spikes accordingly; choose a level that matches the business requirements.
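The monthly-downtime figures above follow directly from the percentage. A minimal sketch of the arithmetic (the helper name and the 30-day month are illustrative assumptions, not from the article):

```python
# Convert an availability target into the monthly downtime it allows.
# Assumes a 30-day month (43,200 minutes); helper name is illustrative.

def allowed_downtime_minutes(availability_pct: float,
                             month_minutes: int = 30 * 24 * 60) -> float:
    """Minutes of downtime per month permitted by an availability target."""
    return month_minutes * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.95, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

Running this reproduces the article’s figures: 99.9% allows roughly 43 minutes per month, 99.99% only about 4.3.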

Performance

Define how fast and how much the system can process. Quantify response time and throughput numerically, according to the nature of the business.

| Metric | Content | Typical example |
| --- | --- | --- |
| Response time | Processing time per request | Within 300 ms at P95 |
| Throughput | Requests processed per unit time | 1,000 req/sec |
| Concurrent connections | Number of parallel users | 10,000 |
| Peak multiplier | Load at peak times | 10x normal |
| Latency | Network delay | Under 50 ms |

For a requirement like “respond within 3 seconds,” clarify whether that means the average or the maximum. The modern approach is to define it at P95 / P99 (the 95th / 99th percentile).
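To see why percentiles beat averages, a nearest-rank percentile over illustrative latency samples shows a slow tail that the mean nearly hides (the function and numbers are assumptions, not from the article):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# 100 requests: 90 at 100 ms, 10 slow ones at 2000 ms.
latencies_ms = [100.0] * 90 + [2000.0] * 10
mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean} ms")                         # 290.0 - looks acceptable
print(f"p95={percentile(latencies_ms, 95)} ms")  # 2000.0 - exposes the slow tail
```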

Scalability

Define whether the system can absorb future growth. Design not only for launch-time scale but with growth projections several years out.

| Metric | Content |
| --- | --- |
| Horizontal | Can add servers to handle load |
| Vertical | Can boost CPU / memory to handle load |
| Data | Growth in DB capacity |
| User | 10x / 100x user growth |
| Geographic | Overseas expansion |

Designing for all of these from the start is excessive, but scenarios for phased expansion need to be considered.
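One way to sanity-check a horizontal-scaling scenario is simple capacity arithmetic. A sketch under assumed numbers (500 req/sec per server and a 70% target utilization are illustrative, not from the article):

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float,
                   headroom: float = 0.7) -> int:
    """Servers required at peak, running each at `headroom` of its rated capacity."""
    return math.ceil(peak_rps / (per_server_rps * headroom))

# Growth scenarios from the table: 1x, 10x, 100x current load of 1000 req/sec.
for growth in (1, 10, 100):
    peak_rps = 1000 * growth
    print(f"{growth:>3}x growth -> {servers_needed(peak_rps, 500)} servers")
```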

Operability / maintainability

Define how easy the system is to operate. Weakness here makes the ops team’s load explode and incidents multiply.

| Item | Content |
| --- | --- |
| Backup | Frequency, retention period, restoration tests |
| Monitoring | What to monitor, at what frequency |
| Log retention | Period and capacity |
| Deploy | Frequency and downtime |
| Documentation | Maintaining operational manuals |
| On-call | 24/7 response regime |

If the premise is outsourced operations, define the requirements at a level that can actually be handed to an external vendor.

Security

Define the level of protection required. The level varies with the sensitivity of the data handled and with laws and regulations.

| Item | Content |
| --- | --- |
| Authentication | MFA (Multi-Factor Authentication) requirement, password strength |
| Authorization | Permission design, least privilege |
| Encryption | In transit, at rest, key management |
| Audit logs | Retention, tamper prevention |
| Vulnerability response | SLA for patch application |
| Penetration testing | Frequency and scope |

Always weave in regulatory requirements such as the Personal Information Protection Act, GDPR, and PCI DSS.
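One common realization of the “tamper prevention” row for audit logs is hash-chaining: each entry commits to the hash of the previous one, so editing any past entry breaks verification. A minimal sketch of the idea (helper names and the JSON/SHA-256 scheme are illustrative, not a production design):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(log: list[dict], event: dict) -> None:
    """Append an audit event, chaining it to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps({"event": entry["event"], "prev": prev}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

audit_log: list[dict] = []
append_entry(audit_log, {"user": "alice", "action": "login"})
append_entry(audit_log, {"user": "alice", "action": "export_data"})
print(verify_chain(audit_log))                    # True
audit_log[0]["event"]["action"] = "delete_data"   # tampering
print(verify_chain(audit_log))                    # False
```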

Migratability

Define how easily you can migrate from existing systems. Projects frequently break down for lack of migration-plan design; this should not be underestimated.

| Item | Content |
| --- | --- |
| Data-migration method | Bulk or phased |
| Parallel operation | Period of old/new coexistence |
| Rollback | Reversion procedure and conditions |
| System-stop time | Downtime at cutover |
| User training | Education plan |
| Business-stop impact | Coordination with business departments |

Countermeasures against NFR gaps

NFRs contain many easily forgotten items. Use a comprehensive checklist such as the IPA non-functional requirement grades to eliminate gaps.

| Easily forgotten item | Content |
| --- | --- |
| Browser-support scope | IE11? Latest Chrome only? |
| Character encoding | UTF-8, emoji support |
| Timezone | UTC, JST, multiple regions |
| Multilingual support | i18n (internationalization), L10n (localization) |
| Accessibility | WCAG (Web Content Accessibility Guidelines) 2.1 compliance |
| Disaster countermeasures | DR (Disaster Recovery), geo-distribution |
| Log retention | Legal requirements |

These are the items that ignite post-completion firestorms of “it doesn’t support that?” Define them from the start.
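For the timezone row, the standard practice is to store timestamps in UTC and convert to the local zone only for display. A small sketch (the event time is illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Store in UTC; convert only when displaying to a user.
event_utc = datetime(2026, 1, 1, 0, 30, tzinfo=timezone.utc)
event_jst = event_utc.astimezone(ZoneInfo("Asia/Tokyo"))

print(event_utc.isoformat())  # 2026-01-01T00:30:00+00:00
print(event_jst.isoformat())  # 2026-01-01T09:30:00+09:00 - same instant, JST display
```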

Relationship with SLA / SLO

NFRs are closely linked to SLAs and SLOs: an NFR is a target value set at design time, an SLO an internal operational target, and an SLA an externally contracted promise.

| | NFR | SLO | SLA |
| --- | --- | --- | --- |
| Phase | At design | At operation | At contract |
| Nature | Target | Internal target | External contract |
| On violation | Design change | Improvement investment | Penalty / fee reduction |

For NFRs that affect the SLA (availability, performance), the iron rule is to set them stricter than the SLA itself.
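Setting the internal target stricter than the SLA creates an error budget: the gap is the buffer that lets you react before the contract is breached. A sketch with assumed numbers (the helper and the 25-minute measurement are illustrative):

```python
def error_budget_minutes(target_pct: float, month_minutes: int = 30 * 24 * 60) -> float:
    """Downtime per (30-day) month allowed by an availability target."""
    return month_minutes * (1 - target_pct / 100)

sla = 99.9    # external contract: ~43.2 min/month allowed
slo = 99.95   # internal target, deliberately stricter: ~21.6 min/month

downtime_so_far = 25.0  # minutes of downtime this month (hypothetical measurement)
print(downtime_so_far > error_budget_minutes(slo))  # True: SLO breached, start improvement work
print(downtime_so_far > error_budget_minutes(sla))  # False: SLA still intact, no penalty yet
```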

Decision criterion 1: system nature

How strict the NFRs must be varies with the system’s business importance and how widely it is exposed.

| System nature | Availability guideline |
| --- | --- |
| Internal tools | 99% |
| General B2C services | 99.9% |
| B2B SaaS | 99.95% |
| Finance / payments | 99.99% |
| Power / telecom | 99.999% |

Decision criterion 2: org regime

Whether an NFR is achievable depends on the scale of the ops team. Without a 24/7 regime, 99.99% cannot be kept.

| Ops regime | Achievable availability |
| --- | --- |
| Business hours only | 99% |
| Extended hours | 99.5% |
| 24/7 on-call | 99.9% |
| Dedicated 24/7 SRE (Site Reliability Engineering) team | 99.95% |
| Multi-region / follow-the-sun | 99.99%+ |

How to choose by case

In-house tools / business-hours use

Availability 99% + 3-second response + daily backups. Equivalent to IPA non-functional grade “Model System 1.” No SLA needed; an RTO of 24 hours and an RPO of 1 day are enough. For security, internal ID federation plus TLS covers the minimum.

General B2C web service

Availability 99.9% + P95 500 ms + 24/7 on-call + automated backups. IPA “Model 2”; introduce SLO management, mask PII (Personally Identifiable Information) to comply with the Personal Information Protection Act, and run an annual pentest. Optimize cost with AWS / GCP managed services.

B2B SaaS / enterprise customers

Availability 99.95% + SLA contract + 7-year audit logs + SOC 2 (US standard auditing service-organization security/availability) compliance. IPA “Model 3” equivalent, individual SLA agreements per customer, RTO / RPO clearly stated in contract, eyeing ISO 27001 acquisition. Include multi-tenant separation design in NFR.

Finance / payments / medical

99.99%+ availability + multi-region DR + FISC / PCI DSS / HIPAA compliance. IPA “Model 4,” 24/7 dedicated SRE, annual pentest / quarterly vulnerability scans, encryption with FIPS 140-2-certified HSM (Hardware Security Module), tamper-proof audit logs. NFR integrated with regulations themselves.

Service-type x NFR numerical gates

Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.

NFR is the area where the discussion only gets going once numbers are on the table. Below is a correspondence table of industry-standard values.

| Service type | Availability | RTO | RPO | Response time (P95) | Monthly-cost guideline |
| --- | --- | --- | --- | --- | --- |
| Internal tools | 99% | 24 hours | 1 day | 3 sec | Tens of thousands of yen |
| General B2C web | 99.9% | 1 hour | 15 min | 500 ms | Hundreds of thousands of yen |
| B2B SaaS | 99.95% | 30 min | 5 min | 300 ms | Hundreds of thousands to millions of yen |
| Finance / payments | 99.99% | 5 min | 1 min | 100 ms | Millions of yen and up |
| Telecom / power | 99.999% | 30 sec | 10 sec | 50 ms | Tens of millions of yen and up |

The empirical rule: 99.9% and 99.99% differ in build cost by several times. Even when the business asks for a “system that never stops,” presenting the numbers often draws out “99.9% is enough.” The IPA non-functional requirement grades are Japan’s standard checklist, comprehensively covering easily forgotten items (timezone, browser support, i18n, WCAG, etc.).

“Don’t stop” is a discussion that only starts once the numbers are on the table. In words alone, it never converges.

NFR-design pitfalls and forbidden moves

Typical accident patterns in NFR design. All of them produce systems that run but cannot be used.

| Forbidden move | Why it's bad |
| --- | --- |
| Decide NFRs later | The Knight Capital incident (no phased deployment or auto-rollback; $440M lost in 45 min) |
| Vaguely agree availability "as high as possible" | Without numbers, neither design nor a quote is possible |
| Apply "somehow 99.99%" to every system | Several-fold cost difference between 99.9% and 99.99%; over-investment |
| Define response time by the average | The slowest 1% of users stays invisible; measure at P95 / P99 |
| Promise 99.99% without the ops regime | Cannot be kept without a dedicated 24/7 SRE team |
| Bolt on disaster recovery (DR) at the end | Retrofitting costs 10x; design it in from the start |
| Release with the browser-support scope undefined | "You don't support IE11" surfaces after completion; major rework |
| Forget timezone / character encoding | Fatal bugs in overseas deployment and emoji handling |
| Ignore WCAG (accessibility) | Risk of violating the revised Act for Eliminating Discrimination against Persons with Disabilities, effective April 2024 |
| Leave NFRs out of tests | Written only in the design doc, verified by no one, and surfacing in production |

In the 2012 Knight Capital incident, the complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring was the lethal blow (details in the appendix “Critical Incident Cases”). Likewise, the case of a major EC site whose new UI release for the year-end shopping season caused 30-second delays and a social-media firestorm, with an emergency rollback to the old UI four hours later, shows the cost of leaving response-time NFRs undefined.

NFRs are insurance against “running but unusable.” Define them numerically from the start.

| Forbidden move | Why it's bad |
| --- | --- |
| "NFRs can be decided later" (postponing) | Later additions cost 10x; decide first is the iron rule |
| "99.9% and 99.99% aren't much different" (being casual) | 43 min/month vs 4.3 min/month of downtime and several times the build cost; choose a level matching business requirements |

AI decision axes

| AI-era favorable | AI-era unfavorable |
| --- | --- |
| Numerically quantified NFRs | Words like "fast" and "safe" |
| Automated load tests | One-off manual tests |
| Automated security scans | Manual review only |
| SLO-based monitoring | Static threshold-based monitoring |
  1. Decide numerically first — postponement is 10x cost, vagueness is fatal
  2. Comprehensive check via IPA grades — even easily-forgotten items (timezone, browser, etc.) without gaps
  3. Align with ops regime — can’t keep 99.99% without 24/7, levels matching capability
  4. Auto-test-ize in CI/CD — continuously verify performance / security of AI-generated code
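Point 4 above can be sketched as a CI gate: feed load-test results into a check that fails when the agreed P95 budget is exceeded. The threshold, helper names, and simulated samples are assumptions; in real pipelines the samples would come from a load-testing tool such as k6 or Locust:

```python
import math
import random

P95_BUDGET_MS = 300.0  # the performance NFR agreed with the business (illustrative)

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def check_nfr(samples: list[float]) -> bool:
    """CI gate: True when the measured P95 is within the budget."""
    return p95(samples) <= P95_BUDGET_MS

# Simulated load-test output; all samples fall in [50, 250] ms, so the gate passes.
random.seed(0)
samples = [random.uniform(50.0, 250.0) for _ in range(1000)]
print(check_nfr(samples))  # True - in CI, False would fail the pipeline
```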

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in one or two sentences. Starting work while these are still vague always invites the later question “why did we decide it this way again?”

  • Availability target (99.X%, RTO, RPO)
  • Performance target (response time, throughput)
  • Scalability (growth scenarios)
  • Operational requirements (monitoring, backup, on-call)
  • Security level (authentication, encryption, audit)
  • Migration plan (parallel operation, rollback)
  • Comprehensive check (IPA non-functional-requirement grade, etc.)

Author’s note - cases of “no NFR” producing firestorms

Stories of teams that postponed NFRs and got burned are handed down continuously in the SI industry.

The 2012 Knight Capital incident is symbolic of what happens when NFRs (especially deployment safety) are underestimated: the subsequent investigation concluded that the “complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring” was the lethal blow (details in the appendix “Critical Incident Cases”).

Another oft-cited case is the outage on the first year’s Amazon Prime Day. A sloppy estimate of the performance NFR for peak traffic left checkout down for hours, with an estimated opportunity loss in the billion-dollar class. Amazon has since updated its performance requirements quarterly based on actuals and built in mechanisms that verify them automatically through chaos engineering.

In Japan as well, the story keeps being told of a major EC site that released a new UI for the year-end shopping season: with no response-time NFR defined, responses at peak slowed past 30 seconds, social media erupted, and an emergency rollback to the old UI came four hours later. The history of NFR underestimation hammers home, again and again, the reality that even with completed features, a system whose non-functional side is undefined is unusable.

Summary

This article covered non-functional requirements design, including availability, performance, operations, security, IPA grades, SLA/SLO relationship, and AI-era auto-test-ization.

Decide numerically first, comprehensive via IPA, align with ops regime, auto-test-ize. That is the practical answer for NFR design in 2026.

Next time we’ll cover “estimation and ROI.” We plan to dig into the practice of three-point estimation, buffers, 3-year ROI, break-even points, and how to build numbers that pass approval.

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.