Solution Architecture

Non-Functional Requirements - 'Don't Stop' Has No Price Tag

About this article

As the third installment of the “Solution Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains non-functional requirements.

Business stakeholders can write functional requirements, but non-functional requirements cannot be written without specialists. Leaving them vague leads to post-completion firestorms of the “it runs, but it’s slow / it goes down / operating it is hell” variety. This article covers how to quantify performance, availability, security, and operability, the IPA non-functional requirement grades, and non-functional test automation in the AI era.

What are non-functional requirements in the first place

Six Major Non-functional Requirement Categories

In a nutshell, non-functional requirements are “rules that define not ‘what the system does’ but ‘how well it runs.’”

Think of earthquake resistance and insulation specs when building a house. The floor plan (functional requirements) can be decided by the residents, but “can it withstand a magnitude 6 earthquake?” “can it maintain winter room temperature at a certain degree?” — only specialists can design these. And raising the seismic rating after construction is essentially a rebuild. Software is the same: without settling quality standards like “respond within 1 second” or “maintain 99.9% monthly uptime” upfront, you end up with the post-completion firestorm of “runs but slow, stops, ops is hell.”

Why non-functional requirements are needed

Prevent “completed but unusable”

Even with perfect features, nobody uses a system that takes 10 seconds to respond. And unless the targets are settled as numbers, disputes arise at acceptance.

Provides the basis for cost estimates

“99.9% uptime” and “99.99% uptime” can differ in build cost by as much as 5x. An estimate is only possible once the numbers are settled.

Alignment with regulatory requirements

In finance, healthcare, and personal-data handling, the required non-functional levels are often set by law. Without early clarification, there is a risk of violations.

Main NFR categories

IPA’s non-functional requirement grades are the standard classification in Japan. They organize requirements comprehensively into six major items.

[Figure: Six major items of the IPA Non-Functional Requirements Grade: (1) availability, (2) performance & scalability, (3) operability & maintainability, (4) portability (ease of migration), (5) security, (6) system environment, with a monthly cost guideline by availability level. Like a house’s earthquake and insulation ratings, upgrading after completion is nearly a rebuild.]
| Category | Content |
| --- | --- |
| Availability | Degree to which the system stays up |
| Performance / scalability | Speed, scale |
| Operability / maintainability | Ease of operations |
| Migratability | Ease of migration |
| Security | Whether the system is protected |
| System environment | Prerequisite environment, requirements |

IPA’s non-functional-requirement grades are a free-to-use template, widely used in Japanese companies.

Availability

Define how rarely the system may stop. The viewpoint is the same as for SLOs: the targets must be expressed as numbers.

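| Metric | Content | Typical value |
| --- | --- | --- |
| Uptime | Percentage of the month the system is running | 99.9% (43 min of downtime per month) |
| RTO | Recovery time objective after a failure | 1 hour |
| RPO | Allowed window of data loss | 15 min |
| MTBF | Mean time between failures | 30 days |
| MTTR | Mean time to repair | 30 min |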

Promising 99.99% is an extremely strict level that allows only 4.3 minutes of downtime per month. Cost multiplies several-fold, so choose a level that matches the business requirements.
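
The downtime figures above follow from simple arithmetic. The sketch below assumes a 30-day month (the function names are illustrative, not from any standard) and also shows the common steady-state relation between MTBF, MTTR, and availability.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes; assumes a 30-day month


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Monthly downtime budget implied by an uptime target."""
    return MINUTES_PER_30_DAY_MONTH * (1 - uptime_percent / 100)


def availability_from_mtbf_mttr(mtbf_minutes: float, mttr_minutes: float) -> float:
    """Steady-state availability in percent: MTBF / (MTBF + MTTR)."""
    return 100 * mtbf_minutes / (mtbf_minutes + mttr_minutes)


# 99.9% -> about 43.2 min/month, 99.99% -> about 4.32 min/month
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/month")

# The table's typical values: MTBF 30 days, MTTR 30 minutes
print(f"{availability_from_mtbf_mttr(30 * 24 * 60, 30):.3f}%")
```

With the table’s typical MTBF and MTTR, the implied availability lands slightly above 99.9%, which is a quick way to check that the individual targets are consistent with each other.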

Performance

Define how fast and how much the system can process. Depending on the nature of the business, quantify response time and throughput explicitly.

| Metric | Content | Typical example |
| --- | --- | --- |
| Response time | Processing time per request | Within 300ms at P95 |
| Throughput | Requests processed per unit time | 1000 req/sec |
| Concurrent connections | Parallel users | 10,000 |
| Peak multiplier | Load at peak time | 10x normal |
| Latency | Network delay | Under 50ms |

For a requirement like “respond within 3 seconds,” clarify whether it refers to the average or the maximum. The modern practice is to define it at P95 / P99 (the 95th / 99th percentile).
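
To see why the choice of statistic matters, here is a minimal sketch using made-up latency samples and the nearest-rank definition of a percentile (one of several common definitions; monitoring tools may interpolate differently).

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical measurements: 90 fast requests and 10 slow ones (milliseconds)
latencies_ms = [100.0] * 90 + [3000.0] * 10

print(f"mean = {mean(latencies_ms):.0f} ms")            # 390 ms
print(f"P95  = {percentile(latencies_ms, 95):.0f} ms")  # 3000 ms
print(f"P99  = {percentile(latencies_ms, 99):.0f} ms")  # 3000 ms
```

The average of 390 ms looks healthy, yet one request in ten takes 3 seconds. A target stated at P95 or P99 exposes that tail; a target stated as an average hides it.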

Scalability

Define whether the system can keep up with future growth. Design not only for the scale at launch but also for the growth predicted over the coming years.

| Metric | Content |
| --- | --- |
| Horizontal | Can be handled by adding servers |
| Vertical | Can be handled by adding CPU / memory |
| Data | Growth in DB capacity |
| User | 10x / 100x growth |
| Geographic | Overseas expansion |

Designing for all of this from the start is excessive, but scenarios for phased expansion need to be considered.
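
A phased-expansion scenario can be roughed out with a few lines of arithmetic. The sketch below is only an illustration of horizontal scaling: the per-server throughput, the 25% headroom, and the traffic figures are all assumed values, not measurements.

```python
import math


def servers_needed(normal_rps: float, peak_multiplier: float,
                   growth_factor: float, per_server_rps: float,
                   headroom: float = 0.25) -> int:
    """Servers required to serve peak load after growth while keeping spare headroom."""
    peak_rps = normal_rps * peak_multiplier * growth_factor
    usable_rps = per_server_rps * (1 - headroom)  # capacity left after reserving headroom
    return math.ceil(peak_rps / usable_rps)


# Hypothetical service: 100 req/sec normally, 10x at peak, 400 req/sec per server
for growth in (1, 10, 100):
    count = servers_needed(100, 10, growth, 400)
    print(f"{growth}x users -> {count} servers")
```

Running the same calculation for 1x, 10x, and 100x growth makes it easier to decide which step needs a different architecture rather than just more servers.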

Operability / maintainability

Define how easy the system is to operate. Weakness here makes the ops team’s load explode and increases incidents.

| Item | Content |
| --- | --- |
| Backup | Frequency, retention, restoration test |
| Monitoring | What to monitor and how often |
| Log retention | Period, capacity |
| Deploy | Frequency, downtime |
| Documentation | Preparation of operational manuals |
| On-call | 24/7 response regime |

If operations are expected to be outsourced, define which levels can be outsourced.

Security

Define the level of protection. It varies with the sensitivity of the data handled and with laws and regulations.

| Item | Content |
| --- | --- |
| Authentication | MFA required, password strength |
| Authorization | Permission design, least privilege |
| Encryption | Communication, storage, key management |
| Audit logs | Retention, tamper prevention |
| Vulnerability response | Patch-application SLA |
| Penetration testing | Frequency, scope |

Always factor in regulatory requirements such as the Personal Information Protection Act, GDPR, and PCI DSS.

Migratability

Define how easy it is to migrate from existing systems. Many projects break down because the migration plan was not designed properly, so this should not be underestimated.

| Item | Content |
| --- | --- |
| Data-migration method | Bulk / phased |
| Parallel operation | Period during which old and new systems coexist |
| Rollback | Reversion procedure, conditions |
| System-stop time | At cutover |
| User training | Education plan |
| Business-stop impact | Coordination with business departments |

Countermeasures against NFR gaps

NFRs contain many easily forgotten items. Use a comprehensive checklist such as IPA’s non-functional requirement grades to eliminate gaps.

| Easily forgotten item | Content |
| --- | --- |
| Browser-support scope | IE11? Latest Chrome only? |
| Character encoding | UTF-8, emoji support |
| Timezone | UTC, JST, multiple regions |
| Multilingual support | i18n (internationalization), L10n (localization) |
| Accessibility | WCAG 2.1 compliance |
| Disaster countermeasures | DR, geo-distribution |
| Log retention | Legal requirements |

These items easily turn into post-completion firestorms of “we never supported that.” Define them from the start.
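
Two of these items, timezone and emoji, are easy to demonstrate. The snippet below is a small illustration, assuming the common convention of storing timestamps in UTC and converting at display time; the sample timestamp and string are made up.

```python
from datetime import datetime, timedelta, timezone

# Japan has no daylight saving time, so a fixed +09:00 offset is sufficient here
JST = timezone(timedelta(hours=9), name="JST")

# Store in UTC, convert only when displaying
stored = datetime(2025, 12, 31, 15, 30, tzinfo=timezone.utc)
shown = stored.astimezone(JST)
print(stored.isoformat())  # 2025-12-31T15:30:00+00:00
print(shown.isoformat())   # 2026-01-01T00:30:00+09:00

# Emoji: one code point, but four bytes in UTF-8
text = "OK\U0001F44D"  # "OK" followed by a thumbs-up emoji
print(len(text), len(text.encode("utf-8")))  # 3 6
```

The same instant falls on different calendar dates (and years) in UTC and JST, which is exactly the kind of detail that breaks daily reports. The emoji case shows why a length limit has to state whether it counts characters or bytes.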

Relationship with SLA / SLO

NFRs are closely linked with SLAs and SLOs. An SLA is a contractual promise to external parties, an SLO is an internal operational target, and an NFR is the target value at design time.

| | NFR | SLO | SLA |
| --- | --- | --- | --- |
| Phase | At design | At operation | At contract |
| Nature | Target | Internal target | External contract |
| On violation | Design change | Improvement investment | Penalty / reduction |

For NFRs that affect the SLA (availability, performance), the iron rule is to set them stricter than the SLA.
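
How much stricter is a judgment call, but the margin can at least be stated in minutes. A minimal sketch, assuming a 30-day month and illustrative uptime values:

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # assumes a 30-day month


def downtime_margin_minutes(nfr_uptime_pct: float, sla_uptime_pct: float) -> float:
    """Monthly downtime the SLA tolerates beyond the design target.

    A negative result means the design target is looser than the SLA.
    """
    return MINUTES_PER_30_DAY_MONTH * (nfr_uptime_pct - sla_uptime_pct) / 100


# Hypothetical case: design for 99.95% while promising 99.9% in the contract
margin = downtime_margin_minutes(99.95, 99.9)
print(f"margin = {margin:.1f} min/month")  # about 21.6 minutes
```

A positive margin is the buffer that absorbs an unplanned incident before contractual penalties apply.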

Decision criterion 1: system nature

How strict the NFRs need to be varies with business importance and with how widely the system is exposed.

| System nature | Availability guideline |
| --- | --- |
| Internal tools | 99% |
| General B2C services | 99.9% |
| B2B SaaS | 99.95% |
| Finance / payments | 99.99% |
| Power / telecom | 99.999% |

Decision criterion 2: org regime

How achievable an NFR is depends on the scale of the ops team. Without 24/7 coverage, 99.99% cannot be maintained.

| Ops regime | Achievable availability |
| --- | --- |
| Business hours only | 99% |
| Extended hours | 99.5% |
| 24/7 on-call | 99.9% |
| 24/7 dedicated SRE | 99.95% |
| Multi-region / Follow-the-Sun | 99.99%+ |
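
The table above can be turned into a quick feasibility check. This is a sketch that simply encodes the table’s guideline values; the function name is hypothetical and the thresholds are rules of thumb, not guarantees.

```python
# (regime, availability the table above treats as achievable), weakest first
OPS_REGIMES = [
    ("business hours only", 99.0),
    ("extended hours", 99.5),
    ("24/7 on-call", 99.9),
    ("24/7 dedicated SRE", 99.95),
    ("multi-region / follow-the-sun", 99.99),  # listed as "99.99%+" in the table
]


def minimum_regime_for(target_uptime_pct: float) -> str:
    """Weakest ops regime whose guideline covers the availability target."""
    for regime, achievable in OPS_REGIMES:
        if target_uptime_pct <= achievable:
            return regime
    # Targets above 99.99% still require the strongest regime in the table
    return OPS_REGIMES[-1][0]


for target in (99.0, 99.9, 99.99):
    print(f"{target}% needs at least: {minimum_regime_for(target)}")
```

If the regime returned here is stronger than what the organization can staff, the target should be lowered or the staffing plan changed before the number goes into a contract.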

How to choose by case

In-house tools / business-hours use

Availability 99% + 3-second response + daily backup. Equivalent to “Model System 1” in the IPA non-functional grades. No SLA is needed, and RTO 24 hours / RPO 1 day is enough. For security, internal ID integration + TLS covers the minimum.

General B2C web service

Availability 99.9% + P95 500ms + 24/7 on-call + automated backup. IPA “Model 2”: introduce SLO management, PII masking for the Personal Information Protection Act, and an annual pentest. Optimize cost with AWS / GCP managed services.

B2B SaaS / enterprise customers

Availability 99.95% + SLA contract + 7-year audit-log retention + SOC 2 compliance (a US audit standard for the security and availability of service organizations). Equivalent to IPA “Model 3”: individual SLA agreements per customer, RTO / RPO stated clearly in the contract, and ISO 27001 certification in view. Include multi-tenant isolation design in the NFRs.

Finance / payments / medical

99.99%+ availability + multi-region DR + FISC / PCI DSS / HIPAA compliance. IPA “Model 4”: a dedicated 24/7 SRE team, annual pentests and quarterly vulnerability scans, encryption with a FIPS 140-2-certified HSM, and tamper-proof audit logs. Here the NFRs are effectively one with the regulations themselves.

Service-type x NFR numerical gates

Note: Industry baseline values as of April 2026. Will become outdated as technology and the talent market shift, so requires periodic updates.

NFR is an area where discussion only starts once numbers are on the table. Below is the industry-standard correspondence table.

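| Service type | Availability | RTO | RPO | Response time (P95) | Monthly-cost guideline |
| --- | --- | --- | --- | --- | --- |
| Internal tools | 99% | 24 hours | 1 day | 3 sec | Tens of thousands of yen |
| General B2C web | 99.9% | 1 hour | 15 min | 500ms | Hundreds of thousands of yen |
| B2B SaaS | 99.95% | 30 min | 5 min | 300ms | Hundreds of thousands to millions of yen |
| Finance / payments | 99.99% | 5 min | 1 min | 100ms | Millions of yen+ |
| Telecom / power | 99.999% | 30 sec | 10 sec | 50ms | Tens of millions of yen+ |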

As a rule of thumb, 99.9% and 99.99% differ in build cost by several times. Even when the business asks for a “system that never stops,” presenting the options numerically often gets the answer “99.9% is enough.” The IPA non-functional requirement grades are Japan’s standard checklist and comprehensively cover easily forgotten items (timezone, browser support, i18n, WCAG, etc.).

A “don’t stop” requirement only becomes discussable once it is presented in numbers. In words alone, expectations never align.

NFR-design pitfalls and forbidden moves

These are the typical accident patterns in NFR design. All of them produce systems that run but are unusable.

| Forbidden move | Why it’s bad |
| --- | --- |
| Decide NFRs later | The Knight Capital incident (no phased deployment or auto-rollback; $440M loss in 45 minutes) |
| Vaguely agree on availability “as high as possible” | Without numbers, neither design nor a quote is possible |
| Apply “99.99%, just because” to all systems | 99.9% vs 99.99% is a several-fold cost difference; over-investment |
| Define response time by the average | The slowest 1% of users become invisible; measure with P95 / P99 |
| Promise 99.99% without the ops regime | Cannot be kept without a dedicated 24/7 SRE team |
| Add disaster recovery (DR) at the end | Retrofitting costs 10x; design it from the start |
| Release with the browser-support scope undefined | “It doesn’t support IE11” is pointed out after completion, forcing a major revision |
| Forget timezone / character encoding | Fatal bugs in overseas deployment or with emoji |
| Ignore WCAG (accessibility) | Risk of violating the revised Act for Eliminating Discrimination against Persons with Disabilities, effective April 2024 |
| Don’t turn NFRs into tests | They are only written in the design doc, no one verifies them, and problems surface in production |

In the 2012 Knight Capital incident, the complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring was the lethal blow (details in the appendix “Critical Incident Cases”). Another case, in which a major EC site’s new UI release for the year-end shopping season caused 30-second delays and a social-media firestorm, followed by an emergency rollback to the old UI 4 hours later, also shows the cost of an undefined response-time NFR.

NFRs are insurance against “running but unusable.” Define them numerically from the start.

| “NFR is decided later” (postponing) | Later additions cost 10x; deciding first is the iron rule |
| “99.9% and 99.99% aren’t much different” (being casual) | 43 min/month vs 4.3 min/month, and build cost differs several times; choose levels matching business requirements |

AI decision axes

| Favorable in the AI era | Unfavorable in the AI era |
| --- | --- |
| Numerically quantified NFRs | Words like “fast” or “safe” |
| Automated load tests | One-time manual tests |
| Automated security scans | Manual review only |
| SLO-based monitoring | Threshold-based monitoring |

  1. Decide numerically first: postponement costs 10x, and vagueness is fatal
  2. Check comprehensively with the IPA grades: cover even easily forgotten items (timezone, browser, etc.) without gaps
  3. Align with the ops regime: 99.99% cannot be kept without 24/7 coverage, so choose levels that match capability
  4. Automate the tests in CI/CD: continuously verify the performance and security of AI-generated code

Auto-generating NFR tests with AI

When NFR is defined numerically, AI can generate corresponding test code with high accuracy. Given a requirement like “response within 200ms, 1000 concurrent users,” it can produce k6 or Locust load-test scripts nearly as-is.

[Figure: Flow from non-functional requirements to AI-generated test code. Numerical NFRs (P95 response within 200ms, 1,000 concurrent users, 99.9% uptime, RTO 1 hour) feed AI test-code generation (a k6 load-test script; Locust / Gatling / Artillery also applicable), followed by automated CI/CD verification on every deployment that stops the deployment on failure. Without numbers (“make it fast,” “don’t let it go down”), test generation is impossible and problems are discovered in production.]

Conversely, word-only NFR like “make it fast” or “don’t let it stop” makes test generation impossible. Without criteria for what constitutes a pass, neither AI nor humans can write tests.
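
To make the pass criterion concrete, here is a standard-library sketch of the idea. It is a stand-in for illustration only, not a k6 or Locust script: `fake_endpoint` is a hypothetical placeholder for a real HTTP request, and the request count and concurrency are kept tiny so that it runs anywhere.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyGate:
    percentile: float  # e.g. 95 for P95
    limit_ms: float    # e.g. 200 for "within 200ms"


def nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of the samples."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def timed_call(target) -> float:
    """Run target once and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    target()
    return (time.perf_counter() - start) * 1000


def run_load(target, requests: int, concurrency: int) -> list[float]:
    """Call target `requests` times using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(target), range(requests)))


def passes(gate: LatencyGate, latencies_ms: list[float]) -> bool:
    """True when the measured percentile is within the NFR limit."""
    return nearest_rank(latencies_ms, gate.percentile) <= gate.limit_ms


def fake_endpoint() -> None:
    time.sleep(0.01)  # hypothetical stand-in for an HTTP request


gate = LatencyGate(percentile=95, limit_ms=200)
latencies = run_load(fake_endpoint, requests=50, concurrency=10)
print("PASS" if passes(gate, latencies) else "FAIL")
```

In a CI/CD pipeline, the script would exit with a non-zero status on FAIL so that the deployment stops. The point is that `LatencyGate` can only be written because the requirement carries a number.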

The habit of numerically defining NFR from the start is now mandatory not just for document quality but as a prerequisite for AI-powered test automation. Use IPA’s non-functional requirement grades to cover items comprehensively, and put concrete numbers on each item at the early stage.

AI-generated infra design goes haywire without NFR numbers

When you ask AI to propose infra configurations, vague NFR leads to either over-design or under-design. Writing just “high availability” can yield a full multi-region, multi-AZ, auto-failover setup, with monthly cost several times over expectations.

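Numbers change the game. Specifying “99.9% uptime, RPO 1 hour, RTO 4 hours” enables AI to correctly judge that a single-region, multi-AZ setup suffices, provided backups are taken at least hourly (for example, with point-in-time recovery) so that the 1-hour RPO can actually be met.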

This problem isn’t AI-specific — human infra engineers face the same issue — but AI differs in “outputting without asking for confirmation.” A human would ask “do you really need 99.99%?” but AI does its best with given conditions. Locking down NFR numbers at the requirements-definition stage becomes the safeguard against AI-era infra design going haywire.
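
One way to enforce this is to refuse to build the request until every number is filled in. The sketch below is an assumption-laden illustration: the `NfrSpec` fields and the prompt wording are hypothetical, and a real project would carry more items.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NfrSpec:
    """Hypothetical minimal set of NFR numbers required before asking for a design."""
    uptime_pct: Optional[float] = None
    rto_hours: Optional[float] = None
    rpo_hours: Optional[float] = None
    p95_response_ms: Optional[float] = None


def missing_numbers(spec: NfrSpec) -> list[str]:
    """Names of the NFR fields that have not been decided yet."""
    return [f.name for f in fields(spec) if getattr(spec, f.name) is None]


def to_prompt(spec: NfrSpec) -> str:
    """Render the spec as a prompt, refusing to proceed while numbers are missing."""
    gaps = missing_numbers(spec)
    if gaps:
        raise ValueError("NFR numbers not decided yet: " + ", ".join(gaps))
    return (
        "Propose an infrastructure configuration that meets exactly these targets: "
        f"uptime {spec.uptime_pct}%, RTO {spec.rto_hours} h, "
        f"RPO {spec.rpo_hours} h, P95 response {spec.p95_response_ms} ms."
    )


spec = NfrSpec(uptime_pct=99.9, rto_hours=4, rpo_hours=1, p95_response_ms=500)
print(to_prompt(spec))
```

The check plays the role of the human engineer’s “do you really need that?” question: the request cannot go out while any field is still vague.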

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in one or two sentences. Starting work while these are still vague always invites later questions like “why did we decide this again?”

  • Availability target (99.X%, RTO, RPO)
  • Performance target (response time, throughput)
  • Scalability (growth scenarios)
  • Operational requirements (monitoring, backup, on-call)
  • Security level (authentication, encryption, audit)
  • Migration plan (parallel operation, rollback)
  • Comprehensive check (IPA non-functional-requirement grade, etc.)

Author’s note - cases of “no NFR” producing firestorms

Stories of NFRs being postponed and ending in firestorms are told again and again in the SI industry.

The 2012 Knight Capital incident symbolizes what happens when NFRs (especially deployment safety) are underestimated. The subsequent investigation determined that the “complete absence of NFRs such as phased deployment (canary), auto-rollback, and monitoring” was the lethal blow (details in the appendix “Critical Incident Cases”).

Another frequently cited case is the stoppage on Amazon’s first-year Prime Day. Sloppy estimation of the performance NFR for peak traffic reportedly brought checkout down for hours, with an estimated opportunity loss in the billion-dollar class. Since then, Amazon is said to update its performance requirements quarterly based on actuals and to have built in mechanisms that verify them automatically with chaos engineering.

In Japan too, there is the oft-told case of a major EC site that released a new UI for the year-end shopping season: with the response-time NFR undefined, responses were delayed by more than 30 seconds at peak, a social-media firestorm followed, and the site made an emergency rollback to the old UI 4 hours later. “Even when the features are complete, undefined non-functional requirements make the system unusable”: the history of underestimating NFRs drives this reality home again and again.

https://en.senkohome.com/arch-intro-solution-overview/ https://en.senkohome.com/arch-intro-solution-poc/ https://en.senkohome.com/arch-intro-solution-requirements/

Summary

This article covered the design of non-functional requirements, including availability, performance, operations, security, the IPA grades, the relationship with SLA / SLO, and test automation in the AI era.

Decide numerically first, check comprehensively with the IPA grades, align with the ops regime, and automate the tests. That is the practical answer for NFR design in 2026.

Next time we’ll cover “estimation and ROI.” The plan is to dig into the practice of 3-point estimation, buffers, 3-year ROI, break-even points, and how to build numbers that pass approval.

I hope you’ll read the next article as well.

📚 Series: Architecture Crash Course for the Generative-AI Era (77/89)