Your ERP has passed functional testing. Key users have signed off on business processes. UAT is complete. Then, on go-live day, the system crawls: orders take 15 seconds to save, the month-end close has been running for six hours without finishing, and the supplier portal throws 500 errors.
This scenario is far from unusual. According to a Panorama Consulting study, 51% of organisations experience operational disruptions during an ERP go-live. The recurring cause: functional testing proves the system works, but not that it scales. This guide walks through the methodology for planning, executing and interpreting ERP performance tests — with concrete acceptance thresholds by business process.
Why Functional Testing Is Not Enough
Classic failure modes that UAT never catches
UAT typically runs with 5 to 15 users, on a cut-down dataset, in an environment that is often under-provisioned. Production is a different world:
- Underestimated data volumes: an order management module tested with 200 records suddenly faces 50,000 daily lines, triggering SQL joins that grind to a halt.
- Missing indexes: test databases are too small to expose slow queries. A `SELECT` that takes 50 ms on 10,000 rows can take 12 seconds on 5 million.
- Heavy customisations: bespoke workflows, cascading pricing engines and EDI interfaces consume resources that are invisible in UAT but critical under real load.
- Concurrent access: 300 simultaneous users hitting the same module generate database locks that 10 testers will never reproduce.
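The missing-index failure mode can be reproduced at small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for the ERP database; the table `order_lines`, its columns and the index name are hypothetical, and a real ERP database will print a different plan format. It only illustrates how a query plan changes from a full scan to an index search.

```python
import sqlite3

# In-memory stand-in for an ERP table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO order_lines (customer_id, amount) VALUES (?, ?)",
    ((i % 1000, i * 0.5) for i in range(50_000)),
)

QUERY = "SELECT SUM(amount) FROM order_lines WHERE customer_id = ?"


def query_plan(connection):
    """Return the human-readable plan SQLite chooses for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text


plan_without_index = query_plan(conn)
conn.execute("CREATE INDEX idx_order_lines_customer ON order_lines (customer_id)")
plan_with_index = query_plan(conn)

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

On a small test dataset both plans return almost instantly, which is why the problem stays hidden until production volumes arrive.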
The real cost of a degraded go-live
A failed go-live is not a temporary inconvenience — the consequences are measurable:
- Productivity loss: latency compounds quickly. If each of 200 users repeats a transaction 500 times a day, one extra second per transaction adds up to 100,000 seconds, or roughly 28 hours of lost working time every day.
- User adoption: a slow ERP fuels change-resistance. Users work around the system, create parallel spreadsheets, and your change management programme loses months of ground.
- Cost of rollback: according to Elevatiq, Zimmer Biomet pursued a $172 million lawsuit against Deloitte following a catastrophic SAP S/4HANA go-live in 2024, with estimated annual revenue losses of $75 million due to shipping delays.
These numbers involve a large enterprise, but the mechanics are identical for a mid-market business: a degraded go-live always costs more than a planned 4–6 week delay to fix the performance issues first.
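The productivity-loss arithmetic is simple enough to put in front of a steering committee. This is a minimal sketch that assumes every one of the 200 users repeats the transaction 500 times a day; the function name is illustrative.

```python
def daily_hours_lost(extra_seconds, transactions_per_user, users):
    """Working hours lost per day to extra latency on one repeated transaction."""
    return extra_seconds * transactions_per_user * users / 3600


hours = daily_hours_lost(extra_seconds=1.0, transactions_per_user=500, users=200)
print(f"{hours:.1f} hours lost per day")
```

Plugging in your own transaction counts gives a first-order estimate to weigh against the cost of delaying go-live.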
The 4 Performance Tests to Run Before Go-Live
Load testing: simulate normal operating volumes
Load testing replicates standard operating conditions — expected concurrent users, average transaction volumes, access frequency by module. The goal is to confirm that the system meets its response-time targets under everyday usage.
Practical example: simulate 200 concurrent users entering purchase orders, 50 checking stock availability, and 20 running financial reports — all for a two-hour window. The system must maintain a 95th-percentile response time under 2 seconds.
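The pass criterion in this example can be checked with a few lines once the load tool has exported its response times. The sketch below uses the nearest-rank percentile method, which is one of several conventions and may differ slightly from what your tool reports; the workload dictionary simply restates the mix above and its key names are illustrative.

```python
import math

# Workload mix from the example above (key names are illustrative).
WORKLOAD = {"purchase_order_entry": 200, "stock_check": 50, "financial_report": 20}
P95_TARGET_SECONDS = 2.0


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def load_test_passes(response_times_seconds):
    """True when the 95th-percentile response time stays under the target."""
    return percentile(response_times_seconds, 95) < P95_TARGET_SECONDS


# Synthetic samples: 95 fast responses and 5 slow ones still pass at P95.
samples = [0.5] * 95 + [3.0] * 5
print(sum(WORKLOAD.values()), "concurrent users, pass:", load_test_passes(samples))
```

Using P95 rather than the average prevents a handful of very fast requests from masking a slow tail.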
Stress testing: push beyond the rated capacity
Stress testing ramps load beyond the nominal threshold (150%, 200%, 250%) to locate the breaking point. The objective is not for the system to hold — it is to understand how it breaks: graceful degradation or hard crash.
Why it matters: an ERP that degrades progressively (response times rising linearly) is manageable. An ERP that crashes at 120% of nominal load is an operational time-bomb.
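One way to record how the system breaks is to log the P95 at each load step and mark the step where the run crashed. The sketch below assumes a crashed step is represented by a missing P95 value; the `StressStep` structure and the wording of the summary are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StressStep:
    load_pct: int                 # load as a percentage of nominal, e.g. 150
    p95_seconds: Optional[float]  # None when the system crashed at this step


def first_crash(steps):
    """Return the lowest load percentage at which the system crashed, or None."""
    for step in sorted(steps, key=lambda s: s.load_pct):
        if step.p95_seconds is None:
            return step.load_pct
    return None


def describe(steps):
    crash = first_crash(steps)
    if crash is None:
        return "graceful degradation: no crash up to the highest tested load"
    return f"hard crash at {crash}% of nominal load"


graceful = [StressStep(100, 1.6), StressStep(150, 2.4), StressStep(200, 3.3), StressStep(250, 4.1)]
brittle = [StressStep(100, 1.6), StressStep(120, None)]
print(describe(graceful))
print(describe(brittle))
```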
Endurance (soak) testing: verify stability over 24–72 hours
Endurance testing sustains a normal load for an extended period to uncover memory leaks, temporary-table fragmentation, orphaned session accumulation and unreleased database locks.
Classic trap: many systems pass a 2-hour load test but degrade after 48 hours because the JVM garbage collector does not reclaim memory cleanly, or because application logs fill the disk to capacity.
Spike testing: simulate period-end closes and peak events
Spike testing injects a high volume of transactions over a very short window to simulate predictable events: month-end accounting close, promotional sales campaigns, batch EDI order imports, or payroll runs.
Example: inject 50,000 journal entries in 30 minutes to simulate a month-end close while 100 users remain active across other modules. The system must absorb the spike without a perceptible slowdown for those concurrent users.
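To drive the spike from a script, the total volume has to be turned into a per-second injection rate. The sketch below spreads the 50,000 entries evenly over the 30-minute window; an even spread is an assumption, and a real close may be burstier.

```python
def injection_schedule(total_entries, window_seconds):
    """Number of entries to inject in each second so the total is met exactly."""
    base, remainder = divmod(total_entries, window_seconds)
    # The first `remainder` seconds carry one extra entry each.
    return [base + (1 if second < remainder else 0) for second in range(window_seconds)]


schedule = injection_schedule(total_entries=50_000, window_seconds=30 * 60)
print(f"{min(schedule)} to {max(schedule)} entries per second over {len(schedule)} seconds")
```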
Building Your Test Scenarios: 5 Critical Business Processes
Bulk order entry
Simulate 100 to 500 concurrent users entering purchase orders with 5 to 20 lines each. Include pricing calculations (cascading discounts, promotions, multi-country tax rules), stock availability checks (ATP), and automatic approval-workflow generation.
Target threshold: full order validation in under 3 seconds, on-screen confirmation in under 2 seconds.
Month-end accounting close
Reproduce the full sequence: depreciation calculation, currency revaluation, intercompany reconciliation, accrual journal generation, and journal close. Use real volumes from the last full financial year plus a 30% headroom buffer.
Target threshold: full close in under 4 hours for a mid-market single-entity; under 8 hours for a large multi-entity group.
Mass data import
Load CSV or EDI files with more than 100,000 lines (customer orders, supplier invoices, stock movements). Measure import time, rejection rate, and the impact on users already logged in during the load.
Target threshold: import throughput above 1,000 lines per second, rejection rate below 0.5%, and degradation in user response times below 20%.
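The three import targets can be evaluated together from a handful of measurements. This is a minimal sketch; the function and field names are illustrative, and the degradation figure assumes you measured user P95 response times both before and during the import.

```python
def evaluate_import(lines_total, lines_rejected, duration_seconds,
                    baseline_p95, p95_during_import):
    """Compare one import run against the three targets stated above."""
    throughput = lines_total / duration_seconds
    rejection_rate = lines_rejected / lines_total
    degradation = (p95_during_import - baseline_p95) / baseline_p95
    return {
        "throughput_lines_per_s": throughput,
        "rejection_rate": rejection_rate,
        "user_degradation": degradation,
        "passes": throughput > 1000 and rejection_rate < 0.005 and degradation < 0.20,
    }


# Hypothetical run: 120,000 lines in 100 s, 300 rejected, user P95 from 1.00 s to 1.15 s.
result = evaluate_import(120_000, 300, 100, baseline_p95=1.00, p95_during_import=1.15)
print(result)
```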
Reporting and data extraction
Run 10 to 20 heavy reports simultaneously (aged debtors, stock valuation, P&L by cost centre) while operational users continue working normally.
Target threshold: standard report generated in under 30 seconds, complex report (multi-entity, multi-currency) in under 2 minutes.
Parallel approval workflows
Trigger 200 simultaneous approval workflows (purchase requisitions, expense claims, above-threshold orders). Verify that notifications are dispatched in under 5 seconds and that approvals chain through without a visible queue.
Recommended Testing Tools by ERP Vendor
SAP (S/4HANA, ECC)
- SAP Cloud ALM: native monitoring and testing tool for S/4HANA Cloud environments, with pre-built test scenarios.
- Apache JMeter: open-source, well-suited for HTTP/API load testing on SAP Fiori and OData. Free, broad community, but requires manual SAP scenario configuration.
- Gatling: high-throughput load testing, Scala or Java scripting. Particularly effective for the API-first architectures of S/4HANA Cloud.
- Tricentis NeoLoad: commercial solution with native recording for SAP GUI and Fiori transactions.
Oracle (Fusion Cloud, E-Business Suite)
- Oracle Application Testing Suite (OATS): Oracle’s native tool for functional and load testing.
- Selenium Grid + JMeter: a common combination for testing Fusion Cloud web interfaces.
- OpenText LoadRunner (formerly Micro Focus LoadRunner): industry standard with native Oracle protocols (Oracle NCA, Oracle Web).
Odoo and open-source ERP
- Locust: Python framework, ideal for scripting Odoo business scenarios via XML-RPC or JSON-RPC. Accessible for Python-native teams.
- k6: Grafana Labs tool, strong for REST API load testing. JavaScript scripting with native Grafana dashboards for result analysis.
- Artillery: YAML-driven, suited to rapid Odoo API ramp-up tests.
Cloud ERP (Dynamics 365, NetSuite, Workday)
- Azure Load Testing (for Dynamics 365): managed Azure service with native integration into Azure Monitor and Application Insights.
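- AWS Lambda scripts + CloudWatch (for NetSuite): an external harness in which Lambda functions drive load against NetSuite's APIs and CloudWatch collects the client-side metrics. Neither is native to NetSuite, whose own scripting layer is SuiteScript.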
- APM tools: Datadog, New Relic or Dynatrace for real-time monitoring throughout testing, regardless of cloud platform.
Cloud caveat: SaaS ERP vendors often impose API throttling limits. Check your contract quotas before running a load test — hitting rate limits will skew results by artificially rejecting requests before the system itself is under pressure.
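When analysing results from a SaaS ERP, it helps to separate throttled requests from genuine failures. The sketch below assumes the vendor signals rate limiting with HTTP 429, which is common but should be confirmed in the vendor's API documentation; the function name and sample data are illustrative.

```python
from collections import Counter

THROTTLE_STATUS = 429  # assumption: the vendor signals rate limiting with HTTP 429


def summarise(status_codes):
    """Report throttling separately so it does not inflate the error rate."""
    counts = Counter(status_codes)
    throttled = counts[THROTTLE_STATUS]
    failures = sum(n for code, n in counts.items() if code >= 500)
    evaluated = len(status_codes) - throttled  # requests the system actually processed
    return {
        "throttled_share": throttled / len(status_codes),
        "error_rate": failures / evaluated if evaluated else 0.0,
    }


summary = summarise([200] * 940 + [429] * 50 + [500] * 10)
print(summary)
```

A non-trivial throttled share means the test was measuring the contract quota rather than the system, and the load profile should be reduced or the quota renegotiated before drawing conclusions.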
Acceptance Thresholds: Key KPIs to Track
Response time by transaction type
Response time is the most visible KPI for end users. The baseline rule: a response time above 2 seconds for a routine transaction is perceived as slow by business users.
| Transaction type | Acceptable threshold (P95) | Critical threshold |
|---|---|---|
| Order entry | < 2 s | > 5 s |
| Item/customer lookup | < 1 s | > 3 s |
| Workflow approval | < 1.5 s | > 4 s |
| Standard report display | < 10 s | > 30 s |
| Full accounting close | < 4 h | > 8 h |
| Batch import (100,000 lines) | < 10 min | > 30 min |
These are operational benchmarks. Adjust them to your business context: a B2B e-commerce platform will have stricter order-entry requirements than an internal ERP managing back-office operations.
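Encoding the table as data makes it easy to classify every measurement the same way. In the sketch below the threshold values come from the table, all converted to seconds; the dictionary keys and the "degraded" label for values between the two thresholds are assumptions.

```python
# (acceptable, critical) thresholds in seconds, taken from the table above.
# The close and batch import rows are total durations rather than P95 values.
THRESHOLDS = {
    "order_entry": (2.0, 5.0),
    "item_customer_lookup": (1.0, 3.0),
    "workflow_approval": (1.5, 4.0),
    "standard_report_display": (10.0, 30.0),
    "full_accounting_close": (4 * 3600.0, 8 * 3600.0),
    "batch_import_100k_lines": (10 * 60.0, 30 * 60.0),
}


def classify(transaction_type, measured_seconds):
    """Return 'acceptable', 'degraded' or 'critical' for one measurement."""
    acceptable, critical = THRESHOLDS[transaction_type]
    if measured_seconds < acceptable:
        return "acceptable"
    if measured_seconds > critical:
        return "critical"
    return "degraded"  # between the two thresholds: label is an assumption


print(classify("order_entry", 1.7))
print(classify("workflow_approval", 2.8))
print(classify("batch_import_100k_lines", 45 * 60))
```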
Throughput (transactions per second)
Throughput measures the system’s capacity to sustain a continuous volume of transactions. An ERP that takes 2 seconds per transaction but can only handle 10 per second will collapse when 200 users submit requests simultaneously.
Method: calculate nominal throughput (expected transactions per hour divided by 3,600), then test at 150% of that figure.
Resource utilisation (CPU, RAM, disk, network)
Infrastructure metrics reveal bottlenecks before they become visible to users:
- CPU: average utilisation below 70% under nominal load. Above 85%, there is insufficient headroom to absorb spikes.
- RAM: no active swap during testing. Once swapping starts, performance degrades non-linearly.
- Disk I/O: write latency below 5 ms on database volumes. ERPs running on HANA or Oracle Exadata are particularly sensitive to storage performance.
- Network: sufficient bandwidth between the application server and the database (frequently underestimated in multi-tier architectures).
Acceptable error rate
The error rate measures the percentage of transactions that fail (timeouts, 500 errors, deadlocks). Under nominal load, it should stay below 0.1%. Under stress testing (beyond nominal capacity), a rate of up to 1% may be tolerable, provided the system stabilises once the spike subsides.
Interpreting Results and Making the Go/No-Go Call
Go/No-Go decision matrix
After test execution, structure the decision around three outcomes:
GO: all KPIs within acceptable thresholds under load and spike conditions. Stress testing shows progressive degradation (no hard crash). Endurance testing reveals no memory leaks. Clear the go-live.
Conditional GO: one or two KPIs slightly above threshold (less than 20% overshoot), with a remediation plan that is identified and scoped. Go-live can proceed if the fixes are deployable as a hotfix within D+1 to D+7.
NO-GO: one critical KPI breached (response times three times the threshold or worse, error rate above 1% under nominal load, system crash at 120% stress). Mandatory go-live delay, with a validated remediation plan and a full re-test cycle before a new date is set.
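The three outcomes can be written down as an explicit rule set so the decision is reproducible. This is a sketch under stated assumptions: each KPI is passed as a (measured, threshold) pair where lower is better, the parameter names are hypothetical, and any case the matrix does not cover (for example a 50% overshoot) is treated conservatively as NO-GO.

```python
def go_no_go(kpis, nominal_error_rate, crash_load_pct=None,
             memory_leak=False, remediation_plan=False):
    """Apply the decision matrix above; kpis maps name -> (measured, threshold)."""
    ratios = [measured / threshold for measured, threshold in kpis.values()]

    # NO-GO triggers stated in the matrix.
    if any(ratio >= 3 for ratio in ratios):
        return "NO-GO"
    if nominal_error_rate > 0.01:
        return "NO-GO"
    if crash_load_pct is not None and crash_load_pct <= 120:
        return "NO-GO"

    overshoots = [ratio for ratio in ratios if ratio > 1]
    healthy = nominal_error_rate < 0.001 and not memory_leak

    if not overshoots and healthy and crash_load_pct is None:
        return "GO"
    if (healthy and 1 <= len(overshoots) <= 2
            and all(ratio < 1.2 for ratio in overshoots) and remediation_plan):
        return "CONDITIONAL GO"
    return "NO-GO"  # assumption: anything not covered by the matrix is blocked


clean = {"order_entry_p95": (1.8, 2.0), "lookup_p95": (0.9, 1.0)}
slight = {"order_entry_p95": (2.2, 2.0), "lookup_p95": (0.9, 1.0)}
severe = {"order_entry_p95": (6.5, 2.0), "lookup_p95": (0.9, 1.0)}

print(go_no_go(clean, nominal_error_rate=0.0005))
print(go_no_go(slight, nominal_error_rate=0.0005, remediation_plan=True))
print(go_no_go(severe, nominal_error_rate=0.0005))
```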
Rapid remediation levers
If thresholds are missed, the most effective levers — in order of impact — are:
- SQL optimisation: adding indexes, rewriting the slowest queries (analyse via explain plan or SQL trace). Usually the fastest and highest-impact fix.
- Vertical scaling: increasing RAM or CPU on the database server. Effective short-term, limited long-term.
- Horizontal scaling: adding application servers to distribute load (relevant for web/Fiori architectures).
- Application tuning: disabling verbose logging, adjusting cache parameters, optimising batch jobs.
- Customisation refactoring: rewriting bespoke developments identified as resource-heavy during testing.
The classic mistake is running tests in an environment that is significantly smaller than production. The performance-testing environment must match production hardware configuration, or represent at least 80% of target capacity with a documented correction factor applied to the results.
To go further on operational go-live readiness, read our ERP UAT testing checklist and our ERP API integration testing methodology. Together with this article, they form a complete pre-go-live validation path.