Source: Webapp Performance Tuning Docs, 2026-05-04 ↔ 2026-05-14.
Cold-start cost in the old setup was dominated by ASP.NET’s runtime
CodeDom + csc.exe pipeline parsing and compiling
1,781 view files on the first user click after every restart.
| View files in Cybersoft.Primero.Webpages | Count |
|---|---|
| .aspx pages | 1,201 |
| .ascx user controls | 557 |
| .asmx service endpoints | 23 |
| Total compilable | 1,781 |
| Phase | What it does | Cost |
|---|---|---|
| 1. Web.config flip | Source defaults flipped from debug=true, batch=false to debug=false, batch=true. Re-enables JIT optimization, script bundling, and per-directory DLL batching. Per-cabinet overrides via DEBUG_COMPILATION / BATCH_COMPILATION env vars. | Zero bytes, zero runtime cost. |
| 2. Precompile + IIS Application Init | aspnet_compiler builds all 1,781 view DLLs at image build time. IIS warmup hits /Warmup.aspx synchronously before the pod is marked Ready. | +50–250 MB image size. ~2.5 h build (drops to ~30–45 min once Cybersoft fixes the codebase namespace mismatches). |
| 3. NGEN + Multi-core JIT | Native-image precompile over Cybersoft.*.dll + App_global.asax.dll at image build. ProfileOptimization records and replays a JIT profile across w3wp restarts. | +tens to low hundreds of MB image size. |
| 4. 64-bit app pool | DefaultAppPool migrated from 32-bit to 64-bit, removing the ~3.2 GB working-set ceiling. Access Database Engine swapped from x86 to x64 to match (Excel-import path). | +30 % steady-state memory, +30–50 % peak under burst. 80 MB ACE installer. 17-consumer Excel-import smoke sweep required before prod. |
Open dependency: Cybersoft to fix the 100 namespace/directory mismatches in Cybersoft.Primero.Webpages so batch-mode precompile becomes viable (cuts build time from 2.5 h to 30–45 min).
Source: DB Performance Tuning Docs, 2026-05-05 ↔ 2026-05-14. Target: Primero_CodiacDEV, Azure SQL Server 2022 CU21, 16 vCPU.
| Metric | Baseline | After Tier-1 recompile strip | After scalar-UDF cleanup | Combined (current) |
|---|---|---|---|---|
| Requests/sec | 93.94 | — | 99.24 (+5.6 %) | 121.13 (+29 %) |
| p50 latency | 420 ms | — | 380 ms | 370 ms |
| p95 latency | 2,400 ms | — | 2,200 ms | 1,500 ms (−37.5 %) |
| p99 latency | 5,600 ms | — | 5,800 ms | 3,800 ms (−32 %) |
Saturation test (100 users on the parallel warmup script): 191.35 RPS — up 79 % from baseline.
Why this matters. The two changes are super-additive — scalar-UDF cleanup alone delivered only +5.6 %, but combined with the recompile strip it’s +29 %. Each fix was holding the other back. Removing both unblocked the throughput floor.
Tier-1 recompile strip. Removed OPTION(RECOMPILE) hints across 34 Tier-1 procs/functions. These were short, parameter-stable lookup procs that were generating a fresh plan per execution; in one case (NF_Setting_GetSettingValueByRegion), 2.14 million wasted compiles. Server-level settings (OPTIMIZE FOR AD HOC WORKLOADS, PARAMETERIZATION FORCED) are defeated by inline hints; the hints had to go. A sketch of the measurement and the fix follows.
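A minimal sketch of both halves. The Query Store DMV and column names are real; the proc body is a hypothetical stand-in for the actual Cybersoft source, not a copy of it.

```sql
-- Compile count for one proc, pulled from Query Store. A figure like
-- the 2.14 million wasted compiles comes out of this kind of query.
SELECT OBJECT_NAME(q.object_id) AS proc_name,
       SUM(q.count_compiles)    AS total_compiles
FROM sys.query_store_query AS q
WHERE q.object_id = OBJECT_ID(N'dbo.NF_Setting_GetSettingValueByRegion')
GROUP BY q.object_id;

-- Illustrative hint removal. Table and column names are hypothetical.
ALTER PROCEDURE dbo.NF_Setting_GetSettingValueByRegion
    @RegionID INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT s.SettingValue
    FROM dbo.Setting AS s          -- hypothetical table
    WHERE s.RegionID = @RegionID;  -- OPTION (RECOMPILE) removed here
END;
```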
Scalar-UDF cleanup. Rewrote NF_User_GetUserRoleRealmDescription to use STRING_AGG. The function used the SQL 2008-era SELECT @var = @var + col + ', ' pattern, which forces serialization on every consuming plan and defeats SQL 2019's scalar-UDF inlining feature. Mechanical change, low risk, big effect; see the sketch below.
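A minimal before/after sketch of the pattern swap, with hypothetical table and column names standing in for the real function body.

```sql
DECLARE @UserID INT = 42;          -- hypothetical input
DECLARE @desc   VARCHAR(MAX);

-- Old SQL 2008-era pattern: variable-accumulation concatenation.
-- This blocks SQL 2019 scalar-UDF inlining and serializes callers.
SET @desc = '';
SELECT @desc = @desc + r.RoleName + ', '
FROM dbo.UserRole AS r             -- hypothetical table
WHERE r.UserID = @UserID;

-- New pattern: a single STRING_AGG aggregate (SQL Server 2017+),
-- which leaves the enclosing function eligible for inlining.
SELECT @desc = STRING_AGG(r.RoleName, ', ')
FROM dbo.UserRole AS r
WHERE r.UserID = @UserID;
```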
Deferred: CMN_Message_GetList_ByUser. The initial pass produced a 14× CPU regression. Root cause: the original plan already times out during optimizer compilation (260 operators, 15 sorts, 62 nested loops). The proc needs a structural change, not a surgical one. Recommendations handed to the Cybersoft team (see below).

CMN_Message_GetList_ByUser is the #1 CPU offender on the click-around workload: ~1 million executions / 24 h at ~120 ms CPU each. The structural fix is to materialize the multi-statement table-valued function outputs (PR_GetUserRegionsWithParentsAndChildren, NF_GetSitesForUser) into temp tables once at the top of the proc, rather than expanding them inline 15–27 times in a single plan (sketched below). This is the highest-leverage next move on the database side. Owner: Cybersoft engineering, with Codiac advisory support.
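A sketch of the recommended shape, assuming hypothetical signatures and column names for the two TVFs and the message table; the real proc body is Cybersoft's.

```sql
CREATE OR ALTER PROCEDURE dbo.CMN_Message_GetList_ByUser
    @UserID INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Execute each multi-statement TVF exactly once, up front,
    -- instead of letting the plan expand it 15-27 times inline.
    SELECT RegionID
    INTO #UserRegions
    FROM dbo.PR_GetUserRegionsWithParentsAndChildren(@UserID);

    SELECT SiteID
    INTO #UserSites
    FROM dbo.NF_GetSitesForUser(@UserID);

    -- Downstream logic joins the small temp tables instead.
    -- (Message columns here are hypothetical.)
    SELECT m.MessageID, m.Subject, m.CreatedOn
    FROM dbo.CMN_Message AS m
    JOIN #UserRegions AS ur ON ur.RegionID = m.RegionID
    JOIN #UserSites   AS us ON us.SiteID   = m.SiteID;
END;
```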
Two runs, several days apart, against the same web app on two very different platforms. Both used Python Locust with comparable click-around scripts. Different databases and different days — so direct response-time comparison is unsafe; capacity ceiling comparison is fair.
| Field | Value |
|---|---|
| Target | VM-hosted QA, single fixed environment |
| Test script | locustfile_loadtest.py (random nav across the app) |
| Duration | 30 minutes (05:59 → 06:29 UTC) |
| Ramp | 0 → 250 sustained users at peak |
| Total requests | 934,009 |
| Total failures | 0 |
| Avg RPS over the test | 518.7 |
| p95 (test-wide aggregate) | 790 ms |
| p99 (test-wide aggregate) | 1,700 ms |
Source: Locust_2026-05-07-00h59_locustfile_loadtest.py_https___qa.primeroedge.co.html.
The May 7 test was never pushed beyond 400 users: the VMs are a known, fixed capacity, and the team did not expect them to sustain more than that.
| Field | Value |
|---|---|
| Target | Codiac internal test cabinet bens-perf, 8 copies (auto-scaled to that level via the burst-schedule platform feature) |
| Test script | Same locustfile.py family (random nav across the app) |
| Ramp design | Step-up: 100 → 500 → 1,000 → 2,000 → 3,500 → 5,000 → 8,000 → 12,000 → 20,000 users, 3 min/step |
| What survived | Steps 1–5 ran cleanly with measurements; steps 6+ saw the load generator itself crash repeatedly (Codiac stayed up) |
| Codiac at 3,500 users | 43 req/sec successful traffic, 17 % failure rate, p50 60 s, p95 81 s — degraded but still serving real customer traffic |
| Codiac at 5,000+ users | Locust runner OOM’d before Codiac did. Inconclusive. |
The honest comparison. Response-time numbers are not directly comparable across the two tests (different databases, different customer data shapes, different days). What is comparable is capacity ceiling: VMs sustained 250 users; Codiac at 8 copies was still serving real traffic at 3,500 users when the load generator broke. That’s an at-minimum 14× headroom multiplier with the true number unknown.
The 3,500-user run did show significant response-time degradation on the Codiac side. The likely shared bottleneck is the database (same shared database used in both Codiac perf-dev runs) — which is exactly why “Continue database optimization” and “Finish proving managed database services” sit in the Decisions section of the executive briefing. Adding more app copies past 8 won’t help if the database can’t take more callers.
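One hedged way to confirm the database is the ceiling is to rank procs by total CPU from Query Store, the same view the archive's snapshots and sp_BlitzCache outputs capture. The DMV and column names below are real; treat the query as a sketch, not the team's exact script.

```sql
-- Top CPU consumers over the captured Query Store window.
-- avg_cpu_time is in microseconds, hence /1000.0 to get milliseconds.
SELECT TOP (10)
       OBJECT_NAME(q.object_id)                            AS proc_name,
       SUM(rs.count_executions)                            AS executions,
       SUM(rs.avg_cpu_time * rs.count_executions) / 1000.0 AS total_cpu_ms
FROM sys.query_store_query         AS q
JOIN sys.query_store_plan          AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE q.object_id <> 0             -- procs/functions only, skip ad hoc
GROUP BY q.object_id
ORDER BY total_cpu_ms DESC;
```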
head-to-head.mov — the platform under load; capacity rises with traffic.
reports.mov — the platform's operations view: live events, replica counts, conditions, logs.
Source: prior cost analysis (primeroedge-performance-comparison.html, May 8 2026). Reproduced here so this document stands alone.
| Bucket | Effort | Billed | Notes |
|---|---|---|---|
| Rescue work | ~643 commits (76 %) | $196,000 | Reverse-engineering and stabilising an undocumented IIS / SQL monolith. One-time cost. Does not recur for the next customer. |
| Platform onboarding | ~168 commits (20 %) | $51,000 | Tenant setup, deploy templates, base scaffolding. The work Codiac does for any new customer. |
| Training & enablement | ~30 hr (1 %) | $7,000 | 20 hours live (2 hr/week × 10 weeks) + prep. ⅓ Codiac platform; ⅔ general cloud & PrimeroEdge operating-model training. |
| Total | 841 commits + 30 hr | $253,640 | 10 months of active engagement |
Against ~$293K / month of recurring infrastructure savings (cost-savings document, May 8), the $253,640 pays back in ~26 days of operation ($253,640 ÷ $293,000 ≈ 0.87 months ≈ 26 days). After that, the savings are pure.
| Item | Owner | Risk if not done |
|---|---|---|
| Excel-import smoke sweep (17 consumers) before Phase 4 to production | Cybersoft QA + Codiac | Regression on Excel-import workflows after 64-bit migration |
| Cabinet memory limit review post-Phase-4 soak | Codiac ops | OOM under burst load with x64 working set ~30–50 % larger than x86 |
| Cybersoft codebase fix: 100 namespace/directory mismatches | Cybersoft engineering | Build time stays at 2.5 h; otherwise drops to 30–45 min |
| CMN_Message_GetList_ByUser structural rewrite | Cybersoft engineering, Codiac advisory | Database CPU stays concentrated on the #1 offender; further RPS lift gated on this |
| Tier-2 / Tier-3 OPTION(RECOMPILE) walk | Cybersoft engineering | Compile rate stays elevated on medium-complexity procs |
| Production pilot (one real customer, both stacks running side-by-side) | Codiac + customer ops | Findings stay in test; recurring savings not realized |
Every number in this document is reproducible from one of:
- Webapp Performance Tuning Docs/ archive — image build metrics, runtime expectations, the 100 namespace mismatches list.
- DB Performance Tuning Docs/ archive — Locust CSVs, Query Store snapshots, sp_BlitzCache outputs, per-experiment write-ups.
- primeroedge-performance-comparison.html — the May 8 cost-savings document, fully sourced.
- https://codiacio.grafana.net/d/burst-test-t60/ (Grafana dashboard).