Performance Benchmarks
TopGun includes an automated load harness that tests the Rust server under realistic conditions. This page presents the results and explains what the numbers mean for real applications.
Test Methodology
The load harness boots a full TopGun server instance in-process (all 7 domain services, partition dispatcher, WebSocket handler) and runs configurable scenarios:
- Connections: 200 concurrent WebSocket connections
- Duration: 30 seconds per test
- Payload: OpBatch messages containing CRDT write operations
- Measurement: HDR histograms for latency, throughput counters for ops/sec
The harness source code is at packages/server-rust/benches/load_harness/.
Fire-and-Wait (Round-Trip Latency)
In fire-and-wait mode, each connection sends an OpBatch, waits for the server’s OP_ACK, and records the round-trip latency before sending the next batch. This measures end-to-end request latency including server processing and acknowledgement.
| Metric | Measured | Baseline floor |
|---|---|---|
| Throughput | 37,000+ ops/sec | 30,000 ops/sec |
| p50 latency | 1.5ms | 5ms max |
| Acked ratio | >= 80% | >= 80% |
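The fire-and-wait loop described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `measure_fire_and_wait`, `percentile`, and the `round_trip` closure (standing in for "send one OpBatch and block until its OP_ACK arrives") are hypothetical names, and a sorted vector stands in for the HDR histogram the real harness uses.

```rust
use std::time::{Duration, Instant};

/// Runs `round_trip` `iterations` times, strictly one at a time, and records
/// how long each call takes. `round_trip` is a hypothetical stand-in for
/// "send one OpBatch and wait for the server's OP_ACK".
fn measure_fire_and_wait<F: FnMut()>(iterations: usize, mut round_trip: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        round_trip(); // the next batch is not sent until this returns
        samples.push(start.elapsed());
    }
    samples
}

/// Nearest-rank percentile over the recorded samples (`pct` in 0.0..=100.0).
/// Returns `None` when there are no samples.
fn percentile(samples: &[Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: the smallest sample with at least pct% of samples at or below it.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

Calling `percentile(&samples, 50.0)` on the recorded samples gives the p50 figure of the kind reported in the table; `95.0` and `99.0` give the tail percentiles.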
What this means
Fire-and-wait latency represents the worst case for a client that needs confirmation before proceeding. In practice, TopGun clients write locally first (zero latency to the user) and sync in the background, so server-side latency does not affect the user experience.
Note: The CI baseline tracks p50 latency for regression detection. Running cargo bench --bench load_harness locally prints the full HDR histogram, including p95 and p99 percentiles, in the terminal output.
Fire-and-Forget (Raw Throughput)
In fire-and-forget mode, connections send batches as fast as possible without waiting for acknowledgement. This measures the server’s maximum ingestion rate.
| Metric | Measured | Baseline floor |
|---|---|---|
| Throughput | 480,000+ ops/sec | 380,000 ops/sec |
| p50 latency | < 1,000 ms | < 1,000 ms |
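For reference, an ops/sec figure is simply the number of operations counted during the run divided by the length of the measurement window. The helper below is a hypothetical illustration of that arithmetic, not the harness's own counter.

```rust
use std::time::Duration;

/// Operations per second over a measurement window.
/// Returns 0.0 for a zero-length window to avoid dividing by zero.
fn ops_per_sec(total_ops: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        0.0
    } else {
        total_ops as f64 / secs
    }
}
```

Under this definition, a 30-second run would need to count 14,400,000 operations to report 480,000 ops/sec.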
What this means
At 480,000+ ops/sec, a single node can absorb writes from a large number of concurrently active users, each writing several times per second. The figures below are aspirational ceilings for a single Rust server node with default settings; production capacity depends on payload size, query complexity, network egress, and hardware:
- A collaborative document editor generating 10 ops/sec per user can support 48,000+ concurrent editors
- A real-time dashboard ingesting sensor data at 100 ops/sec per source can handle 4,800+ data sources
- A chat application sending 1 message/sec per user can support 480,000+ active chatters
See performance tuning for production capacity planning guidance.
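The capacity examples above are plain division: node throughput over the per-client write rate. A small sketch of that arithmetic, using a hypothetical `max_clients` helper and assuming throughput is the only limiting factor:

```rust
/// Upper bound on concurrent clients one node could serve if raw write
/// throughput were the only limit. Returns `None` when the per-client rate
/// is zero, since the bound is undefined in that case.
fn max_clients(node_ops_per_sec: u64, ops_per_client_per_sec: u64) -> Option<u64> {
    node_ops_per_sec.checked_div(ops_per_client_per_sec)
}
```

With the measured 480,000 ops/sec, `max_clients(480_000, 10)` yields `Some(48_000)` editors, `max_clients(480_000, 100)` yields `Some(4_800)` sensor sources, and `max_clients(480_000, 1)` yields `Some(480_000)` chat users.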
Measurement provenance
The numbers above were measured on 2026-04-18 (two consecutive runs: 483K, 487K ops/sec fire-and-forget) on an M1 Max MacBook Pro with the load_harness driving 200 concurrent WebSocket connections against an in-process server. The harness retries on ENOBUFS with exponential backoff (introduced by SPEC-214) to avoid macOS kernel-buffer exhaustion artifacts.
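A retry-with-exponential-backoff loop of the kind described above might look like the sketch below. It is an illustration only: `retry_on_enobufs` and its parameters are hypothetical, the attempt limit and initial delay are not taken from the harness, and the errno value is passed in by the caller (ENOBUFS is 55 on macOS and 105 on Linux; real code would typically take it from the `libc` crate).

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Calls `op` and retries it while it fails with the OS error code `enobufs`,
/// doubling the sleep between attempts. Any other error, or running out of
/// attempts, returns the last result to the caller. `op` always runs at
/// least once.
fn retry_on_enobufs<T, F>(
    enobufs: i32,
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            // Kernel buffers are full: back off, then try the same send again.
            Err(err) if err.raw_os_error() == Some(enobufs) && attempt < max_attempts => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Success, a different error, or attempts exhausted.
            other => return other,
        }
    }
}
```

Backing off gives the kernel time to drain its socket buffers, so a transient local buffer shortage does not show up as a dropped or miscounted operation.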
An earlier 2026-03-27 measurement reported 560K ops/sec fire-and-forget; that figure was retired after SPEC-214’s break-on-ENOBUFS fix surfaced that the prior harness was over-counting kernel-buffered drops. The current 480K+ figure is the post-fix steady-state measurement.
Baseline Thresholds
The load harness enforces pass/fail thresholds defined in baseline.json. These are limits (minimum acceptable throughput, maximum acceptable latency), not the measured numbers above:
| Mode | Metric | Floor (baseline.json) | Measured (2026-04-18) |
|---|---|---|---|
| Fire-and-wait | Min ops/sec | 30,000 | 37,000+ |
| Fire-and-wait | Max p50 latency | 5ms | 1.5ms |
| Fire-and-forget | Min ops/sec | 380,000 | 480,000+ |
| Fire-and-forget | Max p50 latency | 1,000ms | < 1,000ms |
| Both | Regression tolerance | 20% | — |
These thresholds are checked in CI. A regression greater than 20% from baseline triggers a warning.
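One way to read these checks is sketched below. The struct and field names are hypothetical and do not reflect the actual schema of baseline.json, and treating "regression" as the fractional throughput drop relative to a reference value is an assumption.

```rust
/// Hypothetical limits for one mode: a throughput floor and a p50 latency cap.
struct Limits {
    min_ops_per_sec: f64,
    max_p50_ms: f64,
}

/// Hypothetical summary of one benchmark run.
struct Run {
    ops_per_sec: f64,
    p50_ms: f64,
}

/// A run passes when throughput is at or above the floor and p50 latency is
/// at or below the cap.
fn passes(run: &Run, limits: &Limits) -> bool {
    run.ops_per_sec >= limits.min_ops_per_sec && run.p50_ms <= limits.max_p50_ms
}

/// Fractional throughput drop relative to a reference value
/// (0.25 means 25% slower). Negative values mean the run got faster.
fn throughput_regression(reference_ops: f64, current_ops: f64) -> f64 {
    (reference_ops - current_ops) / reference_ops
}

/// True when the drop exceeds the tolerance (0.20 for the 20% in the table).
fn exceeds_tolerance(reference_ops: f64, current_ops: f64, tolerance: f64) -> bool {
    throughput_regression(reference_ops, current_ops) > tolerance
}
```

Under these assumptions, the measured fire-and-wait run (37,000 ops/sec, 1.5ms p50) passes its limits of 30,000 ops/sec and 5ms, while a drop from 480,000 to 360,000 ops/sec is a 25% regression and would exceed the 20% tolerance.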
Running Benchmarks Yourself
You can reproduce these results on your own hardware:
# Quick smoke test (50 connections, 10 seconds)
cargo bench --bench load_harness -- --connections 50 --duration 10
# Full run (200 connections, 30 seconds)
cargo bench --bench load_harness
# Fire-and-forget throughput test
cargo bench --bench load_harness -- --fire-and-forget --interval 0
# Write results as JSON for automated comparison
cargo bench --bench load_harness -- --json-output
Results are printed as ASCII tables in the terminal. Add --json-output to write results as JSON for automated comparison.
Hardware Considerations
Benchmark results vary with hardware. The numbers above were measured on an M1 Max MacBook Pro (2026-04-18). Key factors:
- CPU cores: More cores improve throughput (tokio uses a multi-threaded runtime)
- Memory: The in-process harness runs server and clients in the same process, requiring more RAM than a standalone server
- OS: Linux generally provides better networking performance than macOS for high-connection-count scenarios
For production capacity planning, run the load harness on hardware similar to your deployment target. See performance tuning for production configuration guidance.