Benchmarking
Ze includes ze-perf, a standalone tool for measuring BGP route propagation
latency through a device under test (DUT). It works with any BGP implementation,
not just Ze.
Architecture
```
              +-----------+
ze-perf       |           |      ze-perf
(sender) ---> |    DUT    | ---> (receiver)
AS 65001      | AS 65000  |      AS 65002
              +-----------+
```
The sender establishes a BGP session with the DUT, injects routes, and the
receiver measures when those routes arrive. Both sessions are managed by
ze-perf in a single process. Timing starts when the sender writes the first
UPDATE and stops when the receiver has collected all expected prefixes.
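Here is a minimal sketch of this timing model. It is not ze-perf's code. The functions compute convergence and first-route times from one send timestamp and a map of per-prefix arrival timestamps. The function names and input shapes are hypothetical, and the sketch assumes timestamps in seconds from a monotonic clock.

```python
from typing import Mapping, Optional, Set


def convergence_ms(first_update_sent: float,
                   arrivals: Mapping[str, float],
                   expected: Set[str]) -> Optional[float]:
    """Time from the first UPDATE write until every expected prefix has arrived.

    Returns None if some expected prefix never arrived, e.g. the
    per-iteration --duration limit expired before convergence.
    """
    if not expected:
        return 0.0
    if not all(p in arrivals for p in expected):
        return None
    last_arrival = max(arrivals[p] for p in expected)
    return (last_arrival - first_update_sent) * 1000.0


def first_route_ms(first_update_sent: float,
                   arrivals: Mapping[str, float]) -> Optional[float]:
    """Time from the first UPDATE write until the first prefix arrived."""
    if not arrivals:
        return None
    return (min(arrivals.values()) - first_update_sent) * 1000.0
```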
Quick Start
Run a benchmark against Ze on localhost:
```sh
# Start ze with a config that accepts peers from 127.0.0.1 and 127.0.0.2
ze test-config.conf &

# Run the benchmark
ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --dut-name ze --routes 1000
```
With JSON output saved to a file:
```sh
ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --dut-name ze \
  --routes 1000 --output result-ze.json
```
Running Benchmarks
For a complete flag reference, see ze-perf run.
Encoding Modes
Three encoding modes measure different code paths through the DUT:
| Mode | Flag | What It Tests |
|---|---|---|
| IPv4 inline NLRI | --family ipv4/unicast (default) | Standard IPv4 unicast path with NLRI at the end of the UPDATE |
| IPv4 force-MP | --family ipv4/unicast --force-mp | IPv4 encoded via the MP_REACH_NLRI attribute (exercises the multiprotocol path) |
| IPv6 MP | --family ipv6/unicast | IPv6 unicast via MP_REACH_NLRI (the standard encoding for non-IPv4 families) |
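This sketch makes the difference between inline and force-MP encoding concrete. It builds the NLRI bytes and an MP_REACH_NLRI path attribute using the wire formats in RFC 4271 and RFC 4760. It illustrates those formats and is not ze-perf's encoder. The helper names are hypothetical, and the rest of the UPDATE framing (header, withdrawn routes, other attributes) is left out.

```python
import ipaddress
import struct
from typing import Sequence

AFI_IPV4, AFI_IPV6 = 1, 2          # address family identifiers
SAFI_UNICAST = 1
ATTR_MP_REACH_NLRI = 14            # path attribute type code (RFC 4760)
FLAG_OPTIONAL = 0x80
FLAG_EXTENDED_LENGTH = 0x10


def encode_prefix(prefix: str) -> bytes:
    """One NLRI entry: prefix length in bits, then only the significant octets."""
    net = ipaddress.ip_network(prefix)
    octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:octets]


def encode_nlri(prefixes: Sequence[str]) -> bytes:
    """NLRI list as carried at the end of an IPv4 UPDATE (inline mode)."""
    return b"".join(encode_prefix(p) for p in prefixes)


def mp_reach_nlri(afi: int, safi: int, next_hop: str,
                  prefixes: Sequence[str]) -> bytes:
    """Complete MP_REACH_NLRI path attribute: flags, type, length, value."""
    nh = ipaddress.ip_address(next_hop).packed
    value = (struct.pack("!HBB", afi, safi, len(nh)) + nh
             + b"\x00"                      # reserved octet
             + encode_nlri(prefixes))
    if len(value) > 255:                    # needs the 2-octet length form
        header = struct.pack("!BBH", FLAG_OPTIONAL | FLAG_EXTENDED_LENGTH,
                             ATTR_MP_REACH_NLRI, len(value))
    else:
        header = struct.pack("!BBB", FLAG_OPTIONAL, ATTR_MP_REACH_NLRI,
                             len(value))
    return header + value
```

In inline mode, the encode_nlri bytes follow the path attributes directly, and the next hop travels in the separate NEXT_HOP attribute. With --force-mp, the prefixes and next hop move inside MP_REACH_NLRI. That move is why the force-MP row exercises a different parsing path in the DUT.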
Multi-Iteration
By default, ze-perf run executes 5 iterations with 1 warmup run. The warmup
run is discarded, and outliers beyond 2 standard deviations from the median
convergence time are removed (minimum 3 iterations kept). Final results report
median and standard deviation across the kept iterations.
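The filtering described above can be sketched as follows. This is an illustration under assumptions, not ze-perf's implementation. It assumes a sample standard deviation over the measured (non-warmup) iterations. It also assumes the minimum of 3 is enforced by keeping the iterations closest to the median, which the text does not specify. The function name is hypothetical.

```python
import statistics
from typing import List, Tuple


def summarize_iterations(convergence_ms: List[float], warmup_runs: int = 1,
                         sigma: float = 2.0,
                         min_kept: int = 3) -> Tuple[float, float, int]:
    """Return (median, stddev, kept count) of convergence times in ms.

    convergence_ms holds one value per iteration, warmup iterations first.
    """
    measured = convergence_ms[warmup_runs:]        # warmup runs are discarded
    if len(measured) < 2:
        raise ValueError("need at least two measured iterations")
    med = statistics.median(measured)
    sd = statistics.stdev(measured)                # assumption: sample stddev
    kept = [x for x in measured if abs(x - med) <= sigma * sd]
    if len(kept) < min_kept:
        # Assumption: enforce the minimum by keeping runs closest to the median.
        kept = sorted(measured, key=lambda x: abs(x - med))[:min_kept]
    spread = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return statistics.median(kept), spread, len(kept)
```

With the defaults (--repeat 5, --warmup-runs 1), the list would hold five values, and the first would be dropped before filtering.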
| Flag | Default | Purpose |
|---|---|---|
| --repeat | 5 | Total iterations (including warmup) |
| --warmup-runs | 1 | Warmup iterations discarded from results |
| --iter-delay | 3s | Pause between iterations for clean separation |
More iterations improve statistical confidence. For reliable results, use at
least --repeat 10:
```sh
ze-perf run --dut-addr 172.31.0.2 --dut-asn 65000 --repeat 10 --warmup-runs 2
```
Timing
| Flag | Default | Purpose |
|---|---|---|
| --warmup | 2s | Delay after session establishment before injecting routes |
| --connect-timeout | 10s | TCP connection timeout |
| --duration | 60s | Maximum wait time for convergence per iteration |
| --iter-delay | 3s | Delay between iterations |
The --iter-delay flag controls whether iterations run back-to-back or with
a pause. Longer delays give the DUT time to settle between measurements.
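For CI time budgets, the timing flags give a rough upper bound on wall-clock time. The sketch below is only an estimate and rests on assumed sequencing. It assumes every iteration may wait the full --connect-timeout, then --warmup, then the full --duration, with --iter-delay between consecutive iterations. ze-perf's actual sequencing may differ, and the function name is hypothetical.

```python
def worst_case_seconds(repeat: int = 5, warmup: float = 2.0,
                       connect_timeout: float = 10.0, duration: float = 60.0,
                       iter_delay: float = 3.0) -> float:
    """Rough upper bound on total run time, using the flag defaults above.

    Assumes each iteration may wait the full --connect-timeout, then --warmup,
    then the full --duration, with --iter-delay between iterations.
    """
    per_iteration = connect_timeout + warmup + duration
    return repeat * per_iteration + max(repeat - 1, 0) * iter_delay
```

With the defaults, the bound is 5 x 72 s + 4 x 3 s = 372 s, about six minutes. Real runs are usually much shorter, because --duration is only a ceiling.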
Cross-Implementation Comparison
Automated Docker Runner
The included test runner benchmarks all five supported implementations in Docker:
| DUT | Image | Config | Forwarding mechanism |
|---|---|---|---|
| Ze | ze-interop (built) | test/perf/configs/ze.conf | bgp-rs plugin |
| FRR | quay.io/frrouting/frr:10.3.1 | test/perf/configs/frr.conf | route-map PERMIT |
| BIRD | bird-interop (built) | test/perf/configs/bird.conf | import/export all |
| GoBGP | gobgp-interop (built) | test/perf/configs/gobgp.toml | default accept policy |
| rustbgpd | rustbgpd-interop (built from source) | test/perf/configs/rustbgpd.toml | route_server_client |
```sh
# All DUTs
python3 test/perf/run.py

# Specific DUTs
python3 test/perf/run.py ze bird rustbgpd

# Override defaults
DUT_ROUTES=10000 DUT_REPEAT=5 python3 test/perf/run.py

# Via Make
make ze-perf-bench
make ze-perf-bench PERF_DUT=ze
```
Results are written to test/perf/results/ as JSON files. An HTML comparison report is generated automatically.
Manual Reports
After running benchmarks, generate reports from the result files:
```sh
# Markdown report (default)
ze-perf report result-ze.json result-gobgp.json result-rustbgpd.json

# HTML report
ze-perf report --html result-ze.json result-gobgp.json > comparison.html
```
For the full flag reference, see ze-perf report.
Tracking Performance Over Time
NDJSON History
Each ze-perf run --json invocation produces a single JSON object. Append
results to an NDJSON (newline-delimited JSON) file to build a history:
```sh
ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --json >> history.ndjson
```
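Each line is a complete JSON object, so other tooling can consume a history file easily. A minimal Python reader might look like the following. The function names are hypothetical. The reader assumes nothing about the objects except that each non-blank line is valid JSON.

```python
import json
from pathlib import Path
from typing import Iterable, List


def parse_ndjson(lines: Iterable[str]) -> List[dict]:
    """Parse NDJSON text: one JSON object per line, blank lines ignored."""
    entries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: invalid JSON: {err}") from err
    return entries


def load_history(path: str) -> List[dict]:
    """Read a history file built with `ze-perf run --json >> history.ndjson`."""
    return parse_ndjson(Path(path).read_text().splitlines())
```

Entries keep the order in which runs were appended. The last element is therefore the most recent run, which is the entry that --check compares against the one before it.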
Trend Reports
Generate a trend report from a history file:
```sh
ze-perf track history.ndjson
ze-perf track --html history.ndjson > trend.html
```
Regression Detection
Use --check in CI to detect performance regressions. The tool compares the
most recent entry against the previous one using stddev-aware thresholds:
```sh
ze-perf track --check history.ndjson
```
Exit code 0 means no regression. Exit code 1 means a regression was detected.
A regression is flagged only when a metric exceeds its threshold percentage AND the delta exceeds the combined standard deviation of the two measurements. This reduces false positives from noisy measurements.
Default thresholds:
| Metric | Default Threshold | Direction |
|---|---|---|
| Convergence time | 20% | Higher is worse |
| Throughput (avg) | 20% | Lower is worse |
| P99 latency | 30% | Higher is worse |
| Routes lost | Any loss | Always flagged |
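The two-part rule can be sketched as follows. This illustration rests on stated assumptions and is not ze-perf's implementation. It assumes each history entry is a flat JSON object keyed by the field names listed under JSON Output Fields. It also reads "combined standard deviation" as the root-sum-square of the two runs' stddevs; a plain sum is an equally plausible reading. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Metric:
    key: str              # JSON field holding the value
    stddev_key: str       # JSON field holding its stddev
    threshold_pct: float  # flag only if the change exceeds this percentage
    higher_is_worse: bool


# Default thresholds from the table above.
METRICS = [
    Metric("convergence-ms", "convergence-stddev-ms", 20.0, True),
    Metric("throughput-avg", "throughput-avg-stddev", 20.0, False),
    Metric("latency-p99-ms", "latency-p99-stddev-ms", 30.0, True),
]


def regressions(prev: dict, curr: dict) -> List[str]:
    """Return the metrics that regressed from prev to curr."""
    found = []
    if curr.get("routes-lost", 0) > 0:
        found.append("routes-lost")          # any loss is always flagged
    for m in METRICS:
        old, new = prev[m.key], curr[m.key]
        delta = new - old if m.higher_is_worse else old - new  # > 0 is worse
        if old == 0 or delta <= 0:
            continue
        pct = 100.0 * delta / abs(old)
        # Assumption: combined stddev = root-sum-square of both runs' stddevs.
        combined = math.hypot(prev.get(m.stddev_key, 0.0),
                              curr.get(m.stddev_key, 0.0))
        if pct > m.threshold_pct and delta > combined:
            found.append(m.key)
    return found
```

Requiring both conditions means a large percentage swing between two very noisy runs is not reported. That is the false-positive case the stddev check is meant to filter out.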
Custom thresholds:
```sh
ze-perf track --check --threshold-convergence 15 --threshold-throughput 10 --threshold-p99 25 history.ndjson
```
Limit the comparison window to the last N entries with --last:
```sh
ze-perf track --check --last 5 history.ndjson
```
For the full flag reference, see ze-perf track.
Understanding Results
JSON Output Fields
The JSON result object contains these key fields:
| Field | Unit | Description |
|---|---|---|
| convergence-ms | ms | Time from first UPDATE sent to last route received (median) |
| convergence-stddev-ms | ms | Standard deviation of convergence across iterations |
| first-route-ms | ms | Time to first route arrival |
| throughput-avg | routes/s | Average route propagation rate |
| throughput-avg-stddev | routes/s | Standard deviation of throughput |
| throughput-peak | routes/s | Peak routes/s in any 1-second window |
| latency-p50-ms | ms | 50th percentile per-route latency |
| latency-p90-ms | ms | 90th percentile per-route latency |
| latency-p99-ms | ms | 99th percentile per-route latency |
| latency-p99-stddev-ms | ms | Standard deviation of P99 across iterations |
| latency-max-ms | ms | Maximum per-route latency |
| routes-sent | count | Routes injected by sender |
| routes-received | count | Routes received by receiver |
| routes-lost | count | Difference between sent and received |
| repeat | count | Total iterations run |
| repeat-kept | count | Iterations kept after outlier removal |
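Scripts that post-process a single result file, such as result-ze.json from --output, can use a small reader to print a summary and cross-check the count fields. This sketch assumes the fields above are top-level keys of the JSON object; if ze-perf nests them, the lookups need adjusting. The function names are hypothetical.

```python
from typing import List


def one_line_summary(result: dict) -> str:
    """Compact human-readable summary of one result object."""
    return (f"convergence {result['convergence-ms']:.1f} ms "
            f"+/- {result['convergence-stddev-ms']:.1f} ms, "
            f"p99 {result['latency-p99-ms']:.1f} ms, "
            f"{result['routes-received']}/{result['routes-sent']} routes, "
            f"{result['repeat-kept']}/{result['repeat']} iterations kept")


def sanity_problems(result: dict) -> List[str]:
    """Cross-check the count fields against each other."""
    problems = []
    if result["routes-lost"] != result["routes-sent"] - result["routes-received"]:
        problems.append("routes-lost does not equal routes-sent - routes-received")
    if result["routes-lost"] > 0:
        problems.append(f"{result['routes-lost']} routes lost")
    if result["repeat-kept"] > result["repeat"]:
        problems.append("repeat-kept exceeds repeat")
    return problems
```

To use them, load the file with json.load and pass the resulting dict to either function.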
Interpreting Results
Convergence time is the primary metric. It measures how long the DUT takes to forward all injected routes from sender to receiver. Lower is better.
Standard deviation indicates measurement stability. A high stddev relative
to the median suggests noisy measurements. Increase --repeat and
--iter-delay for more stable results.
Throughput shows sustained forwarding rate. The average is computed over the full convergence window. The peak shows the maximum 1-second burst rate.
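Both throughput figures can be derived from the same per-route arrival times. In this sketch, the average divides the route count by the convergence window, as described above, and the peak uses a sliding 1-second window. Whether ze-perf slides the window or uses fixed 1-second buckets is an assumption here. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def throughput(arrival_s: Sequence[float],
               convergence_s: float) -> Tuple[float, int]:
    """Return (average routes/s over the convergence window, peak routes in 1 s).

    arrival_s: per-route arrival times in seconds after the first UPDATE was sent.
    """
    if convergence_s <= 0 or not arrival_s:
        return 0.0, 0
    avg = len(arrival_s) / convergence_s
    times = sorted(arrival_s)
    peak, start = 0, 0
    for end, t in enumerate(times):
        # Assumption: sliding window; drop arrivals 1 s or more older than t.
        while t - times[start] >= 1.0:
            start += 1
        peak = max(peak, end - start + 1)
    return avg, peak
```

A DUT that forwards in bursts shows a peak well above its average. Similar values indicate a steady forwarding rate.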
Latency percentiles show per-route propagation time distribution. P50 is the typical latency, P99 captures tail latency, and max shows the worst case.
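This sketch computes the reported percentiles. It assumes per-route latency runs from when a prefix is sent to when that same prefix reaches the receiver. It uses the nearest-rank method, one common convention; ze-perf may interpolate instead. The dictionary keys mirror the JSON field names, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(sorted_ms: List[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list (one common definition)."""
    if not sorted_ms:
        raise ValueError("no latency samples")
    rank = math.ceil(pct * len(sorted_ms) / 100)   # 1-based rank
    return sorted_ms[max(rank, 1) - 1]


def latency_summary(sent_at: Dict[str, float],
                    received_at: Dict[str, float]) -> Dict[str, float]:
    """Per-route latency (ms) for every prefix that was both sent and received."""
    lat = sorted((received_at[p] - sent_at[p]) * 1000.0
                 for p in received_at if p in sent_at)
    return {
        "latency-p50-ms": percentile(lat, 50),
        "latency-p90-ms": percentile(lat, 90),
        "latency-p99-ms": percentile(lat, 99),
        "latency-max-ms": percentile(lat, 100),
    }
```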
Routes lost should always be zero. Any loss indicates the DUT dropped routes, which the regression checker always flags.