Benchmarking

Ze includes ze-perf, a standalone tool for measuring BGP route propagation latency through a device under test (DUT). It works with any BGP implementation, not just Ze.

Architecture

                  +-----------+
  ze-perf         |           |         ze-perf
  (sender)  ----> |    DUT    | ---->  (receiver)
  AS 65001        |  AS 65000 |         AS 65002
                  +-----------+

The sender establishes a BGP session with the DUT, injects routes, and the receiver measures when those routes arrive. Both sessions are managed by ze-perf in a single process. Timing starts when the sender writes the first UPDATE and stops when the receiver has collected all expected prefixes.

Quick Start

Run a benchmark against Ze on localhost:

# Start ze with a config that accepts peers from 127.0.0.1 and 127.0.0.2
ze test-config.conf &

# Run the benchmark
ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --dut-name ze --routes 1000

With JSON output saved to a file:

ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --dut-name ze \
  --routes 1000 --output result-ze.json

Running Benchmarks

For a complete flag reference, see ze-perf run.

Encoding Modes

Three encoding modes measure different code paths through the DUT:

Mode Flag What It Tests
IPv4 inline NLRI --family ipv4/unicast (default) Standard IPv4 unicast path with NLRI at the end of the UPDATE
IPv4 force-MP --family ipv4/unicast --force-mp IPv4 encoded via MP_REACH_NLRI attribute (exercises multiprotocol path)
IPv6 MP --family ipv6/unicast IPv6 unicast via MP_REACH_NLRI (standard for non-IPv4 families)

Multi-Iteration

By default, ze-perf run executes 5 iterations with 1 warmup run. The warmup run is discarded, and outliers beyond 2 standard deviations from the median convergence time are removed (minimum 3 iterations kept). Final results report median and standard deviation across the kept iterations.

Flag Default Purpose
--repeat 5 Total iterations (including warmup)
--warmup-runs 1 Warmup iterations discarded from results
--iter-delay 3s Pause between iterations for clean separation

More iterations improve statistical confidence. For reliable results, use at least --repeat 10:

ze-perf run --dut-addr 172.31.0.2 --dut-asn 65000 --repeat 10 --warmup-runs 2

Timing

Flag Default Purpose
--warmup 2s Delay after session establishment before injecting routes
--connect-timeout 10s TCP connection timeout
--duration 60s Maximum wait time for convergence per iteration
--iter-delay 3s Delay between iterations

The --iter-delay flag controls whether iterations run back-to-back or with a pause. Longer delays give the DUT time to settle between measurements.

Cross-Implementation Comparison

Automated Docker Runner

The included test runner benchmarks all five supported implementations in Docker:

DUT Image Config Forwarding mechanism
Ze ze-interop (built) test/perf/configs/ze.conf bgp-rs plugin
FRR quay.io/frrouting/frr:10.3.1 test/perf/configs/frr.conf route-map PERMIT
BIRD bird-interop (built) test/perf/configs/bird.conf import/export all
GoBGP gobgp-interop (built) test/perf/configs/gobgp.toml default accept policy
rustbgpd rustbgpd-interop (built from source) test/perf/configs/rustbgpd.toml route_server_client
# All DUTs
python3 test/perf/run.py

# Specific DUTs
python3 test/perf/run.py ze bird rustbgpd

# Override defaults
DUT_ROUTES=10000 DUT_REPEAT=5 python3 test/perf/run.py

# Via Make
make ze-perf-bench
make ze-perf-bench PERF_DUT=ze

Results are written to test/perf/results/ as JSON files. An HTML comparison report is generated automatically.

Manual Reports

After running benchmarks, generate reports from the result files:

# Markdown report (default)
ze-perf report result-ze.json result-gobgp.json result-rustbgpd.json

# HTML report
ze-perf report --html result-ze.json result-gobgp.json > comparison.html

For the full flag reference, see ze-perf report.

Tracking Performance Over Time

NDJSON History

Each ze-perf run --json invocation produces a single JSON object. Append results to an NDJSON (newline-delimited JSON) file to build a history:

ze-perf run --dut-addr 127.0.0.1 --dut-asn 65000 --json >> history.ndjson

Trend Reports

Generate a trend report from a history file:

ze-perf track history.ndjson
ze-perf track --html history.ndjson > trend.html

Regression Detection

Use --check in CI to detect performance regressions. The tool compares the most recent entry against the previous one using stddev-aware thresholds:

ze-perf track --check history.ndjson

Exit code 0 means no regression. Exit code 1 means a regression was detected.

Regressions are flagged when a metric exceeds its threshold percentage AND the delta exceeds the combined standard deviation of the two measurements. This prevents false positives from noisy measurements.

Default thresholds:

Metric Default Threshold Direction
Convergence time 20% Higher is worse
Throughput (avg) 20% Lower is worse
P99 latency 30% Higher is worse
Routes lost Any loss Always flagged

Custom thresholds:

ze-perf track --check --threshold-convergence 15 --threshold-throughput 10 --threshold-p99 25 history.ndjson

Limit the comparison window to the last N entries with --last:

ze-perf track --check --last 5 history.ndjson

For the full flag reference, see ze-perf track.

Understanding Results

JSON Output Fields

The JSON result object contains these key fields:

Field Unit Description
convergence-ms ms Time from first UPDATE sent to last route received (median)
convergence-stddev-ms ms Standard deviation of convergence across iterations
first-route-ms ms Time to first route arrival
throughput-avg routes/s Average route propagation rate
throughput-avg-stddev routes/s Standard deviation of throughput
throughput-peak routes/s Peak routes/s in any 1-second window
latency-p50-ms ms 50th percentile per-route latency
latency-p90-ms ms 90th percentile per-route latency
latency-p99-ms ms 99th percentile per-route latency
latency-p99-stddev-ms ms Standard deviation of P99 across iterations
latency-max-ms ms Maximum per-route latency
routes-sent count Routes injected by sender
routes-received count Routes received by receiver
routes-lost count Difference between sent and received
repeat count Total iterations run
repeat-kept count Iterations kept after outlier removal

Interpreting Results

Convergence time is the primary metric. It measures how long the DUT takes to forward all injected routes from sender to receiver. Lower is better.

Standard deviation indicates measurement stability. A high stddev relative to the median suggests noisy measurements. Increase --repeat and --iter-delay for more stable results.

Throughput shows sustained forwarding rate. The average is computed over the full convergence window. The peak shows the maximum 1-second burst rate.

Latency percentiles show per-route propagation time distribution. P50 is the typical latency, P99 captures tail latency, and max shows the worst case.

Routes lost should always be zero. Any loss indicates the DUT dropped routes, which the regression checker always flags.