VPP Data Plane
Status: the VPP component manages the VPP process lifecycle (startup,
crash recovery, DPDK NIC binding, GoVPP connection), the fib-vpp plugin
programs routes from ze's system RIB directly into VPP's FIB via the
GoVPP binary API, and the stats segment is polled for per-interface,
per-node and system-wide Prometheus metrics. MPLS label programming,
a VPP-native interface backend, and VPP-native features (L2XC, bridge
domains, VXLAN, policers, ACLs, SRv6, sFlow) are designed but not yet
wired.
Why this matters
Ze is a BGP daemon. BGP produces forwarding decisions; something else has
to carry the packets. The default answer on Linux is the kernel route
table, programmed via netlink by the fib-kernel plugin. That works,
but it puts every packet through the kernel's software forwarding path,
which tops out at a few Mpps on commodity hardware and starts dropping
under bursty load.
VPP (the FD.io Vector Packet Processor) is a user-space software router built on DPDK. It takes the NICs away from the kernel, batches packets into vectors, and walks each vector through a graph of forwarding nodes with hot caches. On the same hardware where the kernel loses packets, VPP forwards at line rate: roughly 35 Mpps on a 4-core Xeon D-1518, with numbers like 18 Mpps per thread for MPLS and around 14 Mpps per thread for VXLAN. IPng Networks has run this stack in production on AS8298 for several years.
Adding VPP to ze is not about replacing BGP. It is about giving ze a credible answer when someone asks "can I use this for an IXP route server, a production edge router, or a gokrazy appliance on an N100 mini-PC?" The control plane stays in ze (BGP, RIB, config, CLI, web UI). The forwarding plane becomes VPP. The two talk through a small, typed interface (GoVPP over a Unix socket).
How the two halves fit together
┌──────────────────────────────────────────────────────────┐
│                            ze                            │
│                                                          │
│  BGP reactor ──▶ protocol RIB ──▶ sysRIB                 │
│                                      │                   │
│                               ┌──────┴──────┐            │
│                               │             │            │
│                          fib-kernel      fib-vpp         │
│                           (netlink)      (GoVPP)         │
│                               │             │            │
│                               ▼             ▼            │
│                           Linux FIB      VPP FIB         │
│                            (local)      (transit)        │
│                                                          │
│  vpp component: starts VPP, binds NICs, watches health   │
└──────────────────────────────────────────────────────────┘
Ze keeps two FIB backends and runs them side by side. fib-kernel
still programs the Linux route table so local services (SSH, the web
UI, BGP TCP sessions) keep working. fib-vpp pushes the same best
routes into VPP's FIB via the GoVPP binary API, so transit packets
are forwarded by VPP at DPDK speed. Both plugins subscribe to the
(system-rib, best-change) event on the EventBus and react
independently.
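The fan-out described above can be sketched as a tiny publish/subscribe bus. This is a minimal illustration, not ze's EventBus: the `Bus` type, its methods and the string payload are hypothetical; only the event key ("system-rib", "best-change") and the two plugin names come from the text.

```go
package main

import (
	"fmt"
	"sync"
)

// Event identifies a (namespace, name) pair such as ("system-rib", "best-change").
type Event struct {
	Namespace string
	Name      string
}

// Bus is a hypothetical, minimal stand-in for ze's EventBus.
type Bus struct {
	mu   sync.Mutex
	subs map[Event][]func(payload string)
}

// NewBus returns an empty bus.
func NewBus() *Bus {
	return &Bus{subs: make(map[Event][]func(string))}
}

// Subscribe registers a handler for one event key.
func (b *Bus) Subscribe(ev Event, h func(payload string)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[ev] = append(b.subs[ev], h)
}

// Emit calls every handler registered for ev, in subscription order.
// Each handler sees the same payload and knows nothing about the others.
func (b *Bus) Emit(ev Event, payload string) {
	b.mu.Lock()
	handlers := append([]func(string){}, b.subs[ev]...)
	b.mu.Unlock()
	for _, h := range handlers {
		h(payload)
	}
}

func main() {
	bus := NewBus()
	best := Event{Namespace: "system-rib", Name: "best-change"}

	// Both FIB backends subscribe to the same event.
	bus.Subscribe(best, func(p string) { fmt.Println("fib-kernel (netlink):", p) })
	bus.Subscribe(best, func(p string) { fmt.Println("fib-vpp (GoVPP):", p) })

	// The route below uses documentation prefixes and is purely illustrative.
	bus.Emit(best, "192.0.2.0/24 via 198.51.100.1")
}
```

Because neither subscriber depends on the other, one backend can fail or be disabled without affecting the other.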
BGP sessions themselves still use Linux sockets. VPP's Linux Control
Plane (LCP) plugin mirrors every VPP interface as a TAP device in a
dataplane network namespace, so ze's BGP reactor can bind() and
connect() exactly as it does today. VPP is invisible to the BGP code
path.
What ze does for you
When vpp.enabled is true, ze takes ownership of the whole VPP
lifecycle. You do not run systemctl start vpp; you do not edit
/etc/vpp/startup.conf; you do not dpdk-devbind your NICs by hand.
Ze does the following, in order, every time it starts or VPP crashes:
| Step | What ze does | Code |
|---|---|---|
| 1 | Parses the vpp { ... } YANG section into VPPSettings | internal/component/vpp/config.go |
| 2 | Validates PCI addresses, socket paths, netns names, size strings | config.go: Validate |
| 3 | Renders startup.conf (unix, cpu, buffers, dpdk, plugins, linux-cp, linux-nl, heapsize, statseg sections) | startupconf.go: GenerateStartupConf |
| 4 | Loads vfio, vfio_pci, vfio_iommu_type1 kernel modules | dpdk.go: loadVFIOModules |
| 5 | For each configured PCI address: reads the current driver, saves it, unbinds, binds to vfio-pci | dpdk.go: bindPCI |
| 6 | Execs the VPP binary with -c <generated startup.conf> | vpp.go: runOnce |
| 7 | AsyncConnects GoVPP to /run/vpp/api.sock (10 attempts, 1 s apart) | conn.go: Connect |
| 8 | Emits ("vpp", "connected") on the EventBus | vpp.go: emitEvent |
| 9 | Starts the stats poller (per-interface, per-node, system metrics) | telemetry.go |
| 10 | Waits for VPP to exit. On crash: emits ("vpp", "disconnected"), backs off (1 s, 2 s, 4 s, capped at 30 s), restarts, emits ("vpp", "reconnected"). fib-vpp replays the RIB on reconnect. | vpp.go: Run loop, fibvpp.go: reconnect handler |
| 11 | On clean shutdown: SIGTERM VPP, triggers a PCI rescan, restores the original NIC drivers | vpp.go: defer, dpdk.go: UnbindAll |
The point is that VPP's system-level prerequisites (vfio module load, NIC unbind, driver save, rescan-on-teardown) are part of ze's job, not the operator's. This matters on a gokrazy appliance where there is no systemd and ze is PID 1 for the data plane.
Running against an externally supervised VPP
Set vpp.external true when something other than ze owns the VPP
process -- a systemd unit, a container sidecar, or the Python stub
the functional tests use. In this mode ze skips steps 3, 4, 5, 6 and 11
of the table above: no startup.conf generation, no vfio module load,
no NIC unbind, no exec vpp, no PCI rescan on shutdown. Ze still
connects via GoVPP at step 7, emits the same lifecycle events at step 8,
runs the stats poller at step 9, and blocks on context cancellation
instead of cmd.Wait at step 10.
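One way to picture the difference between the two modes is as a filter over the lifecycle table. The sketch below is illustrative only: the `step` struct and `plan` function are hypothetical, and the step descriptions are abbreviated from the table above.

```go
package main

import "fmt"

// step mirrors one row of the lifecycle table. The type is illustrative.
type step struct {
	n              int
	desc           string
	skipIfExternal bool
}

// lifecycle lists the eleven steps; steps 3, 4, 5, 6 and 11 are the ones
// the text says are skipped when vpp.external is true.
var lifecycle = []step{
	{1, "parse vpp config", false},
	{2, "validate config", false},
	{3, "render startup.conf", true},
	{4, "load vfio kernel modules", true},
	{5, "bind NICs to vfio-pci", true},
	{6, "exec VPP", true},
	{7, "connect GoVPP to the API socket", false},
	{8, "emit lifecycle events", false},
	{9, "start stats poller", false},
	{10, "wait (cmd.Wait, or context cancellation when external)", false},
	{11, "PCI rescan and driver restore on shutdown", true},
}

// plan returns the step numbers ze would execute in the given mode.
func plan(external bool) []int {
	var out []int
	for _, s := range lifecycle {
		if external && s.skipIfExternal {
			continue
		}
		out = append(out, s.n)
	}
	return out
}

func main() {
	fmt.Println("ze-owned:", plan(false)) // ze-owned: [1 2 3 4 5 6 7 8 9 10 11]
	fmt.Println("external:", plan(true))  // external: [1 2 7 8 9 10]
}
```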
Typical configuration:
vpp {
    enabled true;
    external true;
    api-socket /run/vpp/api.sock;
}
The operator owns startup.conf, owns the systemd unit, owns the vfio
bind. Ze owns the API socket conversation and the FIB programming.
Use this for:
- Systemd deployments: the system VPP unit starts before ze, ze
connects to its API socket on startup.
- Gokrazy containers: a sidecar image bundles vpp and ze; the
supervisor starts VPP first, then ze with external true.
- Functional tests: bin/ze-test vpp runs test/vpp/*.ci which
drive a Python vpp_stub.py listening on the API socket instead of
a real VPP. See docs/functional-tests.md for the harness.
Configuring VPP
The vpp { ... } container lives in the main ze config. Minimal example:
vpp {
    enabled true;
    dpdk {
        interface 0000:03:00.0 {
            name xe0;
        }
        interface 0000:03:00.1 {
            name xe1;
        }
    }
}
This is enough to boot VPP with the default heap, default buffer count, default stats segment and LCP enabled. Add cores, tune memory, or change the stats poll interval only when the defaults do not fit the workload.
Every leaf, what it does, what it defaults to
| Path | Type | Default | What it controls |
|---|---|---|---|
| vpp.enabled | boolean | false | Master switch. false means ze does not start VPP at all. |
| vpp.external | boolean | false | When true, ze connects to an existing VPP via api-socket but does NOT generate startup.conf, bind DPDK NICs, or exec the VPP binary. Use this on systemd-managed hosts, container sidecars, or the ze-test vpp stub harness. Default false preserves the ze-owned-lifecycle behavior. |
| vpp.api-socket | string | /run/vpp/api.sock | GoVPP Unix socket. Ze validates it is absolute, has no .., and fits in 108 characters. |
| vpp.cpu.main-core | uint8 | auto | CPU core pinned to the VPP main thread. Omit for VPP default. |
| vpp.cpu.workers | uint8 | auto | Number of worker threads. Ze allocates main-core+1 .. main-core+workers for corelist-workers in startup.conf. |
| vpp.memory.main-heap | size string | 1G | VPP main heap. Use 1536M for a full DFZ (approximately 958k IPv4 + 198k IPv6 routes). |
| vpp.memory.hugepage-size | 2M or 1G | 2M | Hugepage size. 2M is the common case; 1G for large installations. |
| vpp.memory.buffers | uint32 | 128000 | Buffers per NUMA node. 128k is proven for full DFZ at 10G. |
| vpp.dpdk.interface[pci-address].name | string | (required) | Short interface name used in ze (e.g. xe0). Must start with a letter, max 15 chars. |
| vpp.dpdk.interface[pci-address].rx-queues | uint8 | VPP default | Receive queues. Omit unless the NIC needs more. |
| vpp.dpdk.interface[pci-address].tx-queues | uint8 | VPP default | Transmit queues. Omit unless the NIC needs more. |
| vpp.stats.segment-size | size string | 512M | Stats segment shared memory size. |
| vpp.stats.socket-path | string | /run/vpp/stats.sock | Stats Unix socket path. Separate from the API socket. |
| vpp.stats.poll-interval | uint16 seconds, 1..3600 | 30 | How often ze reads the stats segment for Prometheus metrics. |
| vpp.lcp.enabled | boolean | true | Whether ze asks VPP to load linux_cp_plugin.so and linux_nl_plugin.so. Leave on when BGP uses VPP-owned NICs. |
| vpp.lcp.sync | boolean | true | Mirror VPP state changes (link, MTU, IP) into the Linux TAPs. |
| vpp.lcp.auto-subint | boolean | true | Auto-create Linux TAPs for dot1q and QinQ sub-interfaces. |
| vpp.lcp.netns | string | dataplane | Network namespace where LCP TAPs appear. Must not contain path separators. |
Enabling FIB programming
The VPP component starts VPP, but it does not program routes. That is
fib-vpp's job, and it has its own switch under the fib container:
fib {
    vpp {
        enabled true;
        table-id 0;
    }
}
| Path | Type | Default | What it controls |
|---|---|---|---|
| fib.vpp.enabled | boolean | false | Enable route programming. Off means the plugin loads but does nothing. |
| fib.vpp.table-id | uint32 | 0 | VRF table ID. 0 is the default VRF. |
| fib.vpp.batch-size | uint16 | 256 | Max routes per GoVPP batch. |
| fib.vpp.batch-interval-ms | uint16 | 10 | Max milliseconds to wait before dispatching a partial batch. |
fib-vpp depends on the vpp subsystem and on the RIB plugin. If VPP
is disabled or the GoVPP channel fails to open, the plugin falls back to
a noop backend and logs a warning instead of blocking the rest of ze.
System prerequisites
VPP is not a user-space toy; DPDK needs real kernel cooperation. Ze validates its own config, but it cannot set kernel boot parameters or allocate hugepages. Before enabling VPP:
| Requirement | How to provide it |
|---|---|
| Hugepages (approximately 6 GB for production 10G, 2 GB for lab) | echo 3072 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages or via /etc/sysctl.d/ |
| IOMMU enabled | BIOS: enable VT-d / AMD-Vi. Kernel cmdline: intel_iommu=on iommu=pt |
| CPU isolation for VPP workers | Kernel cmdline: isolcpus=<worker-cores> so Linux does not schedule on them |
| Netlink buffer for route injection | sysctl net.core.rmem_default=67108864 |
| Minimum hardware | 4 CPU cores, 8 GB RAM, a DPDK-compatible NIC |
Supported NICs through DPDK include Intel i210/i350, X520/X540, X710/XL710 and E810 families, plus VirtIO for VMs. Mellanox ConnectX-5 and later use RDMA in a bifurcated driver mode; the current ze implementation is DPDK only and does not yet bind Mellanox NICs.
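The hugepage row in the table above is simple arithmetic: the page count is the reservation divided by the page size. The sketch below reproduces the 3072 figure, reading "6 GB" and "2 GB" as binary units (GiB), which is an assumption; `hugepagesNeeded` is a hypothetical helper.

```go
package main

import "fmt"

const (
	mib = 1 << 20
	gib = 1 << 30
)

// hugepagesNeeded returns how many pages of pageSize bytes are required to
// cover total bytes, rounding up to a whole page.
func hugepagesNeeded(total, pageSize uint64) uint64 {
	return (total + pageSize - 1) / pageSize
}

func main() {
	fmt.Println(hugepagesNeeded(6*gib, 2*mib)) // 3072: production figure with 2M pages
	fmt.Println(hugepagesNeeded(2*gib, 2*mib)) // 1024: lab figure with 2M pages
	fmt.Println(hugepagesNeeded(6*gib, 1*gib)) // 6: production figure with 1G pages
}
```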
Observing what VPP is doing
Two places expose VPP state through ze:
- Prometheus metrics. The stats poller exports per-interface
counters (rx/tx packets, bytes, drops, errors), per-node clocks and
vectors-per-call, and system-wide vector and input rates. Poll
interval is configurable via vpp.stats.poll-interval.
- The show fib vpp command. Dumps the routes fib-vpp believes it has
installed in VPP, as JSON [{"prefix": ..., "next-hop": ...}, ...].
Direct VPP introspection (vppctl show int, vppctl show ip fib) is
still available through the CLI socket ze writes to /run/vpp/cli.sock.
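For scripting against show fib vpp, the JSON array can be decoded into a small struct. Only the two key names, prefix and next-hop, come from the text above; that both values are strings, and the sample routes themselves, are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fibEntry models one element of the show fib vpp output.
// Both fields are assumed to be JSON strings.
type fibEntry struct {
	Prefix  string `json:"prefix"`
	NextHop string `json:"next-hop"`
}

// parseFIB decodes the JSON array emitted by show fib vpp.
func parseFIB(data []byte) ([]fibEntry, error) {
	var entries []fibEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	// Hypothetical sample built from documentation address ranges.
	sample := []byte(`[
		{"prefix": "192.0.2.0/24", "next-hop": "198.51.100.1"},
		{"prefix": "2001:db8::/32", "next-hop": "2001:db8:ffff::1"}
	]`)

	entries, err := parseFIB(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%s -> %s\n", e.Prefix, e.NextHop)
	}
}
```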
What is not yet wired
Today, VPP process lifecycle, IPv4/IPv6 FIB programming, and stats telemetry are in the tree. The remaining phases:
| Phase | What it adds | Status |
|---|---|---|
| vpp-3 | MPLS label push / swap / pop driven from BGP labelled unicast | In tree. Labels stripped at NLRI parse (SplitLabeled, RFC 8277), stored as FamilyRIB side-data, propagated through bgp-rib and sysRIB BestChangeEntry.Labels, programmed into VPP via IPRouteAddDel with LabelStack (push) or MplsRouteAddDel (swap/pop). 20-bit label range and stack depth 16 validated before GoVPP call. |
| vpp-4 | VPP-native iface.Backend: managing interfaces directly via GoVPP instead of through the kernel | In tree. Backend registers as "vpp" and loads cleanly under interface { backend vpp; }. Interface lifecycle (CreateDummy/Bridge/VLAN, Delete, SetAdminUp/Down, SetMTU), addressing, bridge port add/del, query (ListInterfaces, GetInterface, GetMACAddress, SetMACAddress), and monitor (WantInterfaceEvents -> EventBus) all wired against vendored GoVPP. Tunnels (VXLAN/GRE/IPIP), LCP TAP pairs, VPP stats segment, mirror, and wireguard are deferred to vpp-4b/4c/5/6b (each blocked on vendoring the matching go.fd.io/govpp/binapi/* package). Iface-component reconciliation also currently races the vpp handshake at startup and degrades to additive-only -- tracked in spec-iface-vpp-ready-gate. |
| vpp-5 | L2 cross-connect, bridge domains, VXLAN tunnels, policers, ACLs, SRv6, sFlow | Depends on vpp-4. Each feature is independent. |
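The vpp-3 row mentions that the 20-bit label range and a stack depth of 16 are validated before the GoVPP call. A minimal sketch of such a check follows; `validateLabelStack` is a hypothetical name, and treating an empty stack as valid (an unlabelled route) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxLabel      = 1<<20 - 1 // largest value that fits the 20-bit MPLS label field
	maxStackDepth = 16        // depth limit stated in the table
)

// validateLabelStack rejects stacks deeper than maxStackDepth and labels
// that do not fit in 20 bits. An empty stack is accepted.
func validateLabelStack(stack []uint32) error {
	if len(stack) > maxStackDepth {
		return errors.New("label stack deeper than 16")
	}
	for i, label := range stack {
		if label > maxLabel {
			return fmt.Errorf("label %d at position %d does not fit in 20 bits", label, i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateLabelStack([]uint32{100, 1048575})) // <nil>
	fmt.Println(validateLabelStack([]uint32{1048576}))      // label 1048576 at position 0 does not fit in 20 bits
	fmt.Println(validateLabelStack(make([]uint32, 17)))     // label stack deeper than 16
}
```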
The three-strategy framing in
docs/research/ze-vpp-analysis.md
explains the bigger picture: strategy 1 (IPng / VyOS style, netlink
intermediary) is the safe default, strategy 3 (direct FIB programming,
what ze does) is where ze differentiates by skipping the kernel entirely
and converging sub-second on a full table.
References and further reading
| Resource | What to look at |
|---|---|
| docs/research/vpp-deployment-reference.md | startup.conf reference, NIC matrix, performance baselines, LCP details |
| docs/research/ze-vpp-analysis.md | Three-strategy feasibility analysis (strategy 1 / 2 / 3, LOC estimates, risks) |
| docs/research/vpp-deployment-notes.md | Consolidated notes from 83 IPng.ch articles (production architecture, article index by topic, key tools, upstream contributions) |
| IPng.ch blog, VPP + LCP series (2021-08 to 2021-09, 7 parts) | How the LCP plugin works, end to end |
| IPng.ch blog, VPP configuration series (2022-03 / 2022-04) | vppcfg's DAG-based declarative config (the non-ze way) |
| IPng.ch blog, VPP monitoring (2023-04) | Stats segment interpretation, vectors-per-call |
| IPng.ch blog, VPP MPLS series (2023-05, 4 parts) | MPLS label operations in VPP (context for vpp-3) |
| IPng.ch blog, VPP sFlow series (2024-09 to 2025-02, 3 parts) | Context for future vpp-5 sFlow feature |
| go.fd.io/govpp documentation | Binary API client, stats client, binapi code generation |
| VPP 25.02 documentation | API reference for the modules ze targets |