
Monitor Packet Drops with eBPF
You feel the pressure when network flows stall, and packet drop monitoring with eBPF is the fastest way to surface the kernel’s reason for the loss so you can act with confidence.
We outline a practical path—attach an eBPF program to a tracepoint, capture events inside the Linux kernel, and export clear flow-level data to dashboards.
This approach moves you from guessing to precise root cause work—showing real drop reasons across TCP, UDP, SCTP, and ICMP families.
Key Takeaways
- Hooking tracepoints lets you capture loss reasons right where they occur in the kernel.
- We send event data from eBPF programs to user space and expose metrics for dashboards.
- OpenShift Network Observability maps these signals into filters, graphs, and topology views.
- RHEL 9.2+ kernels support the drop-reason API; privileged agents are required.
- You’ll get concrete steps, code paths (Go exporter and Python/BCC), and test methods to validate results.
What you’ll build and why packet drop insights matter for modern networks
I’ll walk you through building a lightweight exporter that turns kernel tracepoints into clear, actionable signals for teams. The exporter attaches a small program to tcp_retransmit_skb and skb/kfree_skb, captures events, and enriches each record with TCP state and flags.
The goal is practical: expose Prometheus metrics over HTTP on port 2112 and provide a simple endpoint for dashboards and alerts. That gives you not just counts, but reasons—NO_SOCKET, PKT_TOO_SMALL, and similar causes—so ops can triage faster.
- Attach tracepoint, filter in-kernel, push compact data to user space.
- Export metrics and labels for flow-level analysis and topology views.
- Keep object files and config files organized with consistent naming.
Component | What it shows | Typical output |
---|---|---|
tcp_retransmit_skb tracepoint | Retransmit events and TCP state | Retransmit rate, flow ID, flags
skb/kfree_skb tracepoint | Reason the kernel freed the skb | Cause labels (NO_SOCKET, PKT_TOO_SMALL)
Exporter | Prometheus endpoint | HTTP :2112, metrics and basic status
For a quick start with tools and examples, see this guide to BCC tooling. In OpenShift, these signals map to UI filters—Fully dropped vs. Containing drops, TCP state filters, and top-cause graphs—so teams jump from metric to root cause without guesswork.
Requirements, kernel support, and environment setup
Start by confirming your Linux kernel features and access model — this keeps the setup safe and reproducible.
First, verify the kernel baseline. RHEL 9.2+ exposes the standardized drop-reason tracepoint API that gives high-fidelity loss reasons. Older kernels lack that API and deliver less detail.
Privileges and safe access
Production hosts need deliberate privilege choices. In OpenShift, enable PacketDrop by creating a FlowCollector with ebpf.privileged: true and features: PacketDrop.
Limit privileged runs to validated nodes and set resource requests to protect other services. On Ubuntu 21.10+ and 22.04+, unprivileged BPF is disabled by default; re-enable it with sysctl kernel.unprivileged_bpf_disabled=0 only for short-term dev work.
Tools and file layout
Choose a toolchain that fits your team. For Go builds, generate vmlinux.h with bpftool and compile the C code using clang/LLVM. For rapid testing, install python3-bpfcc, bpfcc-tools, and libbpfcc for Python/BCC workflows.
- Place object and header files in a stable folder (src/bpf/, build/, config/).
- Plan port exposure for metrics (default exporter port 2112) and confirm user space can reach exported data.
- Document which tracepoints and functions your programs attach to for reproducibility.
Area | Action | Why it matters |
---|---|---|
Kernel check | Confirm RHEL 9.2+ or equivalent | Enables standardized drop-reason tracepoints for better data |
Privileges | Use ebpf.privileged: true only after review | Protects security posture and resource budgets |
Toolchain | clang/LLVM, bpftool, Prometheus, Go or Python/BCC | Compiles code, generates headers, and exports metrics |
How it works: eBPF programs, maps, and user space pipelines
Let’s unpack how tracepoints and maps cooperate to move structured data from kernel space into a running exporter.
We attach small programs to two tracepoints — skb/kfree_skb for drop events and tcp_retransmit_skb for retransmissions. That captures the event where the kernel records a reason or retransmit action.
Inside the kernel, these programs write compact records into eBPF maps. A perf buffer or ring buffer then moves those records into user space with minimal overhead.
Portable builds and data flow
We use BPF CO-RE with a generated vmlinux.h header so the same code runs across Linux kernel variants. That keeps the program portable and maintainable.
- Maps stage structured fields — addrs, ports, reasons, and tcp state.
- Perf vs ring buffers — choose ring buffers on newer kernels for throughput and lower syscall cost.
- User space reads via a non-blocking loop, decodes events, and updates Prometheus metrics on :2112.
Component | Role | Notes |
---|---|---|
Tracepoints | Emit event when kernel frees skb or retransmits TCP | skb/kfree_skb and tcp_retransmit_skb |
eBPF maps | Stage structured event data | Used for aggregation and short-term state |
Perf/Ring buffer | Transport events to user space | Ring buffer preferred on modern kernels |
User space exporter | Decode, label, and expose metrics | HTTP :2112 for Prometheus scrapes |
We keep object files, program code, and config files together to simplify CI and rollout. The user space loop decodes events quickly, avoids blocking, and handles clean shutdowns so no data is lost during upgrades.
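To make that pipeline concrete, here is a minimal user-space sketch of what such a staged record and its decoder might look like in Go. The DropEvent layout and its field names are assumptions for illustration only; they must mirror whatever struct your kernel-side C code actually writes into the buffer.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// DropEvent mirrors a hypothetical C struct emitted by the kernel program.
// Field names, sizes, ordering, and padding are illustrative; they must match
// the struct your eBPF C code writes into the perf/ring buffer exactly.
type DropEvent struct {
	SrcAddr  uint32   // IPv4 source address
	DstAddr  uint32   // IPv4 destination address
	SrcPort  uint16   // TCP/UDP source port
	DstPort  uint16   // TCP/UDP destination port
	TCPState uint8    // TCP state at the time of the event
	_        [3]uint8 // padding to match C struct alignment
	Reason   uint32   // drop reason code (e.g. an SKB_DROP_REASON_* value)
}

// decodeEvent turns one raw sample from the perf/ring buffer into a DropEvent.
func decodeEvent(raw []byte) (DropEvent, error) {
	var ev DropEvent
	// The kernel writes native-endian data on the same host, so
	// binary.NativeEndian (Go 1.21+) is the safe choice here.
	err := binary.Read(bytes.NewReader(raw), binary.NativeEndian, &ev)
	return ev, err
}

func main() {
	// Zeroed sample for demonstration only; real samples come from the buffer reader.
	sample := make([]byte, binary.Size(DropEvent{}))
	ev, err := decodeEvent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded event: %+v\n", ev)
}
```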
Packet drop monitoring with eBPF in OpenShift Network Observability
Turning on PacketDrop in OpenShift surfaces kernel signals so you can separate host-stack issues from OVS pipeline behavior.
Enabling PacketDrop: FlowCollector spec with privileged eBPF
Enable PacketDrop by applying a FlowCollector with ebpf.privileged: true and features: PacketDrop. This single change turns on tracepoint-driven programs that feed labeled events into the flow pipeline.
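A minimal FlowCollector sketch along those lines is shown below; treat it as a starting point rather than a drop-in manifest, since the apiVersion and field layout can vary with your Network Observability operator version.

```yaml
apiVersion: flows.netobserv.io/v1beta1   # may be v1beta2 on newer operator releases
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      privileged: true      # required for the drop tracepoints
      features:
        - PacketDrop        # enables packet-drop enrichment in the flow pipeline
```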
Drop categories: core subsystem vs OVS-based reasons
Two reason families appear: core subsystem (SKB_DROP_REASON) and OVS-based reasons on supported kernels. Seeing both side-by-side helps you tell host issues from pipeline actions.
OCP UI enhancements and overview panels
The console adds filters—Fully dropped, Containing drops, Without drops, and All—plus selectors for Packet drop TCP state and latest cause. Overview panels show total dropped rate, top states, and top causes.
- Topology edges with drops render in red for quick spotting.
- Export file snapshots let you attach evidence to postmortems.
- Expect ~22% vCPU and ~9% memory uplift for the flowlogs-pipeline process.
Feature | What you get | Notes |
---|---|---|
FlowCollector flag | Privileged tracepoint programs | Set ebpf.privileged: true |
Reason categories | Core subsystem vs OVS | Consistent tracepoint naming |
UI & graphs | Filters, top rates, causes | Red topology edges; exportable file views |
Hands-on path with Go: load eBPF, attach tracepoints, expose Prometheus
We’ll use Go and cilium/ebpf to load verified bytecode, bind tracepoints, and turn events into labeled metrics you can chart.
Loading and verifying bytecode
Compile your C into an object file and generate vmlinux.h with bpftool:
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
In Go, call ebpf.LoadCollectionSpec and NewCollectionWithOptions with verbose verifier logs. That gives early feedback if the kernel rejects a function or map.
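Here is a hedged loading sketch using cilium/ebpf; the object file name drop_monitor.bpf.o is a placeholder for whatever your build pipeline produces.

```go
package main

import (
	"errors"
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Allow the process to lock enough memory for maps on older kernels.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatalf("removing memlock limit: %v", err)
	}

	// Parse the compiled object file into a CollectionSpec.
	spec, err := ebpf.LoadCollectionSpec("drop_monitor.bpf.o") // placeholder path
	if err != nil {
		log.Fatalf("loading collection spec: %v", err)
	}

	// Load programs and maps with verbose verifier logs so rejections are explained.
	coll, err := ebpf.NewCollectionWithOptions(spec, ebpf.CollectionOptions{
		Programs: ebpf.ProgramOptions{LogLevel: ebpf.LogLevelInstruction},
	})
	if err != nil {
		var verr *ebpf.VerifierError
		if errors.As(err, &verr) {
			log.Fatalf("verifier rejected program:\n%+v", verr)
		}
		log.Fatalf("creating collection: %v", err)
	}
	defer coll.Close()

	log.Println("programs and maps loaded; ready to attach tracepoints")
}
```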
Attaching tracepoints and reading events
Attach via link.Tracepoint("tcp", "tcp_retransmit_skb", ...). Create a perf reader with perf.NewReader(coll.Maps["events"], os.Getpagesize()).
In user space run a non-blocking loop: read raw records, decode fields, and update Prometheus metrics with meaningful labels — source, destination, tcp state, and reason name.
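A rough sketch of that attach-read-export loop follows; the program name tcp_retransmit_skb, the map name events, and the metric and label names are assumptions you would adapt to your own object file.

```go
package main

import (
	"errors"
	"log"
	"net/http"
	"os"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// retransmits counts tcp_retransmit_skb events; metric and label names are illustrative.
var retransmits = prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "tcp_retransmits_total", Help: "TCP retransmit events seen by the tracepoint."},
	[]string{"state"},
)

func run(coll *ebpf.Collection) error {
	prometheus.MustRegister(retransmits)

	// Attach the program (assumed to be named "tcp_retransmit_skb" in the object file).
	tp, err := link.Tracepoint("tcp", "tcp_retransmit_skb", coll.Programs["tcp_retransmit_skb"], nil)
	if err != nil {
		return err
	}
	defer tp.Close()

	// Read events from the perf array map assumed to be named "events".
	rd, err := perf.NewReader(coll.Maps["events"], os.Getpagesize())
	if err != nil {
		return err
	}
	defer rd.Close()

	// Serve /metrics for Prometheus on :2112.
	http.Handle("/metrics", promhttp.Handler())
	go func() { log.Fatal(http.ListenAndServe(":2112", nil)) }()

	for {
		rec, err := rd.Read()
		if err != nil {
			if errors.Is(err, perf.ErrClosed) {
				return nil // reader closed during shutdown
			}
			log.Printf("reading perf record: %v", err)
			continue
		}
		if rec.LostSamples > 0 {
			log.Printf("lost %d samples; consider a larger buffer", rec.LostSamples)
			continue
		}
		// Decode rec.RawSample into your event struct here, then label the metric.
		retransmits.WithLabelValues("unknown").Inc()
	}
}

func main() {
	log.Println("load the collection as shown earlier, then call run(coll)")
}
```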
Prometheus scrape config and HTTP exporter
Serve metrics with promhttp on :2112. Add a simple scrape job in prometheus.yml that targets 127.0.0.1:2112. Use descriptive metric names and source/dest labels for easy queries.
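A matching scrape job in prometheus.yml can stay small; the job name below is arbitrary and the interval is only a sensible default.

```yaml
scrape_configs:
  - job_name: packet-drop-exporter   # name is arbitrary
    scrape_interval: 15s
    static_configs:
      - targets: ["127.0.0.1:2112"]  # the exporter's promhttp endpoint
```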
Step | What | Why |
---|---|---|
Build | Compile C → object file | Deterministic artifacts for CI |
Load | ebpf.LoadCollectionSpec + NewCollectionWithOptions | Verifier logs surface issues early |
Attach | link.Tracepoint & perf reader | Low-overhead event flow to user space |
- Structure source code and files so CI produces the same object for dev and prod.
- Test TCP behavior with tc netem to inject loss/delay and validate that counters rise.
- Checklist: load without warnings, attach tracepoints, confirm events flow, validate Prometheus collection.
Alternative path with Python/BCC: rapid prototyping and DNS visibility
I like to prototype with Python/BCC when I want quick feedback. It’s a fast way to join network analytics and process context without a full build pipeline.
Use socket filters to capture DNS (UDP 53) traffic and attach kprobes to functions like execve to learn which process generated an event. Emit structured records with BPF_PERF_OUTPUT and read them in user space to correlate queries with PIDs and names.
Socket filters, kprobes, and mapping to processes
Attach a packet filter that parses IP/UDP headers and applies a Berkeley Packet Filter expression for efficiency.
In the same program, add kprobes that capture process context. Use helpers—bpf_get_current_pid_tgid and bpf_get_current_comm—to join network data with the running process.
Privilege considerations and enabling unprivileged BPF
On Ubuntu 21.10+ and 22.04 LTS, unprivileged BPF is disabled by default. Enable it temporarily with:
sudo sysctl kernel.unprivileged_bpf_disabled=0
Only use that setting for short-term dev work—revert it for production to preserve security posture.
- I recommend installing python3-bpfcc, bpfcc-tools, libbpfcc, and linux-headers-$(uname -r).
- Keep prototype source code and header files together for quick edits before porting to compiled code.
- Read perf buffers in a loop and handle backpressure to avoid lost events.
Action | What it captures | Why it helps |
---|---|---|
Socket filter | DNS UDP 53 packets | Low-cost filtering with berkeley packet filter syntax |
Kprobe (execve) | Process context at syscall | Maps network events to process name and PID |
BPF_PERF_OUTPUT | Structured events | Reliable transport to user space for correlation |
Analyzing and visualizing drops across flows, tables, and topology
When red edges appear in a topology view, they point you straight to the resource that needs attention. I begin in the Traffic flows table where sent bytes and packets show as green and failed counts show in red.
Open a side panel for any flow to see cause labels, TCP state, and links to documentation. That panel ties flow-level data to the application and the destination so you can act fast.
Traffic flows: bytes/packets vs drops and side-panel details
Compare high-throughput flows with those that have elevated failure rates. Use labels — cause, state, source, destination — to answer application-level questions without writing queries.
Topology view: highlighting edges with red drop indicators
Topology marks failing edges in red so you can follow the path upstream or to a destination process. That visual cue reduces time to isolate issues across resources.
- Example panels: top dropped rate, top causes, and top TCP states for quick trend spotting.
- Process context helps separate app restarts from infra faults.
- Capture screenshots or export views to share evidence during incident reviews.
View | What to check | Why it helps |
---|---|---|
Traffic table | Bytes, packets, failed counts | Find flows with mismatched throughput vs success |
Side panel | Cause, state, docs link | Root-cause context without extra queries |
Topology | Red edges to destination/process | Quickly identify upstream or downstream issues |
Workflow: spot red signals, open the side panel, correlate with process and time, then apply corrective action — configuration, code change, or resource adjustment.
Testing, troubleshooting, and practical scenarios
A quick lab test can tell you whether the exporter, kernel hooks, or network are at fault.
Simulate loss and latency
Use netem to inject controlled loss and delay and validate that TCP counters and metrics move as expected.
Example:
tc qdisc add dev eth0 root netem loss 10% delay 100ms
Be careful: high loss or delay values can break your SSH session. Roll back with tc qdisc del dev eth0 root.
Common causes and what they reveal
NO_SOCKET signals an unreachable destination port — generate traffic to a closed port and confirm the reason appears in metrics.
PKT_TOO_SMALL usually points to parsing or MTU issues. OVS_DROP_LAST_ACTION (RHEL 9.2+) indicates policy or pipeline decisions.
Performance footprint and system impact
Expect the flowlogs-pipeline (FLP) to use ~22% more vCPU and ~9% more memory with PacketDrop enabled; other components usually rise as well, but to a lesser degree.
- Validate tracepoint attachments by checking config files and logs for permission or name errors.
- Avoid high label cardinality — keep metric names stable and filter labels that cause explosion.
- If the program loop lags, raise buffer sizes or simplify per-event work to prevent lost events.
- Isolate faults by toggling one variable at a time: exporter, netem, or kernel setting.
Scenario | Indicator | Quick remediation |
---|---|---|
Closed destination port | NO_SOCKET in metrics and UI | Confirm service, adjust firewall or port config |
Small frames / parsing | PKT_TOO_SMALL logged | Check MTU, fragment handling, and parser code |
OVS policy | OVS_DROP_LAST_ACTION seen on RHEL 9.2+ | Review OVS flows and ACL rules; test with policy off |
Next steps to deepen your eBPF monitoring practice
Start small and iterate. Add DNS socket filters or process-mapping probes with Python/BCC to broaden visibility beyond TCP. Prototype quickly, then port stable logic to CO‑RE builds so your code stays portable across Linux kernel versions.
Script vmlinux.h generation and keep a tidy file layout—object files, headers, and configs—so builds remain reproducible. Measure performance as you add probes and track system resource use in OpenShift.
Document alerts, runbooks, and clear names for each metric and function. Use dashboards for rising retransmissions or specific reasons and tune thresholds with real incident data.
Finally, review results in retrospectives—what worked, what cost too much—and prioritize the next iteration. That close-the-loop habit scales knowledge and reduces time to repair.