
Start eBPF Programming with bcc Tools
You want a practical BCC tools eBPF tutorial that helps you fix slow services and eliminate blind spots in your system without risky kernel hacks.
I know that kernel work can feel scary, so we’ll walk step by step and keep things friendly. We’ll show how a tiny eBPF program runs safely in the kernel and how BCC makes writing code approachable for both new and seasoned engineers.
We’ll focus on real wins: live tracing, performance troubleshooting, and monitoring across network paths. You’ll build two labs—a syscall Hello World and an XDP UDP counter—so you can reuse the patterns in your own program development.
Along the way, we explain what each command does, how long steps take, and how to check results without breaking the system. By the end, you’ll read hooks, attach probes, capture events, and extract insights with confidence.
Key Takeaways
- Learn a hands-on path to run safe eBPF code in the kernel.
- Use BCC to speed up tracing, monitoring, and performance fixes.
- Complete two practical labs you can run on a laptop or server.
- Understand the program lifecycle: write, load, observe, and clean up.
- Gain reproducible skills for system and network troubleshooting.
Why eBPF and BCC matter for modern Linux performance, tracing, and monitoring
Seeing into a live Linux system becomes practical when small, verified programs run in the kernel. eBPF is an extended, verifier-checked virtual machine inside the Linux kernel that lets us run user code safely and with low overhead.
These programs are event-driven and attach at well-defined hook points—function entries/exits, syscall handlers, kernel tracepoints, and fast-path network spots like XDP. XDP itself can operate in Offloaded, Native, or Generic modes depending on your NIC and driver.
Common hooks you’ll use
- kprobe — dynamic function tracing when you need a focused view.
- tracepoints — stable fields for reproducible instrumentation.
- syscall interception — visibility into every system call and call path.
- Network/XDP — packet fate and fast-path metrics for performance work.
We use BCC for fast iteration: small C snippets driven by Python bindings are easy to compile, load, and observe without rebooting. That flow gives practical outcomes: low-cost tracing in production, sharper debugging, and faster feedback when investigating regressions.
Safety matters: the verifier and constrained program models protect the kernel while letting you ask precise questions of your system.
Environment setup for this bcc tools eBPF tutorial
Before you run any examples, let’s pick an environment that keeps experiments safe and repeatable.
We compare running natively on Ubuntu versus a lightweight virtual machine so you choose what fits your machine and comfort level. Native Ubuntu works well if your kernel headers match the running kernel; that avoids compilation errors when loading probes into the kernel.
Mac on Apple Silicon: Lima VM with Ubuntu
For Apple Silicon, we recommend Lima. Spin up an Ubuntu instance, mount a work directory, and provision dependencies in one pass.
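As a sketch, the host-side flow looks like this (it assumes an ubuntu.yaml template in your current directory):

```bash
brew install lima                          # install Lima on the macOS host
limactl start --name ubuntu ubuntu.yaml    # create and boot the VM from the template
limactl list                               # confirm the instance is running
limactl shell ubuntu                       # open a shell inside the VM
```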
Dependencies and quick checks
In the template’s provision step, install bpfcc-tools, linux-headers-$(uname -r), build-essential, pkg-config, and libssl-dev. Install clang/LLVM if you plan to build C snippets. BCC offers Python bindings, so a user-space loader can compile and inject code into the kernel with minimal ceremony.
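A minimal provisioning template might look like the sketch below; the image URL and mount path are placeholders you should adjust for your setup:

```yaml
# ubuntu.yaml - minimal sketch; image location and mount path are examples.
images:
  - location: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-arm64.img"
    arch: "aarch64"
mounts:
  - location: "~/ebpf-lab"   # work folder shared with the host
    writable: true
provision:
  - mode: system
    script: |
      #!/bin/bash
      apt-get update
      apt-get install -y bpfcc-tools linux-headers-$(uname -r) \
        build-essential pkg-config libssl-dev
```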
- Keep your work folder mounted so files persist outside the VM.
- Pin images or snapshot the instance for repeatability over time.
- Do a quick sanity check with a sample BCC tool to confirm kernel headers, the compiler toolchain, and permissions are correct (see the check below).
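One quick check, assuming Ubuntu’s bpfcc-tools package (which installs the stock tools with a -bpfcc suffix):

```bash
# If this streams open() events, headers, compiler, and permissions are fine.
sudo opensnoop-bpfcc    # Ctrl-C to stop
```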
| Option | Pros | Cons |
| --- | --- | --- |
| Native Ubuntu | Faster I/O, direct kernel access | Must match kernel headers |
| Lightweight VM | Isolated, repeatable machine | Small overhead and setup time |
| Lima on Mac | Easy for Apple Silicon, shared folders | Requires limactl commands and YAML |
Quick orientation: programs in kernel space and user space
Let’s map how two cooperating programs—one in kernel space and one in user space—work together to observe a running system.
In practice, you write a tiny kernel program that handles fast, event-driven work and a user space program that loads, configures, and reads results.
The kernel side must stay small and safe so the verifier accepts it. It uses bpf helpers to read time, task info, or memory safely.
Maps bridge the two spaces. They act as key/value stores, arrays, and histograms so both the kernel program and the user reader share state and durable results.
- The user program invokes BCC to compile and load the kernel code, attach a hook, and then poll or subscribe for updates.
- Keep complex logic in user space—control flow, aggregation, and presentation—to avoid verifier issues and speed iteration.
- Debug by using trace buffers for quick messages and structured maps for parseable, long-lived data.
Typical flow: an event fires in the kernel, the small program records a metric to a map, then the user-space program polls the map and transforms that data for logs or UIs.
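To make the split concrete, here is a minimal sketch (the function name do_count is our own): the kernel side increments a BPF_HASH keyed by PID on every clone, and the user side polls that shared map every couple of seconds:

```python
#!/usr/bin/env python3
# Minimal sketch of the kernel/user split: the kernel half counts clone()
# calls per process in a shared map; the user half polls and prints it.
from bcc import BPF
import time

prog = r"""
BPF_HASH(counts, u32, u64);   // map shared between kernel and user space

int do_count(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper half is the tgid
    counts.increment(pid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_prefix().decode() + "clone",
                fn_name="do_count")

while True:
    try:
        time.sleep(2)
        for pid, n in b["counts"].items():    # user space reads the map
            print(f"pid={pid.value} clones={n.value}")
    except KeyboardInterrupt:
        break
```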
Once this split clicks, writing more eBPF programs becomes a repeatable pattern you can reuse across many system tasks.
Hello World with kprobe: trace a system call every time it runs
We’ll start with a tiny, hands-on example that prints a message every time the kernel creates a new process. This demo uses a kprobe on the clone syscall so you can watch activity in real time.
Attach a kprobe to the clone syscall and print from the kernel
Write a minimal C snippet with a function named syscall__clone that calls bpf_trace_printk("Hello, World!\n").
Load it from Python using BPF(text=…) and pick the correct event name with b.get_syscall_prefix().decode() + 'clone'. That handles different symbol names across architectures.
Streaming events to user space with BCC’s trace_print in Python
After loading, call b.attach_kprobe(event=event_name, fn_name="syscall__clone"), then stream output with b.trace_print(). Each clone call prints the Hello World line every time the kernel executes that system call.
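Putting those pieces together, the whole demo fits in one small script; this sketch follows the calls described above:

```python
#!/usr/bin/env python3
# Hello World kprobe: print a message every time clone() runs.
from bcc import BPF

prog = r"""
int syscall__clone(void *ctx) {
    bpf_trace_printk("Hello, World!\n");
    return 0;
}
"""

b = BPF(text=prog)
# Resolve the architecture-specific symbol (e.g. __x64_sys_clone on x86_64).
event_name = b.get_syscall_prefix().decode() + "clone"
b.attach_kprobe(event=event_name, fn_name="syscall__clone")
print("Tracing clone()... Ctrl-C to quit")
b.trace_print()   # stream the kernel trace pipe until interrupted
```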
- This is great for quick inspection and hypothesis testing when a new process spawns often.
- Watch out for symbol differences and run under sudo so you have permission to attach probes.
- For production-grade collection, move from print strings to maps for structured data.
Count UDP packets with XDP and BPF maps using BCC
I’ll show a concise XDP flow that parses Ethernet, IP, and UDP headers and records destination-port counts. The goal is simple: parse packets safely in the kernel, persist counts in a histogram, and read results from a Python loader.
Write the C program to parse headers
In C, define KBUILD_MODNAME and include linux/bpf.h, if_ether.h, ip.h, and udp.h. Declare a BPF_HISTOGRAM(counter, u64).
The function udp_counter(struct xdp_md *ctx) walks packet bounds using ctx->data and ctx->data_end. Check each header (ethhdr, iphdr, udphdr) and return early on malformed packets. If the IP protocol is UDP, read udp->dest, convert it from network byte order with htons (which performs the same byte swap as ntohs), and increment the histogram bucket keyed by that port.
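Here is a sketch of that C program; for brevity it assumes a plain IPv4 header with no options (ihl == 5):

```c
// udp_counter.c - sketch of the parser described above.
#define KBUILD_MODNAME "udp_counter"
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>      /* IPPROTO_UDP */
#include <linux/ip.h>
#include <linux/udp.h>

BPF_HISTOGRAM(counter, u64);

int udp_counter(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;                /* Ethernet bounds check */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);     /* IP bounds check */
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)(ip + 1);    /* UDP bounds check */
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    u64 port = htons(udp->dest);   /* byte-swap the destination port */
    counter.increment(port);
    return XDP_PASS;               /* count only; let the packet continue */
}
```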
Load, attach, and collect with a Python loader
From Python, use BPF(src_file="udp_counter.c"); fn = b.load_func("udp_counter", BPF.XDP); then b.attach_xdp(device, fn, 0). While running, b.trace_print() streams quick events for sanity checks. On KeyboardInterrupt, read dist = b.get_table("counter") and iterate its keys to print each port and count. Finally, call b.remove_xdp(device, 0) to clean up.
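A matching loader sketch, using the loopback device for a local test:

```python
#!/usr/bin/env python3
# Loader sketch for udp_counter.c, using loopback for a local test.
from bcc import BPF

device = "lo"

b = BPF(src_file="udp_counter.c")
fn = b.load_func("udp_counter", BPF.XDP)
b.attach_xdp(device, fn, 0)

try:
    b.trace_print()              # quick sanity stream while packets flow
except KeyboardInterrupt:
    dist = b.get_table("counter")
    print("DEST_PORT  COUNT")
    for k, v in sorted(dist.items(), key=lambda kv: kv[0].value):
        print(f"{k.value:9d}  {v.value}")
finally:
    b.remove_xdp(device, 0)      # detach so the interface returns to normal
```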
| Step | Command or code | Note |
| --- | --- | --- |
| Compile & load | BPF(src_file="udp_counter.c") | Loads the C file and compiles it for the kernel |
| Attach | b.attach_xdp(device, fn, 0) | Device can be "lo" for testing |
| Generate packets | nc -u 127.0.0.1 5005 | Send UDP packets to test counts |
| Read & cleanup | dist = b.get_table("counter"); b.remove_xdp(device, 0) | Print DEST_PORT and COUNT, then detach |
XDP runs before the main network stack, so it is fast. You can choose an action code—XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT—depending on whether you only count or want to alter packet fate.
Where to go next: deeper tracing, networking, and real-world monitoring
Try a concrete next project: trace the openat system call to see which process touches which file and which flags it uses. This gives high-signal data for auditing and monitoring.
In kernel code define a struct with uid, comm, fname, and flags. Use helper functions like bpf_get_current_comm and bpf_get_current_uid_gid, read the user filename with bpf_probe_read_user_str, and emit via a perf buffer named "events".
On the user side mirror the struct with ctypes, open the perf buffer, and poll with perf_buffer_poll to print UID, COMM, flags, and filenames as events arrive. Run strace first to confirm the right syscall paths.
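Here is a sketch of both halves in one file; it uses the stable sys_enter_openat tracepoint for the arguments, and note that bpf_probe_read_user_str needs a reasonably recent kernel (older ones ship bpf_probe_read_str):

```python
#!/usr/bin/env python3
# Sketch: trace openat via the sys_enter_openat tracepoint, emitting
# events through a perf buffer and mirroring the struct with ctypes.
import ctypes
from bcc import BPF

prog = r"""
struct data_t {
    u32 uid;
    int flags;
    char comm[16];
    char fname[256];
};
BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct data_t data = {};
    data.uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    data.flags = args->flags;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    bpf_probe_read_user_str(&data.fname, sizeof(data.fname), args->filename);
    events.perf_submit(args, &data, sizeof(data));
    return 0;
}
"""

class Data(ctypes.Structure):            # user-space mirror of struct data_t
    _fields_ = [("uid", ctypes.c_uint32),
                ("flags", ctypes.c_int),
                ("comm", ctypes.c_char * 16),
                ("fname", ctypes.c_char * 256)]

def handle(cpu, raw, size):
    ev = ctypes.cast(raw, ctypes.POINTER(Data)).contents
    print(f"{ev.uid:6d} {ev.comm.decode():16s} {ev.flags:#06x} {ev.fname.decode()}")

b = BPF(text=prog)
b["events"].open_perf_buffer(handle)
print("UID    COMM             FLAGS  FILENAME")
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```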
Keep iterating: generalize to life-cycle tracking, use tracepoints or kretprobes where stable, and consult the BCC docs and Brendan Gregg’s examples for patterns that scale in the Linux kernel.