William Patterson  

Start eBPF Programming with bcc Tools

You want a practical bcc tools eBPF tutorial that helps you fix slow services and remove blind spots in your system without risky kernel hacks.

I know that kernel work can feel scary, so we’ll walk through it step by step and keep things friendly. We’ll show how a tiny eBPF program runs safely in the kernel and how BCC makes writing that code approachable for both new and seasoned engineers.

We’ll focus on real wins: live tracing, performance troubleshooting, and monitoring across network paths. You’ll build two labs—a syscall Hello World and an XDP UDP counter—so you can reuse the patterns in your own program development.

Along the way, we explain what each command does, how long steps take, and how to check results without breaking the system. By the end, you’ll read hooks, attach probes, capture events, and extract insights with confidence.


Key Takeaways

  • Learn a hands-on path to run safe eBPF code in the kernel.
  • Use BCC to speed up tracing, monitoring, and performance fixes.
  • Complete two practical labs you can run on a laptop or server.
  • Understand the program lifecycle: write, load, observe, and clean up.
  • Gain reproducible skills for system and network troubleshooting.

Why eBPF and BCC matter for modern Linux performance, tracing, and monitoring

Seeing into a live Linux system becomes practical when small, verified programs run in the kernel. eBPF is an extended, verifier-checked virtual machine inside the Linux kernel that lets us run user-supplied code safely and with low overhead.

These programs are event-driven and attach at well-defined hook points—function entries/exits, syscall handlers, kernel tracepoints, and fast-path network spots like XDP. XDP itself can operate in Offloaded, Native, or Generic modes depending on your NIC and driver.

Common hooks you’ll use

  • kprobe — dynamic function tracing when you need a focused view.
  • tracepoints — stable fields for reproducible instrumentation.
  • syscall interception — visibility into every system call and call path.
  • Network/XDP — packet fate and fast-path metrics for performance work.

We use BCC for fast iteration: small C or Python bindings make it easy to compile, load, and observe results without rebooting. That flow gives practical outcomes — low-cost tracing in production, sharper debugging, and faster feedback when investigating regressions.

Safety matters: the verifier and constrained program models protect the kernel while letting you ask precise questions of your system.

Environment setup for this bcc tools eBPF tutorial

Before you run any examples, let’s pick an environment that keeps experiments safe and repeatable.

We compare running natively on Ubuntu versus a lightweight virtual machine so you choose what fits your machine and comfort level. Native Ubuntu works well if your kernel headers match the running kernel; that avoids compilation errors when loading probes into the kernel.

Mac on Apple Silicon: Lima VM with Ubuntu

For Apple Silicon, we recommend Lima. Spin up an Ubuntu instance, mount a work directory, and provision dependencies in one pass. Example commands: brew install lima, limactl start --name ubuntu ubuntu.yaml, limactl list, and limactl shell ubuntu.

Dependencies and quick checks

Provision the YAML to install bpfcc-tools, linux-headers-$(uname -r), build-essential, pkg-config, and libssl-dev. Install clang/LLVM if you plan to build C snippets. BCC offers Python bindings so a user space loader can compile and inject code into the kernel with minimal ceremony.

  • Keep your work folder mounted so files persist outside the VM.
  • Pin images or snapshot the instance for repeatability over time.
  • Do a quick sanity check with a sample BCC example to confirm headers, compiler collection, and permissions are correct (a minimal check follows the table below).
Option         | Pros                                    | Cons
---------------|-----------------------------------------|------------------------------------
Native Ubuntu  | Faster I/O, direct kernel access        | Must match kernel headers
Lightweight VM | Isolated, repeatable machine            | Small overhead and setup time
Lima on Mac    | Easy for Apple Silicon, shared folders  | Requires limactl commands and YAML
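
For that sanity check, a tiny script is enough. This is a sketch, assuming the BCC Python bindings (the python3-bpfcc package on Ubuntu) are installed and the script runs with sudo:

```python
#!/usr/bin/env python3
# Minimal environment check: can BCC compile, load, and attach a trivial probe?
# Assumes python3-bpfcc (BCC's Python bindings) is installed; run with sudo.
from bcc import BPF

b = BPF(text='int do_nothing(void *ctx) { return 0; }')
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_nothing")
print("Kernel headers, compiler, and permissions all look good.")
```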

Quick orientation: programs in kernel space and user space

Let’s map how two cooperating programs—one in kernel space and one in user space—work together to observe a running system.

In practice, you write a tiny kernel program that handles fast, event-driven work and a user space program that loads, configures, and reads results.

Kernel space, user space, and the maps between them

The kernel side must stay small and safe so the verifier accepts it. It uses BPF helpers to read time, task info, or memory safely.

Maps bridge the two spaces. They act as key/value stores, arrays, and histograms so both the kernel program and the user reader share state and durable results.

  • The user program invokes BCC to compile and load the kernel code, attach a hook, and then poll or subscribe for updates.
  • Keep complex logic in user space—control flow, aggregation, and presentation—to avoid verifier issues and speed iteration.
  • Debug by using trace buffers for quick messages and structured maps for parseable, long-lived data.

Typical flow: event fires in the kernel, the small program records a metric to a map, then the user polls the map and transforms that data for logs or UIs.
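
Here is a minimal sketch of that flow, assuming BCC’s Python bindings: the kernel side increments a BPF_HASH keyed by PID, and the user side polls it. The choice of the sync syscall as the trigger is purely illustrative.

```python
#!/usr/bin/env python3
# Sketch of the kernel/user split: the kernel program records into a map,
# and the user-space loader polls it. Run with sudo.
from bcc import BPF
import time

prog = r"""
BPF_HASH(counts, u32, u64);                        // shared key/value map

int do_count(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;    // upper 32 bits = process ID
    counts.increment(pid);                         // fast, event-driven work only
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_count")

try:
    while True:
        time.sleep(2)
        # User space handles aggregation and presentation.
        for pid, count in b["counts"].items():
            print(f"pid {pid.value} called sync {count.value} time(s)")
except KeyboardInterrupt:
    pass
```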

Once this split clicks, writing more eBPF programs becomes a repeatable pattern you can reuse across many system tasks.

Hello World with kprobe: trace a system call every time it runs

We’ll start with a tiny, hands-on example that prints a message every time the kernel creates a new process. This demo uses a kprobe on the clone syscall so you can watch activity in real time.

Attach a kprobe to the clone syscall and print from the kernel

Write a minimal C snippet with a function named syscall__clone that calls bpf_trace_printk("Hello, World!\n").

Load it from Python using BPF(text=...) and pick the correct event name with b.get_syscall_prefix().decode() + "clone". That handles different symbol names across architectures.

Streaming events to user space with BCC’s trace_print in Python

After loading, call b.attach_kprobe(event=event_name, fn_name="syscall__clone"), then stream output with b.trace_print(). The Hello World line prints every time the kernel executes a clone system call.
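
Putting those pieces together, a sketch of the full Hello World script could look like this (run it with sudo; the exact prefix returned by get_syscall_prefix depends on your architecture):

```python
#!/usr/bin/env python3
# Hello World: print a line from the kernel every time the clone syscall runs.
from bcc import BPF

prog = r"""
int syscall__clone(void *ctx) {
    bpf_trace_printk("Hello, World!\n");   // written to the shared trace pipe
    return 0;
}
"""

b = BPF(text=prog)

# get_syscall_prefix() resolves the per-architecture symbol prefix (e.g. __x64_sys_).
event_name = b.get_syscall_prefix().decode() + "clone"
b.attach_kprobe(event=event_name, fn_name="syscall__clone")

print("Tracing clone()... run any command in another shell, Ctrl-C to stop.")
b.trace_print()                            # stream trace pipe lines to the terminal
```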

  • This is great for quick inspection and hypothesis testing when new processes spawn frequently.
  • Watch out for symbol differences and run under sudo so you have permission to attach probes.
  • For production-grade collection, move from print strings to maps for structured data.

Count UDP packets with XDP and BPF maps using BCC

I’ll show a concise XDP flow that parses Ethernet, IP, and UDP headers and records destination-port counts. The goal is simple: parse packets safely in the kernel, persist counts in a histogram, and read results from a Python loader.

Write the C program to parse headers

In C, define KBUILD_MODNAME and include linux/bpf.h, if_ether.h, ip.h, and udp.h. Declare a BPF_HISTOGRAM(counter, u64).

The function udp_counter(struct xdp_md *ctx) walks packet bounds using ctx->data and ctx->data_end. Check each header (ethhdr, iphdr, udphdr) and return early on malformed packets. If the IP protocol is UDP, read udp->dest, convert it from network byte order with ntohs (htons performs the same byte swap), and increment the histogram bucket keyed by that port.
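
A sketch of what udp_counter.c might contain, following that outline. It handles IPv4 only, skips IP options for simplicity, and assumes htons/ntohs are available through the included headers, as in common BCC XDP examples:

```c
// udp_counter.c - sketch of the XDP program described above (IPv4/UDP only).
#define KBUILD_MODNAME "udp_counter"
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>        /* IPPROTO_UDP */
#include <linux/ip.h>
#include <linux/udp.h>

BPF_HISTOGRAM(counter, u64);

int udp_counter(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                       /* truncated frame */
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;                       /* not IPv4 */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)ip + sizeof(*ip);   /* ignores IP options */
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    u64 port = ntohs(udp->dest);               /* destination port, host byte order */
    counter.increment(port);                   /* bump this port's bucket */
    return XDP_PASS;                           /* count only; never drop */
}
```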

Load, attach, and collect with a Python loader

From Python use BPF(src_file="udp_counter.c"); fn = b.load_func("udp_counter", BPF.XDP); then b.attach_xdp(device, fn, 0). While running, b.trace_print() streams quick events for sanity. On KeyboardInterrupt, read dist = b.get_table("counter") and iterate keys to print port and count. Finally call b.remove_xdp(device, 0) to clean up.
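
A matching loader sketch, assuming the interface is "lo" for local testing:

```python
#!/usr/bin/env python3
# Loader for udp_counter.c: compile, attach to an interface, print counts on Ctrl-C.
from bcc import BPF

device = "lo"                                    # loopback is enough for a local test
b = BPF(src_file="udp_counter.c")                # compile the C program above
fn = b.load_func("udp_counter", BPF.XDP)         # verify and load it as an XDP program
b.attach_xdp(device, fn, 0)                      # hook it onto the device

print(f"Counting UDP packets on {device}... Ctrl-C to stop.")
try:
    b.trace_print()                              # quiet here unless the C code traces
except KeyboardInterrupt:
    dist = b.get_table("counter")                # read the histogram map
    print("\nDEST_PORT : COUNT")
    for k, v in sorted(dist.items(), key=lambda kv: kv[0].value):
        print(f"{k.value:9d} : {v.value}")
finally:
    b.remove_xdp(device, 0)                      # always detach so the NIC is clean
```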

Step             | Command or code                                         | Note
-----------------|---------------------------------------------------------|------------------------------------------------
Compile & load   | BPF(src_file="udp_counter.c")                           | Loads the C file and compiles it for the kernel
Attach           | b.attach_xdp(device, fn, 0)                             | Device can be "lo" for testing
Generate packets | nc -u 127.0.0.1 5005                                    | Send UDP packets to test counts
Read & clean up  | dist = b.get_table("counter"); b.remove_xdp(device, 0)  | Print DEST_PORT and COUNT, then detach

XDP runs before the main network stack, so it is fast. You can choose an action code—XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT—depending on whether you only count or want to alter packet fate.

Where to go next: deeper tracing, networking, and real-world monitoring

Try a concrete next project: trace the openat system call to see which process touches which file and which flags it uses. This gives high-signal data for auditing and monitoring.

In kernel code, define a struct with uid, comm, fname, and flags. Use helper functions like bpf_get_current_comm and bpf_get_current_uid_gid, read the user filename with bpf_probe_read_user_str, and emit via a perf buffer named "events".

On the user side mirror the struct with ctypes, open the perf buffer, and poll with perf_buffer_poll to print UID, COMM, flags, and filenames as events arrive. Run strace first to confirm the right syscall paths.
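
A sketch of both halves under those assumptions. It uses the sys_enter_openat tracepoint rather than a raw kprobe (a choice, not a requirement; tracepoints auto-attach in BCC), and the struct sizes are illustrative:

```python
#!/usr/bin/env python3
# Sketch: trace openat and stream UID, command, flags, and filename to user space.
from bcc import BPF
import ctypes as ct

prog = r"""
struct event_t {
    u32  uid;
    u32  flags;
    char comm[16];
    char fname[256];
};

BPF_PERF_OUTPUT(events);                           // perf buffer named "events"

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct event_t ev = {};
    ev.uid   = bpf_get_current_uid_gid() & 0xFFFFFFFF;   // lower 32 bits hold the UID
    ev.flags = args->flags;
    bpf_get_current_comm(&ev.comm, sizeof(ev.comm));
    bpf_probe_read_user_str(&ev.fname, sizeof(ev.fname), args->filename);
    events.perf_submit(args, &ev, sizeof(ev));
    return 0;
}
"""

# Mirror the kernel struct with ctypes so each event can be decoded.
class Event(ct.Structure):
    _fields_ = [("uid", ct.c_uint32),
                ("flags", ct.c_uint32),
                ("comm", ct.c_char * 16),
                ("fname", ct.c_char * 256)]

def handle_event(cpu, data, size):
    ev = ct.cast(data, ct.POINTER(Event)).contents
    print(f"{ev.uid:6d} {ev.comm.decode(errors='replace'):16s} "
          f"0x{ev.flags:x} {ev.fname.decode(errors='replace')}")

b = BPF(text=prog)                                 # tracepoint attaches automatically
b["events"].open_perf_buffer(handle_event)

print("UID    COMM             FLAGS  FILENAME")
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```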

Keep iterating: generalize to life-cycle tracking, use tracepoints or kretprobes where stable, and consult the bcc docs and Brendan Gregg’s examples for patterns that scale in the Linux kernel.

FAQ

What is the difference between BPF and the in-kernel virtual machine used for eBPF?

The original BPF was a packet-filtering mechanism. The modern in-kernel virtual machine—extended BPF—adds a safe, sandboxed runtime that runs custom bytecode inside the Linux kernel. That lets us attach programs to kprobes, tracepoints, syscalls, and network hooks like XDP for high-performance tracing and packet processing without changing kernel source code.

Do I need a virtual machine to run programs for learning and testing?

You can run natively on a Linux host if you have a recent kernel and the required compiler and headers. A lightweight virtual machine (VM) is a good option when you’re on macOS or want an isolated environment—Lima with Ubuntu works well on Apple Silicon. VMs make it safer to experiment with kernel attachments and networking hooks.

Which dependencies should I install before writing and loading BPF programs?

Install compiler toolchains like clang/LLVM, kernel headers matching your running kernel, and build-essential packages. Also install the BPF Compiler Collection and supporting user-space libraries to compile C programs, use helper functions, and build Python scripts that load and read maps and events.

How do kernel-space programs interact with user-space tools?

Kernel-space programs emit events, update BPF maps, or use perf buffers. User-space programs (often Python or Go) open those buffers, poll for events, and read maps to display counts, histograms, or traces. This separation keeps heavy logic and I/O out of the kernel while providing low-overhead visibility.

What’s the simplest “hello world” example for tracing a syscall?

Attach a kprobe to a syscall like clone, and use a trace helper to print a message or push an event to a perf buffer. A small C snippet runs in the kernel to capture arguments; a Python loader attaches the probe and streams events to the terminal every time the syscall runs.

How do I track and stream syscall events to user space reliably?

Use perf buffers or ring buffers in the kernel program and a user-space consumer that polls them. The consumer can format and print events, aggregate statistics, or persist data in a file. This approach prevents kernel blocking and supports high-frequency event streams.

Can I count network packets and group them by port using XDP and maps?

Yes. An XDP program running at the driver level can parse Ethernet/IP/UDP headers and increment entries in a BPF map keyed by destination port. The program runs at line rate, and a user-space reader periodically dumps map contents to produce histograms or summaries.

What map types should I use for counts and histograms?

Use hash maps for arbitrary keys and array or per-CPU maps for fixed-size counters. BPF_HISTOGRAM-style helpers (or manual bucket maps) work well for latency distributions or port histograms. Pick a map type that minimizes contention and fits your aggregation needs.
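
For reference, BCC’s C macros keep these declarations short; the names below are purely illustrative:

```c
// Illustrative BCC map declarations (names are arbitrary).
BPF_HASH(flows, u64, u64);               // arbitrary 64-bit keys, e.g. PID or hashed flow id
BPF_ARRAY(fixed_counters, u64, 16);      // fixed number of slots, indexed 0..15
BPF_PERCPU_ARRAY(hot_counters, u64, 16); // per-CPU copies to avoid contention on hot paths
BPF_HISTOGRAM(latency, u64);             // bucketed counts for latency or port histograms
```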

How do I attach a kprobe or a tracepoint safely on a production system?

Test first in a VM or staging host. Use minimal, well-audited kernel code and keep the program simple: avoid loops and heavy memory allocation. Prefer tracepoints when available; they’re stable and safer than raw kprobes for some subsystems. Monitor performance and fall back quickly if CPU or latency spikes appear.

What tools and languages do people use to write user-space loaders and dashboards?

Common choices are Python with bindings that simplify attaching probes and reading maps, or Go and C for higher performance. Many teams integrate collectors into Prometheus exporters or real-time dashboards to visualize events, packet rates, or syscall counts.

How do helper functions and the compiler collection help when writing kernel programs?

Helper functions provide safe access to kernel facilities—map updates, perf events, time reads—without exposing raw kernel APIs. The compiler collection (clang/LLVM) generates the verified bytecode. Use verified helper calls to keep programs loadable and secure in the kernel verifier.

Will tracing every syscall cause performance problems?

Tracing every syscall at scale can add overhead. Selective kprobes, sampling, or filtering by PID/process or syscall name reduces cost. Using lightweight probes that push minimal data to maps or perf buffers and doing heavy aggregation in user space keeps impact low.

How can I debug a program that fails to load in the kernel?

Check verifier logs and kernel messages for reject reasons. Use smaller programs, remove complex loops, and verify map definitions. Tools that compile and emit verifier output help iterate quickly. Running in a VM helps when you need to reproduce and fix loader issues safely.

Where should I go next after learning simple probes and XDP counters?

Dive deeper into advanced tracing—stack traces, histograms, per-CPU maps—and networking, like connection tracking or DDoS mitigation with XDP. Learn to build end-to-end pipelines: kernel probes that populate maps, user-space collectors that export metrics, and dashboards that visualize real-world traffic and system behavior.