eBPF without the pain: Cilium and network observability in Kubernetes

eBPF lets you observe and change network behavior inside the Linux kernel itself, without a sidecar in every pod and without rebuilding the kernel. For Kubernetes that means a different model: instead of thousands of iptables rules and proxy containers next to the app, the networking and observability logic lives in a single layer on each node. Cilium is the most mature implementation of this approach. Let’s cover what eBPF is in plain terms, why you’d swap your CNI, and what you need to try it in a test cluster.

Table of contents

What eBPF is in plain terms
The problem eBPF solves
What Cilium changes
Hubble: observability without a proxy
eBPF is not only about networking
Comparison at a glance
What you need to stand up Cilium
- Policy: deny egress except one host
How to verify it works
Pitfalls worth knowing up front
Bottom line

What eBPF is in plain terms

To intervene in networking or syscalls at the kernel level you used to write a kernel module — dangerous (a bug takes down the whole node) and inconvenient (rebuild, reboot). eBPF changes the rules: you load a tiny program into the kernel and it runs in response to events — a packet arrives, a socket opens, a syscall fires.

The key piece is the verifier. Before loading, the kernel checks that the program terminates (no unbounded loops), never touches memory it shouldn’t, and cannot crash the system. Only after passing does the program attach to a hook point. The result is a safe “live kernel extension”: JIT-compiled, near-native performance without the risks of a module and without recompiling the kernel.

For networking that means routing, load balancing, filtering, and metrics collection can happen at the hottest spot, where the packet enters the node, instead of walking long in-kernel rule chains or detouring through a user-space proxy and back.

The problem eBPF solves

In classic Kubernetes two things handle networking, and both scale poorly.

The first is kube-proxy on iptables. Each Service turns into a set of iptables rules, and the kernel walks them linearly. With dozens of services it’s invisible. With thousands the rule chain gets long and updating it on every endpoint change is expensive. Latency and control-plane load grow with cluster size.

[Figure: service routing, kube-proxy vs eBPF]

The second is observability via sidecars. A classic service mesh puts a proxy container next to every app: it intercepts traffic and collects metrics. It works, but the price is high — an extra container per pod, additional CPU and memory, added latency per hop, and overall complexity. For hundreds of pods the “sidecar tax” becomes a noticeable line item.

[Figure: observability, a sidecar in every pod vs eBPF in the kernel]

eBPF removes both at once: instead of linear rules, a hash table in the kernel with constant-time lookup; instead of a proxy in every pod, a single dataplane layer per node.
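To make the scaling difference concrete, here is a toy model in Python. It is only a sketch: the service addresses and backend names are made up, and a real iptables chain or eBPF map is far more involved. It shows just the shape of the cost, a walk over every rule versus one hashed lookup.

```python
"""Toy model: service lookup as a linear rule walk vs a hash-map lookup."""


def build_services(n):
    # Hypothetical virtual IPs 10.96.x.y on port 80, one backend each.
    return [((f"10.96.{i // 256}.{i % 256}", 80), f"backend-{i}") for i in range(n)]


def linear_lookup(rules, key):
    """Walk the rules in order, like a rule chain; return (backend, comparisons)."""
    comparisons = 0
    for match, backend in rules:
        comparisons += 1
        if match == key:
            return backend, comparisons
    return None, comparisons


def map_lookup(table, key):
    """One hashed lookup; the cost does not depend on the table size."""
    return table.get(key)


rules = build_services(5000)
table = dict(rules)
last_key = rules[-1][0]  # worst case for the linear walk

backend, comparisons = linear_lookup(rules, last_key)
print(backend, comparisons)         # backend-4999 5000
print(map_lookup(table, last_key))  # backend-4999
```

The linear walk needs 5000 comparisons to reach the last service, and that number grows with every Service you add. The map lookup stays a single step.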

What Cilium changes

Cilium is a CNI (Kubernetes network plugin) built on eBPF. Installing it instead of Flannel or Calico gets you several things:

kube-proxy replacement. Cilium can drop kube-proxy entirely: service load balancing is done via eBPF maps. Lower latency, better scaling on large clusters.
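Identity-based policies. A standard NetworkPolicy selects pods by label, but iptables-based implementations enforce it by translating those selectors into pod IPs. In a dynamic cluster pod IPs change constantly, so IP-based rules are fragile and must be rewritten on every change. Cilium assigns each workload a stable identity derived from its labels and filters on that, so a “frontend may talk to backend” policy survives any pod reshuffle.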
L7 policies. You can filter not only by port but by HTTP methods and paths, gRPC methods, Kafka topics — without a full sidecar mesh.
Hubble. A built-in observability layer: who talks to whom, what’s blocked and why — with no proxy in any pod.
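The identity idea from the list above can be sketched in a few lines of Python. This is a toy model, not Cilium’s implementation: the `IdentityAllocator` class, the small integer IDs, and the pod names are all assumptions made for illustration. The point is only that the policy decision depends on the label set, never on the IP.

```python
"""Toy model: policy decisions keyed on label-derived identities, not pod IPs."""
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict


class IdentityAllocator:
    """Hands out one small integer per distinct label set (hypothetical scheme)."""

    def __init__(self):
        self._ids = {}

    def identity_for(self, labels):
        key = frozenset(labels.items())  # order-independent view of the labels
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


alloc = IdentityAllocator()
FRONTEND = alloc.identity_for({"app": "frontend"})
BACKEND = alloc.identity_for({"app": "backend"})
ALLOWED = {(FRONTEND, BACKEND)}  # "frontend may talk to backend"


def is_allowed(src, dst):
    pair = (alloc.identity_for(src.labels), alloc.identity_for(dst.labels))
    return pair in ALLOWED


backend = Pod("backend-0", "10.0.1.5", {"app": "backend"})
old_frontend = Pod("frontend-abc", "10.0.1.7", {"app": "frontend"})
new_frontend = Pod("frontend-xyz", "10.0.2.9", {"app": "frontend"})  # rescheduled
batch = Pod("batch-0", "10.0.1.7", {"app": "batch"})  # reuses the old IP

print(is_allowed(old_frontend, backend))  # True
print(is_allowed(new_frontend, backend))  # True
print(is_allowed(batch, backend))         # False
```

The rescheduled frontend keeps its access even though its IP changed, and the batch pod that inherited the old IP gets nothing. An IP-based rule would get both cases wrong until it was rewritten.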

Hubble: observability without a proxy

Hubble is Cilium’s eyes. Since all traffic already passes through the eBPF layer, Hubble simply reads those events and surfaces them as flows. You see a service graph, individual connections, and — most useful when debugging — drop reasons.

The typical scenario: a service “can’t reach” another and it’s unclear why. Instead of tcpdump across pods and reading iptables, you run one command and immediately see packets dropped with verdict DROPPED and reason Policy denied. The network didn’t break — your own NetworkPolicy is cutting the traffic. That debugging takes seconds instead of hours.

The important part is observability without instrumenting the application. You don’t embed an SDK, add a proxy, or change the service’s code — eBPF sees the traffic at the kernel level as it actually flows. So Hubble shows your own Go service, a closed-source binary, and a legacy app no one has touched in years equally well. Flow metrics (request counts, drop ratios, per-connection latency) are exported to Prometheus, so you can build ordinary dashboards and alerts on top — without a sidecar under every pod.

eBPF is not only about networking

While in the Cilium context eBPF is usually discussed as a network dataplane, the approach is broader. The same in-kernel programs can observe syscalls, process execution, file opens, and network connections at the per-process level. Tetragon — a component of the Cilium ecosystem for security observability and runtime enforcement — is built on this.

In practice this gives you a runtime “black box”: you see an unexpected process start inside a container, an app reading a file it shouldn’t touch, or a connection to a suspicious address — all with no agent inside the container. And you can not only observe but block: a kernel-level rule kills a process that violates policy before it can do harm. For a team that means one technology covers both network observability and basic runtime security — without a zoo of separate agents.

Comparison at a glance

Aspect	kube-proxy + sidecars	Cilium (eBPF)
Service routing	iptables, O(N)	eBPF map, ~O(1)
Observability	proxy in every pod	Hubble, per-node layer
Policy basis	IP addresses	label identity
Filtering level	L3/L4	L3/L4 and L7
Per-pod tax	+container, CPU/RAM	no sidecar
Debugging drops	tcpdump + iptables	`hubble observe`

What you need to stand up Cilium

The easiest way to try it is a local kind cluster. You’ll need kind, helm, and the cilium CLI, plus the hubble CLI for the flow commands later on. Create a cluster without the default CNI so you can install your own:

kind create cluster --config - <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true     # disable kindnet, install Cilium
  kubeProxyMode: none         # Cilium will replace kube-proxy
nodes:
  - role: control-plane
  - role: worker
EOF

Install Cilium with Hubble enabled:

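helm repo add cilium https://helm.cilium.io
helm repo update
# Without kube-proxy the agent needs a direct API server address.
# kind-control-plane:6443 assumes the default kind cluster name ("kind").
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=kind-control-plane \
  --set k8sServicePort=6443 \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true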

cilium status --wait        # wait until ready

Policy: deny egress except one host

Now lock down a pod’s egress, leaving access only to a needed external API. Cilium supports L7 rules and DNS-name filtering:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-only-api
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: worker
  egress:
    - toFQDNs:
        - matchName: "api.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
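    - toEndpoints:                 # allow DNS, or resolution fails
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY        # DNS can fall back to TCP
          rules:
            dns:
              - matchPattern: "*"  # send lookups through the DNS proxy so toFQDNs can learn IPs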

A pod labeled app: worker can now reach only api.example.com:443 (and DNS). Any other egress is dropped — and it shows up immediately in Hubble.

How to verify it works

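First confirm kube-proxy is really replaced. The replacement mode is reported by the agent’s own status command, so run it inside a Cilium pod:

# On older releases the in-pod binary is called cilium rather than cilium-dbg.
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# KubeProxyReplacement:   True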

Now watch live flows and look for drops from our policy. The hubble CLI talks to Hubble Relay, so forward its port first:

cilium hubble port-forward &
hubble observe --namespace default --verdict DROPPED
# ... worker -> 1.2.3.4:443  DROPPED  (Policy denied)

Try reaching any address other than api.example.com from the worker pod and a line with verdict DROPPED and a reason appears — that’s “packet tracing without sidecars.” And hubble observe --verdict FORWARDED shows the allowed traffic. For a visual picture there’s the Hubble UI with a service graph (cilium hubble ui).
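If you want counts rather than a scrolling list, the flows can be aggregated with a short script. The sketch below assumes the shape of `hubble observe -o json` output: one JSON object per line, with the flow under a `flow` key and fields such as `verdict`, `drop_reason_desc`, `source.pod_name`, and `IP.destination`. Treat those field names as an assumption and check them against what your Hubble version actually prints. The sample flows are hand-written, not captured from a cluster.

```python
"""Sketch: count dropped flows per (source pod, destination IP, reason)."""
import json
from collections import Counter

# Hand-written sample records in the assumed JSON shape (not real cluster output).
SAMPLE_FLOWS = [
    {"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED",
              "source": {"namespace": "default", "pod_name": "worker-1"},
              "IP": {"destination": "1.2.3.4"}}},
    {"flow": {"verdict": "FORWARDED",
              "source": {"namespace": "default", "pod_name": "worker-1"},
              "IP": {"destination": "203.0.113.10"}}},
    {"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED",
              "source": {"namespace": "default", "pod_name": "worker-1"},
              "IP": {"destination": "1.2.3.4"}}},
]
SAMPLE_LINES = [json.dumps(record) for record in SAMPLE_FLOWS]


def summarize_drops(lines):
    """Return a Counter keyed by (source pod, destination IP, drop reason)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "DROPPED":
            continue
        src = flow.get("source", {}).get("pod_name", "unknown")
        dst = flow.get("IP", {}).get("destination", "unknown")
        reason = flow.get("drop_reason_desc", "unknown")
        counts[(src, dst, reason)] += 1
    return counts


for (src, dst, reason), n in summarize_drops(SAMPLE_LINES).most_common():
    print(f"{n:4d}  {src} -> {dst}  {reason}")
```

To run it against a live cluster, pass `sys.stdin` instead of `SAMPLE_LINES` and pipe `hubble observe --verdict DROPPED -o json` into the script.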

Pitfalls worth knowing up front

Cilium is powerful but not “install and forget.” A few places where newcomers trip up.

Replacing kube-proxy needs a compatible kernel. eBPF features depend on the node’s kernel version. On older distros some features (full kube-proxy replacement, certain L7 capabilities) may be unavailable, so check the kernel requirements for your Cilium version before rolling out to production.
Host networking and host processes. Pods with hostNetwork: true live in the node’s network namespace and take on the host’s identity, so policies that select pods by label do not apply to them the way they do to ordinary pods. A common source of “the policy exists but traffic still flows.”
DNS in egress policies. If you lock down egress, don’t forget to explicitly allow DNS to kube-dns, or the app can’t even resolve a name and hits a confusing timeout instead of a clear Policy denied.
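toFQDNs is not magic. Domain-name filtering works by intercepting DNS responses and allowing the IPs they return. If an app connects to a hard-coded IP without a DNS lookup, Cilium never learns that IP, so the connection is dropped even when the address belongs to an allowed domain.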
Debugging “it died in eBPF.” When something fails at the dataplane level, start with hubble observe and cilium monitor — they show kernel verdicts. You almost never need to dig into the bytecode itself.
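The DNS-related pitfalls above are easier to see in a toy model. The Python sketch below is not how Cilium is implemented; the `FqdnEgressPolicy` class and its method names are hypothetical, and the addresses come from a documentation-only range. It only mirrors the mechanism described: an IP becomes reachable after a DNS answer for an allowed name has been observed, and not before.

```python
"""Toy model: name-based egress where allowed IPs are learned from DNS answers."""


class FqdnEgressPolicy:
    def __init__(self, allowed_names, port):
        self.allowed_names = set(allowed_names)
        self.port = port
        self.learned_ips = set()  # filled only by observed DNS answers

    def observe_dns_answer(self, name, ips):
        """Record the IPs from a DNS response, if the name is on the allow list."""
        if name in self.allowed_names:
            self.learned_ips.update(ips)

    def verdict(self, dst_ip, dst_port):
        if dst_port == self.port and dst_ip in self.learned_ips:
            return "FORWARDED"
        return "DROPPED"


policy = FqdnEgressPolicy(["api.example.com"], 443)

# No lookup has been seen yet, so even the "right" IP is unknown.
print(policy.verdict("203.0.113.10", 443))  # DROPPED

policy.observe_dns_answer("api.example.com", ["203.0.113.10"])
print(policy.verdict("203.0.113.10", 443))  # FORWARDED

# A hard-coded IP that never appeared in a DNS answer stays blocked.
print(policy.verdict("203.0.113.11", 443))  # DROPPED

# Answers for names outside the allow list teach the policy nothing.
policy.observe_dns_answer("other.example.net", ["198.51.100.7"])
print(policy.verdict("198.51.100.7", 443))  # DROPPED
```

This is also why blocking DNS itself is so confusing in practice: with no answers observed, nothing is ever learned, and every connection looks like the first case.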

Bottom line

eBPF removes the two big taxes of classic Kubernetes networking: linear iptables rules and a proxy container in every pod. Cilium packages this into a ready CNI with kube-proxy replacement, identity policies, L7 filtering, and Hubble observability, where packet tracing and the drop reason are one command away. The cost of entry is swapping the CNI and paying attention to kernel version and host networking. If your cluster already has more than five to ten services and network debugging regularly eats hours, the switch usually pays off: you gain visibility into the network that, in the sidecar world, would cost an extra container in every pod.