Hogin

eBPF without the pain: Cilium and network observability in Kubernetes

9 min read

eBPF lets you observe and change network behavior right inside the Linux kernel — with no sidecars in every pod and no kernel rebuild. For Kubernetes that means a different model: instead of thousands of iptables rules and proxy containers next to the app, all the networking and observability logic lives in a single layer on the node. Cilium is the most mature implementation of this approach. Let’s cover what eBPF is in plain terms, why you’d swap your CNI, and what you need to try it all in a test cluster.

What eBPF is in plain terms

To intervene in networking or syscalls at the kernel level you used to write a kernel module — dangerous (a bug takes down the whole node) and inconvenient (rebuild, reboot). eBPF changes the rules: you load a tiny program into the kernel and it runs in response to events — a packet arrives, a socket opens, a syscall fires.

The key piece is the verifier. Before loading, the kernel checks the program: that it terminates (no infinite loops), won’t touch memory it shouldn’t, won’t crash the system. Only after passing does the program attach to a hook point. The result is a safe “live kernel extension”: native-code performance without the risks of a module and without recompilation.
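To make the termination check concrete, here is a deliberately tiny model of one rule that early eBPF verifiers enforced: no backward jumps. Without back-edges the control-flow graph is acyclic, so every path reaches an exit. The instruction format (`Insn`) and `checkTermination` are hypothetical. The real verifier also tracks register types and memory bounds along every path, and since Linux 5.3 it accepts loops it can prove are bounded.

```go
package main

import (
	"errors"
	"fmt"
)

// Insn is a hypothetical, heavily simplified instruction: a plain
// operation, a jump by Off instructions past the next one, or an exit.
type Insn struct {
	Op  string // "alu", "jmp" or "exit"
	Off int    // jump offset, only meaningful for "jmp"
}

// checkTermination mimics one idea from early eBPF verifiers: if no jump
// goes backwards, the program counter only increases, so every path ends.
// It also rejects jumps that leave the program and programs without exit.
func checkTermination(prog []Insn) error {
	if len(prog) == 0 || prog[len(prog)-1].Op != "exit" {
		return errors.New("program must end with exit")
	}
	for pc, in := range prog {
		if in.Op != "jmp" {
			continue
		}
		target := pc + 1 + in.Off
		if in.Off < 0 {
			return fmt.Errorf("back-edge at insn %d: possible infinite loop", pc)
		}
		if target >= len(prog) {
			return fmt.Errorf("jump out of range at insn %d", pc)
		}
	}
	return nil
}

func main() {
	ok := []Insn{{Op: "alu"}, {Op: "jmp", Off: 1}, {Op: "alu"}, {Op: "exit"}}
	loop := []Insn{{Op: "alu"}, {Op: "jmp", Off: -2}, {Op: "exit"}}
	fmt.Println(checkTermination(ok))   // <nil>
	fmt.Println(checkTermination(loop)) // back-edge at insn 1: possible infinite loop
}
```

The point is the order of operations: the check runs before the program is attached anywhere, so a rejected program never gets a chance to run.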

For networking, that means routing, load balancing, filtering, and metrics collection can happen at the hottest spot, right where the packet enters the node, instead of walking it through long netfilter rule chains or detouring it through a user-space proxy and back.

The problem eBPF solves

In classic Kubernetes two things handle networking, and both scale poorly.

The first is kube-proxy on iptables. Each Service turns into a set of iptables rules, and the kernel walks them linearly. With dozens of services it’s invisible. With thousands the rule chain gets long and updating it on every endpoint change is expensive. Latency and control-plane load grow with cluster size.

Figure: service routing with kube-proxy vs eBPF

The second is observability via sidecars. A classic service mesh puts a proxy container next to every app: it intercepts traffic and collects metrics. It works, but the price is high — an extra container per pod, additional CPU and memory, added latency per hop, and overall complexity. For hundreds of pods the “sidecar tax” becomes a noticeable line item.

Figure: observability with a sidecar in every pod vs eBPF in the kernel

eBPF removes both at once: instead of linear rules, a hash table in the kernel with constant-time lookup; instead of a proxy in every pod, a single dataplane layer per node.
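The difference between the two lookups can be sketched in Go. `Rule`, `lookupLinear` and `lookupMap` are hypothetical names, and the model is simplified: a real iptables chain also picks randomly among several endpoints, and Cilium's service map points to backend slots rather than one address. The shape of the cost is the same, though: the linear walk examines every rule ahead of the match, while the map does a single hash probe.

```go
package main

import "fmt"

// Rule is a hypothetical stand-in for one iptables DNAT rule: match a
// service VIP:port and rewrite it to a backend address.
type Rule struct {
	VIP     string
	Backend string
}

// lookupLinear walks the rules in order, like the kernel walking an
// iptables chain, so cost grows with the number of services (O(N)).
// It also returns how many rules were examined.
func lookupLinear(rules []Rule, vip string) (string, int) {
	for i, r := range rules {
		if r.VIP == vip {
			return r.Backend, i + 1
		}
	}
	return "", len(rules)
}

// lookupMap models an eBPF hash map keyed by VIP:port: one hash probe,
// average O(1) regardless of how many services exist.
func lookupMap(m map[string]string, vip string) (string, bool) {
	b, ok := m[vip]
	return b, ok
}

func main() {
	const n = 5000
	rules := make([]Rule, 0, n)
	svcMap := make(map[string]string, n)
	for i := 0; i < n; i++ {
		vip := fmt.Sprintf("10.96.%d.%d:80", i/256, i%256)
		backend := fmt.Sprintf("10.244.%d.%d:8080", i/256, i%256)
		rules = append(rules, Rule{VIP: vip, Backend: backend})
		svcMap[vip] = backend
	}
	last := rules[n-1].VIP
	b1, checked := lookupLinear(rules, last)
	b2, _ := lookupMap(svcMap, last)
	fmt.Println(b1 == b2, checked) // true 5000
}
```

Both lookups return the same backend; only the linear one had to touch all 5000 rules to find it, and it would also have to be rebuilt whenever an endpoint changes.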

What Cilium changes

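Cilium is a CNI (Container Network Interface) plugin for Kubernetes built on eBPF. Installing it instead of flannel or Calico gets you kube-proxy replacement, identity-based network policies, L7 filtering, and Hubble observability.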

Hubble: observability without a proxy

Hubble is Cilium’s eyes. Since all traffic already passes through the eBPF layer, Hubble simply reads those events and surfaces them as flows. You see a service graph, individual connections, and — most useful when debugging — drop reasons.

The typical scenario: a service “can’t reach” another and it’s unclear why. Instead of tcpdump across pods and reading iptables, you run one command and immediately see packets dropped with verdict DROPPED and reason Policy denied. The network didn’t break — your own NetworkPolicy is cutting the traffic. That debugging takes seconds instead of hours.

The important part is observability without instrumenting the application. You don’t embed an SDK, add a proxy, or change the service’s code — eBPF sees the traffic at the kernel level as it actually flows. So Hubble shows your own Go service, a closed-source binary, and a legacy app no one has touched in years equally well. Flow metrics (request counts, drop ratios, per-connection latency) are exported to Prometheus, so you can build ordinary dashboards and alerts on top — without a sidecar under every pod.
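The verdict-and-reason view boils down to filtering and grouping flow records. The `Flow` type and `dropReasons` below are hypothetical and far simpler than Hubble's real flow schema; the sketch only shows the idea of keeping DROPPED flows and counting them per reason, the kind of number a drop-ratio dashboard or alert is built on.

```go
package main

import (
	"fmt"
	"sort"
)

// Flow is a hypothetical, trimmed-down flow record. It is NOT Hubble's
// actual schema, just enough fields to show the idea.
type Flow struct {
	Src, Dst string
	Verdict  string // "FORWARDED" or "DROPPED"
	Reason   string // drop reason, empty when forwarded
}

// dropReasons keeps only DROPPED flows and counts them per reason.
func dropReasons(flows []Flow) map[string]int {
	counts := make(map[string]int)
	for _, f := range flows {
		if f.Verdict == "DROPPED" {
			counts[f.Reason]++
		}
	}
	return counts
}

func main() {
	flows := []Flow{
		{"default/frontend", "default/backend:8080", "FORWARDED", ""},
		{"default/frontend", "default/payments:9090", "DROPPED", "Policy denied"},
		{"default/frontend", "default/payments:9090", "DROPPED", "Policy denied"},
	}
	counts := dropReasons(flows)

	// Sort reasons so the output is deterministic.
	reasons := make([]string, 0, len(counts))
	for r := range counts {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	for _, r := range reasons {
		fmt.Printf("%s: %d\n", r, counts[r]) // Policy denied: 2
	}
}
```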

eBPF is not only about networking

While in the Cilium context eBPF is usually discussed as a network dataplane, the approach is broader. The same in-kernel programs can observe syscalls, process execution, file opens, and network connections at the per-process level. Tetragon — a component of the Cilium ecosystem for security observability and runtime enforcement — is built on this.

In practice this gives you a runtime “black box”: you see an unexpected process start inside a container, an app reading a file it shouldn’t touch, or a connection to a suspicious address — all with no agent inside the container. And you can not only observe but block: a kernel-level rule kills a process that violates policy before it can do harm. For a team that means one technology covers both network observability and basic runtime security — without a zoo of separate agents.
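The observe-versus-enforce idea can be modeled in a few lines. This is a user-space sketch with hypothetical types (`ExecEvent`, `Action`, `decide`) and a made-up allowlist. In Tetragon the equivalent logic is declared in TracingPolicy resources and evaluated in the kernel, where a matching event can trigger a kill action; the Go below only illustrates the decision itself.

```go
package main

import "fmt"

// ExecEvent is a hypothetical process-start event of the kind an eBPF hook
// on exec can report. It is not Tetragon's real event type.
type ExecEvent struct {
	Pod    string
	Binary string
}

// Action is the outcome of the hypothetical runtime policy.
type Action string

const (
	Allow Action = "allow"
	Log   Action = "log"  // observe only: record the event
	Kill  Action = "kill" // enforce: terminate the offending process
)

// decide checks the binary against an allowlist. In observe mode a
// violation is only logged; in enforce mode the process is killed.
func decide(allowlist map[string]bool, ev ExecEvent, enforce bool) Action {
	if allowlist[ev.Binary] {
		return Allow
	}
	if enforce {
		return Kill
	}
	return Log
}

func main() {
	allowlist := map[string]bool{"/app/server": true}
	events := []ExecEvent{
		{Pod: "default/api", Binary: "/app/server"},
		{Pod: "default/api", Binary: "/bin/sh"}, // unexpected shell in the container
	}
	for _, ev := range events {
		// Prints the binary, then the observe-mode and enforce-mode decisions.
		fmt.Println(ev.Binary, decide(allowlist, ev, false), decide(allowlist, ev, true))
	}
}
```

Starting in observe mode and switching to enforce once the logs look clean is a common way to roll such rules out without breaking running workloads.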

Comparison at a glance

| Aspect | kube-proxy + sidecars | Cilium (eBPF) |
|---|---|---|
| Service routing | iptables, O(N) | eBPF map, ~O(1) |
| Observability | proxy in every pod | Hubble, per-node layer |
| Policy basis | IP addresses | label identity |
| Filtering level | L3/L4 | L3/L4 and L7 |
| Per-pod tax | extra container, CPU/RAM | no sidecar |
| Debugging drops | tcpdump + iptables | hubble observe |
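Why label identity beats IP addresses is easiest to see when a pod is rescheduled and gets a new IP. The sketch below assumes a hypothetical `IdentityAllocator` that gives every distinct label set one numeric identity. Cilium's real allocation is cluster-wide and uses only security-relevant labels, so treat this as a model of the idea, not its implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// IdentityAllocator is a hypothetical model of label-based identity: pods
// with the same label set share one numeric identity, whatever their IP.
type IdentityAllocator struct {
	next uint32
	ids  map[string]uint32
}

func NewIdentityAllocator() *IdentityAllocator {
	return &IdentityAllocator{next: 1, ids: make(map[string]uint32)}
}

// key builds a canonical string from labels so map order does not matter.
func key(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// IdentityFor returns the identity for a label set, allocating on first use.
func (a *IdentityAllocator) IdentityFor(labels map[string]string) uint32 {
	k := key(labels)
	if id, ok := a.ids[k]; ok {
		return id
	}
	id := a.next
	a.next++
	a.ids[k] = id
	return id
}

func main() {
	alloc := NewIdentityAllocator()
	// The same workload rescheduled: new pod IP, same labels.
	before := alloc.IdentityFor(map[string]string{"app": "worker", "env": "prod"}) // pod at 10.244.1.7
	after := alloc.IdentityFor(map[string]string{"env": "prod", "app": "worker"})  // pod at 10.244.2.9
	other := alloc.IdentityFor(map[string]string{"app": "frontend"})

	// A rule written against the identity keeps matching after the IP change.
	allowed := map[uint32]bool{before: true}
	fmt.Println(before == after, allowed[after], allowed[other]) // true true false
}
```

An IP-based rule would have to be rewritten on every reschedule; the identity-based one does not change at all.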

What you need to stand up Cilium

The easiest way to try it is a local kind cluster. You'll need kind, Helm, and the cilium and hubble CLIs. Create a cluster without the default CNI so you can install your own:

kind create cluster --config - <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true     # disable kindnet, install Cilium
  kubeProxyMode: none         # Cilium will replace kube-proxy
nodes:
  - role: control-plane
  - role: worker
EOF

Install Cilium with Hubble enabled:

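helm repo add cilium https://helm.cilium.io
helm repo update
# Without kube-proxy the agent must be told where the API server is;
# kind-control-plane:6443 assumes the default cluster name "kind".
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=kind-control-plane \
  --set k8sServicePort=6443 \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true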

cilium status --wait        # wait until ready

Policy: deny egress except one host

Now lock down a pod’s egress, leaving access only to a needed external API. Cilium supports L7 rules and DNS-name filtering:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-only-api
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: worker
  egress:
    - toFQDNs:
        - matchName: "api.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
    - toEndpoints:                 # allow DNS, or resolution fails
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
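              protocol: UDP
          rules:                   # route DNS through Cilium's DNS proxy:
            dns:                   # toFQDNs only learns api.example.com's IPs
              - matchPattern: "*"  # from lookups the proxy has seen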

A pod labeled app: worker can now reach only api.example.com:443 (and DNS). Any other egress is dropped — and it shows up immediately in Hubble.
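The semantics of egress-only-api can be modeled as a small decision function. Everything here is a hypothetical sketch: `Dest` and `allowEgress` are made-up names, the `FQDN` field stands in for "this IP was learned from a DNS answer for that name" (which is what Cilium's DNS proxy tracks in reality), and label keys drop the `k8s:` source prefix. It shows the default deny that applies once the pod is selected, plus the two allow rules.

```go
package main

import "fmt"

// Dest is a hypothetical egress destination as the policy sees it.
type Dest struct {
	FQDN   string            // name the IP was resolved from, "" if none
	Labels map[string]string // endpoint labels, nil for external IPs
	Port   int
	Proto  string
}

// allowEgress models the two rules of egress-only-api for a pod with
// app=worker: HTTPS to api.example.com and DNS to kube-dns. Anything else
// is denied by default because an egress policy now selects the pod.
func allowEgress(d Dest) (bool, string) {
	if d.FQDN == "api.example.com" && d.Port == 443 && d.Proto == "TCP" {
		return true, "FORWARDED"
	}
	if d.Labels["k8s-app"] == "kube-dns" &&
		d.Labels["io.kubernetes.pod.namespace"] == "kube-system" &&
		d.Port == 53 && d.Proto == "UDP" {
		return true, "FORWARDED"
	}
	return false, "DROPPED (Policy denied)"
}

func main() {
	kubeDNS := map[string]string{"k8s-app": "kube-dns", "io.kubernetes.pod.namespace": "kube-system"}
	for _, d := range []Dest{
		{FQDN: "api.example.com", Port: 443, Proto: "TCP"},
		{Labels: kubeDNS, Port: 53, Proto: "UDP"},
		{FQDN: "other.example.net", Port: 443, Proto: "TCP"},
		{Port: 443, Proto: "TCP"}, // raw IP, no name resolved through DNS
	} {
		_, verdict := allowEgress(d)
		fmt.Println(verdict)
	}
}
```

The last case is why the DNS rule matters: a destination the pod never resolved through the proxy has no name attached, so it falls through to the default deny even if it happens to be the API's address.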

How to verify it works

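First confirm kube-proxy is really replaced. The setting is reported by the Cilium agent, so query it inside an agent pod:

kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# KubeProxyReplacement:   True
# (on Cilium releases before 1.15 the in-pod binary is named cilium, not cilium-dbg)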

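Now watch live flows and look for drops from our policy. The hubble CLI reads flows from Hubble Relay, so forward its port first:

cilium hubble port-forward &
hubble observe --namespace default --verdict DROPPED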
# ... worker -> 1.2.3.4:443  DROPPED  (Policy denied)

Try reaching any address other than api.example.com from the worker pod and a line with verdict DROPPED and a reason appears — that’s “packet tracing without sidecars.” And hubble observe --verdict FORWARDED shows the allowed traffic. For a visual picture there’s the Hubble UI with a service graph (cilium hubble ui).

Pitfalls worth knowing up front

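Cilium is powerful but not “install and forget.” Newcomers most often trip up on two things: the kernel version on every node, since Cilium's eBPF features depend on it, and how host networking is configured.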

Bottom line

eBPF removes the two big taxes of classic Kubernetes networking: linear iptables rules and a proxy container in every pod. Cilium packages this into a ready-made CNI with kube-proxy replacement, identity-based policies, L7 filtering, and Hubble observability, where packet tracing and the drop reason are one command away. The cost of entry is swapping the CNI and paying attention to the kernel version and host networking. If your cluster already runs more than five to ten services and network debugging regularly eats hours, the switch almost always pays off: you get visibility into the network that, in the sidecar world, would have cost noticeable overhead.

