OpenTelemetry Collector: a minimal setup you can ship to prod

10 min read

OpenTelemetry has outgrown the “here’s a spec, figure it out” phase. Today it’s the de-facto standard for telemetry, and its central piece is the Collector: a single process between your applications and the backend for metrics, traces, and logs. It removes vendor lock-in and gives you a control plane over your signals. Let’s see how it works and build a minimal config you can ship to prod without blushing.

Why you need a Collector at all

You can send telemetry from the app straight to the backend, so why an extra process in the middle? Because without it every app is hard-wired to a specific vendor: endpoint, format, and API key baked into the code. Decide to switch from Datadog to Grafana, and you re-release every service. Want to redact sensitive fields or sample traces? You do it in each app separately.

The Collector breaks that coupling. The app knows only one address — the local collector — and one protocol, OTLP. Everything else (where to send, what to filter, how to batch and sample) is decided in the collector’s config, not in code. It’s the same idea as a reverse proxy for HTTP: a small process in the middle that gives you a point of control.

There are more down-to-earth reasons too. Without a collector each service is responsible for retries, buffering, and behavior when the backend is down — which means an observability outage can take down the app itself, choking on a queue of unsent telemetry. The collector takes that dirty work off your hands: the app makes a fast local call and forgets, while reliable delivery, retries, and burst smoothing are the job of a separate process that doesn’t share fate with your business logic.
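As a rough sketch of where that delivery logic lives: collector exporters carry their own retry and queue settings. The values below are illustrative assumptions rather than recommendations, and the exporter name otlp/tempo with the address tempo:4317 simply reuses the setup built later in this post.

```yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true          # local only, as in the setup later in this post
    retry_on_failure:
      enabled: true
      initial_interval: 5s    # delay before the first retry (illustrative)
      max_interval: 30s       # cap on the backoff between retries
      max_elapsed_time: 300s  # stop retrying a batch after this long
    sending_queue:
      enabled: true
      queue_size: 1000        # queue capacity while the backend is slow or down (illustrative)
```

With something like this in place, a backend outage fills the collector's queue rather than the app's memory.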

receivers, processors, exporters — three words

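All of the collector’s work is described by three kinds of components: receivers accept telemetry (for example over OTLP), processors transform it in flight (batching, filtering, sampling), and exporters send it on to a backend. Once you get them, you get the OTel Collector entirely.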

These components are assembled into a pipeline — a separate one per signal type: one for traces, one for metrics, one for logs. Within a pipeline the signal flows strictly receivers → processors → exporters.

[Figure: Collector pipeline: receivers, processors, exporters]

What the collector actually buys you

Four things people install it for:

Agent vs Gateway

The collector is deployed in two topologies, and usually both at once.

Agent — a collector next to the app, typically a DaemonSet on each node. It receives signals locally (low latency), adds host and Kubernetes metadata, and takes retries off the app’s plate.

Gateway — a central collector (a Deployment with autoscaling) where agents forward what they’ve collected. It’s the right place for heavy work: tail-based sampling across a whole trace, deduplication, a single egress point.

[Figure: Agent vs Gateway]

A small project is fine with a single collector, but even then keep this split in mind: it tells you which processors belong where.
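To make the split concrete, here is a hedged sketch of the two configs. The gateway address otel-gateway:4317, the sampling policies, and all numbers are assumptions for illustration. resourcedetection and tail_sampling ship in the contrib distribution. Kubernetes metadata enrichment (the k8sattributes processor) is left out because it also needs RBAC setup.

```yaml
# agent.yaml (sketch): runs next to the apps, forwards to the gateway
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
  resourcedetection:
    detectors: [env, system]      # adds host metadata to every signal
  batch:

exporters:
  otlp/gateway:
    endpoint: otel-gateway:4317   # hypothetical gateway address
    tls:
      insecure: true              # assumption: in-cluster traffic only; use TLS otherwise

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlp/gateway]
```

```yaml
# gateway.yaml (sketch): central instance doing the heavy work
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
  tail_sampling:
    decision_wait: 10s            # wait for a trace's spans before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
  batch:

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]
```

One caveat: tail sampling only works if all spans of a trace reach the same gateway instance, so a gateway with several replicas typically needs trace-ID-aware routing in front of it.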

Semantic conventions: names matter more than values

A separate but critical point: OTel defines semantic conventions — standard attribute names. The HTTP method is http.request.method, not method, verb, or httpMethod to each team’s taste. This is boring right up until you want one dashboard across all services or to compare the latency of two apps. If names drift, no query can stitch the data together. So the value of telemetry is largely set by naming discipline — and the collector helps maintain it by normalizing attributes in processors.
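A minimal sketch of that normalization, assuming a service that emits the HTTP method under the non-standard attribute name method. The processor name attributes/normalize_http is made up for this example; the attributes processor itself is part of the contrib distribution.

```yaml
processors:
  attributes/normalize_http:
    actions:
      # copy the old value into the conventional name (only if it is not already set)
      - key: http.request.method
        from_attribute: method
        action: insert
      # then drop the non-standard name
      - key: method
        action: delete
```

The processor only takes effect once it is listed in the processors of the relevant pipeline.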

Three signals, one process

A big part of the collector’s value is that metrics, traces, and logs go through the same abstraction. Adding metrics to our trace pipeline is a few more lines in service.pipelines:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]   # assumes a prometheus exporter is defined under exporters:

Same logic — only the exporter changes for each signal’s backend. That’s what “a single control plane” means: instead of three different agents (one for metrics, one for traces, one for logs) you have one process with three pipelines and a single policy for batching, sampling, and redaction. When tomorrow you need to reroute logs to different storage or add PII stripping across all three signals at once, those are edits in one file, not three separate projects.

The reverse matters too: pipelines are independent. You can sample traces aggressively but keep 100% of metrics; you can strip attributes only in logs. The collector doesn’t force a single policy on everything — it gives you a place to describe those policies side by side.
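Here is a sketch of what side-by-side policies could look like. It is a fragment: memory_limiter, batch, and the exporters are assumed to be defined elsewhere in the same file. The attribute user.email and the exporter name otlphttp/logs are hypothetical, and the 10% rate is illustrative.

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10      # keep roughly 10% of traces
  attributes/strip_pii:
    actions:
      - key: user.email          # hypothetical sensitive attribute
        action: delete

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]                        # no sampling: keep 100%
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, attributes/strip_pii, batch]  # stripping only in logs
      exporters: [otlphttp/logs]                                 # hypothetical logs exporter
```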

Comparison at a glance

                     | Vendor agent in every pod | OTel Collector
Vendor coupling      | hard-wired, in code       | one config line
Switching backends   | release every service     | edit an exporter
Sampling / redaction | in each app               | centralized
Protocol             | proprietary               | OTLP (open)
Batching             | hit or miss               | built in
Signals              | often just its own set    | traces, metrics, logs

What you need for a minimal setup

Let’s stand up a local trio of containers: an app (sends OTLP), the collector, and a trace backend — we’ll use Grafana Tempo. Start with the collector config.

# otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80           # don't let the collector eat all memory
  batch:
    timeout: 5s                    # flush at least every 5s, or sooner once a batch fills up

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true               # local only; use real TLS in prod

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]

The config reads like a sentence: receive OTLP → limit memory and batch → send to Tempo. Now the docker-compose that wires it together:

services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317"     # OTLP gRPC
      - "4318:4318"     # OTLP HTTP
    depends_on: [tempo]

  app:
    image: your-app:latest
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
      OTEL_SERVICE_NAME: "demo-app"
    depends_on: [otel-collector]

The app needs only two environment variables — where to send and what to call itself. No vendor SDKs: most languages have OTel instrumentation that reads OTEL_EXPORTER_OTLP_ENDPOINT on its own.
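One detail worth checking: port 4317 is the gRPC receiver, while some SDKs default to OTLP over HTTP. If your instrumentation does, a variant along these lines may be needed. OTEL_EXPORTER_OTLP_PROTOCOL is a standard OpenTelemetry environment variable; whether you need it depends on your SDK's default, which is an assumption to verify.

```yaml
services:
  app:
    image: your-app:latest
    environment:
      OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf"                # or "grpc" together with port 4317
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"   # OTLP/HTTP port from the collector config
      OTEL_SERVICE_NAME: "demo-app"
    depends_on: [otel-collector]
```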

We use opentelemetry-collector-contrib, not the base distribution: contrib bundles nearly all receivers/processors/exporters, including vendor ones. For prod you can later build your own trimmed distribution with only the components you need.
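The compose file mounts a tempo.yaml that is not shown in this post. A minimal sketch, modeled on Tempo's local-storage example setup, might look like the following. The storage paths and HTTP port are assumptions, and depending on the image version the data directory may need a writable volume.

```yaml
# tempo.yaml (sketch, local storage only; not for prod)
server:
  http_listen_port: 3200          # Tempo's query API

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317  # must match the collector's exporter endpoint tempo:4317

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
```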

How to verify it works

Bring up the stack and check the collector logs. Once all pipelines have started, the collector logs a line containing “Everything is ready”:

docker compose up -d
docker compose logs otel-collector | grep -i "Everything is ready"

Then run a couple of requests through the app and confirm traces reached Tempo. The easiest way is to add a temporary debug exporter to the collector and see the signals in the logs directly:

exporters:
  debug:
    verbosity: detailed
# and add debug to the exporters of the relevant pipeline

If the collector logs show spans with your service.name=demo-app and Tempo finds a trace by that name — the chain works. After checking, remove the debug exporter so it doesn’t pollute prod logs.

From contrib to your own distribution

For getting started, opentelemetry-collector-contrib is ideal: it has everything and you don’t have to think about where each component comes from. But you don’t have to ship an image with every existing receiver and exporter to prod — that’s extra size and extra attack surface. Once the config settles and you know which components you need, it’s worth building a trimmed distribution via the OpenTelemetry Collector Builder (ocb): you list only the needed modules in a manifest and get a minimal binary tailored to your pipeline.
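For the minimal pipeline from this post, a builder manifest could look roughly like this. The distribution name and output path are made up, and vX.Y.Z is a placeholder, not a real version: replace it with the release matching your ocb binary.

```yaml
# builder-config.yaml (sketch)
dist:
  name: otelcol-minimal            # hypothetical binary name
  description: Trimmed collector with OTLP in and OTLP out
  output_path: ./otelcol-minimal

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver vX.Y.Z

processors:
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor vX.Y.Z
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor vX.Y.Z

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter vX.Y.Z
```

Anything not listed here (for example the debug exporter) will not exist in the resulting binary, so the collector config must reference only these components.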

The collector is light on resources: a single agent instance usually fits in tens to hundreds of megabytes of memory, and load scales via batching and horizontally (a gateway behind an HPA). The one thing not to forget is memory_limiter: it’s what turns “the collector occasionally dies under a spike” into “the collector cleanly throttles intake and survives.”
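A slightly fuller memory_limiter sketch, with illustrative percentages. spike_limit_percentage is a real option of the processor; the specific numbers are assumptions to tune for your workload.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80         # hard limit, as a share of available memory
    spike_limit_percentage: 20   # expected spike; intake is refused from 80 - 20 = 60%

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter goes first in the chain
      exporters: [otlp/tempo]
```

Placing memory_limiter first means data is refused before other processors spend memory on it.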

Pitfalls

Bottom line

The OTel Collector is “nginx for telemetry”: a small process in the middle that removes vendor lock-in and gives a single point of control over metrics, traces, and logs. A minimal production setup is literally one OTLP receiver, two processors (memory_limiter + batch), and one exporter. It’s where any new observability system should start: dragging a proprietary agent into every pod in 2026 is an anachronism, and moving to an open protocol later costs far more than baking it in from the start.

