Why OpenTelemetry Over Proprietary Agents
Before OpenTelemetry, every observability vendor shipped their own agent — a proprietary daemon that intercepted your application's traffic, reformatted it into the vendor's schema, and shipped it to the vendor's backend. Switching from Datadog to New Relic meant ripping out the Datadog Agent, installing the New Relic agent, rewriting all your custom instrumentation, and losing months of historical data.
OpenTelemetry (OTel) is the CNCF project that addresses this vendor lock-in. It defines a vendor-neutral instrumentation API, a collection of auto-instrumentation libraries for most major frameworks, and an export protocol (OTLP) that every major observability backend now supports. You instrument your application once with OTel and can send the data to any backend: Datadog, Grafana, ObservabilityOS, or all three simultaneously.
The OTel ecosystem has reached production-ready maturity. In the Node.js SDK the tracing and metrics signals are stable, the instrumentation libraries for Express, Fastify, MongoDB, Redis, gRPC, and HTTP are actively maintained by the OpenTelemetry community and major vendors, and the OTLP protocol is supported by essentially every modern observability platform.
Installing the OpenTelemetry SDK for Node.js
The OpenTelemetry Node.js SDK is split into modular packages. You install the core SDK, the auto-instrumentation packages for your specific frameworks, and an exporter that sends data to your backend. This modularity means you only install what you need.
# Core SDK
npm install @opentelemetry/sdk-node
# Auto-instrumentation (loads all supported libraries automatically)
npm install @opentelemetry/auto-instrumentations-node
# OTLP exporter (sends to any OTLP-compatible backend)
npm install @opentelemetry/exporter-trace-otlp-http
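npm install @opentelemetry/exporter-metrics-otlp-http
# Packages imported directly by the examples in this guide
npm install @opentelemetry/api @opentelemetry/resources @opentelemetry/sdk-metrics @opentelemetry/sdk-trace-base @opentelemetry/semantic-conventions
Configuring Auto-Instrumentation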
Auto-instrumentation patches the libraries you're already using — Express, HTTP, MongoDB, Redis — and creates spans automatically for every operation. You configure it once in a separate file that you load before anything else. This is the most important pattern: the SDK must load before your application code.
The setup file should be loaded with Node's --require flag or by importing it as the very first statement in your entry point. Loading it after Express or Mongoose is initialized means the patching does not take effect, and your auto-instrumentation will silently produce no traces.
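Why load order matters can be seen in a toy model of patching. This is a simplified sketch, not how the OTel instrumentation actually hooks module loading: fakeHttp, instrument, and recordedSpans are hypothetical names invented for the illustration. It only shows that code holding a reference taken before the patch never reaches the wrapper.

```typescript
// A stand-in for a library module such as `http` (hypothetical, illustration only).
const fakeHttp = {
  request(url: string): string {
    return `GET ${url}`;
  },
};

// Span names collected by the toy instrumentation.
const recordedSpans: string[] = [];

// A toy "instrumentation": replaces the library function with a wrapper
// that records a span name and then calls the original.
function instrument(lib: typeof fakeHttp): void {
  const original = lib.request;
  lib.request = (url: string): string => {
    recordedSpans.push(`span:${url}`);
    return original(url);
  };
}

// Application code that grabbed a reference BEFORE instrumentation ran.
const earlyRequest = fakeHttp.request;

instrument(fakeHttp);

// Application code that looks the function up AFTER instrumentation ran.
const lateRequest = fakeHttp.request;

earlyRequest("/early"); // bypasses the wrapper, so no span is recorded
lateRequest("/late"); // goes through the wrapper, so a span is recorded

console.log(recordedSpans); // only the "/late" call appears
```

The early caller fails silently, which mirrors the symptom described above: the application works, but no traces are produced.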
// instrumentation.ts — load BEFORE any other imports
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
// Fall back to the default local OTLP/HTTP port so the URL never starts with "undefined"
const otlpEndpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318';
// Send the same credentials with both traces and metrics
const otlpHeaders = { Authorization: `Bearer ${process.env.OBSERVABILITY_API_KEY}` };

const sdk = new NodeSDK({
  // Resource constructor as in SDK 1.x; SDK 2.x replaces it with resourceFromAttributes()
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME ?? 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.npm_package_version ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: `${otlpEndpoint}/v1/traces`,
    headers: otlpHeaders,
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: `${otlpEndpoint}/v1/metrics`,
      headers: otlpHeaders,
    }),
    exportIntervalMillis: 30_000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-mongodb': { enabled: true },
      '@opentelemetry/instrumentation-ioredis': { enabled: true },
      // Disable filesystem instrumentation: too noisy in production
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});
sdk.start();
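// Flush remaining spans and metrics, then exit. Registering a signal handler
// disables Node's default exit, so the process has to be ended explicitly.
const shutdown = () => {
  sdk.shutdown()
    .catch((err) => console.error('OpenTelemetry shutdown failed', err))
    .finally(() => process.exit(0));
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
Adding Custom Spans for Business Logic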
Auto-instrumentation captures framework-level operations: HTTP requests, database queries, cache lookups. But your business logic — the part that makes your application unique — is invisible to auto-instrumentation. Custom spans let you wrap any significant operation in a trace context, making it visible in your distributed traces.
The most valuable places to add custom spans: payment processing, third-party API calls, computationally expensive algorithms, queue processing, and any operation where latency directly affects user experience. A rule of thumb: if you've ever debugged this code path, it deserves a span.
import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('checkout-service');
// `stripe` is assumed to be an already-initialized Stripe client created elsewhere in the service
async function processPayment(userId: string, amount: number) {
  // Create a custom span, visible in traces and dashboards
  return tracer.startActiveSpan('payment.process', async (span) => {
    // Add attributes; these become searchable dimensions
    span.setAttributes({
      'payment.user_id': userId,
      'payment.amount_cents': amount,
      'payment.currency': 'USD',
      'payment.provider': 'stripe',
    });
    try {
      const result = await stripe.charges.create({ amount, currency: 'usd' });
      span.setAttributes({ 'payment.charge_id': result.id });
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the error so it shows up in error tracking
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end(); // Always end the span
    }
  });
}
Context Propagation: Tracing Across Service Boundaries
In a microservices architecture, a single user request touches multiple services. For distributed tracing to work, the trace context must be propagated from service to service — each service tells the next one which trace it belongs to. Without context propagation, each service creates isolated traces that cannot be stitched together.
OpenTelemetry uses the W3C Trace Context standard by default: two HTTP headers, traceparent and tracestate. The traceparent header carries the trace ID, the calling span's ID, and the sampling flag; tracestate carries optional vendor-specific data. The OTel HTTP instrumentation injects these headers automatically on outbound requests and extracts them automatically on inbound requests. You get distributed tracing with zero manual context threading.
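To make the header format concrete, here is a minimal sketch of parsing and building a version-00 traceparent value by hand. It is not the SDK's propagator, which you should keep using in real services; the TraceParent type and both function names are hypothetical and exist only for this illustration.

```typescript
// Hypothetical shape for this sketch; the SDK has its own SpanContext type.
interface TraceParent {
  traceId: string; // 32 lowercase hex characters
  parentId: string; // 16 lowercase hex characters (the caller's span ID)
  sampled: boolean; // bit 0 of the trace-flags byte
}

// version "00", then trace ID, parent span ID, and flags, separated by dashes
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a version-00 traceparent header; returns null for anything malformed.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (match === null) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero IDs are defined as invalid by the W3C specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Build the header a service would send on an outbound request.
function formatTraceParent(ctx: TraceParent): string {
  return `00-${ctx.traceId}-${ctx.parentId}-${ctx.sampled ? "01" : "00"}`;
}

const incoming = parseTraceParent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
);
// A downstream call keeps the trace ID but substitutes this service's own span ID.
const outgoing =
  incoming && formatTraceParent({ ...incoming, parentId: "b7ad6b7169203331" });
console.log(outgoing); // 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
```

The trace ID is the only part that stays constant along the whole request path, which is what lets a backend stitch spans from different services into one trace.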
// Context propagation is automatic with OTel HTTP instrumentation.
// This outbound request will automatically include W3C TraceContext headers:
import axios from 'axios';
// axios uses Node's http/https modules under the hood, and those are what the
// OTel HTTP instrumentation patches, so no manual code is needed:
const response = await axios.get('https://inventory-service/api/stock', {
  params: { productId: '123' },
});
// → Sends headers: traceparent: 00-{traceId}-{spanId}-01
// → inventory-service continues the same trace automatically
Production Configuration: Sampling and Performance
Tracing every request in production is expensive, both in CPU and memory overhead inside the service and in backend storage costs. Sampling records only a fraction of traces. For most production systems, a 1–10% head-based sampling rate is a reasonable baseline for normal traffic.
Head-based sampling (the default) makes the decision at the start of the request, before anyone knows whether the request will fail, so on its own it cannot guarantee that error traces are kept. Tail-based sampling decides after the trace has completed, which is what makes a rule such as "keep 100% of traces that contain an error" possible, but it needs infrastructure (typically an OpenTelemetry Collector tier) to hold spans in memory until the trace finishes. For most teams, head-based sampling at 5–10% is the right starting point; when every error trace must be retained, export all spans to a collector and apply tail-based sampling there.
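The head-based decision can be sketched in a few lines. This is a simplified illustration of a ratio decision combined with a parent-based rule, not the SDK's exact algorithm; ParentContext, ratioDecision, and shouldSample are hypothetical names. The point is that the decision depends only on the trace ID and the parent, never on how the request turns out.

```typescript
// Hypothetical type for this sketch; not the SDK's own interface.
interface ParentContext {
  sampled: boolean;
}

// Deterministic ratio decision: map the last 8 hex characters of the trace ID
// to a number in [0, 2^32) and keep the trace if it falls below ratio * 2^32.
function ratioDecision(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 2 ** 32;
}

// Parent-based rule: follow the parent's decision when there is one,
// otherwise fall back to the ratio decision for root spans.
function shouldSample(
  traceId: string,
  ratio: number,
  parent?: ParentContext,
): boolean {
  if (parent !== undefined) {
    return parent.sampled;
  }
  return ratioDecision(traceId, ratio);
}

const exampleTraceId = "4bf92f3577b34da6a3ce929d0e0e4736";
console.log(shouldSample(exampleTraceId, 0.1)); // true
console.log(shouldSample(exampleTraceId, 0.05)); // false
console.log(shouldSample(exampleTraceId, 1.0, { sampled: false })); // false
```

Because the ratio decision is a pure function of the trace ID, every service that sees the same trace ID reaches the same verdict, and the parent-based rule keeps a dropped trace dropped all the way downstream.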
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';
// Sample 10% of new (root) traces in production.
// ParentBasedSampler makes spans that already have a parent follow the parent's
// decision, so a trace is kept or dropped consistently across services.
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(
    process.env.NODE_ENV === 'production' ? 0.1 : 1.0
  ),
});
// Add to NodeSDK config:
const sdk = new NodeSDK({ sampler, /* ... */ });
Frequently Asked Questions
- Does OpenTelemetry add latency to my application? Some, but usually little. The overhead comes from span creation and context propagation; a figure of 1–3 ms per request is a rough guide that varies with how many spans a request produces, so measure it under your own workload. Exporting is asynchronous and does not block your request path: the batch span processor buffers spans in memory and exports them in the background.
- Can I use OpenTelemetry with TypeScript? Yes — all OTel packages ship with TypeScript definitions. The examples in this guide are TypeScript. You do not need to configure anything special beyond the standard TypeScript setup.
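- What if my backend doesn't support OTLP? OTLP is the recommended format but not the only one. The OpenTelemetry JS project also publishes a Zipkin exporter for traces and a Prometheus exporter for metrics, and the OpenTelemetry Collector can receive OTLP from your service and re-export it in many vendor-specific formats, including Datadog's. Jaeger accepts OTLP directly. Most modern observability platforms now support OTLP natively.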
- How do I test that my instrumentation is working? Start your service with OTEL_EXPORTER_OTLP_ENDPOINT pointing to a local collector or directly to ObservabilityOS. Run a few requests and check your traces dashboard. If you see spans, it's working. If not, set OTEL_LOG_LEVEL=debug to see instrumentation errors.
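The buffering mentioned in the first answer above can be sketched as a small size-triggered batcher. This is a hypothetical, simplified model and not the SDK's BatchSpanProcessor, which also flushes on a timer, caps its queue, and exports off the request path; here the export callback runs synchronously so the behavior is easy to follow.

```typescript
// Hypothetical minimal batcher, for illustration only.
class BatchBuffer<T> {
  private queue: T[] = [];
  private readonly maxBatchSize: number;
  private readonly exportBatch: (batch: T[]) => void;

  constructor(maxBatchSize: number, exportBatch: (batch: T[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.exportBatch = exportBatch;
  }

  // Adds one item to the in-memory buffer and flushes once the batch is full.
  add(item: T): void {
    this.queue.push(item);
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }

  // Hands the buffered items to the exporter and empties the buffer.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.exportBatch(batch);
  }
}

const exported: string[][] = [];
const buffer = new BatchBuffer<string>(2, (batch) => {
  exported.push(batch);
});
["a", "b", "c", "d", "e"].forEach((name) => buffer.add(name));
buffer.flush(); // e.g. on shutdown, so the last partial batch is not lost
console.log(exported); // [["a","b"],["c","d"],["e"]]
```

The final flush is the reason the setup file shuts the SDK down on SIGTERM and SIGINT: without it, whatever is still sitting in the buffer when the process exits is never exported.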
About the Author
ObservabilityOS Team
Core Engineering & DevRel
The core engineering, site reliability, and developer relations team behind ObservabilityOS. We build AI-native observability infrastructure to eliminate 3 AM firefighting.