
This document is a work in progress.

OpenTelemetry-Based

The Value SDK is built on OpenTelemetry, providing:
  • Standardized Tracing: Industry-standard span format
  • Vendor Neutral: Works with any OpenTelemetry-compatible backend
  • Automatic Context Propagation: Traces flow across service boundaries

Problem Summary

Spans generated in client processes can be lost before they reach an OpenTelemetry Collector due to:
  • Process exits or crashes
  • Exporter queue overflows
  • Network outages
  • Ephemeral compute environments (serverless, containers)

SDK-Level Guarantees

The following guarantees are provided at the SDK level. Most apply automatically and require no user action. The flush and shutdown API must be called explicitly, and soft persistence is opt-in.

Adaptive Batching & Dynamic Queues

The SDK auto-adjusts max_queue_size, max_export_batch_size, and schedule_delay_millis based on the observed span rate (spans/sec) and queue pressure. Benefit: Avoids queue overflow during traffic spikes, and reduces latency and memory use when traffic is low.
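One way to picture the adaptive behavior is as a small heuristic that maps observed load to the three batch-processor parameters. This is a minimal sketch, not the SDK's actual algorithm. The function adapt_batch_config is hypothetical, and so are the 5-second buffering target, the 75% pressure threshold, and every bound below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    max_queue_size: int
    max_export_batch_size: int
    schedule_delay_millis: int


def adapt_batch_config(spans_per_sec: float, queue_fill_ratio: float,
                       min_queue: int = 512, max_queue: int = 16384) -> BatchConfig:
    """Hypothetical heuristic: pick batch settings from load and queue pressure."""
    # Aim to buffer roughly 5 seconds of observed traffic.
    target_queue = int(spans_per_sec * 5)
    # Grow further when the queue is already under pressure (> 75% full).
    if queue_fill_ratio > 0.75:
        target_queue *= 2
    queue = max(min_queue, min(max_queue, target_queue))
    # Export batches of about a quarter of the queue, between 64 and 2048 spans.
    batch = max(64, min(2048, queue // 4))
    # Flush often under pressure or heavy load; wait longer when nearly idle.
    if queue_fill_ratio > 0.75 or spans_per_sec > 1000:
        delay = 200
    elif spans_per_sec < 10:
        delay = 5000
    else:
        delay = 1000
    return BatchConfig(queue, batch, delay)
```

At 1 span/sec this yields a 512-span queue flushed every 5 s. At 5,000 spans/sec with a nearly full queue, it jumps to the 16,384-span ceiling with a 200 ms delay.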

Export Retry with Exponential Backoff & Jitter

The SDK wraps OTLP exporter calls in a built-in retry with exponential backoff and jitter. Benefit: Transient network errors are retried instead of immediately losing spans.
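The retry loop can be sketched with the common "full jitter" strategy, in which each delay is drawn uniformly from zero up to an exponentially growing, capped ceiling. The names export_with_retry, backoff_delay, and TransientExportError are hypothetical, as are the base delay, cap, and retry count. The SDK's exact policy is not documented here.

```python
import random
import time
from typing import Callable, Optional


class TransientExportError(Exception):
    """Hypothetical error type for retryable failures (timeouts, 503s, ...)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def export_with_retry(export: Callable[[], None], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> bool:
    """Call export(), retrying transient failures with backoff and jitter.

    Returns True on success, or False once retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            export()
            return True
        except TransientExportError:
            if attempt == max_retries:
                return False  # give up; the caller would count these spans as dropped
            sleep(backoff_delay(attempt, rng=rng))
    return False
```

Injecting sleep and rng keeps the loop testable. The jitter spreads out retries from many clients so they do not hit a recovering backend in lockstep.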

Manual Flush & Shutdown API

The SDK exposes flush() / forceFlush() and shutdown() so that customers can force-export buffered spans during controlled shutdowns (containers, CLIs, the end of a serverless invocation). Benefit: Essential for short-lived serverless processes and for graceful shutdown.
from value import ValueClient

client = ValueClient(secret="your_agent_secret")
client.initialize()

# ... perform actions ...

# Force flush before shutdown
client.flush()

# Or for graceful shutdown
client.shutdown()
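In serverless handlers, the flush call belongs in a finally block so that buffered spans are exported even when the handler raises. The decorator below is a hypothetical helper, not part of the Value SDK. It assumes only that the client has the flush() method shown above.

```python
import functools
from typing import Any, Callable, Protocol


class Flushable(Protocol):
    def flush(self) -> Any: ...


def flush_after_invocation(client: Flushable) -> Callable:
    """Hypothetical decorator: always call client.flush() after the handler."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return handler(*args, **kwargs)
            finally:
                # Export buffered spans before the runtime freezes or exits,
                # whether the handler returned normally or raised.
                client.flush()
        return wrapper
    return decorator
```

Apply it to a handler as @flush_after_invocation(client) after calling client.initialize(). For long-running services, registering client.shutdown with the standard library's atexit.register covers normal interpreter exit. It does not cover SIGKILL, an unhandled SIGTERM, or os._exit(), which is one reason SDK-level guarantees cannot fully protect against crashes.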

Drop/Backpressure Monitoring & Telemetry

The SDK emits internal metrics and logs warnings when the queue fills up or spans are dropped, and it exposes an event or callback for these conditions. Benefit: Operators detect undelivered spans early.
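The behavior described above can be captured by a bounded buffer that counts drops, warns once as it nears capacity, and calls an optional callback. BoundedSpanQueue, its warn_ratio threshold, and the on_drop callback signature are all hypothetical. They illustrate the idea rather than the SDK's real event API.

```python
import logging
from collections import deque
from typing import Any, Callable, Deque, List, Optional

logger = logging.getLogger("span_queue_sketch")


class BoundedSpanQueue:
    """Hypothetical bounded span buffer that reports pressure and drops."""

    def __init__(self, max_size: int, warn_ratio: float = 0.8,
                 on_drop: Optional[Callable[[int], None]] = None) -> None:
        self.max_size = max_size
        self.warn_ratio = warn_ratio
        self.on_drop = on_drop
        self.dropped = 0  # running total; could be exported as a metric
        self._items: Deque[Any] = deque()
        self._warned = False

    def put(self, span: Any) -> bool:
        if len(self._items) >= self.max_size:
            # Queue full: drop the span, log it, and notify the callback.
            self.dropped += 1
            logger.warning("span queue full; %d spans dropped so far", self.dropped)
            if self.on_drop is not None:
                self.on_drop(self.dropped)
            return False
        self._items.append(span)
        # Warn once when crossing the pressure threshold.
        if not self._warned and len(self._items) >= self.warn_ratio * self.max_size:
            self._warned = True
            logger.warning("span queue at %d/%d", len(self._items), self.max_size)
        return True

    def drain(self, n: int) -> List[Any]:
        """Remove up to n spans for export; re-arm the warning below threshold."""
        batch = [self._items.popleft() for _ in range(min(n, len(self._items)))]
        if len(self._items) < self.warn_ratio * self.max_size:
            self._warned = False
        return batch
```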

Environment-Aware Defaults

The SDK detects the runtime type (serverless vs. long-running) and picks safe defaults:
Environment | Behavior
--- | ---
Serverless | Minimal buffering; immediate flush at the end of each invocation
Long-running | Larger buffers and adaptive scaling
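Runtime detection is typically done by checking environment variables that serverless platforms set. The sketch below assumes two such markers: AWS_LAMBDA_FUNCTION_NAME on AWS Lambda and FUNCTIONS_WORKER_RUNTIME on Azure Functions. The default values are hypothetical, and the Value SDK's actual heuristics and numbers may differ.

```python
import os
from typing import Mapping, Optional

# Assumed markers set by some serverless platforms (illustrative, not exhaustive).
SERVERLESS_ENV_MARKERS = (
    "AWS_LAMBDA_FUNCTION_NAME",  # AWS Lambda
    "FUNCTIONS_WORKER_RUNTIME",  # Azure Functions
)

# Hypothetical defaults mirroring the table above.
DEFAULTS = {
    "serverless": {"max_queue_size": 256, "flush_on_invocation_end": True, "adaptive": False},
    "long-running": {"max_queue_size": 4096, "flush_on_invocation_end": False, "adaptive": True},
}


def detect_runtime(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'serverless' if any known marker is set, else 'long-running'."""
    env = os.environ if env is None else env
    if any(env.get(key) for key in SERVERLESS_ENV_MARKERS):
        return "serverless"
    return "long-running"


def pick_defaults(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the defaults for the detected runtime."""
    return dict(DEFAULTS[detect_runtime(env)])
```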

(Optional) Soft Persistence

If the runtime allows disk use (for example, a VM or a container with a mounted volume), a small on-disk queue can optionally be enabled as a fallback.
This requires careful handling of security and file permissions, and it should be opt-in.
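A minimal version of such a fallback is an append-only JSON Lines file with a hard size cap, created with owner-only permissions. DiskSpanQueue, the 1 MB default cap, and the drain-on-next-startup pattern are assumptions for illustration. A production version would also need file locking and handling of partially written lines.

```python
import json
import os
from typing import List


class DiskSpanQueue:
    """Hypothetical size-capped, append-only on-disk fallback (JSON Lines)."""

    def __init__(self, path: str, max_bytes: int = 1_000_000) -> None:
        self.path = path
        self.max_bytes = max_bytes

    def append(self, span: dict) -> bool:
        line = (json.dumps(span) + "\n").encode("utf-8")
        size = os.path.getsize(self.path) if os.path.exists(self.path) else 0
        if size + len(line) > self.max_bytes:
            return False  # keep the fallback small; caller counts this as a drop
        # Create the file with owner-only permissions (0o600, further reduced by umask).
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        with os.fdopen(fd, "ab") as f:
            f.write(line)
        return True

    def drain(self) -> List[dict]:
        """Read all persisted spans and delete the file (e.g. on next startup)."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            spans = [json.loads(line) for line in f if line.strip()]
        os.remove(self.path)
        return spans
```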

Why SDK-Level Matters

The SDK runs inside the application process and can:
  • Reduce short-term losses
  • Simplify defaults for users
  • Provide safe APIs (flush) for critical shutdown paths
SDK-level guarantees cannot fully protect against process crashes. For stronger durability, use a local OpenTelemetry Collector with persistence enabled.

Resource Attributes

The SDK automatically sets resource attributes:
  • value.agent.id: agent instance ID
  • value.agent.name: agent instance name
  • value.agent.workspace_id: workspace ID
  • value.agent.organization_id: organization ID
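The mapping from agent metadata to attribute keys is a plain dictionary. AgentInfo and value_resource_attributes are hypothetical names. If you build your own pipeline with the OpenTelemetry Python SDK, a dictionary like this is what you would pass to opentelemetry.sdk.resources.Resource.create. This page does not say how the Value SDK constructs its resource internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical container for the agent metadata the SDK knows about."""
    id: str
    name: str
    workspace_id: str
    organization_id: str


def value_resource_attributes(agent: AgentInfo) -> Dict[str, str]:
    """Map agent metadata onto the value.agent.* resource attribute keys."""
    return {
        "value.agent.id": agent.id,
        "value.agent.name": agent.name,
        "value.agent.workspace_id": agent.workspace_id,
        "value.agent.organization_id": agent.organization_id,
    }
```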

User Context

Within an action_context, user attributes are propagated:
  • value.action.user_id: ID of the identified user
  • value.action.anonymous_id: anonymous session ID
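Propagation of this kind is commonly built on Python's contextvars module, which scopes a value to the current execution context and restores the previous value on exit. sketch_action_context and current_user_attributes are hypothetical stand-ins; the real action_context signature is not shown in this document.

```python
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# User attributes for the action currently in progress (None outside any action).
_action_attrs = contextvars.ContextVar("value_action_attrs", default=None)


@contextmanager
def sketch_action_context(user_id: Optional[str] = None,
                          anonymous_id: Optional[str] = None) -> Iterator[None]:
    """Hypothetical stand-in for action_context: set user attributes for a block."""
    attrs = dict(_action_attrs.get() or {})  # inherit from any enclosing action
    if user_id is not None:
        attrs["value.action.user_id"] = user_id
    if anonymous_id is not None:
        attrs["value.action.anonymous_id"] = anonymous_id
    token = _action_attrs.set(attrs)
    try:
        yield
    finally:
        _action_attrs.reset(token)  # restore the outer context's attributes


def current_user_attributes() -> Dict[str, str]:
    """Attributes a span started right now would pick up."""
    return dict(_action_attrs.get() or {})
```

Because asyncio tasks copy the current context when they are created, attributes set this way follow async calls. Work submitted to thread pools typically needs contextvars.copy_context().run(...) to carry them along.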

Console Export

For debugging, enable console export:
export VALUE_CONSOLE_EXPORT=true
Or via code:
from value import ValueClient

client = ValueClient(
    secret="your_agent_secret",
    enable_console_export=True
)
client.initialize()
This prints spans to stdout for debugging.
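This page does not specify how the environment variable and the constructor argument interact. The sketch assumes that an explicit enable_console_export argument takes precedence, and that common truthy spellings (1, true, yes, on) count as enabled. console_export_enabled is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping, Optional

# Assumed set of values treated as "enabled" (case-insensitive).
_TRUTHY = {"1", "true", "yes", "on"}


def console_export_enabled(explicit: Optional[bool] = None,
                           env: Optional[Mapping[str, str]] = None) -> bool:
    """Explicit argument wins; otherwise read VALUE_CONSOLE_EXPORT."""
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    return env.get("VALUE_CONSOLE_EXPORT", "").strip().lower() in _TRUTHY
```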