KubeSense

LLM Observability

LLM Observability is an upcoming feature. The functionality described below reflects the planned capabilities.

LLM Observability extends KubeSense's monitoring capabilities to Large Language Model (LLM) applications, providing visibility into AI/ML inference pipelines, token usage, latency, and cost.

Overview

As organizations integrate LLMs into production applications, monitoring these AI systems becomes critical. LLM Observability helps you:

  • Track inference performance — Monitor latency, throughput, and error rates for LLM API calls
  • Monitor token usage — Track input and output token consumption across models and endpoints
  • Track costs — Understand the cost of LLM operations across your application
  • Monitor quality — Track response quality metrics and detect regressions

Planned Capabilities

LLM Traces

End-to-end tracing of LLM request pipelines, including:

  • Prompt construction and preprocessing
  • Model inference time
  • Post-processing and response delivery
  • Embedding generation and vector store interactions
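Since the feature is still in planning, a minimal sketch can illustrate the idea of stage-level spans. The stage names below mirror the pipeline stages listed above; the timing helper and the stubbed model call are illustrative, not a KubeSense API.

```python
# Sketch: timing each stage of an LLM request pipeline as named spans.
# `span` is a hypothetical helper; the model call is stubbed with sleep().
import time
from contextlib import contextmanager

spans = {}  # stage name -> elapsed seconds

@contextmanager
def span(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans[name] = time.perf_counter() - start

with span("prompt_construction"):
    prompt = "Summarize: " + "example input text"
with span("model_inference"):
    time.sleep(0.01)          # stand-in for the provider API call
    response = "summary..."
with span("post_processing"):
    answer = response.strip()

print(sorted(spans))  # ['model_inference', 'post_processing', 'prompt_construction']
```

In a real deployment this per-stage breakdown is what separates slow prompt assembly or post-processing from genuinely slow model inference.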

LLM Metrics

Key performance indicators for LLM workloads:

  • Inference latency — Time to generate a response (P50, P90, P95, P99)
  • Token throughput — Tokens per second for input and output
  • Request rate — LLM API calls per second
  • Error rate — Rate limiting, timeout, and other errors
  • Token usage — Total input/output tokens consumed
  • Cost per request — Estimated cost based on model pricing
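The cost-per-request metric is straightforward to compute from token counts and a pricing table. The sketch below uses made-up per-million-token prices purely for illustration; actual pricing varies by provider and model.

```python
# Sketch: estimating per-request cost from token usage.
# PRICING values are illustrative placeholders, not real provider rates.
PRICING = {
    # model: (input USD per 1M tokens, output USD per 1M tokens)
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}

def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single LLM request."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(cost_per_request("gpt-4o", 1_200, 350), 6))
```

Because output tokens are typically priced several times higher than input tokens, tracking the two counts separately (as in the token usage metric above) is what makes this estimate meaningful.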

Model Comparison

Compare performance across different LLM providers and models:

  • Latency comparison between models
  • Cost efficiency analysis
  • Error rate benchmarking
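A latency comparison reduces to aggregating recorded samples per model. This sketch uses fabricated sample data and model names solely to show the shape of the computation.

```python
# Sketch: ranking models by mean latency from recorded request samples.
# The (model, latency-in-seconds) samples below are made up for illustration.
from statistics import mean

samples = [
    ("gpt-4o", 0.82), ("gpt-4o", 0.95),
    ("claude-sonnet", 0.61), ("claude-sonnet", 0.58),
]

by_model: dict[str, list[float]] = {}
for model, latency in samples:
    by_model.setdefault(model, []).append(latency)

# Sort ascending by mean latency: fastest model first.
ranking = sorted((mean(v), k) for k, v in by_model.items())
for avg, model in ranking:
    print(f"{model}: {avg:.3f}s")
```

The same grouping applies to cost efficiency and error rates: swap the latency field for cost per request or an error flag.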

Prompt Analytics

Analyze prompt patterns and their impact on performance:

  • Prompt length distribution
  • Token efficiency
  • Cache hit rates for repeated prompts
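A cache hit rate for repeated prompts can be approximated by hashing normalized prompt text and counting repeats. The normalization rule (strip and lowercase) and the sample prompts here are illustrative assumptions.

```python
# Sketch: cache hit rate for repeated prompts via hashing of normalized text.
# The normalization (strip + lowercase) is a simplifying assumption.
import hashlib

seen: set[str] = set()
hits = total = 0

def record(prompt: str) -> bool:
    """Record one prompt; return True on a cache hit (seen before)."""
    global hits, total
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    total += 1
    if key in seen:
        hits += 1
        return True
    seen.add(key)
    return False

for p in ["What is K8s?", "what is k8s?  ", "Explain pods"]:
    record(p)

# The first two prompts normalize to the same key, so 1 hit out of 3.
print(f"hit rate: {hits / total:.2f}")
```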

Supported Integrations

LLM Observability is designed to work with:

  • OpenAI API
  • Anthropic Claude API
  • AWS Bedrock
  • Google Vertex AI
  • Self-hosted models (vLLM, Ollama, TGI)
  • LangChain and LlamaIndex frameworks

Getting Started

LLM Observability will be accessible from the LLM Observability item in the sidebar once the feature is generally available. Stay tuned for updates.