KubeSense

Service Levels

Service Levels is an upcoming feature. The functionality described below reflects the planned capabilities.

Service Levels lets you define, track, and monitor Service Level Objectives (SLOs) for your applications and infrastructure, so you can see whether you are meeting your reliability targets.

Overview

Service Level management is a core practice for SRE teams. KubeSense Service Levels helps you:

  • Define SLOs — Set reliability targets for your services (e.g., 99.9% availability, P95 latency < 200ms)
  • Track error budgets — Monitor how much of your error budget has been consumed
  • Burn rate alerts — Get notified when your error budget is being consumed too quickly
  • Historical SLO reporting — Review SLO compliance over time

Key Concepts

Service Level Indicator (SLI)

A quantitative measure of a service's behavior. Common SLIs include:

  • Availability — Proportion of successful requests
  • Latency — Proportion of requests faster than a threshold
  • Throughput — Request rate staying within expected bounds
  • Error rate — Proportion of requests that result in errors
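
As a rough illustration, the ratio-style SLIs above can be computed from raw request records. This is a minimal sketch: the `Request` type, its fields, and the choice to count 5xx responses as failures are assumptions made for the example, not part of any KubeSense API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code (hypothetical field)
    latency_ms: float  # request duration in milliseconds (hypothetical field)


def availability_sli(requests: list[Request]) -> float:
    """Proportion of requests that did not fail with a 5xx status."""
    if not requests:
        return 1.0  # no traffic: treated as fully available (a convention, not a rule)
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Proportion of requests faster than threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)


sample = [
    Request(200, 120.0),
    Request(200, 250.0),
    Request(503, 90.0),
    Request(200, 80.0),
]
print(availability_sli(sample))    # 3 of 4 requests succeeded
print(latency_sli(sample, 200.0))  # 3 of 4 requests finished under 200 ms
```

Error rate is the complement of availability under this definition, so it needs no separate computation here.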

Service Level Objective (SLO)

A target value for an SLI over a time window. For example:

  • "99.9% of requests succeed over a 30-day rolling window"
  • "95% of requests complete in under 200ms over a 7-day window"
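
One way to picture an SLO is as a small record that pairs a target with a window and can be checked against a measured SLI. The `SLO` class below is a hypothetical shape chosen for illustration. It does not reflect how KubeSense stores or evaluates SLOs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    name: str
    target: float     # required SLI value, e.g. 0.999 for 99.9%
    window_days: int  # length of the rolling window in days

    def is_met(self, measured_sli: float) -> bool:
        """True when the SLI measured over the window reaches the target."""
        return measured_sli >= self.target


# The two example objectives from the list above.
availability = SLO("availability", target=0.999, window_days=30)
latency = SLO("latency-under-200ms", target=0.95, window_days=7)

print(availability.is_met(0.9995))  # True: 99.95% is above the 99.9% target
print(latency.is_met(0.93))         # False: 93% is below the 95% target
```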

Error Budget

The allowed amount of unreliability, derived from the SLO. For a 99.9% availability SLO:

  • Error budget = 0.1% of total requests
  • Over a 30-day window, that is equivalent to 43.2 minutes of full downtime (0.1% of 43,200 minutes)
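
The 43.2-minute figure follows directly from the SLO target and the window length. The sketch below reproduces that arithmetic. The function names are illustrative only.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Allowed unreliability, e.g. about 0.001 for a 99.9% target."""
    return 1.0 - slo_target


def downtime_budget_minutes(slo_target: float, window_days: int) -> float:
    """Error budget expressed as minutes of full downtime over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * error_budget_fraction(slo_target)


# 30 days = 43,200 minutes; 0.1% of that is 43.2 minutes.
print(round(downtime_budget_minutes(0.999, 30), 1))  # 43.2
```

Tightening the target shrinks the budget proportionally: at 99.99% the same window allows roughly 4.3 minutes.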

Planned Capabilities

SLO Configuration

Define SLOs using:

  • Count-based SLOs — Good events / total events (e.g., successful requests / total requests)
  • Time-based SLOs — Good time slices / total time slices (e.g., minutes where latency < threshold)
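
The two methods can give noticeably different results for the same incident, because a time-based SLO weights every slice equally regardless of traffic. The following sketch assumes per-minute `(good, total)` request counts and a hypothetical per-slice success threshold. Neither is a KubeSense setting.

```python
def count_based_sli(slices: list[tuple[int, int]]) -> float:
    """Good events / total events, summed across all slices."""
    good = sum(g for g, _ in slices)
    total = sum(t for _, t in slices)
    return good / total if total else 1.0


def time_based_sli(slices: list[tuple[int, int]], slice_target: float) -> float:
    """Good time slices / total time slices.

    A slice counts as good when its own success ratio meets slice_target.
    Slices with no traffic are treated as good (an assumption).
    """
    if not slices:
        return 1.0
    good_slices = sum(1 for g, t in slices if t == 0 or g / t >= slice_target)
    return good_slices / len(slices)


# Nine healthy minutes, then one minute with a traffic burst where half the requests fail.
minutes = [(100, 100)] * 9 + [(500, 1000)]

print(round(count_based_sli(minutes), 3))  # 0.737: the burst dominates the request count
print(time_based_sli(minutes, 0.99))       # 0.9: only one of ten minutes was bad
```

Count-based SLOs track user-visible impact more closely when traffic is uneven. Time-based SLOs are easier to reason about as "minutes of degradation".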

Error Budget Tracking

  • Remaining budget — Percentage of error budget remaining
  • Burn rate — Rate at which error budget is being consumed
  • Budget forecast — Projected budget exhaustion date based on current burn rate
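
The three quantities above are related by simple arithmetic. The sketch below shows one common way to derive them, assuming a count-based SLO with a target below 100% and a burn rate that stays constant. The function names and the linear forecast are illustrative assumptions, not the planned KubeSense implementation.

```python
from datetime import datetime, timedelta
from typing import Optional


def remaining_budget(bad_events: int, total_events: int, target: float) -> float:
    """Fraction of the error budget still unspent; negative once overspent."""
    allowed_bad = (1.0 - target) * total_events
    if allowed_bad <= 0:
        return 1.0 if bad_events == 0 else 0.0
    return 1.0 - bad_events / allowed_bad


def burn_rate(error_rate: float, target: float) -> float:
    """Observed error rate relative to the rate the SLO allows (assumes target < 1).

    A value of 1.0 spends the whole budget exactly over the full window.
    """
    return error_rate / (1.0 - target)


def forecast_exhaustion(
    now: datetime, remaining: float, rate: float, window_days: int
) -> Optional[datetime]:
    """Projected exhaustion time if the current burn rate holds; None if not burning."""
    if rate <= 0:
        return None
    if remaining <= 0:
        return now  # budget is already spent
    return now + timedelta(days=remaining * window_days / rate)


now = datetime(2024, 1, 1)
left = remaining_budget(bad_events=400, total_events=1_000_000, target=0.999)
rate = burn_rate(error_rate=0.002, target=0.999)
eta = forecast_exhaustion(now, left, rate, window_days=30)

# About 60% of the budget is left; at a burn rate of about 2 it lasts roughly 9 more days.
print(round(left, 3), round(rate, 3), eta)
```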

SLO Dashboard

  • Per-service SLO status at a glance
  • Error budget consumption trends
  • SLO compliance history (7-day, 30-day, 90-day windows)

Burn Rate Alerts

Automatic alerting when error budget consumption exceeds expected rates:

  • Fast burn — Immediate notification for rapid budget consumption
  • Slow burn — Warning when steady degradation will exhaust the budget within the window
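
A fast/slow split like this is often implemented by comparing burn rates measured over a short and a long lookback window. The sketch below is one possible classification. The threshold values and window lengths are assumptions borrowed from common SRE practice for a 30-day SLO, not KubeSense defaults.

```python
from typing import Optional

# Assumed thresholds for illustration only.
FAST_BURN_THRESHOLD = 14.4  # sustained for an hour, this spends about 2% of a 30-day budget
SLOW_BURN_THRESHOLD = 1.0   # any sustained rate above 1 exhausts the budget before the window ends


def classify_burn(rate_1h: float, rate_3d: float) -> Optional[str]:
    """Return "fast", "slow", or None from burn rates over two lookback windows.

    rate_1h: burn rate measured over the last hour (catches sudden outages)
    rate_3d: burn rate measured over the last three days (catches steady degradation)
    """
    if rate_1h >= FAST_BURN_THRESHOLD:
        return "fast"
    if rate_3d > SLOW_BURN_THRESHOLD:
        return "slow"
    return None


print(classify_burn(rate_1h=20.0, rate_3d=2.0))  # fast
print(classify_burn(rate_1h=3.0, rate_3d=1.5))   # slow
print(classify_burn(rate_1h=0.5, rate_3d=0.8))   # None
```

Checking the short window first means a sudden outage is reported as a fast burn even when the long-window rate is also elevated.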

Getting Started

Service Levels will be accessible from the Service Levels item in the sidebar once the feature is generally available. Stay tuned for updates.