Service Levels

Service Levels is an upcoming feature. The functionality described below reflects the planned capabilities.

Service Levels will let you define, track, and monitor Service Level Objectives (SLOs) for your applications and infrastructure, so you can see whether you are meeting your reliability targets.

Overview

Service Level management is a core practice for SRE teams. KubeSense Service Levels helps you:

Define SLOs — Set reliability targets for your services (e.g., 99.9% availability, P95 latency < 200ms)
Track error budgets — Monitor how much of your error budget has been consumed
Burn rate alerts — Get notified when your error budget is being consumed too quickly
Historical SLO reporting — Review SLO compliance over time

Key Concepts

Service Level Indicator (SLI)

A quantitative measure of a service's behavior. Common SLIs include:

Availability — Proportion of successful requests
Latency — Proportion of requests faster than a threshold
Throughput — Request rate staying within expected bounds
Error rate — Proportion of requests that result in errors

Service Level Objective (SLO)

A target value for an SLI over a time window. For example:

"99.9% of requests succeed over a 30-day rolling window"
"95% of requests complete in under 200ms over a 7-day window"
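As a rough illustration, the two example SLOs above can be modeled as a target plus a window and compared against a measured SLI. This is only a sketch: the `SLO` class, its fields, and `is_met` are hypothetical names chosen for this example and are not part of any KubeSense API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SLO:
    """Hypothetical model of an SLO: a target for an SLI over a time window."""
    name: str
    target: float        # required fraction of good events, e.g. 0.999
    window: timedelta    # evaluation window, e.g. 30 days

    def is_met(self, measured_sli: float) -> bool:
        # The SLO is met when the measured SLI reaches or exceeds the target.
        return measured_sli >= self.target


# The two examples from the text, expressed with the hypothetical class.
availability = SLO("availability", target=0.999, window=timedelta(days=30))
latency = SLO("latency-under-200ms", target=0.95, window=timedelta(days=7))

print(availability.is_met(0.9995))  # measured SLI above the 99.9% target
print(latency.is_met(0.94))         # measured SLI below the 95% target
```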

Error Budget

The allowed amount of unreliability, derived from the SLO. For a 99.9% availability SLO:

Error budget = 100% - 99.9% = 0.1% of total requests
Measured as time over a 30-day window, that is about 43.2 minutes of downtime (0.1% of 43,200 minutes)
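The arithmetic behind the 30-day downtime figure can be checked with a few lines of Python. The function names here are illustrative assumptions, not product APIs.

```python
from datetime import timedelta


def error_budget_fraction(slo_target: float) -> float:
    """Fraction of events (or time) allowed to be bad, e.g. 0.999 -> 0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime(slo_target: float, window: timedelta) -> timedelta:
    """Error budget expressed as time: the window scaled by the budget fraction."""
    return window * error_budget_fraction(slo_target)


budget = allowed_downtime(0.999, timedelta(days=30))
# 30 days is 43,200 minutes, and 0.1% of that is 43.2 minutes.
print(round(budget.total_seconds() / 60, 1))
```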

Planned Capabilities

SLO Configuration

Define SLOs using:

Count-based SLOs — Good events / total events (e.g., successful requests / total requests)
Time-based SLOs — Good time slices / total time slices (e.g., minutes where latency < threshold)
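The difference between the two SLO types comes down to what is being counted. The sketch below assumes request counts and per-minute latency samples are already available. The function names, and the choice to treat an empty input as fully compliant, are assumptions made for this example.

```python
from typing import Sequence


def count_based_sli(good_events: int, total_events: int) -> float:
    """Good events divided by total events."""
    if total_events == 0:
        return 1.0  # assumption: no traffic is treated as compliant
    return good_events / total_events


def time_based_sli(slice_latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Good time slices divided by total time slices.

    Each element is one slice's latency (for example, one value per minute);
    a slice is good when its latency is below the threshold.
    """
    if not slice_latencies_ms:
        return 1.0  # assumption: no data is treated as compliant
    good_slices = sum(1 for latency in slice_latencies_ms if latency < threshold_ms)
    return good_slices / len(slice_latencies_ms)


# 9,990 successful requests out of 10,000.
print(count_based_sli(9_990, 10_000))
# Four one-minute slices, three of them under the 200 ms threshold.
print(time_based_sli([120.0, 180.0, 250.0, 90.0], threshold_ms=200.0))
```

Note that the two approaches can disagree for the same service: a short outage during peak traffic costs a count-based SLO more than a time-based one, because more requests fail per bad minute.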

Error Budget Tracking

Remaining budget — Percentage of error budget remaining
Burn rate — Rate at which error budget is being consumed
Budget forecast — Projected budget exhaustion date based on current burn rate
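One common way to relate these three numbers, shown here as a sketch and not as the product's actual calculation: burn rate is the observed error rate divided by the error rate the SLO allows, so a burn rate of 1 uses up the budget exactly at the end of the window. The forecast then assumes the current burn rate stays constant. All function names are hypothetical.

```python
from datetime import timedelta
from typing import Optional


def remaining_budget(bad_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent).

    Assumes the counts cover the whole SLO window.
    """
    allowed_bad = total_events * (1.0 - slo_target)
    if allowed_bad == 0:
        return 1.0  # assumption: no traffic means nothing has been spent
    return 1.0 - bad_events / allowed_bad


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate relative to the error rate the SLO allows."""
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / (1.0 - slo_target)


def time_to_exhaustion(
    remaining_fraction: float, rate: float, window: timedelta
) -> Optional[timedelta]:
    """Projected time until the budget is gone if the burn rate stays constant."""
    if rate <= 0:
        return None  # nothing is being consumed, so no exhaustion date
    if remaining_fraction <= 0:
        return timedelta(0)  # the budget is already spent
    return window * (remaining_fraction / rate)


# Half the budget left, burning at twice the sustainable rate, 30-day window.
print(time_to_exhaustion(0.5, 2.0, timedelta(days=30)))
```

Adding the projected duration to the current time gives an exhaustion date of the kind the budget forecast describes.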

SLO Dashboard

Per-service SLO status at a glance
Error budget consumption trends
SLO compliance history (7-day, 30-day, 90-day windows)

Burn Rate Alerts

Automatic alerting when error budget consumption exceeds expected rates:

Fast burn — Immediate notification for rapid budget consumption
Slow burn — Warning when steady degradation will exhaust the budget within the window
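The fast and slow burn distinction could be expressed as two thresholds on burn rate, each evaluated over a different lookback period. The thresholds below are illustrative assumptions, not documented KubeSense defaults: 14.4 is the burn rate that spends 2% of a 30-day budget in one hour (0.02 × 720 hours), and any sustained rate above 1 exhausts a full budget before the window ends.

```python
from enum import Enum


class BurnAlert(Enum):
    NONE = "none"
    SLOW = "slow_burn"
    FAST = "fast_burn"


def classify_burn(
    short_window_rate: float,
    long_window_rate: float,
    fast_threshold: float = 14.4,  # assumed value: 2% of a 30-day budget per hour
    slow_threshold: float = 1.0,   # assumed value: budget runs out within the window
) -> BurnAlert:
    """Classify burn rates measured over a short and a long lookback period."""
    # Rapid consumption over the short lookback warrants immediate notification.
    if short_window_rate >= fast_threshold:
        return BurnAlert.FAST
    # Steady consumption above the sustainable rate warrants a warning.
    if long_window_rate > slow_threshold:
        return BurnAlert.SLOW
    return BurnAlert.NONE


print(classify_burn(short_window_rate=20.0, long_window_rate=3.0).value)
print(classify_burn(short_window_rate=2.0, long_window_rate=1.5).value)
print(classify_burn(short_window_rate=0.5, long_window_rate=0.8).value)
```

Checking the fast condition first means a severe incident is reported as a fast burn even when the long-window rate is also elevated.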

Getting Started

Service Levels will be accessible from the Service Levels item in the sidebar once the feature is generally available. Stay tuned for updates.
