Architecture Overview

KubeSense Architecture

Sensor Components

KubeSensor

KubeSensor is an eBPF (extended Berkeley Packet Filter) sensor. Its primary role is to collect trace data at the kernel level on every node in the cluster or on standalone host machines. Because it observes the kernel directly, KubeSensor provides low-overhead, real-time observability into system and application activity, capturing network events, system calls, and process-level behavior.

Kernel-Level Observability:
- Captures low-level events such as system calls, network activity, and process execution directly from the kernel.
- Provides insights into system behavior without requiring modifications to applications.
Trace Data Collection:
- Captures detailed trace data for distributed systems monitoring and debugging.
- Tracks end-to-end request flows across services and components.
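As an illustration of what tracking an end-to-end request flow involves, the sketch below rebuilds the service call order of one trace from its spans. The `Span` fields and the `call_order` helper are hypothetical names chosen for this example; they are not KubeSensor's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical span record; field names are assumptions for illustration.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str


def call_order(spans: list[Span], trace_id: str) -> list[str]:
    """Return service names for one trace in depth-first call order."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.trace_id != trace_id:
            continue
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)

    order = []
    stack = [root] if root is not None else []
    while stack:
        span = stack.pop()
        order.append(span.service)
        # Reverse so that children are visited in their original order.
        stack.extend(reversed(children[span.span_id]))
    return order
```

Given spans for a gateway calling an orders service that calls a database, `call_order` returns the three service names in that order, ignoring spans that belong to other traces.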

LogSensor

The LogSensor is a lightweight and highly efficient component designed to read log data from nodes or hosts in a Kubernetes cluster or standalone environments. It acts as a local agent on each node, collecting logs generated by applications, containers, and the operating system. Once collected, the logs are seamlessly pushed to the Log Aggregator for further processing, transformation, and storage.

LogSensor is optimized for low resource usage, ensuring that it can operate efficiently in environments with constrained system resources.

Log Detection - Automatically detects and monitors log files or sources based on predefined configurations or auto-discovery rules.
Source Agnostic - Supports multiple filesystems and custom log paths.
Log Forwarding - Forwards log data to the Log Aggregator for transformation and storage.
Fault Tolerant - Implements retry mechanisms so that logs are not lost during network or aggregator outages.
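The retry behaviour described above can be sketched as a forward-with-backoff loop. This is a minimal sketch, assuming exponential backoff and a caller-supplied `send` function; `forward_with_retry` and its parameters are hypothetical and do not describe LogSensor's real implementation or configuration.

```python
import time
from typing import Callable, Sequence


def forward_with_retry(
    batch: Sequence[str],
    send: Callable[[Sequence[str]], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send a batch, backing off exponentially between failures.

    Returns True once the batch is delivered, or False when every attempt
    failed, so the caller can keep the batch buffered instead of dropping it.
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break
            # Assumed policy: the delay doubles after each failed attempt.
            sleep(base_delay * 2 ** attempt)
    return False
```

Returning `False` rather than raising leaves the decision to the caller, which in a design like this would typically hold the batch on local disk or in memory until the aggregator is reachable again.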

Metrics Scraper

Metrics Scraper is responsible for collecting, processing, and ingesting infrastructure and application metrics into the Metrics Store. It ensures efficient and accurate collection of real-time metrics from various sources such as Kubernetes clusters, Docker containers, cloud providers (e.g., AWS, GCP), and application services. By scraping, transforming, and ingesting metrics, this component enables real-time monitoring, alerting, and performance analysis.

The scraper periodically collects metrics from sources including:

Kubernetes metrics APIs.
Docker container metrics.
Process metrics.
Cloud provider monitoring services (e.g., AWS CloudWatch, GCP Monitoring).
Application-level metrics (e.g., Prometheus exporters, custom endpoints).
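For application-level sources such as Prometheus exporters, scraping amounts to fetching a text payload and turning each sample line into a name, a label set, and a value. The sketch below parses a simplified subset of the Prometheus text exposition format. It assumes label values contain no escaped quotes or closing braces, ignores optional timestamps, and uses a hypothetical `parse_samples` helper that is not part of Metrics Scraper itself.

```python
import re

# metric_name{label="value",...} value   (labels are optional)
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

A production scraper would use a full parser for the exposition format; this sketch only shows the shape of the data that reaches the Metrics Store.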

Datastore

The Data Store is a specialized storage system designed to handle logs and trace data in a highly efficient and scalable manner. It serves as the primary repository for collected data, ensuring it is readily available for querying, analysis, and visualization. The Data Store is optimized for performance, supporting fast query execution, data compression, tiered storage management, and high availability to meet the demands of modern observability systems.

Log & Trace Storage:
- Supports structured and unstructured log formats.
- Stores detailed traces, including spans, metadata, and relationships between distributed system components.
- Supports correlation of logs and traces for end-to-end debugging.
Data Compression:
- Applies compression to log and trace data to reduce the storage footprint.
- Smart metadata management and correlation for efficient storage and optimal performance.
Query Optimization:
- Full-text search capabilities for logs.
- Aggregation queries for trace analysis.
- Range queries for filtering by time, severity, or other attributes.
Replication and Redundancy:
- Implements replication strategies to ensure data availability across nodes.
- Provides failover mechanisms to maintain uptime during outages.
Data Lifecycle Policies:
- Automatically migrates data to cold or archival storage based on predefined retention periods.
- Querying can be performed on single or multiple tiers.
- Ensures compliance with organizational data retention requirements.
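The lifecycle policies above can be pictured as an age-based mapping from records to storage tiers, plus a helper that works out which tiers a time-range query has to touch. The tier names and retention periods below are assumptions made up for this sketch; the real policy values are configuration-specific and are not stated in this document.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tiers, ordered from newest to oldest data.
TIERS = [
    ("hot", timedelta(days=7)),
    ("cold", timedelta(days=30)),
    ("archive", timedelta(days=365)),
]


def tier_for(record_time: datetime, now: datetime) -> Optional[str]:
    """Return the tier holding a record, or None once it is past retention."""
    age = now - record_time
    for name, max_age in TIERS:
        if age <= max_age:
            return name
    return None


def tiers_for_range(start: datetime, end: datetime, now: datetime) -> list[str]:
    """Return every tier that a query over [start, end] needs to read."""
    names = [name for name, _ in TIERS]
    newest = tier_for(end, now)
    oldest = tier_for(start, now)
    first = names.index(newest) if newest is not None else len(names)
    last = names.index(oldest) if oldest is not None else len(names) - 1
    return names[first:last + 1]
```

With these assumed thresholds, a query covering the last three days reads only the hot tier, while one reaching back 40 days spans all three.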

Metrics Store

The Metrics Store is a specialized storage solution designed to efficiently handle time-series metrics data collected from the system's infrastructure and applications. It is optimized for high performance, ensuring fast ingestion, low-latency querying, and long-term storage of large-scale metrics data. This component plays a critical role in enabling real-time monitoring, historical trend analysis, and performance diagnostics.

Efficient Time-Series Data Storage:
- Purpose-built for time-series data with specialized indexing and compression.
- Optimized for high write and read throughput to support large volumes of incoming metrics.
Distributed Querying Layer:
- Intelligently identifies appropriate tables or views for querying.
- Ensures optimal performance with minimal compute and memory resource usage.
- Reduces query latencies by over 99%.
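One common way for a querying layer to choose an appropriate table or view is to route long time ranges to pre-aggregated rollups. The sketch below picks the coarsest rollup that still yields a minimum number of data points. The table names, step sizes, and the `pick_table` function are hypothetical; the document does not specify how the Metrics Store makes this choice.

```python
# Hypothetical rollup tables, ordered by ascending step size in seconds.
ROLLUPS = [
    ("metrics_raw", 10),
    ("metrics_1m", 60),
    ("metrics_1h", 3600),
]


def pick_table(range_seconds: float, min_points: int = 100) -> str:
    """Pick the coarsest table that still returns at least min_points samples.

    Falls back to the raw table when the range is too short for any rollup.
    """
    chosen = ROLLUPS[0][0]
    for table, step in ROLLUPS:
        if range_seconds / step >= min_points:
            chosen = table
    return chosen
```

Under these assumptions a 15-minute query reads raw samples, a 6-hour query reads the one-minute rollup, and a 30-day query reads the hourly rollup, which keeps the number of rows scanned roughly bounded regardless of the range.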

Log Aggregator

The Log Aggregator is a crucial component in the KubeSense architecture responsible for managing log streams. It performs three main tasks: log parsing and transformation, log enrichment, and efficient handling of high-throughput log data. By preprocessing logs before they are stored in the Data Store, the Log Aggregator ensures that logs are structured, enriched, and optimized for querying and analysis.

Log enrichment with infrastructure metadata.
Applies transformations, such as filtering, reformatting, and normalization, to standardize logs.
Redaction of sensitive data.
Replacement and extraction of key information.
Blocking or removal of unwanted data.
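The processing steps listed above can be sketched as a single function applied to each log record: drop unwanted records, redact sensitive values, normalize fields, and attach infrastructure metadata. The record layout, the email-only redaction pattern, and the rule that drops debug logs are all assumptions for this example, not the Log Aggregator's actual rules.

```python
import re
from typing import Optional

# Simplified email pattern used here as the only redaction rule.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def process(record: dict, node_metadata: dict) -> Optional[dict]:
    """Return the transformed record, or None if it should be dropped."""
    level = record.get("level", "info").lower()  # normalization
    if level == "debug":                         # blocking unwanted data
        return None
    message = EMAIL.sub("[REDACTED]", record["message"])  # redaction
    # Enrichment: merge infrastructure metadata into the record.
    return {**record, "message": message, "level": level, **node_metadata}
```

For example, a `WARN` record containing an email address comes out with level `warn`, the address replaced by `[REDACTED]`, and the node's metadata fields attached.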

Kubecol

Kubecol is a core component of the KubeSense system that processes and enhances trace data. It enriches traces with infrastructure metadata, applies auto-tagging for better organization and querying, and supports infrastructure monitoring by integrating with APIs from Kubernetes, Docker, AWS, and other platforms. Kubecol also batches trace data efficiently for storage and querying, ensuring scalability and performance.

Trace enrichment with tags and metadata.
Redaction and filtering of trace data.
Infrastructure monitoring.
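Enrichment and batching can be illustrated together: each span is tagged with metadata looked up by pod name, and the enriched spans are grouped into fixed-size batches before being written. The span layout, the `pod_index` lookup table, and both helper functions are hypothetical names for this sketch and do not reflect Kubecol's internal interfaces.

```python
from typing import Iterable, Iterator


def enrich(span: dict, pod_index: dict[str, dict[str, str]]) -> dict:
    """Merge infrastructure tags for the span's pod into its existing tags."""
    infra_tags = pod_index.get(span.get("pod"), {})
    return {**span, "tags": {**span.get("tags", {}), **infra_tags}}


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of up to `size` items; the final batch may be smaller."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Writing in batches trades a small delay for far fewer storage round trips, which is the usual motivation for batching trace data.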

Kubecol DB

Kubecol DB is primarily used to store relational and mutable data, such as configuration settings, metadata for infrastructure components, and supplementary data that supports efficient querying in the Data Store.

Kubecol DB keeps configuration data consistent, highly available, and easy to update. It also improves query efficiency for logs and metrics by supplying infrastructure metadata.

Alert Engine

The Alert Engine allows users to configure and manage alerts based on predefined thresholds and conditions. It forwards alerts to configured contact points such as email, Slack, and PagerDuty.

Customizable Thresholds: Define specific conditions for metrics like CPU usage, memory consumption, or disk IO.
Notification Channels: Supports email, Slack, and webhook integrations for alert delivery.
Historical Alert Data: Provides insights into past alerts for proactive monitoring and adjustments.
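A threshold condition of the kind described above can be modeled as a metric name, a comparison operator, a threshold, and a number of consecutive breaching samples required before the alert fires. The `Rule` fields and `should_fire` function are assumptions for this sketch and are not the Alert Engine's actual rule format.

```python
import operator
from dataclasses import dataclass

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape; field names are illustrative only.
    metric: str
    op: str
    threshold: float
    for_samples: int = 1  # consecutive breaching samples required to fire


def should_fire(rule: Rule, samples: list[float]) -> bool:
    """Return True when the most recent samples all breach the threshold."""
    recent = samples[-rule.for_samples:]
    if len(recent) < rule.for_samples:
        return False  # not enough data to decide yet
    compare = OPS[rule.op]
    return all(compare(value, rule.threshold) for value in recent)
```

Requiring several consecutive breaches is a common way to avoid alerting on a single spike, for example a rule on CPU usage above 90 that only fires after three samples in a row exceed it.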

User Interface

The KubeSense User Interface (UI) is the central visualization and interaction layer of the KubeSense system. It enables users to monitor, manage, and interact with the data collected by the underlying infrastructure, offering a streamlined experience for observing logs, traces, metrics, and system-generated alerts.

API Service

The API Service serves as the central access point for all data processing and communication between the KubeSense User Interface (UI) and the backend infrastructure. It is responsible for receiving and processing requests from the UI, querying the necessary data from the Data Store and Metrics Store, and sending the appropriate responses back to the UI in a structured and timely manner.

The API Service acts as a middleware layer that abstracts the complexities of backend systems, ensuring a seamless and efficient interaction between the UI and the underlying data sources.

KubeSense AI ✨

KubeSense AI leverages advanced artificial intelligence and machine learning techniques to enhance observability, automate root cause analysis (RCA), and improve operational efficiency. It integrates seamlessly with other components to provide AI-powered insights and tools such as AI RCA, AI Alert Analytics, a DevOps Chatbot, and Crash RCA for handling system anomalies, reducing downtime, and streamlining DevOps workflows.

AI Root Cause Analysis (RCA):
- Automatically identifies the root cause of system issues, crashes or anomalies.
- Correlates logs, traces, and metrics to detect patterns and pinpoint failure sources.
- Reduces mean time to resolution (MTTR) by providing actionable insights.
DevOps Bot:
- Allows querying telemetry data in natural language.
- Simplifies complex DevOps tasks by providing instant answers for metrics and other queries.
