Ingesting Azure Blob Storage Logs with KubeSense

The KubeSense aggregator supports ingesting logs from Azure Blob Storage. Azure Event Grid emits an event whenever a new blob is created, and that event is routed either to an Event Hub (consumed over the Kafka protocol) or to an HTTP endpoint exposed by the aggregator. This enables near-real-time ingestion of archived logs, log rehydration, and migration from legacy systems.

Note: Event Grid events carry blob metadata (including the blob URL), not the blob contents. The aggregator consumes the notifications with a Kafka source (Event Hub) or an HTTP server source, then fetches and processes the referenced blobs.

Prerequisites

Before you begin, ensure you have:

  1. Azure Storage Account with blob containers containing log files
  2. Event Hub namespace with Kafka protocol enabled (for Event Hub method)
  3. Azure Event Grid enabled
  4. KubeSense aggregator deployed and accessible
  5. Appropriate Azure permissions to configure Event Grid, Event Hub, and storage

Supported Log Formats

The KubeSense aggregator can ingest logs from Blob Storage in various formats:

  • JSON logs - Structured JSON log files
  • Text logs - Plain text log files
  • Multi-line logs - Logs spanning multiple lines
  • Compressed logs - GZIP, BZIP2 compressed files
  • Azure Diagnostic logs - Exported Azure diagnostic logs
  • Log Analytics exports - Exported Log Analytics data
  • NSG Flow Logs - Azure Network Security Group flow log format

Architecture

Blob Storage logs can be ingested via two methods:

  1. Event Grid → Event Hub (Kafka) → KubeSense Aggregator (Real-time via Kafka)
  2. Event Grid → HTTP Endpoint → KubeSense Aggregator (Real-time via HTTP)

Method 1: Event Grid → Event Hub (Kafka)

Step 1: Create Event Hub with Kafka Protocol

# Create Event Hub namespace with Kafka enabled
# (on Standard tier and above the Kafka endpoint is enabled by default;
# older az CLI versions accept an explicit --enable-kafka flag)
az eventhubs namespace create \
  --resource-group kubesense-rg \
  --name blob-logs-namespace \
  --location eastus \
  --sku Standard \
  --enable-kafka true

# Create Event Hub
az eventhubs eventhub create \
  --resource-group kubesense-rg \
  --namespace-name blob-logs-namespace \
  --name blob-notifications \
  --message-retention 7 \
  --partition-count 4
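The Kafka source configuration in Step 3 needs the Event Hub connection string as its SASL password. Assuming the namespace's default RootManageSharedAccessKey authorization rule, it can be retrieved with:

```shell
# Retrieve the namespace-level connection string used as the SASL password
# (a dedicated Listen-only authorization rule is preferable in production)
az eventhubs namespace authorization-rule keys list \
  --resource-group kubesense-rg \
  --namespace-name blob-logs-namespace \
  --name RootManageSharedAccessKey \
  --query primaryConnectionString -o tsv
```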

Step 2: Create Event Grid Subscription

Create an Event Grid subscription to route blob creation events to Event Hub:

# Get storage account resource ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query id -o tsv)

# Get Event Hub resource ID
EVENT_HUB_ID=$(az eventhubs eventhub show \
  --resource-group kubesense-rg \
  --namespace-name blob-logs-namespace \
  --name blob-notifications \
  --query id -o tsv)

# Create Event Grid subscription
az eventgrid event-subscription create \
  --name blob-created-subscription \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --endpoint-type eventhub \
  --endpoint $EVENT_HUB_ID \
  --included-event-types Microsoft.Storage.BlobCreated
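To confirm the subscription was created successfully, you can query its provisioning state:

```shell
# Should print "Succeeded" once the subscription is active
az eventgrid event-subscription show \
  --name blob-created-subscription \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --query provisioningState -o tsv
```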

Step 3: Configure KubeSense Aggregator

Configure the aggregator to consume from Event Hub using Kafka source:

aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - blob-notifications
        group_id: blob-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
          verify_certificate: true
          verify_hostname: true
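Before wiring up the aggregator, you can verify the Kafka endpoint end to end with a generic Kafka client such as kcat (assuming kcat is installed locally; the SASL settings mirror the aggregator config above):

```shell
# Consume pending blob notifications directly from the Event Hub Kafka endpoint
CONN_STR="<EVENT_HUB_CONNECTION_STRING>"
kcat -C -e -o beginning \
  -b blob-logs-namespace.servicebus.windows.net:9093 \
  -t blob-notifications \
  -X security.protocol=SASL_SSL \
  -X sasl.mechanisms=PLAIN \
  -X sasl.username='$ConnectionString' \
  -X sasl.password="$CONN_STR"
```

Each message printed should be an Event Grid BlobCreated notification in JSON.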

Method 2: Event Grid → HTTP Endpoint

Step 1: Create Event Grid Subscription to HTTP Endpoint

Create an Event Grid subscription that sends blob events to an HTTPS endpoint. Event Grid performs a validation handshake when the subscription is created, so the endpoint must respond to the initial validation event before blob events are delivered:

# Create Event Grid subscription with webhook endpoint
az eventgrid event-subscription create \
  --name blob-http-subscription \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --endpoint-type webhook \
  --endpoint https://<KUBESENSE_AGGREGATOR_HOST>:30052/blob-events \
  --included-event-types Microsoft.Storage.BlobCreated

Step 2: Configure KubeSense Aggregator HTTP Server

Configure the aggregator to receive HTTP requests from Event Grid:

aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_http:
        type: http_server
        address: 0.0.0.0:30052
        decoding:
          codec: json
        framing:
          method: newline_delimited
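You can smoke-test the HTTP server source by posting a sample BlobCreated event in the Event Grid schema (the payload below is illustrative; the host, container, and blob name are placeholders):

```shell
# POST a minimal BlobCreated event to the aggregator's HTTP source
curl -X POST "http://<KUBESENSE_AGGREGATOR_HOST>:30052/blob-events" \
  -H "Content-Type: application/json" \
  -d '[{"eventType": "Microsoft.Storage.BlobCreated", "subject": "/blobServices/default/containers/logs/blobs/app.log.gz", "eventTime": "2024-01-01T00:00:00Z", "data": {"url": "https://mylogsstorage.blob.core.windows.net/logs/app.log.gz"}}]'
```

Event Grid delivers events as a JSON array in a single request body, which the json codec above parses as one payload.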

Storage Account Permissions

Create a storage account access key or use Managed Identity:

Using Access Key

# Get storage account connection string
az storage account show-connection-string \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query connectionString -o tsv

How It Works

  1. Blob Creation: When a blob is created in the storage account, Event Grid triggers an event
  2. Event Grid Routing: Event Grid routes the event to Event Hub or HTTP endpoint
  3. Aggregator Consumption: The KubeSense aggregator consumes from Event Hub (via Kafka) or receives HTTP requests
  4. File Processing: The aggregator can then fetch the blob from storage using the information in the event and process it
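Step 4 relies on the blob URL carried in the event payload: a BlobCreated event exposes it under data.url. A sketch of extracting it (the sample event and storage account name are illustrative):

```shell
# Extract the blob URL from a sample BlobCreated event with jq
EVENT='{"eventType":"Microsoft.Storage.BlobCreated","data":{"url":"https://mylogsstorage.blob.core.windows.net/logs/app.log.gz"}}'
BLOB_URL=$(echo "$EVENT" | jq -r '.data.url')
echo "$BLOB_URL"

# A manual check can then fetch the same blob the aggregator would process, e.g.:
# az storage blob download --blob-url "$BLOB_URL" --file app.log.gz --auth-mode login
```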

Storage Account Permissions for Event Grid and the Aggregator

Event Grid needs permission to read storage account events; this is configured automatically when the Event Grid subscription is created. The aggregator, however, needs read access to the storage account in order to fetch blob contents. When using a managed identity, grant it the Storage Blob Data Reader role:

# Get storage account resource ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query id -o tsv)

# Assign role to managed identity
az role assignment create \
  --assignee <MANAGED_IDENTITY_CLIENT_ID> \
  --role "Storage Blob Data Reader" \
  --scope $STORAGE_ACCOUNT_ID

Using Managed Identity (AKS)

You can also use a Managed Identity for Event Hub authentication. The Kafka source shown here still authenticates with a SASL connection string, so pair the managed identity with a connection string issued for that identity:

aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - blob-notifications
        group_id: blob-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING_WITH_MANAGED_IDENTITY>"
        tls:
          enabled: true

Monitoring and Verification

After configuring Blob Storage ingestion:

  1. Check Event Grid subscriptions: Verify Event Grid subscriptions are active and delivering events
  2. Monitor Event Hub: Check Event Hub metrics for message delivery (if using Kafka method)
  3. Check aggregator logs: Verify the aggregator is consuming from Event Hub or receiving HTTP requests
  4. Verify log ingestion: Check the KubeSense dashboard for logs from Blob Storage sources
  5. Monitor processing: Track the number of files processed and any errors
  6. Check Event Grid delivery: Monitor Event Grid subscription delivery metrics
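For step 2 above, incoming message counts can be pulled from Azure Monitor (a sketch; IncomingMessages is a standard Event Hubs namespace metric):

```shell
# Check recent incoming message counts on the Event Hub namespace
NAMESPACE_ID=$(az eventhubs namespace show \
  --resource-group kubesense-rg \
  --name blob-logs-namespace \
  --query id -o tsv)

az monitor metrics list \
  --resource $NAMESPACE_ID \
  --metric IncomingMessages \
  --interval PT5M \
  --output table
```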

Troubleshooting

Logs Not Appearing

  1. Verify Event Grid subscription: Check that Event Grid subscription is active and configured correctly
  2. Check Event Hub: Verify events are being published to Event Hub (if using Kafka method)
  3. Verify Kafka protocol: Ensure Event Hub namespace has Kafka protocol enabled
  4. Check bootstrap servers: Verify the bootstrap server address is correct
  5. Verify connection string: Ensure Event Hub connection string is correct and has read permissions
  6. Check topic name: Verify the topic name matches the Event Hub name
  7. Review aggregator logs: Check for Kafka connection or HTTP server errors
  8. Check Event Grid delivery: Monitor Event Grid subscription delivery failures

Performance Issues

  1. Use Event Grid filters: Configure Event Grid with subject filters to reduce event volume
  2. Scale Event Hub: Increase Event Hub throughput units if needed
  3. Enable compression: Use compressed files to reduce transfer time
  4. Batch processing: Process files in batches for better performance
  5. Monitor consumer lag: Check Kafka consumer group lag metrics

Best Practices

  • Use Event Grid filters: Configure Event Grid with subject filters to only process relevant blobs
  • Organize by container/prefix: Structure Blob Storage with logical containers and prefixes for easier Event Grid filtering
  • Compress logs: Use GZIP compression to reduce storage and transfer costs
  • Monitor costs: Track Event Grid, Event Hub, Blob Storage, and data transfer costs
  • Set retention: Configure Blob Storage lifecycle policies to manage log retention
  • Use separate Event Hubs: Create separate Event Hubs for different log types or environments
  • Monitor Event Grid: Set up alerts for Event Grid delivery failures
  • Use appropriate storage tiers: Choose the Hot, Cool, or Archive tier that matches each log's access pattern to optimize cost
  • Enable Kafka protocol: Confirm the Event Hub namespace tier supports the Kafka endpoint (Basic tier does not)
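The subject-filter recommendation above is applied when creating the subscription. For example, to receive events only for .log.gz blobs in a logs container (the container name and suffix are illustrative):

```shell
# Restrict events to one container and file suffix
az eventgrid event-subscription create \
  --name blob-created-filtered \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --endpoint-type eventhub \
  --endpoint $EVENT_HUB_ID \
  --included-event-types Microsoft.Storage.BlobCreated \
  --subject-begins-with /blobServices/default/containers/logs/ \
  --subject-ends-with .log.gz
```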

Cost Considerations

  • Blob Storage: Charged per GB stored (varies by storage tier)
  • Event Grid: Charged per million events (first 100K events/month free)
  • Event Hub: Charged per million events and storage (if using Kafka method)
  • Blob operations: Charged per API call when fetching blobs
  • Data transfer: Consider data transfer costs when fetching blobs from storage
  • Processing: Aggregator processing resources

Advanced Configuration

Multiple Storage Accounts

Configure multiple Blob Storage accounts with separate Event Hubs:

aggregator:
  customSources:
    enabled: true
    sources:
      production_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - prod-logs-notifications
        group_id: prod-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
      development_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - dev-logs-notifications
        group_id: dev-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true

Custom Log Parsing

Configure custom parsing for specific log formats using transforms (configured separately):

aggregator:
  customSources:
    enabled: true
    sources:
      custom_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - custom-logs-notifications
        group_id: custom-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true

Conclusion

Blob Storage log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history in Azure.