Blob Storage Logs
Ingesting Azure Blob Storage Logs with KubeSense
KubeSense aggregator supports ingesting logs from Azure Blob Storage by using Azure Event Grid to trigger events when new blobs are created. These events are routed to Event Hub (with Kafka protocol) or an HTTP endpoint, which the aggregator consumes. This enables real-time ingestion of archived logs, log rehydration, and migration from legacy systems.
Note: Blob Storage log ingestion uses Azure Event Grid to trigger on blob creation, then routes to Event Hub (Kafka) or HTTP endpoint. The KubeSense aggregator consumes from Event Hub using Kafka source or HTTP server source.
Prerequisites
Before you begin, ensure you have:
- Azure Storage Account with blob containers containing log files
- Event Hub namespace with Kafka protocol enabled (for Event Hub method)
- Azure Event Grid enabled
- KubeSense aggregator deployed and accessible
- Appropriate Azure permissions to configure Event Grid, Event Hub, and storage
Supported Log Formats
The KubeSense aggregator can ingest logs from Blob Storage in various formats:
- JSON logs - Structured JSON log files
- Text logs - Plain text log files
- Multi-line logs - Logs spanning multiple lines
- Compressed logs - GZIP, BZIP2 compressed files
- Azure Diagnostic logs - Exported Azure diagnostic logs
- Log Analytics exports - Exported Log Analytics data
- NSG Flow Logs - Azure Network Security Group flow log format
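Compressed and structured formats are typically recognized from the blob name. As an illustrative sketch only (not KubeSense's actual detection logic), an ingester might map file extensions to decoders like this:

```python
# Hypothetical helper: pick a decoder from a blob's file extension.
# KubeSense's real format detection may differ.
def choose_decoder(blob_name: str) -> str:
    if blob_name.endswith(".gz"):
        return "gzip"
    if blob_name.endswith(".bz2"):
        return "bzip2"
    if blob_name.endswith(".json"):
        return "json"
    return "text"  # fall back to plain text

print(choose_decoder("app.log.gz"))  # gzip
print(choose_decoder("nsg-flow.json"))  # json
```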
Architecture
Blob Storage logs can be ingested via two methods:
- Event Grid → Event Hub (Kafka) → KubeSense Aggregator (Real-time via Kafka)
- Event Grid → HTTP Endpoint → KubeSense Aggregator (Real-time via HTTP)
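In both methods, what the aggregator receives is an Event Grid BlobCreated event, not the blob contents themselves. An abbreviated example of such an event (field values are illustrative):

```json
{
  "topic": "/subscriptions/.../resourceGroups/kubesense-rg/providers/Microsoft.Storage/storageAccounts/mylogsstorage",
  "subject": "/blobServices/default/containers/logs/blobs/app.log.gz",
  "eventType": "Microsoft.Storage.BlobCreated",
  "eventTime": "2024-01-01T12:00:00Z",
  "data": {
    "api": "PutBlob",
    "contentType": "application/gzip",
    "contentLength": 524288,
    "blobType": "BlockBlob",
    "url": "https://mylogsstorage.blob.core.windows.net/logs/app.log.gz"
  }
}
```

The `data.url` field tells the aggregator which blob to fetch and process.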
Method 1: Event Grid → Event Hub (Kafka) - Recommended
Step 1: Create Event Hub with Kafka Protocol
# Create Event Hub namespace with Kafka enabled
az eventhubs namespace create \
  --resource-group kubesense-rg \
  --name blob-logs-namespace \
  --location eastus \
  --sku Standard \
  --enable-kafka true

# Create Event Hub
az eventhubs eventhub create \
  --resource-group kubesense-rg \
  --namespace-name blob-logs-namespace \
  --name blob-notifications \
  --message-retention 7 \
  --partition-count 4
Step 2: Create Event Grid Subscription
Create an Event Grid subscription to route blob creation events to Event Hub:
# Get storage account resource ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query id -o tsv)

# Get Event Hub resource ID
EVENT_HUB_ID=$(az eventhubs eventhub show \
  --resource-group kubesense-rg \
  --namespace-name blob-logs-namespace \
  --name blob-notifications \
  --query id -o tsv)

# Create Event Grid subscription
az eventgrid event-subscription create \
  --name blob-created-subscription \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --endpoint-type eventhub \
  --endpoint $EVENT_HUB_ID \
  --included-event-types Microsoft.Storage.BlobCreated
Step 3: Configure KubeSense Aggregator
Configure the aggregator to consume from Event Hub using Kafka source:
aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - blob-notifications
        group_id: blob-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
          verify_certificate: true
          verify_hostname: true
Method 2: Event Grid → HTTP Endpoint
Step 1: Create Event Grid Subscription to HTTP Endpoint
Create an Event Grid subscription that sends blob events to an HTTP endpoint:
# Create Event Grid subscription with webhook endpoint
az eventgrid event-subscription create \
  --name blob-http-subscription \
  --source-resource-id $STORAGE_ACCOUNT_ID \
  --endpoint-type webhook \
  --endpoint https://<KUBESENSE_AGGREGATOR_HOST>:30052/blob-events \
  --included-event-types Microsoft.Storage.BlobCreated
Step 2: Configure KubeSense Aggregator HTTP Server
Configure the aggregator to receive HTTP requests from Event Grid:
aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_http:
        type: http_server
        address: 0.0.0.0:30052
        decoding:
          codec: json
        framing:
          method: newline_delimited
Storage Account Permissions
Create a storage account access key or use Managed Identity:
Using Access Key
# Get storage account connection string
az storage account show-connection-string \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query connectionString -o tsv
How It Works
- Blob Creation: When a blob is created in the storage account, Event Grid triggers an event
- Event Grid Routing: Event Grid routes the event to Event Hub or HTTP endpoint
- Aggregator Consumption: The KubeSense aggregator consumes from Event Hub (via Kafka) or receives HTTP requests
- File Processing: The aggregator can then fetch the blob from storage using the information in the event and process it
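Step 4 is driven by the `data.url` field of the BlobCreated event. A minimal sketch of splitting that URL into storage account, container, and blob name (the aggregator's own implementation may differ):

```python
from urllib.parse import urlparse

def blob_location(event: dict) -> tuple:
    """Split an Event Grid BlobCreated event's data.url into
    (storage_account, container, blob_name)."""
    url = urlparse(event["data"]["url"])
    account = url.netloc.split(".")[0]  # e.g. mylogsstorage.blob.core.windows.net
    container, _, blob = url.path.lstrip("/").partition("/")
    return account, container, blob

event = {"data": {"url": "https://mylogsstorage.blob.core.windows.net/logs/app.log.gz"}}
print(blob_location(event))  # ('mylogsstorage', 'logs', 'app.log.gz')
```

With those three pieces, the blob can be fetched via the Blob Storage SDK or REST API using the access key or managed identity configured below.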
Storage Account Permissions for the Aggregator
Event Grid itself is granted access to storage events automatically when the subscription is created. The aggregator, however, needs read access to the blobs it fetches. When using a managed identity, assign it the Storage Blob Data Reader role on the storage account:
# Get storage account resource ID
STORAGE_ACCOUNT_ID=$(az storage account show \
  --resource-group kubesense-rg \
  --name mylogsstorage \
  --query id -o tsv)

# Assign role to managed identity
az role assignment create \
  --assignee <MANAGED_IDENTITY_CLIENT_ID> \
  --role "Storage Blob Data Reader" \
  --scope $STORAGE_ACCOUNT_ID
Using Managed Identity (AKS)
You can use Managed Identity for Event Hub authentication. The Kafka source still requires connection strings, but you can use managed identity-based connection strings:
aggregator:
  customSources:
    enabled: true
    sources:
      blob_storage_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - blob-notifications
        group_id: blob-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING_WITH_MANAGED_IDENTITY>"
        tls:
          enabled: true
Monitoring and Verification
After configuring Blob Storage ingestion:
- Check Event Grid subscriptions: Verify Event Grid subscriptions are active and delivering events
- Monitor Event Hub: Check Event Hub metrics for message delivery (if using Kafka method)
- Check aggregator logs: Verify the aggregator is consuming from Event Hub or receiving HTTP requests
- Verify log ingestion: Check the KubeSense dashboard for logs from Blob Storage sources
- Monitor processing: Track the number of files processed and any errors
- Check Event Grid delivery: Monitor Event Grid subscription delivery metrics
Troubleshooting
Logs Not Appearing
- Verify Event Grid subscription: Check that Event Grid subscription is active and configured correctly
- Check Event Hub: Verify events are being published to Event Hub (if using Kafka method)
- Verify Kafka protocol: Ensure Event Hub namespace has Kafka protocol enabled
- Check bootstrap servers: Verify the bootstrap server address is correct
- Verify connection string: Ensure Event Hub connection string is correct and has read permissions
- Check topic name: Verify the topic name matches the Event Hub name
- Review aggregator logs: Check for Kafka connection or HTTP server errors
- Check Event Grid delivery: Monitor Event Grid subscription delivery failures
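One HTTP-specific pitfall: an Event Grid webhook subscription delivers no events until it completes a validation handshake. Event Grid first posts a Microsoft.EventGrid.SubscriptionValidationEvent containing a validationCode, and the endpoint must echo it back as validationResponse. If the aggregator's http_server source does not handle this automatically, a small handler in front of it must; the core logic is a sketch like:

```python
def validation_response(events: list):
    """Return the response body Event Grid expects for a subscription
    validation event, or None for an ordinary event batch."""
    for ev in events:
        if ev.get("eventType") == "Microsoft.EventGrid.SubscriptionValidationEvent":
            return {"validationResponse": ev["data"]["validationCode"]}
    return None

handshake = [{
    "eventType": "Microsoft.EventGrid.SubscriptionValidationEvent",
    "data": {"validationCode": "1234-abcd"},
}]
print(validation_response(handshake))  # {'validationResponse': '1234-abcd'}
```

A stuck-at-pending subscription with zero deliveries is the classic symptom of a missing handshake response.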
Performance Issues
- Use Event Grid filters: Configure Event Grid with subject filters to reduce event volume
- Scale Event Hub: Increase Event Hub throughput units if needed
- Enable compression: Use compressed files to reduce transfer time
- Batch processing: Process files in batches for better performance
- Monitor consumer lag: Check Kafka consumer group lag metrics
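Consumer lag on the Kafka side is just the partition's end offset minus the committed offset. A hypothetical helper for interpreting the two offset maps returned by your Kafka tooling:

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition: latest offset minus committed offset.
    Partitions with no committed offset count as lagging from 0."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

# Partition 0 is nearly caught up; partition 1 has never committed.
print(consumer_lag({0: 100, 1: 50}, {0: 90}))  # {0: 10, 1: 50}
```

Sustained growth in these numbers means the aggregator cannot keep up with blob-creation events; consider more throughput units or partitions.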
Best Practices
- Use Event Grid filters: Configure Event Grid with subject filters to only process relevant blobs
- Organize by container/prefix: Structure Blob Storage with logical containers and prefixes for easier Event Grid filtering
- Compress logs: Use GZIP compression to reduce storage and transfer costs
- Monitor costs: Track Event Grid, Event Hub, Blob Storage, and data transfer costs
- Set retention: Configure Blob Storage lifecycle policies to manage log retention
- Use separate Event Hubs: Create separate Event Hubs for different log types or environments
- Monitor Event Grid: Set up alerts for Event Grid delivery failures
- Use appropriate storage tier: Use appropriate storage tier (Hot, Cool, Archive) for cost optimization
- Enable Kafka protocol: Ensure Event Hub namespace has Kafka protocol enabled for better performance
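Event Grid subject filters (the `--subject-begins-with` / `--subject-ends-with` options on `az eventgrid event-subscription create`) match against blob subjects of the form `/blobServices/default/containers/<container>/blobs/<blob-path>`. The semantics are a simple prefix/suffix test, sketched here (note that Event Grid's actual matching is case-insensitive by default, which this sketch ignores):

```python
def subject_matches(subject: str, begins_with: str = "", ends_with: str = "") -> bool:
    """Approximate Event Grid's subject prefix/suffix filtering."""
    return subject.startswith(begins_with) and subject.endswith(ends_with)

subject = "/blobServices/default/containers/logs/blobs/prod/app.log.gz"
# Only .gz blobs in the "logs" container:
print(subject_matches(subject,
                      begins_with="/blobServices/default/containers/logs/",
                      ends_with=".gz"))  # True
```

Filtering at the subscription keeps irrelevant blobs out of Event Hub entirely, which is cheaper than discarding them in the aggregator.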
Cost Considerations
- Blob Storage: Charged per GB stored (varies by storage tier)
- Event Grid: Charged per million events (first 100K events/month free)
- Event Hub: Charged per million events and storage (if using Kafka method)
- Blob operations: Charged per API call when fetching blobs
- Data transfer: Consider data transfer costs when fetching blobs from storage
- Processing: Aggregator processing resources
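As a rough model, the Event Grid portion of the bill scales with events beyond the monthly free grant. The sketch below uses an illustrative per-million rate, not current pricing; check the Azure pricing pages before relying on it:

```python
def event_grid_cost(events_per_month: int,
                    price_per_million: float = 0.60,  # illustrative rate, not current pricing
                    free_events: int = 100_000) -> float:
    """Approximate monthly Event Grid cost in USD."""
    billable = max(events_per_month - free_events, 0)
    return billable / 1_000_000 * price_per_million

print(round(event_grid_cost(1_100_000), 2))  # 0.6
```

The same shape of calculation applies to Event Hub ingress and blob read operations, each with its own rate.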
Advanced Configuration
Multiple Storage Accounts
Configure multiple Blob Storage accounts with separate Event Hubs:
aggregator:
  customSources:
    enabled: true
    sources:
      production_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - prod-logs-notifications
        group_id: prod-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
      development_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - dev-logs-notifications
        group_id: dev-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
Custom Log Parsing
Configure custom parsing for specific log formats using transforms (configured separately):
aggregator:
  customSources:
    enabled: true
    sources:
      custom_logs:
        type: kafka
        bootstrap_servers: "<NAMESPACE>.servicebus.windows.net:9093"
        topics:
          - custom-logs-notifications
        group_id: custom-consumer
        auth:
          sasl:
            mechanism: PLAIN
            username: "$ConnectionString"
            password: "<EVENT_HUB_CONNECTION_STRING>"
        tls:
          enabled: true
Conclusion
Blob Storage log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history in Azure.