Ingesting GCS Log Archives with KubeSense
The KubeSense aggregator supports ingesting logs from Google Cloud Storage (GCS) buckets via Pub/Sub notifications. When new files are added to your GCS bucket, GCS publishes a notification to a Pub/Sub topic, which the aggregator consumes. This enables near-real-time ingestion of archived logs, log rehydration, and migration from legacy systems.
Note: GCS log ingestion uses GCS bucket notifications to Pub/Sub, then the KubeSense aggregator consumes from Pub/Sub. This provides real-time ingestion as files are added to the bucket.
Prerequisites
Before you begin, ensure you have:
- GCS bucket containing log files
- Pub/Sub API enabled in your GCP project
- GCP service account with permissions to read GCS and consume Pub/Sub
- KubeSense aggregator deployed and accessible
- Appropriate permissions to configure the aggregator
Supported Log Formats
The KubeSense aggregator can ingest logs from GCS in various formats:
- JSON logs - Structured JSON log files
- Text logs - Plain text log files
- Multi-line logs - Logs spanning multiple lines
- Compressed logs - GZIP- or BZIP2-compressed files
- Cloud Logging export format - Exported Cloud Logging logs
- VPC Flow Logs - GCP VPC Flow Log format
- Load balancer logs - GCP load balancer access logs
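As a quick sketch of the compressed-log path, a plain-text log can be GZIP-compressed and verified locally before upload (the log contents and bucket path below are illustrative, not part of any KubeSense requirement):

```shell
# Compress a plain-text log before uploading; GZIP is one of the supported formats
printf 'line one\nline two\n' > app.log
gzip -f app.log        # replaces app.log with app.log.gz
gzip -t app.log.gz     # exit status 0 means the archive is intact
# Upload (requires gsutil credentials):
# gsutil cp app.log.gz gs://my-logs-bucket/logs/2024/
```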
Step 1: Create Pub/Sub Topic and Subscription
Create a Pub/Sub topic to receive GCS bucket notifications:
# Create Pub/Sub topic
gcloud pubsub topics create gcs-logs-notifications \
--project=YOUR_PROJECT_ID
# Create subscription
gcloud pubsub subscriptions create gcs-logs-subscription \
--topic=gcs-logs-notifications \
--project=YOUR_PROJECT_ID
Step 2: Configure GCS Bucket Notifications
Configure your GCS bucket to send notifications to Pub/Sub when files are added:
# Enable notifications for the bucket
gsutil notification create -t gcs-logs-notifications \
-f json \
-e OBJECT_FINALIZE \
gs://my-logs-bucket
# For specific prefix (optional)
gsutil notification create -t gcs-logs-notifications \
-f json \
-e OBJECT_FINALIZE \
-p logs/2024/ \
gs://my-logs-bucket
Notification Event Types
- OBJECT_FINALIZE - Triggered when a new object is created or an existing object is overwritten (recommended)
- OBJECT_METADATA_UPDATE - Triggered when object metadata is updated
- OBJECT_DELETE - Triggered when an object is permanently deleted
- OBJECT_ARCHIVE - Triggered when a live object version becomes noncurrent (versioned buckets only)
Step 3: Configure KubeSense Aggregator
Configure the aggregator to consume from the Pub/Sub subscription:
aggregator:
  customSources:
    enabled: true
    sources:
      gcs_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/gcs-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Service Account Permissions
Create a service account with the following IAM roles:
# Create service account
gcloud iam service-accounts create kubesense-gcs-reader \
--display-name="KubeSense GCS Reader" \
--project=YOUR_PROJECT_ID
# Grant Storage Object Viewer role (to read GCS objects)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectViewer"
# Grant Pub/Sub Subscriber role (to consume from Pub/Sub)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/pubsub.subscriber"
# Grant Pub/Sub Viewer role (to list subscriptions)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/pubsub.viewer"
# Create and download key
gcloud iam service-accounts keys create kubesense-gcs-key.json \
--iam-account=kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--project=YOUR_PROJECT_ID
Grant Permissions for GCS Notifications
The bucket needs permission to publish to Pub/Sub:
# Grant Pub/Sub Publisher role to the Cloud Storage service account
gcloud pubsub topics add-iam-policy-binding gcs-logs-notifications \
--member="serviceAccount:service-$(gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)')@gs-project-accounts.iam.gserviceaccount.com" \
--role="roles/pubsub.publisher" \
--project=YOUR_PROJECT_ID
Use Cases
Historical Log Import
Bucket notifications fire only for objects added after the notification is created, so the aggregator processes historical files as they arrive in the bucket. For files already present, see Backfill Jobs below.
aggregator:
  customSources:
    enabled: true
    sources:
      historical_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/historical-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Set up the bucket notification with a prefix filter:
gsutil notification create -t historical-logs-notifications \
-f json \
-e OBJECT_FINALIZE \
-p archive/2023/ \
gs://historical-logs
VPC Flow Logs from GCS
Ingest VPC Flow Logs stored in GCS:
aggregator:
  customSources:
    enabled: true
    sources:
      vpc_flow_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/vpc-flow-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Set up bucket notification:
gsutil notification create -t vpc-flow-logs-notifications \
-f json \
-e OBJECT_FINALIZE \
-p vpc-flow-logs/ \
gs://vpc-flow-logs-bucket
Cloud Logging Exports
Import exported Cloud Logging logs:
aggregator:
  customSources:
    enabled: true
    sources:
      cloud_logging_exports:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/cloud-logging-exports-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Set up bucket notification:
gsutil notification create -t cloud-logging-exports-notifications \
-f json \
-e OBJECT_FINALIZE \
-p exports/ \
gs://cloud-logging-exports
Backfill Jobs
For backfill operations, you can trigger notifications for existing files or copy files to trigger new notifications:
aggregator:
  customSources:
    enabled: true
    sources:
      backfill_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/backfill-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
To trigger notifications for existing files, copy or rewrite them with gsutil:
# Option 1: Copy files to a new prefix; the copies fire OBJECT_FINALIZE.
# Note: if the destination falls under a watched prefix, the copies themselves
# generate notifications, so avoid running this in a loop.
gsutil -m cp gs://backfill-bucket/logs/*.log gs://backfill-bucket/logs/processed/
# Option 2: Rewrite each file in place; a rewrite creates a new object
# generation, which also fires OBJECT_FINALIZE. The temporary object fires
# an extra notification of its own.
gsutil ls gs://backfill-bucket/logs/*.log | while read -r file; do
  gsutil cp "$file" "${file}.tmp" && gsutil mv "${file}.tmp" "$file"
done
Using Workload Identity (GKE)
If running on GKE, you can use Workload Identity instead of service account keys:
Step 1: Enable Workload Identity
# Enable Workload Identity on GKE cluster
gcloud container clusters update CLUSTER_NAME \
--workload-pool=YOUR_PROJECT_ID.svc.id.goog \
--zone=ZONE
Step 2: Create Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubesense-aggregator
  namespace: kubesense
  annotations:
    iam.gke.io/gcp-service-account: kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com
Step 3: Configure Aggregator
aggregator:
  customSources:
    enabled: true
    sources:
      gcs_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/gcs-logs-subscription
        # No credentials_path needed: Workload Identity supplies credentials
        # via the annotated Kubernetes service account
How It Works
- File Upload: When a file is uploaded to the GCS bucket, GCS sends a notification to the Pub/Sub topic
- Pub/Sub Message: The notification contains metadata about the object (bucket name, object name, size, etc.)
- Aggregator Consumption: The KubeSense aggregator consumes messages from the Pub/Sub subscription
- File Processing: The aggregator can then fetch the file from GCS using the information in the Pub/Sub message and process it
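The flow above can be sketched end to end with standard tools. The payload below is an abbreviated, hand-written example of the JSON carried in the Pub/Sub message data for an OBJECT_FINALIZE event, not live output, and python3 is used only as a JSON parser:

```shell
# Abbreviated OBJECT_FINALIZE payload as delivered in the Pub/Sub message data
PAYLOAD='{"kind":"storage#object","bucket":"my-logs-bucket","name":"logs/2024/app.log.gz","size":"1048576"}'

# Extract the coordinates the aggregator needs to fetch the file from GCS
BUCKET=$(printf '%s' "$PAYLOAD" | python3 -c 'import json,sys; print(json.load(sys.stdin)["bucket"])')
OBJECT=$(printf '%s' "$PAYLOAD" | python3 -c 'import json,sys; print(json.load(sys.stdin)["name"])')

echo "gs://$BUCKET/$OBJECT"
# gs://my-logs-bucket/logs/2024/app.log.gz
```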
Monitoring and Verification
After configuring GCS ingestion:
- Check bucket notifications: Verify notifications are configured:
gsutil notification list gs://my-logs-bucket
- Monitor Pub/Sub: Check Pub/Sub metrics for message delivery
- Check aggregator logs: Verify the aggregator is consuming from Pub/Sub successfully
- Verify log ingestion: Check the KubeSense dashboard for logs from GCS sources
- Monitor processing: Track the number of files processed and any errors
Troubleshooting
Logs Not Appearing
- Verify bucket notifications: Check that notifications are configured:
gsutil notification list gs://my-logs-bucket
- Check Pub/Sub topic: Verify messages are being published to the Pub/Sub topic
- Verify subscription: Ensure the subscription exists and is active
- Check service account permissions: Ensure the service account has Pub/Sub Subscriber and Storage Object Viewer roles
- Verify credentials: Ensure service account key is valid and accessible
- Review aggregator logs: Check for Pub/Sub consumption errors or parsing issues
- Check notification permissions: Verify GCS service account has Pub/Sub Publisher role on the topic
Performance Issues
- Use prefixes: Configure notifications with specific prefixes to filter relevant files
- Enable compression: Use compressed files to reduce transfer time
- Batch processing: Process files in batches for better performance
- Scale consumers: Standard Pub/Sub scales automatically; add aggregator replicas if the subscription backlog grows
- Monitor message backlog: Check Pub/Sub subscription metrics for message accumulation
Best Practices
- Use Workload Identity: Prefer Workload Identity over service account keys when running on GKE
- Organize by prefix: Structure GCS buckets with logical prefixes for easier filtering
- Compress logs: Use GZIP compression to reduce storage and transfer costs
- Monitor costs: Track GCS API calls and data transfer costs
- Set retention: Configure GCS lifecycle policies to manage log retention
- Use object metadata: Use object metadata for better organization and filtering
- Enable versioning: Enable object versioning for important logs
- Use nearline/coldline: Use appropriate storage classes for cost optimization
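The retention and storage-class points above can be implemented with a GCS lifecycle policy. A minimal sketch, with an illustrative bucket name and day thresholds:

```shell
# Lifecycle policy: move log objects to Coldline after 90 days, delete after 365
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
EOF

# Apply to the bucket (requires gsutil credentials):
# gsutil lifecycle set lifecycle.json gs://my-logs-bucket
```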
Cost Considerations
- GCS storage: Charged per GB stored (varies by storage class)
- GCS operations: Charged per API call (Get operations when fetching files)
- Pub/Sub: Billed by message throughput; both the notification publishes and the aggregator's deliveries count toward billable volume
- Data transfer: Consider data transfer costs when fetching files from GCS
- Processing: Aggregator processing resources
Advanced Configuration
Multiple Buckets
Configure multiple GCS buckets with separate Pub/Sub topics:
aggregator:
  customSources:
    enabled: true
    sources:
      production_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/prod-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
      development_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/dev-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
      archive_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/archive-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Set up notifications for each bucket:
# Production logs
gsutil notification create -t prod-logs-notifications \
-f json -e OBJECT_FINALIZE -p prod/ gs://production-logs
# Development logs
gsutil notification create -t dev-logs-notifications \
-f json -e OBJECT_FINALIZE -p dev/ gs://development-logs
# Archive logs
gsutil notification create -t archive-logs-notifications \
-f json -e OBJECT_FINALIZE -p archive/ gs://archive-logsCustom Log Parsing
Configure custom parsing for specific log formats (parsing is handled via transforms, configured separately):
aggregator:
  customSources:
    enabled: true
    sources:
      custom_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/custom-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
Conclusion
GCS log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history in GCP.