Ingesting GCS Log Archives with KubeSense

The KubeSense aggregator can ingest logs from Google Cloud Storage (GCS) buckets by way of Pub/Sub notifications. When new files land in your GCS bucket, GCS publishes a notification to a Pub/Sub topic, and the aggregator consumes those notifications. This enables near-real-time ingestion of archived logs, log rehydration, and migration from legacy systems.

Note: The aggregator does not read the bucket directly on a schedule. GCS publishes bucket notifications to a Pub/Sub topic, and the KubeSense aggregator consumes from that topic, ingesting each file as it is added to the bucket.

Prerequisites

Before you begin, ensure you have:

  1. A GCS bucket containing log files
  2. The Pub/Sub API enabled in your GCP project
  3. A GCP service account with permission to read from GCS and consume from Pub/Sub
  4. The KubeSense aggregator deployed and accessible
  5. Appropriate permissions to configure the aggregator

Supported Log Formats

The KubeSense aggregator can ingest logs from GCS in various formats:

  • JSON logs - Structured JSON log files
  • Text logs - Plain text log files
  • Multi-line logs - Logs spanning multiple lines
  • Compressed logs - GZIP- and BZIP2-compressed files
  • Cloud Logging export format - Exported Cloud Logging logs
  • VPC Flow Logs - GCP VPC Flow Log format
  • Load balancer logs - GCP load balancer access logs
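
Decompression and format detection happen inside the aggregator. Purely as an illustration of how such sniffing can work, here is a minimal Python sketch; the function name and heuristics are this document's own, not KubeSense internals:

```python
import gzip
import json

def sniff_log_format(data: bytes) -> str:
    """Guess the format of a raw log object by magic bytes and content.

    Illustrative only; KubeSense's actual detection logic is internal.
    """
    if data[:2] == b"\x1f\x8b":          # GZIP magic number
        data = gzip.decompress(data)
    elif data[:3] == b"BZh":             # BZIP2 magic number
        import bz2
        data = bz2.decompress(data)
    first_line = data.splitlines()[0] if data else b""
    try:
        json.loads(first_line)
        return "json"                    # one JSON document per line
    except ValueError:
        return "text"                    # fall back to plain text

# Example: a gzipped JSON-lines payload is detected as "json"
payload = gzip.compress(b'{"severity": "INFO", "msg": "hello"}\n')
print(sniff_log_format(payload))  # json
```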

Step 1: Create Pub/Sub Topic and Subscription

Create a Pub/Sub topic to receive GCS bucket notifications:

# Create Pub/Sub topic
gcloud pubsub topics create gcs-logs-notifications \
  --project=YOUR_PROJECT_ID

# Create subscription
gcloud pubsub subscriptions create gcs-logs-subscription \
  --topic=gcs-logs-notifications \
  --project=YOUR_PROJECT_ID

Step 2: Configure GCS Bucket Notifications

Configure your GCS bucket to send notifications to Pub/Sub when files are added:

# Enable notifications for the bucket
gsutil notification create -t gcs-logs-notifications \
  -f json \
  -e OBJECT_FINALIZE \
  gs://my-logs-bucket

# For specific prefix (optional)
gsutil notification create -t gcs-logs-notifications \
  -f json \
  -e OBJECT_FINALIZE \
  -p logs/2024/ \
  gs://my-logs-bucket

Notification Event Types

  • OBJECT_FINALIZE - Triggered when a new object is created or overwritten (recommended)
  • OBJECT_METADATA_UPDATE - Triggered when an object's metadata changes
  • OBJECT_DELETE - Triggered when an object is permanently deleted
  • OBJECT_ARCHIVE - Triggered when a live object version becomes noncurrent (versioned buckets only)
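
Only OBJECT_FINALIZE events matter for ingestion; a consumer typically acknowledges the others without fetching anything. As a hedged sketch (the helper name is this document's, but the eventType attribute is the one GCS sets on its notification messages):

```python
def should_ingest(message_attributes: dict) -> bool:
    """Return True only for new-object notifications.

    GCS records the event type in the message's "eventType" attribute.
    Other events are acknowledged but not fetched from the bucket.
    """
    return message_attributes.get("eventType") == "OBJECT_FINALIZE"

print(should_ingest({"eventType": "OBJECT_FINALIZE"}))  # True
print(should_ingest({"eventType": "OBJECT_DELETE"}))    # False
```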

Step 3: Configure KubeSense Aggregator

Configure the aggregator to consume from the Pub/Sub subscription:

aggregator:
  customSources:
    enabled: true
    sources:
      gcs_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/gcs-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Service Account Permissions

Create a service account with the following IAM roles:

# Create service account
gcloud iam service-accounts create kubesense-gcs-reader \
  --display-name="KubeSense GCS Reader" \
  --project=YOUR_PROJECT_ID

# Grant Storage Object Viewer role (to read GCS objects)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Grant Pub/Sub Subscriber role (to consume from Pub/Sub)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/pubsub.subscriber"

# Grant Pub/Sub Viewer role (to list subscriptions)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/pubsub.viewer"

# Create and download key
gcloud iam service-accounts keys create kubesense-gcs-key.json \
  --iam-account=kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID

Grant Permissions for GCS Notifications

The bucket needs permission to publish to Pub/Sub:

# Grant Pub/Sub Publisher role to the Cloud Storage service account
gcloud pubsub topics add-iam-policy-binding gcs-logs-notifications \
  --member="serviceAccount:service-$(gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)')@gs-project-accounts.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher" \
  --project=YOUR_PROJECT_ID

Use Cases

Historical Log Import

For historical logs, configure notifications and the aggregator will process files as they're added:

aggregator:
  customSources:
    enabled: true
    sources:
      historical_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/historical-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Set up the bucket notification with a prefix filter:

gsutil notification create -t historical-logs-notifications \
  -f json \
  -e OBJECT_FINALIZE \
  -p archive/2023/ \
  gs://historical-logs

VPC Flow Logs from GCS

Ingest VPC Flow Logs stored in GCS:

aggregator:
  customSources:
    enabled: true
    sources:
      vpc_flow_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/vpc-flow-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Set up bucket notification:

gsutil notification create -t vpc-flow-logs-notifications \
  -f json \
  -e OBJECT_FINALIZE \
  -p vpc-flow-logs/ \
  gs://vpc-flow-logs-bucket

Cloud Logging Exports

Import exported Cloud Logging logs:

aggregator:
  customSources:
    enabled: true
    sources:
      cloud_logging_exports:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/cloud-logging-exports-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Set up bucket notification:

gsutil notification create -t cloud-logging-exports-notifications \
  -f json \
  -e OBJECT_FINALIZE \
  -p exports/ \
  gs://cloud-logging-exports

Backfill Jobs

For backfill operations, note that files already in the bucket do not emit notifications on their own; you must re-trigger notifications, for example by copying the files:

aggregator:
  customSources:
    enabled: true
    sources:
      backfill_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/backfill-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

To trigger notifications for existing files, re-copy them; writing a copy creates a new object and emits a fresh OBJECT_FINALIZE event (gsutil has no touch command):

# Option 1: Copy files to a new prefix to trigger notifications
gsutil -m cp gs://backfill-bucket/logs/*.log gs://backfill-bucket/logs/processed/

# Option 2: Re-copy each object onto itself (in-place rewrite)
gsutil ls gs://backfill-bucket/logs/*.log | while read -r file; do
  gsutil cp "$file" "$file"   # the rewrite emits a new OBJECT_FINALIZE
done
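
A third approach is to publish synthetic notifications for existing objects, which avoids copy costs. The sketch below only builds a message body in the shape GCS uses (bucketId/objectId attributes, JSON payload with bucket and name); actually publishing it would use the google-cloud-pubsub client, which is deliberately omitted here:

```python
import json

def build_notification(bucket: str, object_name: str) -> dict:
    """Build a Pub/Sub message mimicking a GCS OBJECT_FINALIZE notification.

    Field names follow the GCS JSON notification format; publishing the
    message (e.g. with google-cloud-pubsub) is outside this sketch.
    """
    payload = {"kind": "storage#object", "bucket": bucket, "name": object_name}
    return {
        "data": json.dumps(payload).encode(),
        "attributes": {
            "eventType": "OBJECT_FINALIZE",
            "bucketId": bucket,
            "objectId": object_name,
            "payloadFormat": "JSON_API_V1",
        },
    }

msg = build_notification("backfill-bucket", "logs/app-2023-01-01.log")
print(msg["attributes"]["objectId"])  # logs/app-2023-01-01.log
```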

Using Workload Identity (GKE)

If running on GKE, you can use Workload Identity instead of service account keys:

Step 1: Enable Workload Identity

# Enable Workload Identity on GKE cluster
gcloud container clusters update CLUSTER_NAME \
  --workload-pool=YOUR_PROJECT_ID.svc.id.goog \
  --zone=ZONE

Step 2: Create Kubernetes Service Account

apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubesense-aggregator
  namespace: kubesense
  annotations:
    iam.gke.io/gcp-service-account: kubesense-gcs-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com

Step 3: Configure Aggregator

aggregator:
  customSources:
    enabled: true
    sources:
      gcs_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/gcs-logs-subscription
        # No credentials_path is needed: Workload Identity supplies credentials
        # through the annotated Kubernetes service account

How It Works

  1. File Upload: When a file is uploaded to the GCS bucket, GCS sends a notification to the Pub/Sub topic
  2. Pub/Sub Message: The notification contains metadata about the object (bucket name, object name, size, etc.)
  3. Aggregator Consumption: The KubeSense aggregator consumes messages from the Pub/Sub subscription
  4. File Processing: The aggregator can then fetch the file from GCS using the information in the Pub/Sub message and process it
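
The notification's data field is base64-encoded JSON that names the bucket and object. As a hedged sketch of step 4 (the helper name is this document's; the bucket/name fields are what the GCS JSON payload carries), a consumer can reconstruct the object's gs:// URI like this:

```python
import base64
import json

def object_uri_from_message(pubsub_message: dict) -> str:
    """Extract the gs:// URI of the new object from a GCS notification.

    The message's "data" field is base64-encoded JSON containing, among
    other fields, the bucket and object name.
    """
    payload = json.loads(base64.b64decode(pubsub_message["data"]))
    return f"gs://{payload['bucket']}/{payload['name']}"

# A minimal notification as delivered in a pull response
raw = base64.b64encode(
    json.dumps({"bucket": "my-logs-bucket", "name": "logs/2024/app.log"}).encode()
).decode()
print(object_uri_from_message({"data": raw}))  # gs://my-logs-bucket/logs/2024/app.log
```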

Monitoring and Verification

After configuring GCS ingestion:

  1. Check bucket notifications: Verify notifications are configured: gsutil notification list gs://my-logs-bucket
  2. Monitor Pub/Sub: Check Pub/Sub metrics for message delivery
  3. Check aggregator logs: Verify the aggregator is consuming from Pub/Sub successfully
  4. Verify log ingestion: Check the KubeSense dashboard for logs from GCS sources
  5. Monitor processing: Track the number of files processed and any errors

Troubleshooting

Logs Not Appearing

  1. Verify bucket notifications: Check that notifications are configured: gsutil notification list gs://my-logs-bucket
  2. Check Pub/Sub topic: Verify messages are being published to the Pub/Sub topic
  3. Verify subscription: Ensure the subscription exists and is active
  4. Check service account permissions: Ensure the service account has Pub/Sub Subscriber and Storage Object Viewer roles
  5. Verify credentials: Ensure service account key is valid and accessible
  6. Review aggregator logs: Check for Pub/Sub consumption errors or parsing issues
  7. Check notification permissions: Verify GCS service account has Pub/Sub Publisher role on the topic

Performance Issues

  1. Use prefixes: Configure notifications with specific prefixes to filter relevant files
  2. Enable compression: Use compressed files to reduce transfer time
  3. Batch processing: Process files in batches for better performance
  4. Scale consumption: Pub/Sub itself scales automatically; add aggregator replicas if the subscription backlog grows
  5. Monitor message backlog: Check Pub/Sub subscription metrics for message accumulation

Best Practices

  • Use Workload Identity: Prefer Workload Identity over service account keys when running on GKE
  • Organize by prefix: Structure GCS buckets with logical prefixes for easier filtering
  • Compress logs: Use GZIP compression to reduce storage and transfer costs
  • Monitor costs: Track GCS API calls and data transfer costs
  • Set retention: Configure GCS lifecycle policies to manage log retention
  • Use object metadata: Attach custom object metadata to aid organization and filtering
  • Enable versioning: Enable object versioning for important logs
  • Use nearline/coldline: Use appropriate storage classes for cost optimization

Cost Considerations

  • GCS storage: Charged per GB stored (varies by storage class)
  • GCS operations: Charged per API call (Get operations when fetching files)
  • Pub/Sub messages - Charged by message throughput volume; both the published notifications and their consumption count toward Pub/Sub usage
  • Data transfer: Consider data transfer costs when fetching files from GCS
  • Processing: Aggregator processing resources
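
As a back-of-the-envelope illustration of how these components add up, the sketch below totals one month's cost. Every rate in it is a made-up placeholder, not current GCP pricing; consult the GCS and Pub/Sub pricing pages for real numbers:

```python
def monthly_cost(gb_stored: float, get_ops: int, pubsub_gib: float,
                 storage_rate=0.020, op_rate_per_10k=0.004,
                 pubsub_rate_per_gib=0.04) -> float:
    """Sum example cost components for one month.

    All rates are hypothetical placeholders, not real GCP prices.
    """
    storage = gb_stored * storage_rate               # per-GB storage
    ops = (get_ops / 10_000) * op_rate_per_10k       # per-10k GET operations
    pubsub = pubsub_gib * pubsub_rate_per_gib        # per-GiB message throughput
    return round(storage + ops + pubsub, 2)

# 500 GB stored, 1M GET calls, 20 GiB of notification/consumption traffic
print(monthly_cost(500, 1_000_000, 20))  # 11.2
```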

Advanced Configuration

Multiple Buckets

Configure multiple GCS buckets with separate Pub/Sub topics:

aggregator:
  customSources:
    enabled: true
    sources:
      production_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/prod-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
      development_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/dev-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json
      archive_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/archive-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Set up notifications for each bucket:

# Production logs
gsutil notification create -t prod-logs-notifications \
  -f json -e OBJECT_FINALIZE -p prod/ gs://production-logs

# Development logs
gsutil notification create -t dev-logs-notifications \
  -f json -e OBJECT_FINALIZE -p dev/ gs://development-logs

# Archive logs
gsutil notification create -t archive-logs-notifications \
  -f json -e OBJECT_FINALIZE -p archive/ gs://archive-logs

Custom Log Parsing

Configure custom parsing for specific log formats (parsing is handled via transforms, configured separately):

aggregator:
  customSources:
    enabled: true
    sources:
      custom_logs:
        type: gcp_pubsub
        project: YOUR_PROJECT_ID
        subscription: projects/YOUR_PROJECT_ID/subscriptions/custom-logs-subscription
        credentials_path: /etc/kubesense/gcs-key.json

Conclusion

GCS log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history in GCP.