Ingesting S3 Log Archives with KubeSense

The KubeSense aggregator supports ingesting logs from AWS S3 buckets, enabling you to import historical logs, rehydrate archived logs, and migrate from legacy systems. This is particularly useful for batch imports, backfill jobs, and analysis of archived logs.

Note: S3 log ingestion is handled by the KubeSense aggregator using Vector's S3 source, which supports periodic polling, batch imports, and backfill jobs.

Prerequisites

Before you begin, ensure you have:

  1. An AWS S3 bucket containing log files
  2. AWS IAM credentials with read access to the bucket
  3. A KubeSense aggregator deployed and accessible
  4. Permissions to modify the aggregator configuration

Supported Log Formats

The KubeSense aggregator can ingest logs from S3 in various formats:

  • JSON logs - Structured JSON log files
  • Text logs - Plain text log files
  • Multi-line logs - Logs spanning multiple lines
  • Compressed logs - GZIP- and BZIP2-compressed files
  • CloudWatch Logs format - Exported CloudWatch logs
  • VPC Flow Logs - AWS VPC Flow Log format
  • ALB/NLB/ELB access logs - Load balancer access logs
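
As an illustration of the first and fourth formats combined, the sketch below decompresses a GZIP blob of newline-delimited JSON records, the same shape the aggregator downloads from S3. The sample records are hypothetical:

```python
import gzip
import json

# Hypothetical sample: two JSON log records, one per line (NDJSON),
# compressed with GZIP -- the shape a GZIP-compressed JSON archive
# in S3 would have.
raw = b'{"level": "info", "msg": "started"}\n{"level": "error", "msg": "timeout"}\n'
blob = gzip.compress(raw)

# Decompress and parse line-delimited JSON, as done after download.
records = [json.loads(line) for line in gzip.decompress(blob).splitlines() if line]
print(records[1]["level"])  # -> error
```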

Configuration Methods

Helm Configuration

Configure S3 ingestion through Helm values:

aggregator:
  customSources:
    enabled: true
    sources:
      s3_logs:
        type: aws_s3
        region: us-east-1
        bucket: my-logs-bucket
        key_prefix: logs/2024/
        poll_interval_secs: 300
        compression: gzip
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

IAM Permissions

Create an IAM policy with the following permissions for S3 access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-logs-bucket",
        "arn:aws:s3:::my-logs-bucket/*"
      ]
    }
  ]
}
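
To sanity-check a policy like the one above before deploying, a minimal evaluator can confirm it covers the two calls the aggregator needs: listing the bucket and fetching objects. This is an illustrative sketch with simplified wildcard matching, not a substitute for the IAM policy simulator:

```python
# The policy from above, as a Python dict.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::my-logs-bucket",
                "arn:aws:s3:::my-logs-bucket/*",
            ],
        }
    ],
}

def allows(policy: dict, action: str, resource: str) -> bool:
    """Return True if any Allow statement covers the action and resource.
    Only handles a trailing '/*' wildcard -- a deliberate simplification."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow" or action not in stmt["Action"]:
            continue
        for r in stmt["Resource"]:
            if resource == r or (r.endswith("/*") and resource.startswith(r[:-1])):
                return True
    return False

print(allows(policy, "s3:ListBucket", "arn:aws:s3:::my-logs-bucket"))        # True
print(allows(policy, "s3:GetObject", "arn:aws:s3:::my-logs-bucket/logs/a"))  # True
```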

Use Cases

Historical Log Import

Import logs from S3 for historical analysis:

aggregator:
  customSources:
    enabled: true
    sources:
      historical_logs:
        type: aws_s3
        region: us-east-1
        bucket: historical-logs
        key_prefix: archive/2023/
        poll_interval_secs: 3600
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

VPC Flow Logs from S3

Ingest VPC Flow Logs stored in S3:

aggregator:
  customSources:
    enabled: true
    sources:
      vpc_flow_logs:
        type: aws_s3
        region: us-east-1
        bucket: vpc-flow-logs
        key_prefix: AWSLogs/123456789012/vpcflowlogs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
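
VPC Flow Log records are space-delimited with a fixed field order. The sketch below parses a record in the default (version 2) format; the sample values are made up:

```python
# Field names for the default (version 2) VPC Flow Log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line: str) -> dict:
    """Split a space-delimited flow-log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

record = parse_flow_log("2 123456789012 eni-0abc123 10.0.1.5 10.0.2.7 "
                        "443 49152 6 10 840 1620000000 1620000060 ACCEPT OK")
print(record["action"])  # -> ACCEPT
```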

Load Balancer Access Logs

Import ALB/NLB/ELB access logs:

aggregator:
  customSources:
    enabled: true
    sources:
      alb_access_logs:
        type: aws_s3
        region: us-east-1
        bucket: alb-access-logs
        key_prefix: AWSLogs/123456789012/elasticloadbalancing/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
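
ALB access-log lines are space-delimited with some quoted fields (the request line, user agent, and others). Below is a minimal parser sketch using shlex to honor the quoting; the sample line and the field subset are illustrative, and the full ALB format carries many more trailing fields:

```python
import shlex

# Names for the leading columns of an ALB access-log entry (subset only).
FIELDS = ["type", "time", "elb", "client_port", "target_port",
          "request_processing_time", "target_processing_time",
          "response_processing_time", "elb_status_code", "target_status_code",
          "received_bytes", "sent_bytes", "request"]

def parse_alb_line(line: str) -> dict:
    """Tokenize an ALB access-log line; shlex keeps quoted fields intact."""
    values = shlex.split(line)
    return dict(zip(FIELDS, values))

# Hypothetical sample entry, truncated after the request field.
line = ('http 2024-01-15T12:00:00.000000Z app/my-alb/50dc6c495c0c9188 '
        '192.0.2.10:54321 10.0.0.5:80 0.001 0.002 0.000 200 200 120 512 '
        '"GET http://example.com:80/health HTTP/1.1"')
entry = parse_alb_line(line)
print(entry["elb_status_code"], entry["request"])
```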

Backfill Jobs

For one-time backfill operations, you can configure the aggregator to process all files in a bucket:

aggregator:
  customSources:
    enabled: true
    sources:
      backfill_logs:
        type: aws_s3
        region: us-east-1
        bucket: backfill-bucket
        key_prefix: logs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

Monitoring and Verification

After configuring S3 ingestion:

  1. Check aggregator logs: Verify the aggregator is polling S3 successfully
  2. Monitor S3 access: Check CloudTrail logs for S3 access patterns
  3. Verify log ingestion: Check the KubeSense dashboard for logs from S3 sources
  4. Monitor processing: Track the number of files processed and any errors

Troubleshooting

Logs Not Appearing

  1. Verify IAM permissions: Ensure the credentials have read access to the S3 bucket
  2. Check bucket region: Verify the region matches your bucket's actual region
  3. Verify prefix path: Ensure key_prefix matches the object keys in the bucket, including case and any trailing slash
  4. Check file format: Verify the log format matches the expected format
  5. Review aggregator logs: Check for S3 access errors or parsing issues

Performance Issues

  1. Adjust poll interval: Increase poll_interval_secs for large buckets
  2. Use prefixes: Narrow down to specific prefixes to reduce scanning
  3. Enable compression: Use compressed files to reduce transfer time
  4. Batch processing: Process files in batches for better performance
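
For item 1, the relationship between the poll interval and S3 LIST traffic is straightforward. This small helper (a hypothetical name, not part of KubeSense) estimates the lower bound per source:

```python
def daily_list_requests(poll_interval_secs: int) -> int:
    """S3 ListObjects calls per day for one source at a given poll interval.
    A lower bound: each poll may page through multiple LIST requests
    when the prefix holds more objects than one page returns."""
    return 86_400 // poll_interval_secs

print(daily_list_requests(300))   # -> 288 polls/day at the 5-minute example interval
print(daily_list_requests(3600))  # -> 24 polls/day at hourly polling
```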

Best Practices

  • Use IAM roles: Prefer IAM roles over access keys when possible
  • Organize by prefix: Structure S3 buckets with logical prefixes for easier filtering
  • Compress logs: Use GZIP compression to reduce storage and transfer costs
  • Monitor costs: Track S3 API calls and data transfer costs
  • Set retention: Configure S3 lifecycle policies to manage log retention
  • Use tags: Tag S3 objects with metadata for better organization

Conclusion

S3 log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history.