Ingesting S3 Log Archives with KubeSense

The KubeSense aggregator supports ingesting logs from AWS S3 buckets, enabling you to import historical logs, rehydrate archived logs, and migrate from legacy systems. This is particularly useful for batch imports, backfill jobs, and analysis of archived logs.

Note: S3 log ingestion is handled by the KubeSense aggregator using Vector's S3 source, which supports periodic polling, batch imports, and backfill jobs.

Prerequisites

Before you begin, ensure you have:

  1. An AWS S3 bucket containing log files
  2. AWS IAM credentials with read access to the bucket
  3. A KubeSense aggregator deployed and accessible
  4. Permissions to modify the aggregator configuration

Supported Log Formats

The KubeSense aggregator can ingest logs from S3 in various formats:

  • JSON logs - Structured JSON log files
  • Text logs - Plain text log files
  • Multi-line logs - Logs spanning multiple lines
  • Compressed logs - GZIP- and BZIP2-compressed files
  • CloudWatch Logs format - Exported CloudWatch logs
  • VPC Flow Logs - AWS VPC Flow Log format
  • ALB/NLB/ELB access logs - Load balancer access logs
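
As an illustration of the first and fourth formats combined, the sketch below decompresses a GZIP blob of newline-delimited JSON records, the same shape the aggregator downloads from S3. The sample records are hypothetical:

```python
import gzip
import json

# Hypothetical sample: two JSON log records, one per line (NDJSON),
# compressed with GZIP -- the shape a GZIP-compressed JSON archive
# in S3 would have.
raw = b'{"level": "info", "msg": "started"}\n{"level": "error", "msg": "timeout"}\n'
blob = gzip.compress(raw)

# Decompress and parse line-delimited JSON, as done after download.
records = [json.loads(line) for line in gzip.decompress(blob).splitlines() if line]
print(records[1]["level"])  # -> error
```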

Configuration Methods

Helm Configuration

Configure S3 ingestion through Helm values:

aggregator:
  customSources:
    enabled: true
    sources:
      s3_logs:
        type: aws_s3
        region: us-east-1
        bucket: my-logs-bucket
        key_prefix: logs/2024/
        poll_interval_secs: 300
        compression: gzip
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

IAM Permissions

Create an IAM policy with the following permissions for S3 access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-logs-bucket",
        "arn:aws:s3:::my-logs-bucket/*"
      ]
    }
  ]
}
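
To sanity-check a policy like the one above before deploying, a minimal evaluator can confirm it covers the two calls the aggregator needs: listing the bucket and fetching objects. This is an illustrative sketch with simplified wildcard matching, not a substitute for the IAM policy simulator:

```python
# The policy from above, as a Python dict.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::my-logs-bucket",
                "arn:aws:s3:::my-logs-bucket/*",
            ],
        }
    ],
}

def allows(policy: dict, action: str, resource: str) -> bool:
    """Return True if any Allow statement covers the action and resource.
    Only handles a trailing '/*' wildcard -- a deliberate simplification."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow" or action not in stmt["Action"]:
            continue
        for r in stmt["Resource"]:
            if resource == r or (r.endswith("/*") and resource.startswith(r[:-1])):
                return True
    return False

print(allows(policy, "s3:ListBucket", "arn:aws:s3:::my-logs-bucket"))        # True
print(allows(policy, "s3:GetObject", "arn:aws:s3:::my-logs-bucket/logs/a"))  # True
```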

Use Cases

Historical Log Import

Import logs from S3 for historical analysis:

aggregator:
  customSources:
    enabled: true
    sources:
      historical_logs:
        type: aws_s3
        region: us-east-1
        bucket: historical-logs
        key_prefix: archive/2023/
        poll_interval_secs: 3600
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

VPC Flow Logs from S3

Ingest VPC Flow Logs stored in S3:

aggregator:
  customSources:
    enabled: true
    sources:
      vpc_flow_logs:
        type: aws_s3
        region: us-east-1
        bucket: vpc-flow-logs
        key_prefix: AWSLogs/123456789012/vpcflowlogs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
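
VPC Flow Log records are space-delimited with a fixed field order. The sketch below parses a record in the default (version 2) format; the sample values are made up:

```python
# Field names for the default (version 2) VPC Flow Log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line: str) -> dict:
    """Split a space-delimited flow-log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

record = parse_flow_log("2 123456789012 eni-0abc123 10.0.1.5 10.0.2.7 "
                        "443 49152 6 10 840 1620000000 1620000060 ACCEPT OK")
print(record["action"])  # -> ACCEPT
```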

Load Balancer Access Logs

Import ALB/NLB/ELB access logs:

aggregator:
  customSources:
    enabled: true
    sources:
      alb_access_logs:
        type: aws_s3
        region: us-east-1
        bucket: alb-access-logs
        key_prefix: AWSLogs/123456789012/elasticloadbalancing/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
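
ALB access-log lines are space-delimited with some quoted fields (the request line, user agent, and others). Below is a minimal parser sketch using shlex to honor the quoting; the sample line and the field subset are illustrative, and the full ALB format carries many more trailing fields:

```python
import shlex

# Names for the leading columns of an ALB access-log entry (subset only).
FIELDS = ["type", "time", "elb", "client_port", "target_port",
          "request_processing_time", "target_processing_time",
          "response_processing_time", "elb_status_code", "target_status_code",
          "received_bytes", "sent_bytes", "request"]

def parse_alb_line(line: str) -> dict:
    """Tokenize an ALB access-log line; shlex keeps quoted fields intact."""
    values = shlex.split(line)
    return dict(zip(FIELDS, values))

# Hypothetical sample entry, truncated after the request field.
line = ('http 2024-01-15T12:00:00.000000Z app/my-alb/50dc6c495c0c9188 '
        '192.0.2.10:54321 10.0.0.5:80 0.001 0.002 0.000 200 200 120 512 '
        '"GET http://example.com:80/health HTTP/1.1"')
entry = parse_alb_line(line)
print(entry["elb_status_code"], entry["request"])
```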

Backfill Jobs

For one-time backfill operations, you can configure the aggregator to process all files in a bucket:

aggregator:
  customSources:
    enabled: true
    sources:
      backfill_logs:
        type: aws_s3
        region: us-east-1
        bucket: backfill-bucket
        key_prefix: logs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"

Monitoring and Verification

After configuring S3 ingestion:

  1. Check aggregator logs: Verify the aggregator is polling S3 successfully
  2. Monitor S3 access: Check CloudTrail logs for S3 access patterns
  3. Verify log ingestion: Check the KubeSense dashboard for logs from S3 sources
  4. Monitor processing: Track the number of files processed and any errors

Troubleshooting

Logs Not Appearing

  1. Verify IAM permissions: Ensure the credentials have read access to the S3 bucket
  2. Check bucket region: Verify the region matches your bucket's actual region
  3. Verify prefix path: Ensure key_prefix matches the object keys in the bucket, including case and any trailing slash
  4. Check file format: Verify the log format matches the expected format
  5. Review aggregator logs: Check for S3 access errors or parsing issues

Performance Issues

  1. Adjust poll interval: Increase poll_interval_secs for large buckets
  2. Use prefixes: Narrow down to specific prefixes to reduce scanning
  3. Enable compression: Use compressed files to reduce transfer time
  4. Batch processing: Process files in batches for better performance
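
For item 1, the relationship between the poll interval and S3 LIST traffic is straightforward. This small helper (a hypothetical name, not part of KubeSense) estimates the lower bound per source:

```python
def daily_list_requests(poll_interval_secs: int) -> int:
    """S3 ListObjects calls per day for one source at a given poll interval.
    A lower bound: each poll may page through multiple LIST requests
    when the prefix holds more objects than one page returns."""
    return 86_400 // poll_interval_secs

print(daily_list_requests(300))   # -> 288 polls/day at the 5-minute example interval
print(daily_list_requests(3600))  # -> 24 polls/day at hourly polling
```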

Best Practices

  • Use IAM roles: Prefer IAM roles over access keys when possible
  • Organize by prefix: Structure S3 buckets with logical prefixes for easier filtering
  • Compress logs: Use GZIP compression to reduce storage and transfer costs
  • Monitor costs: Track S3 API calls and data transfer costs
  • Set retention: Configure S3 lifecycle policies to manage log retention
  • Use tags: Tag S3 objects with metadata for better organization

Conclusion

S3 log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history.