Ingesting S3 Log Archives with KubeSense
The KubeSense aggregator supports ingesting logs from AWS S3 buckets, enabling you to import historical logs, perform log rehydration, and migrate from legacy systems. This is particularly useful for batch imports, backfill jobs, and analyzing archived logs.
Note: S3 log ingestion is handled by the KubeSense aggregator using Vector's S3 source. This supports periodic polling, batch import, and backfill jobs.
Prerequisites
Before you begin, ensure you have:
- AWS S3 bucket containing log files
- AWS IAM credentials with read access to the S3 bucket
- KubeSense aggregator deployed and accessible
- Appropriate permissions to configure the aggregator
Supported Log Formats
The KubeSense aggregator can ingest logs from S3 in various formats:
- JSON logs - Structured JSON log files
- Text logs - Plain text log files
- Multi-line logs - Logs spanning multiple lines
- Compressed logs - GZIP, BZIP2 compressed files
- CloudWatch Logs format - Exported CloudWatch logs
- VPC Flow Logs - AWS VPC Flow Log format
- ALB/NLB/ELB access logs - Load balancer access logs
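As a sketch of what handling a compressed JSON archive involves, the snippet below decompresses a GZIP file and parses one JSON record per line. The file contents and field names are invented for illustration; they are not a KubeSense or Vector API.

```python
import gzip
import io
import json

def read_json_log_archive(data: bytes):
    """Decompress a GZIP log archive and yield one parsed record per line,
    mirroring how an S3 source typically treats compressed JSON logs."""
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)

# A small in-memory archive standing in for an object fetched from S3.
raw = (b'{"level": "info", "msg": "request served"}\n'
       b'{"level": "error", "msg": "upstream timeout"}\n')
archive = gzip.compress(raw)

records = list(read_json_log_archive(archive))
print(records[1]["level"])  # -> error
```

The same line-by-line approach applies to plain text archives; only the per-line parsing step changes.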
Configuration Methods
Helm Configuration
Configure S3 ingestion through Helm values:
```yaml
aggregator:
  customSources:
    enabled: true
    sources:
      s3_logs:
        type: aws_s3
        region: us-east-1
        bucket: my-logs-bucket
        key_prefix: logs/2024/
        poll_interval_secs: 300
        compression: gzip
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
```

IAM Permissions
Create an IAM policy with the following permissions for S3 access:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-logs-bucket",
        "arn:aws:s3:::my-logs-bucket/*"
      ]
    }
  ]
}
```

Use Cases
Historical Log Import
Import logs from S3 for historical analysis:
```yaml
aggregator:
  customSources:
    enabled: true
    sources:
      historical_logs:
        type: aws_s3
        region: us-east-1
        bucket: historical-logs
        key_prefix: archive/2023/
        poll_interval_secs: 3600
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
```

VPC Flow Logs from S3
Ingest VPC Flow Logs stored in S3:
```yaml
aggregator:
  customSources:
    enabled: true
    sources:
      vpc_flow_logs:
        type: aws_s3
        region: us-east-1
        bucket: vpc-flow-logs
        key_prefix: AWSLogs/123456789012/vpcflowlogs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
```

Load Balancer Access Logs
Import ALB/NLB/ELB access logs:
```yaml
aggregator:
  customSources:
    enabled: true
    sources:
      alb_access_logs:
        type: aws_s3
        region: us-east-1
        bucket: alb-access-logs
        key_prefix: AWSLogs/123456789012/elasticloadbalancing/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
```

Backfill Jobs
For one-time backfill operations, you can configure the aggregator to process all files in a bucket:
```yaml
aggregator:
  customSources:
    enabled: true
    sources:
      backfill_logs:
        type: aws_s3
        region: us-east-1
        bucket: backfill-bucket
        key_prefix: logs/
        auth:
          access_key_id: "<AWS_ACCESS_KEY_ID>"
          secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
```

Monitoring and Verification
After configuring S3 ingestion:
- Check aggregator logs: Verify the aggregator is polling S3 successfully
- Monitor S3 access: Check CloudTrail logs for S3 access patterns
- Verify log ingestion: Check the KubeSense dashboard for logs from S3 sources
- Monitor processing: Track the number of files processed and any errors
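One way to track files processed and errors is to summarize the aggregator's own log output. The "processed object" and "ERROR" markers below are assumed for illustration (the aggregator's actual log format will differ), but the counting approach carries over:

```python
def summarize_ingestion(log_lines):
    """Count processed S3 objects and collect error lines from aggregator
    log output. The line markers here are hypothetical, not KubeSense's
    actual log format."""
    processed, errors = 0, []
    for line in log_lines:
        if "processed object" in line:
            processed += 1
        elif "ERROR" in line:
            errors.append(line)
    return processed, errors

sample = [
    "INFO processed object s3://my-logs-bucket/logs/2024/app-01.log.gz",
    "INFO processed object s3://my-logs-bucket/logs/2024/app-02.log.gz",
    "ERROR access denied for s3://my-logs-bucket/logs/2024/app-03.log.gz",
]
count, errs = summarize_ingestion(sample)
print(count, len(errs))  # -> 2 1
```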
Troubleshooting
Logs Not Appearing
- Verify IAM permissions: Ensure the credentials have read access to the S3 bucket
- Check bucket region: Verify the region matches your bucket's actual region
- Verify prefix path: Ensure the prefix path is correct
- Check file format: Verify the log format matches the expected format
- Review aggregator logs: Check for S3 access errors or parsing issues
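When logs are missing, a quick sanity check is to confirm the policy document attached to the credentials actually grants both required actions. The helper below inspects a policy JSON like the one shown earlier; it only handles simple Allow statements with literal action names (no wildcards or Deny logic):

```python
import json

REQUIRED_ACTIONS = {"s3:ListBucket", "s3:GetObject"}

def missing_s3_actions(policy_json: str) -> set:
    """Return the required S3 actions not granted by any Allow statement.
    Literal action names only; wildcards are not expanded."""
    policy = json.loads(policy_json)
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            if isinstance(actions, str):
                actions = [actions]
            granted.update(actions)
    return REQUIRED_ACTIONS - granted

# A policy that forgot s3:GetObject -- objects can be listed but not read.
policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["s3:ListBucket"],
     "Resource": ["arn:aws:s3:::my-logs-bucket"]}
  ]
}"""
print(missing_s3_actions(policy))  # -> {'s3:GetObject'}
```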
Performance Issues
- Adjust poll interval: Increase poll_interval_secs for large buckets
- Use prefixes: Narrow down to specific prefixes to reduce scanning
- Enable compression: Use compressed files to reduce transfer time
- Batch processing: Process files in batches for better performance
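The trade-off between poll interval and S3 request volume is easy to estimate. The sketch below assumes one paginated LIST call per 1,000 keys under the prefix (the ListObjectsV2 page size) per poll; real request counts depend on how the source enumerates objects:

```python
import math

def list_requests_per_day(object_count: int, poll_interval_secs: int) -> int:
    """Estimate daily S3 LIST requests: one poll per interval, with one
    paginated LIST call per 1,000 keys under the prefix."""
    polls_per_day = 86_400 // poll_interval_secs
    pages_per_poll = math.ceil(object_count / 1_000)
    return polls_per_day * pages_per_poll

# 50,000 objects polled every 5 minutes vs. every hour:
print(list_requests_per_day(50_000, 300))    # -> 14400
print(list_requests_per_day(50_000, 3_600))  # -> 1200
```

Raising the interval from 300 to 3,600 seconds cuts LIST traffic twelvefold, which is why longer intervals are suggested for large buckets.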
Best Practices
- Use IAM roles: Prefer IAM roles over access keys when possible
- Organize by prefix: Structure S3 buckets with logical prefixes for easier filtering
- Compress logs: Use GZIP compression to reduce storage and transfer costs
- Monitor costs: Track S3 API calls and data transfer costs
- Set retention: Configure S3 lifecycle policies to manage log retention
- Use tags: Tag S3 objects with metadata for better organization
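For the retention point above, a lifecycle configuration along these lines can be applied with aws s3api put-bucket-lifecycle-configuration. The prefix, transition, and retention periods are examples to adapt, not recommendations:

```json
{
  "Rules": [
    {
      "ID": "expire-archived-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```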
Conclusion
S3 log ingestion enables KubeSense to import historical logs, perform backfills, and analyze archived data alongside real-time log streams. This provides comprehensive observability across your entire log history.