Traces & Metrics
KubeSense OpenTelemetry Collector for EC2
This guide covers collecting traces and metrics from ECS workloads running on EC2 container instances, using the OpenTelemetry Collector deployed as a daemon service.
Installation
Step 1: Store Collector Configuration in Parameter Store
Store this configuration in AWS Parameter Store at `/ecs/kubesense/otelcol-daemon.yaml`:
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

receivers:
  # App telemetry (OTLP)
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # ECS task & container metrics
  awsecscontainermetrics:
    collection_interval: 30s
  # Host (EC2) metrics
  hostmetrics:
    collection_interval: 30s
    root_path: /rootfs
    scrapers:
      cpu:
      load:
      memory:
      network:
      disk:
      filesystem:
        metrics:
          system.filesystem.usage:
            enabled: true
        exclude_mount_points:
          match_type: regexp
          mount_points:
            - /rootfs/boot/.*
            - /rootfs/proc/.*
            - /rootfs/sys/.*
  # Docker container metrics
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 30s
    timeout: 20s

processors:
  # Safety
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  # ECS + EC2 metadata enrichment
  resourcedetection:
    detectors: [env, ecs, ec2]
    override: false
  # Docker metrics tag
  attributes/docker:
    actions:
      - key: kubesense.metric_source
        value: docker_stats
        action: insert
  # Custom labels
  resource:
    attributes:
      - key: kubesense.cluster
        value: <YOUR_CLUSTER_NAME>
        action: insert
      - key: kubesense.env_type
        value: <YOUR_ENV_TYPE>
        action: insert
  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  # Traces
  otlphttp/kubesense-traces:
    endpoint: http://<KUBESENSE_ENDPOINT>:33443
    timeout: 30s
    tls:
      insecure: true
  # Metrics
  prometheusremotewrite:
    endpoint: http://<KUBESENSE_ENDPOINT>:30060/api/v1/write
    timeout: 30s
    resource_to_telemetry_conversion:
      enabled: true
    send_metadata: true

service:
  extensions: [health_check]
  pipelines:
    # Traces
    traces:
      receivers: [otlp]
      processors:
        - memory_limiter
        - resource
        - batch
      exporters: [otlphttp/kubesense-traces]
    # Metrics (ECS + Host + Docker)
    metrics:
      receivers:
        - otlp
        - hostmetrics
        - awsecscontainermetrics
        - docker_stats
      processors:
        - memory_limiter
        - resourcedetection
        - attributes/docker
        - resource
        - batch
      exporters: [prometheusremotewrite]

Placeholder Values
Replace the following placeholders in the configuration:
- `<KUBESENSE_ENDPOINT>` - KubeSense ingestion endpoint hostname (provided by KubeSense platform)
- `<YOUR_CLUSTER_NAME>` - Your ECS cluster identifier (provided by KubeSense platform)
- `<YOUR_ENV_TYPE>` - Environment designation like `production` or `staging` (provided by KubeSense platform)
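For example, the configuration can be uploaded with the AWS CLI, assuming it is saved locally as `otelcol-daemon.yaml` (the local file name is illustrative; the parameter name must match the ARN used in the task definition):

```shell
# Upload the collector config; use --type SecureString instead if you want
# KMS encryption (the task execution role then also needs kms:Decrypt).
aws ssm put-parameter \
  --name "/ecs/kubesense/otelcol-daemon.yaml" \
  --type "String" \
  --value "file://otelcol-daemon.yaml" \
  --overwrite
```

Note that standard-tier parameters are limited to 4 KB; if your configuration grows beyond that, add `--tier Advanced`.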
Configuration
Step 2: Create Daemon Service Task Definition
Create an ECS task definition for the collector daemon service that runs on every EC2 instance:
{
  "family": "ecs-otel-daemon-service",
  "networkMode": "host",
  "requiresCompatibilities": ["EC2"],
  "cpu": "1024",
  "memory": "2048",
  "pidMode": "host",
  "taskRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/otelCollectorTaskRole",
  "executionRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "otel-collector",
      "image": "otel/opentelemetry-collector-contrib:0.142.0",
      "cpu": 1024,
      "memory": 2048,
      "essential": true,
      "user": "0",
      "command": [
        "--config=env:OTEL_CONFIG"
      ],
      "environment": [
        {
          "name": "ECS_ENABLE_CONTAINER_METADATA",
          "value": "true"
        }
      ],
      "secrets": [
        {
          "name": "OTEL_CONFIG",
          "valueFrom": "arn:aws:ssm:<AWS_REGION>:<ACCOUNT_ID>:parameter/ecs/kubesense/otelcol-daemon.yaml"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "proc",
          "containerPath": "/rootfs/proc",
          "readOnly": true
        },
        {
          "sourceVolume": "dev",
          "containerPath": "/rootfs/dev",
          "readOnly": true
        },
        {
          "sourceVolume": "al1_cgroup",
          "containerPath": "/rootfs/cgroup",
          "readOnly": true
        },
        {
          "sourceVolume": "al2_cgroup",
          "containerPath": "/rootfs/sys/fs/cgroup",
          "readOnly": true
        },
        {
          "sourceVolume": "boot",
          "containerPath": "/rootfs/boot/efi",
          "readOnly": true
        },
        {
          "sourceVolume": "docker-sock",
          "containerPath": "/var/run/docker.sock",
          "readOnly": true
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/otel-collector",
          "awslogs-create-group": "true",
          "awslogs-region": "<AWS_REGION>",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "portMappings": [
        {
          "containerPort": 4317,
          "hostPort": 4317,
          "protocol": "tcp"
        },
        {
          "containerPort": 4318,
          "hostPort": 4318,
          "protocol": "tcp"
        }
      ],
      "systemControls": [],
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 65535,
          "hardLimit": 65535
        }
      ],
      "volumesFrom": []
    }
  ],
  "volumes": [
    {
      "name": "proc",
      "host": {
        "sourcePath": "/proc"
      }
    },
    {
      "name": "dev",
      "host": {
        "sourcePath": "/dev"
      }
    },
    {
      "name": "al1_cgroup",
      "host": {
        "sourcePath": "/cgroup"
      }
    },
    {
      "name": "al2_cgroup",
      "host": {
        "sourcePath": "/sys/fs/cgroup"
      }
    },
    {
      "name": "boot",
      "host": {
        "sourcePath": "/boot/efi"
      }
    },
    {
      "name": "docker-sock",
      "host": {
        "sourcePath": "/var/run/docker.sock"
      }
    }
  ]
}

Placeholder Values:
- `<AWS_REGION>` - Your AWS region
- `<ACCOUNT_ID>` - Your AWS account ID
Key Configuration Details:
- `networkMode: host` - Access EC2 host metrics and the Docker socket
- `pidMode: host` - Host process namespace for metrics collection
- `cpu: 0` or shared allocation - Better resource utilization (can burst)
- `user: 0` - Run as root to access host metrics and the Docker socket
- `ECS_ENABLE_CONTAINER_METADATA: true` - Enable ECS metadata exposure
- `ulimits` - Increase file descriptor limits (`nofile: 65535`) for high-volume metric collection
- `systemControls: []` - Empty for standard configuration
- `volumesFrom: []` - No volumes inherited from other containers
- Mount Points:
  - `/rootfs/proc` - Process information
  - `/rootfs/dev` - Device information
  - `/rootfs/cgroup` and `/rootfs/sys/fs/cgroup` - Cgroup metrics (AL1 and AL2 compatibility)
  - `/rootfs/boot/efi` - EFI boot information
  - `/var/run/docker.sock` - Docker daemon socket (read-only) for container metrics
- Port mappings with `hostPort` specified (important for daemon mode)
Deployment
Step 3: Create ECS Service as Daemon
Create an ECS service with the daemon scheduling strategy to run the collector on every EC2 instance in your cluster.
The daemon scheduling strategy automatically deploys one collector instance per EC2 node. When new nodes join the cluster, the collector automatically starts on them.
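For example, with the AWS CLI, assuming the task definition JSON from Step 2 is saved locally as `otel-daemon-taskdef.json` (the file name is illustrative):

```shell
# Register the task definition from Step 2.
aws ecs register-task-definition \
  --cli-input-json file://otel-daemon-taskdef.json

# Create the service with the DAEMON scheduling strategy; daemon services
# do not take a desired count — ECS runs one task per container instance.
aws ecs create-service \
  --cluster <YOUR_CLUSTER_NAME> \
  --service-name ecs-otel-daemon-service \
  --task-definition ecs-otel-daemon-service \
  --scheduling-strategy DAEMON \
  --launch-type EC2
```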
Step 4: Instrument Your Application
Add OpenTelemetry instrumentation to your application based on your programming language:
- Node.js/JavaScript - OpenTelemetry Node.js SDK
- Python - OpenTelemetry Python SDK
- Java - OpenTelemetry Java SDK
- Go - OpenTelemetry Go SDK
- Other Languages - See OpenTelemetry instrumentation docs
Rebuild your application container image with the instrumentation.
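For the Node.js option above, a minimal zero-code setup can look like the following sketch (package names come from the upstream OpenTelemetry JS project; your application may instead use manual SDK initialization):

```shell
# Install the API and the auto-instrumentation meta-package.
npm install @opentelemetry/api @opentelemetry/auto-instrumentations-node

# Start the app with the auto-instrumentation loader preloaded; span export
# is then controlled entirely by OTEL_* environment variables (Step 5).
node --require @opentelemetry/auto-instrumentations-node/register app.js
```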
Step 5: Configure Collector Endpoint
Update your application container's entry point to discover the EC2 instance IP address at runtime:
{
  "name": "your-application",
  "image": "your-image:latest",
  "essential": true,
  "entryPoint": [
    "sh",
    "-c",
    "export OTEL_EXPORTER_OTLP_ENDPOINT=\"http://$(curl http://169.254.169.254/latest/meta-data/local-ipv4):4318\"; <YOUR_APPLICATION_START_COMMAND>"
  ],
  "environment": [
    {
      "name": "OTEL_SERVICE_NAME",
      "value": "<SERVICE_NAME>"
    },
    {
      "name": "OTEL_EXPORTER_OTLP_PROTOCOL",
      "value": "http/protobuf"
    },
    {
      "name": "OTEL_RESOURCE_ATTRIBUTES",
      "value": "kubesense.env_type=<YOUR_ENV_TYPE>,kubesense.cluster=<YOUR_CLUSTER_NAME>"
    }
  ],
  "portMappings": [
    {
      "containerPort": 3000,
      "protocol": "tcp"
    }
  ]
}

Placeholder Values:
- `<YOUR_APPLICATION_START_COMMAND>` - Command to start your application (e.g., `node app.js` or `python app.py`)
- `<SERVICE_NAME>` - Your service/application name (e.g., `frontend-nodejs`, `api-service`)
- `<YOUR_ENV_TYPE>` - Environment designation (e.g., `production`, `staging`, `legacy`)
- `<YOUR_CLUSTER_NAME>` - Your ECS cluster identifier (provided by KubeSense platform)
How It Works:
- `curl http://169.254.169.254/latest/meta-data/local-ipv4` queries the EC2 metadata service to get the instance's private IP address
- The endpoint is constructed as `http://<INSTANCE_IP>:4318`
- The application connects to the collector running on the same EC2 host
- The `sh -c` entrypoint exports the environment variable before executing your application start command
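If IMDSv1 is disabled on your instances (IMDSv2 enforced), the metadata lookup needs a session token. A sketch of the same discovery logic, with an illustrative helper function and an example IP standing in for the on-instance metadata lookup:

```shell
# Build the OTLP/HTTP endpoint from an instance IP (helper name is illustrative).
build_otlp_endpoint() {
  printf 'http://%s:4318' "$1"
}

# On the EC2 instance, IMDSv2 requires a token before reading metadata:
# TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
#   -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
# INSTANCE_IP=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
#   http://169.254.169.254/latest/meta-data/local-ipv4)
INSTANCE_IP="10.0.1.23"  # example value; replaced by the metadata lookup above
export OTEL_EXPORTER_OTLP_ENDPOINT="$(build_otlp_endpoint "$INSTANCE_IP")"
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```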
Verify Setup
After deploying the daemon service, verify it's working correctly:
Task Status Check
- Navigate to ECS → Clusters → [Your Cluster] → Services → ecs-otel-daemon-service
- Check the following:
  - Task Status: should be `RUNNING` (one task per EC2 instance)
  - Health Status: should be `HEALTHY`
  - Container Status: all containers should be `RUNNING`
CloudWatch Logs Verification
- Navigate to CloudWatch → Log Groups → /ecs/otel-collector
- Verify you see logs from each EC2 instance
- Look for messages indicating:
- Collector started successfully
- Components loaded (receivers, processors, exporters)
- Traces received from applications
- Metrics collected from host and containers
- Data exported to KubeSense
Metrics Collection Verification
Check that the collector is sending telemetry to KubeSense:
- Verify in KubeSense dashboard that you see traces from your applications
- Confirm metrics are flowing (host, container, ECS metrics)
- Check that metadata enrichment is working (cluster, env_type labels present)
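As an additional spot check, the collector can be probed directly from an EC2 instance in the cluster, assuming the default `health_check` path and the ports from the Step 1 configuration:

```shell
# The health_check extension answers on port 13133 (returns collector status).
curl -s http://localhost:13133/

# The OTLP/HTTP port should accept connections (a GET here is rejected by
# the receiver, but any HTTP response proves the listener is up).
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:4318/v1/traces
```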
Troubleshooting Installation
Common Issues
Task Not Starting:
- Check ECS cluster has available capacity
- Verify the container image can be pulled from the registry
- Review CloudWatch logs for the failed tasks
- Ensure EC2 instances are in `ACTIVE` state
Parameter Store Access Issues:
- Ensure the IAM role has `ssm:GetParameter` permissions
- Verify the parameter name matches exactly: `/ecs/kubesense/otelcol-daemon.yaml`
- Check the parameter is in the same region as your ECS cluster
- Confirm the IAM role is properly attached to the task
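A quick way to reproduce what the task execution role does, assuming the AWS CLI is available with equivalent permissions (add `--with-decryption` if the parameter is a SecureString):

```shell
# Confirms the parameter exists in the expected region and is readable.
aws ssm get-parameter \
  --name "/ecs/kubesense/otelcol-daemon.yaml" \
  --region <AWS_REGION> \
  --query "Parameter.Name"
```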
Docker Socket Access Issues:
- Verify EC2 instances have Docker daemon running
- Check the mount point for the Docker socket: `/var/run/docker.sock`
- Ensure the collector container runs as `user: 0` (root)
- Review CloudWatch logs for socket permission errors
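To confirm the Docker daemon actually answers on the socket the collector mounts, run this on the EC2 instance itself:

```shell
# Queries the Docker Engine API /version endpoint over the Unix socket;
# a JSON response confirms the daemon is up and the socket is accessible.
curl -s --unix-socket /var/run/docker.sock http://localhost/version
```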
Host Metrics Not Collecting:
- Verify the root filesystem is mounted at `/rootfs`
- Check all required mount points exist on EC2 instances
- Verify hostmetrics receiver is enabled in collector config
- Review CloudWatch logs for metric collection errors
Host Metadata Enrichment Missing:
- Ensure the `resourcedetection` processor includes the `ec2` detector
- Verify EC2 instances have a proper IAM instance profile
- Check IAM instance profile has EC2 describe permissions
- Review CloudWatch logs for resource detection errors
Traces Not Appearing in KubeSense:
- Verify applications are sending OTLP telemetry to the collector endpoint (`http://<INSTANCE_IP>:4318`; containers sharing the host network can use `http://127.0.0.1:4318`)
- Check the application container can reach the collector on its EC2 host
- Verify collector exporter configuration has correct KubeSense endpoint
- Review CloudWatch logs for export errors