KubeSense OpenTelemetry Collector

This guide covers collecting traces and metrics from your ECS Serverless (Fargate) applications using an OpenTelemetry Collector sidecar.

Installation

Step 1: Store Collector Configuration in Parameter Store

Store this configuration in AWS Parameter Store at /ecs/kubesense/otelcol-sidecar.yaml:

extensions:
  health_check:

receivers:
  # ECS task / container metrics
  awsecscontainermetrics:
    collection_interval: 30s

  # App → Collector
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
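  # Prevent OOM in Fargate.
  # Keep limit_mib below the sidecar container's memory limit (512 MiB in Step 2);
  # a limit equal to the container limit leaves no room to react before the task is killed.
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100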

  # Reduce payload size
  batch:
    timeout: 10s
    send_batch_size: 1024

  # Keep only useful ECS metrics
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.cpu.reserved
          - ecs.task.cpu.utilized
          - ecs.task.memory.reserved
          - ecs.task.memory.utilized
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
          - container.duration

  # Add your platform labels
  resource:
    attributes:
      - key: kubesense.cluster
        value: <YOUR_CLUSTER_NAME>
        action: insert
      - key: kubesense.env_type
        value: <YOUR_ENV_TYPE>
        action: insert

exporters:
  # OTLP over HTTP
  otlphttp/kubesense-traces:
    endpoint: http://<KUBESENSE_ENDPOINT>:33443
    tls:
      insecure: true
    timeout: 30s

  # Metrics → VictoriaMetrics
  prometheusremotewrite:
    endpoint: http://<KUBESENSE_ENDPOINT>:30060/api/v1/write
    timeout: 30s
    resource_to_telemetry_conversion:
      enabled: true
    send_metadata: true

service:
  extensions: [health_check]

  pipelines:
    # Traces
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/kubesense-traces]

    # App metrics (if any)
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]

    # ECS task/container metrics
    metrics/aws:
      receivers: [awsecscontainermetrics]
      processors: [memory_limiter, filter, resource, batch]
      exporters: [prometheusremotewrite]

Placeholder Values

Replace the following placeholders in the configuration:

  • <KUBESENSE_ENDPOINT> - KubeSense ingestion endpoint hostname (provided by KubeSense platform)
  • <YOUR_CLUSTER_NAME> - Your ECS cluster identifier (provided by KubeSense platform)
  • <YOUR_ENV_TYPE> - Environment designation like production or staging (provided by KubeSense platform)
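
Before uploading the configuration, it can help to confirm that every placeholder has been replaced and that the result fits in a standard-tier parameter, which is limited to 4 KB. The sketch below is one way to do that in Python. The function names are hypothetical, and it assumes placeholders always look like <UPPER_CASE_NAME>.

```python
import re

# Standard-tier Parameter Store values are capped at 4 KB.
STANDARD_PARAMETER_LIMIT_BYTES = 4096

# Assumption: placeholders are written as <UPPER_CASE_NAME>.
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def render_config(template: str, values: dict[str, str]) -> str:
    """Replace <NAME> placeholders and fail loudly if any are left."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace(f"<{name}>", value)
    leftover = sorted(set(PLACEHOLDER.findall(rendered)))
    if leftover:
        raise ValueError(f"unfilled placeholders: {', '.join(leftover)}")
    return rendered


def fits_standard_parameter(config: str) -> bool:
    """True if the UTF-8 encoded config fits in a standard-tier parameter."""
    return len(config.encode("utf-8")) <= STANDARD_PARAMETER_LIMIT_BYTES


# Usage with a two-line excerpt; the values here are made up.
excerpt = "endpoint: http://<KUBESENSE_ENDPOINT>:33443\nvalue: <YOUR_ENV_TYPE>\n"
filled = render_config(excerpt, {
    "KUBESENSE_ENDPOINT": "ingest.example.com",
    "YOUR_ENV_TYPE": "staging",
})
assert fits_standard_parameter(filled)
```

Raising on leftover placeholders catches a literal <YOUR_CLUSTER_NAME> being shipped as a resource attribute, which the collector would otherwise accept without complaint.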

Configuration

Step 2: Add Collector Container to Task Definition

In your ECS task definition, add the OpenTelemetry Collector as a sidecar container:

{
  "name": "otel-collector",
  "image": "otel/opentelemetry-collector-contrib:0.142.0",
  "cpu": 256,
  "memory": 512,
  "essential": true,
  "command": [
    "--config=env:OTEL_CONFIG"
  ],
  "secrets": [
    {
      "name": "OTEL_CONFIG",
      "valueFrom": "arn:aws:ssm:<AWS_REGION>:<ACCOUNT_ID>:parameter/ecs/kubesense/otelcol-sidecar.yaml"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/otel-collector",
      "awslogs-create-group": "true",
      "awslogs-region": "<AWS_REGION>",
      "awslogs-stream-prefix": "ecs"
    }
  },
  "portMappings": [
    {
      "containerPort": 4317,
      "protocol": "tcp"
    },
    {
      "containerPort": 4318,
      "protocol": "tcp"
    }
  ]
}

Placeholder Values:

  • <AWS_REGION> - Your AWS region
  • <ACCOUNT_ID> - Your AWS account ID
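
The valueFrom ARN is easy to get wrong because the parameter name already starts with a slash. As an illustration, the following Python sketch assembles the ARN from its parts. The function name is hypothetical, and it assumes the standard aws partition (not GovCloud or China regions).

```python
def ssm_parameter_arn(region: str, account_id: str, name: str) -> str:
    """Build the ARN for a Parameter Store parameter.

    Hierarchical names start with "/", and the ARN joins "parameter"
    directly to that path without an extra slash.
    Assumes the standard "aws" partition.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    path = name if name.startswith("/") else f"/{name}"
    return f"arn:aws:ssm:{region}:{account_id}:parameter{path}"


# Usage with made-up region and account values.
arn = ssm_parameter_arn("us-east-1", "123456789012",
                        "/ecs/kubesense/otelcol-sidecar.yaml")
print(arn)
```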

Step 3: Configure Application Container

Update your application container to send traces and metrics to the collector:

{
  "name": "your-application",
  "image": "your-image:latest",
  "essential": true,
  "dependsOn": [
    {
      "containerName": "otel-collector",
      "condition": "START"
    }
  ],
  "environment": [
    {
      "name": "OTEL_SERVICE_NAME",
      "value": "<SERVICE_NAME>"
    },
    {
      "name": "OTEL_EXPORTER_OTLP_PROTOCOL",
      "value": "http/protobuf"
    },
    {
      "name": "OTEL_EXPORTER_OTLP_ENDPOINT",
      "value": "http://localhost:4318"
    }
  ],
  "portMappings": [
    {
      "containerPort": 3000,
      "protocol": "tcp"
    }
  ]
}

Placeholder Values:

  • <SERVICE_NAME> - Your application service name

Key Points:

  • dependsOn ensures the collector starts before your application
  • Application sends telemetry to http://localhost:4318 (OTLP HTTP endpoint)
  • Update OTEL_SERVICE_NAME with your service identifier
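
The key points above can be checked mechanically before registering the task definition. The sketch below is a hypothetical helper, not part of any AWS tooling. It takes the two container definitions as Python dictionaries and reports wiring problems. It relies on the fact that containers in the same Fargate task share a network namespace, which is why localhost reaches the sidecar.

```python
from urllib.parse import urlparse


def check_wiring(app: dict, collector: dict) -> list[str]:
    """Return a list of wiring problems between two container definitions."""
    problems = []

    # The application should wait for the collector container.
    deps = {d["containerName"] for d in app.get("dependsOn", [])}
    if collector["name"] not in deps:
        problems.append(f"{app['name']} does not depend on {collector['name']}")

    # The OTLP endpoint should target a port the collector exposes.
    env = {e["name"]: e["value"] for e in app.get("environment", [])}
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint is None:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    else:
        url = urlparse(endpoint)
        ports = {p["containerPort"] for p in collector.get("portMappings", [])}
        if url.hostname not in ("localhost", "127.0.0.1"):
            problems.append(f"endpoint host {url.hostname} is not localhost")
        if url.port not in ports:
            problems.append(f"endpoint port {url.port} is not exposed by the collector")
    return problems


# Usage with trimmed-down versions of the two definitions.
collector = {"name": "otel-collector",
             "portMappings": [{"containerPort": 4317}, {"containerPort": 4318}]}
app = {"name": "your-application",
       "dependsOn": [{"containerName": "otel-collector", "condition": "START"}],
       "environment": [{"name": "OTEL_EXPORTER_OTLP_ENDPOINT",
                        "value": "http://localhost:4318"}]}
print(check_wiring(app, collector))  # []
```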

Step 4: Update IAM Task Execution Role

Your ECS Task Execution Role needs permission to read from SSM Parameter Store and write to CloudWatch Logs.

Option 1: Attach Managed Policies

Attach the following AWS managed policies to your task execution role:

  • AmazonSSMReadOnlyAccess
  • CloudWatchLogsFullAccess

Option 2: Add Inline Policy

Alternatively, add an inline policy that allows specific actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:GetParameters",
      "Resource": "arn:aws:ssm:<AWS_REGION>:<ACCOUNT_ID>:parameter/ecs/kubesense/otelcol-sidecar.yaml"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:<AWS_REGION>:<ACCOUNT_ID>:log-group:/ecs/otel-collector:*"
    }
  ]
}

Step 5: Update ECS Task Role

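The ECS Task Role (not the execution role) is the role assumed by code running inside your containers. ECS itself uses the execution role from Step 4 to inject the collector configuration and write container logs, so the task role needs access to SSM Parameter Store and CloudWatch Logs only if your application or sidecar calls those services directly.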

Option 1: Attach Managed Policies

Attach the same managed policies as above:

  • AmazonSSMReadOnlyAccess
  • CloudWatchLogsFullAccess

Option 2: Use Minimal Inline Policy

For tighter security, use a minimal inline policy for just the required resources:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:GetParameter",
      "Resource": "arn:aws:ssm:<AWS_REGION>:<ACCOUNT_ID>:parameter/ecs/kubesense/otelcol-sidecar.yaml"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:<AWS_REGION>:<ACCOUNT_ID>:log-group:/ecs/otel-collector:*"
    }
  ]
}

Step 6: Deploy Task Definition

Deploy your updated ECS task definition as follows:

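  1. Register the modified task definition as a new revision and update your ECS service to use it
  2. Let the deployment replace the running tasks; existing tasks keep the old definition until they are replaced
  3. Monitor CloudWatch Logs at /ecs/otel-collector to verify the collector is running and receiving telemetry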

Verify Setup

Check CloudWatch Logs for the collector container (/ecs/otel-collector) to confirm:

  • Collector starts successfully
  • Receives traces and metrics from your application
  • Successfully exports to KubeSense
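
If the application is not yet emitting telemetry, a hand-built span can exercise the trace pipeline end to end. The following Python sketch builds a single span in the OTLP JSON encoding and posts it to the collector's OTLP HTTP receiver at /v1/traces. It has to run inside the task, for example from the application container, because localhost:4318 is only reachable there. The function names, span name, and scope name are hypothetical.

```python
import json
import secrets
import time
import urllib.request


def build_test_span(service_name: str,
                    span_name: str = "kubesense-smoke-test") -> dict:
    """Build an OTLP/JSON trace payload containing one short span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),
                    "endTimeUnixNano": str(now),
                }],
            }],
        }]
    }


def send_test_span(payload: dict,
                   endpoint: str = "http://localhost:4318") -> int:
    """POST the payload to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        f"{endpoint}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_test_span(build_test_span("<SERVICE_NAME>")) from inside the task should return a 2xx status if the receiver accepted the span. Whether the span then reaches KubeSense still has to be confirmed in the collector logs or the platform itself.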

Troubleshooting Installation

Common Issues

Task Not Starting:

  • Check ECS cluster has available capacity
  • Verify the container image can be pulled from the registry
  • Review CloudWatch logs for the failed tasks

Parameter Store Access Issues:

  • Ensure the task execution role has the ssm:GetParameters permission on the parameter
  • Verify the parameter name matches exactly: /ecs/kubesense/otelcol-sidecar.yaml
  • Check the parameter is in the same region as your ECS cluster

Container Health Check Failures:

  • Verify the health check endpoint is accessible (the health_check extension listens on port 13133 by default)
  • Check container logs for any startup errors
  • Ensure proper port mappings are configured (4317 and 4318)