Monitor Pipeline Health

The Telemetry module is designed to be reliable and resilient. However, in some situations the pipeline instances drop data or cannot handle the load, and you must take action.

Overview

The Telemetry module automatically handles temporary issues to prevent data loss and ensure that the OTel Collector instances of your pipelines are operational and healthy. For example, if your backend is temporarily unavailable, the module buffers your data and attempts to resend it when the connection is restored.

The Telemetry module continuously monitors the health of your pipelines (see Self Monitor). To ensure that your Telemetry pipelines operate reliably, you can monitor their health data in the following ways:

Perform manual checks by inspecting the status conditions of your pipeline resources with kubectl.
Set up continuous monitoring by using a MetricPipeline to export health metrics to your observability backend, where you can set up dashboards and alerts.

Check Pipeline Status

For a quick check, you can inspect the status of a pipeline resource directly.

Run kubectl get for the pipeline that you want to inspect:
- For LogPipeline: kubectl get logpipeline <your-pipeline-name>
- For TracePipeline: kubectl get tracepipeline <your-pipeline-name>
- For MetricPipeline: kubectl get metricpipeline <your-pipeline-name>

Review the output. A healthy pipeline shows True for all status conditions.

NAME      CONFIGURATION GENERATED   GATEWAY HEALTHY   FLOW HEALTHY
backend   True                      True              True

If any condition is False, investigate the problem and fix it.
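
For scripted checks, you can read the same conditions in machine-readable form. The following is a minimal sketch, not part of the module: the helper name pipeline_is_healthy is hypothetical, and it assumes that the pipeline exposes its conditions under .status.conditions, which is the usual Kubernetes convention.

```shell
# Hypothetical helper: succeeds only if every status condition of a pipeline is "True".
# Assumes the conditions are exposed under .status.conditions.
# Usage: pipeline_is_healthy <kind> <name>
pipeline_is_healthy() {
  # Print one "<type>=<status>" line per condition.
  conditions=$(kubectl get "$1" "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}') || return 2

  # Treat a resource that reports no conditions as unhealthy.
  [ -n "$conditions" ] || return 1

  # grep -v prints every condition that is not True and succeeds if it found one.
  if printf '%s\n' "$conditions" | grep -v '=True$'; then
    return 1
  fi
  return 0
}
```

For example, pipeline_is_healthy logpipeline backend prints nothing and returns 0 when all conditions are True. Otherwise, it prints the conditions that are False or Unknown and returns 1.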

To understand the meaning of each status condition, see the detailed reference for each pipeline type (LogPipeline, TracePipeline, and MetricPipeline).

Set Up Health Monitoring and Alerts

For production environments, set up continuous monitoring by exporting the health metrics to your observability backend, where you can create dashboards and configure alerts using alert rules. For an example, see Integrate With SAP Cloud Logging.

WARNING

Do not scrape the metrics endpoint of the OpenTelemetry Collector instances. These metrics are an internal implementation detail and are subject to breaking changes when the underlying Collector is updated. For stable health monitoring, rely on the status conditions of your LogPipeline, MetricPipeline, or TracePipeline custom resources.

To collect these health metrics, you must have at least one active MetricPipeline in your cluster. This pipeline automatically collects and exports health data for all of your pipelines, including LogPipeline and TracePipeline resources.
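
If your cluster has no MetricPipeline yet, a minimal one with an OTLP output could look like the following sketch. The helper name, the pipeline name, and the endpoint are placeholders, and the API version telemetry.kyma-project.io/v1alpha1 is an assumption: check which version your cluster serves before you apply it.

```shell
# Hypothetical helper: applies a minimal MetricPipeline that ships metrics to an OTLP endpoint.
# Usage: create_health_metric_pipeline <pipeline-name> <otlp-endpoint>
create_health_metric_pipeline() {
  name="$1"
  endpoint="$2"
  # The API version below is an assumption; adjust it to the version served in your cluster.
  kubectl apply -f - <<EOF
apiVersion: telemetry.kyma-project.io/v1alpha1
kind: MetricPipeline
metadata:
  name: ${name}
spec:
  output:
    otlp:
      endpoint:
        value: ${endpoint}
EOF
}
```

For example, create_health_metric_pipeline backend https://backend.example.com:4317 creates a pipeline named backend, where the URL stands for the OTLP endpoint of your own backend.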

The Telemetry module emits the following metrics for health monitoring:

kyma.resource.status.conditions: Represents the status of a specific condition on a resource. It is available for all pipelines and the main Telemetry resource.
Values: 1 ("True"), 0 ("False"), or -1 ("Unknown").
Specific attributes:
- metric.attributes.type: The type of the status condition
- metric.attributes.status: The status of the condition
- metric.attributes.reason: A programmatic identifier indicating the reason for the condition's last transition

kyma.resource.status.state: Represents the overall state of the main Telemetry resource.
Values: 1 ("Ready") or 0 ("Not Ready").
Specific attributes:
- state: The value of the status.state field

Additionally, the following attributes are attached to all health metrics to identify the source resource:
- k8s.resource.group: The group of the resource
- k8s.resource.version: The version of the resource
- k8s.resource.kind: The kind of the resource
- k8s.resource.name: The name of the resource

To create an alert, define a rule that triggers on a specific metric value. For example, to create an alert that fires if a pipeline's TelemetryFlowHealthy condition becomes "False" (indicating data flow issues), use the following PromQL query:

min by (k8s_resource_name) ((kyma_resource_status_conditions{type="TelemetryFlowHealthy",k8s_resource_kind="metricpipelines"})) == 0
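
If your backend evaluates Prometheus-style rule files, you can wrap the query in an alerting rule. The following sketch prints such a rule for a given resource kind. The helper name, group name, alert name, 5-minute duration, and severity label are illustrative assumptions, and kind values other than metricpipelines are assumed by analogy with the query above.

```shell
# Hypothetical helper: prints a Prometheus-style alerting rule for the
# TelemetryFlowHealthy condition of one pipeline kind.
# Usage: print_flow_health_rule <plural-kind>
print_flow_health_rule() {
  kind="$1"
  # \$labels is escaped so that the shell leaves the template variable untouched.
  cat <<EOF
groups:
  - name: telemetry-pipeline-health
    rules:
      - alert: TelemetryFlowUnhealthy
        expr: 'min by (k8s_resource_name) (kyma_resource_status_conditions{type="TelemetryFlowHealthy",k8s_resource_kind="${kind}"}) == 0'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Data flow of pipeline {{ \$labels.k8s_resource_name }} is unhealthy'
EOF
}
```

For example, print_flow_health_rule metricpipelines > flow-health-rule.yaml writes a rule file that you can adapt to the rule format of your backend.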

If there are issues with one of the pipelines, see Troubleshooting for the Telemetry Module.
