Monitor Pipeline Health
The Telemetry module is designed to be reliable and resilient. However, there may be situations in which the OTel Collector instances drop data or cannot handle the load, and you must take action.
Overview
The Telemetry module automatically handles temporary issues to prevent data loss and ensure that the OTel Collector instances of your pipelines are operational and healthy. For example, if your backend is temporarily unavailable, the module buffers your data and attempts to resend it when the connection is restored.
The Telemetry module continuously monitors the health of your pipelines (see Self Monitor). To ensure that your Telemetry pipelines operate reliably, you can monitor their health data in the following ways:
- Perform manual checks by inspecting the status conditions of your pipeline resources with `kubectl`.
- Set up continuous monitoring by using a MetricPipeline to export health metrics to your observability backend, where you can set up dashboards and alerts.
Check Pipeline Status
For a quick check, you can inspect the status of a pipeline resource directly.
1. Run `kubectl get` for the pipeline that you want to inspect:

   - For LogPipeline: `kubectl get logpipeline <your-pipeline-name>`
   - For TracePipeline: `kubectl get tracepipeline <your-pipeline-name>`
   - For MetricPipeline: `kubectl get metricpipeline <your-pipeline-name>`
2. Review the output. A healthy pipeline shows `True` for all status conditions:

   ```txt
   NAME      CONFIGURATION GENERATED   GATEWAY HEALTHY   FLOW HEALTHY
   backend   True                      True              True
   ```

   If any condition is `False`, investigate the problem and fix it.
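To inspect the individual status conditions behind these summary columns, you can also print the full resource, for example with `kubectl get logpipeline <your-pipeline-name> -o yaml`. The following excerpt is only an illustrative sketch of how the conditions may appear; the exact condition types, reasons, and messages depend on the pipeline type and module version:

```yaml
# Illustrative excerpt of a pipeline status; values are examples, not actual output.
status:
  conditions:
    - type: ConfigurationGenerated   # the Collector configuration was generated successfully
      status: "True"
    - type: GatewayHealthy           # the gateway instances are up and running
      status: "True"
    - type: TelemetryFlowHealthy     # data is flowing to the backend
      status: "True"
```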
To understand the meaning of each status condition, see the detailed reference for each pipeline type: LogPipeline, TracePipeline, and MetricPipeline.
Set Up Health Monitoring and Alerts
For production environments, set up continuous monitoring by exporting the health metrics to your observability backend, where you can create dashboards and configure alerts based on alert rules. For an example, see Integrate With SAP Cloud Logging.
WARNING
Do not scrape the metrics endpoint of the OpenTelemetry Collector instances. These metrics are an internal implementation detail and are subject to breaking changes when the underlying Collector is updated. For stable health monitoring, rely on the status conditions of your LogPipeline, MetricPipeline, or TracePipeline custom resources.
To collect these health metrics, you must have at least one active MetricPipeline in your cluster. This pipeline automatically collects and exports health data for all of your pipelines, including LogPipeline and TracePipeline resources.
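If you do not yet have a MetricPipeline, a minimal one that only defines an OTLP output is typically enough to start exporting the health metrics. The following is a sketch, not a complete configuration: the endpoint is a placeholder, the exact spec fields depend on your module version, and your backend may require additional authentication settings (see the MetricPipeline reference):

```yaml
# Minimal MetricPipeline sketch; verify the fields against your module version.
apiVersion: telemetry.kyma-project.io/v1alpha1
kind: MetricPipeline
metadata:
  name: backend
spec:
  output:
    otlp:
      endpoint:
        value: https://backend.example.com:4317   # placeholder; replace with your OTLP-compatible backend
```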
The Telemetry module emits the following metrics for health monitoring:
- `kyma.resource.status.conditions`: Represents the status of a specific condition on a resource. It is available for all pipelines and the main `Telemetry` resource. Values: `1` ("True"), `0` ("False"), or `-1` ("Unknown"). Specific attributes:
  - `metric.attributes.type`: The type of the status condition
  - `metric.attributes.status`: The status of the condition
  - `metric.attributes.reason`: A programmatic identifier indicating the reason for the condition's last transition
- `kyma.resource.status.state`: Represents the overall state of the main `Telemetry` resource. Values: `1` ("Ready") or `0` ("Not Ready"). Specific attributes:
  - `state`: The value of the `status.state` field

Additionally, the following attributes are attached to all health metrics to identify the source resource:

- `k8s.resource.group`: The group of the resource
- `k8s.resource.version`: The version of the resource
- `k8s.resource.kind`: The kind of the resource
- `k8s.resource.name`: The name of the resource
To create an alert, define a rule that triggers on a specific metric value. For example, to create an alert that fires if a pipeline's TelemetryFlowHealthy condition becomes "False" (indicating data flow issues), use the following PromQL query:
```promql
min by (k8s_resource_name) ((kyma_resource_status_conditions{type="TelemetryFlowHealthy",k8s_resource_kind="metricpipelines"})) == 0
```

If there are issues with one of the pipelines, see Troubleshooting for the Telemetry Module.
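To turn this query into an actual alert, wrap it in an alerting rule in your backend. The following sketch assumes a Prometheus-compatible backend managed with the Prometheus Operator; the PrometheusRule kind, rule name, duration, and labels are assumptions and not part of the Telemetry module:

```yaml
# Sketch of an alerting rule for a Prometheus-compatible backend
# (assumes the Prometheus Operator's PrometheusRule CRD; names and durations are examples).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: telemetry-pipeline-health
spec:
  groups:
    - name: telemetry-pipelines
      rules:
        - alert: MetricPipelineFlowUnhealthy
          # Fires when the TelemetryFlowHealthy condition of any MetricPipeline is "False".
          expr: |
            min by (k8s_resource_name) (kyma_resource_status_conditions{type="TelemetryFlowHealthy",k8s_resource_kind="metricpipelines"}) == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Telemetry data flow is unhealthy for pipeline {{ $labels.k8s_resource_name }}"
```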