Temporal Cloud metrics reference

A metric is a measurement or data point that provides insights into the performance and health of a system. This document describes the metrics available on the Temporal Cloud platform. Temporal Cloud metrics help you monitor performance and troubleshoot errors. They provide insights into different aspects of the Service.

This guide covers these topics:

Gather metrics: Capture Temporal Cloud metrics using a Prometheus-style endpoint so they can be visualized using Prometheus and Grafana or exported to observability platforms like Datadog.
Available Temporal Cloud metrics: The metrics emitted by Temporal Cloud include counts of gRPC errors, requests, successful task matches to a poller, and more.
Metrics labels: Temporal Cloud metrics labels can filter metrics and help categorize and differentiate results.
Operations: An operation is a special type of label that categorizes the type of operation being performed when the metric was collected.

SDK METRICS

This document discusses metrics emitted by Temporal Cloud. Temporal SDKs also emit metrics, sourced from Temporal Clients and Worker processes. You can find information about Temporal SDK metrics on its dedicated page.

Please note:

SDK metrics start with the phrase temporal_.
Temporal Cloud metrics start with temporal_cloud_.

Gather metrics

How do you capture and use Temporal Cloud metrics?

Temporal Cloud emits metrics in a Prometheus-supported format. Prometheus is an open-source toolkit for alerting and monitoring. The Temporal Service exposes Cloud metrics with a Prometheus HTTP API endpoint. Temporal Cloud metrics provide a compatible data source for visualizing, monitoring, and observability platforms like Grafana and Data Dog.

You can use functions like rate or increase to calculate the rate of increase for a Temporal Cloud metric:

rate(temporal_cloud_v0_frontend_service_request_count[$__rate_interval])

Or you might use Prometheus to calculate average latencies or histogram quartiles:

# Average latency
rate(temporal_cloud_v0_service_latency_sum[$__rate_interval])
/ rate(temporal_cloud_v0_service_latency_count[$__rate_interval])

# Approximate 99th percentile latency broken down by operation
histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[$__rate_interval])) by (le, operation))

Set up Grafana with Temporal Cloud observability to view metrics by creating or getting your Prometheus endpoint for Temporal Cloud metrics and enabling SDK metrics.

Related 📚

How to set up Grafana with Temporal Cloud observabilityfeature-guide
How to monitor Worker Health with Temporal Cloud Metricsfeature-guide
How to monitor Service Health with Temporal Cloud Metricsfeature-guide

Available Temporal Cloud metrics

What metrics are emitted from Temporal Cloud?

The following metrics are emitted for your Namespaces:

Frontend Service metrics

temporal_cloud_v0_frontend_service_error_count

This is a count of gRPC errors returned aggregated by operation.

temporal_cloud_v0_frontend_service_request_count

This is a count of gRPC requests received aggregated by operation.

temporal_cloud_v0_resource_exhausted_error_count

gRPC requests received that were rate-limited by Temporal Cloud, aggregated by cause.

temporal_cloud_v0_state_transition_count

Count of state transitions for each Namespace.

temporal_cloud_v0_total_action_count

Approximate count of Temporal Cloud Actions.

Poll metrics

temporal_cloud_v0_poll_success_count

Tasks that are successfully matched to a poller.

temporal_cloud_v0_poll_success_sync_count

Tasks that are successfully sync matched to a poller.

temporal_cloud_v0_poll_timeout_count

When no tasks are available for a poller before timing out.

Replication lag metrics

temporal_cloud_v0_replication_lag_bucket

A histogram of replication lag during a specific time interval for a multi-region Namespace.

temporal_cloud_v0_replication_lag_count

The replication lag count during a specific time interval for a multi-region Namespace.

temporal_cloud_v0_replication_lag_sum

The sum of replication lag during a specific time interval for a multi-region Namespace.

Schedule metrics

temporal_cloud_v0_schedule_action_success_count

Successful execution of a Scheduled Workflow.

temporal_cloud_v0_schedule_buffer_overruns_count

When average schedule run length is greater than average schedule interval while a buffer_all overlap policy is configured.

temporal_cloud_v0_schedule_missed_catchup_window_count

Skipped Scheduled executions when Workflows were delayed longer than the catchup window.

temporal_cloud_v0_schedule_rate_limited_count

Workflows that were delayed due to exceeding a rate limit.

Service latency metrics

temporal_cloud_v0_service_latency_bucket

Latency for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

temporal_cloud_v0_service_latency_count

Count of latency observations for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

temporal_cloud_v0_service_latency_sum

Sum of latency observation time for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

Workflow metrics

temporal_cloud_v0_workflow_cancel_count

Workflows canceled before completing execution.

temporal_cloud_v0_workflow_continued_as_new_count

Workflow Executions that were Continued-As-New from a past execution.

temporal_cloud_v0_workflow_failed_count

Workflows that failed before completion.

temporal_cloud_v0_workflow_success_count

Workflows that successfully completed.

temporal_cloud_v0_workflow_terminate_count

Workflows terminated before completing execution.

temporal_cloud_v0_workflow_timeout_count

Workflows that timed out before completing execution.

Metrics labels

What labels can you use to filter metrics?

Temporal Cloud metrics include key-value pairs called labels in their associated metadata. Labels help you categorize and differentiate metrics for precise filtering, querying, and aggregation. Use labels to specific attributes or compare values, such as numeric buckets in histograms. This added context enhances the monitoring and analysis capabilities, providing deeper insights into your data.

Metrics for all Namespaces in your account are available from the metrics endpoint. Use the following labels to filter metrics:

Label	Explanation
`le`	Less than or equal to (`le`) is used in histograms to categorize observations into buckets based on their value being less than or equal to a predefined upper limit.
`operation`	This includes operations such as: SignalWorkflowExecution StartBatchOperation StartWorkflowExecution TaskQueueMgr TerminateWorkflowExecution UpdateNamespace UpdateSchedule See: Metric Operations
`resource_exhausted_cause`	Cause for resource exhaustion.
`task_type`	Activity or Workflow.
`temporal_account`	Temporal Account.
`temporal_namespace`	Temporal Namespace.
`temporal_service_type`	Frontend or Matching or History or Worker.
`is_background`	This label on `temporal_cloud_v0_total_action_count` indicates when actions are produced by a Temporal background job, for example: hourly Workflow Export.
`namespace_mode`	This label on `temporal_cloud_v0_total_action_count` indicates if actions are produced by an active vs a standby Namespace. For a regular Namespace, `namespace_mode` will always be “active”.

The following is an example of how you can filter metrics using labels:

temporal_cloud_v0_poll_success_count{__rollup__="true", operation="TaskQueueMgr", task_type="Activity", temporal_account="12345", temporal_namespace="your_namespace.12345", temporal_service_type="matching"}

Operations

What operation labels are captured by Temporal Cloud?

Operations are a special class of metrics label. They describe the context during which a metric was captured. Temporal Cloud includes the following operations labels:

AdminDescribeMutableState
AdminGetWorkflowExecutionRawHistory
AdminGetWorkflowExecutionRawHistoryV2
AdminReapplyEvents
CountWorkflowExecutions
CreateSchedule
DeleteSchedule
DeleteWorkflowExecution
DescribeBatchOperation
DescribeNamespace
DescribeSchedule
DescribeTaskQueue
DescribeWorkflowExecution
GetWorkerBuildIdCompatibility
GetWorkerTaskReachability
GetWorkflowExecutionHistory
GetWorkflowExecutionHistoryReverse
ListBatchOperations
ListClosedWorkflowExecutions
OperatorDeleteNamespace
PatchSchedule
PollActivityTaskQueue
PollWorkflowExecutionHistory
PollWorkflowExecutionUpdate
PollWorkflowTaskQueue
QueryWorkflow
RecordActivityTaskHeartbeat
RecordActivityTaskHeartbeatById
RegisterNamespace
RequestCancelWorkflowExecution
ResetStickyTaskQueue
ResetWorkflowExecution
RespondActivityTaskCanceled
RespondActivityTaskCompleted
RespondActivityTaskCompletedById
RespondActivityTaskFailed
RespondActivityTaskFailedById
RespondQueryTaskCompleted
RespondWorkflowTaskCompleted
RespondWorkflowTaskFailed
SignalWithStartWorkflowExecution
SignalWorkflowExecution
StartBatchOperation
StartWorkflowExecution
StopBatchOperation
TerminateWorkflowExecution
UpdateNamespace
UpdateSchedule
UpdateWorkerBuildIdCompatibility
UpdateWorkflowExecution

As the following table shows, certain metrics groups support operations for aggregation and filtering:

Metrics Group / Operations	All Operations	SignalWithStartWorkflowExecution / SignalWorkflowExecution / StartWorkflowExecution	TaskQueueMgr	CompletionStats
Frontend Service Metrics	X
Service Latency Metrics		X
Poll Metrics			X
Workflow Metrics				X

Gather metrics​

Available Temporal Cloud metrics​

Frontend Service metrics​

temporal_cloud_v0_frontend_service_error_count​

temporal_cloud_v0_frontend_service_request_count​

temporal_cloud_v0_resource_exhausted_error_count​

temporal_cloud_v0_state_transition_count​

temporal_cloud_v0_total_action_count​

Poll metrics​

temporal_cloud_v0_poll_success_count​

temporal_cloud_v0_poll_success_sync_count​

temporal_cloud_v0_poll_timeout_count​

Replication lag metrics​

temporal_cloud_v0_replication_lag_bucket​

temporal_cloud_v0_replication_lag_count​

temporal_cloud_v0_replication_lag_sum​

Schedule metrics​

temporal_cloud_v0_schedule_action_success_count​

temporal_cloud_v0_schedule_buffer_overruns_count​

temporal_cloud_v0_schedule_missed_catchup_window_count​

temporal_cloud_v0_schedule_rate_limited_count​

Service latency metrics​

temporal_cloud_v0_service_latency_bucket​

temporal_cloud_v0_service_latency_count​

temporal_cloud_v0_service_latency_sum​

Workflow metrics​

temporal_cloud_v0_workflow_cancel_count​

temporal_cloud_v0_workflow_continued_as_new_count​

temporal_cloud_v0_workflow_failed_count​

temporal_cloud_v0_workflow_success_count​

temporal_cloud_v0_workflow_terminate_count​

temporal_cloud_v0_workflow_timeout_count​

Metrics labels​

Operations​