Nomad Autoscaler Telemetry
The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and are retained for one minute. To configure the telemetry output, see the agent configuration.
This data can be accessed via the /v1/metrics HTTP endpoint, by sending a signal to the Nomad Autoscaler process, or via a number of integrations.
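As a rough illustration of the HTTP route, the following Python sketch requests /v1/metrics and extracts gauge values. The agent address (`http://127.0.0.1:8080`), the JSON response body, and the `Gauges`, `Name`, and `Value` keys are assumptions not stated on this page, and the helper names are hypothetical; check them against your agent before relying on them.

```python
import json
from urllib.request import urlopen

# Assumption: the agent's HTTP API listens on this address; adjust to your setup.
AGENT_ADDR = "http://127.0.0.1:8080"


def metrics_url(agent_addr: str = AGENT_ADDR) -> str:
    """Build the URL of the /v1/metrics endpoint for a given agent address."""
    return agent_addr.rstrip("/") + "/v1/metrics"


def fetch_metrics(agent_addr: str = AGENT_ADDR, timeout: float = 5.0) -> dict:
    """Fetch the current metrics and decode the response body as JSON.

    Assumption: the endpoint answers with a JSON document.
    """
    with urlopen(metrics_url(agent_addr), timeout=timeout) as resp:
        return json.load(resp)


def gauges_by_name(summary: dict) -> dict:
    """Map gauge names to their values.

    Assumption: the payload has a "Gauges" list whose entries carry
    "Name" and "Value" keys; verify this against your agent's output.
    """
    return {g["Name"]: g["Value"] for g in summary.get("Gauges") or []}
```

With a running agent, `gauges_by_name(fetch_metrics())` would return a dictionary keyed by metric name, provided the payload has the assumed shape.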
To view this data by sending a signal to the Nomad Autoscaler process: on Unix, this is USR1, while on Windows it is BREAK. Once Nomad Autoscaler receives the signal, it will dump the current telemetry information to the agent's stderr.
This telemetry information can be used for debugging or otherwise getting a better view of what the Nomad Autoscaler is doing.
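The signal-based route can be scripted. The Python sketch below picks the signal name per platform and, on Unix only, sends USR1 to a given process ID. The function names are hypothetical, and the PID of the Nomad Autoscaler process is something you need to look up yourself. The Windows case is deliberately left unimplemented because delivering BREAK from another process works differently there.

```python
import os
import signal
import sys


def telemetry_signal_name(platform: str = sys.platform) -> str:
    """Return the name of the signal that triggers a telemetry dump."""
    # BREAK on Windows, USR1 on Unix-like systems.
    return "SIGBREAK" if platform.startswith("win") else "SIGUSR1"


def request_telemetry_dump(pid: int) -> None:
    """Ask the process with the given PID to dump telemetry to its stderr.

    This sketch only covers Unix; it refuses to run on Windows rather
    than risk terminating the target process.
    """
    if telemetry_signal_name() != "SIGUSR1":
        raise NotImplementedError("this sketch only covers Unix-like systems")
    os.kill(pid, signal.SIGUSR1)
```

The dump is written to the agent's stderr, not returned to the caller, so look for it wherever that stream is captured.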
Below is sample output of a telemetry dump:
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
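Dump lines like those above can be parsed for ad hoc analysis. The sketch below handles only lines marked `[G]`, which appear to be gauges given the metric types listed on this page, and skips everything else, such as the `[S]` line. The regular expression is inferred from the sample output alone and is not a documented format, so treat it as an assumption.

```python
import re
from typing import Optional, Tuple

# Matches lines shaped like the sample above, for example:
# [2020-08-25 10:01:20 +0100 BST][G] 'some.metric.name': 12.000
GAUGE_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[G\] '(?P<name>[^']+)': (?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_gauge(line: str) -> Optional[Tuple[str, str, float]]:
    """Return (timestamp, metric name, value) for a [G] line, else None."""
    match = GAUGE_LINE.match(line.strip())
    if match is None:
        return None
    return match["ts"], match["name"], float(match["value"])
```

Applied to the `num_goroutines` line from the first interval, `parse_gauge` returns the timestamp string, the full metric name, and `12.0`.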
Runtime Metrics
The runtime metrics help you understand the memory usage and load pressure of the Nomad Autoscaler agent.
Metric | Description | Type |
---|---|---|
nomad-autoscaler.runtime.num_goroutines | Number of running goroutines | Gauge |
nomad-autoscaler.runtime.alloc_bytes | The number of allocated heap bytes | Gauge |
nomad-autoscaler.runtime.sys_bytes | The total bytes of memory obtained from the OS | Gauge |
nomad-autoscaler.runtime.malloc_count | Cumulative count of heap objects allocated | Gauge |
nomad-autoscaler.runtime.free_count | Cumulative count of heap objects freed | Gauge |
nomad-autoscaler.runtime.heap_objects | Number of allocated heap objects | Gauge |
nomad-autoscaler.runtime.total_gc_pause_ns | Cumulative nanoseconds in GC stop-the-world pauses | Gauge |
nomad-autoscaler.runtime.total_gc_runs | Number of completed GC cycles | Gauge |
nomad-autoscaler.runtime.gc_pause_ns | Number of nanoseconds to complete the last GC cycle | Timer |
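Some of these gauges can be cross-checked against each other. The sketch below uses values from the sample telemetry dump on this page to show that the live heap object count equals allocations minus frees, and that the cumulative GC pause divided by the number of GC runs gives the mean pause. The function names are hypothetical helpers, not part of any Nomad Autoscaler API.

```python
def live_heap_objects(malloc_count: int, free_count: int) -> int:
    """Live heap objects: cumulative allocations minus cumulative frees."""
    return malloc_count - free_count


def mean_gc_pause_ns(total_gc_pause_ns: float, total_gc_runs: int) -> float:
    """Average stop-the-world pause per completed GC cycle, in nanoseconds."""
    if total_gc_runs == 0:
        return 0.0  # no GC cycle has completed yet
    return total_gc_pause_ns / total_gc_runs


# Values taken from the first interval of the sample dump.
print(live_heap_objects(219856, 183613))  # 36243, matching heap_objects
print(mean_gc_pause_ns(348822, 5))        # 69764.4, matching the sample's Mean
```

A `heap_objects` value that keeps rising across intervals while `free_count` stays flat, as in the two sample intervals, only means that no GC cycle ran in between.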
Policy Metrics
Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.policy.total_num | The number of policies currently held within the autoscaler | Gauge | |
nomad-autoscaler.policy.source.error_count | Tracks the number of errors generated by the policy sources | Counter | policy_source |
Scaling Metrics
Scaling metrics provide insight into the performance of scaling actions as well as overall success and failure counters.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.scale.evaluate_ms | The time taken to evaluate the checks within a single policy | Timer | policy_id, target_name |
nomad-autoscaler.scale.invoke_ms | The time taken to invoke scaling based on the scaling evaluations | Timer | policy_id, target_name |
nomad-autoscaler.scale.invoke.success_count | Tracks the number of successful scaling actions triggered | Counter | |
nomad-autoscaler.scale.invoke.error_count | Tracks the number of unsuccessful scaling actions triggered | Counter | |
Plugin Metrics
Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.plugin.manager.access_ms | The time taken to dispense a plugin | Timer | |
nomad-autoscaler.target.status.invoke_ms | The time taken to perform the target plugin status call | Timer | policy_id, plugin_name |
nomad-autoscaler.target.scale.invoke_ms | The time taken to perform the target plugin scale call | Timer | policy_id, plugin_name |
nomad-autoscaler.apm.query.invoke_ms | The time taken to perform the APM plugin query call | Timer | policy_id, plugin_name |
nomad-autoscaler.strategy.run.invoke_ms | The time taken to perform the strategy plugin run call | Timer | policy_id, plugin_name |