Vault
Raft telemetry
Raft telemetry provides information on Vault integrated storage.
Default metrics
vault.raft.apply
Metric type | Value | Description |
---|---|---|
counter | number | Number of transactions in the configured interval |
The vault.raft.apply metric is generally a good indicator of the write load on your raft internal storage.
vault.raft.barrier
Metric type | Value | Description |
---|---|---|
counter | number | Number of times the node started the barrier |
A node starts the barrier by issuing a blocking call when it wants to ensure that all pending operations that need to be applied to the finite state machine are properly queued.
vault.raft.candidate.electSelf
Metric type | Value | Description |
---|---|---|
summary | ms | Time required for a node to send a vote request to a peer |
vault.raft.commitNumLogs
Metric type | Value | Description |
---|---|---|
gauge | number | Number of logs processed for application to the finite state machine in a single batch |
vault.raft.commitTime
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to commit a new entry to the raft log on the leader node |
vault.raft.compactLogs
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to trim unnecessary logs |
vault.raft.fsm.apply
Metric type | Value | Description |
---|---|---|
summary | number | Number of logs committed by the finite state machine since the last interval |
vault.raft.fsm.applyBatch
Metric type | Value | Description |
---|---|---|
summary | ms | Time required by the finite state machine to apply the most recent batch of logs |
vault.raft.fsm.applyBatchNum
Metric type | Value | Description |
---|---|---|
counter | number | Number of logs applied in the most recent batch |
vault.raft.fsm.enqueue
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to queue up a batch of logs for the finite state machine to apply |
vault.raft.fsm.restore
Metric type | Value | Description |
---|---|---|
summary | ms | Time required by the finite state machine to complete a restore operation from a snapshot |
vault.raft.fsm.snapshot
Metric type | Value | Description |
---|---|---|
summary | ms | Time required by the finite state machine to record state information for the current snapshot |
vault.raft.fsm.store_config
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to store the most recent raft configuration |
vault.raft.get
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to retrieve an entry from underlying storage |
vault.raft.list
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to retrieve a list of keys from underlying storage |
vault.raft.peers
Metric type | Value | Description |
---|---|---|
gauge | number | The number of peers in the raft cluster configuration |
vault.raft.restore
Metric type | Value | Description |
---|---|---|
counter | number | Number of times that the node performed a restore operation |
In the context of raft storage, a restore operation refers to the process where raft consumes an external snapshot to restore its state.
vault.raft.restoreUserSnapshot
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to restore the finite state machine from a user snapshot |
vault.raft.rpc.appendEntries
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to process a remote appendEntries call from a node |
vault.raft.rpc.appendEntries.processLogs
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to completely process the outstanding logs for the given node |
vault.raft.rpc.appendEntries.storeLogs
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to record any outstanding logs since the last request to append entries for the given node |
vault.raft.rpc.installSnapshot
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report vault.raft.rpc.installSnapshot metrics.
vault.raft.rpc.processHeartbeat
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to process a heartbeat request |
vault.raft.rpc.requestVote
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a requestVote call |
vault.raft.snapshot.create
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to capture a new snapshot |
vault.raft.snapshot.persist
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to record snapshot meta information to disk while taking snapshots |
vault.raft.snapshot.takeSnapshot
Metric type | Value | Description |
---|---|---|
timer | ms | Total time required to create and persist the current snapshot |
In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to vault.raft.snapshot.create + vault.raft.snapshot.persist.
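As a rough illustration of that relationship, the sketch below checks whether three observed timings are consistent with one another. The function name and the 10% tolerance are assumptions chosen for the example. They are not part of Vault.

```python
import math


def snapshot_timing_consistent(take_ms, create_ms, persist_ms, rel_tol=0.10):
    """Return True when takeSnapshot is within rel_tol of create + persist.

    rel_tol is an assumed tolerance; tune it for your environment.
    """
    return math.isclose(take_ms, create_ms + persist_ms, rel_tol=rel_tol)


# Hypothetical timings in milliseconds.
print(snapshot_timing_consistent(1050, 700, 300))
print(snapshot_timing_consistent(2000, 700, 300))
```

A large, persistent gap between the total and the sum of the two parts may be worth investigating, although the threshold you choose is a judgment call.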
vault.raft.state.candidate
Metric type | Value | Description |
---|---|---|
counter | number | Number of times the raft server initiated an election |
vault.raft.state.follower
Metric type | Value | Description |
---|---|---|
summary | number | Number of times in the configured interval that the raft server became a follower |
Nodes transition to follower state under the following conditions:
- when the node joins the cluster
- when a leader is elected, but the node was not elected leader
vault.raft.state.leader
Metric type | Value | Description |
---|---|---|
counter | number | Number of times the raft server became a leader |
vault.raft.transition.heartbeat_timeout
Metric type | Value | Description |
---|---|---|
summary | number | Number of times that the node transitioned to candidate state after not receiving a heartbeat message from the last known leader |
vault.raft.transition.leader_lease_timeout
Metric type | Value | Description |
---|---|---|
counter | number | The number of times the leader could not contact a quorum of nodes and therefore stepped down |
vault.raft.verify_leader
Metric type | Value | Description |
---|---|---|
counter | number | Number of times in the configured interval that the node confirmed it is still the leader |
Autopilot metrics
Note
Autopilot only runs on the active node, so autopilot metrics are only captured for the current active node.
vault.autopilot.failure_tolerance
Metric type | Value | Description |
---|---|---|
gauge | nodes | The number of healthy nodes in excess of quorum |
The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
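The arithmetic behind this gauge can be sketched as follows. This is a minimal illustration that assumes a simple majority quorum over voting nodes. The function names are hypothetical and are not part of Vault's API.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of `voters` (assumption: simple majority quorum)."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int, healthy: int) -> int:
    """Healthy nodes in excess of quorum, floored at zero."""
    if not 0 <= healthy <= voters:
        raise ValueError("healthy must be between 0 and voters")
    return max(healthy - quorum_size(voters), 0)


# A 5-node cluster with all nodes healthy can lose 2 nodes and keep quorum.
print(failure_tolerance(voters=5, healthy=5))  # 2
# With only 3 of 5 nodes healthy, no further failure can be tolerated.
print(failure_tolerance(voters=5, healthy=3))  # 0
```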
vault.autopilot.healthy
Metric type | Value | Description |
---|---|---|
gauge | boolean | Indicates whether all nodes are healthy |
- A value of 1 on the gauge means that Autopilot deems all nodes healthy.
- A value of 0 on the gauge means that Autopilot deems at least 1 node unhealthy.
vault.autopilot.node.healthy
Metric type | Value | Description |
---|---|---|
gauge | boolean | Indicates whether the active node is healthy |
- A value of 1 on the gauge means that Autopilot deems the node indicated by node_id healthy.
- A value of 0 on the gauge means that Autopilot cannot communicate with the node indicated by node_id, or deems the node unhealthy.
Leadership change metrics
Leadership change metrics indicate the overall performance of the integrated storage on raft servers and the network connection between raft nodes.
vault.raft.leader.dispatchLog
Metric type | Value | Description |
---|---|---|
timer | ms | Time required for the leader node to write a log entry to disk |
vault.raft.leader.dispatchNumLogs
Metric type | Value | Description |
---|---|---|
gauge | number | Number of logs committed to disk in the most recent batch |
vault.raft.leader.lastContact
Metric type | Value | Description |
---|---|---|
summary | ms | Time since the leader was last able to contact the follower nodes when checking its leader lease |
Raft replication metrics
vault.raft.replication.appendEntries.log
Metric type | Value | Description |
---|---|---|
summary | number | Number of logs replicated to a node to establish parity with leader logs |
vault.raft.replication.appendEntries.rpc
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to replicate leader node log entries to all follower nodes with appendEntries |
vault.raft.replication.heartbeat
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to invoke appendEntries on a peer so the peer does not time out |
vault.raft.replication.installSnapshot
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report vault.raft.replication.installSnapshot metrics.
Storage metrics
vault.raft_storage.bolt.cursor.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of cursors created in the Bolt database |
vault.raft_storage.bolt.freelist.allocated_bytes
Metric type | Value | Description |
---|---|---|
gauge | bytes | Total space allocated for the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.free_pages
Metric type | Value | Description |
---|---|---|
gauge | number | Number of free pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.pending_pages
Metric type | Value | Description |
---|---|---|
gauge | number | Number of pending pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.used_bytes
Metric type | Value | Description |
---|---|---|
gauge | bytes | Total space used by the freelist for the Bolt database |
vault.raft_storage.bolt.node.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of node allocations for the Bolt database |
vault.raft_storage.bolt.node.dereferences
Metric type | Value | Description |
---|---|---|
gauge | number | Total number of node dereferences by the Bolt database |
vault.raft_storage.bolt.page.bytes_allocated
Metric type | Value | Description |
---|---|---|
gauge | bytes | Total space allocated to the Bolt database |
vault.raft_storage.bolt.page.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of page allocations in the Bolt database |
vault.raft_storage.bolt.rebalance.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of node rebalances performed by the Bolt database |
vault.raft_storage.bolt.rebalance.time
Metric type | Value | Description |
---|---|---|
summary | ms | Time required by the Bolt database to rebalance nodes |
vault.raft_storage.bolt.spill.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of nodes spilled by the Bolt database |
vault.raft_storage.bolt.spill.time
Metric type | Value | Description |
---|---|---|
summary | ms | Total time spent spilling by the Bolt database |
vault.raft_storage.bolt.split.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of nodes split by the Bolt database |
vault.raft_storage.bolt.transaction.currently_open_read_transactions
Metric type | Value | Description |
---|---|---|
gauge | number | Number of in-process read transactions for the Bolt DB |
vault.raft_storage.bolt.transaction.started_read_transactions
Metric type | Value | Description |
---|---|---|
gauge | number | Number of read transactions started by the Bolt DB |
vault.raft_storage.bolt.write.count
Metric type | Value | Description |
---|---|---|
gauge | number | Number of writes performed by the Bolt database |
vault.raft_storage.bolt.write.time
Metric type | Value | Description |
---|---|---|
counter | ms | Total cumulative time the Bolt database has spent writing to disk. |
vault.raft_storage.follower.applied_index_delta
Metric type | Value | Description |
---|---|---|
gauge | number | The difference between the index applied by the leader and the index applied by the follower as reported by echoes |
vault.raft_storage.follower.last_heartbeat_ms
Metric type | Value | Description |
---|---|---|
gauge | ms | Time since the follower last received a heartbeat request |
vault.raft_storage.stats.applied_index
Metric type | Value | Description |
---|---|---|
gauge | number | Highest index of raft log last applied to the finite state machine or added to fsm_pending queue |
vault.raft_storage.stats.commit_index
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last raft log committed to disk on the node |
vault.raft_storage.stats.fsm_pending
Metric type | Value | Description |
---|---|---|
gauge | number | Number of raft logs queued by the node for the finite state machine to apply |
vault.raft-storage.delete
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to insert log entry to delete path |
vault.raft-storage.entry_size
Metric type | Value | Description |
---|---|---|
summary | bytes | The total size of a raft entry during log application |
vault.raft-storage.get
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to retrieve a value for the given path from the finite state machine |
vault.raft-storage.list
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to list all entries under the prefix from the finite state machine |
vault.raft-storage.put
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to insert a log entry to the persist path |
vault.raft-storage.transaction
Metric type | Value | Description |
---|---|---|
timer | ms | Time required to insert operations into a single log |
Write-ahead logging (WAL) metrics
Metric type | Value | Description |
---|---|---|
counter | number | Number of log entries that have been truncated from the head. |
Counts the number of log entries truncated from the head (i.e. the oldest entries).
If you track the rate of change in head truncations over time, individual truncate calls appear as spikes.
Metric type | Value | Description |
---|---|---|
counter | number | Number of log entries that have been truncated from the tail |
Counts the number of log entries truncated from the tail (i.e. the newest entries).
If you track the rate of change in tail truncations over time, individual truncate calls appear as spikes.
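To see why truncate calls show up as spikes, consider the per-interval increase of a truncation counter. The sketch below assumes you have already collected cumulative counter samples at a fixed interval, for example from a Prometheus-style scrape. The function name and the sample values are hypothetical.

```python
def counter_deltas(samples):
    """Per-interval increase of a cumulative counter.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a restart), so the new value is taken as the delta.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas


# Hypothetical samples: the counter stays flat, then jumps when a
# truncate call removes a batch of entries.
samples = [100, 100, 100, 612, 612, 612, 1124]
print(counter_deltas(samples))  # [0, 0, 512, 0, 0, 512]
```

Each non-zero delta corresponds to an interval in which at least one truncate call occurred.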
Metric type | Value | Description |
---|---|---|
counter | number | Number of calls to GetLog() |
Metric type | Value | Description |
---|---|---|
counter | number | Number of entries written |
Metric type | Value | Description |
---|---|---|
counter | number | Number of bytes of log entries read from segments before decoding. |
The log-entry-bytes-read counter is technically an overestimate because it includes bytes from headers, index entries, and secondary reads for entries too large to fit in buffers.
Metric type | Value | Description |
---|---|---|
counter | number | Number of bytes of log entry after encoding with Codec. |
The log-entry-bytes-written counter is technically an overestimate because it includes bytes from headers and index entries.
Metric type | Value | Description |
---|---|---|
counter | number | Number of calls to StableStore.Get() or GetUint64() |
Metric type | Value | Description |
---|---|---|
counter | number | Number of calls to StableStore.Set() or SetUint64() |
Metric type | Value | Description |
---|---|---|
counter | number | Number of calls to StoreLog() |
Counts the number of entry batches appended to the log with calls to StoreLog().
Metric type | Value | Description |
---|---|---|
counter | number | Number of times Vault moves to a new segment file |
Metric type | Value | Description |
---|---|---|
gauge | seconds | Number of seconds between segment creation and seal. |
The last-segment-age-seconds gauge shows the number of seconds between when a segment is created and when it is sealed. The gauge resets each time Vault rotates a segment and provides a rough estimate of how quickly writes are filling the disk.
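One way to turn the gauge into a rough write rate is to divide the segment size by the segment age. The sketch below assumes a 64 MiB segment size. That figure and the function name are assumptions for illustration, so substitute the segment size configured for your deployment.

```python
def estimated_write_rate(segment_size_bytes: int, segment_age_seconds: float) -> float:
    """Approximate bytes written per second while the last segment filled."""
    if segment_age_seconds <= 0:
        raise ValueError("segment age must be positive")
    return segment_size_bytes / segment_age_seconds


SEGMENT_SIZE = 64 * 1024 * 1024  # assumed segment size; check your configuration

# A segment that took 128 seconds to fill implies roughly 512 KiB/s of writes.
rate = estimated_write_rate(SEGMENT_SIZE, 128)
print(f"{rate / 1024:.0f} KiB/s")  # 512 KiB/s
```

Because segments also contain headers and index entries, treat the result as an upper-bound estimate rather than an exact measure of application writes.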