Nomad
multiregion Block
Placement | job -> multiregion |
Enterprise
This feature requires Nomad Enterprise(opens in new tab).
The multiregion
block specifies that a job will be deployed to multiple
federated regions. If omitted, the job will be deployed to a single region—the
one specified by the region
field or the -region
command line flag to
nomad job run
.
Federated Nomad clusters are members of the same gossip cluster but not the
same raft cluster; they don't share their data stores. Each region in a
multiregion deployment gets an independent copy of the job, parameterized with
the values of the region
block. Nomad regions coordinate to rollout each
region's deployment using rules determined by the strategy
block.
job "docs" {
multiregion {
strategy {
max_parallel = 1
on_failure = "fail_all"
}
region "west" {
count = 2
datacenters = ["west-1"]
meta {
my-key = "my-value-west"
}
}
region "east" {
count = 5
datacenters = ["east-1", "east-2"]
meta {
my-key = "my-value-east"
}
}
}
}
Multiregion Deployment States
A single region deployment using one of the various upgrade strategies
begins in the running
state, and ends in the successful
state, the
canceled
state (if another deployment supersedes it before it it's
complete), or the failed
state. A failed single region deployment may
automatically revert to the previous version of the job if its update
block has the auto_revert
setting.
In a multiregion deployment, regions begin in the pending
state. This allows
Nomad to determine that all regions have accepted the job before
continuing. At this point up to max_parallel
regions will enter running
at
a time. When each region completes its local deployment, it enters a blocked
state where it waits until the last region has completed the deployment. The
final region will unblock the regions to mark them as successful
.
Parameterized Dispatch
Job dispatching is region specific. While a parameterized job can be
registered in multiple federated regions like any other job, a parameterized
job operates much like a function definition that takes variable input.
Operators are expected to invoke the job by invoking job dispatch
from the CLI or the HTTP API and provide the appropriate dispatch options
for that region.
Periodic Time Zones
Multiregion periodic jobs share time zone configuration, with UTC being the default. Operators should be mindful of this when registering multiregion jobs. For example, a periodic configuration that specifies the job should run every night at midnight New York time, may result in an undesirable execution time if one of the target regions is set to Tokyo time.
multiregion
Parameters
strategy
(Strategy: nil)
- Specifies a rollout strategy for the regions.region
(Region: nil)
- Specifies the parameters for a specific region. This can be specified multiple times to define the set of regions for the multiregion deployment. Regions are ordered; depending on the rollout strategy Nomad may roll out to each region in order or to several at a time.
Note: Regions can be added, but regions that are removed will not be stopped and will be ignored by the deployment. This behavior may change before multiregion deployments are considered GA.
strategy
Parameters
max_parallel
(int: <optional>)
- Specifies the maximum number of region deployments that a multiregion will have in a running state at a time. By default, Nomad will deploy all regions simultaneously.on_failure
(string: <optional>)
- Specifies the behavior when a region deployment fails. Available options are"fail_all"
,"fail_local"
, or the default (empty""
). This field and its interactions with the job'supdate
block is described in the examples below.Each region within a multiregion deployment follows the
auto_revert
strategy of its ownupdate
block (if any). The multiregionon_failure
field tells Nomad how many other regions should be marked as failed when one region's deployment fails:The default behavior is that the failed region and all regions that come after it in order are marked as failed.
If
on_failure: "fail_all"
is set, all regions will be marked as failed. If all regions have already completed their deployments, it's possible that a region may transition fromblocked
tosuccessful
while another region is failing. This successful region cannot be rolled back.If
on_failure: "fail_local"
is set, only the failed region will be marked as failed. The remaining regions will move on toblocked
status. At this point, you'll need to manually unblock regions to mark them successful with thenomad deployment unblock
command or correct the conditions that led to the failure and resubmit the job.
For system
jobs, only max_parallel
is enforced. The
system
scheduler will be updated to support on_failure
when the
update
block is fully supported for system jobs in a future release.
region
Parameters
The name of a region must match the name of one of the federated regions.
count
(int: <optional>)
- Specifies a count override for task groups in the region. If a task group specifies acount = 0
, its count will be replaced with this value. If a task group specifies its owncount
or omits thecount
field, this value will be ignored. This value must be non-negative.datacenters
(array<string>: <optional>)
- A list of datacenters in the region which are eligible for task placement. If not provided, thedatacenters
field of the job will be used.meta
-Meta: nil
- The meta block allows for user-defined arbitrary key-value pairs. The meta specified for each region will be merged with the meta block at the job level.
As described above, the parameters for each region replace the default values for the field with the same name for each region.
multiregion
Examples
The following examples only show the multiregion
block and the other
blocks it might be interacting with.
Max Parallel
This example shows the use of max_parallel
. This job will deploy first to
the "north" and "south" regions. If either "north" finishes and enters the
blocked
state, then "east" will be next. At most 2 regions will be in a
running
state at any given time.
multiregion {
strategy {
max_parallel = 2
}
region "north" {}
region "south" {}
region "east" {}
region "west" {}
}
Rollback Regions
This example shows the default value of on_failure
. Because max_parallel = 1
,
the "north" region will deploy first, followed by "south", and so on. But
supposing the "east" region failed, both the "east" region and the "west"
region would be marked failed
. Because the job has an update
block with
auto_revert=true
, both regions would then rollback to the previous job
version. The "north" and "south" regions would remain blocked
until an
operator intervenes.
multiregion {
strategy {
on_failure = ""
max_parallel = 1
}
region "north" {}
region "south" {}
region "east" {}
region "west" {}
}
update {
auto_revert = true
}
Override Counts
This example shows how the count
field override the default count
of the
task group. The job the deploys 2 "worker" and 1 "controller" allocations to
the "west" region, and 5 "worker" and 1 "controller" task groups to the "east"
region.
multiregion {
region "west" {
count = 2
}
region "east" {
count = 5
}
}
}
group "worker" {
count = 0
}
group "controller" {
count = 1
}
Merging Meta
This example shows how the meta
is merged with the meta
field of the job,
group, and task. A task in "west" will have the values
first-key="regional-west"
, second-key="group-level"
, whereas a task in
"east" will have the values first-key="job-level"
,
second-key="group-level"
.
multiregion {
region "west" {
meta {
first-key = "regional-west"
second-key = "regional-west"
}
}
region "east" {
meta {
second-key = "regional-east"
}
}
}
}
meta {
first-key = "job-level"
}
group "worker" {
meta {
second-key = "group-level"
}
}