Nomad
Inspect running jobs
A successful job submission does not mean the job is running successfully. Nomad's scheduler is highly optimistic: a successful submission means only that the server was able to issue the proper scheduling commands. To verify that the job is running and healthy, inspect its state.
This section uses the job named "docs" from the previous sections, but these operations and commands largely apply to all jobs in Nomad.
Query the job status
After a job is submitted, you can query the status of that job using the job status command:
$ nomad job status
ID Type Priority Status
docs service 50 running
At a high level, you can observe that the job is currently running, but what does "running" actually mean? By supplying the name of a job to the job status command, you can ask Nomad for more detailed job information:
$ nomad job status docs
ID = docs
Name = docs
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Summary
Task Group Queued Starting Running Failed Complete Lost
example 0 0 3 0 0 0
Allocations
ID Eval ID Node ID Task Group Desired Status Created At
04d9627d 42d788a3 a1f934c9 example run running <timestamp>
e7b8d4f5 42d788a3 012ea79b example run running <timestamp>
5cbf23a1 42d788a3 1e1aa1e0 example run running <timestamp>
This output shows that there are three instances of this task running, each with its own allocation. For more information on the status command, consult the nomad job status command documentation.
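The Summary table can also be checked from a script. The following is a minimal sketch, not part of Nomad: `running_count` is a hypothetical helper name, and it assumes the plain-text layout shown above, where Running is the fourth whitespace-separated column and task group names contain no spaces.

```shell
# Hypothetical helper (not a Nomad command): print the Running count for one
# task group, reading the text output of `nomad job status <job>` on stdin.
# Assumes the Summary layout shown above, with Running as the 4th column.
running_count() {
  awk -v group="$1" '
    /^Summary/ { in_summary = 1; next }            # start of the Summary table
    in_summary && (NF == 0 || /^Allocations/) { exit }  # end of the table
    in_summary && $1 == group { print $4; exit }   # first row for the group
  '
}
```

With the output above, `nomad job status docs | running_count example` would print 3. If the column layout differs in your Nomad version, adjust the field number accordingly.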
Fetch an evaluation's status
You can think of an evaluation as a submission to the scheduler. The example below shows status output for a job where some allocations were placed successfully, but the cluster did not have enough resources to place all of the desired allocations.
If you issue the status command with the -evals flag, the output will show that there is an outstanding evaluation for this hypothetical job:
$ nomad job status -evals docs
ID = docs
Name = docs
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Evaluations
ID Priority Triggered By Status Placement Failures
5744eb15 50 job-register blocked N/A - In Progress
8e38e6cf 50 job-register complete true
Placement Failure
Task Group "example":
* Resources exhausted on 1 nodes
* Dimension "cpu" exhausted on 1 nodes
Allocations
ID Eval ID Node ID Task Group Desired Status Created At
12681940 8e38e6cf 4beef22f example run running <timestamp>
395c5882 8e38e6cf 4beef22f example run running <timestamp>
4d7c6f84 8e38e6cf 4beef22f example run running <timestamp>
843b07b8 8e38e6cf 4beef22f example run running <timestamp>
a8bc6d3e 8e38e6cf 4beef22f example run running <timestamp>
b0beb907 8e38e6cf 4beef22f example run running <timestamp>
da21c1fd 8e38e6cf 4beef22f example run running <timestamp>
The output states that the job has a "blocked" evaluation that is in progress. When Nomad cannot place all the desired allocations, it creates a blocked evaluation that waits for more resources to become available.
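To pick out blocked evaluations without reading the table by eye, the Evaluations section can be filtered. This is a rough sketch under stated assumptions: `blocked_evals` is a hypothetical helper name, and it relies on the column order shown above (ID, Priority, Triggered By, Status) and on trigger names such as "job-register" containing no spaces.

```shell
# Hypothetical helper (not a Nomad command): print the IDs of blocked
# evaluations, reading `nomad job status -evals <job>` text output on stdin.
# Assumes the column order shown above: ID, Priority, Triggered By, Status.
blocked_evals() {
  awk '
    /^Evaluations/ { in_evals = 1; next }          # start of the table
    /^Placement Failure/ || /^Allocations/ || NF == 0 { in_evals = 0 }  # end
    in_evals && $4 == "blocked" { print $1 }       # Status is the 4th field
  '
}
```

For the output above, `nomad job status -evals docs | blocked_evals` would print 5744eb15. Each printed ID can then be passed to the eval status command for closer inspection.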
The eval status command enables examination of any evaluation in more detail. In most cases this is not necessary. However, it can be useful to understand what triggered a specific evaluation and its current status. Running it on the "complete" evaluation provides output similar to the following:
$ nomad eval status 8e38e6cf
ID = 8e38e6cf
Status = complete
Status Description = complete
Type = service
TriggeredBy = job-register
Job ID = docs
Priority = 50
Placement Failures = true
Failed Placements
Task Group "example" (failed to place 3 allocations):
* Resources exhausted on 1 nodes
* Dimension "cpu" exhausted on 1 nodes
Evaluation "5744eb15" waiting for additional capacity to place remainder
This output indicates that the evaluation was created by a "job-register" event and that it had placement failures. The evaluation also records why the placements failed. Finally, the output identifies any follow-up evaluation that was created, in this case the blocked evaluation "5744eb15".
If you would like to learn more about this output, consult the documentation for the nomad eval status command.
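Because the eval status output is a list of "Key = value" lines, a single field can be extracted for use in a script. The sketch below is illustrative only: `field_value` is a hypothetical helper name, and it assumes the key starts at the beginning of the line and that the value itself contains no "=" character.

```shell
# Hypothetical helper (not a Nomad command): print the value of one
# "Key = value" field, reading `nomad eval status <eval-id>` output on stdin.
# The field separator is "=" with any surrounding spaces, so aligned
# output such as "ID      = 8e38e6cf" is handled as well.
field_value() {
  awk -F' *= *' -v key="$1" '$1 == key { print $2; exit }'
}
```

For the evaluation above, `nomad eval status 8e38e6cf | field_value "Placement Failures"` would print true, which a script could use to decide whether to report a capacity problem.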
Retrieve an allocation's status
You can think of an allocation as an instruction to schedule. Like an
application or service, an allocation has logs and state. The alloc status
command gives the most recent events that occurred for a task, its resource
usage, port allocations and more:
$ nomad alloc status 04d9627d
ID = 04d9627d
Eval ID = 42d788a3
Name = docs.example[2]
Node ID = a1f934c9
Job ID = docs
Client Status = running
Task "server" is "running"
Task Resources
CPU Memory Disk Addresses
0/100 MHz 728 KiB/10 MiB 300 MiB http: 10.1.1.196:5678
Recent Events:
Time Type Description
10/09/16 00:36:06 UTC Started Task started by client
10/09/16 00:36:05 UTC Received Task received by client
The nomad alloc status command is a good starting point for debugging an application that did not start. Suppose a user meant to start a Docker container named "redis:2.8", but accidentally typed a comma instead of a period, producing "redis:2,8".
When the job is executed, it produces a failed allocation. The nomad alloc status command will give the reason why.
$ nomad alloc status 04d9627d
ID = 04d9627d
...
Recent Events:
Time Type Description
06/28/16 15:50:22 UTC Not Restarting Error was unrecoverable
06/28/16 15:50:22 UTC Driver Failure failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC Received Task received by client
Unfortunately, not all failures are as visible in the allocation status output. If the alloc status command shows many restarts, there is likely an application-level issue during startup. For example:
$ nomad alloc status 04d9627d
ID = 04d9627d
...
Recent Events:
Time Type Description
06/28/16 15:56:16 UTC Restarting Task restarting in 5.178426031s
06/28/16 15:56:16 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC Started Task started by client
06/28/16 15:56:00 UTC Restarting Task restarting in 5.00123931s
06/28/16 15:56:00 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC Started Task started by client
06/28/16 15:55:48 UTC Received Task received by client
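A restart loop like this one can be detected by counting the restart events in the alloc status output. The following is a minimal sketch: `restart_count` is a hypothetical helper name, and it assumes the event description text "Task restarting in" shown above. If an allocation has several tasks, the count covers all of their event tables together.

```shell
# Hypothetical helper (not a Nomad command): count restart events, reading
# `nomad alloc status <alloc-id>` text output on stdin.
# Assumes restart events carry the description "Task restarting in ...",
# as in the output above. "Not Restarting" events are not counted.
restart_count() {
  awk '
    /^Recent Events/ { in_events = 1; next }       # an event table begins
    in_events && /Task restarting in/ { n++ }      # one restart event
    END { print n + 0 }                            # print 0 when none found
  '
}
```

For the output above, `nomad alloc status 04d9627d | restart_count` would print 2. Note that the event table lists only recent events, so the count is a lower bound rather than a full history.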
To debug these failures, you can use the nomad alloc logs command, which is discussed in the accessing logs section of this documentation.
For more information on the alloc status command, consult the documentation for the nomad alloc status command.