Podman Task Driver

Name: podman

Homepage: https://github.com/hashicorp/nomad-driver-podman

The Podman task driver plugin for Nomad uses the Pod Manager (podman) daemonless container runtime for executing Nomad tasks. Podman supports OCI containers and its command line tool is meant to be a drop-in replacement for Docker's.

Due to Podman's similarity to Docker, the example job created by nomad init -short is easily adapted to use Podman instead:

job "redis" {
  datacenters = ["dc1"]
  type        = "service"

  group "cache" {
    network {
      port "redis" { to = 6379 }
    }

    task "redis" {
      driver = "podman"

      config {
        image = "docker://redis"
        ports = ["redis"]
      }
    }
  }
}

Refer to the project's homepage for details.

Client Requirements

The Podman task driver is not builtin to Nomad. It must be downloaded onto the client host in the configured plugin directory.

Nomad 0.12.9+
Linux host with podman installed
For rootless containers you need a system supporting cgroup V2 and a few other things, follow this tutorial.

You need a 3.0.x podman binary and a system socket activation unit, refer to https://www.redhat.com/sysadmin/podmans-new-rest-api.

Nomad agent, nomad-driver-podman and podman will reside on the same client, so you do not have to worry about the ssh aspects of the Podman api.

Ensure that Nomad can find the plugin, refer to plugin_dir.

Capabilities

The podman driver implements the following capabilities.

Feature	Implementation
`nomad alloc signal`	true
`nomad alloc exec`	false
filesystem isolation	image
network isolation	host, group, task, none
volume mounting	none

Task Configuration

args - (Optional) A list of arguments to the optional command. If no command is specified, the arguments are passed directly to the container.
```
config {
  args = [
    "arg1",
    "arg2",
  ]
}
```

auth - (Optional) Authenticate to the image registry using a static credential.

config {
  image = "your.registry.tld/some/image"
  auth {
    username = "someuser"
    password = "sup3rs3creT"
  }
}

cap_add - (Optional) A list of Linux capabilities as strings to pass to --cap-add.
```
config {
  cap_add = [
    "SYS_TIME"
  ]
}
```
cap_drop - (Optional) A list of Linux capabilities as strings to pass to --cap-drop.
```
config {
  cap_drop = [
    "MKNOD"
  ]
}
```
command - (Optional) The command to run when starting the container.
```
config {
  command = "some-command"
}
```
devices - (Optional) A list of host-device[:container-device][:permissions] definitions. Each entry adds a host device to the container. Optional permissions can be used to specify device permissions, it is a combination of r for read, w for write, and m for mknod(2). Refer to Podman's documentation for more details.
```
config {
  devices = [
    "/dev/net/tun"
  ]
}
```
entrypoint - (Optional) The entrypoint for the container. Defaults to the entrypoint set in the image.
```
config {
  entrypoint = "/entrypoint.sh"
}
```
force_pull - (Optional) true or false (default). Always pull the latest image on container start.
```
config {
  force_pull = true
}
```
hostname - (Optional) The hostname to assign to the container. When launching more than one of a task (using count) with this option set, every container the task starts will have the same hostname.
image - The image to run. Accepted transports are docker (default if missing), oci-archive and docker-archive. Images referenced as short-names will be treated according to user-configured preferences.
```
config {
  image = "docker://redis"
}
```
image_pull_timeout - (Optional) Time duration for your pull timeout (default to "5m"). Cannot be longer than the client_http_timeout.
```
config {
  image_pull_timeout = "5m"
}
```
init - (Optional) Run an init inside the container that forwards signals and reaps processes.
```
config {
  init = true
}
```

init_path - (Optional) Path to the container-init binary.

config {
  init = true
  init_path = "/usr/libexec/podman/catatonit"
}

labels - (Optional) Set labels on the container.

config {
  labels = {
    "nomad" = "job"
  }
}

logging - (Optional) Configure logging. Also refer to the plugin option disable_log_collection.
- driver = "nomad" - (Default) Podman redirects its combined stdout/stderr logstream directly to a Nomad fifo. Benefits of this mode are: zero overhead, don't have to worry about log rotation at system or Podman level. Downside: you cannot easily ship the logstream to a log aggregator plus stdout/stderr is multiplexed into a single stream.
```
config {
  logging = {
    driver = "nomad"
  }
}
```
- driver = "journald" - The container log is forwarded from Podman to the journald on your host. Next, it's pulled by the Podman API back from the journal into the Nomad fifo (controllable by disable_log_collection). Benefits: all containers can log into the host journal, you can ship a structured stream including metadata to your log aggregator. No log rotation at Podman level. You can add additional tags to the journal. Drawbacks: a bit more overhead, depends on Journal (will not work on WSL2). You should configure some rotation policy for your Journal. Ensure you're running Podman 3.1.0 or higher because of bugs in older versions.
```
config {
  logging = {
    driver = "journald"
    options = [
      {
        "tag" = "redis"
      }
    ]
  }
}
```
memory_reservation - (Optional) Memory soft limit (units = b (bytes), k (kilobytes), m (megabytes), or g (gigabytes)).
After setting memory reservation, when the system detects memory contention or low memory, containers are forced to restrict their consumption to their reservation. So you should always set the value below --memory, otherwise the hard limit will take precedence. By default, memory reservation will be the same as memory limit.
```
config {
  memory_reservation = "100m"
}
```
memory_swap - (Optional) A limit value equal to memory plus swap. The swap limit should always be larger than the memory value. Unit can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). If you don't specify a unit, b is used. Set LIMIT to -1 to enable unlimited swap.
```
config {
  memory_swap = "180m"
}
```
memory_swappiness - Tune a container's memory swappiness behavior. Accepts an integer between 0 and 100.
```
config {
  memory_swappiness = 60
}
```
network_mode - (Optional) Set the network mode for the container. By default the task uses the network stack defined in the task group network block. If the groups network behavior is also undefined, it will fallback to bridge in rootful mode or slirp4netns for rootless containers.
- bridge - (Default for rootful) Create a network stack on the default Podman bridge.
- container:id - Reuse another container's network stack.
- host - Use the Podman host network stack. Note: the host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
- none - No networking.
- slirp4netns - (Default for rootless) Use slirp4netns to create a user network stack. Podman currently does not support this option for rootful containers (issue).
- task:name-of-other-task: Join the network of another task in the same allocation.
```
config {
  network_mode = "bridge"
}
```
ports - (Optional) Forward and expose ports. Refer to Docker driver configuration for details.
privileged - (Optional) true or false (default). A privileged container turns off the security features that isolate the container from the host. Dropped Capabilities, limited devices, read-only mount points, Apparmor/SELinux separation, and Seccomp filters are all disabled.
readonly_rootfs - (Optional) true or false (default). Mount the rootfs as read-only.
```
config {
  readonly_rootfs = true
}
```
sysctl - (Optional) A key-value map of sysctl configurations to set to the containers on start.
```
config {
  sysctl = {
    "net.core.somaxconn" = "16384"
  }
}
```
tmpfs - (Optional) A list of /container_path strings for tmpfs mount points. Refer to podman run --tmpfs options for details.
```
config {
  tmpfs = [
    "/var"
  ]
}
```
tty - (Optional) true or false (default). Allocate a pseudo-TTY for the container.
volumes - (Optional) A list of host_path:container_path:options strings to bind host paths to container paths. Named volumes are not supported.
```
config {
  volumes = [
    "/some/host/data:/container/data:ro,noexec"
  ]
}
```
working_dir - (Optional) The working directory for the container. Defaults to the default set in the image.
```
config {
  working_dir = "/data"
}
```

ulimit - (Optional) A key-value map of ulimit configurations to set to the containers to start.

config {
  ulimit {
    nproc  = "4242"
    nofile = "2048:4096"
  }
}

Additionally, the Podman driver supports customization of the container's user through the task's user option.

Network Configuration

Nomad lifecycle hooks combined with the drivers network_mode allows very flexible network namespace definitions. This feature does not build upon the native Podman pod structure but simply reuses the networking namespace of one container for other tasks in the same group.

A typical example is a network server and a metric exporter or log shipping sidecar. The metric exporter needs access to a private monitoring port which should not be exposed to the network and thus is usually bound to localhost.

The nomad-driver-podman repository includes three different examples jobs for such a setup. All of them will start a nats server and a prometheus-nats-exporter using different approaches.

You can use curl to prove that the job is working correctly and that you can get Prometheus metrics:

$ curl http://your-machine:7777/metrics

2 Task setup, server defines the network

Reference examples/jobs/nats_simple_pod.nomad.

Here, the server task is started as main workload and the exporter runs as a poststart sidecar. Because of that, Nomad guarantees that the server is started first and thus the exporter can easily join the servers network namespace via network_mode = "task:server".

Note, that the server configuration file binds the http_port to localhost.

Be aware that ports must be defined in the parent network namespace, here server.

3 Task setup, a pause container defines the network

Reference examples/jobs/nats_pod.nomad.

A slightly different setup is demonstrated in this job. It reassembles more closesly the idea of a pod by starting a pause task, named pod via a prestart sidecar hook.

Next, the main workload, server is started and joins the network namespace by using the network_mode = "task:pod" block. Finally, Nomad starts the poststart sidecar exporter which also joins the network.

Note that all ports must be defined on the pod level.

2 Task setup, shared Nomad network namespace

Reference examples/jobs/nats_group.nomad.

This example is very different. Both server and exporter join a network namespace which is created and managed by Nomad itself. Refer to Nomad's network block to get started with this generic approach.

Plugin Options

The Podman plugin has options which may be customized in the agent's configuration file.

disable_log_collection (bool: false) - Setting this to true will disable Nomad logs collection of Podman tasks. If you don't rely on Nomad log capabilities and exclusively use host based log aggregation, you may consider this option to disable Nomad log collection overhead. Beware that you also lose automatic log rotation.
```
plugin "nomad-driver-podman" {
  config {
    disable_log_collection = false
  }
}
```
gc block:
- container - Defaults to true. This option can be used to disable Nomad from removing a container when the task exits.
```
plugin "nomad-driver-podman" {
  config {
    gc {
      container = false
    }
  }
}
```
recover_stopped - Defaults to true. Allows the driver to start and reuse a previously stopped container after a Nomad client restart. Consider a simple single node system and a complete reboot. All previously managed containers will be reused instead of disposed and recreated.
```
plugin "nomad-driver-podman" {
  config {
    recover_stopped = false
  }
}
```
socket_path (string) - Defaults to unix://run/podman/io.podman when running as root or a cgroup V1 system, and unix://run/user/<USER_ID>/podman/io.podman for rootless cgroup V2 systems.
disable_log_collection (bool: false) - Setting this to true will disable Nomad logs collection of Podman tasks. If you don't rely on Nomad log capabilities and exclusively use host based log aggregation, you may consider this option to disable Nomad log collection overhead. Beware to you also lose automatic log rotation.
```
plugin "nomad-driver-podman" {
  config {
    disable_log_collection = false
  }
}
```

client_http_timeout (string: "60s") - Default timeout used by http.Client requests.

plugin "nomad-driver-podman" {
  config {
    client_http_timeout = "60s"
  }
}

volumes block:
- enabled - Defaults to true. Allows tasks to bind host paths (volumes) inside their container.
- selinuxlabel - Allows the operator to set a SELinux label to the allocation and task local bind-mounts to containers. If used with volumes.enabled set to false, the labels will still be applied to the standard binds in the container.
```
plugin "nomad-driver-podman" {
  config {
    volumes {
      enabled      = true
      selinuxlabel = "z"
    }
  }
}
```