Schedule edge services with native service discovery

25min
|
Nomad
Terraform
Packer

Edge computing lets organizations run workloads closer to their users. This proximity unlocks several benefits:

Decreased latency. Data does not need to travel to distant data centers for processing. This decreases network latency which provides a better user experience. This benefit is crucial for CDN providers and online game servers.
Privacy and Compliance. Edge computing increases privacy by storing and processing user data close to the user, ensuring data doesn't leave the geographic region. This benefit is especially important for regulated industries like healthcare and financial services and with regulations like GDPR.
Smart device fleet management. Edge computing lets you collect data, monitor, and control internet of things (IoT) devices and sensors. This benefit is useful for any industries that need to manage fleets of remote devices, like agriculture, manufacturers, and more.

However, when organizations adopt edge computing, they run into challenges like managing heterogeneous devices (different processors, operating systems, etc.), resource-constrained devices, and intermittent connectivity.

Nomad addresses these challenges, making it an attractive edge orchestrator. The Nomad client agent is a single binary with a small footprint, limited resource consumption, and the ability to run on different types of devices. In addition, Nomad supports geographically distant clients, which means a Nomad server cluster does not need to run near the client.

Since Nomad 1.3, native service discovery simplifies connecting Nomad tasks where you cannot use a single service mesh and removes the need to manage a separate Consul cluster. Nomad's native service discovery also removes the need to install a Consul agent on each edge device. This reduces Nomad's resource footprint even further, so you can run and support more workloads on the edge. Additionally, disconnected client allocations reconnect gracefully, handling situations when edge devices experience network latency or temporary connectivity loss.

In this tutorial, you will deploy an edge architecture that consists of a single Nomad server cluster with distant clients, spread across two AWS regions. One region, representing an on-premise data center, will host the Nomad server cluster and one client. The other region, representing the edge data center, will host two Nomad clients. Then, you will schedule HashiCups, a demo application, on both on-prem and edge data centers, connecting its services with Nomad's native service discovery. Finally, you will simulate unstable network connectivity between the Nomad clients and the server to test how Nomad handles client disconnection and reconnection. In the process, you will learn how these features make Nomad an ideal edge scheduler.

Nomad Edge single server cluster and distant client
architecture.

HashiCups overview

HashiCups is a demo application that lets you view and order customized HashiCorp branded coffee. The HashiCups application consists of a frontend React application and multiple backend services. The HashiCups backend consists of a GraphQL backend (public-api), a products API (product-api), a Postgres database, and a payments API (payment-api). The product-api connects to both the public-api and the database to store and return information about HashiCups coffees, users, and orders.

HashiCups frontend and backend
services

You will deploy the HashiCups application to two Nomad data centers. The primary data center will host the HashiCups database and product API. The edge data center will host the remaining HashiCups backend (public API, payments API) and the frontend (frontend and NGINX reverse proxy). This architecture decreases latency for users by placing the frontend services closer to them. In addition, sensitive payment information remains on the edge — HashiCups does not need to send this data to the primary data center, reducing potential attack surfaces.

HashiCups services deployed in primary and edge data
centers.

Prerequisites

The tutorial assumes that you are familiar with Nomad. If you are new to Nomad itself, refer first to the Get Started tutorials.

For this tutorial, you will need:

Packer 1.8 or later installed locally.
Terraform 1.1.7 or later installed locally.
Nomad 1.3 or later installed locally.
An AWS account with credentials set as local environment variables.
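
Before you continue, you may want to confirm the prerequisites from your terminal. The following is a minimal sketch, not part of the example repository. It only checks that the commands are on your PATH and does not compare version numbers. The variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption about how you export your credentials; adjust them if you authenticate differently.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Print the name of every environment variable in the argument list
# that is unset or empty.
missing_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Report anything this tutorial needs that is not available yet.
# No output means nothing is missing.
missing_tools packer terraform nomad git ssh-keygen
# Assumed variable names; change them to match your credential setup.
missing_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```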

Note

This tutorial creates AWS resources that may not qualify as part of the AWS free tier. Be sure to follow the Cleanup process at the end so you don't incur any additional unnecessary charges.

Clone the example repository

In your terminal, clone the example repository. This repository contains all the Terraform, Packer, and Nomad configuration files you will need to complete this tutorial.

$ git clone https://github.com/hashicorp-education/learn-nomad-edge

Navigate to the cloned repository.

$ cd learn-nomad-edge

Now, check out the tagged version verified for this tutorial.

$ git checkout tags/v1.0.0

Create SSH key

Later in this tutorial, you will need to connect to your Nomad agent to bootstrap ACLs.

Create a local SSH key to pair with the terraform user so you can securely connect to your Nomad agents.

Generate a new SSH key called learn-nomad-edge. The argument provided with the -f flag creates the key in the current directory and creates two files called learn-nomad-edge and learn-nomad-edge.pub. Change the placeholder email address to your email address.

$ ssh-keygen -t rsa -C "your_email@example.com" -f ./learn-nomad-edge

When prompted, press enter to leave the passphrase blank on this key.

Review and build Nomad images

Navigate to the packer directory.

$ cd packer

This directory contains all the files used to build AMIs in the us-east-2 and us-west-1 AWS regions that contain the Nomad 1.5.3 binary and your previously created SSH public key.

$ tree
.
├── config
│   ├── nomad.hcl
│   ├── nomad-acl-user.hcl
│   ├── nomad-client.hcl
│   └── nomad.service
├── scripts
│   ├── client.sh
│   ├── server.sh
│   └── setup.sh
└── nomad.pkr.hcl

The config directory contains configuration files for the Nomad agents.
- The nomad.hcl file configures the Nomad servers. Since the primary and edge data centers are on different networks, the server must advertise its public IP address so the Nomad clients can successfully connect to the server cluster.
  packer/config/nomad.hcl
```
advertise {
http = "IP_ADDRESS:4646"
rpc  = "IP_ADDRESS:4647"
serf = "IP_ADDRESS:4648"
}
```
  The scripts/server.sh script will replace the placeholders (IP_ADDRESS, SERVER_COUNT, and RETRY_JOIN) when the server starts. The Nomad servers also have ACL enabled.
- The nomad-acl-user.hcl file defines the ACL policies.
- The nomad-client.hcl file configures the Nomad clients. Since the primary and edge data centers are on different networks, the client must advertise its public IP address so the Nomad clients can successfully connect to the other Nomad clients.
  packer/config/nomad-client.hcl
```
advertise {
http = "IP_ADDRESS:4646"
rpc  = "IP_ADDRESS:4647"
serf = "IP_ADDRESS:4648"
}
```
  The scripts/client.sh script will replace the placeholders (DATACENTER, SERVER_NAME, and RETRY_JOIN) when the client starts. The Nomad clients also have ACL enabled.
- The nomad.service file defines a systemd service. This makes it easier to start, stop, and restart Nomad on the agents.

The scripts directory contains helper scripts. The setup.sh script creates the terraform user, adds the public SSH key, and installs Nomad 1.5.3 and Docker. The client.sh and server.sh scripts configure their respective Nomad agents.

The nomad.pkr.hcl Packer template file defines the AMIs. It uses scripts/setup.sh to set up Nomad agents on an Ubuntu 20.04 image.
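
To make the placeholder replacement concrete, here is a small sketch of one way a startup script could fill in the IP_ADDRESS placeholder. It is an illustration only: the function name render_advertise is hypothetical, 203.0.113.10 is a documentation address, and the repository's server.sh and client.sh scripts may perform the substitution differently.

```shell
# Replace every IP_ADDRESS placeholder read from stdin with the address in $1.
# Hypothetical helper for illustration; not taken from the repository scripts.
render_advertise() {
  sed "s/IP_ADDRESS/$1/g"
}

# Render the advertise stanza shown above with an example address.
render_advertise 203.0.113.10 <<'EOF'
advertise {
  http = "IP_ADDRESS:4646"
  rpc  = "IP_ADDRESS:4647"
  serf = "IP_ADDRESS:4648"
}
EOF
```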

Build Nomad images

Initialize Packer to retrieve the required plugins.

$ packer init nomad.pkr.hcl

Build the image.

$ packer build nomad.pkr.hcl
## ...
Build 'amazon-ebs.nomad-secondary' finished after 3 minutes 42 seconds.
Build 'amazon-ebs.nomad-primary' finished after 8 minutes 52 seconds.
==> Wait completed after 8 minutes 52 seconds
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs.nomad-secondary: AMIs were created:
us-west-1: ami-0183bb1e3ab40da53
--> amazon-ebs.nomad-primary: AMIs were created:
us-east-2: ami-08a59e91d881df603

Packer will display the two AMIs. You will use these AMIs in the next section to deploy the Nomad server cluster and clients.

Review and deploy Nomad cluster and clients

Navigate to the cloned repository's root directory. This directory contains Terraform configuration to deploy all the resources you will use in this tutorial.

$ cd ..

Open main.tf. This file contains the Terraform configuration to deploy the underlying shared resources and Nomad agents to the two AWS regions through the single server cluster and distant client (SCDC) edge architecture. As opposed to deploying a Nomad server cluster at every edge location, this edge architecture is simpler, scalable, has a smaller resource consumption footprint, and avoids server federation. However, it requires more client to server connection configuration, especially around heartbeats and unstable connectivity.

Nomad Edge single server cluster and distant client
architecture.

The primary_shared_resources and edge_shared_resources modules use the shared-resources module to deploy a VPC, security groups, and IAM roles into their respective regions.

The primary_nomad_servers module uses the nomad-server module to deploy a three-node Nomad server cluster in the primary data center (us-east-2). Notice that it uses var.primary_ami for its AMI.

module "primary_nomad_servers" {
  source = "./nomad-server"
  region = "us-east-2"

  ## ...

  ami                  = var.primary_ami
  server_instance_type = "t2.micro"
  server_count         = 3
}

The primary_nomad_clients module uses the nomad-client module to deploy one Nomad client in the primary data center (us-east-2). Notice that it uses the same AMI (var.primary_ami) as the server agent and defines nomad_dc as dc1. The user script (nomad-client/data-scripts/user-data-client.sh) configures the Nomad agent as a client.
main.tf
```
module "primary_nomad_clients" {
  source = "./nomad-client"
  region = "us-east-2"

  ## ...

  ami                  = var.primary_ami
  client_instance_type = "t2.small"
  client_count         = 1
  nomad_dc             = "dc1"
}
```

The edge_nomad_clients module uses the nomad-client module to deploy two Nomad clients in the edge data center (us-west-1). Notice that it uses var.edge_ami for its AMI and defines nomad_dc as dc2.

module "edge_nomad_clients" {
  source = "./nomad-client"
  region = "us-west-1"

  ## ...

  ami                  = var.edge_ami
  client_instance_type = "t2.small"
  client_count         = 2
  nomad_dc             = "dc2"
}

Define AMI IDs

Update terraform.tfvars to reflect the AMI IDs you built with Packer. The primary_ami should reference the AMI created in us-east-2; the edge_ami should reference the AMI created in us-west-1.

primary_ami = "REPLACE_WITH_BUILD_AMI_ID"
edge_ami    = "REPLACE_WITH_BUILD_EDGE_AMI_ID"
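
If you saved the Packer output to a file, you can extract the AMI IDs instead of copying them by hand. The sketch below assumes the output has the same layout as the example build output earlier in this tutorial (a line of the form "REGION: AMI_ID" for each image). The function name ami_for_region is hypothetical.

```shell
# Print the AMI ID that Packer reported for the region given in $1.
# Reads saved `packer build` output on stdin.
ami_for_region() {
  awk -v region="$1" '$1 == (region ":") { print $2 }'
}

# Sample lines copied from the example build output in this tutorial.
packer_log='--> amazon-ebs.nomad-secondary: AMIs were created:
us-west-1: ami-0183bb1e3ab40da53
--> amazon-ebs.nomad-primary: AMIs were created:
us-east-2: ami-08a59e91d881df603'

# Print the two lines that belong in terraform.tfvars.
printf 'primary_ami = "%s"\n' "$(printf '%s\n' "$packer_log" | ami_for_region us-east-2)"
printf 'edge_ami    = "%s"\n' "$(printf '%s\n' "$packer_log" | ami_for_region us-west-1)"
```

With your own build, replace the sample text with the contents of your saved output. Your AMI IDs will differ from the ones shown here.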

Deploy Nomad cluster and clients

Initialize your Terraform configuration.

$ terraform init
Initializing modules...
- edge_nomad_clients in nomad-client
- edge_shared_resources in shared-resources
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.13.0 for edge_shared_resources.vpc...
- edge_shared_resources.vpc in .terraform/modules/edge_shared_resources.vpc
- primary_nomad_clients in nomad-client
- primary_nomad_servers in nomad-server
- primary_shared_resources in shared-resources
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.13.0 for primary_shared_resources.vpc...
- primary_shared_resources.vpc in .terraform/modules/primary_shared_resources.vpc

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing hashicorp/aws v4.6.0...
- Installed hashicorp/aws v4.6.0 (signed by HashiCorp)
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Then, apply your configuration to create the resources. Respond yes to the prompt to confirm the apply.

$ terraform apply
## ...

Plan: 41 to add, 0 to change, 0 to destroy.

## ...

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

## ...

Apply complete! Resources: 41 added, 0 changed, 0 destroyed.

Outputs:

edge_dc_nomad_client = "184.169.204.238"
nomad_lb_address = "http://learn-nomad-edge-server-lb-1934725976.us-east-2.elb.amazonaws.com:4646"
nomad_primary_dc_clients = [
  "3.15.5.228",
]
nomad_server = "18.191.0.46"
nomad_server_1 = "3.145.196.167"
nomad_server_2 = "3.144.15.124"
nomad_servers = [
  "18.191.0.46",
  "3.145.196.167",
  "3.144.15.124",
]
primary_dc_nomad_client = "3.15.5.228"

Once Terraform finishes provisioning the resources, display the nomad_lb_address Terraform output.

$ terraform output -raw nomad_lb_address
http://learn-nomad-edge-server-lb-1934725976.us-east-2.elb.amazonaws.com:4646

Open the link in your web browser to go to the Nomad UI. It should show an unauthorized page, since you have not provided the ACL bootstrap token.

Bootstrap Nomad ACL

Connect to one of your Nomad servers via SSH.

$ ssh terraform@$(terraform output -raw nomad_server) -i ./learn-nomad-edge

Run the following command to bootstrap the initial ACL token, parse the bootstrap token, and export it as an environment variable.

$ export NOMAD_BOOTSTRAP_TOKEN=$(nomad acl bootstrap | grep -i secret | awk -F '=' '{print $2}')
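
The grep and awk pipeline above splits the "Secret ID" line on the equals sign and keeps the right-hand side, including the space that follows the equals sign. That is harmless when you pass the variable unquoted, as the commands in this section do, but it matters if you quote it. The sketch below shows a variant that also trims the whitespace. The sample text is illustrative and assumes the "Key = Value" layout; the function name secret_id is hypothetical.

```shell
# Print the Secret ID value from "Key = Value" style output on stdin,
# with surrounding whitespace removed.
secret_id() {
  awk -F '=' 'tolower($1) ~ /secret/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Illustrative sample only; Nomad generates the real IDs.
secret_id <<'EOF'
Accessor ID  = aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
Secret ID    = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Name         = Bootstrap Token
EOF
```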

Then, apply the ACL policy. This is the ACL policy defined in packer/config/nomad-acl-user.hcl.

$ nomad acl policy apply -token $NOMAD_BOOTSTRAP_TOKEN -description "Policy to allow reading of agents and nodes and listing and submitting jobs in all namespaces." node-read-job-submit /ops/config/nomad-acl-user.hcl
Successfully wrote "node-read-job-submit" ACL policy!

Finally, create an ACL token for that policy. Keep this token in a safe place. You will use it in the next section to authenticate the Nomad UI to view the Nomad agents and jobs.

$ nomad acl token create -token $NOMAD_BOOTSTRAP_TOKEN -name "read-token" -policy node-read-job-submit | grep -i secret | awk -F "=" '{print $2}'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Create a management token. Unlike the previous ACL token, this management token can perform all operations. You will use this in future sections to authenticate the Nomad CLI to deploy jobs.

$ nomad acl token create -token $NOMAD_BOOTSTRAP_TOKEN -type="management" -global=true -name="Replication Token" | grep -i secret | awk -F "=" '{print $2}'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Close the SSH connection.

$ exit

Verify Nomad cluster and clients

Go to the Nomad UI and click on ACL Tokens in the top right corner. Enter the ACL token you created for the node-read-job-submit policy in the Secret ID field and click on Set Token. You now have read permissions in the Nomad UI.

Set ACL token in Nomad UI

Click on Servers to confirm there are three nodes in your Nomad server cluster.

Nomad UI shows three nodes in Nomad server cluster

Click on Clients to confirm there are three clients — one in the primary data center (dc1) and two in the edge data center (dc2).

Nomad UI shows three clients across two Nomad data
centers

Connect to Nomad servers

You need to set the NOMAD_ADDR and NOMAD_TOKEN environment variables so your local Nomad binary can connect to the Nomad cluster.

First, set the NOMAD_ADDR environment variable to one of your Nomad servers.

$ export NOMAD_ADDR="http://$(terraform output -raw nomad_server):4646"

Then, set the NOMAD_TOKEN environment variable to the management token you created in the previous step.

$ export NOMAD_TOKEN=

List the Nomad server members to verify you successfully configured your Nomad binary.

$ nomad server members
Name                   Address        Port  Status  Leader  Raft Version  Build      Datacenter  Region
ip-10-0-101-18.global  18.191.0.46    4648  alive   false   3             1.5.3      dc1         global
ip-10-0-101-57.global  3.145.196.167  4648  alive   true    3             1.5.3      dc1         global
ip-10-0-101-69.global  3.144.15.124   4648  alive   false   3             1.5.3      dc1         global

Review HashiCups jobs

The jobs directory contains the HashiCups jobs you will schedule in the primary and edge data centers.

HashiCups services deployed in primary and edge data
centers.

Review the HashiCups job

Open jobs/hashicups.nomad.hcl. This Nomad job file will deploy the HashiCups database and product-api to the primary data center.

The hashicups job contains a hashicups group which defines the HashiCups database and product-api tasks. Nomad will only deploy this job in the primary datacenter (var.datacenters).

## ...
# Begin Job Spec
job "hashicups" {
  type   = "service"
  region = var.region
  datacenters = var.datacenters
  ## ...
}

In the db task, find the service stanza.

## ...
job "hashicups" {
  ## ...
  group "hashicups" {
    ## ...
    task "db" {
      driver = "docker"
      meta {
        service = "database"
      }
      service {
        port     = "db"
        tags     = ["hashicups", "backend"]
        provider = "nomad"
        address  = attr.unique.platform.aws.public-ipv4
      }
      ## ...
    }
  }
}

Since this job file defines the service provider as nomad, Nomad will register the service in its built-in service discovery. This lets other Nomad tasks query and connect to the service. Unlike Consul, Nomad's native service discovery does not provide a service mesh or route traffic; it only lets you register and query services. This is preferable for edge computing, where unstable connectivity could disrupt a service mesh. In addition, it reduces resource consumption since you do not need to run a Consul agent on each edge device.

Notice that the service stanza sets the address to the attribute associated with the EC2 instance's public IP address. Since the EC2 instance's kernel is unaware of its public IP address, Nomad cannot advertise the public IP address by default. For edge workloads that communicate with each other over the public Internet (like the HashiCups demo application), you must set the address to this attribute so that Nomad's native service discovery lists the correct address to connect to.

service {
  port     = "db"
  tags     = ["hashicups", "backend"]
  provider = "nomad"
  address  = attr.unique.platform.aws.public-ipv4
}

The product-api task has a similar service stanza. This advertises the product-api's address and port number, letting the public-api query Nomad's service discovery to connect to the product-api service.

In the product-api task, find the template stanza.

## ...
job "hashicups" {
  ## ...
  group "hashicups" {
    ## ...
    task "product-api" {
      driver = "docker"
      meta {
        service = "product-api"
      }
      template {
        data        = <<EOH
{{ range nomadService "hashicups-hashicups-db" }}
DB_CONNECTION="host={{ .Address }} port={{ .Port }} user=${var.postgres_user} password=${var.postgres_password} dbname=${var.postgres_db} sslmode=disable"
{{ end }}
EOH
        destination = "local/env.txt"
        env         = true
      }
      ## ...
    }
  }
}

This template queries Nomad's native service discovery for the hashicups-hashicups-db service's address and port. It uses these values to populate the DB_CONNECTION environment variable which lets the product-api connect to the database.

Review the HashiCups edge job

Open jobs/hashicups-edge.nomad.hcl. This Nomad job file will deploy the remaining HashiCups backend and the frontend to the edge data center.

The hashicups-edge job contains a hashicups-edge group, which defines the remaining HashiCups tasks. Nomad will only deploy this job in the edge datacenter (var.datacenters).

## ...
# Begin Job Spec
job "hashicups-edge" {
  type   = "service"
  region = var.region
  datacenters = var.datacenters
  ## ...
}

Find the max_client_disconnect attribute inside the group stanza.

## ...
job "hashicups-edge" {
  ## ...
  group "hashicups-edge" {
    ## ...
    max_client_disconnect = "1h"
    ## ...
  }
}

If you do not set this attribute, Nomad runs its default behavior: when a Nomad client fails its heartbeat, Nomad will mark the client as down and the allocation as lost. Nomad will automatically schedule a new allocation on another client. However, if the down client reconnects to the server, it will shut down its existing allocations. This is suboptimal since Nomad will stop running allocations on a reconnected client just to place identical ones.

For many edge workloads, especially ones with high latency or unstable network connectivity, this is disruptive since a disconnected client does not necessarily mean the client is down. The allocations may continue to run on the temporarily disconnected client. For these cases, you want to set the max_client_disconnect attribute to gracefully handle disconnected client allocation.

If max_client_disconnect is set, when the client disconnects, Nomad will still schedule the allocation on another client. However, when the client reconnects:

Nomad will mark the reconnected client as ready.
If there are multiple job versions, Nomad will select the latest job version and stop all other allocations.
If Nomad rescheduled the lost allocation to a new client and the new client has a higher node rank, Nomad will continue the allocations in the new client and stop all others.
If the new client has a worse node rank or there is a tie, Nomad will resume the allocations on the reconnected client and stop all others.

This is the preferred behavior for edge workloads with high latency or unstable network connectivity, and especially true when the disconnected allocation is stateful.
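
The reconnect rules above can be restated as a small decision function. This is a toy sketch of the rules as this tutorial describes them, not Nomad's scheduler code. It assumes that a job version and a node rank can each be represented as a single number; the function name keep_allocation and the sample values are hypothetical.

```shell
# keep_allocation ORIG_VERSION NEW_VERSION ORIG_RANK NEW_RANK
# Prints which allocation keeps running after the client reconnects:
# "original" (on the reconnected client) or "replacement" (on the new client).
keep_allocation() {
  awk -v ov="$1" -v nv="$2" -v orank="$3" -v nrank="$4" 'BEGIN {
    if (nv > ov)            print "replacement"  # newest job version wins
    else if (ov > nv)       print "original"
    else if (nrank > orank) print "replacement"  # strictly higher node rank
    else                    print "original"     # worse rank or a tie
  }'
}

# Same job version, equal node rank: the original allocation resumes.
keep_allocation 0 0 0.5 0.5
# Same job version, replacement node ranks higher: the replacement continues.
keep_allocation 0 0 0.5 0.9
```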

In the public-api task, find the template stanza.

## ...
job "hashicups-edge" {
  ## ...
  group "hashicups-edge" {
    ## ...
    task "public-api" {
      driver = "docker"
      meta {
        service = "public-api"
      }
      template {
        data        = <<EOH
{{ range nomadService "hashicups-hashicups-product-api" }}
  PRODUCT_API_URI="http://{{.Address}}:{{.Port}}"
{{ end }}
EOH
        change_mode = "noop"
        destination = "local/env.txt"
        env         = true
      }
      ## ...
    }
  }
}

This template queries Nomad's native service discovery for the hashicups-hashicups-product-api service's address and port. In addition, this template stanza sets change_mode to noop. By default, change_mode is set to restart, which will cause your task to fail if your client is unable to connect to the Nomad server. Nomad schedules this job on the edge datacenter, so with change_mode set to noop, if the edge client disconnects from the Nomad server (and therefore from service discovery), the task keeps running with the previously rendered address and port.

Schedule HashiCups jobs

Submit the hashicups job to deploy the tasks to the primary data center.

$ nomad job run jobs/hashicups.nomad.hcl 
==> 2022-04-24T14:47:17-07:00: Monitoring evaluation "2e20a4db"
    2022-04-24T14:47:17-07:00: Evaluation triggered by job "hashicups"
==> 2022-04-24T14:47:18-07:00: Monitoring evaluation "2e20a4db"
    2022-04-24T14:47:18-07:00: Evaluation within deployment: "9b7f3d50"
    2022-04-24T14:47:18-07:00: Allocation "3442ce01" created: node "0643838d", group "hashicups"
    2022-04-24T14:47:18-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-04-24T14:47:18-07:00: Evaluation "2e20a4db" finished with status "complete"
==> 2022-04-24T14:47:18-07:00: Monitoring deployment "9b7f3d50"
  ✓ Deployment "9b7f3d50" successful
    
    2022-04-24T14:47:30-07:00
    ID          = 9b7f3d50
    Job ID      = hashicups
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    hashicups   1        1       1        0          2022-04-24T21:57:28Z

Submit the hashicups-edge job to deploy the tasks to the edge data center.

$ nomad job run jobs/hashicups-edge.nomad.hcl 
==> 2022-04-24T14:47:47-07:00: Monitoring evaluation "8756c237"
    2022-04-24T14:47:47-07:00: Evaluation triggered by job "hashicups-edge"
    2022-04-24T14:47:47-07:00: Evaluation within deployment: "66d4779e"
    2022-04-24T14:47:47-07:00: Allocation "48af7a5e" created: node "6ba84888", group "hashicups-edge"
    2022-04-24T14:47:47-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-04-24T14:47:47-07:00: Evaluation "8756c237" finished with status "complete"
==> 2022-04-24T14:47:47-07:00: Monitoring deployment "66d4779e"
  ✓ Deployment "66d4779e" successful
    
    2022-04-24T14:48:29-07:00
    ID          = 66d4779e
    Job ID      = hashicups-edge
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group      Desired  Placed  Healthy  Unhealthy  Progress Deadline
    hashicups-edge  1        1       1        0          2022-04-24T21:58:28Z

Verify HashiCups jobs

List the Nomad services. Notice the service name contains the job name, group name, and task name, separated by a dash (-).

$ nomad service list
Service Name                                Tags
hashicups-edge-hashicups-edge-frontend      [frontend,hashicups]
hashicups-edge-hashicups-edge-nginx         [frontend,hashicups]
hashicups-edge-hashicups-edge-payments-api  [backend,hashicups]
hashicups-edge-hashicups-edge-public-api    [backend,hashicups]
hashicups-hashicups-db                      [backend,hashicups]
hashicups-hashicups-product-api             [backend,hashicups]
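
The naming pattern in this listing is easy to reproduce, which helps when you need to work out the name to pass to nomadService in a template. A minimal sketch, using a hypothetical helper named service_name:

```shell
# Build a service name from a job name, group name, and task name,
# following the pattern visible in the listing above.
service_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

service_name hashicups hashicups db
service_name hashicups-edge hashicups-edge nginx
```

The two calls print hashicups-hashicups-db and hashicups-edge-hashicups-edge-nginx, which match the entries in the listing.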

Retrieve detailed information about the nginx service. Since there are two Nomad clients on the edge datacenter, this command is useful to locate which client the service is running on. Notice that the nginx service's address reflects the address defined in the service stanza, which is the client's public IP address.

$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID          Address             Tags                  Node ID   Alloc ID
hashicups-edge  184.169.204.238:80  [hashicups,frontend]  6ba84888  e3b69fc2

Open the nginx's address in your web browser to go to HashiCups.

HashiCups frontend connected with the `product-api` to display an array of
coffee images

Simulate client disconnect

When running and managing edge services, the network connection between your Nomad servers and edge services may be unstable. In this step, you will simulate the client running the hashicups-edge job disconnecting from the Nomad servers to learn how Nomad reacts to disconnected clients.

Retrieve the nginx service's client IP address. For the example below, the client IP address is 184.169.204.238.

$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID          Address             Tags                  Node ID   Alloc ID
hashicups-edge  184.169.204.238:80  [hashicups,frontend]  6ba84888  e3b69fc2

Export the client IP address as an environment variable named CLIENT_IP. Do not include the port. For example, the client IP address for this example would be 184.169.204.238.

$ export CLIENT_IP=
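
If you prefer not to copy the address by hand, you could parse it from the service listing. The sketch below assumes the tabular layout shown in the example output, with a single header line followed by the service row, and takes the host part of the first row. The function name first_service_host is hypothetical.

```shell
# Print the host part (without the port) of the first address row
# from `nomad service info` style output on stdin.
first_service_host() {
  awk 'NR == 2 { sub(/:[0-9]+$/, "", $2); print $2 }'
}

# Sample copied from the example output in this tutorial.
first_service_host <<'EOF'
Job ID          Address             Tags                  Node ID   Alloc ID
hashicups-edge  184.169.204.238:80  [hashicups,frontend]  6ba84888  e3b69fc2
EOF

# Possible usage against your own cluster:
#   export CLIENT_IP=$(nomad service info hashicups-edge-hashicups-edge-nginx | first_service_host)
```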

Run the following command to drop all packets from the Nomad servers to the Nomad client that is currently hosting the hashicups-edge job.

$ ssh terraform@$CLIENT_IP -i ./learn-nomad-edge \
'sudo iptables -I INPUT -s '$(terraform output -raw nomad_server)' -j DROP && \
sudo iptables -I INPUT -s '$(terraform output -raw nomad_server_1)' -j DROP && \
sudo iptables -I INPUT -s '$(terraform output -raw nomad_server_2)' -j DROP'

Verify disconnected client

Retrieve the hashicups-edge job's status. Notice that one of the allocations' status is now unknown and Nomad rescheduled the allocation onto a different client.

Tip

If the allocation status does not change, wait a couple of seconds before retrieving the job's status. If it does not change, verify that you dropped packets on the correct client.

$ nomad status hashicups-edge
## ...
Allocations
ID        Node ID   Task Group      Version  Desired  Status   Created    Modified
40f52550  da109b44  hashicups-edge  0        run      pending  9s ago     8s ago
48af7a5e  6ba84888  hashicups-edge  0        run      unknown  2m39s ago  9s ago

This is the preferred behavior when the client instance is still up but cannot connect to the Nomad servers, for example because of an edge network's unstable connection.

List the nginx service. Notice that Nomad lists both services. Even though the original client cannot connect to the Nomad servers, that does not necessarily mean the client is unavailable. As a result, Nomad continues to list the service on the original client as available.

$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID          Address             Tags                  Node ID   Alloc ID
hashicups-edge  13.57.34.53:80      [hashicups,frontend]  da109b44  40f52550
hashicups-edge  184.169.204.238:80  [hashicups,frontend]  6ba84888  48af7a5e

Visit both addresses to find the HashiCups dashboard.

Re-enable client connection

Run the following command to re-accept packets from the Nomad servers.

$ ssh terraform@$CLIENT_IP -i ./learn-nomad-edge \
'sudo iptables -D INPUT -s '$(terraform output -raw nomad_server)' -j DROP && \
sudo iptables -D INPUT -s '$(terraform output -raw nomad_server_1)' -j DROP && \
sudo iptables -D INPUT -s '$(terraform output -raw nomad_server_2)' -j DROP'
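
The block and unblock commands in this section repeat one iptables rule per Nomad server and differ only in the action flag (-I to insert the rule, -D to delete it). As a sketch, you could generate the remote command string from a list of server addresses. The function name iptables_rules is hypothetical, and the addresses are the example server addresses from this tutorial.

```shell
# iptables_rules ACTION IP...
# ACTION is -I (insert the DROP rules) or -D (delete them again).
# Prints one command string that chains a rule for every IP with "&&".
iptables_rules() {
  action="$1"
  shift
  rules=""
  for ip in "$@"; do
    rule="sudo iptables $action INPUT -s $ip -j DROP"
    if [ -z "$rules" ]; then
      rules="$rule"
    else
      rules="$rules && $rule"
    fi
  done
  printf '%s\n' "$rules"
}

# Print the command string that blocks the three example servers.
iptables_rules -I 18.191.0.46 3.145.196.167 3.144.15.124

# Possible usage: pass the generated string to ssh as the remote command.
#   ssh terraform@$CLIENT_IP -i ./learn-nomad-edge "$(iptables_rules -D SERVER_IP_1 SERVER_IP_2 SERVER_IP_3)"
```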

Retrieve the hashicups-edge job's status. Notice that the allocation on the original client is now running and the rescheduled allocation on the new client is now complete.

Tip

If the allocation status does not change, wait a couple of seconds before retrieving the job's status. If it does not change, verify that you re-accepted packets on the correct client.

$ nomad status hashicups-edge
## ...
Allocations
ID        Node ID   Task Group      Version  Desired  Status    Created    Modified
40f52550  da109b44  hashicups-edge  0        stop     complete  3m42s ago  3s ago
48af7a5e  6ba84888  hashicups-edge  0        run      running   6m12s ago  4s ago

Since the original client reconnected and the node rank on the rescheduled allocation is equal to or worse than the original client, Nomad resumed the original allocation and stopped the new one.
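
When you check the job status repeatedly, it can help to reduce the output to allocation IDs and their statuses. The sketch below assumes the Allocations table layout shown in the example output, where the sixth whitespace-separated field of each row is the status. The function name alloc_statuses is hypothetical.

```shell
# Print "<alloc id> <status>" for each row of the Allocations table on stdin.
alloc_statuses() {
  awk '
    $1 == "Allocations"    { in_table = 1; next }  # the table follows this line
    in_table && $1 == "ID" { next }                # skip the column header
    in_table && NF >= 6    { print $1, $6 }
  '
}

# Sample copied from the example output in this tutorial.
alloc_statuses <<'EOF'
Allocations
ID        Node ID   Task Group      Version  Desired  Status    Created    Modified
40f52550  da109b44  hashicups-edge  0        stop     complete  3m42s ago  3s ago
48af7a5e  6ba84888  hashicups-edge  0        run      running   6m12s ago  4s ago
EOF

# Possible usage against your own cluster:
#   nomad status hashicups-edge | alloc_statuses
```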

Retrieve the re-connected allocation's status to find the reconnect event, replacing ALLOC_ID with your re-connected allocation ID. In this example, it is 48af7a5e.

$ nomad alloc status ALLOC_ID
## ...
Recent Events:
Time                       Type         Description
2022-04-24T14:53:55-07:00  Reconnected  Client reconnected
2022-04-24T14:48:01-07:00  Started      Task started by client
2022-04-24T14:47:48-07:00  Driver       Downloading image
2022-04-24T14:47:47-07:00  Task Setup   Building Task Directory
2022-04-24T14:47:47-07:00  Received     Task received by client

List the nginx service. Notice that Nomad removed the service entry for the completed allocation; it only lists the original service.

$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID          Address             Tags                  Node ID   Alloc ID
hashicups-edge  184.169.204.238:80  [hashicups,frontend]  6ba84888  48af7a5e

Clean up resources

Run terraform destroy to clean up your provisioned infrastructure. Respond yes to the prompt to confirm the operation.

$ terraform destroy
## ...
Plan: 0 to add, 0 to change, 20 to destroy.
## ...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
## ...
Destroy complete! Resources: 20 destroyed.

Your AWS account still has the AMIs and their associated snapshots, which you may be charged for depending on your other usage. Deregister the AMIs and delete their snapshots.

Note

Remember to delete the AMI images and snapshots in both regions where you created them. If you didn't update the region variable in the terraform.tfvars file, they will be in the us-east-2 and us-west-1 regions.

In the us-east-2 region of your AWS account, deregister the AMI by selecting it, clicking on the Actions button, then the Deregister AMI option, and finally confirm by clicking the Deregister AMI button in the confirmation dialog.

Delete the snapshots by selecting the snapshots, clicking on the Actions button, then the Delete snapshot option, and finally confirm by clicking the Delete button in the confirmation dialog.

Then, delete the AMI images and snapshots in the us-west-1 region.

In the us-west-1 region of your AWS account, deregister the AMI by selecting it, clicking on the Actions button, then the Deregister AMI option, and finally confirm by clicking the Deregister AMI button in the confirmation dialog.

Next steps

In this tutorial, you deployed a single server cluster and distant client edge architecture. Then, you scheduled HashiCups on both on-prem and edge data centers, connecting its services with Nomad's native service discovery. Finally, you tested the disconnected client allocation by simulating unstable network connectivity between the Nomad clients and the server.

For more information, check out the following resources.

Learn more about Nomad's native service discovery by visiting the Nomad documentation
Read more about disconnected client allocation handling by visiting the Nomad documentation
Complete the tutorials in the Nomad ACL System Fundamentals collection to configure a Nomad cluster for ACLs, bootstrap the ACL system, author your first policy, and grant a token based on the policy.
