Connect to a PostgreSQL cluster deployed to Patroni
This topic describes how to connect Terraform Enterprise to a highly-available PostgreSQL cluster deployed to Patroni on Kubernetes.
Warning
Connecting to a database cluster is in beta. These instructions describe an example scenario that we tested and verified for non-production use cases. You should evaluate your requirements and business needs to determine the optimal architecture and configurations for your specific environment.
Overview
- Install the postgres-operator chart, which creates a Postgres construct that manages PostgreSQL clusters on Kubernetes. Refer to the Postgres operator documentation for additional information.
- Create a custom values.yaml file and define the necessary Kubernetes objects, such as the HAProxy and a service that enables the proxy to discover the Patroni pods.
- Deploy the configurations using the Postgres operator Helm chart.
Optionally, you can create and run a test workload against Terraform Enterprise to measure the resilience of your high availability PostgreSQL cluster.
Requirements
During testing, the following deployment configuration resulted in three successful failover recoveries across five iterations. Refer to Measure failover resilience for additional information.
Load balancer
You must deploy a load balancer between Terraform Enterprise and the PostgreSQL cluster on Patroni. Refer to the requirements for connecting Terraform Enterprise to a PostgreSQL cluster for additional information.
The scenario described in these instructions uses an HAProxy. For a production deployment of Patroni on Kubernetes, we recommend using the Kubernetes load balancer service. You can configure the load balancer service in the Patroni cluster manifest. Refer to the Patroni documentation for details.
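If you follow that recommendation, the Zalando Postgres operator can provision the load balancer service directly from the cluster manifest. The following excerpt is a minimal sketch; the enableMasterLoadBalancer field name follows the Zalando operator conventions, so confirm it against the operator version you run:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: patroni
  namespace: failover
spec:
  # Ask the operator to create a Kubernetes load balancer service
  # that always routes to the current primary node.
  enableMasterLoadBalancer: true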
Terraform Enterprise
We tested the scenario described in this topic against the following Terraform Enterprise deployment:
- Release v202409-1
- active-active operational mode
- Deployed to Google Kubernetes Engine (GKE)
- Deployed on three nodes
- The following environment variables configured, as shown in the sketch after this list:
  - TFE_DATABASE_HOST variable set to an HAProxy load balancer
  - TFE_DATABASE_RECONNECT_ENABLED set to true
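In Helm-based deployments, those settings map to overrides similar to the following sketch. The env.variables layout assumes the official terraform-enterprise chart, and the haproxy-patroni host name is an example value for the HAProxy service described later in this topic:
replicaCount: 3
env:
  variables:
    TFE_OPERATIONAL_MODE: "active-active"
    # Example host name for the HAProxy service; port 5000 matches the
    # listener defined in the HAProxy configuration later in this topic.
    TFE_DATABASE_HOST: "haproxy-patroni.failover.svc.cluster.local:5000"
    # Allow Terraform Enterprise to re-establish database connections after a failover
    TFE_DATABASE_RECONNECT_ENABLED: "true"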
Patroni
We tested the scenario described in this topic against the following Patroni deployment:
- Deployed with three nodes
- Served to Terraform Enterprise through an HAProxy load balancer
Configure Kubernetes objects
Create a values.yaml file to override the default Postgres operator values.
Define cluster resource defaults
The Postgres operator Helm chart contains default values for all Patroni clusters deployed using the operator. Refer to the Zalando Postgres operator values for additional information.
Specify the resources that the Postgres containers should use in the configPostgresPodResources field. The following example configures resource requests and limits, such as CPU and memory, for the Postgres containers in the pods:
configPostgresPodResources:
  # CPU limits for the postgres containers
  default_cpu_limit: "8"
  # CPU request value for the postgres containers
  default_cpu_request: "4"
  # memory limits for the postgres containers
  default_memory_limit: 32Gi
  # memory request value for the postgres containers
  default_memory_request: 16Gi
Define cluster behaviors
Kubernetes allocates resources to the Patroni cluster according to the values you define in the configPostgresPodResources field and starts individual Postgres clusters according to the cluster manifest. The manifest is a custom resource definition (CRD) that defines parameters for each cluster. Refer to the following Postgres operator topics for additional information about the cluster manifest:
The following example manifest specifies the cluster configuration we tested for this scenario:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: patroni
  namespace: failover
spec:
  teamId: "terraform-enterprise"
  volume:
    size: 10Gi
  numberOfInstances: 3
  users:
    cluster_admin: # database owner
      - superuser
      - createdb
    user: [] # role for application foo
  databases:
    db: user # dbname: owner
  postgresql:
    version: "15"
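After you deploy the configurations, as described later in this topic, you can confirm that the operator created the cluster from this manifest. The following commands are a sketch; the spilo-role label is set on each pod by Spilo and identifies the current primary:
# List the Postgres clusters created from the manifest
kubectl get postgresql -n failover
# List the Patroni pods and show which one holds the primary role
kubectl get pods -n failover -l cluster-name=patroni -L spilo-role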
Discovery service
Define a service that enables the HAProxy to discover the IP addresses of the Patroni pods. The following example uses the Spilo application for discovery. It discovers all Patroni pods and then relies on the HAProxy to route traffic to the master:
apiVersion: v1
kind: Service
metadata:
  name: patroni-headless
  namespace: failover
spec:
  clusterIP: None
  selector:
    cluster-name: patroni
    application: spilo
    # spilo-role = "master"
  ports:
    - port: 5432
      name: postgresql
    - port: 8008
      name: api
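To verify that the service discovers the pods, you can check its endpoints and resolve its DNS name from inside the cluster. This is a sketch, assuming the failover namespace used in the examples above:
# The headless service should list one endpoint per Patroni pod on ports 5432 and 8008
kubectl get endpoints patroni-headless -n failover
# Resolving the service name should return the IP address of every Patroni pod
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup patroni-headless.failover.svc.cluster.local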
HAProxy
Define an HAProxy that routes traffic to the primary node by checking the /primary Patroni endpoint.
The HAProxy configuration is crucial to the successful recovery of Terraform Enterprise after a failover. By default, the proxy runs a health check every two seconds, which is too long for many implementations. We recommend configuring HAProxy with a health check interval of at most 1s.
Refer to the HAProxy documentation for instructions on how to change the health check interval.
The following configuration uses the Kubernetes DNS settings from resolv.conf and applies server templates and service names to the HAProxy. It also uses a Kubernetes resolver to resolve the DNS name of the Patroni service. If HAProxy is unable to resolve the DNS name, it falls back to the last known IP address and then to the libc resolver, in that order.
global
    log stdout format raw local0
    maxconn 2000
defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
resolvers kubernetes
    parse-resolv-conf
    hold valid 10s
listen postgres
    bind *:5000
    mode tcp
    retry-on all-retryable-errors
    option httpchk
    http-check send meth HEAD uri /primary
    http-check expect status 200
    default-server inter 1s fall 3 rise 2 on-marked-down shutdown-sessions
    server-template patroni- 1-3 patroni-headless.failover.svc.cluster.local:5432 check port 8008 resolvers kubernetes init-addr last,libc,none
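One way to run this proxy on the same cluster is to mount the configuration from a ConfigMap into the official haproxy image and expose it through a service. The following objects are a minimal sketch and were not part of the tested scenario; all names, the image tag, and the single-replica setup are example values:
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-config
  namespace: failover
data:
  haproxy.cfg: |
    # Paste the HAProxy configuration shown above here.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-patroni
  namespace: failover
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy-patroni
  template:
    metadata:
      labels:
        app: haproxy-patroni
    spec:
      containers:
        - name: haproxy
          image: haproxy:2.8
          ports:
            - containerPort: 5000
          volumeMounts:
            # The official image reads /usr/local/etc/haproxy/haproxy.cfg by default
            - name: config
              mountPath: /usr/local/etc/haproxy
      volumes:
        - name: config
          configMap:
            name: haproxy-config
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy-patroni
  namespace: failover
spec:
  selector:
    app: haproxy-patroni
  ports:
    - port: 5000
      targetPort: 5000
If you use this approach, set TFE_DATABASE_HOST to haproxy-patroni.failover.svc.cluster.local:5000 so that Terraform Enterprise connects through the proxy.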
Deploy the configurations
Install the Postgres operator chart and apply the configuration files to deploy Patroni and the HAProxy. Refer to the Postgres operator documentation for instructions on how to deploy.
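The following commands sketch that workflow. The file names are examples, and the chart repository URL is published in the Postgres operator documentation, so substitute the values for your environment:
# Add the Postgres operator chart repository (use the URL from the operator documentation)
helm repo add postgres-operator-charts <repository-url>
helm repo update
# Install the operator with the custom values.yaml overrides
helm install postgres-operator postgres-operator-charts/postgres-operator -f values.yaml
# Apply the cluster manifest, discovery service, and HAProxy objects
kubectl apply -n failover -f patroni-cluster.yaml -f patroni-headless.yaml -f haproxy.yaml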
Measure failover resilience
You can collect recovery time objective (RTO) data to assess the resilience of your HA system. Refer to the following topics for additional information:
In the example scenario, we observed the following outcomes:
- Recovery times ranging from a minimum RTO of 2m18s to a maximum of 4m56s
- Average RTO of 3m38s across successful failovers.
- Two out of five failovers experienced issues where Terraform Enterprise returned to operation within one minute, but node restarts were needed to resolve continued run failures.
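If you want to exercise a failover manually, one approach is to delete the current primary pod and watch Patroni promote a replica while you time how long Terraform Enterprise takes to process runs again. This is a sketch, assuming the cluster name and labels from the examples above:
# Identify the current primary pod
kubectl get pods -n failover -l cluster-name=patroni -L spilo-role
# Delete the primary pod to trigger a failover
kubectl delete pod <primary-pod-name> -n failover
# Watch a replica take over the primary role
kubectl get pods -n failover -l cluster-name=patroni -L spilo-role -w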
Troubleshooting
You may need to manually address issues after a failover to restore functionality. For example, if an affected instance cannot process runs, its Vault process may still be connected to a read-only instance.
Refer to Unable to write to database after a failover in the Terraform troubleshooting documentation for symptoms and solutions.
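To check which node currently holds the primary role before restarting anything, you can query Patroni directly. This sketch assumes a pod named patroni-0 and that the Spilo image includes the patronictl tool:
# Show each cluster member and its current role (leader or replica)
kubectl exec -n failover patroni-0 -- patronictl list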