Vault
PKI secrets engine - considerations
To successfully deploy this secrets engine, there are a number of important considerations to be aware of, as well as some preparatory steps that should be undertaken. You should read all of these before using this secrets engine or generating the CA to use with this secrets engine.
Table of contents
- Be Careful with Root CAs
- One CA Certificate, One Secrets Engine
- Key Types Matter
- Use a CA Hierarchy
- Cluster URLs are Important
- Automate Rotation with ACME
- Keep Certificate Lifetimes Short, For CRL's Sake
- You must configure issuing/CRL/OCSP information in advance
- Distribution of CRLs and OCSP
- Automate CRL Building and Tidying
- Spectrum of Revocation Support
- Issuer Subjects and CRLs
- Automate Leaf Certificate Renewal
- Safe Minimums
- Token Lifetimes and Revocation
- Safe Usage of Roles
- Telemetry
- Auditing
- Role-Based Access
- Replicated DataSets
- Cluster Scalability
- PSS Support
- Issuer Storage Migration Issues
Be careful with root CAs
Vault storage is secure, but not as secure as a piece of paper in a bank vault. It is, after all, networked software. If your root CA is hosted outside of Vault, don't put it in Vault as well; instead, issue a shorter-lived intermediate CA certificate and put this into Vault. This aligns with industry best practices.
Since 0.4, the secrets engine supports generating self-signed root CAs and creating and signing CSRs for intermediate CAs. In each instance, for security reasons, the private key can only be exported at generation time, and the ability to do so is part of the command path (so it can be put into ACL policies).
If you plan on using intermediate CAs with Vault, it is suggested that you let Vault create CSRs and do not export the private key, then sign those with your root CA (which may be a second mount of the pki secrets engine).
Managed keys
Since 1.10, Vault Enterprise can access private key material in a managed key. In this case, Vault never sees the private key, and the external KMS or HSM performs certificate signing operations. Managed keys are configured by selecting the kms type when generating a root or intermediate.
One CA certificate, one secrets engine
Since Vault 1.11.0, the PKI Secrets Engine supports multiple issuers in a single mount. However, in order to simplify the configuration, it is strongly recommended that operators limit a mount to a single issuer. If you want to issue certificates from multiple disparate CAs, mount the PKI secrets engine at multiple mount points with separate CA certificates in each.
The rationale for separating mounts is to simplify permissions management: very few individuals need access to perform operations with the root, but many need access to create leaves. The operations on a root should generally be limited to issuing and revoking intermediate CAs, which is a highly privileged operation; it becomes much easier to audit these operations when they're in a separate mount than if they're mixed in with day-to-day leaf issuance.
A common pattern is to have one mount act as your root CA and to use this CA only to sign intermediate CA CSRs from other PKI secrets engines.
To keep old CAs active, there are two approaches to achieving rotation:
- Use multiple secrets engines. This allows a fresh start, preserving the old issuer and CRL. Vault ACL policy can be updated to deny new issuance under the old mount point and roles can be re-evaluated before being imported into the new mount point.
- Use multiple issuers in the same mount point. The usage of the old issuer can be restricted to CRL signing, and existing roles and ACL policy can be kept as-is. This allows cross-signing within the same mount, and consumers of the mount won't have to update their configuration. Once the transitional period for this rotation has completed and all previously issued certificates have expired, it is encouraged to fully remove the old issuer and any unnecessary cross-signed issuers from the mount point.
Another suggested use case for multiple issuers in the same mount is splitting issuance by TTL. For short-lived certificates, an intermediate stored in Vault will often outperform an HSM-backed intermediate. For longer-lived certificates, however, it is often important to have the intermediate key material secured throughout the lifetime of the end-entity certificate. This means that two intermediates in the same mount, one backed by the HSM and one backed by Vault, can satisfy both use cases. Operators can create roles that set maximum TTLs for each issuer, and consumers of the mount can decide which to use.
Always configure a default issuer
For backwards compatibility, the default issuer is used to service PKI endpoints without an explicit issuer (either via path selection or role-based selection). When certificates are revoked and their issuer is no longer part of this PKI mount, Vault places them on the default issuer's CRL. This means maintaining a default issuer is important for both backwards compatibility for issuing certificates and for ensuring revoked certificates land on a CRL.
Key types matter
Certain key types have impacts on performance. Signing certificates with an RSA key will be slower than signing with an ECDSA or Ed25519 key. Key generation (using /issue/:role endpoints) with RSA keys will also be slow: RSA key generation involves finding suitable random primes, whereas Ed25519 keys can be random data. As the number of bits goes up (RSA 2048 -> 4096 or ECDSA P-256 -> P-521), signature times also increase.
This matters in both directions: not only is issuance more expensive, but validation of the corresponding signature (in, say, TLS handshakes) will also be more expensive. Careful consideration of both issuer and issued key types can have a meaningful impact on the performance of not only Vault, but also the systems using these certificates.
Cluster performance and key types
The benchmark-vault project can be used to measure the performance of a Vault PKI instance. In general, some considerations to be aware of:
- RSA key generation is much slower and more variable than EC key generation. If performance and throughput are a necessity, consider using EC keys (including NIST P-curves and Ed25519) instead of RSA.
- Key signing requests (via /pki/sign) will be faster than key issuance requests (via /pki/issue), especially for RSA keys: Vault does not need to generate key material and can sign the key material provided by the client. This signing step is common between both endpoints, so key generation is pure overhead if the client has a sufficiently secure source of entropy.
- The CA's key type matters as well: using an RSA CA will result in an RSA signature and takes longer than an ECDSA or Ed25519 CA.
- Storage is an important factor: with BYOC Revocation, using no_store=true still gives you the ability to revoke certificates, and audit logs can be used to track issuance. Clusters using remote storage (like Consul) over a slow network and using no_store=false or no_store_cert_metadata=false, along with specifying metadata on issuance, will see additional latency on issuance. Adding leases for every issued certificate compounds the problem.
  - Storing too many certificates results in longer LIST /pki/certs time, including the time to tidy the instance. As such, for large scale deployments (>= 250k active certificates) it is recommended to use audit logs to track certificates outside of Vault.
As a general comparison on unspecified hardware, using benchmark-vault for 30s on a local, single node, raft-backed Vault instance:
- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and without storage.
  - But switching to storing these leaves drops that number to 65k, and only 20k with leases.
- Using large, expensive RSA-4096 bit keys, Vault can only issue 160 leaves, regardless of whether or not storage or leases were used. The 95th percentile key generation time is above 10s.
  - In comparison, using P-521 keys, Vault can issue closer to 30k leaves without leases and 18k with leases.
These numbers are for example only, to represent the impact different key types can have on PKI cluster performance.
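To compare these figures on a per-second basis, the 30s totals can be divided by the run length. The following is a small arithmetic sketch using only the numbers quoted above; the scenario labels are informal names chosen here, not benchmark-vault output, and the P-521 figures are the approximate values from the text.

```python
# Certificates issued during one 30-second run, as quoted in the text above.
RUN_SECONDS = 30
ISSUED = {
    "EC P-256, no storage": 300_000,
    "EC P-256, stored": 65_000,
    "EC P-256, stored with leases": 20_000,
    "EC P-521, without leases": 30_000,   # "closer to 30k" in the text
    "EC P-521, with leases": 18_000,
    "RSA-4096": 160,
}


def per_second(total, seconds=RUN_SECONDS):
    """Average issuance rate over the whole run."""
    return total / seconds


rates = {name: per_second(total) for name, total in ISSUED.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.1f} certificates/second")
```

Viewed this way, the gap between key types is easier to relate to an expected request rate, though the same caveat applies: these are example numbers, not a sizing guide.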
The use of ACME adds additional latency into these numbers, both because certificates need to be stored and because challenge validation needs to be performed.
Use a CA hierarchy
It is generally recommended to use a hierarchical CA setup, with a root certificate which issues one or more intermediates (based on usage), which in turn issue the leaf certificates.
This allows stronger storage or policy guarantees around protection of the root CA, while letting Vault manage the intermediate CAs and issuance of leaves. Different intermediates might be issued for different usage, such as VPN signing, Email signing, or testing versus production TLS services. This helps to keep CRLs limited to specific purposes: for example, VPN services don't care about the revoked set of email signing certificates if they're using separate certificates and different intermediates, and thus don't need both CRL contents. Additionally, this allows higher risk intermediates (such as those issuing longer-lived email signing certificates) to have HSM-backing without impacting the performance of easier-to-rotate intermediates and certificates (such as TLS intermediates).
Vault supports the use of both the allowed_domains parameter on Roles and the permitted_dns_domains parameter to set the Name Constraints extension on root and intermediate generation. This allows for several layers of separation of concerns between TLS-based services.
Cross-Signed intermediates
When cross-signing intermediates from two separate roots, two separate intermediate issuers will exist within the Vault PKI mount. In order to correctly serve the cross-signed chain on issuance requests, the manual_chain override is required on either or both intermediates. This can be constructed in the following order:
- this issuer (self)
- this root
- the other copy of this intermediate
- the other root
All requests to this issuer for signing will now present the full cross-signed chain.
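The ordering above can be sketched as a small helper that builds the manual_chain value for one of the intermediates. This is an illustration only: the issuer names are hypothetical, and the resulting dictionary is assumed to be the body of an update to that issuer's configuration.

```python
def cross_signed_manual_chain(this_root, other_intermediate, other_root):
    """Return issuer references in the order described above.

    "self" stands for the issuer being configured; the remaining
    arguments are issuer names or IDs within the same mount.
    """
    return ["self", this_root, other_intermediate, other_root]


# Hypothetical issuer names, for illustration only.
payload = {
    "manual_chain": cross_signed_manual_chain(
        this_root="root-a",
        other_intermediate="int-signed-by-root-b",
        other_root="root-b",
    )
}
print(payload)
```

The same helper can be called with the arguments swapped to build the chain for the other copy of the intermediate.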
Cluster URLs are important
In Vault 1.13, support for templated AIA URLs was added. With the per-cluster URL configuration pointing to this Performance Replication cluster, AIA information will point to the cluster that issued this certificate automatically.
In Vault 1.14, with ACME support, the same configuration is used for allowing ACME clients to discover the URL of this cluster.
Warning: It is important to ensure that this configuration is up to date and maintained correctly, always pointing to the node's PR cluster address (which may be a load-balanced or a DNS round-robin address). If this configuration is not set on every Performance Replication cluster, certificate issuance (via REST and/or via ACME) will fail.
Automate rotation with ACME
In Vault 1.14, support for the Automatic Certificate Management Environment (ACME) protocol has been added to the PKI Engine. This is a standardized way to handle validation, issuance, rotation, and revocation of server certificates.
Many ecosystems, from web servers like Caddy, Nginx, and Apache, to orchestration environments like Kubernetes (via cert-manager) natively support issuance via the ACME protocol. For deployments without native support, stand-alone tools like certbot support fetching and renewing certificates on behalf of consumers. Vault's PKI Engine only includes server support for ACME; no client functionality has been included.
Note: Vault's PKI ACME server caps the certificate's validity at 90 days maximum by default, overridable using the ACME config max_ttl parameter. Shorter validity durations can be set by limiting the role's TTL to be under the global ACME configured limit.
Aligning with Let's Encrypt, we do not support the optional NotBefore and NotAfter order request parameters.
ACME stores certificates
Because ACME requires stored certificates in order to function, the notes below about automating tidy are especially important for the long-term health of the PKI cluster. ACME also introduces additional resource types (accounts, orders, authorizations, and challenges) that must be tidied via the tidy_acme=true option. Orders, authorizations, and challenges are cleaned up based on the safety_buffer parameter, but accounts can live longer past their last issued certificate by controlling the acme_account_safety_buffer parameter.
As a consequence of the above, and like the discussions in the Cluster Scalability section, because these roles have no_store=false set, ACME can only issue certificates on the active nodes of PR clusters; standby nodes, if contacted, will transparently forward all requests to the active node.
ACME role restrictions require EAB
Because ACME by default has no external authorization engine and is unauthenticated from a Vault perspective, the use of roles with ACME in the default configuration is of limited value, as any ACME client can request certificates under any role by proving possession of the requested certificate identifiers.
To solve this issue, there are two possible approaches:
- Use a restrictive allowed_roles, allowed_issuers, and default_directory_policy ACME configuration to let only a single role and issuer be used. This prevents user choice, allowing some global restrictions to be placed on issuance, and avoids requiring ACME clients to have (at initial setup) access to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
- Use a more permissive configuration with eab_policy=always-required to allow more roles and users to select the roles, but bind ACME clients to a Vault token which can be suitably ACL'd to particular sets of approved ACME directories.
The choice of approach depends on the policies of the organization wishing to use ACME.
Another consequence of the Vault unauthenticated nature of ACME requests is that role templating, based on entity information, cannot be used, as there is no token and thus no entity associated with the request, even when EAB binding is used.
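The two approaches can be sketched as ACME configuration payloads. The parameter names come from the text above; the enabled flag and the value formats (role:<name>, forbid, not-required, the "*" wildcard) are assumptions that should be checked against the ACME configuration API reference, and the role and issuer names are hypothetical.

```python
def restrictive_acme_config(role, issuer):
    """Approach 1: pin ACME to a single role and issuer, no EAB needed."""
    return {
        "enabled": True,
        "allowed_roles": [role],
        "allowed_issuers": [issuer],
        # Assumed value format: requests to the default directory use this role.
        "default_directory_policy": f"role:{role}",
        "eab_policy": "not-required",
    }


def eab_acme_config(roles):
    """Approach 2: allow several roles, but require EAB for every client."""
    return {
        "enabled": True,
        "allowed_roles": list(roles),
        "allowed_issuers": ["*"],
        # Assumed value: refuse requests that do not name a role.
        "default_directory_policy": "forbid",
        "eab_policy": "always-required",
    }


single = restrictive_acme_config("web-servers", "tls-intermediate")
multi = eab_acme_config(["web-servers", "internal-apis"])
print(single)
print(multi)
```

Either dictionary would be written to the mount's ACME configuration; which one fits depends on whether clients can reasonably be given a Vault token at setup time.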
ACME and the public internet
Using ACME is possible over the public internet; public CAs like Let's Encrypt offer this as a service. Similarly, organizations running internal PKI infrastructure might wish to issue server certificates to pieces of infrastructure outside of their internal network boundaries, from a publicly accessible Vault instance. By default, without enforcing a restrictive eab_policy, this results in a complicated threat model: any external client which can prove possession of a domain can issue a certificate under this CA, which might be considered more trusted by this organization.
As such, we strongly recommend publicly facing Vault instances (such as HCP Vault) enforce that PKI mount operators have required a restrictive eab_policy=always-required configuration.
System administrators of Vault instances can enforce this by setting the VAULT_DISABLE_PUBLIC_ACME=true environment variable.
ACME errors are in server logs
Because the ACME client is not necessarily trusted (as account registration may not be tied to a valid Vault token when EAB is not used), many error messages end up in the Vault server logs out of security necessity. When troubleshooting issues with clients requesting certificates, first check the client's logs, if any, (e.g., certbot will state the log location on errors), and then correlate with Vault server logs to identify the failure reason.
ACME security considerations
ACME allows any client to use Vault to make some sort of external call; while the design of ACME attempts to minimize this scope and will prohibit issuance if incorrect servers are contacted, it cannot account for all possible remote server implementations. Vault's ACME server makes three types of requests:
- DNS requests for _acme-challenge.<domain>, which should be least invasive and most safe.
- TLS ALPN requests for the acme-tls/1 protocol, which should be safely handled by the TLS layer before any application code is invoked.
- HTTP requests to http://<domain>/.well-known/acme-challenge/<token>, which could be problematic based on server design; if all requests, regardless of path, are treated the same and assumed to be trusted, this could result in Vault being used to make (invalid) requests. Ideally, any such server implementations should be updated to ignore such ACME validation requests or to block access originating from Vault to this service.
In all cases, no information about the response presented by the remote server is returned to the ACME client.
When running Vault on multiple networks, note that Vault's ACME server places no restrictions on requesting client/destination identifier validation paths; a client could use an HTTP challenge to force Vault to reach out to a server on a network it could otherwise not access.
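When reviewing firewall rules or server behavior for these outbound requests, it can help to enumerate exactly what Vault would contact for a given identifier. The following sketch derives the three targets from the patterns listed above; the function names are hypothetical and wildcard identifiers are not handled.

```python
TLS_ALPN_PROTOCOL = "acme-tls/1"


def dns_challenge_name(domain):
    """DNS name queried for a DNS challenge."""
    return f"_acme-challenge.{domain}"


def http_challenge_url(domain, token):
    """URL fetched for an HTTP challenge."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def validation_targets(domain, token):
    """Summarize the outbound requests described above for one identifier."""
    return {
        "dns": dns_challenge_name(domain),
        "tls-alpn": f"{domain} (ALPN protocol {TLS_ALPN_PROTOCOL})",
        "http": http_challenge_url(domain, token),
    }


targets = validation_targets("service.example.com", "example-token")
for kind, target in targets.items():
    print(f"{kind}: {target}")
```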
ACME and client counting
In Vault 1.14, ACME contributes differently to usage metrics than other interactions with the PKI Secrets Engine. Due to its use of unauthenticated requests (which do not generate Vault tokens), it would not be counted in the traditional activity log APIs. Instead, certificates issued via ACME will be counted via their unique certificate identifiers (the combination of CN, DNS SANs, and IP SANs). These will create a stable identifier that will be consistent across renewals, other ACME clients, mounts, and namespaces, contributing to the activity log presently as a non-entity token attributed to the first mount which created that request.
Keep certificate lifetimes short, for CRL's sake
This secrets engine aligns with Vault's philosophy of short-lived secrets. As such it is not expected that CRLs will grow large; the only place a private key is ever returned is to the requesting client (this secrets engine does not store generated private keys, except for CA certificates). In most cases, if the key is lost, the certificate can simply be ignored, as it will expire shortly.
If a certificate must truly be revoked, the normal Vault revocation function can be used, and any revocation action will cause the CRL to be regenerated. When the CRL is regenerated, any expired certificates are removed from the CRL (and any revoked, expired certificates are removed from secrets engine storage). This is an expensive operation! Due to the structure of the CRL standard, Vault must read all revoked certificates into memory in order to rebuild the CRL, and clients must fetch the regenerated CRL.
This secrets engine does not support multiple CRL endpoints with sliding date windows; often such mechanisms will have the transition point a few days apart, but this gets into the expected realm of the actual certificate validity periods issued from this secrets engine. A good rule of thumb for this secrets engine would be to simply not issue certificates with a validity period greater than your maximum comfortable CRL lifetime. Alternately, you can control CRL caching behavior on the client to ensure that checks happen more often.
Often multiple endpoints are used in case a single CRL endpoint is down so that clients don't have to figure out what to do with a lack of response. Run Vault in HA mode, and the CRL endpoint should be available even if a particular node is down.
Note: Since Vault 1.11.0, with multiple issuers in the same mount point, different issuers may have different CRLs (depending on subject and key material). This means that Vault may need to regenerate multiple CRLs. This is again a rationale for keeping TTLs short and avoiding revocation if possible.
Note: Since Vault 1.12.0, we support two complementary revocation mechanisms: Delta CRLs, which allow for rebuilds of smaller, incremental additions to the last complete CRL, and OCSP, which allows responding to revocation status requests for individual certificates. When coupled with the new CRL auto-rebuild functionality, this means that the revoking step isn't as costly (as the CRL isn't always rebuilt on each revocation), outside of storage considerations. However, while the rebuild operation still can be expensive with lots of certificates, it will be done on a schedule rather than on demand.
NotAfter behavior on leaf certificates
In Vault 1.11.0, the PKI Secrets Engine introduced a new leaf_not_after_behavior parameter on issuers. This allows modification of the issuance behavior: should Vault err, preventing issuance of a leaf cert that would outlive its issuer; silently truncate to the issuer's NotAfter value; or permit longer expirations.
It is strongly suggested to use err or truncate for intermediates; permit is only useful for root certificates, as an intermediate's NotAfter expiration is checked when validating presented chains.
In combination with a cascading expiration with longer lived roots (perhaps on the range of 2-10 years), shorter lived intermediates (perhaps on the range of 6 months to 2 years), and short-lived leaf certificates (on the range of 30 to 90 days), and the rotation strategies discussed in other sections, this should keep the CRLs adequately small.
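The three leaf_not_after_behavior values can be modeled with a short function. This is an illustrative model of the behavior described above, not Vault's implementation; the function name and the example dates are hypothetical.

```python
from datetime import datetime, timezone


def resolve_not_after(requested, issuer_not_after, behavior):
    """Return the NotAfter a leaf would receive under the given behavior."""
    if requested <= issuer_not_after:
        return requested  # the leaf expires before its issuer; nothing to decide
    if behavior == "err":
        raise ValueError("requested expiry is later than the issuer's NotAfter")
    if behavior == "truncate":
        return issuer_not_after
    if behavior == "permit":
        return requested
    raise ValueError(f"unknown leaf_not_after_behavior: {behavior!r}")


issuer_expiry = datetime(2025, 6, 1, tzinfo=timezone.utc)
too_long = datetime(2025, 9, 1, tzinfo=timezone.utc)

truncated = resolve_not_after(too_long, issuer_expiry, "truncate")
permitted = resolve_not_after(too_long, issuer_expiry, "permit")
print(truncated, permitted)
```

With err, the same request raises instead of issuing, which is why err and truncate are the safer choices for intermediates.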
Cluster performance and quantity of leaf certificates
As mentioned above, keeping TTLs short (or using no_store=true and no_store_cert_metadata=true) and avoiding leases is important for a healthy cluster. However, it is important to note this is a scale problem: 10-1000 long-lived, stored certificates are probably fine, but 50k-100k become a problem, and 500k+ stored, unexpired certificates can negatively impact even large Vault clusters, even with short TTLs!
However, once these certificates are expired, a tidy operation will clean up CRLs and Vault cluster storage.
Note that organizational risk assessments for certificate compromise might mean certain certificate types should always be issued with no_store=false; even short-lived broad wildcard certificates (say, *.example.com) might be important enough to have precise control over revocation. However, an internal service with a well-scoped certificate (say, service.example.com) might be of low enough risk to issue a 90-day TTL with no_store=true, preventing the need for revocation in the unlikely case of compromise.
Having a shorter TTL decreases the likelihood of needing to revoke a cert (but cannot prevent it entirely) and decreases the impact of any such compromise.
Note: As of Vault 1.12, the PKI Secret Engine's Bring-Your-Own-Cert (BYOC) functionality allows revocation of certificates not previously stored (e.g., issued via a role with no_store=true). This means that setting no_store=true is now safe to be used globally, regardless of importance of issued certificates (and their likelihood for revocation).
You must configure issuing/CRL/OCSP information in advance
This secrets engine serves CRLs from a predictable location, but it is not possible for the secrets engine to know where it is running. Therefore, you must configure desired URLs for the issuing certificate, CRL distribution points, and OCSP servers manually using the config/urls endpoint. More than one of each can be set by passing the URLs as a comma-separated string parameter.
Note: when using Vault Enterprise's Performance Replication features with a PKI Secrets Engine mount, each cluster will have its own CRL; this means each cluster's unique CRL address should be included in the AIA information field separately, or the CRLs should be consolidated and served outside of Vault.
Note: When using multiple issuers in the same mount, it is suggested to use the per-issuer AIA fields rather than the global (/config/urls) variant. This is for correctness: these fields are used for chain building and automatic CRL detection in certain applications. If they point to the wrong issuer's information, these applications may break.
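As a sketch of the comma-separated form described in this section, the helper below joins lists of URLs into a config/urls payload. The field names issuing_certificates, crl_distribution_points, and ocsp_servers are assumed from the endpoint's API reference, and the host name and mount path are hypothetical.

```python
def urls_config(issuing, crl, ocsp):
    """Join each list of URLs into the comma-separated string form."""
    return {
        "issuing_certificates": ",".join(issuing),
        "crl_distribution_points": ",".join(crl),
        "ocsp_servers": ",".join(ocsp),
    }


# Hypothetical cluster address and mount path.
BASE = "https://vault.example.com:8200/v1/pki"

payload = urls_config(
    issuing=[f"{BASE}/ca"],
    crl=[f"{BASE}/crl", "http://crl.example.com/pki.crl"],
    ocsp=[f"{BASE}/ocsp"],
)
print(payload)
```

With Performance Replication, the crl list would carry one entry per cluster, as the first note above describes.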
Distribution of CRLs and OCSP
Both CRLs and OCSP allow interrogating revocation status of certificates. Both of these methods include internal security and authenticity (both CRLs and OCSP responses are signed by the issuing CA within Vault). This means both are fine to distribute over non-secure and non-authenticated channels, such as HTTP.
Note: The OCSP implementation for GET requests can lead to intermittent 400 errors when an encoded OCSP request contains consecutive '/' characters. Until this is resolved it is recommended to use POST based OCSP requests.
Automate CRL building and tidying
Since Vault 1.12, the PKI Secrets Engine supports automated CRL rebuilding (including optional Delta CRLs which can be built more frequently than complete CRLs) via the /config/crl endpoint. Additionally, tidying of revoked and expired certificates can be configured automatically via the /config/auto-tidy endpoint. Both of these should be enabled to ensure compatibility with the wider PKIX ecosystem and performance of the cluster.
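A minimal sketch of the two payloads follows. The field names (auto_rebuild, enable_delta, enabled, interval_duration, tidy_cert_store, tidy_revoked_certs, safety_buffer) are assumed from the API reference for these endpoints, and the durations are placeholder values rather than recommendations.

```python
def crl_config(delta=True):
    """Payload for /config/crl: scheduled rebuilds, optionally with Delta CRLs."""
    return {"auto_rebuild": True, "enable_delta": delta}


def auto_tidy_config(interval="12h", safety_buffer="72h"):
    """Payload for /config/auto-tidy: periodically remove expired entries."""
    return {
        "enabled": True,
        "interval_duration": interval,
        "tidy_cert_store": True,      # expired certificates in storage
        "tidy_revoked_certs": True,   # expired entries on the revocation list
        "safety_buffer": safety_buffer,
    }


crl_payload = crl_config()
tidy_payload = auto_tidy_config()
print(crl_payload)
print(tidy_payload)
```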
Spectrum of revocation support
Starting with Vault 1.13, the PKI secrets engine has the ability to support a spectrum of cluster sizes and certificate revocation quantities.
For users with few revocations or who want a unified view and have the inter-cluster bandwidth to support it, we recommend turning on auto rebuilding of CRLs, cross-cluster revocation queues, and cross-cluster CRLs. This allows all consumers of the CRLs to have the most accurate picture of revocations, regardless of which cluster they talk to.
If the unified CRL becomes too big for the underlying storage mechanism or for a single host to build, we recommend relying on OCSP instead of CRLs. These have much smaller storage entries, and the CRL disabled flag is independent of unified_crl, allowing unified OCSP to remain.
However, when cross-cluster traffic becomes too high (or if CRLs are still necessary in addition to OCSP), we recommend sharding the CRL between different clusters. This has been the default behavior of Vault, but with the introduction of per-cluster, templated AIA information, the leaf certificate's Authority Information Access (AIA) info will point directly to the cluster which issued it, allowing the correct CRL for this cert to be identified by the application. This more correctly mimics the behavior of Let's Encrypt's CRL sharding.
This sharding behavior can also be used for OCSP, if the cross-cluster traffic for revocation entries becomes too high.
For users who wish to manage revocation manually, using the audit logs to track certificate issuance would allow an external system to identify which certificates were issued. These can be manually tracked for revocation, and a custom CRL can be built using externally tracked revocations. This would allow usage of roles set to no_store=true, so Vault is strictly used as an issuing authority and isn't storing any certificates, issued or revoked. For the highest of revocation volumes, this could be the best option.
Notably, this last approach can either be used for the creation of externally stored unified or sharded CRLs. If a single external unified CRL becomes unreasonably large, each cluster's certificates could have AIA info point to an externally stored and maintained, sharded CRL. However, Vault has no mechanism to sign OCSP requests at this time.
What are Cross-Cluster CRLs?
Vault Enterprise supports a clustering mode called Performance Replication. In a replicated PKI Secrets Engine mount, issuer and role information is synced between the Performance Primary and all Performance Secondary clusters. However, each Performance Secondary cluster has its own local storage of issued certificates and revocations which is not synced. In Vault versions before 1.13, this meant that each of these clusters had its own CRL and OCSP data, and any revocation requests needed to be processed on the cluster that issued it (or BYOC used).
Starting with Vault 1.13, we've added two features to Vault Enterprise to help manage this setup more correctly and easily: revocation request queues (cross_cluster_revocation=true in config/crl) and unified revocation entries (unified_crl=true in config/crl).
The former allows operators (revoking by serial number) to request a certificate be revoked regardless of which cluster it was issued on. For example, if a request goes into the Performance Primary, but it didn't issue the certificate, it'll write a cross-cluster revocation request, and mark the results as pending. If another cluster already has this certificate in storage, it will revoke it and confirm the revocation back to the main cluster. An operator can list pending revocations to see the status of these requests. To clean up invalid requests (e.g., if the cluster which had that certificate disappeared, if that certificate was issued with no_store=true on the role, or if it was an invalid serial number), an operator can use tidy with tidy_revocation_queue=true, optionally shortening revocation_queue_safety_buffer to remove them quicker.
The latter allows all clusters to have a unified view of revocations, that is, to have access to a list of revocations performed by other clusters. While the configuration parameter includes crl in the description, this applies to both CRLs and the OCSP responder. When this revocation replication occurs, if any cluster considers a cert revoked when another doesn't (e.g., via BYOC revocation of a no_store=false certificate), all clusters will now consider it revoked assuming it hasn't expired. Notably, the active node of the primary cluster will be used to rebuild the CRL; as this can grow large if many clusters have lots of revoked certs, an operator might need to disable CRL building (disabled=true in config/crl) or increase the storage size.
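The revocation request queue described above can be pictured with a toy model. This is not how Vault implements it; the class, function names, and serial numbers are hypothetical and only mirror the flow in the text: revoke locally when the certificate is stored, otherwise queue a pending request that another cluster may confirm.

```python
class Cluster:
    """Toy stand-in for one Performance Replication cluster."""

    def __init__(self, name, stored_serials):
        self.name = name
        self.stored = set(stored_serials)  # certificates kept in local storage
        self.revoked = set()


def request_revocation(serial, receiving, pending):
    """Revoke locally if possible; otherwise queue a cross-cluster request."""
    if serial in receiving.stored:
        receiving.revoked.add(serial)
        return "revoked"
    pending.add(serial)
    return "pending"


def process_queue(clusters, pending):
    """Let each cluster confirm the queued serials it has in storage."""
    for serial in sorted(pending):  # iterate over a copy so pending can shrink
        for cluster in clusters:
            if serial in cluster.stored:
                cluster.revoked.add(serial)
                pending.discard(serial)
                break


primary = Cluster("primary", {"aa:01"})
secondary = Cluster("secondary", {"bb:02"})
pending = set()

first = request_revocation("bb:02", primary, pending)   # issued on the secondary
second = request_revocation("ff:ff", primary, pending)  # stored nowhere
process_queue([primary, secondary], pending)
print(first, second, secondary.revoked, pending)
```

The serial that no cluster has in storage stays pending, which corresponds to the invalid requests that tidy_revocation_queue=true is meant to clean up.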
As an aside, all new cross-cluster writes (from Performance Secondary up to the Performance Primary) are performed synchronously. This gives the caller confidence that the request actually went through, at the expense of slightly higher overhead for revoking certificates. When a node loses its gRPC connection (e.g., during leadership election or being otherwise unable to contact the active primary), errors will occur, though the local portion of the write (if any) will still succeed. For cross-cluster revocation requests, because there is no local write, the operation will need to be retried; but if writing a cross-cluster revocation entry fails when the cert existed locally, the revocation will eventually be synced across clusters when the connection comes back.
Issuer subjects and CRLs
As noted on several GitHub issues, Go's x509 library has an opinionated parsing and structuring mechanism for certificate Subjects. Issuers created within Vault are fine, but when using externally created CA certificates, these may not be parsed correctly throughout all parts of the PKI. In particular, CRLs embed a (modified) copy of the issuer name. This can be avoided by using OCSP to track revocation, but note that performance characteristics are different between OCSP and CRLs.
Note: As of Go 1.20 and Vault 1.13, Go correctly formats the CRL's issuer name and this notice does not apply.
Automate leaf certificate renewal
To manage certificates for services at scale, it is best to automate the certificate renewal as much as possible. Vault Agent has support for automatically renewing requested certificates based on the validTo field. Other solutions might involve using cert-manager in Kubernetes or OpenShift, backed by the Vault CA.
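For custom automation, a renewal decision can be derived from the certificate's validity window. The sketch below renews once a fixed fraction of the lifetime has elapsed; the two-thirds default is an assumption chosen for illustration and is not a description of how Vault Agent or cert-manager schedule renewals.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before, not_after, fraction=2 / 3):
    """Point in the validity window at which renewal should start."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return not_before + (not_after - not_before) * fraction


def should_renew(now, not_before, not_after, fraction=2 / 3):
    """True once the renewal point has been reached."""
    return now >= renewal_time(not_before, not_after, fraction)


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

print(renewal_time(issued, expires))
print(should_renew(issued + timedelta(days=61), issued, expires))
```

Scaling the renewal point with the lifetime keeps the same logic usable for both 90-day and much shorter-lived certificates.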
Safe minimums
Since its inception, this secrets engine has enforced SHA256 for signature hashes rather than SHA1. As of 0.5.1, a minimum of 2048 bits for RSA keys is also enforced. Software that can handle SHA256 signatures should also be able to handle 2048-bit keys, and 1024-bit keys are considered unsafe and are disallowed in the Internet PKI.
Token lifetimes and revocation
When a token expires, it revokes all leases associated with it. This means that long-lived CA certs need correspondingly long-lived tokens, something that is easy to forget. Starting with 0.6, root and intermediate CA certs no longer have associated leases, to prevent unintended revocation when not using a token with a long enough lifetime. To revoke these certificates, use the pki/revoke endpoint.
Safe usage of roles
The Vault PKI Secrets Engine supports many options to limit issuance via Roles.
Roles must be constructed carefully so that they grant no more permissions than necessary. Additionally, a role should generally do one thing; several narrow roles are preferable to a single overly permissive role that allows arbitrary issuance (e.g., allow_any_name should generally be used sparingly, if at all).
- allow_any_name should generally be set to false; this is the default.
- allow_localhost should generally be set to false for production services, unless listening on localhost is expected.
- Unless necessary, allow_wildcard_certificates should generally be set to false. This is not the default due to backwards compatibility concerns.
  - This is especially necessary when allow_subdomains or allow_glob_domains are enabled.
- enforce_hostnames should generally be enabled for TLS services; this is the default.
- allow_ip_sans should generally be set to false (but defaults to true), unless IP address certificates are explicitly required.
- When using short TTLs (< 30 days) or with high issuance volume, it is generally recommended to set no_store to true (defaults to false). This prevents serial number based revocation, but allows higher throughput as Vault no longer needs to store every issued certificate. This is discussed more in the Replicated Datasets section below.
- Do not use roles with root certificates (issuer_ref). Root certificates should generally only issue intermediates (see the section on CA hierarchy above), which doesn't rely on roles.
- Limit key_usage and ext_key_usage; don't attempt to allow all usages for all purposes. Generally the default values are useful for client and server TLS authentication.
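The boolean recommendations above can be checked mechanically against a role definition. The sketch below is a hypothetical helper, not a Vault feature: it assumes the role has already been read into a plain dictionary keyed by the parameter names used above, and it only reports fields that are present and differ from the recommended value.

```python
# Recommended values taken from the guidance above; the helper is hypothetical.
RECOMMENDED = {
    "allow_any_name": False,
    "allow_localhost": False,
    "allow_wildcard_certificates": False,
    "enforce_hostnames": True,
    "allow_ip_sans": False,
}


def review_role(role: dict) -> list[str]:
    """Return the names of role fields that differ from the recommendations."""
    findings = []
    for field, recommended in RECOMMENDED.items():
        # Absent fields are skipped rather than guessed at.
        if field in role and role[field] != recommended:
            findings.append(field)
    return findings


example_role = {
    "allow_any_name": False,
    "allow_localhost": True,
    "allow_wildcard_certificates": True,
    "enforce_hostnames": True,
    "allow_ip_sans": False,
}
print(review_role(example_role))
```

A finding is a prompt for review, not an error: some services legitimately need, for example, IP SANs or localhost names.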
Telemetry
Beyond Vault's default telemetry around request processing, PKI exposes count and
duration metrics for the issue, sign, sign-verbatim, and revoke calls. The
metrics keys take the form mount-path,operation,[failure]
with labels for
namespace and role name.
Note that these metrics are per-node and thus would need to be aggregated across nodes and clusters.
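Since the metrics are emitted per node, a fleet-wide view requires summing them. The following is a minimal sketch of that aggregation under stated assumptions: the per-node counts are assumed to have already been collected into dictionaries, and the tuple keys (mount path, operation, optional failure marker) are an illustrative stand-in for the real metric names.

```python
from collections import Counter


def aggregate(per_node_counts):
    """Sum per-node counters into one fleet-wide counter."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)  # Counter.update adds counts, it does not overwrite
    return total


# Hypothetical samples from two nodes.
node_a = {("pki", "issue"): 120, ("pki", "revoke"): 3}
node_b = {("pki", "issue"): 80, ("pki", "issue", "failure"): 2}

totals = aggregate([node_a, node_b])
print(totals[("pki", "issue")])
```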
Auditing
Because Vault HMACs audit string keys by default, it is necessary to tune PKI secrets mounts to get an accurate view of issuance that is occurring under this mount.
Note: Depending on usage of Vault, CRLs (and rarely, CA chains) can grow to be rather large. We don't recommend un-HMACing the crl field for this reason, but note that the recommendations below suggest un-HMACing the certificate response parameter, which can also carry the CRL when it is served via the /pki/cert/crl API endpoint. Additionally, the http_raw_body field can be used to return the CRL in both PEM and raw binary DER form, so we suggest not un-HMACing that field, to avoid corrupting the log format.
If this is done with only a syslog audit device, Vault can deny requests (with an opaque 500 Internal Error message) after the action has been performed on the server, because it was unable to log the message.
The suggested workaround is to leave the certificate and crl response fields HMACed and/or to also enable the file audit log type.
Some suggested keys to un-HMAC for requests are as follows:
- csr - the requested CSR to sign,
- certificate - the requested self-signed certificate to re-sign or when importing issuers,
- Various issuance-related overriding parameters, such as:
  - issuer_ref - the issuer requested to sign this certificate,
  - common_name - the requested common name,
  - alt_names - alternative requested DNS-type SANs for this certificate,
  - other_sans - other (non-DNS, non-Email, non-IP, non-URI) requested SANs for this certificate,
  - ip_sans - requested IP-type SANs for this certificate,
  - uri_sans - requested URI-type SANs for this certificate,
  - ttl - requested time to live of this certificate,
  - not_after - requested expiration date of this certificate,
  - serial_number - the subject's requested serial number,
  - key_type - the requested key type,
  - private_key_format - the requested key format, which is also used for the public certificate format,
- Various role- or issuer-related generation parameters, such as:
  - managed_key_name - when creating an issuer, the requested managed key name,
  - managed_key_id - when creating an issuer, the requested managed key identifier,
  - ou - the subject's organizational unit,
  - organization - the subject's organization,
  - country - the subject's country code,
  - locality - the subject's locality,
  - province - the subject's province,
  - street_address - the subject's street address,
  - postal_code - the subject's postal code,
  - permitted_dns_domains - permitted DNS domains,
  - policy_identifiers - the requested policy identifiers when creating a role, and
  - ext_key_usage_oids - the extended key usage OIDs for the requested certificate.
Some suggested keys to un-HMAC for responses are as follows:
- certificate - the certificate that was issued,
- issuing_ca - the certificate of the CA which issued the requested certificate,
- serial_number - the serial number of the certificate that was issued,
- error - to show errors associated with the request, and
- ca_chain - optional due to noise; the full CA chain of the issuer of the requested certificate.
Note: This list of parameters to un-HMAC is provided as a suggestion and may not be exhaustive.
The following keys are suggested NOT to un-HMAC, due to their sensitive nature:
- private_key - this response parameter contains the private keys generated by Vault during issuance, and
- pem_bundle - this request parameter is only used on the issuer-import paths and may contain sensitive private key material.
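The key lists above are applied by tuning the mount. As a hedged sketch, the helper below assembles a `vault secrets tune` invocation using the CLI's `-audit-non-hmac-request-keys` and `-audit-non-hmac-response-keys` flags, and refuses the two sensitive keys named above. The helper name and the choice of keys in the example are illustrative, the mount path `pki` is an assumption, and the command is only built and printed, not executed.

```python
import shlex

# Keys the guidance above says should stay HMACed.
SENSITIVE = {"private_key", "pem_bundle"}


def tune_command(mount, request_keys, response_keys):
    """Build the argv for tuning audit non-HMAC keys on a mount."""
    blocked = SENSITIVE & (set(request_keys) | set(response_keys))
    if blocked:
        raise ValueError(f"refusing to un-HMAC sensitive keys: {sorted(blocked)}")
    argv = ["vault", "secrets", "tune"]
    argv += [f"-audit-non-hmac-request-keys={key}" for key in request_keys]
    argv += [f"-audit-non-hmac-response-keys={key}" for key in response_keys]
    argv.append(mount)
    return argv


command = tune_command(
    "pki",
    request_keys=["csr", "common_name", "issuer_ref"],
    response_keys=["certificate", "issuing_ca", "serial_number", "error"],
)
print(shlex.join(command))
```

Review the printed command before running it, and extend the key lists to match the parameters your deployment actually uses.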
Role-Based access
Vault supports path-based ACL Policies for limiting access to various paths within Vault.
The following is a condensed reference example of ACLs for the PKI Secrets Engine. It is only a suggestion; other personas and policy approaches may also be valid.
We suggest the following personas:
- Operator; a privileged user who manages the health of the PKI subsystem; manages issuers and key material.
- Agent; a semi-privileged user that manages roles and handles revocation on behalf of an operator; may also handle delegated issuance. This may also be called an administrator or role manager.
- Advanced; potentially a power-user or service that has access to additional issuance APIs.
- Requester; a low-level user or service that simply requests certificates.
- Unauthed; any arbitrary user or service that lacks a Vault token.
For these personas, we suggest the following ACLs, in condensed, tabular form:
Path | Operations | Operator | Agent | Advanced | Requester | Unauthed |
---|---|---|---|---|---|---|
/ca(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/ca_chain | Read | Yes | Yes | Yes | Yes | Yes |
/crl(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/crl/delta(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/cert/:serial(/raw(/pem)?)? | Read | Yes | Yes | Yes | Yes | Yes |
/issuers | List | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/(json¦der¦pem) | Read | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/crl(/der¦/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/crl/delta(/der¦/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/ocsp/<request> | Read | Yes | Yes | Yes | Yes | Yes |
/ocsp | Write | Yes | Yes | Yes | Yes | Yes |
/certs | List | Yes | Yes | Yes | Yes | |
/revoke-with-key | Write | Yes | Yes | Yes | Yes | |
/roles | List | Yes | Yes | Yes | Yes | |
/roles/:role | Read | Yes | Yes | Yes | Yes | |
/(issue¦sign)/:role | Write | Yes | Yes | Yes | Yes | |
/issuer/:issuer_ref/(issue¦sign)/:role | Write | Yes | Yes | Yes | | |
/config/auto-tidy | Read | Yes | Yes | | | |
/config/ca | Read | Yes | Yes | | | |
/config/crl | Read | Yes | Yes | | | |
/config/issuers | Read | Yes | Yes | | | |
/crl/rotate | Read | Yes | Yes | | | |
/crl/rotate-delta | Read | Yes | Yes | | | |
/roles/:role | Write | Yes | Yes | | | |
/issuer/:issuer_ref | Read | Yes | Yes | | | |
/sign-verbatim(/:role)? | Write | Yes | Yes | | | |
/issuer/:issuer_ref/sign-verbatim(/:role)? | Write | Yes | Yes | | | |
/revoke | Write | Yes | Yes | | | |
/tidy | Write | Yes | Yes | | | |
/tidy-cancel | Write | Yes | Yes | | | |
/tidy-status | Read | Yes | Yes | | | |
/config/auto-tidy | Write | Yes | | | | |
/config/ca | Write | Yes | | | | |
/config/crl | Write | Yes | | | | |
/config/issuers | Write | Yes | | | | |
/config/keys | Read, Write | Yes | | | | |
/config/urls | Read, Write | Yes | | | | |
/issuer/:issuer_ref | Write | Yes | | | | |
/issuer/:issuer_ref/revoke | Write | Yes | | | | |
/issuer/:issuer_ref/sign-intermediate | Write | Yes | | | | |
/issuer/:issuer_ref/sign-self-issued | Write | Yes | | | | |
/issuers/generate/+/+ | Write | Yes | | | | |
/issuers/import/+ | Write | Yes | | | | |
/intermediate/generate/+ | Write | Yes | | | | |
/intermediate/cross-sign | Write | Yes | | | | |
/intermediate/set-signed | Write | Yes | | | | |
/keys | List | Yes | | | | |
/key/:key_ref | Read, Write | Yes | | | | |
/keys/generate/+ | Write | Yes | | | | |
/keys/import | Write | Yes | | | | |
/root/generate/+ | Write | Yes | | | | |
/root/sign-intermediate | Write | Yes | | | | |
/root/sign-self-issued | Write | Yes | | | | |
/root/rotate/+ | Write | Yes | | | | |
/root/replace | Write | Yes | | | | |
Note: With managed keys, operators might need access to read the mount
point's tunable data (Read on /sys/mounts
) and
may need access to use or manage managed keys.
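To show how rows of the table translate into an ACL policy, the sketch below renders policy stanzas for the Requester persona's authenticated paths. It rests on several assumptions that should be checked against your deployment: the mount path is `pki`, the table's Read, List, and Write operations are mapped to the `read`, `list`, and `update` capabilities (some endpoints may also need `create`), and `:role` is expressed with the `+` single-segment wildcard. The `render_policy` helper is hypothetical.

```python
# Authenticated rows where the Requester column says "Yes".
REQUESTER_ROWS = [
    ("certs", "List"),
    ("revoke-with-key", "Write"),
    ("roles", "List"),
    ("roles/+", "Read"),
    ("issue/+", "Write"),
    ("sign/+", "Write"),
]

# Assumed mapping from the table's operations to policy capabilities.
CAPABILITIES = {"Read": "read", "List": "list", "Write": "update"}


def render_policy(mount, rows):
    """Render one policy stanza per (path, operation) row."""
    stanzas = []
    for path, operation in rows:
        stanzas.append(
            f'path "{mount}/{path}" {{\n'
            f'  capabilities = ["{CAPABILITIES[operation]}"]\n'
            f"}}"
        )
    return "\n\n".join(stanzas)


policy = render_policy("pki", REQUESTER_ROWS)
print(policy)
```

The paths readable by the Unauthed persona are omitted because they do not require a token.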
Replicated DataSets
When operating with Performance Secondary clusters, certain data sets are maintained across all clusters, while others are kept within a given cluster for performance and scalability reasons.
The following table breaks down by data type what data sets will cross the cluster boundaries. For data-types that do not cross a cluster boundary, read requests for that data will need to be sent to the appropriate cluster that the data was generated on.
Data Set | Replicated Across Clusters |
---|---|
Issuers & Keys | Yes |
Roles | Yes |
CRL Config | Yes |
URL Config | Yes |
Issuer Config | Yes |
Key Config | Yes |
CRL | No |
Revoked Certificates | No |
Leaf/Issued Certificates | No |
Certificate Metadata | No |
The main effect is that, within the PKI secrets engine, leaf certificates issued with no_store set to false are stored local to the cluster that issued them. This allows the active node of both the primary and each Performance Secondary cluster to issue certificates, for greater scalability. As a result, these certificates, their metadata, and any revocations are visible only on the issuing cluster. This additionally means each cluster has its own set of CRLs, distinct from other clusters. These CRLs should either be unified into a single CRL for distribution from a single URI, or server operators should know to fetch all CRLs from all clusters.
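The consequence for a relying party is that a certificate counts as revoked if it appears on any cluster's CRL. The sketch below models only that set logic, assuming the revoked serial numbers from each cluster's CRL have already been parsed into Python sets; the cluster names and serial numbers are made up, and producing a real unified CRL additionally requires it to be signed by the issuer.

```python
def unified_revocations(revoked_by_cluster):
    """Merge the revoked serials reported by every cluster."""
    merged = set()
    for revoked in revoked_by_cluster.values():
        merged |= revoked
    return merged


def is_revoked(serial, revoked_by_cluster):
    """A certificate is revoked if any cluster's CRL lists its serial."""
    return any(serial in revoked for revoked in revoked_by_cluster.values())


# Hypothetical per-cluster CRL contents.
revoked_by_cluster = {
    "primary": {"0a:01", "0a:02"},
    "secondary-eu": {"0b:07"},
}

print(sorted(unified_revocations(revoked_by_cluster)))
print(is_revoked("0b:07", revoked_by_cluster))
```

Checking only the primary's CRL in this example would miss the certificate revoked on the secondary.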
Cluster scalability
Most non-introspection operations in the PKI secrets engine require a write to storage, and so are forwarded to the cluster's active node for execution. This table outlines which operations can be executed on performance standby nodes and thus scale horizontally across all nodes within a cluster.
Path | Operations |
---|---|
ca[/pem] | Read |
cert/serial-number | Read |
cert/ca_chain | Read |
config/crl | Read |
certs | List |
ca_chain | Read |
crl[/pem] | Read |
issue | Update * |
revoke/serial-number | Read |
sign | Update * |
sign-verbatim | Update * |
* Only if the corresponding role has no_store set to true, generate_lease set to false, and no metadata is being written. If generate_lease is true, the lease creation will be forwarded to the active node; if no_store is false, the entire request will be forwarded to the active node. If no_store_cert_metadata=false and the metadata argument is provided, the entire request will be forwarded to the active node.
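The forwarding rules in the footnote can be written as a small decision function. This is a sketch of the rules as stated above, not Vault's implementation; the function name and return strings are illustrative.

```python
def issuance_routing(no_store, generate_lease, no_store_cert_metadata, metadata_provided):
    """Describe where an issue/sign/sign-verbatim request is handled."""
    # Metadata is written only when it is supplied and storing it is not disabled.
    writes_metadata = metadata_provided and not no_store_cert_metadata
    if not no_store or writes_metadata:
        return "entire request forwarded to active node"
    if generate_lease:
        return "lease creation forwarded to active node"
    return "handled on performance standby"


print(issuance_routing(no_store=True, generate_lease=False,
                       no_store_cert_metadata=True, metadata_provided=False))
```

Only the last outcome scales horizontally across the nodes of a cluster.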
PSS support
Go lacks support for PSS certificates, keys, and CSRs using the rsaPSS
OID
(1.2.840.113549.1.1.10
). It requires all RSA certificates, keys, and CSRs
to use the alternative rsaEncryption
OID (1.2.840.113549.1.1.1
).
When using OpenSSL to generate CAs or CSRs from PKCS8-encoded PSS keys, the
resulting CAs and CSRs will have the rsaPSS
OID. Go and Vault will reject
them. Instead, use OpenSSL to generate or convert to a PKCS#1v1.5 private
key file and use this to generate the CSR. Vault will, depending on the role
and the signing mechanism, still use a PSS signature despite the
rsaEncryption
OID on the request as the SubjectPublicKeyInfo and
SignatureAlgorithm fields are orthogonal. When creating an external CA and
importing it into Vault, ensure that the rsaEncryption
OID is present on
the SubjectPublicKeyInfo field even if the SignatureAlgorithm is PSS-based.
These certificates generated by Go (with rsaEncryption
OID but PSS-based
signatures) are otherwise compatible with the fully PSS-based certificates.
OpenSSL and NSS support parsing and verifying chains using this type of
certificate. Note that some TLS implementations may not support these types
of certificates if they do not support rsa_pss_rsae_*
signature schemes.
Additionally, some implementations allow rsaPSS OID certificates to contain
restrictions on signature parameters allowed by this certificate, but Go and
Vault do not support adding such restrictions.
At this time Go lacks support for signing CSRs with the PSS signature algorithm. If using a managed key that requires an RSA PSS algorithm (such as GCP or a PKCS#11 HSM) as a backing for an intermediate CA key, attempting to generate a CSR (via pki/intermediate/generate/kms) will fail signature verification. In this case, the CSR will need to be generated outside of Vault and the signed final certificate can be imported into the mount.
Go additionally lacks support for creating OCSP responses with the PSS signature algorithm. Vault will automatically downgrade issuers with PSS-based revocation signature algorithms to PKCS#1v1.5, but note that certain KMS devices (like HSMs and GCP) may not support this with the same key. As a result, the OCSP responder may fail to sign responses, returning an internal error.
Issuer storage migration issues
When Vault migrates to the new multi-issuer storage layout on releases prior
to 1.11.6, 1.12.2, and 1.13, and storage write errors occur during the mount
initialization and storage migration process, the default issuer may not
have the correct ca_chain
value and may only have the self-reference. These
write errors most commonly manifest in logs as a message like
failed to persist issuer ... chain to disk: <cause>
and indicate that Vault
was not stable at the time of migration. Note that this only occurs when more
than one issuer exists within the mount (such as an intermediate with root).
To fix this manually (until a new version of Vault automatically rebuilds the issuer chain), a rebuild of the chains can be performed:
curl -X PATCH -H "Content-Type: application/merge-patch+json" -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d '{"manual_chain":"self"}' https://.../issuer/default
curl -X PATCH -H "Content-Type: application/merge-patch+json" -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d '{"manual_chain":""}' https://.../issuer/default
This temporarily sets the manual chain on the default issuer to a self-chain
only, before reverting it back to automatic chain building. This triggers a
refresh of the ca_chain
field on the issuer, and can be verified with:
vault read pki/issuer/default
Tutorial
Refer to the Build Your Own Certificate Authority (CA) guide for a step-by-step tutorial.
Have a look at the PKI Secrets Engine with Managed Keys for more about how to use externally managed keys with PKI.
API
The PKI secrets engine has a full HTTP API. Please see the PKI secrets engine API for more details.