How to Deploy HashiCorp Vault on AWS EKS with IAM Authentication and KMS Auto-Unseal – Part 1: Infrastructure Setup

In this guide, we'll deploy a production-grade Vault setup on AWS EKS. The Vault server will:
- Use Raft as the storage backend for high availability and simplicity
- Automatically unseal via AWS KMS
- Use IAM-based authentication for secure and flexible access control

We will provision the infrastructure using Terragrunt and OpenTofu (an open-source Terraform fork), but you can adapt this to your preferred tooling.
include "root" {
path = find_in_parent_folders()
}
include "aws" {
path = find_in_parent_folders("aws.hcl")
}
dependency "vpc" {
config_path = "${dirname(find_in_parent_folders("aws.tfvars"))}/vpc"
}
dependency "kms" {
config_path = "${dirname(find_in_parent_folders("aws.tfvars"))}/kms/infra"
}
locals {
version = read_terragrunt_config(find_in_parent_folders("versions.hcl")).locals.terraform.eks
config = jsondecode(read_tfvars_file(find_in_parent_folders("config.tfvars")))
}
terraform {
source = "tfr:///terraform-aws-modules/eks/aws?version=${local.version}"
}
inputs = {
cluster_name = basename(dirname(get_terragrunt_dir()))
cluster_version = "1.31"
cluster_endpoint_private_access = true
cluster_endpoint_public_access = false
create_cloudwatch_log_group = true
cluster_enabled_log_types = ["audit", "api", "authenticator"]
enable_cluster_creator_admin_permissions = false
cluster_addons = {
coredns = {
most_recent = true
configuration_values = jsonencode({
tolerations = [
{
key = "CriticalAddonsOnly"
operator = "Equal"
value = "true"
effect = "NoExecute"
}
]
})
}
eks-pod-identity-agent = {
most_recent = true
}
kube-proxy = {
most_recent = true
}
vpc-cni = {
most_recent = true
configuration_values = jsonencode({
env = {
ENABLE_PREFIX_DELEGATION = "true"
WARM_PREFIX_TARGET = "1"
}
})
}
}
vpc_id = dependency.vpc.outputs.vpc_attributes.id
subnet_ids = [for k, v in dependency.vpc.outputs.private_subnet_attributes_by_az : v.id if startswith(k, "workloads")]
eks_managed_node_groups = {
critical-workloads = {
ami_type = "BOTTLEROCKET_ARM_64"
instance_types = ["r8g.medium"]
bootstrap_extra_args = <<-EOT
[settings.kubernetes]
"max-pods" = 20
EOT
disable_api_termination = true
min_size = 3
max_size = 3
desired_size = 3
update_config = {
max_unavailable_percentage = 50
}
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 2
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
xvdb = {
device_name = "/dev/xvdb"
ebs = {
volume_size = 20
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
}
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
instance_metadata_tags = "disabled"
}
taints = {
addons = {
key = "CriticalAddonsOnly"
value = "true"
effect = "NO_EXECUTE"
},
}
}
vault = {
ami_type = "BOTTLEROCKET_ARM_64"
instance_types = ["r8g.large"]
subnet_ids = [[for k, v in dependency.vpc.outputs.private_subnet_attributes_by_az : v.id if startswith(k, "workloads")][1]]
disable_api_termination = true
min_size = 3
max_size = 3
desired_size = 3
update_config = {
max_unavailable_percentage = 33
}
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 2
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
xvdb = {
device_name = "/dev/xvdb"
ebs = {
volume_size = 20
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
}
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
instance_metadata_tags = "disabled"
}
taints = {
addons = {
key = "Vault"
value = "true"
effect = "NO_EXECUTE"
}
}
}
}
/*
cluster_security_group_additional_rules = {
ingress_service_acc = {
description = "Allow ingress from service account"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = [dependency.service_account_vpc.outputs.vpc_attributes.cidr_block]
type = "ingress"
}
}
*/
node_security_group_enable_recommended_rules = true
node_security_group_additional_rules = {
#Allow communication between nodes
ingress_self_all = {
description = "self"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
},
ingress_otel_all = {
description = "OTEL Ingress"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
cidr_blocks = ["10.0.0.0/8"]
},
egress_self_all = {
description = "self"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
self = true
}
}
create_kms_key = false
cluster_encryption_config = {
provider_key_arn = dependency.kms.outputs.key_arn
resources = ["secrets"]
}
access_entries = {
account-admin = {
principal_arn = "arn:aws:iam:::role/aws-reserved/sso.amazonaws.com/eu-central-1/AWSReservedSSO_AdministratorAccess_7a967032779ca8f1"
type = "STANDARD"
user_name = "admin:{{SessionName}}"
policy_associations = {
admin = {
policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
access_scope = {
type = "cluster"
}
}
}
}
}
}
This Terragrunt stack brings up a new EKS cluster. The key point is the dedicated managed node group for Vault, tainted with Vault=true:NoExecute. The taint ensures Vault pods are scheduled only on these nodes; we will rely on it later when deploying the Vault server.
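Once the nodes have registered, you can confirm the taint is in place. A quick check (assuming the managed node group name is exposed via the standard `eks.amazonaws.com/nodegroup` label):

```shell
# List the nodes in the vault node group together with their taint keys
kubectl get nodes -l eks.amazonaws.com/nodegroup=vault \
  -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```

Each node should report a `Vault` taint; untainted nodes here would mean arbitrary workloads could land next to Vault.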
After the EKS cluster is successfully deployed and the nodes are up and ready, we can move on: we deploy the Vault server in HA mode with the Raft protocol enabled. Raft integrated storage is preferable because it makes it easy to take backups of the Vault cluster.
include "root" {
path = find_in_parent_folders()
}
include "aws" {
path = find_in_parent_folders("aws.hcl")
}
include "helm" {
path = find_in_parent_folders("helm.hcl")
}
dependency "cluster" {
config_path = "${get_original_terragrunt_dir()}/../../../cluster"
}
dependency "kms" {
config_path = "${dirname(find_in_parent_folders("aws.tfvars"))}/kms/vault"
}
dependency "iam_role" {
config_path = "${dirname(find_in_parent_folders("aws.tfvars"))}/../global/iam/role/vault/role"
}
dependency "certs" {
config_path = "${get_original_terragrunt_dir()}/../../cert-manager/certs"
skip_outputs = true
}
terraform {
source = "${get_path_to_repo_root()}//modules/kubernetes/helm"
}
locals {
version = read_terragrunt_config(find_in_parent_folders("versions.hcl")).locals.helm.vault
config = jsondecode(read_tfvars_file(find_in_parent_folders("config.tfvars")))
hostname = "vault.services.internal.paera.com"
}
inputs = {
cluster_name = dependency.cluster.outputs.cluster_name
name = "vault"
chart_name = "vault"
chart_version = local.version
namespace = "vault"
repository = {
url = "https://helm.releases.hashicorp.com"
}
values = {
clusterName = dependency.cluster.outputs.cluster_name
global = {
enabled = true
tlsDisable = false
}
injector = {
enabled = false
}
csi = {
enabled = true
}
server = {
ingress = {
enabled = true
annotations = {
"external-dns.alpha.kubernetes.io/hostname" = local.hostname
"alb.ingress.kubernetes.io/group.name" = "services"
"alb.ingress.kubernetes.io/group.order" = "20"
"alb.ingress.kubernetes.io/ip-address-type" = "ipv4"
"alb.ingress.kubernetes.io/listen-ports" = "[{\"HTTPS\": 443}]"
"alb.ingress.kubernetes.io/load-balancer-name" = "private-services"
"alb.ingress.kubernetes.io/scheme" = "internal"
"alb.ingress.kubernetes.io/ssl-policy" = "ELBSecurityPolicy-TLS13-1-2-Res-2021-06"
"alb.ingress.kubernetes.io/target-type" = "ip"
"alb.ingress.kubernetes.io/backend-protocol" = "HTTPS"
"alb.ingress.kubernetes.io/healthcheck-protocol" = "HTTPS"
}
ingressClassName = "alb"
pathType = "Prefix"
activeService = true
hosts = [
{
host = local.hostname
path = []
}
]
tls = [
{
secretName = "vault-tls"
hosts = [
local.hostname
]
}
]
}
shareProcessNamespace = true
extraContainers = [
{
name = "cert-watcher"
image = ".dkr.ecr.eu-central-1.amazonaws.com/internal-services/vault/vault-cert-reloader:latest"
args = ["/var/run/secrets/vault-tls/tls.crt"]
volumeMounts = [
{
name = "certs"
mountPath = "/var/run/secrets/vault-tls"
readOnly = true
}
]
}
]
extraEnvironmentVars = {
VAULT_CACERT = "/vault/userconfig/tls/ca.crt"
VAULT_TLSCERT = "/vault/userconfig/tls/tls.crt"
VAULT_TLSKEY = "/vault/userconfig/tls/tls.key"
}
volumes = [
{
name = "certs"
secret = {
defaultMode = 420
secretName = "vault-tls"
}
}
]
volumeMounts = [
{
name = "certs"
mountPath = "/vault/userconfig/tls"
}
]
resources = {
requests = {
memory = "8Gi"
cpu = "1000m"
}
limits = {
memory = "15Gi"
cpu = "1700m"
}
}
readinessProbe = {
enabled = true
path = "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
}
livenessProbe = {
enabled = true
path = "/v1/sys/health?standbyok=true"
initialDelaySeconds = 60
}
dataStorage = {
enabled = true
size = "100Gi"
}
auditStorage = {
enabled = true
}
serviceAccount = {
annotations = {
"eks.amazonaws.com/role-arn" = dependency.iam_role.outputs.iam_role_arn
}
}
standalone = {
enabled = false
}
ha = {
enabled = true
replicas = 3
raft = {
enabled = true
setNodeId = true
config = <<-EOC
cluster_name = "paera-vault"
ui = true
listener "tcp" {
tls_disable = false
address = "[::]:8200"
cluster_address = "[::]:8201"
tls_cert_file = "/vault/userconfig/tls/tls.crt"
tls_key_file = "/vault/userconfig/tls/tls.key"
tls_client_ca_file = "/vault/userconfig/tls/ca.crt"
}
storage "raft" {
path = "/vault/data"
retry_join {
leader_api_addr = "https://vault-0.vault-internal:8200"
leader_client_cert_file = "/vault/userconfig/tls/tls.crt"
leader_client_key_file = "/vault/userconfig/tls/tls.key"
leader_ca_cert_file = "/vault/userconfig/tls/ca.crt"
}
retry_join {
leader_api_addr = "https://vault-1.vault-internal:8200"
leader_client_cert_file = "/vault/userconfig/tls/tls.crt"
leader_client_key_file = "/vault/userconfig/tls/tls.key"
leader_ca_cert_file = "/vault/userconfig/tls/ca.crt"
}
retry_join {
leader_api_addr = "https://vault-2.vault-internal:8200"
leader_client_cert_file = "/vault/userconfig/tls/tls.crt"
leader_client_key_file = "/vault/userconfig/tls/tls.key"
leader_ca_cert_file = "/vault/userconfig/tls/ca.crt"
}
autopilot {
server_stabilization_time = "10s"
last_contact_threshold = "10s"
min_quorum = 3
cleanup_dead_servers = false
dead_server_last_contact_threshold = "10m"
max_trailing_logs = 1000
disable_upgrade_migration = false
}
}
service_registration "kubernetes" {}
seal "awskms" {
region = "${local.config.regions.main}"
kms_key_id = "${dependency.kms.outputs.key_arn}"
}
EOC
}
}
tolerations = [
{
key = "Vault"
operator = "Equal"
value = "true"
effect = "NoExecute"
}
]
}
}
}
This code snippet uses the Helm Terraform provider to deploy Vault in HA mode. Note that I'm using a cert-watcher sidecar here, which triggers a Vault reload whenever a new certificate is mounted. You also need cert-manager to generate TLS secrets for your internal endpoints; I won't cover that here, as it would exceed the scope of this article. Also note that we're using a KMS key to auto-unseal the Vault server.
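For auto-unseal to work, the IAM role annotated on the Vault service account must be allowed to use the KMS key referenced in the `seal "awskms"` stanza. A minimal policy sketch (the role and key ARNs come from the `iam_role` and `kms` dependencies above; the resource names here are illustrative):

```hcl
# Minimal KMS permissions required by Vault's awskms seal
data "aws_iam_policy_document" "vault_unseal" {
  statement {
    sid    = "VaultAwsKmsUnseal"
    effect = "Allow"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:DescribeKey",
    ]
    # The key referenced by kms_key_id in the seal "awskms" stanza
    resources = [dependency.kms.outputs.key_arn]
  }
}
```

Without these three actions, the Vault pods will start but remain sealed, logging KMS access-denied errors.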
After you deployed Vault server using Helm you need to initialize the cluster.
kubectl exec -ti vault-0 -n vault -- sh
export VAULT_ADDR='https://vault-0.vault-internal:8200'
vault operator init              # with the awskms seal, prints recovery keys and the initial root token
vault login                      # authenticate with the root token
vault operator raft list-peers   # verify all three nodes joined the Raft cluster
Node Address State Voter
---- ------- ----- -----
vault-0 vault-0.vault-internal:8201 leader true
vault-1 vault-1.vault-internal:8201 follower true
vault-2 vault-2.vault-internal:8201 follower true
kubectl get po -n vault
NAME READY STATUS RESTARTS AGE
vault-0 2/2 Running 0 29d
vault-1 2/2 Running 0 29d
vault-2 2/2 Running 0 29d
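One benefit of Raft integrated storage mentioned earlier is easy backups: a sufficiently privileged client can snapshot the entire cluster state with a single command. A sketch (run from a pod or host that is already logged in to Vault):

```shell
# Take a snapshot of the full Raft state (secrets, policies, mounts)
vault operator raft snapshot save vault-backup.snap

# Restore it later if needed:
# vault operator raft snapshot restore vault-backup.snap
```

Ship these snapshots somewhere durable (e.g. S3) on a schedule; they are the disaster-recovery story for this setup.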
After successfully initializing the cluster, you will receive a root token and recovery keys. These carry very permissive privileges, so store them securely. In the next part of this series I will cover how to set up IAM authentication, so stay tuned.