Original Project 24
Overview
In this project I provisioned a production-grade Elastic Kubernetes Service (EKS) cluster on AWS using Terraform. Rather than clicking through the AWS console, the goal was to define the entire infrastructure as code (networking, state management, and the EKS cluster itself) so that it is repeatable, version-controlled, and easy to tear down and rebuild.
This lab is self-contained with one exception: the remote state backend pattern (S3 bucket + DynamoDB lock table) is borrowed from Project 18, where that pattern was first introduced. All networking, EKS configuration, and variable management is built fresh here.
The project is split into two parts:
- Part 1 - Bootstrap remote state storage (S3 + DynamoDB) and configure the VPC/networking layer.
- Part 2 - Define and deploy the EKS cluster using the official Terraform EKS module.
Prerequisites
- AWS CLI installed and configured with appropriate IAM permissions
- Terraform installed (v1.x recommended)
- An AWS account with access to create EKS, VPC, S3, and DynamoDB resources
- Familiarity with Terraform remote state. The S3 bucket and DynamoDB table naming conventions used here (hector-dev-terraform-bucket, terraform-locks) follow the pattern established in Project 18 (Automate Infrastructure with IaC Using Terraform). That is the only carry-over from prior projects; everything else in this lab is built from scratch.
- General familiarity with Kubernetes concepts (pods, nodes, clusters) is helpful, but this project does not depend on any prior Kubernetes hands-on work.
Part 1 - Remote State Backend & Networking
Why Use a Remote Backend?
By default, Terraform stores state locally in a terraform.tfstate file. This becomes a problem in team environments or when working across multiple machines: there is no locking mechanism to prevent two applies from running simultaneously, and state can easily go out of sync.
The solution is to store state remotely in S3 with state locking provided by DynamoDB. Before the backend can be configured in Terraform, the S3 bucket and DynamoDB table must already exist. This creates a bootstrapping challenge: Terraform needs to create resources that Terraform itself will later depend on.
The approach is to run Terraform twice:
- First, with the backend configuration excluded, provision the S3 bucket and DynamoDB table using local state.
- Then, enable the backend configuration and re-run terraform init to migrate state to S3.
Step 1 - Bootstrap the S3 Bucket and DynamoDB Table
A new working directory called eks/ was created. The backend configuration was placed in a file named backend.tfX so it would not interfere during the first apply. Terraform only loads files ending in .tf or .tf.json, so the .tfX extension causes the file to be ignored.
Initial directory structure:
hector@hector-Laptop:~/Project24/eks$ tree
.
├── backend.tfX   ← temporarily excluded from Terraform
├── main.tf
└── providers.tf
0 directories, 3 files
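The tree lists providers.tf, but its contents are not shown in this write-up. A minimal sketch consistent with the provider constraint reported by terraform init (~> 4.0) might look like the following. The region is an assumption, taken from the backend configuration used later in this project.

```hcl
# providers.tf (sketch; the region is assumed, not confirmed by the original)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # matches the constraint shown in the init output
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumption: same region as the S3 backend
}
```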
Running terraform init at this point initializes the project with a local backend and downloads the AWS provider:
hector@hector-Laptop:~/Project24/eks$ terraform init
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 4.0"...
- Installing hashicorp/aws v4.26.0...
- Installed hashicorp/aws v4.26.0 (signed by HashiCorp)
Terraform has been successfully initialized!
Running terraform apply provisions the four required resources (S3 bucket, bucket versioning, server-side encryption, and the DynamoDB table):
Plan: 4 to add, 0 to change, 0 to destroy.
Enter a value: yes
aws_dynamodb_table.terraform_locks: Creating...
aws_s3_bucket.terraform-state: Creating...
aws_s3_bucket.terraform-state: Creation complete after 2s [id=hector-dev-terraform-bucket]
aws_s3_bucket_versioning.version: Creating...
aws_s3_bucket_server_side_encryption_configuration.first: Creating...
aws_s3_bucket_server_side_encryption_configuration.first: Creation complete after 0s [id=hector-dev-terraform-bucket]
aws_s3_bucket_versioning.version: Creation complete after 1s [id=hector-dev-terraform-bucket]
aws_dynamodb_table.terraform_locks: Still creating... [10s elapsed]
aws_dynamodb_table.terraform_locks: Creation complete after 14s [id=terraform-locks]
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
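The main.tf that produced this output is not reproduced here. A sketch of what it could contain, using the resource addresses and IDs visible in the apply log, is shown below. The encryption algorithm and the DynamoDB billing mode are assumptions; the LockID hash key is what the S3 backend requires for state locking.

```hcl
# main.tf (sketch; resource names are taken from the apply log above)
resource "aws_s3_bucket" "terraform-state" {
  bucket = "hector-dev-terraform-bucket"
}

# Keep every revision of the state file so earlier states can be recovered.
resource "aws_s3_bucket_versioning" "version" {
  bucket = aws_s3_bucket.terraform-state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest. AES256 is an assumption; a KMS key would also work.
resource "aws_s3_bucket_server_side_encryption_configuration" "first" {
  bucket = aws_s3_bucket.terraform-state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Lock table for the S3 backend. The hash key must be named LockID.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # assumption: on-demand billing
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```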
Step 2 - Enable the S3 Backend
With the S3 bucket and DynamoDB table in place, the backend configuration was activated by renaming backend.tfX back to backend.tf:
# backend.tf
terraform {
  backend "s3" {
    bucket         = "hector-dev-terraform-bucket"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
hector@hector-Laptop:~/Project24/eks$ mv backend.tfX backend.tf
Running terraform init again detects the new backend and offers to migrate the existing local state to S3:
hector@hector-Laptop:~/Project24/eks$ terraform init
Initializing the backend...
Do you want to copy existing state to the new backend?
Pre-existing state was found while migrating the previous "local" backend to the
newly configured "s3" backend. No existing state was found in the newly
configured "s3" backend. Do you want to copy this state to the new "s3"
backend? Enter "yes" to copy and "no" to start with an empty state.
Enter a value: yes
Releasing state lock. This may take a few moments...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Terraform has been successfully initialized!
State is now stored remotely and all subsequent applies benefit from S3 versioning and DynamoDB locking.
Troubleshooting - Backend Initialization Required Error
After renaming backend.tfX to backend.tf, running terraform plan before re-running terraform init produces the following error:
│ Error: Backend initialization required, please run "terraform init"
│
│ Reason: Initial configuration of the requested backend "s3"
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags to
│ use the current configuration.
Resolution: Any time the backend block is added or changed, terraform init must be re-run before any other Terraform commands. It is not optional even if everything else is unchanged.
Step 3 - Network Configuration
A network.tf file was created to define the VPC and subnets. This includes:
- A VPC with a configurable CIDR block
- Private and public subnets spread across availability zones
- An Elastic IP and NAT Gateway so private subnet workloads can reach the internet
This networking layer is a prerequisite for the EKS cluster: nodes in private subnets need outbound internet access to pull container images and communicate with AWS services.
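The contents of network.tf are not shown in this write-up. The following is a sketch of how the pieces listed above could be wired together with the community VPC module. The module name (vpc) and version (3.14.2) come from the terraform init output in Part 2; the Elastic IP resource name, the index-based subnet numbering, and the subnet tags are assumptions for illustration.

```hcl
# network.tf (sketch; not the author's actual file)

# Elastic IP reserved for the NAT Gateway (resource name is hypothetical).
resource "aws_eip" "nat_gw_elastic_ip" {
  vpc = true # valid for AWS provider 4.x
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.2"

  name = "${var.name_prefix}-vpc"
  cidr = var.main_network_block
  azs  = data.aws_availability_zones.available_azs.names

  # Assumed numbering scheme: with 10.0.0.0/16 and 4 extra prefix bits,
  # each subnet is a /20. Private subnets start at index 0 (10.0.0.0/20),
  # public subnets start at index zone_offset (8 gives 10.0.128.0/20).
  private_subnets = [
    for i, az in data.aws_availability_zones.available_azs.names :
    cidrsubnet(var.main_network_block, var.subnet_prefix_extension, i)
  ]
  public_subnets = [
    for i, az in data.aws_availability_zones.available_azs.names :
    cidrsubnet(var.main_network_block, var.subnet_prefix_extension, i + var.zone_offset)
  ]

  # One shared NAT Gateway using the pre-allocated Elastic IP.
  enable_nat_gateway   = true
  single_nat_gateway   = true
  reuse_nat_ips        = true
  external_nat_ip_ids  = [aws_eip.nat_gw_elastic_ip.id]
  enable_dns_hostnames = true

  # Tags that let Kubernetes discover subnets for load balancers.
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}
```

A single NAT Gateway keeps cost down in a development environment; a production setup would more likely use one per availability zone.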
Part 2 - EKS Cluster Provisioning
Configuration Files
The following files were created to define the EKS cluster:
| File | Purpose |
|---|---|
| eks.tf | Calls the eks module to provision the cluster and managed node groups |
| locals.tf | Defines reusable local values shared across configurations |
| variables.tf | Declares all input variables (cluster name, instance types, scaling limits, etc.) |
| terraform.tfvars | Supplies concrete values for all declared variables (auto-loaded by Terraform) |
| data.tf | Data sources that query the available AZs and the current AWS account identity |
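The table lists eks.tf without showing its contents. A minimal sketch of such a file is given below, assuming the module name (eks_cluster) and version (18.27.1) reported by terraform init later in this part. The node group key (default) and the way the per-AZ scaling variables are multiplied by the AZ count are assumptions, not the original configuration.

```hcl
# eks.tf (sketch; not the author's actual file)
module "eks_cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.27.1"

  cluster_name = var.cluster_name

  # Place the cluster and its nodes in the private subnets of the VPC module.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    # "default" is a hypothetical node group name.
    default = {
      instance_types = var.asg_instance_types

      # Assumption: the *_by_az variables are per-AZ counts, so scale them
      # by the number of availability zones in use.
      min_size     = var.autoscaling_minimum_size_by_az * length(data.aws_availability_zones.available_azs.names)
      max_size     = var.autoscaling_maximum_size_by_az * length(data.aws_availability_zones.available_azs.names)
      desired_size = var.autoscaling_minimum_size_by_az * length(data.aws_availability_zones.available_azs.names)
    }
  }

  tags = {
    iac_environment = var.iac_environment_tag
  }
}
```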
The additional input variables added to variables.tf cover cluster scaling, instance types, and user access:
# variables.tf (additions)
variable "admin_users" {
  type        = list(string)
  description = "List of Kubernetes admins."
}
variable "developer_users" {
  type        = list(string)
  description = "List of Kubernetes developers."
}
variable "asg_instance_types" {
  type        = list(string)
  description = "List of EC2 instance machine types to be used in EKS."
}
variable "autoscaling_minimum_size_by_az" {
  type        = number
  description = "Minimum number of EC2 instances to autoscale our EKS cluster on each AZ."
}
variable "autoscaling_maximum_size_by_az" {
  type        = number
  description = "Maximum number of EC2 instances to autoscale our EKS cluster on each AZ."
}
variable "autoscaling_average_cpu" {
  type        = number
  description = "Average CPU threshold to autoscale EKS EC2 instances."
}
Variable values were customized for this environment. Note that name_prefix and admin_users were updated from the lab template to reflect personal AWS IAM users:
# terraform.tfvars (customized values)
cluster_name = "tooling-app-eks"
iac_environment_tag = "development"
name_prefix = "hector-eks"
main_network_block = "10.0.0.0/16"
subnet_prefix_extension = 4
zone_offset = 8
admin_users = ["hector", "solomon"]
developer_users = ["leke", "david"]
asg_instance_types = ["t3.small", "t2.small"]
autoscaling_minimum_size_by_az = 1
autoscaling_maximum_size_by_az = 10
autoscaling_average_cpu = 30
Note: The admin_users and developer_users values must correspond to existing IAM users in your AWS account. An alternative approach is to manage users in a dedicated iam.tf file and reference them via data source ARN interpolation.
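To illustrate the ARN interpolation mentioned in the note, the user names can be expanded into IAM user ARNs with the account ID from the aws_caller_identity data source. The local names and the developers group below are hypothetical; the userarn/username/groups shape follows the mapUsers format of the aws-auth ConfigMap.

```hcl
# locals.tf (sketch; local names and the "developers" group are hypothetical)
locals {
  # Full cluster administrators, mapped to the built-in system:masters group.
  admin_user_map = [
    for user in var.admin_users : {
      userarn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/${user}"
      username = user
      groups   = ["system:masters"]
    }
  ]

  # Developers, mapped to a custom group that still needs RBAC bindings.
  developer_user_map = [
    for user in var.developer_users : {
      userarn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/${user}"
      username = user
      groups   = ["developers"]
    }
  ]
}
```

Building the ARNs this way only formats strings; it does not verify that the IAM users exist, which is why the names in terraform.tfvars must match real users.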
Step 4 - Initialize Modules and Deploy
Because eks.tf and network.tf reference external Terraform registry modules, terraform init needed to be run again to download them:
hector@hector-Laptop:~/Project24/eks$ terraform init
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/eks/aws 18.27.1 for eks_cluster...
- eks_cluster in .terraform/modules/eks_cluster
- eks_cluster.eks_managed_node_group in .terraform/modules/eks_cluster/modules/eks-managed-node-group
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.14.2 for vpc...
- vpc in .terraform/modules/vpc
Initializing the backend...
Initializing provider plugins...
- Installing hashicorp/cloudinit v2.2.0...
- Installing hashicorp/kubernetes v2.12.1...
- Installing hashicorp/tls v3.4.0...
Terraform has been successfully initialized!
Troubleshooting - Module Not Installed Error
Before running terraform init a second time, attempting terraform plan produced this error:
│ Error: Module not installed
│
│ on eks.tf line 1:
│ 1: module "eks_cluster" {
│
│ This module is not yet installed. Run "terraform init" to install all modules
│ required by this configuration.
│ Error: Module not installed
│
│ on network.tf line 11:
│ 11: module "vpc" {
│
│ This module is not yet installed. Run "terraform init" to install all modules
│ required by this configuration.
Resolution: Whenever new module blocks are added to any .tf file, terraform init must be run to download them before terraform plan or terraform apply will work.
Troubleshooting - Terraform Prompting for Variable Values Interactively
After the first terraform plan, Terraform stopped and asked for every variable to be typed in by hand:
hector@hector-Laptop:~/Project24/eks$ terraform plan
var.admin_users
List of Kubernetes admins.
Enter a value:
var.asg_instance_types
List of EC2 instance machine types to be used in EKS.
Enter a value:
var.cluster_name
EKS cluster name.
Enter a value:
# ... and so on for every variable
Root Cause: Variable values were stored in a file named variables.tfvars. Terraform does not automatically load .tfvars files with custom names; it only auto-loads files named exactly terraform.tfvars (or terraform.tfvars.json) or matching the pattern *.auto.tfvars (or *.auto.tfvars.json).
Resolution: The full contents of variables.tfvars were moved into terraform.tfvars (which Terraform loads automatically), and the original file was disabled by renaming it:
# contents moved from variables.tfvars → terraform.tfvars
cluster_name = "tooling-app-eks"
iac_environment_tag = "development"
name_prefix = "hector-eks"
main_network_block = "10.0.0.0/16"
subnet_prefix_extension = 4
zone_offset = 8
admin_users = ["hector", "solomon"]
developer_users = ["leke", "david"]
asg_instance_types = ["t3.small", "t2.small"]
autoscaling_minimum_size_by_az = 1
autoscaling_maximum_size_by_az = 10
autoscaling_average_cpu = 30
hector@hector-Laptop:~/Project24/eks$ mv variables.tfvars variables.tfvarsX
After that, terraform plan ran cleanly without prompting for any values.
Troubleshooting - EKS Cluster Fails Due to Availability Zone Capacity
When attempting to deploy the cluster, terraform apply failed with the following error:
│ Error: error creating EKS Cluster (tooling-app-eks):
│ UnsupportedAvailabilityZoneException: Cannot create cluster 'tooling-app-eks'
│ because us-east-1e, the targeted availability zone, does not currently have
│ sufficient capacity to support the cluster.
│ Retry and choose from these availability zones:
│ us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
Root Cause: The data "aws_availability_zones" data source in data.tf was returning all AZs including us-east-1e, which had insufficient EKS capacity at the time. The VPC subnets were being created across all returned AZs, and EKS tried to deploy into us-east-1e.
Attempts to fix this in data.tf:
Attempt 1 - skip_names attribute (not a valid argument):
data "aws_availability_zones" "available_azs" {
  state      = "available"
  skip_names = ["us-east-1e"] # Error: Unsupported argument
}
Attempt 2 - names attribute (read-only, cannot be set):
data "aws_availability_zones" "available_azs" {
  state = "available"
  names = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
  # Error: Value for unconfigurable attribute
}
Attempt 3 - filter with name = "names" (not a valid EC2 filter):
filter {
  name   = "names"
  values = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
}
# Error: The filter 'name' is invalid
Resolution: Use the correct EC2 filter key, zone-name, as documented in the AWS EC2 API Reference and the Terraform aws_availability_zones filter block docs:
# data.tf
data "aws_availability_zones" "available_azs" {
  state = "available"
  filter {
    name   = "zone-name"
    values = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
  }
}
data "aws_caller_identity" "current" {} # used for accessing Account ID and ARN
The error message itself was the key clue: it said The filter 'name' is invalid, which indicated that name inside the filter block refers to an EC2 API filter key (like zone-name), not a Terraform argument. Looking up the Terraform filter Configuration Block pointed directly to the AWS API docs listing all valid filter keys.
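As an alternative to listing the allowed zones, the aws_availability_zones data source also accepts an exclude_names argument, which is close to what Attempt 1 was reaching for. This is a sketch of a drop-in replacement for the data source above, not the configuration used in this project; it is shown with the same name, so it would replace that block rather than sit alongside it.

```hcl
# data.tf (alternative sketch; replaces the zone-name filter version)
data "aws_availability_zones" "available_azs" {
  state = "available"

  # Exclude only the zone that EKS rejected, instead of allow-listing the rest.
  exclude_names = ["us-east-1e"]
}
```

Excluding a single zone avoids having to update the list if AWS adds availability zones to the region, while the zone-name filter makes the exact set of zones explicit.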
Summary
| Phase | What Was Accomplished |
|---|---|
| Bootstrap | S3 bucket + DynamoDB table provisioned for remote state |
| Backend Migration | Local state migrated to S3; locking enabled via DynamoDB |
| Networking | VPC, private/public subnets, NAT Gateway defined in network.tf |
| EKS Modules | eks.tf, locals.tf, variables.tf, terraform.tfvars configured |
| Cluster Deploy | EKS cluster and managed node groups successfully provisioned |
The biggest lesson from this project was understanding Terraform's bootstrapping order: you cannot configure a backend in the same apply that creates it. The AZ filtering issue also reinforced that Terraform data source filter blocks map directly to AWS API filter parameters, so the Terraform docs and AWS API docs must be read together to use them correctly.