Original Project 24

Overview

In this project I provisioned a production-grade Amazon Elastic Kubernetes Service (EKS) cluster on AWS using Terraform. Rather than clicking through the AWS console, the goal was to define the entire infrastructure as code (networking, state management, and the EKS cluster itself) so that it is repeatable, version-controlled, and easy to tear down and rebuild.

This lab is self-contained with one exception: the remote state backend pattern (S3 bucket + DynamoDB lock table) is borrowed from Project 18, where that pattern was first introduced. All networking, EKS configuration, and variable management are built fresh here.

The project is split into two parts:

  • Part 1 - Bootstrap remote state storage (S3 + DynamoDB) and configure the VPC/networking layer.
  • Part 2 - Define and deploy the EKS cluster using the official Terraform EKS module.

Prerequisites

  • AWS CLI installed and configured with appropriate IAM permissions
  • Terraform installed (v1.x recommended)
  • An AWS account with access to create EKS, VPC, S3, and DynamoDB resources
  • Familiarity with Terraform remote state. The S3 bucket and DynamoDB table names used here (hector-dev-terraform-bucket, terraform-locks) follow the pattern established in Project 18 (Automate Infrastructure with IaC Using Terraform). That is the only carry-over from prior projects; everything else in this lab is built from scratch.
  • General familiarity with Kubernetes concepts (pods, nodes, clusters) is helpful, but this project does not depend on any prior hands-on Kubernetes work

Part 1 - Remote State Backend & Networking

Why Use a Remote Backend?

By default Terraform stores state locally in a terraform.tfstate file. This becomes a problem in team environments or when working across multiple machines: there is no shared locking mechanism to prevent two applies from running simultaneously, and separate copies of the state can easily drift out of sync.

The solution is to store state remotely in S3 with state locking provided by DynamoDB. Before the backend can be configured in Terraform, the S3 bucket and DynamoDB table must already exist. This creates a bootstrapping challenge: Terraform needs to create resources that Terraform itself will later depend on.

The approach is to run Terraform twice:

  1. First, with the backend configuration excluded, provision the S3 bucket and DynamoDB table using local state.
  2. Then, enable the backend configuration and re-run terraform init to migrate state to S3.

Step 1 - Bootstrap the S3 Bucket and DynamoDB Table

A new working directory called eks/ was created. The backend configuration was placed in a file named backend.tfX so it would not interfere during the first apply: Terraform only loads files ending in .tf or .tf.json, so the .tfX extension makes it ignore the file.

Initial directory structure:

hector@hector-Laptop:~/Project24/eks$ tree
.
├── backend.tfX  ← temporarily excluded from Terraform
├── main.tf
└── providers.tf

0 directories, 3 files

Running terraform init at this point initializes the project with a local backend and downloads the AWS provider:

hector@hector-Laptop:~/Project24/eks$ terraform init
 
Initializing the backend...
 
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 4.0"...
- Installing hashicorp/aws v4.26.0...
- Installed hashicorp/aws v4.26.0 (signed by HashiCorp)
 
Terraform has been successfully initialized!
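The contents of providers.tf are not shown in this write-up. A minimal sketch consistent with the init output above (the "~> 4.0" constraint on hashicorp/aws) might look like the following. The us-east-1 region is assumed from the backend block used later, and the required_version constraint is an assumption based on the "v1.x recommended" prerequisite.

```hcl
# providers.tf (sketch, not the original file)
terraform {
  # Assumption: any Terraform 1.x release, per the prerequisites.
  required_version = "~> 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # matches the constraint reported by terraform init
    }
  }
}

# Assumption: same region as the S3 state bucket.
provider "aws" {
  region = "us-east-1"
}
```

Terraform allows this terraform block to coexist with the separate one in backend.tf, so the backend can be switched on and off without touching the provider setup.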

Running terraform apply provisions the four required resources (the S3 bucket, its versioning configuration, its server-side encryption configuration, and the DynamoDB table):

Plan: 4 to add, 0 to change, 0 to destroy.

  Enter a value: yes

aws_dynamodb_table.terraform_locks: Creating...
aws_s3_bucket.terraform-state: Creating...
aws_s3_bucket.terraform-state: Creation complete after 2s [id=hector-dev-terraform-bucket]
aws_s3_bucket_versioning.version: Creating...
aws_s3_bucket_server_side_encryption_configuration.first: Creating...
aws_s3_bucket_server_side_encryption_configuration.first: Creation complete after 0s [id=hector-dev-terraform-bucket]
aws_s3_bucket_versioning.version: Creation complete after 1s [id=hector-dev-terraform-bucket]
aws_dynamodb_table.terraform_locks: Still creating... [10s elapsed]
aws_dynamodb_table.terraform_locks: Creation complete after 14s [id=terraform-locks]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
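The main.tf behind this apply is not reproduced in the original write-up. The sketch below is one way to write it, using the four resource addresses from the apply output; AES256 encryption and on-demand (PAY_PER_REQUEST) billing are assumptions. The one hard requirement is the lock table's key: the S3 backend expects a string partition key named LockID.

```hcl
# main.tf (sketch): resource addresses match the apply output above.
resource "aws_s3_bucket" "terraform-state" {
  bucket = "hector-dev-terraform-bucket"
}

# Keep every previous state file so a bad apply can be rolled back.
resource "aws_s3_bucket_versioning" "version" {
  bucket = aws_s3_bucket.terraform-state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest (assumption: S3-managed AES256 keys).
resource "aws_s3_bucket_server_side_encryption_configuration" "first" {
  bucket = aws_s3_bucket.terraform-state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Lock table: the S3 backend requires a String hash key named LockID.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # assumption: on-demand billing
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```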

Step 2 - Enable the S3 Backend

With the S3 bucket and DynamoDB table in place, the backend configuration was activated by renaming backend.tfX to backend.tf:

hector@hector-Laptop:~/Project24/eks$ mv backend.tfX backend.tf

The file contains the S3 backend block, pointing at the bucket and lock table created in Step 1:

# backend.tf
terraform {
  backend "s3" {
    bucket         = "hector-dev-terraform-bucket"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Running terraform init again detects the new backend and offers to migrate the existing local state to S3:

hector@hector-Laptop:~/Project24/eks$ terraform init
 
Initializing the backend...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "local" backend to the
  newly configured "s3" backend. No existing state was found in the newly
  configured "s3" backend. Do you want to copy this state to the new "s3"
  backend? Enter "yes" to copy and "no" to start with an empty state.
 
  Enter a value: yes
 
Releasing state lock. This may take a few moments...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Terraform has been successfully initialized!

State is now stored remotely and all subsequent applies benefit from S3 versioning and DynamoDB locking.

Troubleshooting - Backend Initialization Required Error

After renaming backend.tfX to backend.tf, running terraform plan before re-running terraform init produces the following error:

│ Error: Backend initialization required, please run "terraform init"
│
│ Reason: Initial configuration of the requested backend "s3"
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags to
│ use the current configuration.

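Resolution: Any time the backend block is added or changed, terraform init must be re-run before any other Terraform command, even if nothing else in the configuration has changed. A plain terraform init (as in Step 2) offers to copy the existing state; -migrate-state does the same explicitly, while -reconfigure switches to the new backend without copying the existing state.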


Step 3 - Network Configuration

A network.tf file was created to define the VPC and subnets. This includes:

  • A VPC with a configurable CIDR block
  • Private and public subnets spread across availability zones
  • An Elastic IP and NAT Gateway so private subnet workloads can reach the internet

This networking layer is a prerequisite for the EKS cluster: nodes in private subnets need outbound internet access (through the NAT Gateway) to pull container images and communicate with AWS services.
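The original network.tf is not shown, but the init output in Step 4 shows it calls terraform-aws-modules/vpc/aws 3.14.2 from a module "vpc" block. The sketch below is a minimal version under that assumption, wired to the main_network_block, subnet_prefix_extension, zone_offset, and name_prefix values from terraform.tfvars. How those variables drive the subnet layout is itself an assumption: with 10.0.0.0/16 and an extension of 4, cidrsubnet yields /20 blocks (4,096 addresses each), so private subnets start at 10.0.0.0/20 and, with a zone_offset of 8, public subnets start at 10.0.128.0/20.

```hcl
# network.tf (sketch): module source/version from the Step 4 init output;
# subnet math, NAT settings, and the EIP resource name are assumptions.

# Elastic IP reserved up front and handed to the VPC module for the NAT Gateway.
resource "aws_eip" "nat_gw_elastic_ip" {
  vpc = true # AWS provider v4 syntax
  tags = {
    Name            = "${var.cluster_name}-nat-eip"
    iac_environment = var.iac_environment_tag
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.2"

  name = "${var.name_prefix}-vpc"
  cidr = var.main_network_block
  azs  = data.aws_availability_zones.available_azs.names

  # One /20 per AZ, indexed by AZ position: 10.0.0.0/20, 10.0.16.0/20, ...
  private_subnets = [
    for i, az in data.aws_availability_zones.available_azs.names :
    cidrsubnet(var.main_network_block, var.subnet_prefix_extension, i)
  ]
  # Shifted by zone_offset so public blocks never overlap private ones: 10.0.128.0/20, ...
  public_subnets = [
    for i, az in data.aws_availability_zones.available_azs.names :
    cidrsubnet(var.main_network_block, var.subnet_prefix_extension, i + var.zone_offset)
  ]

  # A single shared NAT Gateway using the EIP above (cheaper than one per AZ).
  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false
  reuse_nat_ips          = true
  external_nat_ip_ids    = [aws_eip.nat_gw_elastic_ip.id]
  enable_dns_hostnames   = true

  # Tags EKS uses to discover subnets for public and internal load balancers.
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}
```

The offset only avoids collisions while there are no more AZs than zone_offset (8 here), which holds for us-east-1.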


Part 2 - EKS Cluster Provisioning

Configuration Files

The following files were created to define the EKS cluster:

File              Purpose
eks.tf            Calls the eks module to provision the cluster and managed node groups
locals.tf         Defines reusable local values shared across configurations
variables.tf      Declares all input variables (cluster name, instance types, scaling limits, etc.)
terraform.tfvars  Supplies concrete values for all declared variables (auto-loaded by Terraform)
data.tf           Data sources: queries the available AZs and the current AWS account identity

The input variables added to variables.tf cover user access, instance types, and cluster scaling:

# variables.tf (additions)
variable "admin_users" {
  type        = list(string)
  description = "List of Kubernetes admins."
}
variable "developer_users" {
  type        = list(string)
  description = "List of Kubernetes developers."
}
variable "asg_instance_types" {
  type        = list(string)
  description = "List of EC2 instance machine types to be used in EKS."
}
variable "autoscaling_minimum_size_by_az" {
  type        = number
  description = "Minimum number of EC2 instances to autoscale our EKS cluster on each AZ."
}
variable "autoscaling_maximum_size_by_az" {
  type        = number
  description = "Maximum number of EC2 instances to autoscale our EKS cluster on each AZ."
}
variable "autoscaling_average_cpu" {
  type        = number
  description = "Average CPU threshold to autoscale EKS EC2 instances."
}
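These variables are consumed by eks.tf, which is not reproduced in the original. A minimal sketch against the module reported by terraform init (terraform-aws-modules/eks/aws 18.27.1, called as eks_cluster) could look like this. The node group key default is hypothetical, cluster_version is left out (in practice you would pin it), and treating the *_by_az limits as per-AZ counts multiplied by the number of AZs is an assumption based on the variable names. autoscaling_average_cpu and the user lists are not wired in here.

```hcl
# eks.tf (sketch): source, version, and module name match the Step 4 init output;
# the "default" node group key and the per-AZ sizing math are assumptions.
module "eks_cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.27.1"

  cluster_name = var.cluster_name
  # cluster_version omitted here; pin a currently supported version in practice.

  # Place the control plane ENIs and worker nodes in the VPC module's private subnets.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      instance_types = var.asg_instance_types
      # The *_by_az variables are per-AZ counts, so scale them by the AZ count.
      min_size     = var.autoscaling_minimum_size_by_az * length(data.aws_availability_zones.available_azs.names)
      max_size     = var.autoscaling_maximum_size_by_az * length(data.aws_availability_zones.available_azs.names)
      desired_size = var.autoscaling_minimum_size_by_az * length(data.aws_availability_zones.available_azs.names)
    }
  }

  tags = {
    iac_environment = var.iac_environment_tag
  }
}
```

With the five AZs selected in data.tf and the tfvars values of 1 and 10, this sizing would give a node group of 5 to 50 nodes.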

Variable values were customized for this environment. Note that name_prefix and admin_users were changed from the lab template; admin_users now lists personal AWS IAM users:

# terraform.tfvars (customized values)
cluster_name                   = "tooling-app-eks"
iac_environment_tag            = "development"
name_prefix                    = "hector-eks"          
main_network_block             = "10.0.0.0/16"
subnet_prefix_extension        = 4
zone_offset                    = 8
admin_users                    = ["hector", "solomon"]
developer_users                = ["leke", "david"]
asg_instance_types             = ["t3.small", "t2.small"]
autoscaling_minimum_size_by_az = 1
autoscaling_maximum_size_by_az = 10
autoscaling_average_cpu        = 30

Note: The admin_users and developer_users values must correspond to existing IAM users in your AWS account. An alternative approach is to manage users in a dedicated iam.tf file and reference them via data source ARN interpolation.
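One common way to turn these user names into IAM identities is ARN interpolation with the aws_caller_identity data source declared in data.tf, which is also why the users must already exist. The locals below are a sketch of that idea; the local names and the developers group name are hypothetical, while system:masters is the Kubernetes group bound to the cluster-admin role by default.

```hcl
# locals.tf (sketch): local names and the developer group name are hypothetical.
locals {
  # Admins map to system:masters, which Kubernetes binds to cluster-admin.
  admin_user_map_users = [
    for admin_user in var.admin_users : {
      userarn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/${admin_user}"
      username = admin_user
      groups   = ["system:masters"]
    }
  ]

  # Developers land in a custom group that still needs its own RBAC Role/RoleBinding.
  developer_user_map_users = [
    for developer_user in var.developer_users : {
      userarn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/${developer_user}"
      username = developer_user
      groups   = ["${var.name_prefix}-developers"]
    }
  ]
}
```

Recent 18.x releases of the EKS module accept lists of this shape through their aws_auth_users input together with manage_aws_auth_configmap = true. That path also needs a configured kubernetes provider, which fits hashicorp/kubernetes appearing in the Step 4 init output.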


Step 4 - Initialize Modules and Deploy

Because eks.tf and network.tf reference external Terraform registry modules, terraform init needed to be run again to download them:

hector@hector-Laptop:~/Project24/eks$ terraform init
 
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/eks/aws 18.27.1 for eks_cluster...
- eks_cluster in .terraform/modules/eks_cluster
- eks_cluster.eks_managed_node_group in .terraform/modules/eks_cluster/modules/eks-managed-node-group
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.14.2 for vpc...
- vpc in .terraform/modules/vpc
 
Initializing the backend...
 
Initializing provider plugins...
- Installing hashicorp/cloudinit v2.2.0...
- Installing hashicorp/kubernetes v2.12.1...
- Installing hashicorp/tls v3.4.0...
 
Terraform has been successfully initialized!

Troubleshooting - Module Not Installed Error

Before running the terraform init shown above, attempting terraform plan produced these errors:

│ Error: Module not installed
│
│   on eks.tf line 1:
│    1: module "eks_cluster" {
│
│ This module is not yet installed. Run "terraform init" to install all modules
│ required by this configuration.

│ Error: Module not installed
│
│   on network.tf line 11:
│   11: module "vpc" {
│
│ This module is not yet installed. Run "terraform init" to install all modules
│ required by this configuration.

Resolution: Whenever new module blocks are added to any .tf file, terraform init must be run to download them before terraform plan or terraform apply will work.


Troubleshooting - Terraform Prompting for Variable Values Interactively

On the first terraform plan, Terraform stopped and prompted for every variable to be typed in by hand:

hector@hector-Laptop:~/Project24/eks$ terraform plan
var.admin_users
  List of Kubernetes admins.
  Enter a value:
 
var.asg_instance_types
  List of EC2 instance machine types to be used in EKS.
  Enter a value:
 
var.cluster_name
  EKS cluster name.
  Enter a value:
# ... and so on for every variable

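Root Cause: The variable values were stored in a file named variables.tfvars. Terraform does not automatically load .tfvars files with custom names; it only auto-loads files named exactly terraform.tfvars (or terraform.tfvars.json) or matching the pattern *.auto.tfvars (or *.auto.tfvars.json). A custom-named file is only read when passed explicitly, for example terraform plan -var-file=variables.tfvars.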

Resolution: The full contents of variables.tfvars were moved into terraform.tfvars (which Terraform loads automatically), and the original file was disabled by renaming it:

# contents moved from variables.tfvars → terraform.tfvars
cluster_name                   = "tooling-app-eks"
iac_environment_tag            = "development"
name_prefix                    = "hector-eks"
main_network_block             = "10.0.0.0/16"
subnet_prefix_extension        = 4
zone_offset                    = 8
admin_users                    = ["hector", "solomon"]
developer_users                = ["leke", "david"]
asg_instance_types             = ["t3.small", "t2.small"]
autoscaling_minimum_size_by_az = 1
autoscaling_maximum_size_by_az = 10
autoscaling_average_cpu        = 30

hector@hector-Laptop:~/Project24/eks$ mv variables.tfvars variables.tfvarsX

After that, terraform plan ran cleanly without prompting for any values.


Troubleshooting - EKS Cluster Fails Due to Availability Zone Capacity

When attempting to deploy the cluster, terraform apply failed with the following error:

│ Error: error creating EKS Cluster (tooling-app-eks):
│ UnsupportedAvailabilityZoneException: Cannot create cluster 'tooling-app-eks'
│ because us-east-1e, the targeted availability zone, does not currently have
│ sufficient capacity to support the cluster.
│ Retry and choose from these availability zones:
│ us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f

Root Cause: The data "aws_availability_zones" data source in data.tf was returning all AZs including us-east-1e, which had insufficient EKS capacity at the time. The VPC subnets were being created across all returned AZs, and EKS tried to deploy into us-east-1e.

Attempts to fix this in data.tf:

Attempt 1 - skip_names attribute (not a valid argument):

data "aws_availability_zones" "available_azs" {
  state      = "available"
  skip_names = ["us-east-1e"]  # ← Error: Unsupported argument
}

Attempt 2 - names attribute (read-only, cannot be set):

data "aws_availability_zones" "available_azs" {
  state = "available"
  names = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
  # ← Error: Value for unconfigurable attribute
}

Attempt 3 - filter with name = "names" (not a valid EC2 filter):

filter {
  name   = "names"
  values = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
}
# ← Error: The filter 'name' is invalid

Resolution: Use the correct EC2 filter key, zone-name, as documented in the AWS EC2 API Reference (DescribeAvailabilityZones) and in the filter block section of the Terraform aws_availability_zones docs:

# data.tf
data "aws_availability_zones" "available_azs" {
  state = "available"
  filter {
    name   = "zone-name"
    values = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
  }
}
 
data "aws_caller_identity" "current" {} # used for accessing Account ID and ARN

The error message itself was the key clue. It said The filter 'name' is invalid, which indicated that the name inside the filter block is passed to the EC2 API as a filter key (like zone-name) rather than being a Terraform argument. The Terraform docs for the filter configuration block point directly to the AWS API docs, which list all valid filter keys.
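An alternative worth knowing: the aws_availability_zones data source also accepts an exclude_names argument, which drops specific zones instead of listing every allowed one. A minimal sketch of the same data source using it:

```hcl
# data.tf (alternative sketch): exclude only the AZ that lacked EKS capacity.
data "aws_availability_zones" "available_azs" {
  state         = "available"
  exclude_names = ["us-east-1e"]
}
```

This keeps working if AWS adds zones to the region later, whereas the zone-name filter pins an explicit allow-list. Both approaches are tied to us-east-1 zone names.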


Summary

Phase              What Was Accomplished
Bootstrap          S3 bucket + DynamoDB table provisioned for remote state
Backend Migration  Local state migrated to S3; locking enabled via DynamoDB
Networking         VPC, private/public subnets, NAT Gateway defined in network.tf
EKS Modules        eks.tf, locals.tf, variables.tf, terraform.tfvars configured
Cluster Deploy     EKS cluster and managed node groups successfully provisioned

The biggest lesson in this project was Terraform’s bootstrapping order: you cannot configure a backend in the same apply that creates it. The AZ filtering issue also reinforced that Terraform data source filter blocks map directly to AWS API filter parameters, so the Terraform docs and the AWS API docs must be read together to use them correctly.