The code used in this blog can be found here.

Infrastructure as Code (IaC) is the practice of defining your infrastructure in some form of text (typically code).

You then execute this code to define, deploy, update or destroy your infrastructure.

It’s a concept where you shift your mindset to “treat all aspects of operations as software” 💻

Gone are the days when you would log into a cloud console and press buttons to build your resources … instead, these are created using code templates!

The value of IaC is that you have a versioned, easy-to-read source of truth. It doubles as documentation: you don’t have to count down the days until that sysadmin returns from holidays 🏖ī¸ to find out how your infra hangs together; you can step past that bottleneck.

If your infra is defined in code, then developers can take ownership of automating their pipelines and kicking off their own deployments. 🎉

If this workflow is automated, deployments are faster and more reliable. Humans are prone to error… when tired, we can miss a step or mistype a command; a machine won’t. 🤖

So, let’s dive right into it… 🤿



Creating our infrastructure 📦

We will use the IaC tool Terraform to provision our cloud resources.

⚠ī¸ This blog uses AWS as its cloud provider!

From 50,000 feet, our end goal will be to create an environment that looks like this 👇

[architecture diagram of the target environment]

Terraform will create a VPC with a specific CIDR block.

Within this VPC, there will be multiple subnets housing a configurable number of EC2 compute instances.

These machines will be provisioned across multiple Availability Zones.

TCP/UDP traffic will be allowed (on several ports) between instances that exist within the VPC (This is a requirement for Docker swarm’s network).

Terraform will generate an inventory file which will be used by Ansible afterwards to install the required software needed for swarm.

It will also create a temporary private key that will allow the executor of the terraform script SSH access to each machine.

How Terraform does all this magic is beyond the scope of this blog, but in layman’s terms (there’s a quick command sketch after this list):

  • We write some Terraform code (using HCL) that describes the infrastructure we want.
  • Terraform converts this code into a plan… showing us exactly what changes it will make to reach the desired state that we defined in the code.
  • Once we’re happy with the plan, terraform applies the changes (under the hood, it makes the necessary API calls to our ☁ī¸ provider to create/modify resources)
  • Terraform keeps track of the state of our infrastructure 🔑, so it knows what resources have been created and what changes have been made!
  • Need to make changes to our Infra? No problem. We can easily update our code and repeat the steps ⬆ī¸.
  • Terraform will generate a new plan highlighting the changes and apply the updates to our infra (the state will be updated too!)
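
In practice, that loop maps onto a handful of CLI commands (a rough sketch; we’ll run them against our own configuration later on):

terraform init      # one-off setup: downloads the providers referenced in the code
terraform plan      # shows the changes needed to reach the desired state
terraform apply     # makes the API calls and records the results in the state file
terraform destroy   # optional: tears everything back down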

Terraform doesn’t force us to use a particular file structure.

If we really wanted to, we could define everything in a single file.

However, as you can imagine, terraform configs can get biggggg. So, it’s desirable that we split our infrastructure into logical groupings called modules.

For this blog, I decided to split this mini-project into three child modules:

  • network - All networking-related infrastructure, including the VPC, subnets and internet gateway
  • security - All security-related resources, including security groups / network ACLs.
  • compute - All compute resources - EC2 instances.

⚠ī¸ NOTE: Every workspace has a root module. This is the directory where we will run terraform plan / terraform apply etc. Within this root module we will have the child modules (as described above).

Our directory layout will therefore look like:

❯ tree . -a -L 2
.
├── modules
│   ├── compute
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   └── inventory.tmpl
│   │   └── variables.tf
│   ├── network
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── security
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf

So, let’s look at the child modules 👀

Networking 🕸ī¸

Diving into it:


data "aws_availability_zones" "available" {}

# Creating subnets dynamically from CIDR blocks
locals {
  public_subnets = [
    cidrsubnet(aws_vpc.main.cidr_block, 8, 1),
    cidrsubnet(aws_vpc.main.cidr_block, 8, 2),
    cidrsubnet(aws_vpc.main.cidr_block, 8, 3)
  ]
}

# Internet VPC
resource "aws_vpc" "main" {
  cidr_block           = "${var.vpc_cidr_prefix}.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Terraform   = "true"
    Name        = "${var.namespace}-main_vpc"
    Environment = "${var.environment}"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.namespace}-igw"
    Environment = "${var.environment}"
  }
}

# Subnets
resource "aws_subnet" "public_subnets" {
  count = length(local.public_subnets)

  vpc_id                  = aws_vpc.main.id
  availability_zone       = element(var.availability_zones, count.index)
  map_public_ip_on_launch = true
  cidr_block = element(local.public_subnets, count.index)
  depends_on = [
    aws_internet_gateway.igw
  ]
  tags = {
    Name        = "${var.namespace}-${format("public_subnet-%03d", count.index)}"
    Environment = "${var.environment}"
  }
}

# Route table for internet gateway
resource "aws_route_table" "public_route_table" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.namespace}-public"
    Environment = var.environment
  }
}

resource "aws_route" "to_public_internet_route" {
  route_table_id         = aws_route_table.public_route_table.id
  destination_cidr_block = "0.0.0.0/0"
  # goes to IGW
  gateway_id = aws_internet_gateway.igw.id
}

resource "aws_route_table_association" "public_subnet_route_table_assoc" {
  count = length(local.public_subnets)

  subnet_id      = aws_subnet.public_subnets[count.index].id
  route_table_id = aws_route_table.public_route_table.id
}

This script creates a virtual private cloud (VPC), public subnets and an internet gateway. ☁ī¸

It sets the CIDR block for the VPC, creates the subnets in the specified availability zones, and associates them with a public route table. 🗺ī¸

By using the locals block and the built-in Terraform function cidrsubnet, we create the subnet CIDR blocks programmatically!

For example:

cidrsubnet("172.16.0.0/16",8,1)
172.16.1.0/24
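
Assuming a vpc_cidr_prefix of 172.16 (so a 172.16.0.0/16 VPC), the three locals above therefore resolve to:

cidrsubnet("172.16.0.0/16", 8, 1)   # 172.16.1.0/24
cidrsubnet("172.16.0.0/16", 8, 2)   # 172.16.2.0/24
cidrsubnet("172.16.0.0/16", 8, 3)   # 172.16.3.0/24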

The route table contains a default route pointing to the internet gateway, allowing outbound internet access for instances in the public subnets. 🌐

I’m a big advocate of tagging each resource … so I’ve set tags everywhere for easier resource management!

Nice, we’ve got our networking infra setup. Where to next?

Security 👮

Let’s look at the script that will create our security groups:


data "http" "myip" { url = "https://ifconfig.io" }

locals {
  myip = ["${chomp(data.http.myip.response_body)}/32"]
}

resource "aws_security_group" "docker" {
  name        = "docker_swarm"
  description = "docker swarm security group"
  vpc_id      = var.vpc_id
  ingress {
    description = "ssh access from local"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = local.myip
  }
  ingress {
    description = "HTTP access to NGINX"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    description = "HTTPS access to NGINX"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    description = "Docker swarm management"
    from_port = 2377
    to_port   = 2377
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }
  ingress {
    description = "Docker container network discovery"
    from_port = 7946
    to_port   = 7946
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }
  ingress {
    description = "Docker container network discovery"
    from_port = 7946
    to_port   = 7946
    protocol  = "udp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }
  ingress {
    description = "Docker overlay network"
    from_port = 4789
    to_port   = 4789
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }
  ingress {
    description = "Docker overlay network"
    from_port = 4789
    to_port   = 4789
    protocol  = "udp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }

  ingress {
    description = "Blue-Green Applications"
    from_port = 8080
    to_port   = 8081
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_cidr_block
    ]
  }

  egress {
    description = "Outside world"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

In a nutshell, this script will create a security group with several ingress and egress rules.

There are a few cool things going on in the above…

The data block at the top of the script is retrieving the IP address of the client (me!) making the request to https://ifconfig.io.

It stores this value in a Terraform local value called myip (appending /32 to represent a CIDR range of a single IP address).

The chomp function simply removes any trailing newline characters from the response body of the HTTP request.
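
For instance, if ifconfig.io returned 203.0.113.7 (a made-up address), local.myip would evaluate to:

["203.0.113.7/32"]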

The next resource created is the main player… the security group which will be attached to all EC2 instances in our VPC!

Blasting through it:

  • Port 22 is required for SSH access from your local machine.
  • Ports 80 and 443 will be used for HTTP/HTTPS access to a proxy (NGINX, to be exact).
  • The rest of the ports are used by Docker Swarm for internal communication.
    • 2377 is for cluster management communications.
    • TCP and UDP ports 7946 and 4789 are used for communication among nodes and overlay network traffic respectively.
    • 8080-8081 are the ports that will be exposed by our Docker containers (the Blue-Green applications).
  • The last entry allows communication from the cluster to the outside 🌎 without any restriction.

Ok…

  • [✔ī¸] Networking
  • [✔ī¸] Security

Let’s now write a script which will create some EC2s!

Compute 🖥ī¸

Let’s look at the compute module:



# -----------------------------------------------------------------------------
# AMI
# Get AMI ID for amazon linux 2 box
# -----------------------------------------------------------------------------
data "aws_ami" "ami" {
  owners      = ["amazon"]
  most_recent = true
  filter {
    name   = "name"
    values = ["amzn2-ami-minimal-hvm-*"]
  }
}

data "aws_subnets" "current" {
  filter {
    name = "vpc-id"
    values = [var.vpc_id]
  }
}

# -----------------------------------------------------------------------------
# Private key
# Creates a temporary private key which will be used for testing ssh access to swarm servers
# -----------------------------------------------------------------------------
resource "tls_private_key" "tls_connector" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "testKeyPair" {
  key_name   = var.key_pair_name
  public_key = tls_private_key.tls_connector.public_key_openssh
}

resource "local_file" "priv_key" {
  content         = tls_private_key.tls_connector.private_key_pem
  filename        = "test.pem"
  file_permission = "0600"
}

resource "local_file" "pub_key" {
  content         = tls_private_key.tls_connector.public_key_openssh
  filename        = "test.pub"
  file_permission = "0600"
}

# -----------------------------------------------------------------------------
# Create EC2 instances
# These resources will create a swarm manager node[s] and swarm workers
# -----------------------------------------------------------------------------
resource "aws_instance" "swarm-manager" {
  count                  = var.swarm_managers
  ami                    = data.aws_ami.ami.id
  subnet_id              = tolist(data.aws_subnets.current.ids)[count.index % length(data.aws_subnets.current.ids)]
  instance_type          = var.swarm_manager_instance
  key_name               = aws_key_pair.testKeyPair.key_name
  vpc_security_group_ids = [var.security_group]
  user_data              = <<-EOF
                #!/bin/bash
                sudo hostnamectl set-hostname "swarm-manager-${count.index}"
  EOF

  root_block_device {
    encrypted   = true
    volume_size = var.root_volume_size
    volume_type = "gp2"
  }

  tags = {
    Terraform   = "true"
    Name        = "swarm-manager-${count.index}"
    Environment = "${var.environment}"
  }
}

resource "aws_instance" "swarm-worker" {
  count                  = var.swarm_workers
  ami                    = data.aws_ami.ami.id
  subnet_id              = tolist(data.aws_subnets.current.ids)[count.index % length(data.aws_subnets.current.ids)]
  instance_type          = var.swarm_worker_instance
  key_name               = aws_key_pair.testKeyPair.key_name
  vpc_security_group_ids = [var.security_group]
  user_data              = <<-EOF
                #!/bin/bash
                sudo hostnamectl set-hostname "swarm-worker-${count.index}"
  EOF

  root_block_device {
    encrypted   = true
    volume_size = var.root_volume_size != "" ? var.root_volume_size : "16"
    volume_type = "gp2"
  }

  tags = {
    Terraform   = "true"
    Name        = "swarm-worker-${count.index}"
    Environment = "${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# Generates an Ansible inventory file containing the worker / manager node IPs
# -----------------------------------------------------------------------------
resource "local_file" "ansible_inventory" {
  content = templatefile("./modules/compute/templates/inventory.tmpl",
    {
      manager    = aws_instance.swarm-manager[*].tags["Name"]
      worker     = aws_instance.swarm-worker[*].tags["Name"]
      manager_ip = aws_instance.swarm-manager[*].public_ip
      worker_ip  = aws_instance.swarm-worker[*].public_ip
    }
  )

  filename        = "../ansible/inventory.ini"
  file_permission = "0644"
}

This script is your shop window. 💸

It will create the computing infrastructure required for a Docker swarm cluster to run in the cloud (AWS).

We first grab the AMI ID for the latest Amazon Linux 2 image.

Since this project is for development purposes only, we will create a temporary RSA private key which will grant us SSH access to the swarm servers.

This key is also used to create an AWS Key Pair.

We’ve defined two resources in the above for creating manager and worker nodes in AWS (aws_instance.swarm-manager & aws_instance.swarm-worker).

These resources take several arguments (count, instance_type, vpc_security_group_ids) to spin up the required EC2 instances.

The key pair we created earlier governs SSH access.
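
For example, once the instances are up, you should be able to SSH straight in with the generated key (Amazon Linux 2 AMIs use the ec2-user account by default; the placeholder below is one of the public IPs from the Terraform output / inventory):

❯ ssh -i test.pem ec2-user@<manager-public-ip>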

User data is also passed to each instance to set the hostname dynamically.

Finally, a local file is created containing an Ansible inventory which will list the IPs of the swarm manager[s] / swarm worker[s].
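
The template file itself isn’t shown above, but given the four variables passed into templatefile, inventory.tmpl might look something like this (a sketch; the group names and layout are assumptions rather than the exact file from the repo):

[managers]
%{ for idx, name in manager ~}
${name} ansible_host=${manager_ip[idx]}
%{ endfor ~}

[workers]
%{ for idx, name in worker ~}
${name} ansible_host=${worker_ip[idx]}
%{ endfor ~}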

TL;DR All being well, this script should create several manager / worker nodes 🙏

Executing our terraform scripts

So, now that we’ve created the relevant child modules… how do we go about executing each script?

Recall that terraform apply will attempt to create the resources specified in a module’s .tf files… so do we cd into each of the directories below and execute terraform apply?

  • /modules/network/main.tf
  • /modules/security/main.tf
  • /modules/compute/main.tf

Simple answer: No.

Recall, inside the root module, we create a main.tf file.

This serves as the primary entry point for provisioning infrastructure in Terraform.

This root module will also house:

  • outputs.tf: declarations of all output values.
  • variables.tf: declarations of all input variables (sketched below).
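
As an illustration, the root variables.tf declares the inputs that the module blocks in main.tf reference; the names below come from that file, while the types and defaults are assumptions:

# Root-module variables.tf (sketch; types and defaults are illustrative)
variable "region" {
  type    = string
  default = "us-east-1"
}

variable "namespace" {
  type = string
}

variable "environment" {
  type = string
}

variable "vpc_cidr_prefix" {
  type        = string
  description = "First two octets of the VPC CIDR, e.g. 172.16"
}

variable "swarm_managers" {
  type    = number
  default = 1
}

variable "swarm_workers" {
  type    = number
  default = 2
}

# ... plus swarm_manager_instance, swarm_worker_instance and root_volume_size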

Looking at main.tf in the root module:


provider "aws" {
  region = var.region
}

module "network" {
  source             = "./modules/network"
  namespace          = var.namespace
  vpc_cidr_prefix    = var.vpc_cidr_prefix
  environment        = var.environment
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "security" {
  source     = "./modules/security"
  depends_on = [module.network]
  vpc_id     = module.network.vpc_id
  vpc_cidr_block = module.network.vpc_cidr
}

module "compute" {
  source                 = "./modules/compute"
  depends_on             = [module.security]
  namespace              = var.namespace
  environment            = var.environment
  swarm_manager_instance = var.swarm_manager_instance
  swarm_worker_instance  = var.swarm_worker_instance
  swarm_managers         = var.swarm_managers
  swarm_workers          = var.swarm_workers
  root_volume_size       = var.root_volume_size
  security_group         = module.security.dockerSG
  vpc_id                 = module.network.vpc_id
}

We can see that each of the modules we discussed above (network, security and compute) is declared.

Running terraform apply inside the root module will execute this script and, in turn:

  • Create all networking infrastructure
  • Create the required security group
  • Create the EC2 instances which will form the manager & worker nodes of the swarm cluster!

The modules depend on each other, so the network will be created first, followed by the security group and then lastly the EC2 instances. The provider block specifies that AWS should be used, with the region taken from the var.region variable. Once the apply completes, you can check the generated inventory and tear everything down as shown below.
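
For example, from the root module (the inventory path comes from the local_file resource in the compute module):

❯ cat ../ansible/inventory.ini    # the inventory Terraform generated for Ansible
❯ terraform destroy               # tears down every resource Terraform created when you’re done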

Ok, I think that’s enough for this blog post 😴

We’re set up nicely to configure these freshly created terraform resources using Ansible! This will be the focus of the second blog in this series.

We might even deploy a stack to our swarm cluster that illustrates 🔵🟢 deployments 👀