Terraform + Ansible + Docker Swarm (Part 1)
The code used in this blog can be found here.
Infrastructure as Code (IaC) defines your infrastructure in some form of text (typically code).
You execute this code to define, deploy, update, or destroy your infrastructure.
It’s a concept where you shift your mindset to “treat all aspects of operations as software”.
Gone are the days when you would log into a cloud console and press buttons to build your resources … instead, these are created using code templates!
The value of IaC is that you have a versioned, easy-to-read source of truth. It doubles as documentation: you don’t have to count down the days until that sysadmin returns from holiday to find out how your infrastructure is put together, you can step past that bottleneck.
If your infra is defined in code, then developers can take ownership of automating their pipelines and kicking off their own deployments.
If this workflow is automated, deployments become faster and more reliable. Humans are prone to error: when tired, we miss a step or mistype a command. Automating the process makes it consistent and repeatable.
So, let’s dive right into it…
Creating our infrastructure
We will use the IaC tool Terraform to provision our cloud resources.
Note: this blog uses AWS as its cloud provider!
From 50,000 feet, our end goal is to create an environment that looks like the following:
Terraform will create a VPC with a specific CIDR block. Within this VPC there will be multiple subnets, housing a configurable number of EC2 compute instances. These machines will be provisioned across multiple Availability Zones.
TCP/UDP traffic will be allowed (on several ports) between instances within the VPC (this is a requirement for Docker Swarm’s networking).
Terraform will generate an inventory file, which Ansible will later use to install the software required for Swarm. It will also create a temporary private key that gives whoever runs the Terraform scripts SSH access to each machine.
How Terraform does all this magic is beyond the scope of this blog, but in layman’s terms…
- We write some Terraform code (using HCL) that describes the infrastructure we want.
- Terraform converts this code into a plan… showing us exactly what changes it will make to reach the desired state we defined in the code.
- Once we’re happy with the plan, Terraform applies the changes (under the hood, it makes the necessary API calls to our cloud provider to create/modify resources).
- Terraform keeps track of the state of our infrastructure, so it knows what resources have been created and what changes have been made!
- Need to make changes to our infra? No problem. We can easily update our code and repeat the steps above.
- Terraform will generate a new plan highlighting the changes and apply the updates to our infra (the state will be updated too!)
Terraform doesn’t force us to use a particular file structure.
If we really wanted to, we could define everything in a single file. However, as you can imagine, Terraform configs can get big quickly, so it’s desirable to split our infrastructure into logical groupings called modules.
For this blog, I decided to split this mini-project into three child modules:
- network - all networking-related infrastructure, including the VPC, subnets and internet gateway
- security - all security-related resources, including security groups / network ACLs
- compute - all compute resources (EC2 instances)
NOTE: Every workspace has a root module. This is the directory where we will run terraform plan / terraform apply etc. Within this root module we will have the child modules (as described above).
Our directory layout will therefore look like:
❯ tree .
.
├── modules
│   ├── compute
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   └── inventory.tmpl
│   │   └── variables.tf
│   ├── network
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── security
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf
So, let’s look at the child modules.
Networking
Diving into it:
data "aws_availability_zones" "available" {}
# Creating subnets dynamically from CIDR blocks
locals {
public_subnets = [
cidrsubnet(aws_vpc.main.cidr_block,8,1),
cidrsubnet(aws_vpc.main.cidr_block,8,2),
cidrsubnet(aws_vpc.main.cidr_block,8,3)
]
}
# Internet VPC
resource "aws_vpc" "main" {
cidr_block = "${var.vpc_cidr_prefix}.0.0/16"
enable_dns_hostnames = true
tags = {
Terraform = "true"
Name = "${var.namespace}-main_vpc"
Environment = "${var.environment}"
}
}
# Internet Gateway
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.namespace}-igw"
Environment = "${var.environment}"
}
}
# Subnets
resource "aws_subnet" "public_subnets" {
count = length(local.public_subnets)
vpc_id = aws_vpc.main.id
availability_zone = element(var.availability_zones, count.index)
map_public_ip_on_launch = true
cidr_block = element(local.public_subnets, count.index)
depends_on = [
aws_internet_gateway.igw
]
tags = {
Name = "${var.namespace}-${format("public_subnet-%03d", count.index)}"
Environment = "${var.environment}"
}
}
# Route table for internet gateway
resource "aws_route_table" "public_route_table" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.namespace}-public"
Environment = var.environment
}
}
resource "aws_route" "to_public_internet_route" {
route_table_id = aws_route_table.public_route_table.id
destination_cidr_block = "0.0.0.0/0"
# goes to IGW
gateway_id = aws_internet_gateway.igw.id
}
resource "aws_route_table_association" "public_subnet_route_table_assoc" {
count = length(local.public_subnets)
subnet_id = aws_subnet.public_subnets[count.index].id
route_table_id = aws_route_table.public_route_table.id
}
This script creates a virtual private cloud (VPC), public subnets and an internet gateway.
It sets the CIDR block for the VPC, creates the subnets in the specified availability zones, and associates them with a public route table.
By using the locals block and the built-in Terraform function cidrsubnet, we create the subnet CIDR blocks programmatically. For example:
cidrsubnet("172.16.0.0/16", 8, 1)   # returns "172.16.1.0/24"
The route table contains a default route pointing to the internet gateway, allowing outbound internet access for instances in the public subnets.
I’m a big advocate of tagging each resource … so I’ve set tags everywhere for easier resource management!
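The module’s outputs.tf isn’t shown above, but the root module (covered later) consumes module.network.vpc_id and module.network.vpc_cidr, so at a minimum it presumably exposes something like this sketch:
# modules/network/outputs.tf (a minimal sketch; the real file may expose more)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "vpc_cidr" {
  value = aws_vpc.main.cidr_block
}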
Nice, we’ve got our networking infra set up. Where to next?
Security
Let’s look at the script that will create our security groups:
data "http" "myip" { url = "https://ifconfig.io" }
locals {
myip = ["${chomp(data.http.myip.response_body)}/32"]
}
resource "aws_security_group" "docker" {
name = "docker_swarm"
description = "docker swarm security group"
vpc_id = var.vpc_id
ingress {
description = "ssh access from local"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = local.myip
}
ingress {
description = "HTTP access to NGINX"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS access to NGINX"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "Docker swarm management"
from_port = 2377
to_port = 2377
protocol = "tcp"
cidr_blocks = [
var.vpc_cidr_block
]
}
ingress {
description = "Docker container network discovery"
from_port = 7946
to_port = 7946
protocol = "tcp"
cidr_blocks = [
var.vpc_cidr_block
]
}
ingress {
description = "Docker container network discovery"
from_port = 7946
to_port = 7946
protocol = "udp"
cidr_blocks = [
var.vpc_cidr_block
]
}
ingress {
description = "Docker overlay network"
from_port = 4789
to_port = 4789
protocol = "tcp"
cidr_blocks = [
var.vpc_cidr_block
]
}
ingress {
description = "Docker overlay network"
from_port = 4789
to_port = 4789
protocol = "udp"
cidr_blocks = [
var.vpc_cidr_block
]
}
ingress {
description = "Blue-Green Applications"
from_port = 8080
to_port = 8081
protocol = "tcp"
cidr_blocks = [
var.vpc_cidr_block
]
}
egress {
description = "Outside world"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
In a nutshell, this script creates a security group with several ingress and egress rules.
There are a few cool things going on in the above…
The data block at the top of the script retrieves the IP address of the client (me!) making the request to https://ifconfig.io.
It stores this value in a local Terraform variable, myip (appending /32 to represent a CIDR range of a single IP address).
The chomp function simply removes any trailing whitespace or newlines from the response body of the HTTP request.
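As a quick worked example (with a made-up address): if ifconfig.io returned "203.0.113.7\n", then:
# chomp("203.0.113.7\n")  => "203.0.113.7"
# local.myip              => ["203.0.113.7/32"]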
The next resource created is the main player… the security group which will be attached to all EC2 instances in our VPC!
Blasting through it:
- Port 22 is required for SSH access from your local machine.
- Ports 80 and 443 will be used for HTTP/HTTPS access to a proxy (an NGINX service, to be exact).
- The rest of the ports are used by Docker Swarm for internal communication.
- 2377 is for cluster management communications.
- TCP and UDP ports 7946 and 4789 are used for communication among nodes and for overlay network traffic, respectively.
- 8080-8081 are the ports that will be exposed by our Docker containers.
- The last entry (the egress rule) allows communication from the cluster to the outside world without any restriction.
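One detail that isn’t visible above: the root module later wires this group into the compute module as module.security.dockerSG, so the security module’s outputs.tf presumably looks something like this sketch:
# modules/security/outputs.tf (sketch, inferred from how the root module references it)
output "dockerSG" {
  value = aws_security_group.docker.id
}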
Ok…
- [x] Networking
- [x] Security
Let’s now write a script which will create some EC2s!
Compute
Looking at the compute module’s main.tf:
# -----------------------------------------------------------------------------
# AMI
# Get AMI ID for amazon linux 2 box
# -----------------------------------------------------------------------------
data "aws_ami" "ami" {
owners = ["amazon"]
most_recent = true
filter {
name = "name"
values = ["amzn2-ami-minimal-hvm-*"]
}
}
data "aws_subnets" "current" {
filter {
name = "vpc-id"
values = [var.vpc_id]
}
}
# -----------------------------------------------------------------------------
# Private key
# Creates a temporary private key which will be used for testing ssh access to swarm servers
# -----------------------------------------------------------------------------
resource "tls_private_key" "tls_connector" {
algorithm = "RSA"
rsa_bits = 4096
}
resource "aws_key_pair" "testKeyPair" {
key_name = var.key_pair_name
public_key = tls_private_key.tls_connector.public_key_openssh
}
resource "local_file" "priv_key" {
content = tls_private_key.tls_connector.private_key_pem
filename = "test.pem"
file_permission = "0600"
}
resource "local_file" "pub_key" {
content = tls_private_key.tls_connector.public_key_openssh
filename = "test.pub"
file_permission = "0600"
}
# -----------------------------------------------------------------------------
# Create EC2 instances
# These resources will create a swarm manager node[s] and swarm workers
# -----------------------------------------------------------------------------
resource "aws_instance" "swarm-manager" {
count = var.swarm_managers
ami = data.aws_ami.ami.id
subnet_id = tolist(data.aws_subnets.current.ids)[count.index % length(data.aws_subnets.current.ids)]
instance_type = var.swarm_manager_instance
key_name = aws_key_pair.testKeyPair.key_name
vpc_security_group_ids = [var.security_group]
user_data = <<EOF
#!/bin/bash
sudo hostnamectl set-hostname "swarm-manager-${count.index}"
EOF
root_block_device {
encrypted = true
volume_size = var.root_volume_size
volume_type = "gp2"
}
tags = {
Terraform = "true"
Name = "swarm-manager-${count.index}"
Environment = "${var.environment}"
}
}
resource "aws_instance" "swarm-worker" {
count = var.swarm_workers
ami = data.aws_ami.ami.id
subnet_id = tolist(data.aws_subnets.current.ids)[count.index % length(data.aws_subnets.current.ids)]
instance_type = var.swarm_worker_instance
key_name = aws_key_pair.testKeyPair.key_name
vpc_security_group_ids = [var.security_group]
user_data = <<EOF
#!/bin/bash
sudo hostnamectl set-hostname "swarm-worker-${count.index}"
EOF
root_block_device {
encrypted = true
volume_size = var.root_volume_size != "" ? var.root_volume_size : "16"
volume_type = "gp2"
}
tags = {
Terraform = "true"
Name = "swarm-worker-${count.index}"
Environment = "${var.environment}"
}
}
# -----------------------------------------------------------------------------
# Generates an Ansible inventory file containing the worker / manager node IPs
# -----------------------------------------------------------------------------
resource "local_file" "ansible_inventory" {
content = templatefile("./modules/compute/templates/inventory.tmpl",
{
manager = aws_instance.swarm-manager[*].tags["Name"]
worker = aws_instance.swarm-worker[*].tags["Name"]
manager_ip = aws_instance.swarm-manager.*.public_ip
worker_ip = aws_instance.swarm-worker.*.public_ip
}
)
filename = "../ansible/inventory.ini"
file_permission = "0644"
}
This script is your shop window.
It will create the computing infrastructure required for a Docker swarm cluster to run in the cloud (AWS).
First, we go and grab the AMI ID for Amazon Linux 2.
Since this project is for development purposes only, we create a temporary RSA private key which will grant us SSH access to the swarm servers.
This key is also used to create an AWS key pair.
We’ve defined two resources above for creating the manager and worker nodes in AWS (aws_instance.swarm-manager & aws_instance.swarm-worker).
These resources take several variables (count, instance_type, vpc_security_group_ids) to spin up the required EC2 instances.
The key pair that we created earlier governs SSH access.
User data is also passed to each instance to set the hostname dynamically.
Finally, a local file is created containing an Ansible inventory which will list the IPs of the swarm manager[s] / swarm worker[s].
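The template itself isn’t shown in this post, but given the variables handed to templatefile (manager, worker, manager_ip, worker_ip) and the inventory.ini output, it plausibly looks something like the sketch below (the group names are my guess, not necessarily what the repo uses):
[managers]
%{ for idx, name in manager ~}
${name} ansible_host=${manager_ip[idx]}
%{ endfor ~}

[workers]
%{ for idx, name in worker ~}
${name} ansible_host=${worker_ip[idx]}
%{ endfor ~}
When rendered, this gives Ansible a plain INI-style inventory it can consume in part two of this series.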
TL;DR: all being well, this script should create several manager / worker nodes.
Executing our Terraform scripts
So, now that we’ve created the relevant child modules… how do we go about executing each script?
Recall that terraform apply will attempt to create the resources specified in a .tf file… so do we cd into the directories below and execute terraform apply in each of them?
/modules/network/main.tf
/modules/security/main.tf
/modules/compute/main.tf
Simple answer: No.
Recall that inside the root module we create a main.tf file.
This serves as the primary entry point for provisioning infrastructure in Terraform.
This root module will also house:
- outputs.tf: declarations of all output values
- variables.tf: declarations of all input variables (a sketch follows below)
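As a rough sketch of the latter (the variable names are taken from the var.* references in the main.tf below; the types and defaults here are illustrative, not necessarily what the repo uses):
# variables.tf in the root module (sketch: names mirror the var.* references below)
variable "region" {
  type    = string
  default = "us-east-1" # illustrative default
}

variable "namespace" { type = string }
variable "environment" { type = string }
variable "vpc_cidr_prefix" { type = string } # e.g. "172.16"
variable "swarm_managers" { type = number }  # number of manager nodes
variable "swarm_workers" { type = number }   # number of worker nodes
variable "swarm_manager_instance" { type = string } # e.g. "t3.micro"
variable "swarm_worker_instance" { type = string }
variable "root_volume_size" { type = string }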
Looking at main.tf in the root module:
provider "aws" {
region = var.region
}
module "network" {
source = "./modules/network"
namespace = var.namespace
vpc_cidr_prefix = var.vpc_cidr_prefix
environment = var.environment
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
module "security" {
source = "./modules/security"
depends_on = [module.network]
vpc_id = module.network.vpc_id
vpc_cidr_block = module.network.vpc_cidr
}
module "compute" {
source = "./modules/compute"
depends_on = [module.security]
namespace = var.namespace
environment = var.environment
swarm_manager_instance = var.swarm_manager_instance
swarm_worker_instance = var.swarm_worker_instance
swarm_managers = var.swarm_managers
swarm_workers = var.swarm_workers
root_volume_size = var.root_volume_size
security_group = module.security.dockerSG
vpc_id = module.network.vpc_id
}
We can see that each of the modules we discussed above (network, security and compute) is declared.
Running terraform apply inside the root module will execute this script and, in turn:
- Create all networking infrastructure
- Create the required security group
- Create the EC2 instances which will form the manager & worker nodes of the swarm cluster!
A few things to note:
- The modules depend on each other: the network is created first, followed by the security group, and then lastly the EC2 instances.
- The provider block specifies that AWS should be used, with the region set by the var.region variable.
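One last thing that isn’t shown in the post: besides AWS, the code above also relies on the http, tls and local providers (for the IP lookup, the temporary key and the generated files). A required_providers block along these lines (a sketch; version constraints are up to you) makes sure terraform init pulls them all down before plan/apply:
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    http = {
      source = "hashicorp/http"
    }
    tls = {
      source = "hashicorp/tls"
    }
    local = {
      source = "hashicorp/local"
    }
  }
}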
Ok, I think that’s enough for this blog post.
We’re nicely set up to configure these freshly created Terraform resources using Ansible! That will be the focus of the second blog in this series.
We might even deploy a stack to our swarm cluster that illustrates blue/green deployments.