2️⃣ Terraform + Ansible + Docker Swarm = 🔥

The code used in this blog can be found here.
We've a list of hosts provisioned by Terraform.
We need a list of things to happen on these hosts.
We need to ensure that those tasks happen.
🥁 (drum roll) …
Enter Ansible. Shell scripts on steroids. Configuration as code.
To keep this brief, I want to install a litany of packages on each machine (🐳 + 🐍 etc.), I want to create a user profile for moi, I want to initiate a Docker Swarm cluster, I want to deploy a stack within Swarm.
I could create a bash script to run all these actions for me… 🤔
What happens if I rerun the script? I'll probably get an error saying that the system is already in the desired state.
I could put a lot of error checking into that script to make sure there are no unintended consequences if I run it again.
I could also get the script to report if it had to make a change or if the change had already been executed.
But, if I followed through with all these steps, then I've basically gone and created my own version of Ansible!
Ansible
Ansible is a configuration management tool
It helps manage and automate the configuration of systems, applications and infrastructure.
It uses YAML to write clear and concise playbooks that define a set of automation tasks.
Most of the modules built into Ansible are idempotent i.e. no matter how many times we run the task, we will still get the same result.
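As a quick illustration (a minimal sketch, not taken from the repo), a task using the built-in apt module can be run as many times as you like — it only reports a change the first time:

```yaml
# Minimal sketch (not from the repo): the apt module is idempotent,
# so re-running this task changes nothing once curl is already installed.
- name: Ensure curl is installed
  ansible.builtin.apt:
    name: curl
    state: present
```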
IMO, Ansible is still fundamental in DevOps engineering. I find it incredibly useful and powerful. If you have to get info from 100 machines concurrently, or copy data from A to B-Z, Ansible makes this a breeze 🔥
Let's run an ad-hoc ping command to ensure our Terraform-provisioned servers are up:
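Something along these lines, assuming the Terraform-generated inventory file is called inventory.ini (as it is later in this post):

```bash
# Ad-hoc ping of every host in the inventory that Terraform generated
ansible all -i inventory.ini -m ping
```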
Looking good 👍
Now, we can expand on this.
Let's create a set of Ansible roles that will configure all these servers for us. Configure = update packages, install a newer version of Python, install the Docker engine, initiate a Swarm, tag each node in the Swarm with some relevant label and finally deploy a stack to the cluster!
This is what our directory looks like 👇
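(The original screenshot isn't reproduced here; the layout below is an indicative sketch — the role names are placeholders, so check the repo for the real ones.)

```
ansible/
├── deploy_swarm.yml
├── inventory.ini
└── roles/
    ├── update-packages/
    ├── install-python/
    ├── install-docker/
    ├── install-docker-sdk/
    ├── create-user/
    ├── swarm-init/
    ├── swarm-manager-join/
    ├── swarm-worker-join/
    ├── node-labels/
    └── deploy-stack/
```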
As you can see, we have 10 roles. I'll not be explaining every role, much to your delight!
Please refer to the repo for a better understanding of what they actually do.
They're named in such a way that their purpose should be self-evident.
For example, let's look at swarm-init:
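(The role itself is in the repo; here's a simplified sketch of what its tasks look like, assuming the community.docker collection is available.)

```yaml
# Simplified sketch of the swarm-init tasks — see the repo for the real role
- name: Ensure the Docker daemon is running
  ansible.builtin.service:
    name: docker
    state: started
    enabled: true

- name: Initialise Docker Swarm (a no-op if it is already initialised)
  community.docker.docker_swarm:
    state: present
    advertise_addr: "{{ ansible_default_ipv4.address }}"

- name: Retrieve the manager join-token
  ansible.builtin.command: docker swarm join-token -q manager
  register: manager_token
  changed_when: false

- name: Retrieve the worker join-token
  ansible.builtin.command: docker swarm join-token -q worker
  register: worker_token
  changed_when: false
```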
In summary, this playbook performs four tasks:
- It ensures that the Docker daemon is running
- It initialises Docker Swarm if it has not already been initialised.
- It retrieves the join-token for the manager nodes.
- It retrieves the join-token for the worker nodes.
- These tokens are stored in the variables manager_token / worker_token respectively, using the register keyword.
So we've 10 roles we want to run (give or take).
Let's create a script deploy_swarm.yml in the root directory which will execute this for us:
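(Again, the real playbook lives in the repo; in spirit it is just a set of plays that apply the roles in order. The group and role names below are indicative.)

```yaml
# Indicative sketch of deploy_swarm.yml — see the repo for the actual playbook
- hosts: all
  become: true
  roles:
    - update-packages
    - install-python
    - install-docker
    - create-user

- hosts: managers
  become: true
  roles:
    - swarm-init
    - swarm-manager-join

- hosts: workers
  become: true
  roles:
    - swarm-worker-join

- hosts: managers
  become: true
  roles:
    - node-labels
    - deploy-stack
```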
Speed round ⏩
In summary, this playbook automates the deployment and configuration of a Docker Swarm by installing required packages, creating new user accounts, initializing the Swarm, adding nodes as managers and workers, adding labels to nodes, and deploying a Docker stack to illustrate a 🔵🟢 (blue-green) deployment.
What did we deploy there? 🐳
Our last task in the above playbook involved deploying some Docker services to the cluster.
If you cd into the terraform directory, you should be able to run some remote exec commands to inspect what services were created:
Wow, let's take a step back… How did we SSH into one of the manager nodes?
So, to jump onto a server that Terraform created for us, you might've been inclined to copy the IP directly from the Ansible inventory.ini file (created by Terraform) or by going to the AWS EC2 Dashboard and taking the IP from there.
You can do this … but we don't need to.
Terraform has a command that can be used to retrieve any information we defined as an output.
Therefore, terraform output swarm_manager_public_ip returns the public IP of the first Swarm manager!
We used this output command to construct a custom SSH command which we wrapped in the alias terraform_ssh.
We can leverage this command to remotely execute commands on the Docker Swarm cluster, à la:
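The alias is just a thin wrapper around terraform output and ssh — roughly as follows, where the SSH user, key path and the -raw flag are my assumptions rather than the repo's exact wording:

```bash
# Rough sketch of the terraform_ssh wrapper — the real alias lives in the repo
alias terraform_ssh='ssh -i ~/.ssh/id_rsa ubuntu@$(terraform output -raw swarm_manager_public_ip)'

# Inspect what is running on the Swarm
terraform_ssh "docker service ls"
```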
Ok, we can see three services running:
swarm_green-app
swarm_blue-app
swarm_nginx
- In a nutshell, we want to hit an NGINX container that has exposed port 80. This process will act as a reverse-proxy and will forward our request to an appropriate backend application (blue-app or green-app).
- We control what backend application NGINX will forward the request to via an ENV variable - DEPLOYMENT (shown later).
- The backend app is written in Python and uses the Flask web framework to create a web application.
- If a GET/POST request is made to a Flask endpoint, it will return a welcome message including the hostname of the server and the value of the ENV VAR DEPLOYMENT.
The prefix swarm_ is the name of the stack that Ansible asked Docker Swarm to deploy.
It uses a compose file to create the stack in the cluster:
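(The full docker-stack.yml is in the repo; the trimmed-down sketch below only shows its shape. Image names are placeholders, and the envsubst invocation is simplified — a real template run would restrict which variables get substituted so NGINX's own $-variables survive.)

```yaml
# Trimmed-down sketch of docker-stack.yml — see the repo for the full file
version: "3.8"

services:
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    environment:
      ACTIVE_BACKEND: green-app    # backend that receives traffic
      BACKUP_BACKEND: blue-app
    # envsubst renders default.template into the final NGINX config at start-up
    command: >
      /bin/sh -c "envsubst < /etc/nginx/templates/default.template
      > /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"
    deploy:
      replicas: 2
      update_config:
        parallelism: 1             # update one container at a time
      placement:
        constraints:
          - node.labels.type == manager

  green-app:
    image: <green-flask-image>     # placeholder for the Flask image
    environment:
      DEPLOYMENT: green
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.labels.type == green

  blue-app:
    image: <blue-flask-image>      # placeholder for the Flask image
    environment:
      DEPLOYMENT: blue
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.labels.type == blue
```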
⚠️ A few important things to note about the above.
- The configuration for NGINX is generated by a template (default.template) using the envsubst command.
  - The values of the environment variables ACTIVE_BACKEND and BACKUP_BACKEND are substituted into the template to generate the final configuration file: /etc/nginx/conf.d/default.conf.
  - This is how we will control traffic to blue-green applications.
- We set parallelism to 1, which means 1 container will be updated at a time!
- We defined how many replicas of a service we want to run.
  - The NGINX and 🔵🟢 apps have multiple replicas running.
- We've used placement constraints to ensure services run on particular node[s].
  - Swarm will try and place containers with the goal of providing maximum resiliency. So, if you had five replicas of one service, it will try and place these containers on five different machines.
  - Sometimes, however, you need to control where a container is run. In this case, it's logical to separate green and blue applications.
  - We ran an Ansible task that runs a shell script to tag nodes with a specific label (green/blue/worker/manager).
  - We control where services are placed by filtering on these tags/labels.
Controlling traffic between Blue-Green Applications 🎛️
NGINX diverts the traffic to the backend containers based on what the environment variable ACTIVE_BACKEND is set to:
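(In the stack file sketched earlier, that's simply the environment block on the nginx service — something like this, with green as the starting value:)

```yaml
# The relevant slice of the nginx service definition
environment:
  ACTIVE_BACKEND: green-app   # traffic is routed here
  BACKUP_BACKEND: blue-app
```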
The default.template which NGINX uses to generate its config uses these ENV vars to create upstream paths (recall the command envsubst is used for variable substitution):
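(A sketch of what that template might look like, assuming the Flask apps listen on port 5000:)

```nginx
# Sketch of default.template — envsubst swaps in ${ACTIVE_BACKEND} and ${BACKUP_BACKEND}
upstream active {
    server ${ACTIVE_BACKEND}:5000;
}

upstream backup {
    server ${BACKUP_BACKEND}:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://active;
    }
}
```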
Therefore, in our current state, all traffic will be diverted to the green applications 🟢
We can confirm this by issuing an HTTP GET request against <manager ip>/home.
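For example (the exact wording of the response comes from the repo's Flask app):

```bash
# Ask NGINX on the Swarm manager for /home — it proxies the request to the active backend
curl http://$(terraform output -raw swarm_manager_public_ip)/home
# -> a welcome message containing the container hostname and DEPLOYMENT=green
```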
We can change the value of ACTIVE_BACKEND by updating the nginx service:
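One way to do that (a sketch using docker service update; the repo's own commands may differ slightly) is to swap the two env vars on the running swarm_nginx service:

```bash
# Route traffic to the blue apps by flipping the env vars on the nginx service
terraform_ssh "docker service update --env-add ACTIVE_BACKEND=blue-app --env-add BACKUP_BACKEND=green-app swarm_nginx"
```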
⚠️ Note: If you recall our docker-stack.yml file, we set parallelism to 1. This dictates how many containers are updated at a time. Therefore, there will be some overlap between green/blue apps when the NGINX service is updated!
If we now go and hit the endpoint again, we should see return messages from our blue applications:
All of the above behaviour can be simulated and observed by running the switch_traffic.sh bash script:
This will update active backends every 45s:
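(The script itself is in the repo; below is a minimal sketch of what it does, reusing the terraform_ssh wrapper from earlier — the real script may well call ssh directly instead of an alias.)

```bash
#!/usr/bin/env bash
# Minimal sketch of switch_traffic.sh: flip the active backend every 45 seconds
backends=("green-app" "blue-app")
i=0
while true; do
  active=${backends[$((i % 2))]}
  backup=${backends[$(((i + 1) % 2))]}
  echo "Switching active backend to ${active}"
  terraform_ssh "docker service update --env-add ACTIVE_BACKEND=${active} --env-add BACKUP_BACKEND=${backup} swarm_nginx"
  i=$((i + 1))
  sleep 45
done
```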
Finishing up
I've lost count of how many commands we ran over the course of these last 2 blogs 🤷
We init'd Terraform, provisioned infra, configured servers via Ansible, initiated a Docker Swarm cluster … the list goes on.
So, I've packaged everything up into one master script called deploy.sh.
You can provision and deploy all of what we covered with one command 🚀.
Likewise, when you're finished … just run the script again to destroy everything!
Pass:
- 1 to create and deploy everything
- 3 to destroy everything
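Assuming the option is passed as the first argument (it may equally be an interactive prompt — check the script):

```bash
# Provision the infra, configure it with Ansible and deploy the stack
./deploy.sh 1

# Tear the whole environment down again
./deploy.sh 3
```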
We've just provisioned and deployed a full IaC environment highlighting a green-blue deployment 🥳
If this has piqued your interest, head over to the repo and play around!