Ben Koller

Immutable Infrastructure.

Immutable Infrastructure describes the separation of infrastructural state and application. Consecutive deployments spin up unique and explicitly defined infrastructure instead of replacing application artifacts on already existing infrastructure.

Delivering working software with every deploy, and without downtime, poses challenges that immutable infrastructure attempts to solve.

Background

Startups in their early stages tend to neglect infrastructural design in favour of perceived speed in development. A prototypical production environment is focused on a single server. Backups are added as an afterthought and revolve around cronjobs. Deployments are a concatenation of manual steps and increase in risk each time. Downtime is almost always necessary during deployments. Even distributed designs suffer from the risks introduced by consecutive changes to their infrastructure.

Talentry is no stranger to this and deliberately hired me to mitigate these risks. I will offer a glimpse into the “how”.

CI/CD for your infrastructure

Continuous integration and continuous delivery taught us to build a single application artifact per version and to test it thoroughly before deployment to achieve predictable outcomes.

Each deployment introduces a change to the infrastructure running your application. Consecutive changes blur knowledge about the state of your infrastructure. Configuration management systems like Chef, Puppet, Ansible and Salt are built to solve this by applying a desired state configuration to your infrastructure as needed. In reality, however, maintaining state configurations adds considerable overhead to your work without fully covering your servers’ state, which leads to unexpected results.

Spinning up fresh infrastructure with explicitly defined state provides an environment with predictable results for your deployment. Base images for your servers are built with continuous integration pipelines and fully automated tests, providing the same peace of mind for your servers as for your application.

Don’t be fooled: no matter how well your base image is configured, slight adjustments to your infrastructure will always be necessary, if only to pull the latest application artifact. One of the simpler (and preferable) ways to pass additional configuration to your infrastructure is a bash script delivered as metadata during initial boot, a native option on AWS, Google Compute Engine and Microsoft Azure. Other possibilities exist, but they add maintenance overhead and a steeper learning curve.
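To make the boot-script idea concrete, here is a minimal sketch of how such a script could be rendered before it is handed to the cloud provider as instance metadata. The artifact URL, install path and service name are hypothetical placeholders, and the step that actually passes the result to a provider is deliberately left out.

```python
import base64
import shlex

# Hypothetical template: the artifact URL, install path and service name
# are placeholders for illustration, not values from a real setup.
BOOT_SCRIPT_TEMPLATE = """#!/bin/bash
set -euo pipefail
curl -fsSL {artifact_url} -o {install_path}
systemctl restart {service}
"""


def render_boot_script(artifact_url: str, install_path: str, service: str) -> str:
    """Fill the template, shell-quoting every value that ends up in the script."""
    return BOOT_SCRIPT_TEMPLATE.format(
        artifact_url=shlex.quote(artifact_url),
        install_path=shlex.quote(install_path),
        service=shlex.quote(service),
    )


def encode_for_metadata(script: str) -> str:
    """Base64-encode the script, as some provider APIs expect for boot metadata."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


script = render_boot_script(
    artifact_url="https://artifacts.example.com/app/1.4.2/app.tar.gz",
    install_path="/opt/app/app.tar.gz",
    service="app",
)
print(script)
```

Keeping the script this small is the point: everything that can be baked into the base image should be, and the boot script only carries what differs between deployments.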

Ownership

One of the main pillars of embracing DevOps and a mentality of “you build it - you run it” is ownership beyond the code of your applications. The buy-in of creating and maintaining self-healing state definitions required by config management systems is high and the learning curve steep, even for experienced developers. Defining the initial state of an environment is far less of a hurdle as there’s a close resemblance to the initial setup of a development environment.

Additionally, the safety harness provided by automated testing will build confidence and further reduce inhibitions about taking ownership.

Close collaboration on base images will be necessary, however, to solve problems like monitoring and debugging. Bake in a pre-configured Datadog agent. Add a log forwarder like Fluentd, connected to your flavor of log aggregation backend (ELK, Loggly, Splunk). Keep in mind that your infrastructure is immutable and challenge the team’s choices of additional software. Do they need vim, and why? Should you add strace and htop? Are you sure you need sudo? Whose keys should be on the box, and which users do you absolutely need? Be smart about your choices and remember to start lean.
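One way to keep that discussion honest is to check a base image against an agreed allowlist as part of the image pipeline. The sketch below assumes you can obtain the list of installed packages from your image build; the package names and the allowlist itself are illustrative only.

```python
from typing import Iterable, List, Set

# Hypothetical allowlist agreed on by the team; the names are illustrative.
ALLOWED_PACKAGES: Set[str] = {"datadog-agent", "fluentd", "openssh-server"}


def audit_packages(installed: Iterable[str], allowed: Set[str]) -> List[str]:
    """Return every installed package that nobody has justified yet."""
    return sorted(set(installed) - allowed)


# Example input; in practice this would come from the image build itself.
unexpected = audit_packages(
    ["datadog-agent", "vim", "fluentd", "strace"],
    ALLOWED_PACKAGES,
)
print(unexpected)
```

A non-empty result does not have to fail the build. It can simply force the question of why the package is there.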

Deployments and environments

One key aspect of running immutable infrastructure is deployments. Vendor-agnostic tools like Terraform and their vendor-specific counterparts are a good means of making infrastructure components and their connections transparent. A good deployment flow can be used for all environments without customising its inner workings, which in turn makes identical environments for production, staging and testing possible.

The practice of blue/green deployments provides another layer of safety during deployments, as the “old” environment is not disposed of immediately during deploy but kept as a hot spare, and the redirection of production traffic can be delayed at will. I’ll address this topic at a later point in time in its own post.
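Until then, the core of the idea fits in a few lines. The following is a toy model, not a deployment tool: the `Environment` and `Router` classes and the health check are hypothetical stand-ins for real infrastructure and a real load balancer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Environment:
    name: str
    version: str


@dataclass
class Router:
    """Toy stand-in for whatever directs production traffic."""
    live: Environment
    standby: Optional[Environment] = None

    def switch_to(
        self, candidate: Environment, healthy: Callable[[Environment], bool]
    ) -> bool:
        """Redirect traffic only if the candidate passes its health check."""
        if not healthy(candidate):
            return False  # live environment stays untouched
        self.standby = self.live  # keep the old environment as hot spare
        self.live = candidate
        return True

    def rollback(self) -> bool:
        """Swap back to the hot spare, if one exists."""
        if self.standby is None:
            return False
        self.live, self.standby = self.standby, self.live
        return True


router = Router(live=Environment("blue", "1.4.1"))
green = Environment("green", "1.4.2")
switched = router.switch_to(green, healthy=lambda env: True)
print(switched, router.live.name, router.standby.name)
```

Because the switch is a separate step from spinning up the new environment, it can happen minutes or hours after the deploy, and a rollback is just another switch.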

Wrapping it up

Immutable infrastructure makes servers disposable and allows for quick scaling. It helped me at Stylight in the past and is now helping me at Talentry, not only to bring developers closer to their production environments but also to reduce the headaches and pains of deploying to unstable and volatile production servers.

By no means should this be seen as a silver bullet. As an example, persistence of data (as required in database operations) is an incredibly tough challenge on its own and won’t always play well with the concepts I laid out. Total cost of operations is also something to consider, as you’ll inevitably encounter orphaned environments every once in a while and external services might bill you under unfavourable conditions (e.g. AWS bills instances by the hour).
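To get a feeling for what an orphaned environment can cost, a rough estimate is enough. The sketch below assumes billing rounds every started hour up, as described above; the instance count and the hourly rate are made-up numbers, not real prices.

```python
import math


def billed_hours(runtime_seconds: float) -> int:
    """Every started hour is billed in full, with a minimum of one hour."""
    return max(1, math.ceil(runtime_seconds / 3600))


def orphan_cost(runtime_seconds: float, instances: int, hourly_rate: float) -> float:
    """Cost of an environment that kept running unnoticed."""
    return billed_hours(runtime_seconds) * instances * hourly_rate


# Hypothetical example: four instances forgotten over a three-day weekend.
three_days = 3 * 24 * 3600
print(orphan_cost(three_days, instances=4, hourly_rate=0.25))
```

The same rounding also penalises short-lived test environments: an instance that lives for ten minutes is billed like one that lives for sixty.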

Nonetheless, embracing immutable infrastructure will establish a healthier operations environment and get you a lot closer to the engineering culture you’re looking for.
