
IaC Tooling Opinions
tl;dr
Anything but CDK for me please ¯\(ツ)/¯
So… What is this?
These days there are plenty of choices for IaC tooling, even just for AWS, but how much do you care when picking your tooling?
Here are some of the opinions I’ve formed over my time using Terraform, CloudFormation and CDK, which I hope also extend in concept to things like Bicep.
I have mostly been using Terraform since 2020, so my opinions may be outdated and I’m willing to admit that.
I’m choosing to ignore re-usability because you can ship a library of constructs in CDK, CloudFormation now has module support and Terraform has modules too, so it’s not something that makes me pick one tool over another.
CDK
Of the three tools I have decent experience with, CDK is personally my least favourite.
It was neat to be able to write TypeScript/C# and get working infrastructure, but it came with the consequence of not truly understanding what was happening underneath.
Create a security group? Great, it defaults to 0.0.0.0/0, and you don’t know to look for that because it’s an optional field.
I don’t think I’d hate it if you actively inspected the output to make sure it was sane, but I also just personally dislike that level of abstraction.
I also have a large issue with the abstraction here because I have had graduate engineers say they want to learn AWS using Infrastructure as Code. Amazing. Love it. L3 constructs skip the actual learning though.
If you want to learn CDK, go for it, but I feel like that’s all it is: learning the CDK framework, not learning anything else.
If you’re an early stage startup, 100% invested in AWS and just wanting to move fast? Not a bad option. But you might outgrow it at some point.
CloudFormation
CFN is actually kinda nice.
I don’t have to deal with state storage because it just exists.
Stack exports are a really nice way of sharing resources between stacks while ensuring safety.
If I am just wanting something quickly on AWS I still find this to be a great way to get going.
If someone is asking me how to learn AWS, this is still the way I would recommend.
If I am doing anything to do with Lambda/API Gateway I will use something CloudFormation based 100% of the time because SAM is just too damn useful for simplifying event mappings.
Another neat thing about CloudFormation is failure mode handling. I like that “if things fail it will attempt to roll back”, rather than “if I deploy it will end up in a half-complete state that I need to rush out another commit to deal with”.
For me the major annoyances are:
- AWS (what I mean here is covered in the Terraform section).
- One stack is one file. This can make breaking out files for readability messy, especially if that then translates into nested stacks and sharing resources between them.
- The need to manually check for state drift.
- The inability to look up other resources at deploy time rather than relying on input parameters.
Terraform
You still find people augmenting their AWS deployments by using o11y platforms like DataDog, New Relic, SumoLogic or additional compute such as Kubernetes, or you may need to do additional steps when setting up a database for things like “setting up additional user accounts so that your applications aren’t logging in with the DB admin credentials”.
CloudFormation is great if you want to limit yourself to AWS. Sure, you can use a custom resource, but then you need to deploy the custom resource before you can use it.
This is where I find Terraform really nice.
It isn’t designed for one vendor. It is designed for extensibility.
If I have a use case, 99% of the vendors I care about offer providers, and you can even find providers for things like Domino’s Pizza.
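As a rough sketch of the "additional user accounts" case mentioned above, the configuration below uses the community `cyrilgdn/postgresql` provider alongside the AWS provider to create a non-admin login role. The instance identifier `example-app-db`, the role name `app_service` and the `db_admin_password` variable are all hypothetical, and it assumes the database is an existing RDS PostgreSQL instance reachable from wherever Terraform runs.

```hcl
terraform {
  required_providers {
    aws        = { source = "hashicorp/aws" }
    postgresql = { source = "cyrilgdn/postgresql" }
    random     = { source = "hashicorp/random" }
  }
}

# Hypothetical input: the admin password is supplied at deploy time, not stored in code.
variable "db_admin_password" {
  type      = string
  sensitive = true
}

# Look up an existing RDS instance (the identifier is an assumption for illustration).
data "aws_db_instance" "app_db" {
  db_instance_identifier = "example-app-db"
}

# Connect to the database itself, using the admin account only for setup.
provider "postgresql" {
  host      = data.aws_db_instance.app_db.address
  port      = data.aws_db_instance.app_db.port
  username  = data.aws_db_instance.app_db.master_username
  password  = var.db_admin_password
  sslmode   = "require"
  superuser = false # the RDS master user is not a true superuser
}

# Generate a password for the application account.
resource "random_password" "app_user" {
  length  = 32
  special = false
}

# The account the application logs in with, instead of the DB admin.
resource "postgresql_role" "app_user" {
  name     = "app_service"
  login    = true
  password = random_password.app_user.result
}
```

The point is less the specific provider and more that the AWS lookup and the in-database change sit in one configuration, with no custom resource to deploy first.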
What if I wanted to create a security group that allows traffic from all web subnets? This is readable for the intent of what I’m trying to do, rather than relying on implicit knowledge of which subnet IDs match what or which CIDR ranges they happen to be associated with.
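# Find the IDs of every subnet tagged Type = web.
data "aws_subnets" "web_subnets" {
  filter {
    name   = "tag:Type"
    values = ["web"]
  }
}

# Read each subnet's details (the aws_subnet data source matches exactly one subnet).
data "aws_subnet" "web_subnet" {
  for_each = toset(data.aws_subnets.web_subnets.ids)
  id       = each.value
}

resource "aws_security_group" "subnet_security_group" {
  # Assumes all web subnets live in the same VPC.
  vpc_id = values(data.aws_subnet.web_subnet)[0].vpc_id

  # Allow HTTPS in from every web subnet's CIDR range.
  ingress {
    cidr_blocks = [for subnet in data.aws_subnet.web_subnet : subnet.cidr_block]
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
  }
}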
Unlike CloudFormation imports though, I don’t get the same level of deletion protection. Terraform will fail if the underlying AWS API call fails, but I can’t easily state that a second Terraform deploy relies on an item from the first. With CloudFormation, consumers of an export are tracked as a platform concern, so a deletion that would break one never even starts.
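To make that gap concrete, here is a minimal sketch of one common way a second Terraform configuration consumes a value from the first: the built-in `terraform_remote_state` data source. The bucket, key and region are hypothetical, and it assumes the first configuration stores its state in S3 and declares an output named `vpc_id`.

```hcl
# Consumer configuration: read outputs from another configuration's state.
# Bucket, key and region below are placeholders for illustration.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "ap-southeast-2"
  }
}

# Use the VPC created by the (assumed) network configuration.
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

The dependency here only runs one way. The consumer knows about the network state, but nothing in the network configuration records that this consumer exists, so a destroy there is only stopped if the AWS API itself rejects the call.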
It’s annoying that I need to deploy a resource to store centralised state. It’s annoying that I need to deploy a resource to manage state locks myself 1.
But everything else?
It’s just nice the second I go past the limit of “100% of what I run is on AWS managed services”.
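For reference, the state storage and locking setup complained about above typically looks something like the following when using the S3 backend. This is a sketch under assumptions: the bucket name, table name, state key and region are placeholders. The bootstrap resources are normally applied once from their own configuration:

```hcl
# Bootstrap configuration: the resources that hold state and locks.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "example-terraform-state"
}

# Keep old state versions around in case a state file needs recovering.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend's DynamoDB locking expects a string hash key named LockID.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Every other configuration then points its backend at them:

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "network/terraform.tfstate"
    region         = "ap-southeast-2"
    dynamodb_table = "example-terraform-locks"
    encrypt        = true
  }
}
```

Recent Terraform releases also offer S3-native locking through a `use_lockfile` backend setting, which may remove the need for the table; check the backend documentation for the version in use.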
Footnotes
1. I know Terraform Enterprise would cover this, but I have no experience using it and I’m just speaking to my own pain.