One of my favorite things about Ansible and network automation as a whole is that I can do things in a repeatable manner, super quickly.
One of the worst things about network automation is I can uniformly break things super quickly.
Recently I was working on spinning up a core for a customer who had some funky VPN stuff, which meant I needed to bounce traffic through a jumphost (no biggie).
But I’d set ControlPath in my ansible.cfg file with %%h instead of %h. To SSH, %% is an escaped literal percent sign, so %%h never got expanded to the target hostname and ended up in the path as the literal text %h.
Protip: It’s just %h not %%h.
This meant that when Ansible created the socket, the path wasn’t filled in with the target hostname, so I had a single socket shared by every host. It belonged to whichever VM we happened to connect to first (which wasn’t consistent between runs).
Then every other command in every playbook ran on the single VM that socket pointed at. Ansible reported that it was running the roles and tasks across all the hosts, but it wasn’t: everything was happening on one host.
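To make the failure mode a bit more concrete, here’s a rough Python sketch of how the tokens expand. It’s a toy model, not OpenSSH’s actual code: it only handles %h, %r and %%, and the function name, the hostnames, the user and the /tmp/cm-... path are all made up for illustration.

```python
def expand_control_path(template: str, host: str, user: str) -> str:
    """Toy model of ssh token expansion: %h -> host, %r -> remote user, %% -> literal %."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "%":
                out.append("%")       # escaped percent sign, nothing gets expanded
            elif nxt == "h":
                out.append(host)
            elif nxt == "r":
                out.append(user)
            else:
                out.append(ch + nxt)  # tokens this sketch doesn't model are left alone
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


hosts = ["box-x", "box-y", "box-z"]  # hypothetical inventory

# Correct template: one socket path per host.
good = {expand_control_path("/tmp/cm-%h-%r", h, "deploy") for h in hosts}

# Broken template: %%h collapses to the literal text "%h" for every host.
bad = {expand_control_path("/tmp/cm-%%h-%r", h, "deploy") for h in hosts}

print(len(good), sorted(good))
print(len(bad), sorted(bad))
```

With %h you get three distinct socket paths. With %%h every host maps to the same path, so whichever connection opens the socket first ends up serving all of them.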
This was very confusing to debug.
If I SSHed into box X, its hostname would show box Y, and it’d have the roles deployed from box Z.
I’ve no idea if anyone else will make the same stupid mistake as I did today, but I probably will, so I wrote this down.