All the automated configuration management (CM) tools (e.g., Chef, Ansible, and Salt) claim that they are ‘idempotent’. They claim this as one of their key features. I will argue here that, on the contrary, they have not yet achieved it. Furthermore, they can only achieve it by changing the way they function. But when they do, it will fundamentally change how systems administration is done. So, what is idempotence and why do I claim it isn’t truly being implemented?
Different vendors define idempotence in slightly different ways. Opscode defines it to mean that a script “can run multiple times on the same system and the results will always be identical.” Ansible states that its scripts “will seek to avoid changes to the system unless a change needs to be made.” This means that their scripts can be run over and over again and will only change something when the script and the actual server configuration differ. As I will show below, this isn’t actually how they currently function.
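To make the vendors' definition concrete, here is a minimal sketch in Python. It is not how Chef or Ansible are implemented; the function names and the config lines are made up for illustration. It only shows the property being claimed: running the operation a second time changes nothing.

```python
def append_line(lines, wanted):
    """Non-idempotent: every run adds another copy of the line."""
    return list(lines) + [wanted]


def ensure_line(lines, wanted):
    """Idempotent in the vendors' sense: add the line only if it is missing."""
    if wanted in lines:
        return list(lines)
    return list(lines) + [wanted]


# Hypothetical config content, used only for this illustration.
config = ["Port 22"]
setting = "PermitRootLogin no"

once = ensure_line(config, setting)
twice = ensure_line(once, setting)
print(once == twice)  # True: the second run made no change

appended_twice = append_line(append_line(config, setting), setting)
print(appended_twice.count(setting))  # 2: the second run changed the result
```

Note that `ensure_line` says nothing about the other lines in the file. It is idempotent, yet the final state still depends on whatever was there before.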
First, however, I think their definition is inadequate. I think a better one is: the system administrator defines the desired state of the server and the automated CM tool takes whatever steps are necessary to achieve that state. The administrator defines this state with a comprehensive specification and the tool figures out how to achieve it. This is idempotency as state. It isn’t just about automating deployments and patching; those are easy problems. Idempotency, if implemented as a true specification, would also enable security, auditing, monitoring, and many other tasks. If I specify the state of a system, for example, any deviation from that state is a potential security issue.
Idempotency as state blurs traditional product lines. It will transform automated CM tools into security and monitoring tools. The administrator defines the specification and the tool changes the system to meet it and also checks that it remains within spec. Performance measures could even be part of the specification and monitored, e.g., DNS queries shall resolve within 500 ms on average. Likewise, network accessibility rules should be included, e.g., port 25 is only accessible from subnet x. What I am trying to describe is something like Ansible combined with serverspec, tripwire, and nagios but with extremely sophisticated application logic behind it.
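As a rough sketch of what "checking that the system remains within spec" could look like, the Python below compares a specification with an observed state and reports every deviation. The dictionary keys, the function name, and the observed values are all hypothetical; a real tool would have to gather the observations itself, which is the hard part and is not shown here.

```python
def find_deviations(spec, observed):
    """Return a list of human-readable deviations from the specification."""
    deviations = []
    spec_ports = set(spec["open_ports"])
    seen_ports = set(observed["open_ports"])
    for port in sorted(seen_ports - spec_ports):
        deviations.append(f"port {port} is open but not in the specification")
    for port in sorted(spec_ports - seen_ports):
        deviations.append(f"port {port} should be open but is closed")
    if observed["dns_avg_ms"] > spec["dns_max_avg_ms"]:
        deviations.append(
            f"DNS queries average {observed['dns_avg_ms']} ms, "
            f"above the {spec['dns_max_avg_ms']} ms limit"
        )
    return deviations


# The specification: only SSH and SMTP open, DNS within 500 ms on average.
spec = {"open_ports": [22, 25], "dns_max_avg_ms": 500}
# Hypothetical measurements from a running server.
observed = {"open_ports": [22, 25, 8080], "dns_avg_ms": 620}

for line in find_deviations(spec, observed):
    print(line)
```

The same list of deviations could feed a security alert, an audit log, or a monitoring dashboard, which is why this approach blurs the product lines.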
Specific issues with current implementations
Managing running services
I started to bump up against the limits of idempotency in current implementations first with my automated Xen VM deployment script and then more recently with BIND. I realized my VM deployment script wasn’t actually idempotent. If it were, then changing, say, the VM RAM allocation in my script would change the running VM’s RAM allocation. It doesn’t. My script only automates the install process. Useful, but not idempotent. This is certainly due to the nature of my script. Perhaps if I wrote a full Ansible module it could be truly idempotent. Perhaps.
But I’ve seen this pattern in other areas as well. All the tools have a problem with running services. They deploy them well, but manage them poorly. Services like BIND accumulate state while running, such as dynamic DNS entries from the DHCP server, and that state can differ from the static zone files. Changes to a running server cannot always be made using the service x reload command the CM tools provide. Administrators often have to use special admin tools such as nsupdate for BIND, omshell for dhcpd, or fs_cli in Freeswitch. These tools “patch” a running server in a way that cannot be done using file copying and service reload.
A truly idempotent solution would require a module that a) allowed the administrator to specify the state of these services and b) understood the complex logic needed to change the running server from its current state to the new state. The specification would require a custom-written format and probably a unique DSL just for that service. The way you would specify a DNS server configuration is not the same way you would configure a DHCP server or an SMTP server. The tool needs the intelligence to understand how to properly and non-disruptively make the changes, meaning all the small steps needed to change a running service at a very fine level of granularity. An idempotent DNS spec needs to be at the record level, not at the file level.
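To illustrate what "record level" might mean, here is a small Python sketch that compares the records a server currently holds with the records in a specification and emits only the individual changes needed. The `Record` type, the function name, and the example zone data are assumptions made for this illustration. The output lines follow the `update add` / `update delete` form that nsupdate accepts, but treat that as a sketch and not as a tested BIND integration.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rtype: str
    rdata: str


def record_level_changes(current, desired):
    """Return the per-record commands that move `current` to `desired`."""
    current, desired = set(current), set(desired)
    commands = []
    # Remove records that exist on the server but are not in the spec.
    for r in sorted(current - desired):
        commands.append(f"update delete {r.name} {r.rtype} {r.rdata}")
    # Add records that are in the spec but not yet on the server.
    for r in sorted(desired - current):
        commands.append(f"update add {r.name} {r.ttl} {r.rtype} {r.rdata}")
    return commands


# Hypothetical zone contents using documentation-only names and addresses.
current = [
    Record("www.example.com.", 300, "A", "192.0.2.10"),
    Record("old.example.com.", 300, "A", "192.0.2.99"),
]
desired = [
    Record("www.example.com.", 300, "A", "192.0.2.10"),
    Record("mail.example.com.", 300, "A", "192.0.2.25"),
]

for command in record_level_changes(current, desired):
    print(command)
```

Once the server matches the specification, the function returns an empty list, so nothing is reloaded or rewritten. A real module would also need to decide how to treat records the service created itself, such as the dynamic DHCP entries mentioned above.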
Instead of specifying the desired state, the CM tools describe change actions. They allow the administrator to add this or remove that. In Chef, for example, you manage Ruby gem packages with a gem_package resource, such as one that names the gem “syntax” and gives it action :install.
This adds the package “syntax” to whatever other gem packages you happen to have. And that’s the problem. Chef (and the other CM tools) let you add or remove objects like these in the context of some undefined and unknown existing state. This isn’t much better than an old-school shell script. And if you put action :remove then theoretically that remains in your script forever. Of course, once you run that remove action you will take it out of your script. But if you take an action out once it is done, then you have defeated the purpose of idempotency. Now what you are running is one-time automated patches.
This same problem occurs with user accounts, yum/apt packages, files and directories, and so on. Instead of the patching approach, I would like to see a true state specification. In the case of gem packages, the administrator would list the gem packages he wants on the system and the CM tool would take the steps to make sure those and only those gem packages are installed. If a package is installed but not in the specification, then it is removed. All changes are logged and saved for the auditing record.
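The difference between the two approaches can be sketched in a few lines of Python. This is not Chef code; the function names are invented, and apart from “syntax” the gem names are placeholders. The additive action leaves two hosts in different states, while the "those and only those" plan brings both to the same state.

```python
def apply_additive(installed, to_add):
    """What current tools do: add packages on top of whatever is there."""
    return set(installed) | set(to_add)


def plan_exact(installed, desired):
    """State specification: compute what to install and what to remove."""
    installed, desired = set(installed), set(desired)
    return {
        "install": sorted(desired - installed),
        "remove": sorted(installed - desired),
    }


def apply_plan(installed, plan):
    """Carry out a plan and return the resulting package set."""
    return (set(installed) | set(plan["install"])) - set(plan["remove"])


desired = {"syntax", "rake"}
host_a = {"rake"}
host_b = {"rake", "leftover"}  # "leftover" stands in for an unmanaged gem

# Additive: same action, different end states.
print(apply_additive(host_a, ["syntax"]) == apply_additive(host_b, ["syntax"]))  # False

# Exact: every host ends up matching the specification.
plan_b = plan_exact(host_b, desired)
print(plan_b)  # {'install': ['syntax'], 'remove': ['leftover']}
print(apply_plan(host_b, plan_b) == desired)  # True
```

The plan itself is also the audit record: it lists exactly what was installed and what was removed on each run.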
The administrator would do the same for user accounts, software packages, and all aspects of the system configuration. This even applies to open socket ports. If the administrator specifies that port 22 (SSH) and port 25 (SMTP) are open, then the automated CM tool makes sure that only those ports are open. It would not do what the current tools do, which is open ports 22 and 25 in addition to whatever other ports just happen to be open.
The automated CM tools are still very early in their life-cycle, yet they are already very useful and even revolutionary. My goal in writing this is to point out an area where they can expand and mature a core concept. Looking a few years ahead, I see these tools increasing in sophistication and capability. I am very excited by their potential and offer this critique to help push them forward.
It’s a shame Prolog has fallen out of fashion. It was designed for declarative, state-based cases like this and might implement a smart, automated CM tool better than Ruby or Python. But no matter what the language, the key to making these CM tools better is for them to take a state-based specification approach.