Automated Configuration Management: Challenges With Idempotency


All the automated configuration management (CM) tools (e.g., Chef, Ansible, and Salt) claim to be ‘idempotent’ and list this as one of their key features. I will argue here that, on the contrary, they have not yet achieved it. Furthermore, they can only achieve it by changing the way they function, and when they do, it will fundamentally change how systems administration is done. So, what is idempotence and why do I claim it isn’t truly being implemented?

Idempotency Defined

Different vendors define idempotence in slightly different ways. Opscode defines it to mean that a script “can run multiple times on the same system and the results will always be identical.” Ansible states that its scripts “will seek to avoid changes to the system unless a change needs to be made.” This means that their scripts can be run over and over again and will only change something when the script and the actual server configuration differ. As I will show below, this isn’t actually how they currently function.
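
To make the vendors' definition concrete, here is a minimal sketch of an idempotent operation in plain Ruby. The helper name ensure_line and the sample configuration lines are hypothetical and are not taken from any CM tool; the point is only that running the operation a second time changes nothing.

```ruby
# Hypothetical helper (not part of Chef, Ansible, or Salt): make sure a
# line is present in a list of config lines. It appends only when the line
# is missing, so repeated runs leave the result unchanged.
def ensure_line(lines, wanted)
  return lines if lines.include?(wanted)

  lines + [wanted]
end

config = ["PermitRootLogin no"]
once   = ensure_line(config, "PasswordAuthentication no")
twice  = ensure_line(once, "PasswordAuthentication no")

puts once == twice # prints true
```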

First, however, I think their definition is inadequate. I think a better one is: the system administrator defines the desired state of the server and the automated CM tool takes whatever steps are necessary to achieve that state. The administrator defines this state with a comprehensive specification and the tool figures out how to achieve it. This is idempotency as state. It isn’t just about automating deployments and patching; those are easy problems. Idempotency, if implemented as a true specification, would also enable security, auditing, monitoring, and many other tasks. If I specify the state of a system, for example, any deviation from that state is a potential security issue.

Idempotency as state blurs traditional product lines. It will transform automated CM tools into security and monitoring tools. The administrator defines the specification and the tool changes the system to meet it and also checks that it remains within spec. Performance measures could even be part of the specification and monitored, e.g., DNS queries shall resolve within 500 ms on average. Likewise, network accessibility rules should be included, e.g., port 25 is only accessible from subnet x. What I am trying to describe is something like Ansible combined with Serverspec, Tripwire, and Nagios, but with extremely sophisticated application logic behind it.
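
As a rough illustration of checking that a system remains within spec, the sketch below compares observed measurements with a specification and lists the deviations. Everything here is an assumption made for the example: the SPEC structure, the key names, the subnet 10.0.1.0/24, and the shape of the observations. How the observations would actually be gathered is left out.

```ruby
# Hypothetical specification; the structure and names are illustrative only.
SPEC = {
  max_avg_dns_ms: 500,                         # DNS should resolve within 500 ms on average
  allowed_sources: { 25 => ["10.0.1.0/24"] }   # port => subnets allowed to reach it
}.freeze

# Compare observations (however they were collected) with the spec and
# return a list of human-readable deviations. An empty list means "in spec".
def violations(spec, observed)
  found = []

  samples = observed.fetch(:dns_ms)
  avg = samples.sum.to_f / samples.size
  if avg > spec[:max_avg_dns_ms]
    found << "DNS average #{avg.round} ms exceeds #{spec[:max_avg_dns_ms]} ms"
  end

  observed.fetch(:reachable_from).each do |port, subnets|
    extra = subnets - spec[:allowed_sources].fetch(port, [])
    extra.each { |subnet| found << "port #{port} reachable from #{subnet}" }
  end

  found
end

observed = {
  dns_ms: [600, 800],
  reachable_from: { 25 => ["10.0.1.0/24", "0.0.0.0/0"] }
}
puts violations(SPEC, observed)
```

Each deviation in the returned list could be treated as a monitoring alert or as a potential security issue, which is the blurring of product lines described above.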

Specific issues with current implementations

Managing running services

I started to bump up against the limits of idempotency in current implementations first with my automated Xen VM deployment script and then more recently with BIND. I realized my VM deployment script wasn’t actually idempotent. If it were, then changing, say, the VM RAM allocation in my script would change the running VM’s RAM allocation. It doesn’t. My script only automates the install process. Useful but not idempotent. This is certainly due to the nature of my script. Perhaps if I wrote a full Ansible module it could be truly idempotent. Perhaps.

But I’ve seen this pattern in other areas as well. All the tools have a problem with running services. They deploy them well, but manage them poorly. While running, services like BIND accumulate state, such as dynamic DNS entries from the DHCP server, that can differ from the static zone files. Changes to a running server cannot always be made using the service x reload command the CM tools provide. Administrators often have to use special admin tools such as nsupdate for BIND, omshell for dhcpd, or fs_cli for FreeSWITCH. These tools “patch” a running server in a way that cannot be done using file copying and service reload.

A truly idempotent solution would require a module that a) allowed the administrator to specify the state of these services and b) understood the complex logic needed to change the running server from its current state to the new state. The specification would require a custom-written format and probably a unique DSL just for that service. The way you would specify a DNS server configuration is not the same way you would configure a DHCP server or an SMTP server. The tool needs the intelligence to understand how to make the changes properly and non-disruptively, including all the small steps needed to change a running service at a very fine level of granularity. An idempotent DNS spec needs to be at the record level, not at the file level.
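
A record-level comparison might look something like the sketch below. It assumes each record is a simple [name, ttl, type, data] array and that the desired list is complete, so it does not try to preserve dynamic entries registered by a DHCP server. The function name record_changes and the example records are hypothetical. The output lines follow the "update add" and "update delete" command forms that nsupdate accepts, but actually sending them to a server is left out.

```ruby
# A record is [name, ttl, type, data]. Both arguments are arrays of records.
# Returns nsupdate-style command strings: deletions first, then additions.
# Records present in both lists produce no command at all.
def record_changes(running, desired)
  deletes = (running - desired).map do |name, _ttl, type, data|
    "update delete #{name} #{type} #{data}"
  end
  adds = (desired - running).map do |name, ttl, type, data|
    "update add #{name} #{ttl} #{type} #{data}"
  end
  deletes.sort + adds.sort
end

running = [
  ["www.example.com.", 300, "A", "192.0.2.10"],
  ["old.example.com.", 300, "A", "192.0.2.20"]
]
desired = [
  ["www.example.com.", 300, "A", "192.0.2.10"],
  ["mail.example.com.", 300, "A", "192.0.2.30"]
]
puts record_changes(running, desired)
```

Running it again after the changes have been applied yields an empty list, which is the behavior a record-level spec would need.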

Full State

Instead of specifying the desired state, the CM tools describe change actions. They allow the administrator to add this or remove that. In Chef, for example, you manage Ruby gem packages like this:

gem_package "syntax" do
  action :install
  ignore_failure true
end

This adds the package “syntax” to whatever other gem packages you happen to have. And that’s the problem. Chef (and the other CM tools) let you add or remove objects like these in the context of some undefined and unknown existing state. This isn’t much better than an old-school shell script. And if you put action :remove then theoretically that remains in your script forever. Of course, once you run that remove action you will take it out of your script. But if you take an action out once it is done, then you have defeated the purpose of idempotency. Now what you are running is one-time automated patches.

This same problem occurs with user accounts, yum/apt packages, files and directories, and so on. Instead of the patching approach, I would like to see a true state specification. In the case of gem packages, the administrator would list the gem packages he wants on the system and the CM tool takes the steps to make sure those and only those gem packages are installed. If a package is installed but not in the specification then it is removed. All changes are logged and saved for the auditing record.
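
The "those and only those" idea can be sketched in a few lines of plain Ruby. This is not how Chef works today; the plan function and the example gem lists are hypothetical, and actually installing, removing, and logging the changes is left out.

```ruby
# Hypothetical full-state reconciliation: `desired` is the complete list of
# gems that should exist on the system. Anything installed but not listed
# is scheduled for removal; anything listed but missing is installed.
def plan(desired, installed)
  {
    install: (desired - installed).sort,
    remove: (installed - desired).sort
  }
end

desired   = %w[rake syntax]
installed = %w[rake rdoc]
p plan(desired, installed)
```

Here "syntax" would be installed and "rdoc" removed, while "rake" is left alone. Once the system matches the list, the plan is empty on every later run, and the specification never needs a one-time remove action.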

The administrator would do the same for user accounts, software packages, and all aspects of the system configuration. This even applies to open socket ports. If the administrator specifies that port 22 (SSH) and port 25 (SMTP) are open then the automated CM tool makes sure that only those ports are open. It would not do what the current tools do, which is open ports 22 and 25 in addition to whatever other ports just happen to be open.

Conclusion

The automated CM tools are still very early in their life-cycle, yet they are already very useful and even revolutionary. My goal in writing this is to point out an area where they can expand and mature a core concept. Looking a few years ahead I see these tools increasing in sophistication and capability. I am very excited by their potential and offer this critique to help push them forward.

It’s a shame Prolog has fallen out of fashion. It was designed for declarative, state-based cases like this and might implement a smart, automated CM tool better than Ruby or Python. But no matter what the language, the key to making these CM tools better is for them to take a state-based specification approach.



Categories: DevOps

3 replies

  1. Your post makes some very interesting points.

    However, idempotency as a concept has existed in computer science for forty years. It means “can be run more than once and nothing bad happens.” Full stop. It’s not fair to redefine it to mean “how I think a configuration tool should work.”

    It would be much less confusing to simply point out that current tools are not really idempotent, despite claims to the contrary, and then go on to explain how they could become so.

    There is a good deal of literature on declarative programming and constraint propagation, which I really wish more working programmers would read into. It’s so embarrassing to make all the same mistakes over again …

    • Nick,

      That’s a fair point and I agree that it would have been better to point out that they are not really idempotent.

      My view has changed since I wrote this. I would prefer that the automated CM tools avoid idempotency entirely. Instead I would like configuration to be managed by the OS’s native package manager. For example, it might work something like this:
      – I have a set of application or system configuration files (under version control)
      – This set of configuration files is “compiled” into an RPM package with a version number
      – It is published to an artifact repository such as Artifactory
      – The versioned package is pushed to the servers and installed like any other package
      – If there is a problem this version is rolled back to a previous version of the package
      – If a change to a config file is made then a new RPM is generated and deployed

      These packages are the configuration item (CI) in the configuration management process. The configuration files are the “source code” and the packages are the “binaries”.

      This isn’t fully thought through but that’s the general idea.

      Cheers,
      Aaron
