An automated systems management tool is essential to the DevOps approach. I decided to start by choosing one of the popular tools. I read the documentation for Chef, Puppet, Salt Stack, and Ansible and decided to use Ansible. I chose Ansible because it does not require a client agent on the managed machine, it is the simplest of the tools I reviewed, and it uses SSH as its communication protocol.
Because of the way Ansible is designed, I can also use it to manage my switches and other appliances. If the DevOps goal is full automation, and I think it is, then automated system management tools must work with every device on the network, from VoIP phones to routers to servers to printers. Since SSH is standard on most of these devices, Ansible can manage them without custom software. Ansible provides the raw module for this purpose.
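As a rough illustration of what this looks like in practice, the sketch below builds an ad-hoc `ansible` command line that uses the raw module, which sends a command over SSH without needing Python on the target device. The host pattern `switches`, the inventory file name `hosts`, and the command `show version` are hypothetical placeholders, and the helper name `raw_command` is invented for this example.

```python
import shlex
from typing import List


def raw_command(host_pattern: str, command: str, inventory: str = "hosts") -> List[str]:
    """Build the argv for an ad-hoc Ansible call that uses the raw module.

    All values passed in are illustrative; nothing here contacts a device.
    """
    return ["ansible", host_pattern, "-i", inventory, "-m", "raw", "-a", command]


if __name__ == "__main__":
    # Hypothetical group name and device command, shown only as a shell line.
    argv = raw_command("switches", "show version")
    print(shlex.join(argv))
```

To actually run the command, the list could be handed to `subprocess.run(argv, check=True)` on a machine where Ansible is installed and the inventory defines the group.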
I was able to quickly execute tasks in Ansible without having to install special-purpose servers. Puppet appears to require three or four servers, which is overkill for my purposes. I suspect that is overkill even for a thousand servers, but I imagine others would disagree. I am very suspicious of complex frameworks. Reliability in software comes from simplicity and long testing over many years.
Using Ansible these past few weeks, I have been able to build scripts to create VMs, set basic configuration settings such as passwordless SSH, install servers, and so on. I think Ansible is well designed and can easily perform basic systems administration tasks. I haven’t done enough testing to fully recommend one product over another for a corporate setting, but right now I would be inclined to recommend Ansible.
However, I have two big problems with Ansible. The first is that installing a VM from scratch and then configuring it requires a lot of complicated steps, some of which require the VM to be offline for a period of time. This seems to be a case Ansible is not designed to handle, and I am having difficulty getting it to work. The second issue is testing. My goal, as I have written, is to test every change. This means that every step in an Ansible task has to be tested. Currently, there is no way to do this from Ansible (in fairness to Ansible, none of the other tools seem to be able to do this either).
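One way to picture what testing every step would mean is to pair each action with a check that must pass before the next action runs. The sketch below is plain Python, not an Ansible feature. The `Step` class, the `run_steps` function, and the example steps are hypothetical names invented for illustration, and a dictionary stands in for a real machine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # makes the change
    verify: Callable[[], bool]  # returns True if the change took effect


def run_steps(steps: List[Step]) -> List[str]:
    """Apply each step, verify it, and stop at the first failed check."""
    completed = []
    for step in steps:
        step.apply()
        if not step.verify():
            raise RuntimeError(f"verification failed after step: {step.name}")
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    state = {}  # stands in for a real machine's configuration
    steps = [
        Step("set hostname",
             lambda: state.update(hostname="web01"),
             lambda: state.get("hostname") == "web01"),
        Step("enable ssh",
             lambda: state.update(ssh=True),
             lambda: state.get("ssh") is True),
    ]
    print(run_steps(steps))
```

In a real setup, each `apply` would make a change on a host and each `verify` would query the host independently, so a step that silently did nothing would be caught before later steps build on it.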
This makes me question whether a special tool is actually needed. From a developer’s point of view, these tools are constraining because they do not provide full access to the underlying programming language (Python or Ruby). For a developer, a systems administration library would be more useful than the special-purpose languages and templates the tools provide. From a system administrator’s point of view, I imagine these tools may already look very complex. I wonder what the right balance is. As system administrators become programmers, perhaps they too will quickly feel constrained by the current tools.
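To make the idea of a systems administration library concrete, here is a minimal sketch of what one function in such a library might look like: an ordinary Python function that is safe to run repeatedly and reports whether it changed anything. The name `ensure_line` and the file and setting used in the demo are assumptions for illustration only; this does not correspond to any existing library.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already configured, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    # Work in a throwaway directory so no real configuration is touched.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "sshd_config"
        print(ensure_line(cfg, "PasswordAuthentication no"))  # first call writes the line
        print(ensure_line(cfg, "PasswordAuthentication no"))  # second call finds it present
```

Because it is a plain function, it can be combined with loops, conditionals, and unit tests in the host language without going through a tool-specific template layer.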
I dislike reinventing the wheel. Still, fully automating an IT infrastructure is going to require sophisticated programming, a level of sophistication beyond the capabilities of the current tools. I may experiment by writing my own systems management code. I’d like to compare my experience with Ansible to custom code. Then I could answer whether or not the tools make sense.
Update: It turns out that Ansible already had a module to handle my VM issue. This is the third time I have identified a need and then found that Ansible had already anticipated it. That is a good sign. Testing is still an issue but I am even more impressed with Ansible. I think I will focus my custom scripting on the testing problem instead.