As I get my hardware setup in order, I’ve been planning my first automation step: fully automated provisioning and installation. I am basing my plan on a common use case: a company needs to expand its capacity by installing one or more new hosts, where a host is a physical server. A fully automated install should be able to handle one new server or a thousand; it should only be a matter of creating the proper configuration files.
I see the process working as follows:
- Manually install the new server in the rack and plug it into the network.
- The hypervisor is installed on the host automatically via PXE when it first boots (see the PXE sketch after this list).
- The hypervisor is configured by an automated system management tool such as Chef or Ansible.
- A designated number of VMs are installed on the hypervisor.
- The VM’s OS is configured.
- The VM applications are configured per their designated role; for instance, one VM might be designated a web server, another a database, another a Hadoop node.
- Whatever software packages are needed for that role are installed and configured (see the role-to-packages sketch after this list).
- Custom data sets are loaded.
- The servers are joined to whatever cluster they are part of.
- Every piece of software installed via this process is tested and verified.
- Every host, every VM, and every application is tested to ensure that the install was correct and is working as expected.
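To make the PXE step concrete, here is a minimal sketch of how the per-host boot files might be generated. Everything in it is an assumption on my part: the MAC-to-hostname inventory, the TFTP layout, and the kickstart URL (install.example.com) are placeholders, not a finished design. The only real convention it relies on is pxelinux naming its config files after the client’s MAC address.

```python
#!/usr/bin/env python3
"""Generate per-host PXE boot files from a simple inventory.

Hypothetical sketch: the inventory, paths, and kickstart URL are
placeholders, not a finished design.
"""
import os

# Hypothetical inventory: MAC address -> hostname for each new host.
INVENTORY = {
    "52:54:00:aa:bb:01": "host001",
    "52:54:00:aa:bb:02": "host002",
}

TFTP_ROOT = "/var/lib/tftpboot"  # assumed TFTP server layout

TEMPLATE = """DEFAULT hypervisor-install
LABEL hypervisor-install
  KERNEL vmlinuz
  APPEND initrd=initrd.img ks=http://install.example.com/ks/{hostname}.cfg
"""

def pxe_filename(mac: str) -> str:
    # pxelinux looks for a config file named after the MAC address,
    # lowercased, dash-separated, prefixed with the ARP type "01".
    return "01-" + mac.lower().replace(":", "-")

def main() -> None:
    cfg_dir = os.path.join(TFTP_ROOT, "pxelinux.cfg")
    os.makedirs(cfg_dir, exist_ok=True)
    for mac, hostname in INVENTORY.items():
        path = os.path.join(cfg_dir, pxe_filename(mac))
        with open(path, "w") as f:
            f.write(TEMPLATE.format(hostname=hostname))
        print(f"wrote {path} for {hostname}")

if __name__ == "__main__":
    main()
```

The appeal of this approach is that adding a hundred hosts really does reduce to adding a hundred inventory entries.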
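And here is one way the role step could look: a role-to-packages table driving the package manager. The table contents, the use of dnf, and deriving the role from the hostname prefix are all assumptions for illustration; in practice this data would live in the CM tool (Chef roles, Ansible group_vars) rather than a standalone script.

```python
#!/usr/bin/env python3
"""Install the package set for a VM based on its role.

Sketch only: the role-to-package table and the use of dnf are
assumptions; substitute whatever package manager the OS uses.
"""
import subprocess
import sys

# Hypothetical role definitions; in practice these would live in the
# CM tool's data under version control.
ROLE_PACKAGES = {
    "web": ["nginx"],
    "db": ["postgresql-server"],
    "hadoop": ["java-11-openjdk"],
}

def install_for_role(role: str) -> None:
    packages = ROLE_PACKAGES[role]
    # check=True makes a failed install raise instead of passing silently.
    subprocess.run(["dnf", "install", "-y", *packages], check=True)

if __name__ == "__main__":
    # Derive the role from a hostname prefix, e.g. "Web001" -> "web".
    hostname = sys.argv[1] if len(sys.argv) > 1 else "Web001"
    role = "".join(c for c in hostname if c.isalpha()).lower()
    install_for_role(role)
```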
I will write about testing in a separate post. It is an important topic that deserves an expanded discussion. My goal is to apply the test-driven development (TDD) process I use for software development to server deployment. This means everything is tested. Just because Chef didn’t produce an error doesn’t mean it worked.
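As a small taste of what I mean: a post-install check should probe the real service rather than trust the CM tool’s exit status. This sketch assumes a made-up hostname and port; the point is the shape of the test, not the specifics.

```python
#!/usr/bin/env python3
"""Minimal post-install smoke test: don't trust the CM run, verify it.

Sketch only; the hostname and port are hypothetical examples.
"""
import socket

def check_tcp_service(host: str, port: int, timeout: float = 5.0) -> bool:
    # "Chef said OK" is not proof; open a real connection to the port
    # the role is supposed to be serving on.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hypothetical example: verify the web role actually answers on 80.
    assert check_tcp_service("web001.example.com", 80), "web001: port 80 not answering"
    print("web001: port 80 OK")
```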
The other thing I would add is that every component used in this process should be under configuration management (CM). All software repositories are locally managed. All software package versions are baselined, and that baseline is only changed through a change-management process. And, of course, all the configuration files for the applications and the automated system management tools are kept in a version control system such as git.
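To sketch what baselining could look like in practice: a pinned-versions manifest lives in git, and a check script reports any drift between it and what is actually installed. The manifest contents below are placeholders, and the “installed” data is faked so the sketch runs anywhere.

```python
#!/usr/bin/env python3
"""Check installed package versions against a baselined manifest.

Sketch only: the manifest would live in git and change only through
the change-management process; the package names are placeholders.
"""

# Hypothetical baseline, normally loaded from a file under version control.
BASELINE = {
    "nginx": "1.24.0",
    "postgresql": "15.6",
}

def check_baseline(installed: dict) -> list:
    """Return drift messages; an empty list means we match the baseline."""
    drift = []
    for pkg, want in BASELINE.items():
        have = installed.get(pkg)
        if have != want:
            drift.append(f"{pkg}: want {want}, have {have}")
    return drift

if __name__ == "__main__":
    # In real life the installed versions would come from rpm/dpkg;
    # faked here so the sketch runs anywhere.
    problems = check_baseline({"nginx": "1.24.0", "postgresql": "15.5"})
    for p in problems:
        print("DRIFT:", p)
```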
In some limited tests, the biggest problem I have seen so far is how to bootstrap the hostnames. If I install 100 new hosts and I want them to run 100 web server VMs named Web001 through Web100 and 100 database VMs named DB001 through DB100, how do I do that? It seems like I need some way to map hosts to VMs to roles (by hostname). I assume that DHCP will take care of the IP addressing automatically, so I can always refer to the hosts and VMs by hostname.
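Here is one shape that mapping could take: generate the VM names and their placement from a single source of truth, and keep that file in version control like everything else. The counts, prefixes, and round-robin placement policy are all assumptions for illustration.

```python
#!/usr/bin/env python3
"""Generate a host -> VM -> role mapping with predictable hostnames.

Sketch only: the role names, counts, and round-robin placement
policy are assumptions for illustration.
"""

HOSTS = [f"host{n:03d}" for n in range(1, 101)]          # host001..host100

# Hypothetical roles: (hostname prefix, number of VMs wanted).
ROLES = [("Web", 100), ("DB", 100)]

def build_inventory(hosts, roles):
    """Round-robin VMs across hosts; returns {host: [(vm_name, role), ...]}."""
    inventory = {h: [] for h in hosts}
    for prefix, count in roles:
        for i in range(count):
            vm_name = f"{prefix}{i + 1:03d}"             # Web001..Web100, DB001..DB100
            host = hosts[i % len(hosts)]                 # simple placement policy
            inventory[host].append((vm_name, prefix.lower()))
    return inventory

if __name__ == "__main__":
    inv = build_inventory(HOSTS, ROLES)
    # Each host ends up with one web VM and one database VM.
    for host in HOSTS[:3]:
        print(host, "->", inv[host])
```

With names generated this way, the CM tool could infer a VM’s role from its hostname prefix, and DHCP plus dynamic DNS (or a generated hosts file) could handle resolution, which is exactly what I’m hoping for above.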
I have a lot of research to do!