I’ve been a proponent of configuration management with Chef for a while. It’s done amazing things for me and my workplace, and I think everyone could benefit from it. But when I talk to people, the question always comes up: “How do I get started? Things move slowly here.” I’m going to share the plan that worked for me. YMMV.
Note: while I talk about Chef, this also goes for Ansible, Salt, Puppet, CFEngine, etc.
The plan
First, establish a beachhead. Get the Chef agent on all the servers you can, including your snowflakes. Then, start automating your “new box” workflow so that it’s as hands-off as possible and results in a fairly standardized build with Chef on it. Finally, commit to using those new boxes for everything you do.
Once you have this done, you’ll immediately be able to prove the value of configuration management. You’ll be able to query Chef for things that normally take a while to find out (who is running that old kernel version?) and automate many ad-hoc tasks (delete that account on all the servers). Over time, you can work up to deploying your servers from cookbooks.
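To make the "delete that account" example concrete, here is a minimal sketch of what such an ad-hoc task could look like as a Chef recipe. The username is a placeholder, and the recipe would have to be in the run list of every node it should apply to.

```ruby
# Remove a departed user's account wherever this recipe runs.
# 'jdoe' is a placeholder username, not something from a real system.
user 'jdoe' do
  action :remove
end
```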
Step 1: Chefify all the current infrastructure
Install Chef Server on a separate box. The server is not necessary to get use out of Chef, but it makes things easier. More importantly, once you finish this step you’ll be able to store configuration management data and run queries against your infrastructure (as soon as each node has run the Chef client).
Next, create your first cookbook: a very sparse configuration that applies to all of your current infrastructure. When I did it, I called it role-minimal and went down the role cookbook path. The TL;DR is that you create a role that includes only the role cookbook, so that you get the benefits of cookbook versioning.
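As a rough sketch of the role cookbook pattern, the role itself can be a few lines of Chef's Ruby role DSL. The file name and the role name minimal are assumptions for illustration; only the cookbook name role-minimal comes from the setup described here.

```ruby
# roles/minimal.rb (hypothetical file and role name)
name 'minimal'
description 'Thin role whose only job is to pull in the role-minimal cookbook'

# All real logic lives in the versioned cookbook, not in the role itself
run_list 'recipe[role-minimal]'
```

A role file like this would typically be uploaded with knife role from file roles/minimal.rb.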
What do you put in the minimal role? It can be nothing to start, if you want. Or maybe something so benign that no one could complain, like setting the banner or motd. A short cookbook_file resource for the motd in the recipe is enough; then put your motd in files/default/motd. This will manage the file and ensure that all nodes have the same MOTD. Who can argue with that? You probably already have an item in your backlog to do that anyway.
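The original recipe snippet did not survive, so the following is a reconstruction under assumptions rather than the exact code I used. It shows a cookbook_file resource that copies files/default/motd from the cookbook to /etc/motd; the target path, owner, group, and mode are all assumed.

```ruby
# recipes/default.rb in the role-minimal cookbook (sketch)

# Copy files/default/motd from the cookbook to /etc/motd on every node.
# Owner, group, and mode are assumptions; adjust them to your standards.
cookbook_file '/etc/motd' do
  source 'motd'
  owner 'root'
  group 'root'
  mode '0644'
end
```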
The other thing to add to this cookbook is a scheduled Chef client run, using the chef-client cookbook. Set that cookbook’s cron attributes in your attributes file to run the client every half hour, or whatever interval you want. Don’t forget to include_recipe 'chef-client::cron' so that Chef adds the job to your crontab.
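The original attributes snippet is also missing, so this is a hedged sketch of a half-hourly schedule. The attribute names are taken from the chef-client cookbook’s README as I remember it; check them against the version you depend on.

```ruby
# attributes/default.rb in the role-minimal cookbook (sketch)

# Run the client at minute 0 and minute 30 of every hour.
# Attribute names are assumed from the chef-client cookbook's README.
default['chef_client']['cron']['minute'] = '0,30'
default['chef_client']['cron']['hour'] = '*'

# In recipes/default.rb, the cron job itself comes from:
#   include_recipe 'chef-client::cron'
```

For the include_recipe call to resolve, the role cookbook’s metadata.rb also needs a depends 'chef-client' line.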
You may want to create environments if that’s the way you do things.
After this, start bootstrapping your machines with knife bootstrap and a run list containing your new role. Maybe start with the non-production servers. Don’t worry too much if people resist; you can leave those servers alone for now.
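If you have more than a handful of machines, it can help to generate the bootstrap commands and review them before running anything. The sketch below assumes a role named minimal and made-up hostnames, and it uses only the --node-name, --run-list, and --sudo options; SSH user and key options differ between Chef versions, so they are left out.

```ruby
require 'shellwords'

# Build the argv for bootstrapping one host with a given run list.
# The default role name is an assumption for illustration.
def bootstrap_command(host, run_list: 'role[minimal]')
  node_name = host.split('.').first
  ['knife', 'bootstrap', host,
   '--node-name', node_name,
   '--run-list', run_list,
   '--sudo']
end

# Hypothetical non-production hosts to start with
hosts = %w[dev-web01.example.com dev-db01.example.com]

hosts.each do |host|
  # Print rather than execute, so the commands can be reviewed first
  puts Shellwords.join(bootstrap_command(host))
end
```

Printing the commands instead of running them keeps the first pass low-risk, which matches the spirit of this step.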
Now you have Chef running in a low risk fashion. But what’s going to happen? Someone will eventually need something:
- We need to move to LDAP authentication
- We need to ensure that we’re never running an old version of openssl
- We need to know which servers are running which kernels
And then you can put up your hand and say it’s easy: you happen to have Chef agents on each server, so the simplest way is to leverage that. And that server they were concerned about earlier? Ask whether they want that one done by hand or with the new repeatable, automated process. Great, I’ll just go bootstrap that node.
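The "which servers are running which kernels" question is a good first demo. One way to answer it is to export the data with knife search node '*:*' -a kernel.release -F json and group the result. The sketch below assumes that JSON has a rows array of one-key hashes (node name mapped to the requested attribute), and the sample data is made up.

```ruby
require 'json'

# Group node names by kernel release.
# Assumes the JSON shape:
#   {"rows": [{"<node name>": {"kernel.release": "<version>"}}, ...]}
def nodes_by_kernel(json_text)
  rows = JSON.parse(json_text).fetch('rows', [])
  groups = Hash.new { |hash, key| hash[key] = [] }
  rows.each do |row|
    row.each do |node_name, attrs|
      groups[attrs['kernel.release'] || 'unknown'] << node_name
    end
  end
  groups
end

# Made-up sample data standing in for real knife output
sample = <<~JSON
  {"results": 3, "rows": [
    {"web01": {"kernel.release": "3.10.0-1160.el7.x86_64"}},
    {"web02": {"kernel.release": "3.10.0-1160.el7.x86_64"}},
    {"db01":  {"kernel.release": "2.6.32-754.el6.x86_64"}}
  ]}
JSON

nodes_by_kernel(sample).each do |release, names|
  puts "#{release}: #{names.sort.join(', ')}"
end
```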
Step 2: Fix your provisioning
This one really depends on how you build new machines. The general idea is that you want to come up with a base machine configuration that everything you do from now on is built from. We use the knife vsphere plugin to create new images with Chef already bootstrapped, but depending on what you use, you may need to search the plugin directory.
Create a second role for all the new stuff. We call ours role-base. This can be identical to role-minimal, but you may want to add some stuff to make your life easier. For example, I feel it should be a crime to run a server without sar, so I have our base role make sure that the sysstat package is up to date, plus we throw in some other goodies like screen, strace, lsof, and htop.
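A base role recipe along these lines might look like the sketch below. It is not our actual cookbook: the file layout is assumed, and package names can differ between distributions.

```ruby
# recipes/default.rb in the role-base cookbook (sketch)

# sysstat provides sar; :upgrade keeps it at the latest available version
package 'sysstat' do
  action :upgrade
end

# Convenience tools; the default action installs them if they are missing
%w[screen strace lsof htop].each do |pkg|
  package pkg
end
```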
After this, commit to using this base image wherever humanly possible. Your boxes will be more consistent.
Step 3: Write cookbooks for new stuff
Sooner or later you’ll have a project that needs some servers. Do what you can to leverage community cookbooks or your own cookbooks to save yourself time and enforce consistency.
The benefit here is consistency: all your MySQL servers, for example, will be built the same way. You may not be able to fix your old servers, but at least everything new will be consistent. You’ll also be able to create new machines or environments much more easily, because the base image and the apps will be in code and not in some checklist on the wiki. You’ll spend less time worrying about small details and more time thinking about bigger-picture items.
I don’t have much advice here other than to do it. You’re definitely going to learn as you go and make mistakes. But it’s code: you can correct it and move on.
The other part of this step is to promote this new tool with your co-workers. Show them how knife tools can make your life easier. Learn how to write cookbooks as a group. Get a culture of reviewing each other’s code so you learn new ways of doing things and share information.
Step 4: while (true) { get_better }
There’s not a whole lot to say here. Your first few weeks with Chef are going to be hard, but you’ll find that it gives you so many benefits in consistency and speed that it’s worth it. Mastering devops practices is an ongoing thing.
I recommend contributing patches back to community cookbooks as a way to get better. You will find problems with other people’s cookbooks eventually, and you can submit fixes and have them help other people.
At some point, not doing things in Chef will just seem strange.