Although I haven’t really talked about it here, I joined a new startup a couple of months ago called Stackdriver where we’re working on building a hosted solution to make infrastructure monitoring and management suck less for users of the public cloud. After having to duct tape the various pieces together a couple of times now, it’s super clear that the need is there, so it’s exciting to be working on solving it. More on the side of being at a very early startup to come in the future.
Today I had planned to do some work on our provisioning and deployment code, but Amazon had another EBS outage, making the AWS API pretty unavailable for much of the afternoon. So after doing some other things, I took a look at what else fails along with EBS, to help us remember for next time, and thought it was interesting enough to share.
I finally got around to trying the Chef omnibus installer and it’s a step up from what I was doing previously but still not great. Grabbing a shell script with curl or wget and piping it to your shell is an anti-pattern which I wish had never taken off. Luckily, in this case, the shell script is just pulling down an rpm and installing it. One step nicer would be if there were just a repo that you could use via yum and have things a yum install chef-full away. And as I thought that this afternoon, I remembered the baseurl support in createrepo. Thus, without further ado, I’ve thrown together a quick set of repos that just point to the files in the opscode s3 bucket and minimize the amount of storage I have to do 😉 If you want to use them, just drop a file into /etc/yum.repos.d named something obvious like chef.repo
name=Chef Omnibus Packages
I’ve only tested the EL6 x86_64 package but I went ahead and created the repos for EL5 and EL6, both i686 and x86_64. Yes, the packages aren’t signed right now. Hopefully that’s something that can be remedied relatively easily. And even better would be if Opscode would just integrate the simple call to createrepo into their build process for the omnibus installer.
I have a decent amount of experience at this point with puppet, both from using it to manage the infrastructure running Fedora and from setting it up at a pretty large scale at HubSpot. But in a new gig, I decided it was worth rounding myself out a bit and giving chef a try. Not out of any deep-seated dislike of puppet, but there are a few pieces that I’ve continued to run up against which are a little grating and so I figured it was worth broadening my horizons. The nice thing is that both are fairly successful open source communities and realistically, as long as you’re using a system, you probably can’t go that wrong, and you can always switch in the future.
Side-note: I’ve also been playing with Michael DeHaan’s new project, ansible, which is also interesting. But I don’t think it’s mature enough to use for a production environment yet and I also was mostly interested in it as a better remote execution layer as opposed to another full-fledged config management tool. But yeah. It’s there. It’s interesting. I’ll probably write more about it later.
With a little bit of chef time under my belt, I have to say that I’m not struck by drastic differences. The terminologies are different, the DSL used on the config side is a bit different but they act pretty similarly and you can get either of them to do what you want. That said, there are a few things (good and bad) that I’ve noticed about chef and figured I’d share for others who are looking at deciding for themselves. Note that a few of the things in the dislikes section may well just be me missing something and being a n00b… suggestions welcome!
Things I’ve Liked
- Hosted Chef is a very very nice option to have. Props to the Opscode team for building an infrastructure to run the server side for you and especially for making the barrier to entry nearly zero by letting you manage up to five hosts for free. Given some of my headaches around running a puppetmaster previously, I’m glad not to be having to pull together everything to run a chef server
- Knife is actually pretty cool. I was skeptical before using it but it does a pretty nice job of encapsulating a lot of common tasks for you
- Knife gets really cool with the addition of the ec2 plugin. Launch servers, register them with hosted chef and have them ready to go. I’ve built all of the surrounding bits and as the environment I’m dealing with grows, I think I’ll grow out of being able to use knife ec2 effectively, but it’s great for an easy starting point
- Chef solo seems to work okay and have a few niceties over a master-less puppet setup but I didn’t spend much time with masterless puppet, so it’s probably just that I didn’t find the related nice pieces
Things I’ve Disliked / Been Annoyed By
- The package support in the Fedora/CentOS/RHEL universe is pretty poor. I realize that all the cool kids use Ubuntu these days but tons of server infrastructures do not. Todd does a great job with the puppet (+ ecosystem) packages for Fedora and EPEL. Would love to see someone do similar for all of the Chef stuff
- A lot of the cookbooks that are out there and published are Ubuntu specific. Even the ones which strive to work across distros often end up coercing the Fedora universe to look more like Debian. Which isn’t necessarily a path I want to go down
- Probably just a side effect of this but a lot of cookbooks use things which aren’t the standard init system (eg, depending on runit)
- knife-ec2 makes you think you can get away with using it but I keep tripping across things it doesn’t support, which makes me consider abandoning it
- Trying out cookbooks from others drives me crazy. I’m pretty sure I’m missing the good workflow here, but what I’ve found so far pollutes my checkout by adding vendor branches and auto-committing things. There’s gotta be something I’m missing here
So am I now a rabid chef fan? Nope. But it’s a nice system with some definite advantages for certain use cases. I suspect I’ll find more of them as I use it more.
Like many people, we use Jenkins at work as our continuous integration server and we require that all changes that are committed go through being built in CI before they can get deployed. Yesterday, someone asked if we could add another jenkins slave to try to reduce the amount of time spent waiting on builds. While the slaves are fully puppetized and so it’s not much work to bring an additional slave online, my own anecdotal experience made me think that we weren’t really held up often in a way that additional slaves would help. I had a vague memory of some graphs within jenkins and eventually found them, but they weren’t that enlightening. The scale is funky, it’s a weird exponential moving average and I just didn’t find it that easy to get any insight from them.
So last night, I sat down and wrote a quick little script to run via cron and pull some statistics and throw them into graphite. Already with less than a day of data, I can tell that we end up with a few periods of about ten minutes where having more executors could help, and that those periods are correlated with someone committing to one of the projects at the base of our dependency tree. So that gives us a much better idea of whether or not the cost of an additional machine is worth the few minutes that we’d be able to save in those cases.
Since it didn’t look like anyone else had done anything along these lines yet, I put the code up on github. There are a lot more stats that could be pulled out via the jenkins api, this is really just a starting point for what I needed today.
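For anyone who wants the general shape without digging through the repo, here is a minimal sketch of the idea in Python. To be clear, this is not the script from github, just an illustration of the approach: it reads executor and queue counts from the Jenkins JSON API (`/computer/api/json` and `/queue/api/json`) and writes them to graphite using its plaintext protocol on port 2003. The hostnames, the `jenkins` metric prefix and all of the function names are made up for the example.

```python
import json
import socket
import time
from urllib.request import urlopen

# Hypothetical endpoints; replace with your own servers.
JENKINS_URL = "http://jenkins.example.com"
GRAPHITE_HOST = "graphite.example.com"
GRAPHITE_PORT = 2003  # graphite's plaintext listener
PREFIX = "jenkins"    # assumed metric namespace


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def build_metrics(computers, queue, prefix=PREFIX):
    """Turn the two API responses into a flat dict of metric name -> value."""
    total = computers.get("totalExecutors", 0)
    busy = computers.get("busyExecutors", 0)
    return {
        f"{prefix}.executors.total": total,
        f"{prefix}.executors.busy": busy,
        f"{prefix}.executors.idle": total - busy,
        # Every entry in "items" is a build waiting for an executor.
        f"{prefix}.queue.length": len(queue.get("items", [])),
    }


def format_lines(metrics, timestamp):
    """Render metrics as graphite plaintext: 'name value timestamp' per line."""
    ts = int(timestamp)
    return "".join(
        f"{name} {value} {ts}\n" for name, value in sorted(metrics.items())
    )


def send_to_graphite(payload, host=GRAPHITE_HOST, port=GRAPHITE_PORT):
    """Send the rendered lines to graphite over a plain TCP socket."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))


def main():
    computers = fetch_json(f"{JENKINS_URL}/computer/api/json")
    queue = fetch_json(f"{JENKINS_URL}/queue/api/json")
    metrics = build_metrics(computers, queue)
    send_to_graphite(format_lines(metrics, time.time()))
```

Add a call to `main()` at the bottom and run the file from cron every minute or so. Graphing queue length next to idle executors is what makes the "would another slave help?" question answerable: a non-empty queue while idle executors sit at zero is the case where more capacity would actually save time.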