Like many people, we use Jenkins at work as our continuous integration server and we require that all committed changes be built in CI before they can get deployed. Yesterday, someone asked if we could add another Jenkins slave to try to reduce the amount of time spent waiting on builds. While the slaves are fully puppetized and so it’s not much work to bring an additional slave online, my own anecdotal experience made me think that we weren’t really held up often in a way that additional slaves would help. I had a vague memory of some graphs within Jenkins and eventually found them, but I didn’t find them that enlightening. The scale is funky, the data is a weird exponential moving average, and I just didn’t find it easy to get any insight from them.
So last night, I sat down and wrote a quick little script to run via cron, pull some statistics and throw them into graphite. Already, with less than a day of data, I can tell that we end up with a few periods of about ten minutes where having more executors could help, and that those periods correlate with someone committing to one of the projects at the base of our dependency tree. That gives us a much better idea of whether the cost of an additional machine is worth the few minutes that we’d be able to save in those cases.
Since it didn’t look like anyone else had done anything along these lines yet, I put the code up on GitHub. There are a lot more stats that could be pulled out via the Jenkins API; this is really just a starting point for what I needed today.
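For anyone curious what this kind of cron job looks like, here is a minimal sketch. It is not the script from the repository, just an illustration under a few assumptions: that Jenkins exposes `totalExecutors` and `busyExecutors` at `/computer/api/json` and a list of queued `items` at `/queue/api/json`, and that graphite is listening for its plaintext protocol (`path value timestamp`) on port 2003. The metric names, the `jenkins` prefix and the function names are all made up for the example.

```python
import json
import socket
import time
from urllib.request import urlopen


def executor_metrics(computer_info, queue_info, prefix="jenkins"):
    """Turn parsed Jenkins API responses into a {metric_name: value} dict.

    Assumes the JSON fields `totalExecutors`, `busyExecutors` and `items`.
    """
    total = computer_info.get("totalExecutors", 0)
    busy = computer_info.get("busyExecutors", 0)
    return {
        prefix + ".executors.total": total,
        prefix + ".executors.busy": busy,
        prefix + ".executors.idle": total - busy,
        prefix + ".queue.length": len(queue_info.get("items", [])),
    }


def graphite_lines(metrics, timestamp):
    """Format metrics as graphite plaintext lines: 'path value timestamp'."""
    return ["%s %s %d" % (name, value, timestamp)
            for name, value in sorted(metrics.items())]


def fetch_json(url):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def send_to_graphite(lines, host, port=2003):
    """Send newline-terminated plaintext metrics to a graphite listener."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))


def main(jenkins_url, graphite_host):
    """One polling pass; intended to be invoked from cron every minute or so."""
    base = jenkins_url.rstrip("/")
    computer_info = fetch_json(base + "/computer/api/json")
    queue_info = fetch_json(base + "/queue/api/json")
    metrics = executor_metrics(computer_info, queue_info)
    send_to_graphite(graphite_lines(metrics, int(time.time())), graphite_host)
```

Calling `main()` with your own Jenkins URL and graphite host from a cron entry is enough to start collecting data. Graphing the busy executor count against the queue length is what makes the "would another slave help?" question answerable: a non-empty queue while every executor is busy is the case where more executors would actually save time.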
Wrote up a nice post that maps pretty well to the Ignite talk I gave at Velocity about using monitoring to help drive your infrastructure development.
Go check it out over on the HubSpot dev blog.
I spent last week out in California for the O’Reilly Velocity Conference. It was in Santa Clara, which I hadn’t been to and frankly, I would be perfectly happy to not return. Parts of California are nice, Santa Clara is an office building wasteland. No good food options, nothing really going on, etc. But I was there for a conference and not for other stuff, so it sufficed.
The conference was actually very good. It has been a few years since I’ve been to a conference between grad school, my daughter being born, and being at a startup where conferences weren’t the priority. But it was good to get back to it. Had a lot of good hallway conversations with people about things that are relevant to us and saw a lot of good presentations. And Velocity is especially relevant to me at this point as it was all about various web performance and operations stuff. Where, unsurprisingly, there’s a lot of cool stuff going on.
I mostly kept to the more operations-y tracks just because they map better to what I’m currently working on. I’ve come away with a bunch of things to look into and posted a whole bunch of choice quotes over on Twitter, but a few takeaways boiled down for here would include
- If you’re using a public cloud provider, plan for things to fail. Build your systems expecting it and you’ll have less pain.
- HubSpot is doing an awesome job with post-mortems. DanM actually posted a great blog post over on our dev blog about things we’ve learned from doing a lot of them.
- DevOps has mostly been about putting developers into ops (hi!) but also needs to be about putting ops into dev
- Web performance has been very successful in tying itself to business metrics. Weirdly, operations has overall been less successful at that
- There’s a lot of work going on to help with debugging and working on webapps for mobile platforms. Very cool.
None of those are particularly earth shattering revelations, but still good to see/hear.
Also, on Tuesday night I did a talk for the Ignite track. So 5 minutes, 20 slides, auto-advancing. My topic was “Just Too Late” and was largely around some things I’ve discovered transitioning into a role where I’m doing more ops stuff and the fact that I feel like I get to things too late. But then turning it around and showing that’s not really so. Stay tuned for a longer blog post on the topic. But the talk went really well. It was fun, a lot of positive feedback and was good for me to get back to it. Looking forward to submitting some (full-length) proposals for talks for some conferences later this year.
I also had a few thoughts on the way conferences have changed since I last went to one
- Twitter really is a pretty big game changer. Lots of conversation on Twitter during the conference about which sessions were good, useful tidbits from sessions, etc. I actually felt that the experience was pretty strongly enhanced by it
- Conference wireless still sucks. But you can get decent data now for devices and avoid the use of the conference wireless entirely. This made it easier to stay on Twitter during the conference
- An iPad (or other tablet) is a pretty perfect device for looking at stuff during a conference. It sits on your lap so you can just check it sporadically, the battery lasts all day, you can get data from a cellular provider, and it’s reasonably fast.
Anyway, good time was had. Thanks to all the people that I met and chatted up. And hopefully it won’t be as long before I make it to another conference 🙂
I’m still at HubSpot but my role within the company has changed a bit over the past few months. Related to the article that Yoav wrote which was posted on onStartups today about how we’re trying to better empower our engineers and teams to really own things, I’ve shifted my focus some.
Instead of working on the product which is front and center to all of our customers or even working on the free tools at grader.com that millions of people use, I’m now instead focused quite a bit on various infrastructure related things for us. Obviously, I’ve done some of that all along, but at this point, it’s my primary job.
It’s a lot of fun. We are heavy users of EC2 and some of the other Amazon services. We also are using Rackspace Cloud some. And I wouldn’t be surprised if we add another provider in the future. So there is a challenge in making all of these environments look the same for the rest of our dev team as well as our on call folks. We’re also working to make it so that we can easily continue to scale out as our compute needs increase. All the sorts of things that I’ve spent some time thinking about over the years, but there’s no theoretical here — we’re really deploying, managing and everything else a pretty large distributed system. We are using a fair bit of open source stuff in addition to building some stuff ourselves. The first thing was obviously ami-creator but there’s more to come almost certainly. In addition, we’ll probably be doing some work and submitting some patches to improve some of the tools and things that we use as it makes sense to do so.
And as we are growing like crazy, I’m looking to hire some people to join my team to help us get even more things done. If I were writing a job description it would probably include bits and pieces like Linux administration, python, puppet, probably devops (as it’s something that’s in mind), cloud automation (… even though I still hate the word cloud), release and build tooling, monitoring, and more. Sound interesting? Drop me a line and let’s talk.
I’ve been having to build some new CentOS images to be used with EC2 for work recently. I went into it thinking that it shouldn’t be too big of a deal. I know that some work had been going on in this area and Fedora 14 is now available on EC2, so I figured I could convince the same toolchain to work.
Unfortunately, I was pretty disappointed with my options.
- Do some building by hand on an actual instance, then do the bundling and upload off of the running instance.
- Some of the ThinCrust stuff initially looked promising, but it seems like it’s largely unmaintained these days and the ec2 conversion bits didn’t really work at this point. I was able to get my initial images this way, but mostly by having a wrapper shell script of doom that made me sad.
- There’s always the rPath tools, but I wanted to stick to something more native and fully open source
- The new kid on the block is apparently BoxGrinder, but I found it to be a lot more complicated than it needed to be and not that robust. I’m sorry, but generating your own format that you then transform into a kickstart config and even run through appliance-creator via exec from your ruby tool just felt wrong. No offense, but it just felt like a lot more than I wanted to deal with
So, I sat down and spent an evening hacking and have the beginnings of a working ami-creator.
It’s pretty straightforward and uses all of the python-imgcreate stuff that’s used to build Fedora live images. Your input is a kickstart config and out the other side pops an image that you can bundle and upload to EC2.
Thus far, I’ve tested it to build CentOS 5 and Fedora 14 images. I’m sure there are some bugs but at this point, it’s worth getting it out for more people to play with. Hopefully it’s something that’s a lot simpler and more accessible for people to build images and I think it will also fit in a lot better with having Fedora release engineering building the EC2 images in Fedora 15 if they want.
One of the big outstanding pieces that I still want to add is the necessary bits to be able to (optionally) go ahead and upload and register as an AMI with your EC2 account. But release early, release often.
Comments, etc appreciated in all the normal ways.
Minor update: switched the repo to live on github instead
Amazon’s EC2 service is great for being able to roll out new servers quickly and easily. It’s also really nice because we don’t ever have to worry about physical hardware and can just spin up more instances as we need them for experimenting or whatever.
Unfortunately, they’re still stuck in the dark ages with the newest AMIs available for Fedora being Fedora 8 based. With Fedora 12 around the corner, that’s two years old — something of an eternity in the pace of distribution development. I’d love to help out and build newer images, but while anyone can publish an AMI and make it public, you can’t publish newer kernel images, which really would be needed to use the newer system.
So, if you’re reading this at Amazon or know of someone I can talk with to try to move this forward, please let me know (katzj AT fedoraproject DOT org). I’d really strongly prefer to continue with Fedora and RHEL based images for our systems as opposed to starting to spin up Ubuntu images for the obvious reasons of familiarity.
At HubSpot, we have a pretty wide array of different things being used for the webapps running behind the scenes. This isn’t surprising. There are also some home-grown scripts (in python, as that’s the scripting language of choice… something I’m not complaining about) to take care of deploying the various webapps. It works, but I really want to get it doing a bit more so that it’s more useful and also get the different scripts doing a bit more sharing of code so that we can improve one place and get the benefits for everything.
Given that this seemed like a pretty typical problem, I figured I’d take a look and see what open source projects exist out there to see if any of them were suitable or could be at least close to a good fit for what we need and want. Unfortunately, I was kind of disappointed…
- Capistrano seems to be the big player in this arena. It was originally written for Rails and still very very strongly shows that heritage. This isn’t necessarily bad, but it makes it a lot harder to get to work if you’re not doing something that’s rails-like. There are some people who have gotten some things working with Java app deployments for tomcat, but they all feel a bit hacky. The other downside for me/us is that Capistrano is very much Ruby-based, both in how its own deployment language looks as well as some of the “how it depends on things working” aspects. Also, the fact that it’s written in Ruby and thus a little bit more difficult for us to hack on if/when we run into problems is a point against. So it’s probably a non-starter for now, or at least a pretty difficult sell
- Fabric is written in python and seems to be following in the footsteps of Capistrano. Right now, it’s far far simpler. This is in some ways good but some of the pieces that we’d want (eg, scm integration) aren’t there and so I’d have to write them. And I’m not sure if the Fabric devs are really interested in expanding in that way; haven’t sent email yet, but planning to tomorrow to feel it out.
- Config Management + Binary deployment is the approach taken in Fedora Infrastructure for app deployment and it seems to be working pretty well there. It might be something to get to eventually, but that’s going to be a longer term thing and I’m not actually convinced that it’s really the best approach. For Fedora it grew out of only a couple of things which could be considered “webapps” and a lot of system config that has turned much later into more webapps. Based on the work I did there, it also presupposes a more homogeneous environment than we have at HubSpot
- Func is something that a few people have been working on that I keep wanting to find a use for but it seems a little less well suited to doing a lot of java app building/deployment given that it’s more https/xml-rpc based than shell based.
- Roll your own is what we’re doing now and what it seems like is pretty common. I don’t necessarily like this, but it’s certainly the path of least resistance
So, what am I missing? Is there some great tool out there that I haven’t come across that you’re using for Java (and more) webapp deployments? Bonus points if it’s python-based and pretty extensible.
One thing that’s quite nice about the new gig is that the office is in Kendall Square. Much, much, much better location-wise than Westford. It means that my commute is just about seven miles which is quite nice to do via bike. Also, if the weather’s bad or I feel lazy, I can take the bus to Alewife from right outside my house and then take the train in.
Unfortunately, I’ve now had two weird pedal failures in the past week. Last Thursday, I was leaving the office and clipped in. As I got about a block away, I noticed my foot moving weirdly on the pedal. As I pulled over to check it out, it became clear that the cleat was stuck in the pedal. After some investigation, I realized that I had lost one of the two screws holding the cleat to the plate in the shoe. It looks like the plate where the screw went in is actually pretty stripped. And in getting the cleat and shoe disengaged from the pedal, I essentially had to take the pedal apart, so I decided to switch the pedals out for the plain SPDs instead of the slightly fancier SPDs that were on there.
Today, I was riding home and realized about halfway home that one of the pedals was coming unscrewed from the crank. I made it home without incident and re-installed the pedal without any noticeable problem, but I’m going to be keeping an eye on it over the next few days. Hopefully the crank isn’t stripped — it looked okay, but at this point, I’m a little cautious of it.
Maybe I should look at building a new commuter bike sooner rather than later 🙂 Although I really would like to get the Redline to last another year to year and a half.
As I wrap up my first week at HubSpot, I have a few observations that are at least sort of interesting.
- Real hardware. I’m pretty happy with my current laptop so I just got a desktop machine to use at work. The box I got is a Dell quad core with 8 GB of RAM. Nice box overall and Fedora installed with no problems. The nVidia graphics work fine for 2d and even xrandr seems to be doing the right thing. One thing that is annoying is that Dell is still shipping machines with VT turned off in the BIOS. Once I turned that on, though, KVM is also working pretty well on the box
- Windows is both just as annoying as ever, less annoying and more annoying. You can run it in a virtual machine without real problems. But installing things, the terminals, etc are all still a pain. Stability is a bit improved. The whole “run as administrator” nonsense is a real pain when you’re trying to get a lot of stuff going.
- Coming in at the end of the scrum cycle seems to sort of be a good thing. Get to see the final push and then the demos from that cycle followed by getting to sit in on the planning for the next sprint. I won’t be on a scrum team until the next sprint and so hopefully I’ll have a better frame of reference!
- Commuting to Kendall Square works really well for me. Okay, I knew this from riding into MIT but it’s still a takeaway. The bike ride in is a nice length; shorter would be fine, but longer really isn’t as practical.
- Complex build processes exist everywhere and are despised everywhere. But it always seems like a build and deployment process is the last thing cared about.
- I’m having a lot of fun being back in a startup environment.
So yeah, all in all, it’s been a good week. Now for a long weekend. Two four day weeks in a row for me I guess.
The new chapter begins… today was my first day working for HubSpot.
It’s a big change for me as I’ve been doing pretty much purely (fairly) low-level operating system work for a decade now. Going to a company that’s doing much more web development is making me shift how I think about everything from considering using Eclipse rather than a combination of Emacs/vim/terminals to the languages I’m writing in and the types of code I’ll be writing. And I think it’s a change that I need — I’ve been feeling a bit stagnant and so getting out of my comfort zone should help a lot.
Also, I think that HubSpot is doing some interesting stuff and I’m glad to be joining the team to help out in a variety of different ways.