Sean’s Obsessions

Sean Walberg’s blog

Managing Servers With Cfengine

As part of my work with b5 I’m wrangling 10 servers now. When I started out it was 3, so keeping configurations in sync was pretty easy: a bit of shell scripting, a bit of rsync. But now we’ve got server roles, config files that are slightly different per server, and the constant need to roll out minor changes to files across different nodes.

I’ve had my eyes on cfengine for a little while. It’s a system that runs agents on all your servers; you define a policy file that ensures files are at the right version and packages are correct, and it can run scripts and other such magic. I finally got around to getting it running on 3 servers to develop the policy, and I’ll finish the rest off soon. We used to have a full page of things to do on new servers; it’s now down to about half a page, and most of that has to do with editing internal scripts to recognize the new servers.

The power of cfengine is that you can assign roles to servers and have different actions based on those roles. For example, I can define my NIS clients and servers like this:

classes:
        nisservers = ( FileExists(/var/yp/nisdomain) )
        nisclients = ( any -nisservers )

And then make sure my NIS clients have the right configuration:

editfiles:
       nisclients::
        { /etc/yp.conf
        AppendIfNoSuchLine "ypserver server1"
        AppendIfNoSuchLine "ypserver server2"
        }

You can even get really fancy and look for particular lines, such as changing mount options in fstab:

       nfsclients::
        { /etc/fstab
        Backup "true"
        BeginGroupIfNoLineMatching "^nfsserver:/home .*"
                Append "nfsserver:/home /home nfs rw,nosuid,soft,rsize=8192,wsize=8192,noatime,acregmin=10 0 0"
                DefineClasses mounthome
        EndGroup
        LocateLineMatching "^nfsserver:/home .*"
        # These lines had better be identical!!! probably move to a var
        ReplaceLineWith "nfsserver:/home /home nfs rw,nosuid,soft,rsize=8192,wsize=8192,noatime,acregmin=10 0 0"
        DefineClasses remounthome
        }

If either one of those succeeds, it defines an additional class that lets me run a shell command:

control:
  # needs to be predefined
  AddInstallable = ( mounthome remounthome )
shellcommands:
        mounthome::
                "/bin/mount /home "
        remounthome::
                "/bin/mount /home -o remount"

If cfengine has to change fstab, then /home will be mounted or remounted automatically.

It’s a bit difficult to provide a full tutorial, but I’ve managed to collect some links on del.icio.us. I’d suggest reading the quick start and getting one or two servers simply talking over the cfengine protocol. Then try some file copies to get the hang of that, and move on to editfiles. Once you have an idea of what you need to do, e.g. managing packages or sudo access, do Google searches for those (I have some in my links too) and implement them. By the time you get that far, you should be able to use the reference guide to figure out the specific commands to do what you want.
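
As a taste, a copy section for pushing a common sudoers file from the policy host might look something like this. This is a sketch only: the /masterfiles path and the name policyhost are made up, and your modes and ownership will differ.

copy:
        any::
                /masterfiles/etc/sudoers        dest=/etc/sudoers
                                                server=policyhost
                                                mode=0440
                                                owner=root
                                                group=root
                                                type=checksum

With type=checksum the file is only copied when its contents actually change, so cfagent can run as often as you like.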

Save Load With Memcached

Over at b5media we have some internal code that ties all the blogs together to make us a network. This was initially implemented as a brilliant little REST API, requiring only minor changes to some database tables to completely alter how the network was arranged. Additional functionality could be added easily to the API, and all the sites could make use of it.

All was good until we started noticing things slowing down. The web slaves were CPU bound, and we were consumed with hits to the API. Stepping back and looking at what was going on, we soon realized that REST was an inefficient way of achieving what we needed at that volume.

Enter memcached, a distributed hash table of sorts. In a nutshell, a daemon runs on machines with free memory and responds to network requests using a simple protocol. Client libraries connect and can pull data out by its key, or put data in. By hashing the key in an agreed-upon manner, the load can be evenly distributed across any number of memcached instances (just read how facebook.com uses over 200 instances of memcached).
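
That simple protocol is easy to poke at by hand. Here’s roughly what a set and a get look like over telnet; the host name is made up, 11211 is the default port, and the numbers after the key are flags, the TTL in seconds, and the size of the value in bytes:

$ telnet memcache1 11211
set blogroll 0 300 13
Hello, world!
STORED
get blogroll
VALUE blogroll 0 13
Hello, world!
END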

A server’s profile can be characterized in terms of its CPU, network, disk (I/O and space), and memory usage. Web servers tend to be CPU, network, and disk I/O heavy, so it’s great to be able to make use of the unused memory. Even at peak loads, when the server is busiest, memcached barely registers on the charts. It’s light on CPU and uses epoll(4) for efficiency and performance.

So after seeing memcached for the first time, our development ninjas were able to rewrite the template code that calls the API to cache the data, resulting in much faster page loads, reduced CPU on the web slaves, and a drastic cut in database load.
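
Conceptually, that kind of rewrite boils down to a cache-aside check: look in memcached first and only fall back to the REST API on a miss. Here is a rough PHP sketch using the PECL memcache extension; the host, key name, TTL, and rest_api_call() helper are all made up for illustration.

<?php
// Rough sketch only: the host, key name, TTL and rest_api_call() helper
// below are invented for illustration.
$memcache = new Memcache();
$memcache->addServer('memcache1', 11211);

function network_blog_list($memcache) {
    $key  = 'api:blog-list';
    $data = $memcache->get($key);            // returns false on a cache miss
    if ($data === false) {
        $data = rest_api_call('/blogs');     // the original (hypothetical) API call
        $memcache->set($key, $data, 0, 300); // cache the result for five minutes
    }
    return $data;
}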

I’m looking forward to exploring other opportunities for caching in the b5 architecture because of the success we’ve seen so far.

Speaking of opportunities, b5 is looking to hire a full-time server administrator because the work has grown beyond what I can do as a part-timer. You’d get to work with a great team that’s passionate about what they do, you’d get to work from home, and you’d get the freedom to take the infrastructure wherever you think it should go. I’ll be working with this person to teach them the environment and all the ins and outs before handing over the reins, so it’s not like you’d be thrown to the dogs. Contact me or Aaron with any questions (I’m on Skype as SeanWalberg, though usually text only, or Google Talk as swalberg).

Building Packages Out of PECL Modules

I’ve had a need to distribute the memcached and APC modules to various web servers in a farm. PECL/PEAR modules have a pretty cool interface similar to yum, but it’s all based on source. It does, however, have facilities to help you build an RPM. Because I try to use RPM for everything, I was pretty happy that this takes the work out of making the .spec file. However, the generated specfile needs a lot of work.

You’ll need php-devel, httpd-devel, php-pear, and rpm-build (and probably more, but it’s usually obvious).

First, upgrade the package that builds the RPMs; it has changed from the version I had:

pear install channel://pear.php.net/PEAR_Command_Packaging-0.1.2

Then upgrade PEAR, otherwise you’ll end up with some really odd errors that make no sense:

pear upgrade-all

Then download the module you want, for example:

pecl download pecl/APC

and build a .spec file:

pecl make-rpm-spec /tmp/APC-3.0.14.tgz

That will generate PECL::APC-3.0.14.spec. Move that file to /usr/src/redhat/SPECS and move the tarball to the SOURCES directory.

You’ll need to make a few changes to the .spec (a rough sketch follows the list).

- Change the BuildRequires to require php-pear
- Change all instances of package2.xml to package.xml
- Add /usr/lib/php/modules/apc.so to the %files section
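
Roughly, those edits amount to something like this in the spec; this is a sketch, and everything else stays as the generator wrote it:

BuildRequires: php-pear

# ...and wherever the generated spec refers to package2.xml, change it to package.xml...

%files
%defattr(-,root,root)
/usr/lib/php/modules/apc.so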

Then you should be able to build the package cleanly: rpmbuild -ba PECL\:\:APC-3.0.14.spec

That’s not all, though. You’ll want to include a config file. Make one in the SOURCES directory, e.g. pecl-apc.ini. It must contain the extension line; the rest are optional (I’m just getting into APC, so my values might be completely wrong):

extension=apc.so
apc.shm_size=512
apc.ttl=60
apc.write_lock=1

Add the following to the top of your .spec file, right under Source0:

Source1: pecl-apc.ini

and at the end of the %install section (right before %clean):

mkdir -p %{buildroot}/etc/php.d/
cp %{SOURCE1} %{buildroot}/etc/php.d/apc.ini

and finally, in the %files section:

%config /etc/php.d/apc.ini

This copies the config file into the php.d directory within the buildroot (where all the files go before they’re packaged up), and defines it as a config file to be included with the RPM. Thus, upgrades won’t stomp on your changes.

APC also includes a file called apc.php that can be used to check the status of the cache. Similar techniques can be used to copy it into /var/www/html, or to put an Alias entry in /etc/httpd/conf.d/apc.conf.
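
If you go the Alias route, a conf.d snippet along these lines would do it. The paths and allowed network here are made up; point the Alias at wherever you put apc.php and restrict it to your own addresses.

# /etc/httpd/conf.d/apc.conf
Alias /apc.php /usr/share/pecl-apc/apc.php

<Location /apc.php>
    Order deny,allow
    Deny from all
    Allow from 10.0.0.0/255.0.0.0
</Location>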

Update on Jan 8, 2010

Mike (see comments below) indicated that some changes in PEAR necessitated a change in the .spec as follows:

%post
pear install --nodeps --soft --force --register-only --nobuild %{xmldir}/APC.xml

Analyze BGP Peering With TAIND

I’ve uploaded some of my code to SourceForge as a new project called taind, tools for analyzing and interpreting netflow data. I wrote it to analyze BGP peering options; it uses NetFlow data and multiple BGP tables (from “show ip bgp”) to evaluate what your current traffic distribution is and what it would be like if you added new peers.

For now it generates output such as per-peer traffic distribution summaries.

I hope to figure out more ways of analyzing the information and add them to the code, hopefully drawing in more people.

A Few Notes on Squid as a Reverse Proxy

b5media uses a handful of web servers fronted by a load balancer. We’re getting quite busy in terms of traffic, so I finally got around to putting a reverse proxy in front of the farm. A reverse proxy accepts the request from the user and checks its cache for the result. If no answer is found, the proxy asks a back-end web server for the same page the user asked for, saves it, and passes it back to the client. Squid is one such piece of software; it is used by Wikipedia and other sites for both forward proxying (caching the sites your users visit) and reverse proxying.
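
The accelerator part of the configuration is only a few lines of squid.conf. Something like this, in Squid 2.6-style syntax; the site name and backend address are invented:

# Listen for client traffic and act as an accelerator for the farm
http_port 80 accel defaultsite=www.example.com vhost

# Send cache misses to a backend web server
cache_peer 10.0.0.10 parent 80 0 no-query originserver name=backend
cache_peer_access backend allow all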

After swinging the traffic over to the proxies I was ecstatic. Just by caching images, CSS, and JavaScript for a few minutes each, I was able to serve 40% of the hits from the proxies instead of going to a backend server, and knock 20% off the CPU usage of the servers. Then one of our celeb blogs got a huge wave of traffic that killed the proxies, and I ended up pulling them out of service and going back to the old method.

Squid is an amazing piece of software used on much bigger sites than ours, so I knew the problem had to be the way I had it configured.

I’m pretty sure the reason Squid crapped out on us was that it ran out of file descriptors. For some strange reason, Squid decides how many FDs it’s going to use *at compile time*. When I rebuilt the Fedora RPMs I didn’t do anything special, so it was using the default of 1024. At two FDs per connection (one for the inbound, one for the outbound), plus whatever is needed for pulling cache files from disk, we ran out in a hurry once we got busy and couldn’t make any socket connections.

The big reason I missed it earlier was that I had “cache_log none”, which is roughly the same as Apache’s error_log, rather than the info on cache hits and misses I thought it was (that’s cache_store_log).

So the new RPM I’ve put together has it built for 200K FDs and adds a ulimit command to all the startup scripts. I’ve also linked it against Google’s tcmalloc, which apparently made a big difference at Wikipedia when they tried it.
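
Roughly, the fix is to raise the compile-time limit and the runtime limit together. The flags and numbers below are illustrative rather than the literal spec changes:

# At build time (--with-maxfd is the Squid 2.x knob, if memory serves):
ulimit -n 204800
./configure --with-maxfd=204800 LDFLAGS=-ltcmalloc

# In the init script, before starting squid:
ulimit -n 204800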

Some useful links that might also cure insomnia:

Six Things First-Time Squid Administrators Should Know: http://www.onlamp.com/pub/a/onlamp/2004/02/12/squid.html
tcmalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html
About wikipedia’s problems: http://wiki.wikked.net/wiki/Squid_memory_fragmentation_problem

MRTG is also graphing the FD usage for the next round.

A Tale of Two PDF Products

Books in PDF (or other online format) are nothing new, but somehow reading 400 pages on my monitor doesn’t do it for me. I’ve noticed a couple of publishers offering documents in the 50-60 page range on more niche topics though. After contacting the publishers and arranging for a couple of samples, I’m here to present my findings.

I looked at two documents from Pearson and O’Reilly (links go to the catalog of PDF documents for each publisher). Both publishers coincidentally call their series “Short Cuts” and sell them for around $9.99 US (some are less). The titles I looked at were AJAX and Web Services and Lead Generation on the Web from O’Reilly, and CCVP GWGK Quick Reference Sheets and CCNP ISCW Quick Reference Sheets from Cisco Press (Pearson).

Pearson’s ShortCuts are delivered through a third party and use Adobe DRM. According to the security settings, the reader is allowed to print the document (which I did), keep the document for an unlimited amount of time, and make 6 selections (i.e. copy and paste) every 30 days. There doesn’t appear to be a limit on how much can be copied and pasted at any time. Internet access is required to download and activate the book on the computer for the first time. It appears that the document can be transferred to other computers, but because I only have the one Windows machine I wasn’t able to try.

Windows? Yes, Pearson’s DRM-enabled documents unfortunately won’t work on Linux. OS X will work, at least according to the FAQs over on Adobe’s site.

O’Reilly’s document on the other hand is an unencrypted PDF (while there’s nothing technically stopping you from copying it, the copyright, of course, prevents you from legally doing so).

In terms of the actual layout, you can get an idea of Pearson’s from their ShortCut Sampler. I couldn’t find something similar for O’Reilly. Pearson clearly has the visual edge: colour is used, there is more whitespace, and the two-column landscape layout enhances the on-screen readability. O’Reilly’s is denser; they use more of the page and have a portrait orientation.

In terms of quality and completeness, both were excellent. Each ShortCut delivered as promised and I thought my time was well spent reading.

AJAX and Web Services explored the use of AJAX to make REST and SOAP calls to search engines. From there it branched off into XSLT for transforming the output either server side or client side, and also the use of proxies to work within JavaScript’s security framework. The code (Perl and JavaScript) was well constructed and I really learned a lot from it. The way it was structured allowed for easy re-use of the code.

Lead Generation was an interesting mix of business and tech. I had initially thought it would be all sales based, but I found the content to be applicable for techs looking to market their services.

The ISCW exam refresher was also fairly good. I happen to be doing some editing on the Cisco Press ISCW exam prep book so I’m in a good position to judge the completeness of this one. The length of the PDF doesn’t allow for many diagrams or complex instruction, but as a refresher before writing the exam this is an excellent choice.

The GWGK exam refresher was also good. It made heavier use of diagrams and instructions than the ISCW guide, so for people that work with voice periodically this will be a good quick reference guide. The breadth of the topics in this one made it an interesting read, and worth printing out even after the exam is over with.

After studying the offerings from the two vendors, I realized it’s not a case of one over the other, just which one offers the subject you’re looking for. There is little overlap between the two; both have many AJAX titles, for example, but each document is so focused that they’re really offering different products. The only complaint I can find on the Pearson side is the DRM, not for any philosophical reasons, but because I’m a Linux user. On the O’Reilly side, the documents don’t look as polished (visually) as the Pearson ones, but for 60 pages that’s not an issue.

The PDF format and smaller size mean that content can come out faster. A book may take many months to write, print, and distribute; these documents can be turned around much more quickly. Because of the tight focus, the author can really teach something, or solve a troubling problem. At less than ten bucks a pop with immediate delivery, these downloadable ebooks are well worth it.

I’m Going to the MySQL Conference… Or Not

The folks over at Proven Scaling offered to send three people to the MySQL conference in Santa Clara at the end of the month. I put my name in and won!

Unfortunately my wife’s at a conference herself those days, so I can’t go. Major bummer!

I think it’s wonderful that a company is offering to send people to a conference (airfare and everything… One of the other winners is from Germany!). I think there are a lot of people out there with the right attitude and good ideas but no means to get to events like this.

Flow-tools, Flowscan and Fedora

I’m doing some work with NetFlow accounting. FlowScan parses the data captured by flow-tools. However, FlowScan needs Cflow.pm, which has to be specially compiled to work with flow-tools because it was made for CAIDA’s now-defunct cflowd.

If you don’t compile Cflow.pm using the flow-tools library, you get something like “Invalid index in cflowd flow file: 0xCF100103! Version 5 flow-export is required with *all* fields being saved.”

All the instructions I found on the Internet tell you to compile Cflow.pm out of the contrib directory of flow-tools. However, flow-tools and flow-tools-devel come as Fedora packages, so that’s not an option.

Here are some easy instructions to fix this (a consolidated sketch follows the list).

  1. Download Cflow.pm (link goes to index page)
  2. Untar wherever you feel like
  3. Edit Makefile.PL, look for the find_flow_tools subroutine
  4. Just before the if ("$libdir") { line, add $libdir="-L/usr/lib";
  5. perl Makefile.PL, you’ll see Found flow-tools…
  6. make; make install
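
Put together, the whole dance looks something like this; the tarball version is whatever you downloaded, and the $libdir line goes inside find_flow_tools as described in step 4:

tar xzf Cflow-1.053.tar.gz    # or whatever version you grabbed
cd Cflow-1.053
# edit Makefile.PL: in find_flow_tools, just before the if ("$libdir") { line, add
#   $libdir = "-L/usr/lib";
perl Makefile.PL              # should report "Found flow-tools..."
make
make install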