Sean’s Obsessions

14 Apr

A few notes on squid as a reverse proxy

b5media uses a handful of web servers fronted by a load balancer. We’re getting quite busy in terms of traffic, so I finally got around to putting a reverse proxy in front of the farm. A reverse proxy accepts the request from the user and checks its cache for the result. If no answer is found, the proxy asks a back end web server for the same page the user asked for, saves it, and passes it back to the client. Squid is one such piece of software; it is used on Wikipedia and other sites as both a forward proxy (caching the sites your users visit) and a reverse proxy.
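To make the hit/miss path concrete, here is a toy sketch in Python. It is not how squid is implemented; the class name, the `fetch_from_backend` callback, and the fixed five-minute TTL are all made up for illustration, and real proxies also honour HTTP cache headers, which this ignores.

```python
import time
from typing import Callable, Dict, Tuple


class TinyReverseProxyCache:
    """Toy model of a caching reverse proxy's lookup path (hypothetical, not squid)."""

    def __init__(
        self,
        fetch_from_backend: Callable[[str], bytes],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_backend  # stands in for the backend web server
        self._ttl = ttl_seconds           # how long a cached copy stays fresh
        self._clock = clock               # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (expiry, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in the cache: answer without touching the backend.
            self.hits += 1
            return entry[1]
        # Missing or expired: ask the backend, save the result, pass it on.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body
```

Calling `get("/style.css")` twice in a row goes to the backend only once; the second call is a hit. Every hit is a request the backend servers never see.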

After swinging the traffic over to the proxies I was ecstatic. Just by caching images, CSS, and JavaScript for a few minutes each I was able to serve 40% of the hits from the proxies instead of going to a backend server, and knock 20% off the CPU usage of the servers. Then one of our celeb blogs got a huge wave of traffic that killed the proxies, so I ended up pulling them out of service and going back to the old method.

Squid is an amazing piece of software used on much bigger sites than ours, so I knew the problem had to be the way I had it configured.

I’m pretty sure squid crapped out on us because it ran out of file descriptors. For some strange reason, squid decides how many FDs it’s going to use *at compile time*. When I rebuilt the Fedora RPMs I didn’t do anything special, so it was using the default of 1024. At two FDs per connection (one for the inbound, one for the outbound) plus whatever is needed for pulling cache files from disk, we ran out in a hurry once we got busy and couldn’t make any more socket connections.
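The arithmetic is worth spelling out. The sketch below is a rough upper bound only: `reserved_fds` is a made-up knob for the descriptors spent on cache files and logs, and the example value of 100 is an assumption, not something I measured. It also prints the limit the operating system gives the current process (the ulimit side), which is separate from squid’s compiled-in number.

```python
import resource


def max_concurrent_connections(
    fd_limit: int, fds_per_connection: int = 2, reserved_fds: int = 0
) -> int:
    """Rough ceiling on simultaneous proxied connections for a given FD limit.

    reserved_fds is a hypothetical allowance for disk cache files, logs, etc.
    """
    usable = max(fd_limit - reserved_fds, 0)
    return usable // fds_per_connection


if __name__ == "__main__":
    # Default build: 1024 FDs, two per connection (client side + backend side).
    print(max_concurrent_connections(1024))                    # 512
    # Same limit with an assumed 100 FDs tied up in cache files.
    print(max_concurrent_connections(1024, reserved_fds=100))  # 462
    # What the OS allows this process (soft, hard); Unix only.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("process FD limit: soft=%s hard=%s" % (soft, hard))
```

So with the default build the proxy tops out at around 500 simultaneous connections at best, which is not much for a traffic spike.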

The big reason I missed it earlier was that I had “cache_log none” set. cache_log is roughly squid’s equivalent of Apache’s error_log, not the record of cache hits and misses I thought it was (that’s cache_store_log), so I never saw squid’s error messages.

So the new RPM I’ve put together has squid built for 200K FDs and has a ulimit command in all the startup scripts. I’ve also linked it against Google’s tcmalloc, which apparently made a big difference at Wikipedia when they tried it.

Some useful links that might also cure insomnia:

Six Things First-Time Squid Administrators Should Know: http://www.onlamp.com/pub/a/onlamp/2004/02/12/squid.html
tcmalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html
About Wikipedia’s problems: http://wiki.wikked.net/wiki/Squid_memory_fragmentation_problem

MRTG is also graphing the FD usage for the next round.
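One way to feed a number like this to MRTG is an external script, which MRTG expects to print four lines: two values, an uptime string, and a target name. The sketch below is an assumption about how it could be done, not the script actually running here. It counts entries in `/proc/<pid>/fd`, so it is Linux-only and needs permission to read the target process’s directory; the `squid-fds` name is made up, and the MRTG target would need the `gauge` option since this is a current value rather than a counter.

```python
import os
import sys


def count_open_fds(pid: int) -> int:
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def mrtg_report(value_in: int, value_out: int, uptime: str, name: str) -> str:
    """Format the four lines MRTG reads from an external script."""
    return "\n".join([str(value_in), str(value_out), uptime, name]) + "\n"


if __name__ == "__main__":
    # Pass the squid PID as the first argument; falls back to this
    # script's own PID so it can be tried without a running squid.
    args = sys.argv[1:]
    pid = int(args[0]) if args and args[0].isdigit() else os.getpid()
    fds = count_open_fds(pid)
    # MRTG wants two values; the same count is reported for both.
    sys.stdout.write(mrtg_report(fds, fds, "unknown", "squid-fds"))
```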

2 Responses to “A few notes on squid as a reverse proxy”

  1. Squid Proxying and its Effect on b5media » Technology, Blogging and New Media Says:

    […] I love not being the king hippo of tech at b5media. Slowly I’ve been able to build a team of well-qualified people and Sean Walberg is one of them. Incidentally, we’re going to hit the community real hard with some new faces in tech that I’m sure will generate some buzz soon - but that’s for soon down the road. […]

  2. Chrispian Says:

    Sean,

    Did you guys patch WP to correct the header problem that prevents squid from caching WP pages? Matt from WP suggested this as one of the projects from the Summer of Code for Wordpress. Is it caching the WP pages, or just the static content? I’m doing research as I get ready to do ours and I’ve seen some talk of squid not caching WP pages because of a header problem.

© 2008 Sean’s Obsessions