Sean’s Obsessions

Sean Walberg’s blog

Linux Newsletter

A few people have written asking for copies of old Linux newsletters. They’re all available at http://ertw.com/~sean/news/. I’ve also added links so you can download the whole shebang as one file (1.4M, MBOX format), or as individual .txt or .html pages.

While I’m on the subject of the Linux news archives…

I’m Lazy, in that I hate doing things more than once. When I started writing the Linux newsletter, I wanted to keep copies of the newsletter as it was distributed (the copies that I submit, which I also have, get edited). At that point, I was mailing publishers and the like to get them to send me review copies of stuff so I could write about it. Having a web archive is a good way to show them that I really am writing a newsletter. As time went on, I needed a way to refer back to old issues, so the archive became more important (Cramsession didn’t start archiving my newsletter until mid-2002).

So, what I did was write a PHP script that would read my mailbox and print out the correct message. It was pretty simple: you’d give it a date string, it would read through the file until it saw that string, and then it would start spitting out the content. You can see the code here: http://ertw.com/~sean/newsletter.phps
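For illustration only, here is a minimal Python sketch of that scan-until-match idea. It is not the PHP script linked above; the function name, the sample mailbox, and the choice to stop at the next mbox "From " separator are all assumptions made for the example.

```python
def extract_issue(lines, date_string):
    """Collect the message text that follows the first line containing date_string."""
    found = False
    body = []
    for line in lines:
        if not found:
            # Still scanning: skip everything until the date string shows up.
            found = date_string in line
            continue
        if line.startswith("From "):
            # An mbox "From " separator marks the start of the next message.
            break
        body.append(line)
    return "".join(body)


# Hypothetical two-message mailbox, just enough to exercise the function.
SAMPLE_MBOX = (
    "From news@example.com Mon Jan  6 00:00:00 2003\n"
    "Date: Mon, 6 Jan 2003\n"
    "\n"
    "Issue one text\n"
    "From news@example.com Mon Jan 13 00:00:00 2003\n"
    "Date: Mon, 13 Jan 2003\n"
    "\n"
    "Issue two text\n"
)

if __name__ == "__main__":
    lines = SAMPLE_MBOX.splitlines(keepends=True)
    print(extract_issue(lines, "Date: Mon, 6 Jan 2003"))
```

Note that every request walks the file from the top, which is exactly why this approach gets slower as the mailbox grows.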

There were a few problems with this. As more issues were written, the script had to do more work to print out the list. On each hit, it would read through the file once to find all the issues (for the index), and again to find the right one. I knew this wasn’t efficient, but honestly, at the time I didn’t think the newsletter would last long enough for it to become a problem.

Another thing is that I wanted this archive to be search engine friendly. Some search engines skip or mangle URLs with a query string, that is, something like

http://mysite/newsletter?issue=1

What I wanted was something like

http://mysite/newsletter/1

That’s done through the PATH_INFO variable in Apache:


ForceType application/x-httpd-php

This line tells Apache that any URL with /~sean/newsletter is to be treated as a PHP script. The script above was also called “newsletter”. Thus, if you asked for

http://ertw.com/~sean/newsletter/abc

it would run the PHP script called “newsletter”, and put the rest of the URL (/abc) into the $PATH_INFO environment variable. I could then read that, and I’d know what issue to provide. If it was empty, I knew I was on the index page.
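The dispatch logic is small enough to sketch. The following is a Python illustration, not the original PHP; the function name is hypothetical, and it assumes the script runs in a CGI-style environment where the web server exports PATH_INFO.

```python
import os


def issue_from_path_info(path_info):
    """Return the requested issue identifier, or None for the index page."""
    # "/abc" becomes "abc"; an empty or missing PATH_INFO means the index.
    issue = (path_info or "").strip("/")
    return issue or None


if __name__ == "__main__":
    # In a CGI-style setup the server places the trailing path in PATH_INFO.
    requested = issue_from_path_info(os.environ.get("PATH_INFO", ""))
    if requested is None:
        print("show the index page")
    else:
        print("show issue", requested)
```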

That was all well and good, until Cramsession changed the layout of the banner just enough to break my program. I tweaked the script a bit, and it worked. They changed it again, and I tweaked it again. Eventually I gave up and decided to change my approach.

My web server is a mere K6-233 with a slow disk, so reading a megabyte file on every hit is too slow. What I wanted at this point was some way to generate individual pages from the mbox in a consistent manner. Furthermore, if I wanted to change the layout of the page (e.g. when I added the Amazon links), I should be able to apply the changes to all pages, not just the ones that come after. Template engines to the rescue!

An explanation of how I solved the problem will have to wait, but if you’re curious, I used Template::Toolkit and a bit of Perl magic to do it. http://www.template-toolkit.org/ is an amazing piece of software. If you need a template engine that works with Perl, I’d highly recommend it. Having worked a bit with it and Mason, I prefer TT2 (not that Mason isn’t great itself).
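In the meantime, the general idea of turning an mbox into static pages through a template can be sketched in a few lines. This is a Python stand-in using the standard library's mailbox module and string.Template, not the Template Toolkit and Perl code I actually used; the page layout, the function name, and the assumption that every message is single-part plain text are all made up for the example.

```python
import html
import mailbox
from pathlib import Path
from string import Template

# Hypothetical page layout. Change it and re-run to regenerate every page.
PAGE = Template(
    "<html><head><title>$subject</title></head>\n"
    "<body><pre>$body</pre></body></html>\n"
)


def render_archive(mbox_path, out_dir):
    """Write one HTML page per message in the mbox and return the file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path))
    try:
        for number, message in enumerate(box, start=1):
            # Assumes single-part plain-text messages, as a newsletter usually is.
            subject = str(message["Subject"] or f"Issue {number}")
            page = PAGE.substitute(
                subject=html.escape(subject),
                body=html.escape(message.get_payload()),
            )
            name = f"{number}.html"
            (out / name).write_text(page, encoding="utf-8")
            written.append(name)
    finally:
        box.close()
    return written
```

Because the pages are generated ahead of time, the web server only ever serves small static files, and a layout change is just a matter of editing the template and running the generator again.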
