2002 09 12

                    LINUX NEWS
          September 12, 2002 - Issue #98


1) Sean’s Notes

2) Linux News

Cool Devices Running Linux
KOffice 1.2 Released
A Look At Ximian
Revolution OS

3) Linux Resources

101 Uses of SSH
MySQL HA Clusters
RPM One Liners
Run Your Own Pizza Joint
Graphing ISN Values

4) App o’ the Week

1) Sean's Notes

It's only been about two weeks since I installed SpamAssassin,
and my "caughtspam" folder has over 700 messages in it.  Other
than a couple of small problems with some mailing lists, the
only false positive was from Network Solutions reminding me I
should renew my domain.  Lucky I caught that; it reminded me
that I wanted to transfer to another registrar and save US$25.

In terms of false negatives, it hasn't been too bad.  I still
get the odd spam, but nowhere near what I was getting before.
In short, SpamAssassin has made me pretty happy.

The thing, though, is that there is some noticeable overhead
when using this mail filter.  But how much?  How can I test the
impact of running SpamAssassin on my measly P-233?

The first thing I'll need is some fodder for testing.  My
"caughtspam" folder and my inbox should provide enough:

$ grep "^From:" caughtspam $MAIL | wc -l

All I'm doing here is looking for the From: header at the
beginning of a line (that's the ^ doing its work) in both my
caughtspam folder and my spool file ($MAIL expands to the path
of the spool file), then counting the matching lines with wc -l.

1,557 messages should give enough variety, and they add up to
about 31MB of data to crunch.

One handy utility that will help here is called "formail",
which is part of procmail.

formail -s program < file

will run "program" once for each message in the mailbox "file".
Pretty handy, eh?

So, I could run:

time formail -s spamassassin -P > /dev/null < bigmbox

where "bigmbox" is the combination of my spool and the
caughtspam folder, divide the mailbox size by the elapsed time,
and arrive at some sort of throughput figure.  (Since I don't
have an hour or so to wait around for it to finish, I'm running
it on a much smaller mailbox.)

I get around 0.8 KBytes/sec.  What does that tell me, though?
Basically, if mail is coming in faster than 0.8 KBytes/sec
(about 6.4 Kbits/sec), then my machine will fall behind and
needs some speeding up.

Email administrators tend to think in terms of messages per
second rather than raw throughput, so let's try to get something
that'll make sense.  Dividing the message count by the elapsed
time, I get about 0.31 messages per second, or one message every
3.2 seconds.  Any more than that (based on the distribution
within my sample), and I need to beef up my box or be prepared
for mail delays.
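
For anyone who wants to redo this arithmetic on their own box,
here is a minimal Python sketch.  The function name and the
sample figures (a 256 KB mailbox holding 100 messages that takes
320 seconds to filter) are assumptions, picked only because they
reproduce the numbers above; they are not real measurements.

```python
def throughput(total_bytes, message_count, elapsed_seconds):
    """Return (KBytes/sec, Kbits/sec, messages/sec, seconds/message)."""
    kbytes_per_sec = total_bytes / 1024 / elapsed_seconds
    kbits_per_sec = kbytes_per_sec * 8          # 8 bits per byte
    msgs_per_sec = message_count / elapsed_seconds
    secs_per_msg = elapsed_seconds / message_count
    return kbytes_per_sec, kbits_per_sec, msgs_per_sec, secs_per_msg


# Hypothetical sample run: 262144 bytes, 100 messages, 320 seconds.
kb, kbit, mps, spm = throughput(262144, 100, 320.0)
print(f"{kb:.1f} KBytes/sec ({kbit:.1f} Kbits/sec)")
print(f"{mps:.2f} messages/sec, one message every {spm:.1f} seconds")
```

Plug in the size of your own test mailbox, its message count,
and the "real" time reported by the time command.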

Now that I've calculated that it takes about 3.2 seconds to
process the average message, I'm interested in finding out how
much my data deviates from the average.  Remembering what I can
from statistics, the variance and standard deviation tell me
this.  You can read up on it (like I had to) here:


Basically, for every value, subtract the average from it, square
the result, then add them all up.  Finally, divide by the number
of samples.  That gives you the variance.  The square root of
that is the standard deviation.  The higher the number, the more
the samples vary from the average.  Sparing you more boring
statistics, if your data is roughly normally distributed, about
68% of your values fall within one standard deviation of your
average, and about 95% within two.
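
The recipe above can be sketched in a few lines of Python.  The
function names and the five sample timings are made up for
illustration, and the "two standard deviations" range only means
roughly 95% if the timings are close to normally distributed.

```python
import math


def variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def std_dev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


def two_sigma_range(mean, sd):
    """Range covering about 95% of normally distributed values."""
    return mean - 2 * sd, mean + 2 * sd


# Hypothetical per-message times in seconds (their mean is 3.2).
times = [2.6, 3.8, 3.2, 2.4, 4.0]
print(f"variance: {variance(times):.2f}")
print(f"standard deviation: {std_dev(times):.2f}")

low, high = two_sigma_range(3.2, 0.6)
print(f"about 95% of messages: {low:.1f} to {high:.1f} seconds")
```

Python's standard library also has statistics.pvariance and
statistics.pstdev, which compute the same "divide by the number
of samples" figures.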

I'm going to have to do all the statistics in perl, since awk
was making my head spin.  Feel free to send me your awk
version.
To do this, I'm going to rely on some of perl's command line
options.  -n saves me some typing by wrapping my program in a
loop that reads the input a line at a time.  -e lets me put the
program on the command line.  So,

perl -ne 'print'

is interpreted by perl as

while (<>) {
    print;
}

All that will do is print whatever comes in.  However, I'd like
to pull out my performance metrics from "time", and run a
variance calculation on it:

formail -s time spamassassin -P < bigmbox |& \
   perl -ne '/^(\d+\.\d+)user/ || next;   \
        $sum += (($1-3.2)*($1-3.2));      \
        $tot++;                           \
   END { print $sum/$tot}'

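The \'s are there to break up the line, and |& sends the timing
report (which "time" writes to standard error) down the pipe
along with everything else.  What this script does is have
formail run spamassassin on each message, timing each
invocation.  The output is pumped through a perl script, which: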

First, makes sure the line starts with

nn.nnuser

where nn.nn is the actual time.  That is saved in the $1
variable, since I put parentheses around that part of the
regular expression.  The regex inside the parentheses is

\d+\.\d+

which means "look for one or more (+) digits (\d), followed by
a period (\.), followed by one or more digits (\d+)".  Then, I
take that number, subtract the 3.2 second average from it, and
square the difference, adding it to my $sum counter.  I also
keep track of the number of items in $tot.

The END { ... } block gets executed just before the whole script
is finished.  It simply prints out $sum / $tot, which is the
variance.
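
The same match-and-accumulate idea can be sketched in Python.
This is not a translation of the one-liner: it computes the
average from the data instead of hard-coding 3.2, and the sample
lines are invented to look like the report GNU time prints.  The
function names are made up too.

```python
import re

# Same pattern as the perl one-liner: digits, a period, digits,
# then the literal word "user" at the start of the line.
USER_TIME = re.compile(r"^(\d+\.\d+)user")


def user_times(lines):
    """Collect the user CPU time from every line that has one."""
    times = []
    for line in lines:
        match = USER_TIME.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def mean_and_variance(times):
    """Return (mean, variance), dividing by the number of samples."""
    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    return mean, var


# Invented sample input: mail text mixed with timing reports.
sample = [
    "From: someone@example.com",
    "2.60user 0.10system 0:02.75elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k",
    "0inputs+0outputs (300major+500minor)pagefaults 0swaps",
    "3.80user 0.12system 0:03.95elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k",
]

mean, var = mean_and_variance(user_times(sample))
print(f"mean {mean:.2f}s, variance {var:.2f}")
```

To feed it real data, replace the sample list with sys.stdin and
pipe the combined output of formail into the script.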

On some smaller mailboxes of personal correspondence, I'm
finding that one standard deviation (the square root of the
variance) is around 0.6 of a second, so about 95% of my messages
should get processed in between 2 and 4.4 seconds (3.2 plus or
minus two standard deviations).  Just for fun, I'm letting it
run overnight on a whole bunch of mail, but that won't make it
in for this week.

What started out as a brief report on my spam situation ended up
being a short lesson on performance measurement and statistics.
My apologies to both the people who hate stats and those who
are keenly in tune with them.  I'm sure I've offended both of
these groups with my treatment of the subject!

However, when putting in a CPU or disk intensive program such as
an email filter, careful attention must be paid to the impact it
will have on the rest of the system.  Some simple shell
scripting, as we saw, can give you a rough estimate of how your
box will fare.

Long live the Penguin,


2) Linux News

Cool Devices Running Linux

People keep asking, "All this stuff about Embedded Linux
'taking off like a rocket' sounds great, but are any companies
really shipping Embedded Linux in real products? And, if so,
when are some of these Embedded Linux-based products going to
start hitting the market?" Well, here's a list!


KOffice 1.2 Released

Congrats to the KOffice team on the 1.2 release. Improvements
include a better spell checker and thesaurus, better PDF
support, and better file import/export filters.


A Look At Ximian

Ximian Inc. is pouring a lot of money into the GNOME desktop,
and various applications. This article is the chronicle of a
reporter's visit to the offices, and has some information on
Ximian's involvement with the Open Office project.


Revolution OS

You may have heard about the documentary about Linux. Well,
here's the first eight minutes and the trailer. Now if only
they'd offer it in a format I can view on my Linux workstation...


3) Linux Resources

101 Uses of SSH

Well, not really 101 uses, but there are a few good ones, such
as how to get passwordless authentication running, and simple
port forwarding over a secure tunnel.


MySQL HA Clusters

Though it's pretty straightforward to make a web application
highly available through web farms, most often the web servers
still depend on a single point of failure -- the database.
Here's a project that promises to fix that.


RPM One Liners

RPM is a powerful package management tool. If all you're using
is "rpm -i" to install a package, then you're missing out on a
lot! Keep this link handy for the next time you think you need
to do something with packages.


Run Your Own Pizza Joint

I'm a sucker for silly games. This one is a simulation of a
pizza shop. It's made for Windows, but runs under WINE, and the
author is making a native port with WineLib.


Graphing ISN Values

About a year ago, someone did a study where he used some chaos
theory to draw graphs of the randomness of various operating
systems' TCP initial sequence numbers. He's done an update,
showing who has improved and who hasn't.


4) App o' the Week

"AutoRPM is a Perl program that automates RPM installation. It
is designed to be run from cron nightly and run interactively as
necessary. By default, every night, it will check for official
Red Hat updates for your system. However, you can modify the
configuration file to do much more... like automatically install
the same RPMs on a cluster of machines."


(C) 2002 BrainBuzz.com, Inc. All Rights Reserved.

          This message is from CramSession
