Sep 12 2002


                    LINUX NEWS
            http://www.Cramsession.com
          September 12, 2002 - Issue #98


TABLE OF CONTENTS

1) Sean’s Notes

2) Linux News

Cool Devices Running Linux
KOffice 1.2 Released
A Look At Ximian
Revolution OS

3) Linux Resources

101 Uses of SSH
MySQL HA Clusters
RPM One Liners
Run Your Own Pizza Joint
Graphing ISN Values

4) App o’ the Week



1) Sean’s Notes

It’s only been about two weeks since I installed Spam Assassin, and my “caughtspam” folder has over 700 messages in it. Other than a couple of small problems with some mailing lists, the only false positive was from Network Solutions reminding me I should renew my domain. Lucky I caught that, it reminded me that I wanted to transfer to another registrar and save $25US.

In terms of false negatives, it hasn’t been too bad. I still get the odd spam, but nowhere near what I was getting before. In short, Spam Assassin has made me pretty happy.

The thing, though, is that there is some noticeable overhead when using this mail filter. But how much? How can I test the impact of running Spam Assassin on my measly P-233?

The first thing I’ll need is some fodder for testing. My “caughtspam” folder, and my inbox should provide enough:

$ grep "^From:" caughtspam $MAIL | wc -l
1557

All I’m doing here is looking for the From: header at the beginning of a line (that’s the ^ doing its work) from both my caughtspam folder, and my spool file ($MAIL expands to /var/spool/mail/sean).

1,557 messages should give enough variety, and is about 31MB of data to crunch.

One handy utility that will help here is called “formail”, which is part of procmail.

formail -s program < file

will run “program” on each message within the file mailbox. Pretty handy, eh?

So, I could run:

time formail -s spamassassin -P > /dev/null < bigmbox

where “bigmbox” is the combination of my spool and the caughtspam folder, then divide the size of the mailbox by the elapsed time to arrive at a throughput figure. (Since I don’t have an hour or so to wait around for it to finish, I’m running it on a much smaller mailbox.)

I get around 0.8 KBytes/sec. What does that tell me, though? Basically, if mail is coming in faster than 0.8 KBytes/sec (about 6.4 Kbit/sec), then my machine will fall behind, and needs some speeding up.

Email administrators tend to think in terms of messages per second rather than raw throughput, so let’s try to get something that’ll make sense. Dividing the number of messages by the elapsed time, I get about 0.31 messages per second, or one message every 3.2 seconds. Any more (based on the distribution within my sample), and I need to beef up my box, or be prepared for mail delays.
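The arithmetic above can be sketched as a small shell function. This is only a sketch: the function name “rates” is made up for illustration, and the figures fed to it (100 messages, 256,000 bytes, 320 seconds) are hypothetical numbers rather than actual measurements from my box.

```shell
#!/bin/sh
# rates MESSAGES BYTES SECONDS   (hypothetical helper, not a standard tool)
# Prints throughput in KBytes/sec, messages/sec, and seconds per message.
# The three arguments are whatever you measured with grep/wc and time.
rates() {
    awk -v n="$1" -v b="$2" -v s="$3" 'BEGIN {
        printf "%.2f KBytes/sec\n", b / 1024 / s
        printf "%.2f messages/sec\n", n / s
        printf "%.1f seconds/message\n", s / n
    }'
}

# Made-up numbers, chosen only to show the output format
rates 100 256000 320
```

With those made-up inputs it prints 0.78 KBytes/sec, 0.31 messages/sec and 3.2 seconds/message.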

Now that I’ve calculated that it takes about 3.2 seconds to process the average message, I’m interested in finding out how much my data deviates from that average. Remembering what I can from statistics, the variance and standard deviation tell me this. You can read up on it (like I had to) here:

http://mathworld.wolfram.com/StandardDeviation.html

Basically, for every value, subtract the average from it and square the result, then add all of those squares up. Finally, divide by the number of samples. That gives you the variance. The square root of that is the standard deviation. The higher the number, the more the samples vary from the average. Sparing you more boring statistics, if your data is roughly normally distributed, about 68% of your values fall within one standard deviation of your average, and about 95% within two.
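Written out as formulas, the recipe above looks roughly like this. The symbols are just shorthand chosen for this sketch: x_i is the i-th sample, N is the number of samples, \mu is the average, \sigma^2 is the variance and \sigma is the standard deviation. The three sample values in the worked example are made up.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}

% Worked example with made-up samples x = 2, 3, 4 (so N = 3):
% \mu      = (2 + 3 + 4) / 3 = 3
% \sigma^2 = ((2-3)^2 + (3-3)^2 + (4-3)^2) / 3 = 2/3 \approx 0.67
% \sigma   = \sqrt{2/3} \approx 0.82
```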

I’m going to have to do all the statistics in perl, since awk was making my head spin. Feel free to send me your awk interpretations.

To do this, I’m going to rely on some of perl’s command line options. -n saves me some typing by wrapping it in a loop. -e lets me put the program on the command line. So,

perl -ne 'print'

is interpreted by perl as

while (<>) { print; }

All that will do is print whatever comes in. However, I’d like to pull out my performance metrics from “time”, and run a variance calculation on it:

formail -s time spamassassin -P < bigmbox |& \
perl -ne '/^(\d+\.\d+)user/ || next; \
$sum += (($1-3.2)*($1-3.2)); \
$tot++; \
END { print $sum/$tot}'

The backslashes (\) are there to break up the line. What this script does is run formail on each message, but times each invocation of spamassassin. The output is pumped through a perl script, which:

First, makes sure the line starts with

nn.nnuser

where nn.nn is the actual time. That is saved in the $1 variable, since I put parentheses around that part of the regular expression. The regex itself is

\d+\.\d+

which means “look for one or more (+) digits (\d), followed by a period (\.), followed by one or more digits (\d+)”. Then, I take that number, subtract my 3.2 second average from it, and square the result, adding it to my $sum counter. I also keep track of the number of items in $tot.

The END { … } block gets executed just before the whole script is finished. It simply prints out my sum divided by the total, which is the variance. The square root of that is the standard deviation.
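For readers who prefer awk, the same calculation might look something like the sketch below. Treat it as a sketch only: the function name “user_time_stats” and the sample timing lines are made up, and it assumes the timing lines start with “nn.nnuser” as described above. Unlike the perl one-liner, it works out the average itself instead of hard-coding 3.2.

```shell
#!/bin/sh
# user_time_stats (hypothetical helper): read mixed output on stdin, keep
# only the lines that start with "nn.nnuser", and report statistics on the
# user CPU times found there.
user_time_stats() {
    awk '
        /^[0-9]+\.[0-9]+user/ {
            t[n++] = $1 + 0          # "3.20user" converts to the number 3.2
        }
        END {
            if (n == 0) { print "no timing lines found"; exit 1 }
            for (i = 0; i < n; i++) sum += t[i]
            mean = sum / n
            for (i = 0; i < n; i++) {
                d = t[i] - mean      # distance from the average
                ss += d * d          # sum of squared distances
            }
            var = ss / n             # divide by the number of samples
            printf "n=%d mean=%.2f variance=%.2f stddev=%.2f\n", n, mean, var, sqrt(var)
        }'
}

# Made-up timing lines mixed with ordinary message text
printf '%s\n' 'Subject: hello' \
    '2.00user 0.10system 0:02.20elapsed' \
    'some body text' \
    '3.00user 0.10system 0:03.20elapsed' \
    '4.00user 0.12system 0:04.30elapsed' | user_time_stats
```

In real use the input would be the output of the formail pipeline rather than the printf above.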

On some smaller mailboxes of personal correspondence, I’m finding that one standard deviation is around .6 of a second, so 95% of my messages should get processed between 2 and 4.4 seconds. Just for fun, I’m letting it run overnight on a whole bunch of mail, but that won’t make it in for this week.
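The 2 to 4.4 second range is just the average plus or minus two standard deviations. A minimal sketch of that arithmetic follows; the function name “range95” is made up, and the 95% figure only holds if the timings are roughly normally distributed.

```shell
#!/bin/sh
# range95 MEAN STDDEV   (hypothetical helper)
# Prints mean - 2*stddev and mean + 2*stddev, the band that holds roughly
# 95% of samples when the data is close to normally distributed.
range95() {
    awk -v m="$1" -v s="$2" 'BEGIN {
        printf "%.1f to %.1f seconds\n", m - 2 * s, m + 2 * s
    }'
}

# The average and standard deviation quoted in the text
range95 3.2 0.6
```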

What started out as a brief report on my spam situation ended up being a short lesson on performance measurement and statistics. My apologies to both the people who hate stats and those who are keenly in tune with them; I’m sure I’ve offended both of these groups with my treatment of the subject!

However, when putting in a CPU or disk intensive program such as an email filter, careful attention must be paid to the impact it will have on the rest of the system. Some simple shell scripting, as we saw, can give you a rough estimate of how your box will fare.

Long live the Penguin,

Sean mailto:swalberg@cramsession.com


2) Linux News


Cool Devices Running Linux

People keep asking, “All this stuff about Embedded Linux ‘taking off like a rocket’ sounds great, but are any companies really shipping Embedded Linux in real products? And, if so, when are some of these Embedded Linux-based products going to start hitting the market?” Well, here’s a list!

http://www.linuxdevices.com/articles/AT4936596231.html


KOffice 1.2 Released

Congrats to the KOffice team on the 1.2 release. Improvements include a better spell checker and thesaurus, better PDF support, and better file import/export filters.

http://www.koffice.org/announcements/announce-1.2.phtml


A Look At Ximian

Ximian Inc. is pouring a lot of money into the GNOME desktop, and various applications. This article is the chronicle of a reporter’s visit to the offices, and has some information on Ximian’s involvement with the Open Office project.

http://www.linuxandmain.com/modules.php?name=News&file=article&sid! 1


Revolution OS

You may have heard about the documentary about Linux. Well, here’s the first eight minutes and the trailer. Now if only they’d offer it in a format I can view on my Linux workstation…

http://www.ifilm.com/ifilm/product/film_multimedia/0,4470,2419320,0 0.html


3) Linux Resources


101 Uses of SSH

Well, not really 101 uses, but there are a few good ones, such as how to get passwordless authentication running, and simple port forwarding over a secure tunnel.

http://www.linuxjournal.com/article.php?sidD13


MySQL HA Clusters

Though it’s pretty straightforward to make a web application highly available through web farms, most of them still depend on a single point of failure – the database. Here’s a project that promises to fix that.

http://mysql-ha.sourceforge.net/


RPM One Liners

RPM is a powerful package management tool. If all you’re using is “rpm -i” to install a package, then you’re missing out on a lot! Keep this link handy for the next time you think you need to do something with packages.

http://www.linuxlaboratory.org/modules.php?op=modload&name=Sections &file=index&req=viewarticle&artid


Run Your Own Pizza Joint

I’m a sucker for silly games. This one is a simulation of a pizza shop. It’s made for Windows, but runs under WINE, and the author is making a native port with WineLib.

http://pizza-business.sourceforge.net/


Graphing ISN Values

About a year ago, someone did a study where he used some chaos theory to draw graphs of the randomness of various operating systems’ TCP initial sequence numbers. He’s done an update, showing who has improved and who hasn’t.

http://lcamtuf.coredump.cx/newtcp/


4) App o’ the Week

“AutoRPM is a Perl program that automates RPM installation. It is designed to be run from cron nightly and run interactively as necessary. By default, every night, it will check for official Red Hat updates for your system. However, you can modify the configuration file to do much more… like automatically install the same RPMs on a cluster of machines.”

http://www.autorpm.org/


(C) 2002 BrainBuzz.com, Inc. All Rights Reserved.

