2001 08 02

                    LINUX NEWS
              Thursday, August 2, 2001
    Read By Over 6,000 Linux Enthusiasts Weekly!


1) Sean’s Notes

2) Linux News

Mandrake Pulls Off IPO
City of Largo Goes Live With KDE 2.1.1
Sparks Fly at Open Source Debate
KOffice Release Candidate

3) Linux Resources

Programming, Graphics, and OS Tips
Diamond Rio and Linux
The Tao of Programming

4) App o’ the week


1) Sean's Notes
One of the more common questions on the Perl board is how
to get a script to fetch a web page.  Unsurprisingly, it's
a cinch to do in Perl, as long as you have the right module.

LWP is a "Library for WWW access in Perl".  And who said
Unix jocks couldn't come up with good names?  Actually, it's
a pretty bad name, because you can use the same set of
modules to access HTTPS, FTP, NNTP, and a few other
protocols.  A fine example of one of the three virtues of a
programmer.


If your system doesn't come with LWP installed, hop onto
CPAN and install it.

# perl -MCPAN -e shell

cpan> install Bundle::LWP
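
Not sure whether LWP is already there?  Here is a small sketch
that tries to load the LWP module at run time and reports its
version.  The messages it prints are just illustrative wording.

```perl
#!/usr/bin/perl -w
use strict;

# require (unlike use) runs at run time, so wrapping it in eval
# lets us catch a missing module instead of dying at compile time.
my $have_lwp = eval { require LWP; 1 } ? 1 : 0;

if ($have_lwp) {
    # VERSION is inherited from UNIVERSAL and returns $LWP::VERSION
    print "LWP ", LWP->VERSION, " is installed\n";
} else {
    print "LWP is missing; install Bundle::LWP from the CPAN shell\n";
}
```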

With that, let's start our program.

---- cut here ----
#!/usr/bin/perl -w
use LWP::UserAgent;

# One user agent required per program
my $ua = LWP::UserAgent->new;
# The request object says what we want
my $boards = HTTP::Request->new(GET => "http://boards.brainbuzz.com");
# Pass the request object to the user agent to get the page
my $data = $ua->request($boards);
# $data is an HTTP::Response object, data is in content()
print $data->content();
---- cut here ----

The first line of the script starts with "#!" (pronounced
"hash bang", or sometimes "she-bang"), which tells the system
that the rest of the line specifies the interpreter to run it
with.  This time, it's /usr/bin/perl -w, the Perl interpreter.

Those who have seen me write Perl code in previous newsletters
will notice I've started using the -w flag.  This turns on
warnings, which makes the interpreter point out potential
errors.  For newbies and experts alike, it is very helpful for
finding bugs that lurk in your code.  Next, I bring in the
LWP::UserAgent module, which gives me access to all the LWP
functionality.

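To see what -w actually buys you, here is a small sketch that
deliberately uses a variable that was never given a value.  It
uses the "use warnings" pragma (available since Perl 5.6), which
has the same effect as -w for the code in this file, and it
traps the warning in a handler so you can inspect it.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # same effect as -w, but scoped to this file

# Collect warnings instead of letting them go to STDERR
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $name;                          # oops: never assigned a value
my $greeting = "Hello, " . $name;  # warns: uninitialized value

print "Perl warned: $_" for @warnings;
```

Without warnings turned on, $greeting quietly becomes "Hello, "
and the bug goes unnoticed.
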
In LWP, the user agent plays much the same role as the web
browser itself.  If I needed to set things like proxies, I'd
set them on $ua, and the effects would trickle down to all my
requests.  The new method of LWP::UserAgent simply creates a
user agent object.
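
For example, here is a minimal sketch of configuring $ua once,
up front.  The agent, timeout, and proxy methods are part of
LWP::UserAgent, but the agent string and the proxy address are
made-up placeholders; substitute your own, or call
$ua->env_proxy to pick proxies up from environment variables
such as http_proxy.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Identify the script to web servers (hypothetical name)
$ua->agent("bb-fetch/0.1");

# Give up on a request after 30 seconds (the default is 180)
$ua->timeout(30);

# Send HTTP and FTP requests through a proxy
# (proxy.example.com:8080 is a placeholder, not a real proxy)
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');

# Every request made through $ua from here on uses these settings.
```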

Next, I create a request object, of type HTTP::Request.  As
before, I invoke the new method, but this time I have to tell
it the URL and the request method.  Here, I'm GETting the URL
for the BrainBuzz boards.  Note that the URL is fully
qualified, http:// and all.  This is because behind the
scenes, LWP has to figure out whether it should use HTTP, FTP,
or one of the other protocols it supports.
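
You can watch that decision being made with a short sketch.  It
uses the URI module (which LWP itself depends on) to pull the
scheme out of each URL, and LWP::UserAgent's
is_protocol_supported method to ask whether this LWP install
can handle it.  The FTP address is a made-up example.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use URI;

my $ua = LWP::UserAgent->new;

for my $url ("http://boards.brainbuzz.com", "ftp://ftp.example.com/pub/") {
    # The scheme is the part before the colon: http, ftp, ...
    my $scheme = URI->new($url)->scheme;
    printf "%-30s scheme=%-5s supported=%s\n",
        $url, $scheme, $ua->is_protocol_supported($scheme) ? "yes" : "no";
}
```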

Finally, I tell the user agent to send off the request I just
made.  Since it's returning an object, I use the content
method to get the data as a scalar (string).
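
One thing the script doesn't do is check whether the request
worked; if the server sends back an error, you'll happily print
the error page.  Here is a hedged sketch of a small helper
(body_or_die is a name made up for this example) built on the
is_success and status_line methods of HTTP::Response.  To try it
without touching the network, it builds responses by hand.

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Response;

# Hypothetical helper: return the page body if the request
# succeeded, otherwise die with the status, e.g. "404 Not Found".
sub body_or_die {
    my ($response) = @_;
    die "Request failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->content;
}

# Hand-built responses standing in for what $ua->request returns
my $good = HTTP::Response->new(200, 'OK', [], "<html>boards</html>\n");
my $bad  = HTTP::Response->new(404, 'Not Found');

print body_or_die($good);
eval { body_or_die($bad) };
print "Caught: $@";
```

In the real script, the last line would become
print body_or_die($ua->request($boards));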

A small sidebar:
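Try leaving off the whole content part, and just printing
$data directly (print $data;).  You'll get something like this,
with a different hex number:

$ perl bb1.pl
HTTP::Response=HASH(0x...)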

That's perl telling you that you've got a data type of
HTTP::Response, and that it really doesn't know how to print
it, other than the memory location.  Do a "man HTTP::Response"
to find out what your object does, and you'll find you need
the content method to get at the data.
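
If you'd rather have Perl tell you the class name without the
memory address, the built-in ref function does that.  A small
offline sketch, again using a hand-built HTTP::Response in
place of a real request:

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Response;

my $data = HTTP::Response->new(200, 'OK', [], "Hello from the boards\n");

print ref($data), "\n";    # the class name: HTTP::Response
print $data->code, "\n";   # the numeric HTTP status: 200
print $data->content;      # the page itself, via content()
```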

All of this would be useless unless I did something with the
page.  While I've got http://boards.brainbuzz.com loaded, why
don't I print out a list of the top posters, along with their
post counts?

---- cut here ----
#!/usr/bin/perl -w
use LWP::UserAgent;

# One user agent required per program
my $ua = LWP::UserAgent->new;
# The request object says what we want
my $boards = HTTP::Request->new(GET => "http://boards.brainbuzz.com");
# Pass the request object to the user agent to get the page
my $data = $ua->request($boards);
# $data is an HTTP::Response object, data is in content()
my ($posters) = grep /Top Posters/, (split /\n/, $data->content());
my @table = split /<\/tr>/, $posters;
for (@table) {
    printf "%20s %d\n", $1, $2 if (/vbd.asp.*?>(\w+).*\((\d+)/);
}
---- cut here ----

It's all the same up to the request... Rather than a lot of
temporary variables, I've taken advantage of some of Perl's
features.  Start at the right side of the fifth to last line.
In the parentheses, I'm using the split() function, which
splits a string into substrings based on a given delimiter.
In this case, I'm breaking the result of the request into
lines (\n means "newline").  The result of that is grep()'ed
(searched) for the string "Top Posters", and returned in the
$posters variable.  $posters is in parentheses because grep
returns a list of every matching line; the parentheses make
this a list assignment, so $posters gets the first match and
the rest are thrown away.  (Without them, grep would be in
scalar context and return the number of matches instead.)
So, this line of code returns the line containing the Top
Posters phrase.
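
The list-assignment trick is easy to check on a string you make
up yourself.  The HTML below is invented for the example; it is
not the real boards page.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up page: two of its three lines mention Top Posters
my $page = "<html>\n<b>Top Posters</b>\n<i>more Top Posters</i>\n";

# Scalar context: grep returns how many lines matched
my $count = grep /Top Posters/, (split /\n/, $page);

# List context: ($posters) receives only the first matching line
my ($posters) = grep /Top Posters/, (split /\n/, $page);

print "matches: $count\n";
print "first match: $posters\n";
```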

Next, I do another split, this time on the </tr> tag.  Note
that I had to escape the / with a \, otherwise it would be
construed as the end of the regular expression.  Now I've
got a list of strings in the @table array, each potentially
containing a poster.  But there is some other junk in there
too, which the regular expression will weed out.

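Here is the split step on its own, run against a made-up table
row.  It also shows an alternative the article doesn't use:
Perl lets you pick a different delimiter for a pattern, such as
m{...}, so the / in </tr> no longer needs a backslash.  Both
forms produce the same list.

```perl
#!/usr/bin/perl -w
use strict;

my $html = "<tr><td>alice</td></tr><tr><td>bob</td></tr>";

my @escaped = split /<\/tr>/, $html;   # escape the / with a backslash
my @braces  = split m{</tr>}, $html;   # or choose {} as the delimiter

# split drops trailing empty fields, so each list has two elements:
# "<tr><td>alice</td>" and "<tr><td>bob</td>"
print "$_\n" for @escaped;
```
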
The last three lines set up a loop on every element of @table.
Since I didn't specify a variable to hold the current value,
it gets stored in a special variable, $_.  (Yes, that's
"dollar underscore").  The significance of that is that the
regular expression search in the next line will search $_
unless it's told otherwise.  Ain't Perl great?
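
If $_ feels too magical, the loop can be written with a named
variable and an explicit =~ match; the two loops in this sketch
collect the same results.  The sample rows are invented.

```perl
#!/usr/bin/perl -w
use strict;

my @rows = ("alice (12)", "no number here", "bob (7)");

# Implicit: the loop puts each element in $_, and the match tests $_
my @implicit;
for (@rows) {
    push @implicit, $1 if /\((\d+)\)/;
}

# Explicit: name the loop variable and bind the match with =~
my @explicit;
for my $row (@rows) {
    push @explicit, $1 if $row =~ /\((\d+)\)/;
}

print "@implicit\n";   # prints: 12 7
```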

The regexp itself (the stuff between the /'s) can be broken
down for clarity:

/         # begin regexp
vbd.asp   # look for "vbd.asp" (the . matches any character)
.*?>      # followed by anything (.*) up to a >
          # the ? makes .* non-greedy: stop at the FIRST >
(\w+)     # next, at least one "word" character (letter, digit,
          # or underscore).  Parentheses save the match into a
          # numbered variable; this is the first group, so $1
.*        # skip ahead... match anything
\(        # find a literal left parenthesis
(\d+)     # followed by a number... save this in $2
/         # close regexp
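
Perl can keep a breakdown like that inside the regexp itself:
with the /x modifier, whitespace in the pattern is ignored and
# starts a comment.  Below is a sketch of the same pattern
written that way and tried on a couple of table rows.  The row
HTML is invented to resemble what the pattern expects; it is not
copied from the real boards page.

```perl
#!/usr/bin/perl -w
use strict;

# The article's pattern, spread out with the /x modifier.
# (Comments avoid dollar signs: qr{} interpolates even in comments.)
my $poster_re = qr{
    vbd.asp     # the string "vbd.asp" (the . matches any character)
    .*?>        # anything, up to the FIRST >
    (\w+)       # first capture: the username
    .*          # skip ahead over anything
    \(          # a literal left parenthesis
    (\d+)       # second capture: the post count
}x;

# Invented rows, shaped like the ones the pattern looks for
my @table = (
    '<tr><td><a href="vbd.asp?user=Tezz">Tezz</a></td><td>(120)</td>',
    '<tr><td>no poster in this row</td>',
);

for (@table) {
    printf "%20s %d\n", $1, $2 if /$poster_re/;
}
```

Only the first row matches, so this prints a single line with
Tezz right-aligned in 20 columns, followed by 120.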

If the pattern matches, print $1 and $2 (the saved username
and post count, respectively) with some formatting.  Figuring
out what to match on is usually an exercise in paging through
the HTML source of a web page and slowly building up the
expression.

The output is:

$ perl bb1.pl
         mrobinson52 125
                Tezz 120
             newd00d 119
            robnhood 111
         likeitontop 84
          editormatt 76
           mgraham44 74
                fsec 67
            bizzybot 61
            cjaquess 49

So, as you can see, LWP is a pretty cool module that lets
you pull out the necessary data from web pages...

Poke around the man pages for LWP and LWP::UserAgent, and
find out what else they can do.  Try writing a program to
calculate your stock portfolio by sucking back quotes from
the web!

The Perl board is there to discuss code like this...  Post
your questions, or show us the cool things you can do with
the LWP module:


Long live the Penguin,


Visit the Linux News Board at

2) Linux News

Mandrake Pulls Off IPO
Congrats to the Linux distribution based out of France for
pulling off their IPO. The stock is set to start trading
tomorrow. This news item also has relevant information for
those looking to add this stock to their portfolio, but read
the comments first: it looks like there was a typo in the
ticker symbol.


City of Largo Goes Live With KDE 2.1.1
The City of Largo, Florida, was looking for an upgrade to
its current system, which has to support 800 users and 400
thin client devices. KDE to the rescue! Not only does this
announcement give some details of the system, it also lists
some things to look out for if you're considering such a
move.
http://eltoday.com/article.php3?ltsn=2001-07-23-001-14-PS

Sparks Fly at Open Source Debate
What else did you think you'd get when you have Microsoft
execs speaking at an open source convention? The execs were
defending their position against Open Source (ooh! "Shared"
source!), but the audience wasn't buying it.


KOffice Release Candidate
"KOffice is an integrated office suite for KDE which utilizes
open standards for component communication and component
embedding. The primary goals of this release are to provide
a preview of KOffice 1.1 and to involve users and developers
who wish to request/implement missing features or identify
problems. Code development is currently focused on
stabilizing KOffice 1.1, scheduled for final release in
mid-August, 2001."


3) Linux Resources

Programming, Graphics, and OS Tips
This is one great site, containing links to various
tutorials on everything from C programming to shell
scripting, and a whole lot of tips and tricks about Linux
(and other operating systems).


This article from Server/Workstation Expert is a great
introduction to sendmail, what it is, and how it does what
it does. While you're on the site, go to the main page and
subscribe. It's free, and always has some interesting
columns. Out of all the trade rags that I receive, this is
the one I like the most.


For those getting sick of the spam in their inbox who have
realized that conventional filters can't help, here is
spammaster. It's a custom filter that is designed to weed
out spam, and reduce the false negatives and positives
associated with other filters. Some of the ideas used
are quite interesting, not to mention effective.


Diamond Rio and Linux
Last I checked, my Rio didn't come with Linux software, but
a gadget like that won't stay unsupported for long. The
developers managed to reverse engineer the communications
protocol, and then write the software. Great work, guys!


The Tao of Programming
On the lighter side of the resource section, I thought I'd
share a document that I've always found both informative,
and entertaining. The Tao of Programming explains how
programmers and management should interact and work.


4) App o' the week
Need network-aware fax software for Unix, with Windows, Unix,
and Mac clients (not to mention web and email access)? HylaFAX
is the answer. Do yourself a favor and RTFM when you install
this... It's not hard, but there are several important steps
that can easily be missed.


(C) 2001 BrainBuzz.com. All Rights Reserved.

