LINUX NEWS
RESOURCES & LINKS FROM BRAINBUZZ.COM
Thursday, August 2, 2001
Read By Over 6,000 Linux Enthusiasts Weekly!
TABLE OF CONTENTS
1) Sean’s Notes
2) Linux News
Mandrake Pulls Off IPO
City of Largo Goes Live With KDE 2.1.1
Sparks Fly at Open Source Debate
KOffice Release Candidate
3) Linux Resources
Programming, Graphics, and OS Tips
Sendmail
SpamMaster
Diamond Rio and Linux
The Tao of Programming
4) App o’ the week
1) Sean’s Notes
One of the more common questions on the Perl board is how to get a script to fetch a web page. Unsurprisingly, it’s a cinch to do in Perl, as long as you have the right module.
LWP is a “Library for WWW access in Perl”. And who said Unix jocks couldn’t come up with good names? Actually, it’s a pretty bad name, because you can use the same set of modules to access HTTPS, FTP, NNTP, and a few other protocols. A fine example of one of the three virtues of a programmer–laziness:
http://www.netropolis.org/hash/perl/virtue.html
If your system doesn’t come with LWP installed, hop onto CPAN and install it.
perl -MCPAN -e shell
cpan> install Bundle::LWP
…
cpan>
With that, let’s start our program.
---- cut here ----
#!/usr/bin/perl -w
use LWP::UserAgent;

# One user agent required per program
my $ua = LWP::UserAgent->new;

# The request object says what we want
my $boards = HTTP::Request->new(GET => "http://boards.brainbuzz.com");

# Pass the request object to the user agent to get the page
my $data = $ua->request($boards);

# $data is an HTTP::Response object, data is in content()
print $data->content();
---- cut here ----
The first line of a script always starts with “#!” (pronounced “hash bang”, or sometimes “she-bang”), which lets the system know that the rest of the line will specify the interpreter to run it. This time, it’s /usr/bin/perl -w, the perl interpreter.
Those who have seen me write Perl code in previous newsletters will notice I’ve started using the -w flag. This turns on warnings, which makes the interpreter point out potential errors. For newbies and experts alike, it is very helpful for finding bugs that lurk in your code. Next, I bring in the LWP::UserAgent module, which gives me access to all the LWP functions.
In LWP, the user agent plays roughly the role of the web browser itself. If I needed to set things like proxies, I’d do it on $ua, and the effects would trickle down to all my requests. The new method of LWP::UserAgent simply creates the object.
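As a rough sketch of what “setting things on $ua” can look like, the snippet below configures a user agent before any request is made. The agent string and the proxy host are made-up placeholders, not values from the BrainBuzz setup; agent(), timeout() and proxy() are regular LWP::UserAgent methods.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Identify the script to the server (this string is a made-up example)
$ua->agent("bb-fetch/0.1");

# Give up on a slow server after 30 seconds
$ua->timeout(30);

# Send all http:// requests through a proxy (placeholder host name)
$ua->proxy('http', 'http://proxy.example.com:8080/');

# Every request made through $ua from here on picks up these settings
print "Agent:   ", $ua->agent, "\n";
print "Timeout: ", $ua->timeout, "\n";
```

Nothing goes over the network here; the settings only take effect once a request is passed to $ua.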
Next, I create a request object, of type HTTP::Request. As before, I invoke the new method, but this time I have to tell it the url and the request method. Here, I’m GETting the url for the Brainbuzz boards. Note the URL is fully qualified – http and all. This is because behind the scenes, the system has to figure out if it should use HTTP, FTP, or any other method supported.
Finally, I tell the user agent to send off the request I just made. Since it’s returning an object, I use the content method to get the data as a scalar (string).
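The script above prints whatever comes back, even if the server answered with an error. One possible way to guard against that is sketched here: is_success() and status_line() are standard HTTP::Response methods, while body_or_warn is a hypothetical helper name invented for this example.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): return the page body on success,
# otherwise print the HTTP status line to STDERR and return nothing.
sub body_or_warn {
    my ($response) = @_;
    return $response->content() if $response->is_success;
    warn "Request failed: ", $response->status_line, "\n";
    return;
}

my $ua     = LWP::UserAgent->new;
my $boards = HTTP::Request->new(GET => "http://boards.brainbuzz.com");

# request() hands back an HTTP::Response even when the fetch fails
my $body = body_or_warn($ua->request($boards));
print $body if defined $body;
```

If the host can’t be reached, LWP still returns a response object (with an error status), so the script reports the problem instead of printing an error page as if it were data.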
A small sidebar: Try leaving off the whole content part, and just printing $data directly:
$ perl bb1.pl
HTTP::Response=HASH(0x8350c74)
That’s perl telling you that you’ve got a data type of HTTP::Response, and that it really doesn’t know how to print it, other than the memory location. Do a “man HTTP::Response” to find out what your object does, and you’ll find you need the content method to get at the data.
All of this would be useless unless I did something with the data. While I’ve got http://boards.brainbuzz.com loaded, why don’t I print out a list of the top posters, along with their post counts?
---- cut here ----
#!/usr/bin/perl -w
use LWP::UserAgent;

# One user agent required per program
my $ua = LWP::UserAgent->new;

# The request object says what we want
my $boards = HTTP::Request->new(GET => "http://boards.brainbuzz.com");

# Pass the request object to the user agent to get the page
my $data = $ua->request($boards);

# $data is an HTTP::Response object, data is in content()
my ($posters) = grep /Top Posters/, (split /\n/, $data->content());
my @table = split /<\/tr>/, $posters;
for (@table) {
    printf "%20s %d\n", $1, $2 if (/vbd.asp.*?>(\w+).*\((\d+)/);
}
---- cut here ----
It’s all the same up to the request… Rather than use a lot of temporary variables, I’ve taken advantage of some of Perl’s features. Start at the right side of the fifth to last line. Inside the parentheses, I’m using the split() function, which splits a string into substrings based on a given delimiter. In this case, I’m breaking the result of the request into lines (\n means “newline”). The result of that is grep()’ed (searched) for the string “Top Posters”, and the match is stored in the $posters variable. $posters is in parentheses because grep returns a list of every matching line; assigning that list to a one-element list keeps only the first occurrence. So, this line of code returns the first line containing the Top Posters phrase.
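To see why the parentheses around $posters matter, here is a small stand-alone sketch. The $page string is invented sample data standing in for the fetched page; the split and grep calls are the same ones used in the script.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up stand-in for the page content: two of the three lines match
my $page = "Welcome\nTop Posters: first\nTop Posters: second\n";

# List assignment: grep returns every match, and $first takes the first one
my ($first) = grep /Top Posters/, (split /\n/, $page);

# Scalar assignment: the same grep now returns the NUMBER of matches
my $count = grep /Top Posters/, (split /\n/, $page);

print "first = $first\n";   # first = Top Posters: first
print "count = $count\n";   # count = 2
```

Leave the parentheses off and you get a count instead of a line, which is an easy mistake to make and a hard one to spot.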
Next, I do another split, this time on the </tr> tag. Note that I had to escape the / with a \, otherwise it would be construed as the end of the regular expression. Now I’ve got a list of strings in the @table array, each potentially containing a poster. But there is some other junk in there too!
The last three lines set up a loop on every element of @table. Since I didn’t specify a variable to hold the current value, it gets stored in a special variable, $_. (Yes, that’s “dollar underscore”). The significance of that is that the regular expression search in the next line will search $_ unless it’s told otherwise. Ain’t Perl great?
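For anyone who finds $_ a little too magical, this sketch runs the same kind of loop both ways, first with the implicit $_ and then with a named variable. The contents of @table are invented sample strings, not real board data.

```perl
#!/usr/bin/perl -w
use strict;

# Invented sample data; only two of the three elements contain digits
my @table = ("alpha 1", "no digits here", "beta 22");
my (@implicit, @explicit);

# Implicit form: for aliases each element to $_, and the match tests $_
for (@table) {
    push @implicit, $1 if /(\d+)/;
}

# Explicit form: name the loop variable and bind the match with =~
for my $row (@table) {
    push @explicit, $1 if $row =~ /(\d+)/;
}

print "implicit: @implicit\n";   # implicit: 1 22
print "explicit: @explicit\n";   # explicit: 1 22
```

Both loops do exactly the same work; the implicit form just saves some typing.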
The regexp itself (the stuff between the /’s) can be broken down for clarity:
/           # begin regexp
vbd.asp     # look for the string “vbd.asp”
.*?>        # followed by anything (.*) ending in a >
            # adding ? to the end means to stop at the first match
(\w+)       # Next will be at least one word character (letter, digit, or underscore)
            # Parentheses mean to save it into a positional variable
            # since this is the first in the regexp it is $1
.*          # skip ahead… match anything
\(          # find a literal left parenthesis
(\d+)       # followed by a number… save this in $2
/           # close regexp
If that is found, then print $1 and $2 (the saved username and posts respectively) with some formatting. Finding out what you’re supposed to match on is usually an exercise in paging through the HTML source to a web page, and slowly building up the expression.
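The same pattern can be tried without touching the network. In the sketch below, $row is a made-up fragment of HTML in the general shape the pattern expects (the real page source may well differ), and the /x modifier lets the regexp carry its own comments. One caution: inside a /x regexp, keep slashes and dollar signs out of the comments, because Perl still treats them as the delimiter and as variables.

```perl
#!/usr/bin/perl -w
use strict;

# Invented sample row; the real board markup is an assumption here
my $row = '<td><a href="vbd.asp?u=1">mrobinson52</a> (125)</td>';

my ($name, $posts) = $row =~ /
    vbd.asp     # the string "vbd.asp"
    .*?>        # shortest run of characters up to the next >
    (\w+)       # first capture: the user name
    .*          # skip ahead
    \(          # a literal left parenthesis
    (\d+)       # second capture: the post count
/x;

printf "%20s %d\n", $name, $posts if defined $name;
```

In list context the match returns the captured pieces directly, so they can be given real names instead of being read from the positional variables.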
The output is:
$ perl bb1.pl
         mrobinson52 125
                Tezz 120
             newd00d 119
            robnhood 111
         likeitontop 84
          editormatt 76
           mgraham44 74
                fsec 67
            bizzybot 61
            cjaquess 49
So, as you can see, LWP is a pretty cool module that lets you pull out the necessary data from web pages…
Poke around the man pages for LWP and LWP::UserAgent, and find out what else they can do. Try writing a program to calculate your stock portfolio by sucking back quotes from the web!
The Perl board is there to discuss code like this… Post your questions, or show us the cool things you can do with the LWP module:
http://boards.brainbuzz.com/boards/vbt.asp?b=290
Long live the Penguin,
Sean mailto:swalberg@brainbuzz.com
Visit the Linux News Board at http://boards.brainbuzz.com/boards/vbt.asp?b2
2) Linux News
Mandrake Pulls Off IPO
Congrats to the Linux distribution based out of France for pulling off its IPO. The stock is set to start trading tomorrow. This news item also has relevant information for those looking to add this stock to their portfolio, but read the comments first: it looks like there was a typo in the ticker symbol.
http://www.newsforge.com/article.pl?sid=01/07/30/2348248
City of Largo Goes Live With KDE 2.1.1
The City of Largo, Florida was looking for an upgrade to its current system, which is required to support 800 users and 400 thin client devices. KDE to the rescue! Not only does this announcement give some details of the system, it also lists some items to look out for if you’re considering such a move.
http://eltoday.com/article.php3?ltsn=2001-07-23-001-14-PS
Sparks Fly at Open Source Debate
What else did you think you’d get when you have Microsoft execs speaking at an open source convention? The execs were defending their position against Open Source (ooh! “Shared” source!), but the audience wasn’t buying it.
http://www.zdnet.com/zdnn/stories/news/0,4586,5094814,00.html?chkpt
01
KOffice Release Candidate
“KOffice is an integrated office suite for KDE which utilizes open standards for component communication and component embedding. The primary goals of this release are to provide a preview of KOffice 1.1 and to involve users and developers who wish to request/implement missing features or identify problems. Code development is currently focused on stabilizing KOffice 1.1, scheduled for final release in mid-August, 2001.”
http://www.koffice.org/announcements/announce-1.1-rc1.phtml
3) Linux Resources
Programming, Graphics, and OS Tips
This is one great site, containing links to various tutorials on everything from C programming to shell scripting, and a whole lot of tips and tricks about Linux (and other operating systems).
http://www.osconfig.com/
Sendmail
This article from Server/Workstation Expert is a great introduction to sendmail, what it is, and how it does what it does. While you’re on the site, go to the main page and subscribe. It’s free, and always has some interesting columns. Out of all the trade rags that I receive, this is the one I like the most.
http://swexpert.com/C2/SE.C2.JUL.01.pdf
SpamMaster
For those getting sick of the spam in their inbox who have realized that conventional filters can’t help, here is SpamMaster. It’s a custom filter that is designed to weed out spam, and reduce the false negatives and positives associated with other filters. Some of the ideas used are quite interesting, not to mention effective.
http://www.lne.com/ericm/spammaster/
Diamond Rio and Linux
Last I checked, my Rio didn’t come with Linux software, but a gadget like that won’t stay unsupported for long. The developers managed to reverse engineer the communications protocol, and then write the software. Great work, guys!
http://www.world.co.uk/sba/rio.htm
The Tao of Programming
On the lighter side of the resource section, I thought I’d share a document that I’ve always found both informative and entertaining. The Tao of Programming explains how programmers and management should interact and work.
http://epims.gsfc.nasa.gov/tao.html
4) App o’ the week
Need some fax software for Unix that’s network aware, and has Windows, Unix, and Mac clients (don’t forget the web and email use either)? Hylafax is the answer. Do yourself a favor and RTFM when you install this… It’s not hard, but there are several important steps that can be easily missed.
http://www.hylafax.org
(C) 2001 BrainBuzz.com. All Rights Reserved.