I found this article about a Perl script that downloads TrueType fonts from http://grsites.com/fonts/. The script is about 100 lines of fairly dense, uncommented code that is tightly coupled to the HTML and URL structure of the site it scrapes. I rewrote it with WWW::Mechanize in 33 lines, including comments, and it runs under the strict pragma. The rewrite took about 15 minutes, and I never had to look at the HTML source of the page I was scraping.
Code below.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hit the first page. autocheck => 0 makes failed requests return
# instead of dying, so the success checks below decide what happens.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->agent_alias("Linux Mozilla");
$mech->get("http://grsites.com/fonts/");
die $mech->response->status_line unless $mech->success;

# Pull out all the page links
my @links = $mech->find_all_links( text_regex => qr/^Page/ );
foreach my $page (@links) {
    print "Getting " . $page->url_abs . "\n";
    $mech->get( $page->url_abs );
    next unless $mech->success;

    my @fonts = $mech->find_all_links( url_regex => qr/fontview\.cgi/ );
    foreach my $font (@fonts) {
        print "Getting " . $font->url_abs . "\n";
        $mech->get( $font->url_abs );
        unless ( $mech->success ) {
            print $mech->response->status_line, "\n";
            next;
        }

        # Pull out the name from the url, stopping at the next "&"
        my ($fname) = $font->url_abs =~ /[?&]fn=([^&]*)/;
        next unless defined $fname;

        # Skip the font if its page has no download link
        my $fontlink = $mech->find_link( url_regex => qr/fontdownload/ );
        next unless $fontlink;
        $mech->get( $fontlink->url_abs, ":content_file" => "$fname.ttf" );
    }
}
```
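Pulling the font name out of the URL is the easiest step to get wrong. A substitution of the form s/.*fn=(.*)&?.*$/$1/ lets the greedy (.*) swallow everything after fn=, including any later query parameters. The standalone sketch below uses core Perl only and needs no network access. It compares that pattern with one that stops at the next &. The example URL and the helper name font_name_from_url are hypothetical, chosen only to show a query string with a parameter after fn. The helper also does not URL-decode the value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value of the "fn" query parameter,
# stopping at the next "&" or "#", or undef if there is no fn=.
# Note: the value is returned as-is, without URL-decoding.
sub font_name_from_url {
    my ($url) = @_;
    return $url =~ /[?&]fn=([^&#]*)/ ? $1 : undef;
}

# Hypothetical URL shape, for illustration only.
my $url = 'http://grsites.com/fonts/fontview.cgi?fn=example&page=2';

# Greedy version: (.*) runs to the end of the string.
(my $greedy = $url) =~ s/.*fn=(.*)&?.*$/$1/;
print "$greedy\n";                       # prints "example&page=2"

print font_name_from_url($url), "\n";    # prints "example"
```

WWW::Mechanize already depends on the URI module, so an alternative is to call query_form on the URI object returned by url_abs. That method returns the query as decoded key/value pairs, so it also handles percent-encoded names.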