Thursday, December 06, 2007

Miscellaneous Updates

I think it best to start with an update on how things are going with my new find (see last post). Strawberry Perl has been running FINE for almost a month now. I have installed a bunch of different modules and run a lot of scripts with it, and it has run flawlessly so far.

That was a lead-in to my next update. I pay for a membership to a web site so that I can access its content. The site provides PDF documents (A LOT of them) containing stamp albums created by the site owner. I am a philatelist, and not having to spend, literally, thousands of dollars on stamp albums makes a collector a very happy person.

Well, I said there were a lot of files and I was not kidding. There are a little over 200 countries in the world, and each country is split up into multiple PDF files, each containing a section of the full album (this is done for a reason). To be exact, there are 2,137 files.

Now, wanting to get the most out of my $20 (yes, it's pretty cheap), I figured that instead of trying to download ALL of the files by hand, I would try to write a Perl script that would do the job for me. So, that is just what I did. I read up on the different modules I could use and decided on the WWW::Mechanize module written by Andy Lester.

It took me about 2 days to get the script coded, working and tested. The O'Reilly book "Spidering Hacks" definitely led me down the right path with its hack(s) on WWW::Mechanize. I have to credit the author for the examples, as I used some of his code in the script as well. After working out the typos and little mistakes, I still had to figure out how to do the authentication that was required in order to download files, since I had never used this module before.

I did some searching around the internet and found a page with a couple of examples of how to do authentication, one of them using WWW::Mechanize. This was PERFECT. It worked like a charm, and before I could smile I was downloading all 2,137 files from the site.
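
A minimal sketch of this kind of script might look like the following. It is not the original script: the post does not name the site or its login scheme, so HTTP Basic authentication, the command-line arguments, and the helper names local_name and download_pdfs are all assumptions made for illustration. The WWW::Mechanize methods used (credentials, get, find_all_links) are part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use File::Basename qw(basename);
use URI;

# Hypothetical helper: turn a link such as
# http://host/albums/france-01.pdf into the local file name "france-01.pdf".
sub local_name {
    my ($url) = @_;
    return basename( URI->new("$url")->path );
}

# Hypothetical helper: log in, read one index page, and save every
# PDF it links to into the current directory.
sub download_pdfs {
    my ( $start_url, $user, $pass ) = @_;

    # autocheck => 1 makes Mechanize die on any failed request.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Assumption: the site uses HTTP Basic authentication.
    # A form-based login would need submit_form() instead.
    $mech->credentials( $user, $pass );

    $mech->get($start_url);

    my $count = 0;
    for my $link ( $mech->find_all_links( url_regex => qr/\.pdf$/i ) ) {
        my $url  = $link->url_abs;
        my $file = local_name($url);
        next if -e $file;    # skip files that are already on disk

        # ':content_file' writes the response body straight to a file.
        $mech->get( $url, ':content_file' => $file );
        $count++;
        sleep 1;             # be polite to the server
    }
    return $count;
}

# Only run when called as: perl fetch_albums.pl URL USER PASSWORD
if ( @ARGV == 3 ) {
    my $n = download_pdfs(@ARGV);
    print "Downloaded $n new file(s)\n";
}
```

Because files that already exist are skipped, a run that gets interrupted partway through a couple of thousand files can simply be started again.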

 

I must say, Perl is a very complex and, at times, complicated language, but if there is one thing I can say, it is that I LOVE PERL!!!!

1 comment:

Andy Lester said...

The author of the Mechanize hacks in Spidering Hacks is the same guy who did the WWW::Mechanize module. I think there might be one or two non-Lester hacks in Spidering Hacks, but the main ones are mine.

I'm just glad you were able to find a copy.

 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.