Scraping images with Perl

Tags: perl, wwwmechanize, scraping

So I've been dabbling with Perl lately, and this is how I feel today:

[Comic: xkcd #208, xkcd.com/208]

I'm going to walk through a simple image-scraping program I put together today. I wrote it to scrape images off a t-shirt gallery so that I could help my friend pick his favourite one offline. \()/ Proud to be a nerd \()/

For Perl, of course, there are already modules on CPAN to help. I'm using two:

  • WWW::Mechanize [a well-known module for automating web interaction: fetching pages, following links, downloading content, etc.]
  • HTML::TokeParser [we'll use this to parse the HTML and get to the tag we want]

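Most of the work happens in HTML::TokeParser's get_tag, so it helps to know what it hands back. Here is a small sketch (the sample markup and the describe_tags helper are made up for illustration): a start tag comes back as [name, attribute hash, attribute order, original text], while an end tag comes back as ["/name", original text], with no attribute hash. The parser also accepts a reference to a string, so the HTML that WWW::Mechanize fetched (available as `$mech->content`) can be fed in directly.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: summarise every tag get_tag returns for a document.
sub describe_tags {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);   # parse from a string reference
    my @lines;
    while (my $tag = $p->get_tag) {
        if ($tag->[0] =~ m{^/}) {
            # End tag: [ "/name", $original_text ]
            push @lines, "end $tag->[0]";
        } else {
            # Start tag: [ name, \%attributes, \@attribute_order, $original_text ]
            my ($name, $attr, $attrseq) = @$tag;
            push @lines, "start $name (" . join(', ', map { "$_=$attr->{$_}" } @$attrseq) . ")";
        }
    }
    return @lines;
}

print "$_\n" for describe_tags('<p class="intro"><img src="a.png" alt="A"></p>');
```

Note that `<img>` produces no end tag, since it never had one in the markup; HTML::TokeParser only reports what is actually there.
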
My target HTML looked like this:

My "algorithm" went something like this:

  • I used WWW::Mechanize's get() to fetch the page I wanted.
  • Then I handed the page's content to HTML::TokeParser.
  • From there I looped through the tags with the get_tag function until I found one with the gallery-img2 class.
  • Then I jumped to the next img tag and read its src attribute.
  • Finally, I used WWW::Mechanize's get() again to download the image.

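The steps above could be put together roughly as follows. This is a sketch, not the original script: the sample markup (a gallery-img2 element wrapping an img), the sub names gallery_image_srcs and download_gallery, and naming each saved file after the last segment of its URL are all assumptions for illustration. With no arguments it only parses the built-in sample; given a gallery URL, it fetches the page and downloads each image using WWW::Mechanize's get() with LWP's `:content_file` option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: assumes each gallery entry is an element with
# class "gallery-img2" that contains the <img> we want.
my $sample_html = <<'HTML';
<div class="gallery-img2"><a href="/shirt/1"><img src="/img/shirt1.jpg"></a></div>
<div class="sidebar"><img src="/img/logo.png"></div>
<div class="gallery-img2"><a href="/shirt/2"><img src="/img/shirt2.jpg"></a></div>
HTML

# Steps 2-4: walk the tags, stop at each gallery-img2 element,
# then jump to the next <img> and collect its src attribute.
sub gallery_image_srcs {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my @srcs;
    while (my $tag = $p->get_tag) {
        # End tags ("/div", ...) carry no attribute hash, so skip them.
        next if $tag->[0] =~ m{^/};
        my $class = $tag->[1]{class} // '';
        next unless grep { $_ eq 'gallery-img2' } split ' ', $class;
        my $img = $p->get_tag('img') or last;   # no more images: stop
        push @srcs, $img->[1]{src} if defined $img->[1]{src};
    }
    return @srcs;
}

# Steps 1 and 5: fetch the gallery page, then download every image.
sub download_gallery {
    my ($url) = @_;
    require WWW::Mechanize;   # loaded lazily so the parser can be tried offline
    require URI;
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($url);
    my $base = $mech->base;
    # Resolve every src against the gallery page before later get() calls change the base.
    my @images = map { URI->new_abs($_, $base) } gallery_image_srcs($mech->content);
    for my $img (@images) {
        my $file = ($img->path_segments)[-1];   # hypothetical naming: last path segment
        next if !defined $file || $file eq '' || $file =~ /^\.+$/;
        print "Saving $img -> $file\n";
        $mech->get($img, ':content_file' => $file);   # stream the image straight to disk
    }
}

if (@ARGV) {
    download_gallery($ARGV[0]);
} else {
    print "$_\n" for gallery_image_srcs($sample_html);
}
```

Collecting all the image URLs before downloading anything matters: each get() replaces the Mechanize object's current page, so `$mech->content` and `$mech->base` would no longer refer to the gallery after the first download.
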
The code is thoroughly commented:

Any feedback, insults and improvements are welcome! :)
