pesos per diem

Friday, May 12, 2006

Nifty tool for Mac

While writing a PHP script to extract text from a webpage, I came across a cool tool from Metafy called Anthracite. It has too many features to list here, but in a nutshell it lets you mine the content of webpages.

As an example of what it can be used for, I wrote a quick "script" to extract the content section of an article from ArticleCity, remove all HTML tags and classes, split up the paragraphs and titles, then apply my own HTML formatting and spit out a text file of the article.
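For anyone not on a Mac, here is a rough Python sketch of the same steps (grab the content section, drop the tags and classes, split titles from paragraphs, re-wrap them in clean HTML, write a file). This is not Anthracite's own scripting, just an illustration using the standard library. The container id `content`, the choice of `h1`-`h3` as title tags, and all function and class names are my assumptions, not ArticleCity's actual markup, and it expects reasonably well-formed HTML with closed `<p>` tags.

```python
import html
from html.parser import HTMLParser

TITLE_TAGS = {"h1", "h2", "h3"}          # assumption: titles use these tags
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


class ArticleExtractor(HTMLParser):
    """Collect (kind, text) blocks found inside one container element."""

    def __init__(self, container_id="content"):  # hypothetical container id
        super().__init__(convert_charrefs=True)
        self.container_id = container_id
        self.depth = 0       # greater than 0 while inside the container
        self.kind = None     # "title" or "para" while inside a text block
        self.buffer = []     # text pieces of the block being read
        self.blocks = []     # finished (kind, text) pairs, in page order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            # Still outside the article: only watch for the container itself.
            if dict(attrs).get("id") == self.container_id:
                self.depth = 1
            return
        self.depth += 1
        if tag in TITLE_TAGS:
            self.kind, self.buffer = "title", []
        elif tag == "p":
            self.kind, self.buffer = "para", []

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        if self.kind and (tag in TITLE_TAGS or tag == "p"):
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append((self.kind, text))
            self.kind = None
        self.depth -= 1

    def handle_data(self, data):
        # Inline tags (and their classes) are dropped; only their text is kept.
        if self.depth > 0 and self.kind:
            self.buffer.append(data)


def format_blocks(blocks):
    """Re-wrap the plain-text blocks in simple markup of our own."""
    lines = []
    for kind, text in blocks:
        tag = "h2" if kind == "title" else "p"
        lines.append(f"<{tag}>{html.escape(text, quote=False)}</{tag}>")
    return "\n".join(lines)


def scrape_article(page_html, out_path):
    """Extract, reformat, and save one article; returns the formatted text."""
    parser = ArticleExtractor()
    parser.feed(page_html)
    parser.close()
    formatted = format_blocks(parser.blocks)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(formatted + "\n")
    return formatted
```

Calling `scrape_article(page_html, "article.txt")` on a saved page would leave a text file of clean `<h2>` and `<p>` lines. Fetching the page is left out on purpose.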

The real beauty of it is that it is scriptable: all I have to do is browse to the page I want to scrape in my web browser (Safari), click a button, and in about 5 seconds I have a copy of the article on my computer, ready to be used with my page generation software.

Another nice use for it is as a keyword scraper. Using the integrated Google API module, I can mine thousands of keywords from Google by parsing the meta tags of the sites returned for my main keyword. This usually takes only a couple of minutes.
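The meta tag parsing half of that is easy to sketch in Python. The example below only covers pulling keywords out of pages you already have as HTML strings; it does not touch the Google API or fetch anything, and the class and function names are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated entries of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() != "keywords":
            return
        for keyword in (attr.get("content") or "").split(","):
            keyword = " ".join(keyword.split()).lower()  # trim and normalize
            if keyword:
                self.keywords.append(keyword)


def extract_keywords(page_html):
    """Return the keywords declared by a single page."""
    parser = MetaKeywordParser()
    parser.feed(page_html)
    parser.close()
    return parser.keywords


def collect_keywords(pages):
    """Count how many pages declare each keyword."""
    counts = Counter()
    for page_html in pages:
        # set() so a page repeating a keyword is only counted once.
        counts.update(set(extract_keywords(page_html)))
    return counts
```

Feeding `collect_keywords` the HTML of every site returned for a main keyword gives a tally, and `counts.most_common(50)` would show which keywords the competing pages share most often.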

While the software does cost some $$, there is a 2-week trial available. If you're on a Mac and know a bit of HTML, you should download it and give it a whirl. I think you'll be quite pleased with the results.
