Creating a search-based content scraper
If you’re one of those people building MFA (made-for-AdSense) sites, you probably know what a pain in the ass it can be to write “quality content”. And since it’s only made for AdSense anyway, you might just as well scrape some pages instead of actually writing them…
There are basically three options to get your scraped content:
- Ripping off entire pages
- Scraping RSS feeds
- Scraping SERP pages
My favorite is scraping SERP pages: the results are already matched to your keyword, so you get relevant pages with a nice keyword density. To demonstrate how you can do this I modified the Google script. For the search results I use Mihalismsearch’s XML feeds (unlimited queries!).
I hacked all the irrelevant stuff out of the Google script (demo: Dutch SEO), leaving just a SERP page with nice titles and headings. If you’re planning on using it yourself, you may wanna consider hacking out the links as well.
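To give an idea of what “just titles and headings, without the links” boils down to, here is a minimal Python sketch. It is not the script from the demo package, and the feed layout is made up: the element names (`results`, `result`, `title`, `url`, `description`) and the `render_serp` helper are assumptions for illustration only, not the actual Mihalism feed format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical feed layout, invented for this sketch.
SAMPLE_FEED = """<results>
  <result>
    <title>First title</title>
    <url>http://example.com/a</url>
    <description>First snippet.</description>
  </result>
  <result>
    <title>Second title</title>
    <url>http://example.com/b</url>
    <description>Second snippet.</description>
  </result>
</results>"""


def render_serp(feed_xml: str, keep_links: bool = False) -> str:
    """Turn a search feed into headings plus snippets, optionally linked."""
    root = ET.fromstring(feed_xml)
    parts = []
    for item in root.iter("result"):
        title = escape(item.findtext("title", default=""))
        url = escape(item.findtext("url", default=""), quote=True)
        snippet = escape(item.findtext("description", default=""))
        # Links are dropped by default, as suggested in the text above.
        if keep_links and url:
            title = '<a href="{}">{}</a>'.format(url, title)
        parts.append("<h2>{}</h2>\n<p>{}</p>".format(title, snippet))
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_serp(SAMPLE_FEED))
```

Calling `render_serp(SAMPLE_FEED, keep_links=True)` keeps the outbound links, in case you do want them.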
Next up: URL rewriting. The URLs you really want for your scraper are something like this. Just add the following code to your .htaccess:
RewriteEngine On
RewriteRule ^([^/]+)\.html$ /xmlsearch/?query=$1 [L]
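If you want to check which URLs the rule above catches, the same pattern can be tried out in a few lines of Python. This is only a sketch of the mapping, not of how Apache works internally; the `rewrite` function is a hypothetical helper, and it assumes the path is matched without its leading slash, as is the case for rules in .htaccess.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule; in .htaccess context the leading
# slash is stripped before matching, so paths are given without it.
PATTERN = re.compile(r"^([^/]+)\.html$")


def rewrite(path: str) -> Optional[str]:
    """Return the internal URL a request path maps to, or None if no match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return "/xmlsearch/?query=" + match.group(1)


if __name__ == "__main__":
    for path in ("cheap-flights.html", "dir/page.html", "index.php"):
        print(path, "->", rewrite(path))
```

Only single-level `.html` requests are rewritten; anything containing a slash or a different extension is left alone.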
Now you’re pretty much done, except for a nice “little” keyword list. I prefer Keyword Elite or Google’s Keyword Suggestion Tool. Create links to the scraped content pages and you’re done.
Ah, before I forget: you can download the demo package here, of course.
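Creating those links by hand gets old quickly, so here is a rough Python sketch that turns a keyword list into links matching the rewritten `.html` URLs. The `keyword_to_link` helper and the sample keywords are made up for illustration, and it assumes that hyphens are fine as word separators in the search query.

```python
from html import escape
from urllib.parse import quote


def keyword_to_link(keyword: str) -> str:
    """Build an HTML link to the scraped page for one keyword."""
    # Assumption: lowercase, hyphen-separated slugs are acceptable queries.
    slug = "-".join(keyword.lower().split())
    href = "/" + quote(slug) + ".html"
    return '<a href="{}">{}</a>'.format(href, escape(keyword))


if __name__ == "__main__":
    # Hypothetical keyword list; use your own export instead.
    for kw in ("Cheap Flights", "car insurance quotes"):
        print(keyword_to_link(kw))
```

Paste the output into a sitemap or footer page so the scraped pages actually get linked to.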
July 22, 2007
Nice script. Thanks.
I’m just reading the code and trying to find out how to get local search results (Dutch, for example).
Any hint on where to look?
Not too fast with my replies these days but… On the Mihalism site there’s an explanation on how to get results in local languages.