When I turn on the computer in the morning, one of the first things I do is review yesterday’s stats. Sometimes I’ll check to see where the views are coming from. When I find something unusual, I track it back to learn more.
This morning I saw an unusual referrer. Not wanting to give attention to the original domain, I’ve changed the name, but I’ve left the subdirectory and file name intact. The referring URL looked something like this: fictitiouswebsite.com/tools/scraper.php. Naturally I followed the link back to the original site and saw several less-than-aboveboard-looking tools, one of which was the URL scraper.
URL Scraper, what a suspicious-sounding name. Why would someone want to scrape my URL? Even more basic than that, what the heck does a URL scraper do? Naturally, this caused me to go off on a URL Scraper tangent.
Why URL Scrapers?
URL Scrapers are used as a shortcut to locate blogs with high page ranks and “do-follow” links. Once the blogs are located, whoever is performing the scrape posts comments (aka spam) on them in an attempt to get link juice from the do-follow links (I wrote an explanation of do follow vs. no follow in an earlier post). I was amazed at the sophistication found in some of these scraping utilities. Here’s promo text taken from one scraper tool:
…search engine scraper which can be trained to harvest URL’s from virtually any website that has a search feature. It may be a simple WordPress blog with a search feature that you want to harvest all the URL’s from on a particular keyword or number of keywords, or a major search engine like Google, Bing or Yahoo.
Here’s more text:
As you may know many sites and search engines don’t like numerous requests from the one IP address, so the harvester has a number of options for connecting via numerous different proxies every connection it makes.
It’s obvious these guys are serious about their scraping. Maybe if they spent more time and effort building quality content, they might actually earn their ranking instead of leeching off the backs of other blogs.
No Follow Links
This recent scraping education, coupled with my spam battle, made me decide to make NJFM a no follow blog. To do so I’ve downloaded and installed the External Links plugin. It allows me to apply no follow to external links globally. Hopefully this will deter spammers.
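For the curious, here is a rough idea of what "applying no follow globally" means at the markup level. This is only a minimal Python sketch under my own assumptions, not how the External Links plugin actually works: the domain example.com, the function name add_nofollow and the regex-based matching are all made up for illustration, and the sketch only handles double-quoted href attributes. It adds rel="nofollow" to links that point off the site and leaves internal links, and links that already carry a rel attribute, untouched.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for the blog's own domain (an assumption for this sketch).
SITE_HOST = "example.com"

# Matches an opening <a ...> tag with a double-quoted href.
# A regex is a simplification; a real tool would use a proper HTML parser.
ANCHOR_RE = re.compile(r'<a\s+([^>]*?)href="([^"]+)"([^>]*)>', re.IGNORECASE)


def add_nofollow(html, site_host=SITE_HOST):
    """Return html with rel="nofollow" added to links that leave site_host."""

    def fix(match):
        before, href, after = match.groups()
        host = urlparse(href).netloc.lower()
        # Relative links have no host; same-site links match the host or a subdomain.
        is_external = (
            host != ""
            and host != site_host
            and not host.endswith("." + site_host)
        )
        has_rel = re.search(r"\brel\s*=", before + after, re.IGNORECASE)
        if not is_external or has_rel:
            return match.group(0)  # leave the tag exactly as it was
        return '<a {}href="{}"{} rel="nofollow">'.format(before, href, after)

    return ANCHOR_RE.sub(fix, html)


if __name__ == "__main__":
    comment = 'Nice post! <a href="http://spam.example.net/page">cheap stuff</a>'
    print(add_nofollow(comment))
```

A comment link to another domain comes out with rel="nofollow" attached, which tells search engines not to pass any ranking credit through it. That is the whole point: no link juice, no reason to spam.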
Page ranking on NJFM means diddly squat to me. I write here because I like to write here. I don’t write on NJFM for the money or for Google ranking. If I did I’d have stopped writing here a long, long time ago.
What does mean something to me is peace of mind. Spammers annoy me. If making this blog a no follow blog will let scrapers know that spamming NJFM is an exercise in futility, then it’s worth it.
Basically I want to write. I would rather not spend time thinking about page rank, plagiarists, scrapers or spammers. Now that I’ve got that out of the way, maybe I can get back to writing (and yes, the tone of this post is one of annoyance…GRRR). 😉