Scrapers, Spam and No Follow

January 15, 2014 | 6 Comments

When I turn on the computer in the morning, one of the first things I do is review yesterday’s stats. Sometimes I’ll check to see where the views are coming from. When I find something unusual, I track it back to learn more.

This morning I saw an unusual referrer. Not wanting to give attention to the original domain, I’ve changed the name, but I’ve left the subdirectory and file name intact. The referring URL looked something like this: fictitiouswebsite.com/tools/scraper.php. Naturally, I followed the link back to the original site and saw several less-than-aboveboard-looking tools, one of which is the URL scraper.

URL Scraper, what a suspicious sounding name. Why would someone want to scrape my URL? Even more basic than that, what the heck does a URL scraper do? Naturally, this caused me to go off on a URL Scraper tangent.

Why URL Scrapers?

URL scrapers are used as a shortcut to locate blogs with high page ranks and “do-follow” links. Once the blogs are located, whoever is performing the scrape posts comments (aka spam) on the blogs in an attempt to get link juice from the do-follow links (I wrote an explanation of do follow vs. no follow in an earlier post). I was amazed at the sophistication found on some of these scraping utilities. Here’s promo text taken from one scraper tool:

…search engine scraper which can be trained to harvest URL’s from virtually any website that has a search feature. It may be a simple WordPress blog with a search feature that you want to harvest all the URL’s from on a particular keyword or number of keywords, or a major search engine like Google, Bing or Yahoo.

Here’s more text:

As you may know many sites and search engines don’t like numerous requests from the one IP address, so the harvester has a number of options for connecting via numerous different proxies every connection it makes.

It’s obvious these guys are serious about their scraping. Maybe if they spent more time and effort building quality content, they might actually earn their ranking instead of leeching off the backs of other blogs.

No Follow Links

This recent scraping education, coupled with my spam battle, made me decide to make NJFM a no follow blog. To do so, I’ve downloaded and installed the External Links plugin. It allows me to apply no follow globally. Hopefully this will deter spammers.
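For the curious, here’s a rough sketch of what “applying no follow globally” boils down to: every link that points off the blog gets a rel="nofollow" attribute, which tells search engines not to pass ranking credit through it. This is not how the External Links plugin itself is written (I haven’t looked under its hood); it’s just a small Python illustration, and the names NofollowRewriter, add_nofollow and own_host, along with the example hosts, are made up for the example.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowRewriter(HTMLParser):
    """Re-emit HTML, adding rel="nofollow" to links that leave own_host.

    Hypothetical helper for illustration only; not the plugin's code.
    """

    def __init__(self, own_host):
        # convert_charrefs=False so entities like &amp; pass through untouched.
        super().__init__(convert_charrefs=False)
        self.own_host = own_host.lower()
        self.out = []

    def _is_external(self, href):
        # Relative links have no host, so they count as internal.
        host = urlparse(href).netloc.lower()
        return bool(host) and host != self.own_host

    def _render_link(self, attrs):
        # Keep any existing rel values and append "nofollow" if missing.
        rel = (dict(attrs).get("rel") or "").split()
        if "nofollow" not in rel:
            rel.append("nofollow")
        kept = [(k, v) for k, v in attrs if k != "rel"]
        kept.append(("rel", " ".join(rel)))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        return f"<a{rendered}>"

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._is_external(dict(attrs).get("href") or ""):
            self.out.append(self._render_link(attrs))
        else:
            # Everything else is copied through exactly as written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def add_nofollow(html, own_host):
    """Return html with rel="nofollow" added to every external link."""
    parser = NofollowRewriter(own_host)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


comment = '<p>Nice post! <a href="http://spammy.example/buy">cheap stuff</a></p>'
print(add_nofollow(comment, "myblog.example"))
```

Links that stay on the blog (relative links, or links to own_host) are left alone, so only outbound links lose their link juice.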


Page ranking on NJFM means diddly squat to me. I write here because I like to write here. I don’t write on NJFM for the money or for Google ranking. If I did I’d have stopped writing here a long, long time ago.

What does mean something to me is peace of mind. Spammers annoy me. If making this blog a no follow blog will let scrapers know that spamming NJFM is an exercise in futility, then it’s worth it.

Basically I want to write. I would rather not spend time thinking about page rank, plagiarists, scrapers or spammers. Now that I’ve got that out of the way, maybe I can get back to writing (and yes, the tone of this post is one of annoyance…GRRR). 😉


Category: Blog, Page Rank

About the Author

Felicia A. Williams is a freelance writer and blogger. She spends the majority of her time with her family and writing. If she’s not writing or commenting on NJFM, she’s either outside smelling the roses or writing articles for one of her other sites.

Comments (6)


  1. Joni says:

    Since I am so illiterate about all these “thingamajig” things, I just wanna say hi and let you know I’m still reading your blog…LOL. I do understand what you’re saying; I just don’t have a blog right now. What a pain in the butt…

  2. Crystal says:

    OMG Felicia – I can’t believe the spam I just found out about! I have a dormant blog I haven’t thought of in a long time but got a dozen or so comment approval notices earlier today. Well, they were all spam so I went in and marked them as such. Then I noticed that I had 216 pages of spam (over 4,500!) and I had never even noticed the spam area before. So I deleted all the spam. Now it’s been a couple of hours and guess what? I’m up to 170 new spam! This is crazy – I know I don’t have any page rank there so I can’t imagine why the heck they’re bothering me!

    • Felicia says:

      Isn’t it absolutely annoying! It’s like getting your name on a mailing list. Once they have your web address they keep sending spam whether you read it or not.

      • Crystal says:

        Not sure why but I didn’t have comments turned off for posts after a certain length of time. I checked the box in the settings to automatically close comments after 14 days and beings my most recent post is nearly 5 years old, this fixed the problem. Now I need to check my other blogs…

  3. Crystal says:

    Geez, Felicia – what next? Yes, very annoying, and you’re right that these jokers should spend time developing their own legitimate traffic instead of trying to leech off of others who do. Of course, I guess if you cared about page rank, this scraping could be a good sign, right?
