A few months ago I learned more than I wanted to about internet scrapers when I realized NJFM was being attacked by one (or several). At first I couldn’t quite figure out what was going on. I would periodically receive notifications from my web hosting company, SiteGround (they’re the best), warning that I was close to reaching my daily usage limit. Those notifications prompted me to review my account activity. I couldn’t understand why there was so much activity when my blogs were all but dormant.
Upon further investigation I noticed there was a lot of activity from an IP address I was not familiar with. This had apparently been going on for about a month or so. As I said earlier, my sites don’t get much traffic so I didn’t monitor them.
CPU Usage – Unusual Activity Pattern
This unusual activity involved a particular IP address going through each and every post on my blog. After going through each post, it went through the WordPress categories and then the tags. When you consider that NJFM has over 600 posts, that is quite a lot of activity in a short period of time.
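For anyone who wants to spot this kind of pattern sooner, here is a minimal sketch of counting requests per IP address from a web server access log. It assumes each log line starts with the client IP (as in the common Apache-style log format); the sample lines, the function names, and the threshold are made up for illustration and are not taken from my actual logs.

```python
from collections import Counter


def requests_per_ip(log_lines):
    """Count requests per client IP.

    Assumption: the IP address is the first whitespace-separated
    field on each line, as in common Apache-style access logs.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def heavy_hitters(counts, threshold):
    """Return the IPs with at least `threshold` requests, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


# Hypothetical sample data using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [01/Jan/2024:10:00:00 +0000] "GET /post-1/ HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2024:10:00:01 +0000] "GET /post-2/ HTTP/1.1" 200 498',
    '198.51.100.23 - - [01/Jan/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.7 - - [01/Jan/2024:10:00:03 +0000] "GET /category/news/ HTTP/1.1" 200 733',
]

counts = requests_per_ip(sample_log)
suspects = heavy_hitters(counts, threshold=3)
```

On a real site the threshold would need tuning, since a legitimate search engine crawler can also request many pages in a row.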
Not knowing how to stop the problem, I took NJFM down. Well, I didn’t really take it down; I downloaded a Maintenance Mode plugin and activated it. For approximately 30 days NJFM was not accessible and sported a Maintenance Mode splash page. When I brought NJFM back up, the crazy activity had stopped (obviously, taking a site down isn’t the preferred way to stop scrapers).
Scraping – Here We Go Again!
I didn’t think much of it anymore once the activity stopped, until recently when I created another blog. The new blog had one or maybe two posts. Back when I created it about a year ago, I started noticing a similar phenomenon but didn’t really care, because how much scraping can you get from a blog with only two posts on it?
Several months later I started adding content to the blog and noticed the scrapers were back. Not wanting to put a maintenance mode splash screen on this blog, I realized I had to figure out what to do.
As with hackers, it’s impossible to stop scrapers. Not only do they take your content, but they also put your web server through a lot of activity at the same time. This can cause your account to reach its maximum CPU usage and be shut down for a period of time. I know because it happened to me.
Feeble Attempts to Stop Scrapers
I know I cannot stop scrapers. If scrapers want to scrape they will, but at least I was able to deny the offending IP address (little consolation as I know they have many IP addresses at their disposal). IP Deny (accessed through my hosting company’s cPanel) is sort of like closing the barn door after all the animals have escaped. It’s also time-consuming and inefficient to continually monitor IP addresses.
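To illustrate the idea behind an IP deny list (not how cPanel actually implements it, which I don’t know), here is a minimal sketch using Python’s standard `ipaddress` module. The function name and the addresses in the list are hypothetical examples.

```python
import ipaddress


def is_denied(client_ip, denied_networks):
    """Return True if client_ip falls inside any denied address or range.

    Each entry may be a single address ("203.0.113.7") or a CIDR
    range ("198.51.100.0/24").
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in denied_networks)


# Hypothetical deny list using documentation-only IP ranges.
deny_list = ["203.0.113.7", "198.51.100.0/24"]
```

Denying a whole range catches a scraper that hops between neighboring addresses, but it also risks blocking innocent visitors who share that range.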
Fortunately, after I chatted with my hosting company (SiteGround), they put in place a bot deny code list that blocks a majority of the spam bots on the internet. It may not stop them all, but it does help. Unless I hire a team whose sole purpose is taking care of scrapers, I’m fighting a losing battle. Doesn’t it sound a lot like the plagiarism issues we online folk face?
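I don’t know exactly what the host’s list contains or how it is applied. One common approach, assumed here purely for illustration, is to match the visitor’s User-Agent string against a list of known bot names. The bot names and the function name below are invented.

```python
def is_blocked_agent(user_agent, deny_substrings):
    """Return True if the User-Agent contains any denied substring.

    The comparison is case-insensitive. A missing User-Agent (None)
    is treated as an empty string and is not blocked by this check.
    """
    ua = (user_agent or "").lower()
    return any(s.lower() in ua for s in deny_substrings)


# Hypothetical bot names, not taken from any real deny list.
deny_substrings = ["ExampleScraperBot", "ContentGrabber"]
```

A scraper can fake its User-Agent, which is one reason a list like this helps but cannot stop every bot.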
Searching for Additional Possible Scraper Solutions
One of the scraper solutions I found was the Yoast SEO plugin. Among its many features is an option to modify your RSS feeds to give proper attribution to the original author and blog. Apparently scrapers often pull their content from RSS feeds. The plugin inserts links back to the original post and the original author at the end of each feed item, so a scraped copy carries the attribution with it.
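The idea can be sketched in a few lines. This is not Yoast’s code; it is a hypothetical function that appends an attribution footer to a feed item’s HTML, with the function name, wording, and example URLs all assumed for illustration.

```python
from html import escape


def add_attribution(item_html, post_title, post_url, blog_name, blog_url):
    """Append a footer linking back to the original post and blog.

    Titles and URLs are HTML-escaped so characters such as & or "
    cannot break the markup.
    """
    footer = (
        '<p>The post <a href="{post_url}">{post_title}</a> '
        'appeared first on <a href="{blog_url}">{blog_name}</a>.</p>'
    ).format(
        post_url=escape(post_url, quote=True),
        post_title=escape(post_title),
        blog_url=escape(blog_url, quote=True),
        blog_name=escape(blog_name),
    )
    return item_html + "\n" + footer


# Hypothetical feed item and URLs.
item = add_attribution(
    "<p>Original post content.</p>",
    "Scrapers & Me",
    "https://example.com/scrapers-and-me/",
    "Example Blog",
    "https://example.com/",
)
```

If a scraper republishes the feed item unchanged, the footer and its links go along with it.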
RSS Author attribution is just one tiny feature of this plugin. After I installed it I realized it’s going to take me some time to truly understand and fully use this feature-rich plugin. Not only does it help with RSS and scrapers, but it does wonders for improving a blog’s SEO.
WordPress is a robust platform. There are so many features in WordPress that I didn’t and still don’t have a clue as to what they are or how they work. This plugin is helping me to properly utilize some of these features.
Although Yoast SEO is new to me, it’s not a new plugin. It’s been around for quite some time. My goal is to get a better understanding of what it could do and how it can help improve my new blog’s SEO. Since the new blog has so little content, I’m hoping this plugin will get me started on the right foot. There are a host of tutorials online to help me get my arms around it.
As you know, every cloud has a silver lining, if only you open your eyes to find it. Well, the scraper cloud opened my eyes to the Yoast SEO plugin silver lining. The other silver lining (which I was already aware of) is my hosting company, SiteGround. Thanks, SiteGround, for being there and helping me with my various issues. Yes…there are other issues I’ve come across. Those issues will be the topics of future posts.
Yes, those are affiliate links for SiteGround.