How to Keep Scrapers from Ruining Your Content Strategy


Content marketing is huge. Consumers love it because they get valuable information they might once have had to pay for. Advertisers like it because it drives traffic, brings exposure to their brand and helps convert visitors into clients.

But with these great rewards, there is also a risk. One of the main risks with a content marketing strategy is that your content will get stolen.

As if this weren’t bad enough, these scrapers may even outrank the original post in search results. Content scrapers can destroy your brand, take traffic away from you or simply ruin your day, since they benefit from your hard work.

Would you like to know how to stop people from stealing your content? Would you like to know how you can actually benefit from scrapers even if you can’t stop them?

Keep reading, and I’ll give you eight steps you can use to fight content scrapers.

Step #1: Use Google Alerts

Let’s say you only generate content once a week. In this case, you might want to consider using Google Alerts to detect stolen copy. (You don’t need a Google account to get started.)

The steps are pretty basic. Once you’ve published your post, drop the headline into the search query box:

[Image: Google Alerts]

Next, choose the types of results you’d like to get.

Do you want to search just the blogs? News? Groups? Just to be safe, I’d recommend you choose “Everything.”

[Image: Google Alerts search]

Now, how do you want Google to return your results? You have three options: as-it-happens, once a day and once a week.

[Image: Google Alerts settings]

Unfortunately, Google Alerts will only send you pages that rank in the top twenty web search results or the top ten news results.

If the scraped copies of your content don’t rank that high, you might want to use one of the other methods I’ve included here. You’ll also want a different approach if you publish a lot of content, since checking every post manually would be too much work.
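If you publish often, you can at least generate the alert queries in bulk instead of typing each one. The sketch below uses hypothetical helper names (alert_query, alert_queries). It wraps each headline in quotes for an exact match and can append -site: to leave out your own domain. It assumes Google Alerts honors the same quote and -site: operators as regular Google search, so try a query or two by hand before relying on it.

```python
from typing import Iterable, List, Optional


def alert_query(headline: str, own_domain: Optional[str] = None) -> str:
    """Build an exact-match query for one headline (hypothetical helper)."""
    # A stray double quote would end the phrase early, so drop any, and
    # collapse repeated whitespace into single spaces.
    phrase = " ".join(headline.replace('"', "").split())
    query = f'"{phrase}"'
    if own_domain:
        # Assumes Alerts honors -site: to leave out your own copy of the post.
        query += f" -site:{own_domain}"
    return query


def alert_queries(headlines: Iterable[str], own_domain: Optional[str] = None) -> List[str]:
    """Build one query per headline, e.g. for everything published this month."""
    return [alert_query(h, own_domain) for h in headlines]


if __name__ == "__main__":
    for q in alert_queries(
        ["How to Keep Scrapers from Ruining Your Content Strategy"], "example.com"
    ):
        print(q)
```

The example prints "How to Keep Scrapers from Ruining Your Content Strategy" -site:example.com, which you would paste into the Alerts query box. Exact-match quotes cut down on unrelated results at the cost of missing copies where the scraper changed the title.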

Once you have a list of sites that are scraping your content, you can look up their IP addresses and proceed to step 2.
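Looking up the IP address behind a scraper’s domain is a single DNS query. Here is a minimal sketch using Python’s standard socket.gethostbyname; the resolve_hosts name is hypothetical. One caveat: the server that hosts a scraper site is not necessarily the machine that fetches your feed, so compare the result against your own server’s access logs before you block anything.

```python
import socket
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse


def resolve_hosts(sites: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each scraper hostname to its IPv4 address (None if lookup fails)."""
    results: Dict[str, Optional[str]] = {}
    for site in sites:
        # Accept either a full URL or a bare hostname.
        host = urlparse(site).hostname if "://" in site else site.strip()
        if not host:
            continue  # skip blank entries
        try:
            results[host] = socket.gethostbyname(host)
        except (socket.gaierror, UnicodeError):
            # Unknown domain, DNS failure, or an invalid hostname.
            results[host] = None
    return results
```

For example, resolve_hosts(["http://scraper.example/stolen-post"]) returns a dict keyed by "scraper.example", with None if the name doesn’t resolve.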

Step #2: Block scrapers

Scrapers like to grab your RSS feed so they know when you publish content. If you know the IP address they are operating from, you can keep them away from your feed by redirecting their requests elsewhere. Put the code below in your .htaccess file, replacing the example IP address with the scraper’s:

# Redirect every request from one scraper IP (example address) to another feed URL
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^69\.16\.226\.12$
RewriteRule ^(.*)$ http://newfeedurl.com/feed [R=302,L]

If you need help doing this, check out the Apache .htaccess Tutorial and the URL Rewriting Guide.
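If you end up blocking more than one address, writing these rules by hand gets error-prone. The hypothetical helper below (feed_redirect_rules is an illustrative name) generates rules like the one above for a list of IPs. It escapes the dots so they match literally, anchors each pattern so 69.16.226.12 doesn’t also match 69.16.226.120, and chains the conditions with [OR]. It assumes you want the same redirect-everything behavior as the rule above. Review the output and test it on your server before going live.

```python
import ipaddress
import re
from typing import Iterable


def feed_redirect_rules(ips: Iterable[str], target: str) -> str:
    """Return mod_rewrite rules redirecting any of `ips` to `target`."""
    # Validate and normalize each address (raises ValueError on typos),
    # then drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(str(ipaddress.ip_address(ip.strip())) for ip in ips))
    if not unique:
        return ""
    lines = ["RewriteEngine On"]
    for i, ip in enumerate(unique):
        # re.escape turns "." into "\." so dots match literally; ^ and $
        # stop 69.16.226.12 from also matching 69.16.226.120.
        cond = f"RewriteCond %{{REMOTE_ADDR}} ^{re.escape(ip)}$"
        if i < len(unique) - 1:
            cond += " [OR]"  # any one listed address triggers the rule
        lines.append(cond)
    # R=302 makes the redirect explicit; L stops processing further rules.
    lines.append(f"RewriteRule ^(.*)$ {target} [R=302,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(feed_redirect_rules(["69.16.226.12", "10.0.0.5"], "http://newfeedurl.com/feed"))
```

Keep in mind that IP addresses change hands, so an address you block today may belong to a legitimate visitor later. Prune the list now and then.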

Step #3: File a DMCA complaint with the search engine

When you find content scrapers who are stealing your stuff, you can file a DMCA complaint with Google.

  1. Tell Google what content was stolen – Your first step is to identify the stolen content. If a lot of your content has been stolen, this can take a long time, since you have to provide every single URL. Fortunately, Amit Agarwal describes a simple process for doing this with Google Docs. Check it out, as it will save you a lot of time.
  2. Tell Google what site is doing the stealing – Next, match each page on your site with the corresponding stolen page on the spam site. Being specific is very important.
  3. Share your contact information – Give Google as much information as possible so its staff can contact you. The more, the better.
  4. Add these two sentences – “I have a good-faith belief that use of the copyrighted material described above as allegedly infringing is not authorized by the copyright owner, its agent or the law” and “I swear, under penalty of perjury, that the information in the notification is accurate and that I am the copyright owner or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed.”
  5. Sign the notice on paper, then mail or fax it to Google.
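If you have many stolen URLs, it can help to assemble the notice text with a script before you print and sign it. This is a rough sketch. The layout and the dmca_notice name are illustrative assumptions, not a format Google prescribes. The two statements are the ones quoted in item 4.

```python
from typing import Iterable, Tuple

# The two statements quoted in item 4 above.
GOOD_FAITH = (
    "I have a good-faith belief that use of the copyrighted material described "
    "above as allegedly infringing is not authorized by the copyright owner, "
    "its agent or the law."
)
PERJURY = (
    "I swear, under penalty of perjury, that the information in the "
    "notification is accurate and that I am the copyright owner or am "
    "authorized to act on behalf of the owner of an exclusive right that is "
    "allegedly infringed."
)


def dmca_notice(pairs: Iterable[Tuple[str, str]], name: str, email: str, phone: str) -> str:
    """Lay out a notice from (original URL, infringing URL) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("list at least one original/infringing URL pair")
    lines = ["Original work and infringing copy, one pair per item:"]
    for n, (original, copied) in enumerate(pairs, start=1):
        lines.append(f"{n}. Original: {original}")
        lines.append(f"   Infringing copy: {copied}")
    lines += [
        "",
        "Contact information:",
        f"Name: {name}",
        f"Email: {email}",
        f"Phone: {phone}",
        "",
        GOOD_FAITH,
        PERJURY,
        "",
        "Signature: ____________________",  # sign by hand after printing
    ]
    return "\n".join(lines)
```

Pairing each original URL with its copy keeps the notice specific, which is the point of item 2.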

Be warned: if you use this strategy to go after competitors, or make a plagiarism accusation that isn’t true, you could be held liable for damages in court.

Use this step only when you are absolutely certain that your content is being stolen.

Step #4: Create lots of links in your content that point to other pages on your site

Linking to your internal content is just good SEO. But it is also very helpful when your content is getting stolen.

If copies of your posts land on a spam site, they will often retain the links you created, pointing back to your own site. So even if you can’t stop the thieves from taking your content, at least you can get some traffic back to your site!
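To check whether a scraped copy actually kept your links, you can parse the page and list the links that still point at your domain. Here is a minimal sketch using Python’s built-in html.parser; the function and class names are hypothetical. Relative links are resolved against the page they sit on. That means a relative link such as /about in a scraped copy points at the scraper’s site, not yours, and the check makes that visible.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_domain(html: str, page_url: str, domain: str) -> List[str]:
    """Return absolute URLs in `html` that point at `domain` or its subdomains."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    domain = domain.lower()
    matches = []
    for href in parser.hrefs:
        # Resolve relative links against the scraped page's own URL.
        absolute = urljoin(page_url, href)
        host = (urlparse(absolute).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(absolute)
    return matches
```

You would pass in the HTML of the scraped page (fetched however you like), its URL, and your own domain.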

The rules on links have changed, particularly when it comes to anchor text, so here are a few resources to help you understand what you need to do:

Step #5: Use duplicate content-detecting services like Copyscape

Another great idea is to use Copyscape Premium. This will help you track down sites that are stealing your content.

[Image: Copyscape Premium]

It works by finding copies of your content on scraper sites, and you can use it in several ways:

  • Manual – You can simply copy and paste your content into the search box to see if anything comes up. Copyscape says that for the best results, you should paste 2,000 words or fewer, but I’ve tested posts with over 5,000 words without a problem.
  • Check your entire site – This batch search will review up to 10,000 pages on your site in one procedure to see if there are copies floating around online.
  • Manage cases of plagiarism – You also have the ability to follow the progress of your plagiarism cases, including your responses.
  • Include team members – If you manage a staff of writers, you can link Copyscape accounts to track searches under one invoice.
  • Eliminate certain sites in searches – If you own several sites and post duplicate content on them yourself, or if you’ve authorized someone to re-use a post, you can filter those sites out of your results.
  • Compare content – You can evaluate how similar two pieces of content are, whether they are online or offline.
  • Use the API to automate – You can write scripts against the Copyscape API that automatically search for copied or stolen content online.
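The sketch below is not Copyscape’s algorithm. It shows one common way to compare two pieces of text: break each into word shingles (overlapping runs of k words) and score them with Jaccard similarity, the size of the overlap divided by the size of the union. The 5-word shingle size is an arbitrary assumption. A score of 1.0 means the texts share all their shingles, and 0.0 means they share none.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 5) -> Set[Tuple[str, ...]]:
    """Overlapping runs of k lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)
```

A scraped copy with a few sentences added or trimmed still scores well above zero, which is why shingling is a popular way to spot near-duplicates rather than only exact copies.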

Once you have a list of sites that are stealing your content, you can either block their IP addresses like we did in step 2 or send a DMCA complaint like we did in step 3.

Step #6: Search Webmaster Tools

You can also find scrapers by going into your Webmaster Tools. Look in Your Site on the Web > Links to Your Site.

Next, sort by the “Linked Pages” column.

What you are looking for are sites that point a large number of links to your site. You can ignore any social bookmarking or social media sites.

Also set aside legitimate links from fans.
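If you copy or export the list of linking domains out of Webmaster Tools, a few lines of code can do the filtering described above. This sketch assumes the data is a list of (domain, number of links) pairs. The ignore set (social sites and known fans) and the minimum of 10 links are placeholders you would fill in yourself, and suspicious_linkers is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple


def suspicious_linkers(rows: Iterable[Tuple[str, int]], ignore: Set[str],
                       min_links: int = 10) -> List[Tuple[str, int]]:
    """Drop ignored domains and small linkers; sort the rest by link count."""
    ignore = {d.lower() for d in ignore}
    kept = []
    for domain, count in rows:
        d = domain.strip().lower()
        if d.startswith("www."):
            d = d[4:]
        # Skip social sites and known fans, including their subdomains.
        if d in ignore or any(d.endswith("." + i) for i in ignore):
            continue
        if count >= min_links:
            kept.append((d, count))
    # Most links first, like sorting the "Linked Pages" column.
    return sorted(kept, key=lambda row: row[1], reverse=True)
```

What’s left is a short list of domains worth clicking through by hand.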

Then click through the links manually to see which pages on those sites point to your content, and try to get them removed using steps 2 and 3 above.

Step #7: Use trackbacks

With WordPress, you can receive trackbacks. These notifications show up at the bottom of the post and indicate how many sites have linked to it.

[Image: trackbacks]
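A trackback is just an HTTP request. Under the TrackBack specification, the linking site sends a form-encoded POST to your post’s trackback URL. That request has a required url field and optional title, excerpt and blog_name fields. The sketch below only builds such a request with Python’s urllib and does not send it. The trackback URL in the example follows the usual WordPress pattern (post URL plus /trackback/), which is an assumption about your setup, and trackback_request is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def trackback_request(trackback_url: str, url: str, title: str = "",
                      excerpt: str = "", blog_name: str = "") -> Request:
    """Build (but do not send) a TrackBack ping for `trackback_url`."""
    fields = {"url": url}  # the only required field: the linking page
    for key, value in (("title", title), ("excerpt", excerpt), ("blog_name", blog_name)):
        if value:
            fields[key] = value
    body = urlencode(fields).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = trackback_request("https://example.com/my-post/trackback/",
                            "https://scraper.test/copy", title="Copied post")
    print(req.get_method(), req.full_url, req.data)
```

The url field is the page that linked to you, so following it is how you review the site on the other end.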

When it comes to scrapers, though, you’ll only get trackbacks if the stolen content contains links to other pages on your site. This is where good anchor text writing comes in (it’s also just good on-page SEO).

I try to make a habit of following every trackback I get so I can review the site. They may not all be scrapers, and you never know what kind of connection you can make by seeing who is on the other side of that link.

Step #8: Put footers in your blog posts

A great way to get credit for your posts even if someone is scraping them is to put a footer in each post.

Yoast has this feature built into the WordPress SEO plugin:

[Image: Yoast footer]

If you don’t want to use WordPress SEO, you can use the RSS Footer plugin by Yoast. Customize the footer to say, “The post [post title] appeared first on [your blog].”
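Conceptually, a feed footer appends an attribution line with absolute links to every item in your feed, so the credit travels with any copy. The sketch below shows the idea in Python. It is not how the Yoast plugins are implemented, and add_feed_footer is a hypothetical name.

```python
from html import escape


def add_feed_footer(content_html: str, post_title: str, post_url: str,
                    blog_name: str, blog_url: str) -> str:
    """Append a "first appeared on" line to one feed item's HTML."""
    # Absolute URLs keep the credit links pointing at your site even after
    # the item is copied onto another domain.
    footer = (
        f'<p>The post <a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'appeared first on <a href="{escape(blog_url)}">{escape(blog_name)}</a>.</p>'
    )
    return content_html + "\n" + footer
```

Titles and names are HTML-escaped so a character like & in a headline doesn’t break the feed markup.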

Conclusion

Keep in mind that the trackbacks you get from stolen content carry a little bit of link value. Of course, the scraper site’s PR will typically be pretty low, but it all adds up. Let that encourage you if you find it really difficult to stop spammers.

Something else to consider: if your content is getting stolen, that probably means it’s pretty good and people see value in it. So, while it might not be the best way to get a compliment, it’s at least a sign that you are making it!

What do you do to fight off content scrapers?


Comments

  1. Good tips!

    Cloudflare also has some pretty nice features for blocking spamming IP addresses.

    Also, what about rel="canonical"?

    Best,

    Nico

  2. I was checking whether one of my writers had copied work, and I once saw a bunch of search results with the exact text I had published on my site.

    I’ve never used Copyscape, but it looks like it’s worth trying out. I never gave it much thought, only because the work being copied was from back in the day when I posted on EzineArticles a lot, so it was no big deal for me at least. And I’m sure (or at least hope) Google has something in place for this so the original owner doesn’t get penalized.

    -Amir

  3. Neil – right on the money!
    Great points – especially the Google Alerts!

    But question:
    Step #4: Create lots of links in your content that point to other pages on your site

    – If one does go about doing this, in an overall organic, Penguin-friendly way of internally linking, shouldn’t they limit the number of links?

    your pal
    Chenzo

  4. Another great post Neil,

    Definitely helpful – thanks!

  5. There have to be better ways to protect content online, given how much effort we put into it. There’s also a service called TYNT that helps attribute the original source when content is copied.

    As usual, your post covers yet another important thing to keep in mind :)

    Thanks and happy blogging :)

  6. Great tips, the internal linking is genius. I’ve always (mostly) done it but I never thought of it as a protective measure before.

  7. Thanks for the steps, and the very concise list of resources webmasters and publishers may use. Also, the tree graphic is the absolute best representational image of the ‘circle’ of the web that a lay-client can easily understand.

  8. Nice article as always, Neil!

    I learned something new.

    Thanks again.
    Gyanendra

  9. Creating a lot of links is a great way to profit from the annoying scrapers, but wouldn’t it cause over-optimization problems with Google?

    • Not sure how you can profit from it, but yes, there could be over-optimization penalties. I wouldn’t worry too much, though, as the links would come from duplicate content pages.

  10. Also, just to add to this: linking is usually better at the beginning of a post, as most of these spammers cut off the post after the first paragraph.

  11. You’ve scored once again, Neil. What an awesome post about content strategy.

    Just wanted to share something with you, Neil. I once heard a quote that goes like this:

    “Originality is undetected plagiarism.”

  12. Another TERRIFIC article – as always, with everything you produce, Neil! THANK YOU, Neil!

  13. Thanks a lot for the useful info. There must be a lot of people out there suffering from the same pain. These tools help.

  14. Really useful, comprehensive post Neil. Thanks for sharing.

  15. Neil, fantastic post. You are truly a great and informative writer, and I always come away with some great information to remember. I have used Copyscape before, but I was unaware that you could use it to search your entire site. Thank you for the awesome info, Neil!

  16. Neil
    I’m bookmarking this one for future reference.
    Thank you for writing something so helpful.
    Christopher

  17. Great Post Neil,

    This is actually one of the primary reasons we make sure to use the Google Authorship attribute on all of our content. If our stuff does get scraped, at least Google can see we published it first with our rel=publisher attribute, and I am certain they will be expanding that feature in their SERPs as well.

    Additionally, while it can be very annoying to see your hard work just straight up stolen by spammy scraper sites, I take solace in knowing that most of those sites won’t be around for very long anyway.

    As you said, it’s a form of flattery in a way.

  18. A great post, thanks for all the useful tips. I’ll be using your methods as my content often gets stolen.

    If I may make one suggestion: if you just copy and paste your article’s title, you will often get results that have nothing to do with your content being stolen (broad match). I think it is more useful to put quotes around the article’s title (exact match) to get only exact matches and filter out all the noise. Of course, there is a trade-off here….

  19. Great tips, Neil. Scrapers are a real pain in the ass, especially when they aren’t even grateful enough to link back.

  20. Great advice, Neil! I’m using Google Alerts for monitoring and Copyscape, as well as interlinking. I would like to know more about rel=canonical. Where should I put it? I hope you can write a post about it. =)

  21. Hi Neil,
    Once again, superb post. Content really matters most for a website, and your post explained some great points.
    Thanks for sharing this post.

  22. Great post Neil, and it comes in perfect timing for me.
    I wanted to add that you can set WordPress, in its options, to deliver just a summary of each post to RSS feeds instead of the whole post. This way scrapers only get a short piece of your content.

  23. I just make sure I use internal links in my posts, so when someone uses my content I get links back to different pages. It’s also a good way to find out who is using your content, by looking at the sites that link to you in Webmaster Tools.

  24. Good article Neil as always. I myself use TYNT and find it a very useful tool.

    Along with Webmaster Tools, it helped me track down a site in Saudi Arabia where I found an exact replica of one of my blogs.

    The only difference was that my content had been translated into Arabic, and not even translated well; they must have been using Google Translate or something like that. My content read like crap in Arabic.

    I found out who the site owner was and fired off a couple of e-mails, one in English and the other in Arabic, telling them bluntly that if they did not take down the site, I would inform Google and take legal action against them.

    It did the trick: the site was gone in a couple of days, and I broke all links with that URL.

    I don’t mind people sharing my content. I actively encourage it. But this instance was just a blatant rip off and a bad one at that.

    Love your posts they are always informative.

    AF

  25. You always have great content Neil.

    How is content scraping different than content curating?

    • Hi Mark,

      Scrapers typically pick up your RSS feed and just copy all of your content exactly as it is on your site. Some leave the links and some remove them, but it’s a one-for-one copy of your content. Curating means actually reading someone else’s post, writing your own original post about the topic, and including some snippets from the original with a link back to the source, giving them credit for it. It’s basically what many news sources do.

  26. Scrapers are a menace, no doubt, but I’ve seen a few examples of DMCA complaints that served just to reduce the number of competitors.

  27. Won’t too much code in .htaccess cause issues with the site?

  28. I really loved point 4…
    Create lots of links in your content that point to other pages on your site
    Anyway, it’s a perfect way to engage readers on your site and give them a reason to stay.

  29. Thank you, Neil, for your suggestions. I use some Yoast products, but I’ve never used the RSS Footer plugin.
    As an alternative to Copyscape, you can also use Plagiarisma, which is free to use (if you have a small site).
    Google Alerts is also a good free tool; I use it to manage my online reputation too.

  30. Very informative post, Neil. I think with the recent Google update, scrapers are long gone.

  31. Thanks for sharing some of your insightful thoughts, Neil!

    At the recently concluded SES San Francisco conference, Google spam head Matt Cutts announced the popular online search engine’s intention to encourage webmasters to build websites that can stand the test of time. So it has become essential for digital marketers and SEO consultants to concentrate on enhancing the search engine ranking and visibility of their websites through a comprehensive content strategy.

    The points highlighted in the article will definitely help web content developers track down their stolen content. I was using Copyscape most of the time to ensure that no duplicate or stolen content is posted on my website. The additional points will definitely be of great help in identifying and preventing spammers.

  32. My content sucks, so I don’t have to worry about this. Haha. But great tips nonetheless, Neil!

    • Ah, don’t say that, be positive! Providing useful information to help your readers succeed is what it’s all about. That’s why I love Neil’s blog. Same with me: after working in the web design industry, I have so many things I want to write about regarding business, client handling and general web design, because I wish someone had taught me these things when I started.

      Just teach people things that you want to know about. Solve a problem; don’t just post for the sake of it :) Hope this helps, but just be positive!!

  33. Good stuff, Neil. #4 works like a charm. It’s a great way to get backlinks. I’ve also heard of a WordPress plugin that keeps people from scraping. Anyone know what I’m talking about?

  34. Sometimes you could first contact the webmaster of the site who has stolen your content and ask them to take it down (or leave an excerpt with a link back to your site – even better).

    But if you have a problem where people are constantly stealing your content, then the above remedies would work perfectly.

  35. Nice suggestions to put copy brats off.

  36. Neil, you didn’t mention TYNT. Is it not good for blogs?

  37. Thanks, I am going to make a checklist of the above and implement all of it. I understand it should help me improve my domain strength. Should I be too concerned that my year-old blog doesn’t have a PR yet?

  38. I’m not too bothered if people copy my blog content. You could take it as a compliment, as it’s obviously good enough for someone to copy, and as your last paragraph states, you can get a link back, hopefully with some good link juice.

  39. Hello Neil,
    Nice collection of tools and ways to prevent your articles from being stolen… or, if you can’t do that, to use a little link love.

    But I think the Copyscape monitoring solution is the best; one of my clients used it and is very satisfied. It’s simple and elegant, and you get an e-mail notification in seconds! It has some advanced options, too…

  40. Really good tips. I have been noticing lately that some of my content was stolen. I checked on my backlinks. Good to be aware of other methods to check it!

  41. Thanks for the great tips, Neil!
    I have one question about something you barely mention in the post.
    If I copy part of the content on my website and then post it to other sites like Yellow Pages, Facebook or web directories… will my site be at risk? My website is quite new to search engines, so I am afraid that Google will consider the content on the sites I post to as the original and the content on my website as the duplicate. :)

  42. Thanks so much, Neil, for this useful post. I have really been searching for this sort of helpful information to stop scrapers from stealing our content from the many places we post it. I hope I can now prevent them from doing this anymore and cheating with our content!

  43. Very helpful article. I didn’t realize you could block IP addresses from your RSS feeds :) Thank you

  44. Once again you have come up with great tips, Neil. I am pretty sure these tips will help a lot in identifying who is copying our content, and most importantly, we can protect our quality content by blocking their IP addresses. Thanks for sharing, mate.

  45. Hi Neil,

    Nice write-up. I just finished going through it and thought I could contribute to point #2 “Block Scrapers” and why blocking at the IP level is not the best approach. In fact, it has proven to be harmful.

    At my work, we see over 100,000 web scrapers and bots daily on our clients’ sites, and they rarely use the same IPs. Because IP addresses are so dynamic, a web scraper can use an IP address and then drop it back into the pool of publicly available IPs. That IP might then be reassigned to a legitimate user, whom you are now blocking…

    Admittedly, the redirect tip will work on some scrapers but not the majority. I’d love to hear any thoughts on this. Agree/disagree?

    Thx again for the write-up,
    @seanharmer

  46. I use a lot of Google Alerts to get more information on what’s going on in my niche. One of my posts was stolen, but it was taken down right after I sent them a quick message.

    I guess I was just lucky at that time, but now I know what (additional things) to do as soon as this happens again.

  47. Highly valuable info. I keep finding my content turning up without attribution (which is annoying as a lot of it comes from article marketing directories – surely giving it away in the first place should be enough?).

    I think I’m going to take a few days out and start DMCA clubbing some of the worst offenders.

  48. I have a problem with my posts, Neil. I found that some posts are getting crawled and published automatically on another blog: as soon as I publish on my blog, the post gets published on that blog too. I have contacted the blog owner four times but have not received any reply from him. Does this affect my blog?

  49. Excellent, excellent, EXCELLENT tips, Neil! Scrapers aren’t all horrible, you’re right, but more often than not they don’t actually help anything either.

    Most of the sites that scrape our blog posts remove the in-text backlinks, which is frustrating. Some of them are savvy enough to remove references to our company altogether, too, which makes them harder to spot. It seems like scammers and spammers are always evolving to find the best ways to screw people over…sigh.

  50. Neil, another great article packed full of good advice, internal link building is a necessity. I just happened upon your blog a couple days ago, but everything I read is well written with tons of info. Thanks for sharing so much, I will be spending a lot of free time on your site until I get caught up with it.

  51. Hey Neil,

    Have you ever heard of a simple plugin called ‘Blog Protector’? It disables the right-click so people can’t copy-paste your content.

    I know this won’t stop all scrapers, but it at least stops people from easily stealing your content.

    What do you think of it?

    I used this for a while but was then persuaded to just stop worrying about it and installed another plugin called TYNT which tells you when your content has been copied and automatically adds a link back to your site every time someone copies it.

    The theory is that some people may be copying it for genuine sharing purposes, e.g. old-school pasting it into an email to share with friends.

    TYNT claims that they actually help your site by getting more sharing and links though I’m still not sure what I really think about it – people could obviously just delete that link.

    All in all, my approach has been to add more personal stories into my content, because not only does that make for better copy, it’s also less transferable, plus more inter-links as you mention above.

    Overall, I have decided to accept that people copying my content is inevitable (at least I wrote it first) and to stop worrying about it, which leaves more ‘mental space’ to focus on simply writing the best content that I can.

  52. I’ve dealt with scrapers and rippers for years with various sites and Google is pretty good at picking up on duplicate content. But they still make mistakes and the main thing is to make sure the search engines (specifically Google) have your content spidered before the scrapers find it. The best solution is what you have outlined in #4. The more signals you can send to Google that you’ve got new content up, the better.

    My process is a bit complex these days when it comes time to publish but it involves a lot of internal linking, social media funneling and RSS throttling (e.g. keep your RSS on time-delay).

  53. Thanks, Neil, for the info on blocking scrapers. As you said, we can use track-backs on WordPress through the RSS Footer plugin by Yoast, but alas, we can’t use it in Blogger (I am using Blogger)!

  54. Had no idea you could go to such lengths to stop them. I love the tutorial fashion and outline of this. Really an easy step-by-step guide for a newbie.

  55. I think loyal readers of your content can also help alert you to scrapers and defend your site and content as well.

  56. What a simple but practical idea. I use Google Alerts for 30+ keywords a day, and I have never thought of using it to protect the articles I write.

    Thanks for the advice, you might have just saved me future headaches.

  57. Thanks for the help. I used these methods, and they have really helped me in my work; when I open Google Alerts, I get what I want. Thanks once again for sharing.

  58. Creating various links pointing to other pages in your content is the best way to keep people on your site, because they can find useful information in one place. It’s a good way to get more backlinks. I am using Copyscape for duplicate content checking.

  59. The scrapers stopped using RSS feeds a long time ago. We cut down our RSS feed to nothing.

    They use autoblogging software that steals the entire content even when the RSS feed is labeled with links and copyright information. We believe this software uses Google News somehow.

    We’ve found a way to stop some of the autobloggers. We’re testing it right now. We deny access completely to the autobloggers who steal content.

    DMCA requests have proven to be a waste of time. Once one copy gets taken down, another site does it all over again.

  60. Hey Neil,
    You are a genius thinker and writer. You have done a great job writing this helpful, awesome article.

    Thanks.

    Matt

  61. I like step number 4. That’s really smart. All worth trying, Neil.

  62. Neil, can you recommend a plugin I can use to find out the IPs of scrapers so I can block them…. Please reply.

  63. I like this strategy!!

    Hello!

    Hi! Great tips, the internal linking is genius. I’ve always (mostly) done it but I never thought of it as a protective measure before.

    Thanks!

  65. Thank you very much for the information. It is so difficult to keep track of all the posts and then also to see who is stealing what.

  66. This is really helpful, but frankly, sometimes it’s really hard to stop scrapers.

  67. There is this site, feedreader.com. It’s a massive feed scraper that copies from other blogs and does not provide a proper backlink, and this way it ranks higher than most of those blogs.

    Too bad Google is not doing anything about it.

    What are your views on it, how to protect our blogs from feedreader scraper?

  68. Thank you, Neil bhai. This will certainly help me keep copycats away from my blogs.

  69. Very informative post, Neil. But now that Penguin 2.0 has rolled out, are the above methods Penguin-safe?
