Protect Yourself Against The Duplicate Content Threat Caused by Scrapers

First I want to answer the obvious question for those who might not be so up on net lingo: what is a scraper? Without getting too technical, a scraper is a person or program that crawls through websites and steals their content. That information is then placed on a separate site, usually copy/pasted wholesale and occasionally badly rewritten.
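To make that concrete, here is a minimal sketch of the mechanical side, using only the Python standard library: fetch a page and strip it down to its raw text. The URL and function name are purely illustrative, not taken from any real scraper.

    import re
    import urllib.request

    def scrape_text(url):
        """Download a page and crudely extract its visible text."""
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8", errors="ignore")
        # Drop script/style blocks, then strip the remaining tags.
        html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
        text = re.sub(r"<[^>]+>", " ", html)
        return re.sub(r"\s+", " ", text).strip()

    # The stolen text is then republished elsewhere, wholesale.
    print(scrape_text("https://example.com/some-article")[:200])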

The express purpose of a scraper site is to spam search engines, catching the attention of users who (theoretically) will see it in the results. Thanks to high keyword density and social spam tactics such as blog-comment spam, scraper operators were once able to push these sites up the Google rankings with decent regularity.
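"Keyword density" simply means the share of a page's words taken up by a single keyword. As a hedged sketch (the thresholds spam filters actually use are not public, and the function here is my own illustration):

    def keyword_density(text, keyword):
        """Fraction of the words in `text` that are exactly `keyword`."""
        words = text.lower().split()
        return words.count(keyword.lower()) / len(words) if words else 0.0

    spammy = "cheap shoes cheap shoes buy cheap shoes online cheap shoes"
    print(round(keyword_density(spammy, "cheap"), 2))  # 0.4 -- a red flag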

That kind of SEO manipulation is now much harder to pull off, but it hasn't stopped the threat of scrapers, nor has it significantly lessened the number of sites that use stolen content.

Scrapers and Duplicate Content


Does Content Aggregation Result in Duplicate Content?

Lately I have been seeing a lot of websites that aggregate content from around the web (especially from social media) and compile it in a 'mashup' style format. Of course, this has been happening for years, but it has only recently reached new heights of popularity. With the success of readers like Feeddler and Pulse, it is only natural that other sites would take advantage of platforms that let them fill the same need.
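Mechanically, an aggregator just pulls several feeds and merges their items. A rough sketch using only the Python standard library, assuming plain RSS 2.0 feeds (the URLs are placeholders):

    import urllib.request
    import xml.etree.ElementTree as ET

    def fetch_items(feed_url):
        """Return (title, link) pairs from a standard RSS 2.0 feed."""
        with urllib.request.urlopen(feed_url) as response:
            root = ET.fromstring(response.read())
        return [(item.findtext("title"), item.findtext("link"))
                for item in root.iter("item")]

    feeds = ["https://example.com/feed-one.xml",
             "https://example.com/feed-two.xml"]
    mashup = [entry for url in feeds for entry in fetch_items(url)]
    for title, link in mashup:
        print(title, "-", link)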

Content Aggregation

But there is a nagging question for those who have adopted this practice: will Google's bots read it as duplicate content? And is there a way to make sure they don't?

How Bots See It

The truth is, this is a very hard thing to know for sure. Because the pages the bots check are effectively random, you cannot predict what they will be looking at. So if some of your pages are not well mixed, there is a good chance the bots will read them as duplicates of content from around the web.
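Google does not publish its exact method, but a common textbook way to detect near-duplicates, shown here as a hedged sketch rather than the real algorithm, is to break each page into overlapping word "shingles" and measure the overlap between pages:

    def shingles(text, k=5):
        """Break text into overlapping k-word shingles."""
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        """Jaccard similarity: shared shingles over all distinct shingles."""
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    source = ("Content aggregation has been happening for years, "
              "but only recently reached new heights in popularity.")
    pasted = source
    mixed = ("Aggregating material from other sites is an old practice "
             "that has lately become far more popular.")
    print(jaccard(source, pasted))  # 1.0 -- a wholesale copy/paste
    print(jaccard(source, mixed))   # near 0.0 -- a well-mixed rewrite

Whatever signals Google actually uses, the principle is the same: pages that overlap heavily with existing content get grouped together as duplicates.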

While there is technically no penalty for this, it will banish those pages from the main results, which is a punishment in and of itself. After all, how will anyone find them?

Only by making sure all the content you mash up on your site is meaningful, and mixed enough that it is not identical to any other website, can you avoid looking like duplicate content. Even then, if the reuse violates the terms of the site you took it from, you are in for a headache, not to mention a possible takedown notice.

Getting Proper SEO Results


If you are concerned about bringing in good results, there is only one way to do it: provide unique, high-quality content. That cannot be done with nothing but aggregated links and references from other sites. You might not get flagged for duplicate content, but that doesn't mean you will be drawing any attention to yourself, or helping your rankings and traffic.

Conclusion

You have to write unique content of your own, while mixing up any aggregated content enough that it reads as unique. Otherwise, you won't be doing yourself any favors.


What Is The Duplicate Content Penalty?

Recently, I was reading an article written back in 2010 about duplicate content, and how there is no such thing as a duplicate content penalty. It explained that this is a rather misleading phrase with no actual meaning, because Google only penalizes spammers who willfully try to trick the search engine. The most that could happen, the author said, was that a few of your pages would be filtered out of the results.