The Duplicate Content Penalty

March 24th, 2017


Be warned: if Google finds duplicate content, a spike will come out of the internet and impale your website horribly.

Frightened? Many are. But they shouldn’t be.

Duplicate Content Is Necessary

Let’s consider for a moment that Google, or at least some of the higher-ups who hold sway there, has stated that every website in the universe has some form of duplicate content on it. Think of press releases, blog quotes, product descriptions shared between manufacturers and dealers: some duplicate content is just natural. If Google were to drop the hammer on all of this content (that is, implement a “duplicate content penalty” for any duplicate content it came across), it would make a whole lot of content syndicators really grumpy. Google often doesn’t care about making people grumpy. What it does care about is user experience, because users are the ones who click on ads.

On that note, it’s frustrating for a user to see a bunch of webpages in a search result with exactly the same content. Users like to see diverse angles and opinions on a given topic.

It comes back to user experience. We have to think of why the content is duplicated. Is there a logical and justifiable reason? If so, Google will most likely understand. Google has also given us a number of tools to use to properly outline why the content is duplicated.

The Duplicate Content Penalty

A myth that needs to be cleared up is this: there is no “duplicate content penalty”. That said, in extreme situations, websites with a ton of duplicate content (hundreds of pages) have been hit with a manual action. If Google thinks your site is trying to game the rankings by posting myriad pages all about the same topic, with identical content, then the skies may open up and Google will smite thee.

If you have content on your website that is the same as other pages on the same site—a paragraph or two let’s say—then you needn’t worry. As long as there is a clear and logical reason for that content to exist, you’ll be fine.

That being said, Google provides us with the necessary tools to identify and categorize our content so there’s no question about why it exists. I’ve listed some examples of these tools below.

Where Canonicalization Fits In

What the blue hell is “canonicalization”? Well, let’s say you have a product that comes in various colours. Your website may have a different webpage for that product in each colour available, which would make for several pages with the same text (describing the product) with minor differences. To keep this organized, it makes sense to define one solitary page that you wish to show up in a Google search for that product. In this case, it is best to use a practice that no one can pronounce, called “canonicalization”. N.B. I’ve actually heard this term referred to as “chocolization”, and I didn’t have the heart to correct the client.

Canonicalization basically means picking a leader. Consider a clone army. You have a bunch of clones (webpages with the same content, only minor differences, like colour red versus blue) and you only want one to step up to the plate and represent. So, you’d canonicalize all those other peon clones towards the leader. The leader will get all the authority, and she’ll be the one putting herself out there in a search result for the world to see. You’ll need to go through the leader in order to get to all of the other clones. If Google has to choose between too many webpages as the authority, it will just get bored and go acquire another company.

Point being, canonicalization will help define for Google the reason for the duplication. You have a better shot of ranking with the one authoritative leader than with several identical pages all competing for the same keywords.
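
For the curious, here is a rough sketch of what picking a leader looks like in practice. The canonical hint is a link element with rel="canonical", placed in the head of each clone and pointing at the leader’s URL. The Python below just builds that element as a string. The shop URLs and the canonical_tag helper are made up for illustration, and your CMS or SEO plugin most likely handles this for you.

```python
from html import escape


def canonical_tag(leader_url: str) -> str:
    """Build the <link> element a page puts in its <head> to name its canonical URL."""
    return f'<link rel="canonical" href="{escape(leader_url, quote=True)}">'


# Hypothetical colour variants of one product page.
LEADER = "https://example.com/widget"
CLONES = [
    "https://example.com/widget?colour=red",
    "https://example.com/widget?colour=blue",
]

# Every clone carries the same tag, pointing at the leader.
tags_by_page = {url: canonical_tag(LEADER) for url in CLONES}

for url, tag in tags_by_page.items():
    print(url, "->", tag)
```

The point to notice is that the red and blue pages both stay online; they simply agree on who the leader is.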

Canonicalization for Pagination

Let’s say you have an article on your website that spans multiple pages. There’s a tag you can implement to tell Google that you want the first page of the article to rank, not all of the subsequent pages.

The tag is used to signal to Google that there’s a logical sequence to the order of the content.

Users wouldn’t benefit from seeing all the internal pages of an article in a search result; why would you want to start reading an article on page 3? The pagination tag allows you to indicate to Google that you’re aware the article has multiple pages, but you only need page one (the leader) to show up in a search result.
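
The tag itself isn’t named above. Assuming it refers to the rel="prev" and rel="next" link elements that Google documented for paginated series around the time this was written (Google has since said it no longer uses them as an indexing signal), generating them might look something like the sketch below. The article URL, the ?page= parameter, and the pagination_tags helper are all hypothetical.

```python
def pagination_tags(base_url: str, page: int, total_pages: int) -> list[str]:
    """Return the rel="prev"/"next" link elements for one page of a series."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def page_url(n: int) -> str:
        # Assumption: page 1 lives at the bare URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < total_pages:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return tags


# Page 2 of a three-page article points both backwards and forwards.
for line in pagination_tags("https://example.com/long-article", page=2, total_pages=3):
    print(line)
```

The first page only gets a "next" tag and the last page only gets a "prev" tag, which is how the sequence tells a crawler where the article starts and ends.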

301 Redirects?

We’re not going to get into explaining 301 redirects here, but I mention them simply to point out that canonicalization and redirects are two very different things.

Put simply, you use a redirect if you want to route users and search engines to a new location for the content (a change of address). You use canonicalization when you need all the pages to remain accessible to users. The canonical tag is meant to indicate to search engines that you’re aware the content is very similar across those pages, and you only want the “leader” to show up in a search result.
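
To make the difference concrete, here is a toy sketch of how a server might treat the two cases. It is not a real web framework: the respond function and the paths in MOVED and CANONICAL are invented for illustration, and relative paths are used only to keep it short (absolute URLs are the safer choice in a real canonical tag).

```python
# Hypothetical site data: one old URL that has moved, and two clones with a leader.
MOVED = {"/old-widget": "/widget"}
CANONICAL = {"/widget-red": "/widget", "/widget-blue": "/widget"}


def respond(path: str) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, head_html) for a requested path."""
    if path in MOVED:
        # Redirect: the visitor is sent to the new address and never sees the old page.
        return 301, {"Location": MOVED[path]}, ""
    # Canonical: the page is still served; the tag is only a hint to search engines.
    leader = CANONICAL.get(path, path)
    return 200, {}, f'<link rel="canonical" href="{leader}">'


print(respond("/old-widget"))
print(respond("/widget-red"))
```

The moved page answers with a 301 and a Location header, while the red widget page still loads normally and just names its leader.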

Key things to remember

We need to keep in mind that canonical tags are a recommendation to Google. Whether Google actually follows this recommendation or not is another story. In other words, I can recommend that a train not hit me if I’m standing in the middle of the tracks, but there’s no guarantee it will listen. In Google’s own words, these tags are used as “hints”, not absolute directives.

Is there a good reason for the duplication? If you’re asking whether something will be perceived as duplicate content, put yourself in the mind of the user. Would someone expect the content to be there? If there’s a good reason for the duplicate content to exist, then we should be in good shape. And where we can apply one of the above tags as a hint to Google, let’s do that.

There is no duplicate content penalty. Google will not shoot your website in the face if it crawls duplicate content. Will SEO tools pick up on duplicate content as a potential negative ranking signal? Sure. But again, let’s think of why the content is duplicated in the first place. If there’s a perfectly good reason for it to be there, then we should be good.

Unique, original content is always best. There are some cases where you may feel compelled to duplicate content. I’ve seen some clients copy and paste text from a source and use it on their own site. I would never recommend this. The best approach is to write your own version of that content so it’s original, and to cite the original source with a genuine link. That’s far more natural than simply copying the content.

Thanks for reading! Let me know if you have any questions.
