Is ‘Duplicate Content’ Hurting Your SEO?

"Diversity is good. Pass it down."

If you’re not familiar with the concept of duplicate content, it may at first sound like something that doesn’t apply to you, your first thought being either “I don’t have any duplicate content on my site” or “Why would duplicate content affect my site’s SEO anyway?”

You may not have knowingly written any duplicate content, but that doesn’t mean you don’t have any – at least not as far as Google sees it – and it most certainly can affect your SEO. In my experience, most sites have some form of duplicate content on the web, whether on their own site or on other sites.

That said, not all duplicate content is a problem. It’s completely reasonable and perfectly acceptable (to search engines) for certain forms of duplicate content to exist. But in most cases, duplicate content can be a real problem for SEO.

As you read this article, you may be surprised to learn about some common, SEO-impacting forms of duplicate content that you’ve never thought about before – and you may soon realize that your own site’s SEO is suffering from duplicate content issues you were previously completely unaware of.

Most duplicate content issues stem from technical factors, though some are purely strategic in nature, and others involve a combination of the two.

The good news is that most duplicate content issues are easy to resolve, and can often yield quick, significant and long lasting SEO gains. We’re talking seriously low-hanging fruit here that you simply can’t ignore.

In this post, I’m going to cover what duplicate content is, the issues it can cause, 13 common types of duplicate content, 8 solutions to common duplicate content problems, and how to select the right solution for each type of problem.

Ready? Here we go…

What is Duplicate Content?

If identical or highly similar content is accessible from more than one webpage address (i.e. from more than one URL), then search engines will likely consider these pages to be duplicate content.

As you’ll soon discover, one of the surprising characteristics of duplicate content is what can constitute “more than one URL”.

Regardless, duplicate content is a common problem for many websites – in most cases unbeknownst to the site manager or web team – and is a common cause of ranking issues.

What Are The Issues With Duplicate Content?

Duplicate content can affect SEO in three main ways:

  1. Duplicate content may hurt rankings because it divides a page’s authority – a major ranking factor – between all versions of the page. Authority is achieved primarily by other sites that link to yours, as well as (to a far lesser degree) by internal links that you have within your own site.

    So, when other sites (or you yourself) link to multiple versions of a page, each page’s authority is diluted compared to if all the links and authority were consolidated into just one single page.

    Google doesn’t always know whether to attribute all or some of the overall authority to just one page (and if so, to which page), or to keep it split up between multiple versions of a page. Most often their tendency is to split up the authority – at least to some degree – so some authority ends up being needlessly wasted.
  2. Google may not know which version(s) of a page to include in their index, and which one(s) to exclude. Often they’ll either exclude the ‘wrong’ page (i.e. the one you want included), or not include any of them at all.
  3. Google’s primary objective is to provide the most relevant results possible for each search query, and in its efforts to do so, it rarely shows duplicate pages.

    So, when it finds multiple versions of identical (or near identical) content on the web, it may have a difficult time deciding which version is the most relevant to a given search query. Usually it will pick just one of the duplicate pages, but not necessarily the one you want to rank for.

Google sees all of the factors outlined above as creating a less-than-desirable user experience – and as a source of confusion for Google itself.

At the end of the day, duplicate content usually results in pages being excluded from Google’s index, and/or rankings that are significantly lower than if the duplicate content didn’t exist.

13 Common Types of Duplicate Content

There are several types of duplicate content. Most will be damaging (or potentially damaging) to your SEO, though some won’t. The list of duplicate content types below is not exhaustive, but it does include the most common scenarios.

Note that this section will discuss the common types of duplicate content only; the solutions to these duplicate content issues will be addressed in the two subsequent sections.

1. www and non-www duplicates

This duplicate content problem exists when your server or CMS is configured to display your site’s URL in a web browser both with and without the www. In other words, the same content is available from two different URLs:

  • domain.com
    and…
  • www.domain.com

To check if this is happening with your own site, enter your domain in your browser without the www (i.e. your root domain). After the page loads, check if the URL displays the www or not.

Now enter your domain with the www (technically a subdomain), and once again, after the page loads, check if the URL displays the www or not.

Does one version of your site forward to the other, such that whichever version you enter, the resulting URL shown in the browser is the same? If so, you don’t have this duplicate content issue (though this doesn’t necessarily mean the forward is implemented in the most SEO-friendly way, which we’ll get to later in this post).

However, if both the www URL and the non-www URL display in your browser after each is entered and the page loads (and again, assuming the content is the same), then you do have this duplicate content problem.

You may be quite surprised to hear that it makes a difference if your site’s URL can be displayed both with and without the www, but the reality is that to a certain (and significant) degree, Google treats domain.com and www.domain.com as separate sites.

2. domain.com and domain.com/some-file-or-folder duplicates

This duplicate content problem exists when your server or CMS is configured to display your site’s content at multiple URLs. For example, some servers/CMSs will by default display a site as something like one of the following:

  • domain.com/default.aspx
  • domain.com/home/
  • domain.com/index.php

… but the same content may also be available at domain.com.

If this is the case, then you have duplicate content.

3. domain-one.com and domain-two.com duplicates

This duplicate content problem exists when you have two or more different domains that display the same site content. For example:

  • domain-one.com
    and…
  • domain-two.com

This is a common scenario that typically occurs when someone uses one domain for their main site, but has also registered another domain name (either for possible future use or to prevent a competitor from using it) and has essentially set up this second domain in the meantime as an ‘alias’ that displays the same content as the main domain.

4. Content Archive duplicates

Whether this duplicate content problem exists depends on how you set up your archives in your CMS. WordPress is an example of a popular CMS where you may encounter this problem.

For example, many WordPress sites have a blog that includes an ‘archive’ – a list of blog posts along with an excerpt from each full post – set up at a URL like this:

  • domain.com/blog/

But WordPress also enables its users to organize their blog posts by ‘categories’, ‘tags’, ‘date’, and ‘authors’. Each of these additional archives (if used) has its own URL, like these:

  • domain.com/category/category-name/
  • domain.com/tag/tag-name/
  • domain.com/year/month/
  • domain.com/author/author-name/

The problem comes into play when two or more of these archives contain either identical or largely identical content. It’s a very common scenario that harms the SEO of many sites (though WordPress plugins can make these issues relatively easy to resolve), yet many site managers are not aware of the issue.
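
To give you a sense of the fix (covered properly in the solutions section below), SEO plugins typically handle redundant archives by adding a noindex,follow meta tag to the head of each archive page you don’t want indexed – something like this:

<meta name="robots" content="noindex,follow"/>

This keeps the redundant archives out of Google’s index while still allowing its bots to crawl the links within them.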

5. Uppercase / Lowercase URL duplicates

This duplicate content problem exists when your server or CMS is configured to show the same content in uppercase, lowercase, or mixed case versions of your URL.

For example, if a page of content can be displayed at any two or more of the following URLs…

  • domain.com/pagename
  • domain.com/PAGENAME
  • domain.com/PageName

… then you may run into a duplicate content issue.

6. URL Parameter duplicates

This duplicate content problem exists when you have one or more parameters in your URL, and the content that is displayed by the URL containing the parameter is identical (or nearly identical) to the URL that doesn’t contain the parameter.

If you’re not familiar with URL parameters, here’s the skinny:

URL parameters are used for a variety of practical purposes, including:

  • to track what a user has added to his or her shopping cart (typically by appending a “session ID” to the URL)
  • to track clicks and pages visited in order to gather analytics data
  • to sort or filter content on a page, e.g. to display a calendar by day or week

You can spot a URL parameter by looking for a ‘?’ in the URL, followed by the parameter itself and its value. A few examples:

  • domain.com/?sessid=107
  • domain.com/?utm_source=sitename
  • domain.com/?view=week

7. Printer Friendly duplicates

This duplicate content problem exists when you have a ‘printer friendly’ version of a page that has virtually the same content (though not necessarily the same layout or design) as the main page, but a different URL.

8. Trailing Slash duplicates

This duplicate content problem exists when your server or CMS is configured to display your site’s pages both with and without a trailing slash. For example, some servers/CMSs will by default display a page like this:

  • domain.com/page/

… but the same content may also be available at this URL:

  • domain.com/page

Once again, you have duplicate content.

9. http and https duplicates

The duplicate content issue here is the possibility of Google indexing both the non-secure (http) and secure (https) versions of some of your site’s pages – or even of the entire site.

As with most other forms of duplicate content, Google will often treat the http and https versions of the same page as completely different URLs, even though they contain the exact same content – a potential duplicate content issue.

The way that this problem is commonly (and unwittingly) created is as follows:

If Google crawls an https (secure) page on your site, and that page contains a relative link (as opposed to an absolute link) to another page on your site, then Google will proceed to crawl that page – and any pages it in turn links to – via https, leading them to potentially index secure versions of some of your pages, or even your entire website!

Note: Here’s a simplified example of what an absolute link vs. a relative link looks like in a web page’s source code:

Absolute link:
<a href="http://domain.com/about-us/">About Us</a>

Relative link:
<a href="/about-us/">About Us</a>

Note that the absolute link contains the full domain (domain.com) in the URL including the “http://” protocol, whereas the relative link does not.
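
To illustrate how this plays out, here’s a hypothetical secure page scenario:

<!-- This relative link, found on https://domain.com/checkout/ ... -->
<a href="/about-us/">About Us</a>
<!-- ... resolves to https://domain.com/about-us/ - another https URL for Google to crawl -->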

All it takes is just one relative link to be innocently added on a secure page and Google could end up quickly indexing a duplicate https version of your entire site, which would be a potentially major duplicate content problem.

10. Mostly Identical, But Slightly Varied Page duplicates

This duplicate content problem usually exists when you knowingly have two or more pages with mostly identical, but slightly varied content.

Why would a site knowingly do this? There are several possible reasons, but often it’s because they want to create separate pages, each of which targets a different audience for their products or services. Each page would contain largely the same content, but would be ‘tweaked’ to address the service differences, needs and interests of each audience.

For example, a travel site may offer its services to residents of different countries, but the services may vary slightly by country. So, each page’s content may be 80-90% the same, the only differences being the slight variations applicable to each country’s residents.

This is a common challenge for many sites, as you need to assess the tradeoff between audience targeting opportunities on the one hand and duplicate content considerations on the other – not to mention many additional ways of handling this kind of scenario that aren’t addressed here (which could be an entire post unto itself).

11. Title Tag Duplicates

This duplicate content problem exists when you have two or more pages with the same title tag. It is a best practice to have a unique title tag for each page, specifically written to address that page’s unique content.
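
For reference, the title tag sits in the head of each page’s HTML. A hypothetical example:

<title>Blue Widgets for Small Boats | Acme Marine</title>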

However, it is common to see sites with duplicate title tags. Sometimes the website manager wants to heavily target a certain keyword across multiple pages and – with a little laziness setting in – doesn’t bother to customize the title tag for each page.

Other times people just don’t know about title tags and related best practices, so they have no awareness that duplicate title tags may be a problem.

Given that title tags are still the most important on-page ranking factor in SEO, it’s important that they are not duplicated across your site.

12. Social Media duplicates

This duplicate content situation exists when your website’s content is either fully – or most commonly, partially – replicated across one or more social sites. This is a common scenario when people post an excerpt of their own site’s blog posts to social media sites like Twitter, Google+, Facebook, LinkedIn, and others.

In most cases, this form of duplicate content is not a problem for SEO for several reasons:

  1. Most social networks nofollow the links you add to their sites that point back to your own website, which means that Google won’t ‘follow’ the link back to your site, so it essentially ignores that it’s duplicate content.
  2. Most users post only excerpts of their blog posts to social media sites – not the entire post (though some social media sites have such liberal per-post character limits that you could in most cases add the complete post if you wanted to).
  3. Google and other search engines do a very good job these days of understanding that duplicate social content is reasonable and to be expected, and they typically won’t lower a website’s rankings for such duplication.

So, in short, you generally don’t have to worry about sharing your content on social media. In fact, the benefits can potentially be plentiful as others may in turn share your content with their followers which may yield a new source of well-targeted traffic for your site.

13. Guest Posting and Syndicated Content duplicates

This duplicate content problem exists when some of your site content appears on other websites. Typical examples of this situation include:

  • a guest post that you’ve written for another site’s blog but that you also include on your own site
  • a press release that is covered by one or more press release sites but that you also include on your own site

While to a certain degree these types of duplicate content seem perfectly reasonable and should be understood and accepted by search engines without issue, it appears that this may not always be the case.

For various reasons, sometimes Google and other search engines get it wrong and your site could take a rankings hit in these situations, sometimes (arguably) undeservedly.

Some credible sites have reported that when their content has been syndicated on much larger, higher authority sites, Google has inadvertently treated the higher authority site as the original content creator, and the actual originator as the duplicate, despite the fact that the true content creator may have even published their content prior to the higher authority site.

8 Solutions to Common Duplicate Content Problems

There are several solutions for effectively handling most duplicate content issues. In this section I’ll outline what these solutions are in a general sense, and how they work.

In the next section I’ll cover how to select the right solution for each specific type of duplicate content problem.

1. 301 Redirects

A 301 redirect (also referred to as a permanent redirect) will redirect users and search engines from one page to another, and will pass along 90-99% of the redirected page’s SEO value (sometimes referred to as ‘link juice’ or ‘authority’).

A 301 redirect will often be the best solution for many of the different types of duplicate content issues outlined in this post, as it offers all of the following benefits:

  1. It eliminates duplicate content.
  2. It consolidates the combined authority of the duplicate pages into just one page.
  3. It offers a seamless experience to users and is supported by Google and all major search engines.

But how do you decide which one (or more) of your duplicate pages to redirect and which one is the ‘best’ page to keep accessible to your visitors and the search engines? Generally speaking, you want to consider the following factors:

  • Which page has the highest page authority? (See this section of this post I wrote for how to measure a page’s authority). You generally want to redirect to the page with the highest authority.
  • Is the page with the highest authority already in Google’s index? Ideally you want to redirect to a page that is already indexed.
  • Which page makes the most sense to redirect to from a user’s perspective?

You’ll often find that these factors all point to the same page, so your decision of which page to redirect to will be easy. Sometimes, however, the choice will not be so clear.

For example, what if the page with the highest authority is not currently indexed but the lower-authority page is? Or what if the page that is best for your visitors has lower authority or is not indexed?

These are situations where you’ll either need to do a little research to better understand how to handle such scenarios, or ask a qualified SEO specialist for help.

Whatever decisions you ultimately make about which page to 301 redirect from and to, you’ll also want to edit any internal links that are currently pointing to the URL that is being redirected so that they instead point to the URL you’re redirecting to.

This will ensure that 100% of the link juice from internal links gets passed to the preferred version of the page (otherwise you could potentially lose up to 10% of the internal link value that gets passed).

Implementation

As for how to implement a 301 redirect, it will depend on your server type and/or CMS. Some servers and/or CMSs make it super easy; you just check a box beside the page you want to 301 redirect and enter the URL in a provided field for where you want to redirect to. No code required!

In other cases, you’ll need to add a little code to a server configuration file. But don’t worry — it’s usually quick and easy to implement and something that your web host, IT guys, or SEO specialist can help with.
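
To give you an idea, here’s a minimal sketch of what this might look like in an Apache server’s .htaccess file – the domains and paths here are placeholders, and the right rules for your site will depend on your server setup:

# 301 (permanently) redirect a single duplicate page to the preferred URL
Redirect 301 /duplicate-page/ http://www.domain.com/preferred-page/

# 301 all non-www URLs to their www equivalents
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

On other server types (nginx, IIS, etc.) the syntax differs, but the principle is the same.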

One cautionary note: Just make sure that you do a 301 redirect and not a 302 redirect — a mistake I come across from time to time. A 302 redirect is a temporary redirect. Humans and search engines will still be redirected, but a 302 will not pass along any link juice, so you’ll just be needlessly wasting some inherent SEO value.

So make sure to use a 301!

2. Rel=Canonical Tag

When you have duplicate pages on your site, you can use the ‘rel=canonical’ tag to tell search engines which page you want to appear in search results (usually the ‘original’ or ‘best’ page for your users) and which page(s) is the duplicate.

A simple example:

You have (for whatever reason) two pages of identical content, page A and page B. You want page B to appear in search results. To avoid duplicate content issues that could negatively impact rankings, you would add the rel=canonical tag to the head of page A, like this:

<link rel="canonical" href="http://domain.com/page-B"/>

When Google crawls page A, they’ll see the canonical tag, which essentially says “Hey Google, this page (page A) is a duplicate of page B. Page B is the canonical page, so please show that page in search results, and please pass along any SEO value that exists in page A to page B.”

So long as Google sees that your canonical tag is reasonable, they do a very good job of obeying the tag.

Note that the canonical tag can be used on a cross-domain basis too, which can be a great way to deal with syndicated duplicate content (we’ll address this further later in this post).
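
For example, if another site republishes one of your articles, they would add a tag like this to the head of their copy (the URL here is a placeholder for your original article’s address):

<link rel="canonical" href="http://your-domain.com/original-article/"/>

This tells Google that your page is the original, and that the SEO value of the republished copy should be passed along to it.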

Rel=Canonical vs. 301

Many digital marketers – some SEO folks included – are mistakenly under the impression that the rel=canonical tag and a 301 redirect perform the same function and can be used interchangeably. But that’s incorrect, and using them interchangeably can be a mistake.

Here’s the big difference between the two:

A 301 redirect redirects all traffic (humans and search engine bots) from page A to page B, so page A will no longer be accessible to either humans or search engines. It’s as if page A no longer exists.

The canonical tag, on the other hand, is just for search engines, meaning your visitors and Google can still visit page A (and page B as well of course).

Both solutions are about equal in terms of passing link juice from the duplicate URL to the canonical / accessible URL.

So which solution should you use – rel=canonical or 301 redirect? If you have no need for the duplicate page to be available to your visitors, then use a 301 redirect. If you want both pages to be available to your visitors, then use a rel=canonical.

Implementation

Some CMSs come with a field or offer plugins that enable you to add the rel=canonical tag without code. Otherwise, you can just add the code above to the head of your page’s html – and don’t forget to edit the URL. 😉

3. NoIndex,Follow Tag

The “noindex,follow” directive can be used to keep specific pages out of search engine indices, while allowing the bots to crawl the links on the page which enables link juice to flow throughout the site. You add the code to the head of a page, like this:

<meta name="robots" content="noindex,follow"/>

NoIndex,Follow vs. Rel=Canonical

The noindex,follow tag should only be used when you have a page that you want to remain accessible to your visitors but also want to ensure that it is kept out of the search engine indices, whether that page is duplicate content or not.

If you do have pages of duplicate content and don’t mind if both pages show up in search engines, then you should use the rel=canonical tag, not the noindex,follow tag. While it’s unlikely that a page containing a canonical tag would show up in Google’s search results, there is a chance that it can happen — more of a chance than if using a noindex tag, which Google tends to honor virtually all the time.

One common (but understandable) mistake is to include both a rel=canonical tag and a noindex tag on the same page. Google has confirmed that this should not be done, as it may be seen by them as sending conflicting signals (and I agree that it sends conflicting signals, but for completely different reasons than Google provides… but that’s a discussion for another day).

One other point of interest (somewhat advanced – don’t worry if you don’t fully understand it) is the difference between the two solutions with respect to how link juice flows. With rel=canonical, the link juice of the duplicate URL is consolidated with the canonical URL only.

With a URL that has the noindex,follow tag, the link juice flows through the links on the page to all URLs that these links point to, so the link juice will usually be more widely distributed, with a number of pages each receiving a small amount of link juice (as opposed to it all passing to just one page as is the case with rel=canonical).

Implementation

Once again, some CMSs offer a code-free option for adding a noindex,follow tag. But if yours doesn’t, you can simply add the code above to the head of your page’s html.

4. Preferred Domain Setting in Google Webmaster Tools

In the “Site Settings” area of Google Webmaster Tools (click Settings icon at top right), you can choose whether you want your site displayed in search results with or without the ‘www’ in front of the domain e.g. domain.com or www.domain.com. While this setting (referred to as the “Preferred domain”) doesn’t necessarily eliminate this type of duplicate content, it can certainly help to minimize it.

A common mistake that site owners make is to verify only one version of their domain – either the www or the non-www version. To clarify your preferred domain to Google, it is recommended that you verify both versions of your domain – the www one and the non-www one.

Then, be sure to set your preferred domain in both versions of your site (with the same setting in each case, of course).

5. Parameter Handling Tool in Google Webmaster Tools

As the name suggests, this tool can be used specifically to help manage duplicate content issues that may result from URL parameters. It can be found in the “Crawl” section of Google Webmaster Tools in the left-hand menu.

This topic is more involved than the others, and rather than me getting into the nitty-gritty here, I suggest you have a look at this video from Google which walks you through how to use the tool much better than I could.

Note that the first five minutes provide some introductory info; the walkthrough on how to actually use the tool starts at 4:58.

6. Absolute URLs, Not Relative URLs

The issue of using relative URLs instead of absolute URLs was already discussed above in the context of how http and https duplicates can be created. But using relative URLs can also create duplicate content with regular http pages.

For example, if your site is accessible both with and without the www and you use relative URLs, then your browser will display URLs based on the way you first enter the site. So, if you enter domain.com, then as you navigate the site, all URLs will be non-www. Similarly, if you enter www.domain.com, then all URLs will contain www as you navigate.

Your visitors will have this same experience. As your page gets some awareness on the web via other sites’ posts, social media, and other online channels, other sites may link to your page – some with the www and some without – thereby splitting your page’s authority and creating a duplicate content issue with the search engines.

So why do some sites use relative URLs? Usually for one of three reasons:

  1. Naivety – they just don’t know any better.
  2. Their CMS uses relative URLs (which they shouldn’t, but some still do).
  3. They’re web designers. Web designers (many, but not all of course) are notorious for using relative URLs because it makes the process of moving a website to a new domain much easier for them, as they won’t have to edit any URL links throughout the site — but the client will end up paying the price in duplicate content.

    Ironically, the reality is that editing internal site links on a mass basis is not too difficult in most cases, as you can simply use a find & replace tool to do the job. Granted, on larger sites with lots of external links it will be more work, as you’ll then have to go back and manually edit the domain of each external URL link.

Regardless of the reason for using relative URLs, the solution to all these problems is simple: don’t use relative URLs. Use absolute URLs only. And if your CMS uses relative URLs, perhaps it’s time to switch to a new CMS that doesn’t.

7. Duplicate Title Tag Tool in Google Webmaster Tools

The Duplicate Title Tag tool in Webmaster Tools does exactly what you would expect: it displays pages that have duplicate title tags, saving you the time of having to find these duplicates manually. You can then focus your time on making any necessary edits so that each page has a unique title tag.

For a more thorough and feature-rich tool that will display (among many other things) all pages with duplicate title tags, check out Screaming Frog SEO Spider.

The free version limits you to crawling 500 URLs at a time and doesn’t have all the features and support that the paid version offers, but it may meet your needs just fine. Either way, give the free version a try first. User guide can be found here.

8. Consolidate or Reorganize Content

Sometimes with a little planning and creativity you can effectively consolidate or otherwise reorganize your content to avoid or minimize duplicate pages. But do what’s best for your audience first and foremost; duplicate content can be dealt with.

Selecting The Right Solution For Your Duplicate Content Problem

The table below provides a summary of each type of duplicate content issue and the most likely solution(s) for each. In some cases the most appropriate solution for a given scenario is subject to legitimate debate. In other cases, it’s not. 😉

Please also see the general notes below the table.

www and non-www duplicates
Most Likely Solution
301 Redirect + Preferred Domain Setting in Google Webmaster Tools + possibly Absolute URLs, Not Relative URLs
Notes
You should choose a preferred domain in Webmaster Tools, and you need to set a 301 redirect from your non-preferred domain to your preferred domain (most important). If you're using relative URLs, then you also need to change them to absolute URLs.

When deciding which version (www or non-www) will be your preferred domain, consider the following:

- Which version of your site has higher Domain Authority?
- Which version of your homepage has higher Page Authority?
- Which version of your site has more of its pages already indexed in Google?

Be sure to also edit any internal links that are currently pointing to the URL that is being redirected so that they instead point to the URL you’re redirecting to, and be consistent with all URLs going forward i.e. either all www or all non-www - whichever version you decide on.
domain.com and domain.com/some-file-or-folder duplicates
Most Likely Solution
301 Redirect + possibly Absolute URLs, Not Relative URLs
Notes
You need to do a 301 redirect from domain.com/something to domain.com (or from www.domain.com/something to www.domain.com, whichever is your preferred domain). If the duplicates are being caused by relative URLs, then you need to change them to absolute URLs as well.

Also edit any internal links accordingly.
domain-one.com and domain-two.com duplicates
Most Likely Solution
301 Redirect or NoIndex,Follow
Notes
If the secondary domain has any domain authority (i.e. greater than 1) and/or links pointing to it from other sites and/or is already indexed in Google, then 301 redirect the secondary domain to the primary domain.

Otherwise, either noindex,follow the secondary domain or just remove the duplicate content and 'park' the domain (though neither of these options should be implemented if you believe that a significant number of people may have bookmarked the secondary domain).
Content Archive duplicates
Most Likely Solution
301 Redirect or Rel=Canonical Tag
Notes
If you have reason for any of the duplicate archives to be made available to your visitors, then add the rel=canonical tag to those archives. Otherwise, 301 redirect the duplicate archives to the archive you want to display.
Uppercase / Lowercase URL duplicates
Most Likely Solution
301 Redirect
Notes
In most cases you want to do a 301 redirect from any uppercase and mixed case URLs to lowercase URLs.

However, if most of your URLs (particularly the most important / highest authority URLs) are already in uppercase or mixed case and have significantly higher page authority than the lowercase URLs - and more of these pages are already indexed - then you may be better off just leaving the uppercase/mixed case ones alone. Ask an SEO specialist for help if needed.

Either way, use only lowercase URLs going forward.
URL Parameter duplicates
Most Likely Solution
Parameter Handling Tool in Google Webmaster Tools
Notes
This is generally the best solution.
Printer Friendly duplicates
Most Likely Solution
Rel=Canonical Tag or NoIndex,Follow
Notes
In most cases, the best solution is to implement the rel=canonical tag on the printer friendly page that identifies the regular html page as the canonical. You should also be sure to have a link to the printer friendly page from the canonical page.

Alternatively, you could use a noindex,follow tag on the printer friendly page, but the canonical tag would generally be the preferred option.
Trailing Slash duplicates
Most Likely Solution
301 Redirect
Notes
Check which version of your pages (i.e. those with or without the trailing slash) - particularly the most important / highest authority URLs - has more pages indexed and higher page authority. You'll generally want to 301 redirect from the version with fewer URLs indexed and lower overall authority to the version with more URLs indexed and higher overall authority.

In some cases (though less common), depending on the CMS, it may even make sense to not implement a sitewide 301 rule (e.g. not 301ing all non-trailing slash URLs to trailing slash URLs or vice versa), but rather to redirect on a page-by-page basis based on index status and existing page authority, regardless of whether the page has a trailing slash or not. Ask an SEO specialist for help if needed.

Regardless, also edit any internal links accordingly.
http and https duplicates
Most Likely Solution
301 Redirect + Absolute URLs, Not Relative URLs
Notes
This scenario is almost always the result of the existence of a secure (https) page that contains a relative link to another page on your site. The solution to this problem can vary greatly based on a number of factors that are beyond the scope of this post. Speak with an SEO specialist.
Mostly Identical, But Slightly Varied Page duplicates
Most Likely Solution
Rel=Canonical Tag or NoIndex,Follow
Notes
First, see if you can effectively consolidate your content into one page or otherwise reorganize your content to avoid or minimize the duplicate pages - but do what's best for your audience first and foremost.

Assuming you still have multiple pages with slight variations (as described above in this post, for example), then in most cases the best solution is to implement the rel=canonical tag on the duplicates, identifying the overall most important page as the canonical. You should also be sure to have a link to each duplicate page from the canonical page.

Alternatively, you could use a noindex,follow tag on the duplicate pages, but the canonical tag would generally be the preferred option.
Title Tag duplicates
Most Likely Solution
Duplicate Title Tag Tool in Google Webmaster Tools (or Screaming Frog)
Notes
Check for duplicate title tags and edit as necessary so that each page has a unique title tag based on best practices.

Screaming Frog's list of duplicates may be more thorough than Google's.
Social Media duplicates
Most Likely Solution
Generally not a concern - no action necessary.
Notes
NA
Guest Posting and Syndicated Content duplicates
Most Likely Solution
Rel=Canonical or NoIndex,Follow or Direct Attribution Link
Notes
Solutions below are in order of best to worst, based on what you can get the publisher of your content to agree to (but even the worst solution is still well worthwhile):

1) Get the publisher of your content to add a rel=canonical tag to their page, pointing to the URL of your original page as the canonical.

2) Get the publisher of your content to add a noindex,follow tag to their page.

3) Get the publisher of your content to include on their page a direct link to your original page, citing you as the original author. Anyone republishing your content who has an ounce of decency and an acceptance of standard practices should be willing to do this at the very least.

A couple of general notes:

  • Every website should be verified with Google Webmaster Tools and have a preferred domain set.
  • If possible (and reasonably practical), every website should use absolute URLs only, not relative URLs.

Summing Up

This was another long post so nice going if you made it through (the next couple will be much shorter – promise!).

Hopefully you now have a good understanding of the potential impact duplicate content can have with respect to search rankings, as well as how to identify duplicate content problems and how to fix them.

In short, URLs containing duplicate content should be canonicalized; that is, the content should be made accessible from only one URL (at least as far as the search engines are concerned) in order to consolidate the URLs’ authority – and this should be accomplished via search-engine-friendly means.

Does your site have any duplicate content? Have a look, and use the table above as a guide. Remember, most duplicate content issues can be easily resolved, and doing so can often result in quick, meaningful and long lasting SEO gains.

So give it a go — it’s almost always a worthwhile effort.

Thoughts? Questions? Please share them in the comments below.


Michael Gordon

I’m an SEO consultant & trainer based in the Toronto area. I provide customized SEO services tailored to your business's goals, needs and budget.

SEO isn't just a full-time career for me, it's an obsession. The only thing I’m even more obsessive about is the level of service that I provide to my clients. Their happiness means everything to me.

Thankfully, my proven, no-nonsense approach to SEO gets results.

4 Comments

  1. Great job on this guide to duplicated content! After going through all of these tips I can only add that apart from the examples of duplicated content that you give, I also bump into pages that either automatically copy the content of different sites or other sites that don’t give credit to the copied content they use. Once I found out who copies my content (thanks to this), I managed to get rid of it and now my website is feeling better 🙂 Cheers

    • Thanks Andy. Great points regarding plagiarized content and content that does not give proper (or any) attribution to the original content creator.

      Regarding plagiarism, I had actually considered including a section on content scraping in this post but didn’t because it could be a whole post on its own (and my post was already so long!). However, your comment got me re-thinking this, so I’m either going to write a separate post on the topic, or (more likely) edit the current post with some brief info and perhaps a link to a good resource or two.

      Regarding your point on not giving credit to the content creator, it’s in some ways similar to point #13 of this post on syndicated content, but perhaps deserves its own section as there are other scenarios where attribution applies that are quite different from what I’ve described.

      Thanks again for bringing these points to my attention. I’ll try to get the post updated soon!

  2. I received an error: duplicate content between the “domain/news” page and the “domain/author/admin” page. What do I do to fix this?
    Thanks,

    • Sorry for the late reply van — Your comment was inadvertently marked as spam.

      Can you please provide more details so that I can try to assist you?

      Thanks.
