Dealing With Duplicate Content
Over the last 18 months we’ve forged a name for ourselves as one of the primary providers of bespoke infographics in the UK, but infographic design isn’t our only service. We also provide manual link-building and SEO consultancy to scores of other brands all over the world, providing a holistic, ethical approach to building businesses online.
In recent weeks we’ve seen enquiries for our consultancy service rise dramatically, as many businesses attempt to recover from the effects of Google’s recent updates. Considering the ‘shortcut’ tactics employed by many other SEO providers (don’t expect any examples here, we’re not the name-and-shame type), we weren’t necessarily surprised by the surge in ‘please help’ messages, but I have been slightly surprised by the number of people asking a variation of the following question: “How do we deal with duplicate content?”
Considering this is a reasonably basic element of technical, on-site SEO, it’s not something we’ve ever given much attention to in our SEO blog posts, but it’s obviously still a problem for many websites. So, to try and help rectify that situation, here’s a quick guide to dealing with duplicate content.
It’s worth noting here that we’re talking about duplicate content on your own site, rather than content that has been copied and pasted from another source, or scraped from your website and placed on another domain.
The Problem With Duplicate Content
If your content exists on more than one URL (for example on both the www. and non-www. versions), then you’re forcing search engines to choose which is the original (or best) source of that content. Search engines won’t rank more than one version of the same page, and it’s generally far better for you to dictate which version they index – something you can only do by controlling your duplicate content via one of a range of methods.
It can also have a negative effect on your SEO, as search engines won’t know which version of the page should receive the trust and authority gained from inbound link metrics – in some cases they may even split the metrics between the duplicate pages, which is far from ideal. This can be made worse if your website attracts genuine natural links, as you may find people link to a duplicate version of your content, rather than the page you’re trying to rank.
Reasons For Duplicate Content
There are a variety of reasons why duplicate content might exist on a website, from non-standardised URLs to unique session IDs. For example, you may have a non-standardised version of your website, resulting in the following:

http://www.website.com/page
http://website.com/page
Both of these URLs contain the same text and images, creating a duplicate content issue. This is the simplest (and most common) form of duplicate content we see, but you can also find that session IDs (generated and assigned to each user when they visit the site) are included in a URL, giving you numerous different versions of a single page. Other causes include erroneous capitalisation in the URL, printer-friendly versions of a webpage and additional URL parameters (such as click-tracking codes and the like).
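To illustrate how these variants multiply, here’s a rough Python sketch of folding common duplicate URLs back into a single standardised form. The parameter names (sessionid, utm_source and so on) are just examples – substitute whatever your own platform appends:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Example query parameters that create duplicate URLs without changing content
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def normalise(url):
    """Reduce common duplicate-URL variants to one standardised form."""
    scheme, netloc, path, query, _ = urlsplit(url)
    netloc = netloc.lower()
    if not netloc.startswith("www."):
        netloc = "www." + netloc  # standardise on the www. version
    # fold erroneous capitalisation and trailing-slash variants
    path = path.lower().rstrip("/") or "/"
    # drop session IDs and click-tracking parameters
    params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))
```

With rules like these, http://website.com/Page-A/?sessionid=abc123 and http://www.website.com/page-a both resolve to the same address – which is exactly the decision you want to make once, deliberately, rather than leaving it to the search engines.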
Dealing With Duplicate Content
So how can you deal with the problem? Luckily, it’s not too difficult to sort out, and there is more than one way of approaching it. All of the fixes involve canonicalising the URLs (if you’re not sure what that term means, there’s a great guide to canonicalisation on the SEOMoz website), which will prevent search engines from indexing the wrong page. Let’s have a look at the best approaches to dealing with duplicate content:
Preferred domain in GWT
The easiest way to deal with non-standardised versions of your homepage (www. vs non-www.) is to set a preferred domain in Google Webmaster Tools. This will tell Google which version of the domain you’d like to be indexed and displayed in SERPs, although it’s worth noting that this won’t affect your rankings in Bing, Yahoo or any other search engine. If you’re looking for a solution that covers all search engines, you can use one of the approaches below.
301 redirect
Generally speaking, this is my preferred solution for fixing duplicate content issues, as it’s nice and clean and easy to implement. Simply set up a permanent 301 redirect from the duplicate page(s) to the original, and you’ll not only solve your duplicate content issues but also create a stronger page, which will have a positive impact on your rankings and indexation.
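As an illustration, on an Apache server with mod_rewrite enabled, a non-www to www redirect can be set up in your .htaccess file along these lines (website.com is a placeholder – substitute your own domain):

```apache
# .htaccess: permanently redirect non-www requests to the www. version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
RewriteRule ^(.*)$ http://www.website.com/$1 [R=301,L]
```

The R=301 flag is what makes the redirect permanent – a temporary (302) redirect won’t pass the same signals to the destination page.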
Noindex meta tag
Another option is to add the ‘noindex’ meta robots tag to your duplicate pages, which tells search engines not to include those pages in their search results.
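For example, the following tag is placed in the HTML head of the duplicate page:

```html
<!-- In the <head> of the duplicate page -->
<meta name="robots" content="noindex, follow" />
```

Using ‘noindex, follow’ (rather than ‘noindex, nofollow’) keeps the page out of the index while still allowing search engines to follow the links on it.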
Rel=canonical tag
This is a very similar option to the 301 redirect, but works in a slightly different way. By adding the rel="canonical" tag to the HTML head of your duplicate pages, you can tell search engines that the page is a duplicate of another, and that all content metrics, links and indexation should be attributed to the specified URL.
For example, if you have http://www.website.com/pagea and http://www.website.com/page-a (both containing the same content), but you only want /pagea to be ranked, you can add the following code to the HTML head of /page-a:
<link rel="canonical" href="http://www.website.com/pagea" />
That’s about it for our brief guide to dealing with duplicate content – if you have any questions or there’s anything you don’t understand, just let us know in the comments section and a member of our SEO team will be more than happy to help you out.