How To Deal With Duplicate Content

After sifting through the mountain of information on this topic, we have put together a post that is easy to digest.

The basic definition of duplicated content is content that appears on more than one URL. There are two main issues: duplicated content on different pages within the same website, and content stolen from one website and republished on another (plagiarism).

The problem with duplicated content is that it confuses the hell out of Google's crawlers. When the Google spider sees identical content in different places, it has trouble deciding which version is the most relevant to a search.

The places where duplicated content is most commonly found are listed below.

Ecommerce. It is common for manufacturers, producers and publishers to use the same product descriptions on different websites.
Print-friendly versions of pages.
Country-specific subdomains and domains. Some companies reuse the content from the master domain on each subdomain and country-code top-level domain.
Session IDs. When session IDs create multiple URLs for the same page, search engines may crawl and index each of them.
Articles shared by authors for publication elsewhere. The authors think that a link back to the original source is good for them, but it can often have negative effects.
Mirror sites.
Syndicated RSS feeds, which can be duplicated through a server-side script.
The same page available at different URLs; choosing the preferred URL is called ‘canonicalization’.
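Several of the cases above, such as session IDs and print-friendly switches, come down to one page being reachable at many URLs. Below is a minimal Python sketch of URL normalization that an audit script might use to collapse such variants before comparing pages. The parameter names in `IGNORED_PARAMS` are hypothetical examples, and treating `www.` and the bare domain as one host (and dropping trailing slashes) are assumptions for illustration; this is not how Google itself canonicalizes URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that carry session IDs or print-friendly
# switches on this example site; adjust the list for your own CMS.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "print"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants of the same page into one form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Assumption: "www.example.com" and "example.com" serve the same pages.
    if host.startswith("www."):
        host = host[4:]
    # Drop session-ID / print parameters and sort the rest for a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Assumption: "/shoes/" and "/shoes" are the same page (root path kept).
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    # Fragments (#...) are never sent to the server, so they are dropped.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


if __name__ == "__main__":
    variants = [
        "http://WWW.Example.com/shoes/?PHPSESSID=abc123&color=red",
        "http://example.com/shoes?color=red&print=1",
    ]
    # Both variants collapse to a single normalized URL.
    print({normalize_url(u) for u in variants})
```

Grouping crawled URLs by their normalized form is a quick way to see how many addresses point at the same content before deciding which one should be the preferred URL.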

An issue that is largely overlooked is duplicated content on different pages within the same website. As a website grows and becomes more complex, content can end up duplicated without the webmaster being aware of it. Two examples that come to mind are print-friendly copies of pages and pagination of content, a common issue with blogs. This can be problematic for your website in two ways: firstly, it makes it more difficult for search engines to crawl and index the content; secondly, it can lower your PageRank, because the PageRank gained from incoming links is split across pages that are not recognised as duplicates.
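Because such internal duplicates often go unnoticed, a small audit script can flag pages whose text is nearly identical. Below is a minimal Python sketch using a common near-duplicate technique, word shingles compared with Jaccard similarity. It is a stand-in chosen for illustration, not the method Google uses, and the URLs, sample text, shingle size `k=5` and threshold are all assumptions to tune for your own site.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two pages have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(pages: dict[str, str], threshold: float = 0.8, k: int = 5):
    """Yield (url_a, url_b, score) for every page pair at or above threshold."""
    signatures = {url: shingles(text, k) for url, text in pages.items()}
    for (u1, s1), (u2, s2) in combinations(signatures.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            yield u1, u2, round(score, 2)


if __name__ == "__main__":
    article = ("Duplicate content is content that appears on more than "
               "one URL on the web")
    pages = {  # hypothetical pages from one site
        "/post": article,
        "/post/print": "Print this page. " + article,  # print-friendly copy
        "/about": "We are a small SEO agency writing about search engines and content.",
    }
    for a, b, score in find_duplicates(pages, threshold=0.7):
        print(f"{a} and {b} look like duplicates (similarity {score})")
```

Comparing every pair is fine for a small blog; for thousands of pages you would want a hashing scheme such as MinHash to avoid the quadratic number of comparisons.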

How to deal with duplicate content within your own website

Have you checked whether your site has duplicated content? Run a site: query in Google: search for a segment of text taken from your site, limiting the results to your own website (for example, site:example.com "a sentence from your page"). If you get multiple results, then you have duplication.
Once you have found any duplicated content, you need to decide which URL is the most suitable home for it. Once you have decided on the preferred URL(s), use them consistently everywhere on your site, and don’t forget your Sitemap file.
You can also apply a 301 (permanent) redirect to the URLs with duplicated content; this sends users and search engines to the preferred URL.
If you cannot use 301 redirects, use the rel="canonical" link element instead.
Have you thought about disallowing crawling with robots.txt or using a meta “noindex” tag?
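The steps above can be sketched as a tiny Python WSGI app. Everything site-specific here is hypothetical: `SITE`, `CANONICAL_PATHS`, the `REDIRECTS` map, the `/print` suffix for print-friendly copies and the `PRINT_STRATEGY` switch. It shows a 301 redirect from duplicate paths, a Sitemap that lists only the preferred URLs, and a print-friendly copy that carries either a rel="canonical" link back to the main page or a robots "noindex" meta tag. Most real sites would configure this in their web server or CMS rather than in hand-written code.

```python
from xml.sax.saxutils import escape

# Hypothetical site configuration (assumptions for this sketch).
SITE = "https://example.com"
CANONICAL_PATHS = ["/shoes", "/about"]     # preferred URLs
REDIRECTS = {                              # duplicate path -> preferred path
    "/shoes/": "/shoes",
    "/index.php/shoes": "/shoes",
}
PRINT_SUFFIX = "/print"                    # e.g. /shoes/print
PRINT_STRATEGY = "canonical"               # or "noindex"


def app(environ, start_response):
    """Minimal WSGI app handling duplicate URLs in three ways."""
    path = environ.get("PATH_INFO", "/")

    # 1. Permanent (301) redirect from a duplicate URL to the preferred one.
    if path in REDIRECTS:
        start_response("301 Moved Permanently",
                       [("Location", SITE + REDIRECTS[path])])
        return [b""]

    # 2. Sitemap that lists only the preferred URLs.
    if path == "/sitemap.xml":
        entries = "".join(f"<url><loc>{escape(SITE + p)}</loc></url>"
                          for p in CANONICAL_PATHS)
        xml = ('<?xml version="1.0" encoding="UTF-8"?>'
               '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               f"{entries}</urlset>")
        start_response("200 OK", [("Content-Type", "application/xml")])
        return [xml.encode("utf-8")]

    # 3. A normal page, or a print-friendly copy of one.
    is_print = path.endswith(PRINT_SUFFIX)
    base = path[: -len(PRINT_SUFFIX)] if is_print else path
    if base not in CANONICAL_PATHS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    if is_print and PRINT_STRATEGY == "noindex":
        # Ask search engines to keep the print copy out of the index.
        head = '<meta name="robots" content="noindex">'
    else:
        # Point search engines at the preferred URL (self-referencing on
        # the main page itself).
        head = f'<link rel="canonical" href="{SITE + base}">'
    html = f"<html><head>{head}</head><body>Content for {base}</body></html>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [html.encode("utf-8")]


if __name__ == "__main__":
    # Call the app directly to see the redirect for a duplicate URL.
    def show(status, headers):
        print(status, headers)

    app({"PATH_INFO": "/shoes/"}, show)
```

The sketch deliberately gives each print copy one signal rather than both, since a canonical link and noindex point search engines in different directions. To serve it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result.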

There is growing concern in the online community about the amount of plagiarism taking place. Plagiarism is basically stealing and is defined as such in the dictionary. According to all of the following are considered plagiarism.

saying that someone else’s work is your own
copying words or ideas without giving credit
failing to put quotation marks around a quote
failing to give the correct information for the source of a quote
copying sentence structure
imitating the overall feel of someone else’s work

To make sure you are not committing plagiarism or that somebody is not plagiarising you, use the following sites.

If somebody is copying you, then it is unlikely that big G will index them first. Google has sophisticated means of checking trust, authority, inbound links and contextual links to work out who created the original content.

If you think somebody is plagiarising your work then there are certain actions you can take. The most common is to file a DMCA takedown request.

This wouldn’t be an SEO blog if we didn’t repeat our ‘content is king’ mantra. All sites should already be creating new, original and interesting content. Copywriting is not about reworking existing content; it’s about creating new content with a dash of keyword or key phrase seasoning.

Plagiarism is causing big problems online and is stifling creativity: people are less inclined to work their hearts out if they feel their content can easily be stolen. Smaller sites feel they are really losing out; even when they file DMCA takedown requests, the copied content is moved somewhere it cannot be removed.