How do you check for duplicate content?

Posted by admin on September 14th, 2011

One of the first rules you learn when you start doing Search Engine Optimisation (SEO) is that search engines hate duplicate content. Firstly, duplicate content doesn’t add value for web users: several websites showing the same content is irritating for anyone trying to do research. Secondly, search engines have to decide which website posted the content first, which has the most authority and so on, and then decide which version should rank above the others.

This is why Google introduced its Panda update: to discourage people from posting copied or only slightly altered (spun) content across the web that added no value.

The Panda update mostly affected article websites, but if you post a lot of content on your website, you may want to check that you aren’t harbouring duplicate content. One way to check is to compare the number of your website’s pages that Google has indexed with the number of pages you find when you crawl the site yourself.
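If you can get both lists of URLs (for example, pages copied by hand from the site: results and the page list exported from your crawler), comparing the two sets rather than just the totals also shows you which pages are missing. Here is a minimal sketch in Python; the function name `index_coverage` and the example URLs are hypothetical, and bear in mind that Google’s site: figures are only estimates.

```python
def index_coverage(crawled_urls, indexed_urls):
    """Compare the pages a crawler found with the pages Google appears to have indexed.

    Returns the share of crawled pages that also appear in the indexed list,
    plus a sorted list of the crawled pages that are missing from it.
    """
    crawled = set(crawled_urls)
    indexed = set(indexed_urls)
    if not crawled:
        return 0.0, []
    missing = sorted(crawled - indexed)
    coverage = len(crawled & indexed) / len(crawled)
    return coverage, missing


# Example URLs only; substitute your own crawl export and site: results.
crawled = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
]
indexed = ["https://www.example.com/", "https://www.example.com/about"]

coverage, missing = index_coverage(crawled, indexed)
print(f"{coverage:.0%} of crawled pages indexed")  # 50% of crawled pages indexed
print(missing)
```

The pages in `missing` are the ones worth checking first for duplicate or thin content.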

To do this, you can use Google’s “site:domain.com” search operator to see roughly how many pages Google has indexed. Xenu for Windows (http://home.snafu.de/tilman/xenulink.html) and Integrity for Mac (http://peacockmedia.co.uk/integrity/) are free programs that crawl an entire site and essentially create a sitemap, telling you how many pages the site contains and how they are linked together.
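To give a feel for what tools like Xenu and Integrity are doing under the hood, here is a rough sketch of a crawler using only Python’s standard library. It starts at one page, follows every link that stays on the same host and returns the set of URLs it found. The names `crawl`, `fetch_html` and `max_pages` are hypothetical; the sketch does not read robots.txt, add delays between requests or report broken links, so dedicated tools remain the better choice for a real audit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Downloads a page and returns its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl that stays on the start URL's host.

    Returns the set of page URLs discovered (fragments such as #comments removed).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network failures and HTTP errors (404s, 500s) are skipped
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop any #fragment before comparing.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen and len(seen) < max_pages:
                seen.add(link)
                queue.append(link)
    return seen


# pages = crawl("https://www.example.com/")
# print(len(pages), "pages found")
```

Passing the page fetcher in as a parameter keeps the crawling logic separate from the network, so you can try it against a handful of saved pages before pointing it at a live site.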

If Google has indexed far fewer pages than your crawl found, it is likely that there is a duplicate content issue. This may become obvious when you look through the list of crawled pages generated by Xenu or Integrity; otherwise, you can use a service such as Copyscape (http://copyscape.com/) to check whether a page of your website has been copied elsewhere on the web.
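For comparing two pages you already suspect are too alike, one common technique is “shingling”: break each text into overlapping runs of words and measure how many runs the two pages share (the Jaccard similarity). This is a swapped-in illustration, not how Copyscape or Google actually work, and it assumes you have already pulled out the main body text of each page. The function names and the five-word shingle size are hypothetical choices.

```python
import re


def shingles(text, size=5):
    """Split text into overlapping word sequences ("shingles") of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity of two texts' shingles: 0.0 = nothing shared, 1.0 = identical."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0  # treat empty pages as not comparable
    return len(a & b) / len(a | b)


page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox jumps over the sleepy cat"
print(round(similarity(page_a, page_b), 2))
```

Pages that score close to 1.0 are near-copies; where you draw the line for “too similar” is a judgement call rather than a published threshold.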

If a lot of your pages are missing, what happens next?

If URL parameters or session IDs (which can create several different URLs for the same page) are not to blame for the pages Google hasn’t included in its index, then the issue may lie with copied or shallow content.
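A quick way to see whether parameters are inflating your page count is to strip out the ones that don’t change the content and group the crawled URLs that collapse onto the same address. The sketch below assumes a hypothetical list of session and tracking parameter names in `IGNORED_PARAMS`; swap in whatever your own site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: query parameters that change the URL but not the content.
# Replace with the session or tracking parameters your own site uses.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "utm_source", "utm_medium", "utm_campaign"}


def normalise_url(url):
    """Drop ignored query parameters, sort the rest, lower-case the host, drop fragments."""
    parts = urlsplit(url)
    query = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/",
                       urlencode(sorted(query)), ""))


def group_duplicate_urls(urls):
    """Map each normalised URL to the crawled URLs that collapse onto it (groups of 2+ only)."""
    groups = {}
    for url in urls:
        groups.setdefault(normalise_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "https://www.example.com/shop?cat=boxes&sid=123",
    "https://WWW.example.com/shop?sid=456&cat=boxes",
    "https://www.example.com/about",
]
for page, variants in group_duplicate_urls(crawled).items():
    print(page, "<-", variants)
```

Every group this prints is one page that search engines may be seeing under several URLs.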

Technical issues can also hold back the number of pages Google indexes, for example a robots.txt file that blocks important sections of the site, misused iframes or incorrect canonical tags. If this is the case, your SEO company needs to liaise with your web developer to rectify the problem as soon as possible, as the negative impact on SEO can be catastrophic. Check out our blog post on canonical tags to find out more.
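Two of these technical culprits are easy to spot-check with Python’s standard library: whether your robots.txt rules block pages you want indexed, and which canonical URL each page declares. In this sketch `blocked_urls` and `canonical_of` are hypothetical helper names, and note that `urllib.robotparser` may not interpret every rule exactly as Googlebot does (for instance, wildcard patterns within paths), so treat its answers as a first pass.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_of(html):
    """Return the canonical URL declared in the HTML, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


robots = "User-agent: *\nDisallow: /private/\n"
print(blocked_urls(robots, ["https://www.example.com/private/page",
                            "https://www.example.com/public"]))
print(canonical_of('<head><link rel="canonical" href="https://www.example.com/"></head>'))
```

Running `canonical_of` over every page your crawler finds and comparing the result with the page’s own URL is a simple way to catch pages that point their canonical tag somewhere unexpected.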

Hopefully, once every page on your site is unique enough to make it into Google’s index, you can concentrate on the other aspects of SEO that will help you reach your goals.

To find out how SEO can help your business, contact David Wiltshire on 0845 544 1765.

Posted on 11:58 PM by Rome