How To Find Duplicate Content On Your Site (Choice of 2 Tools)

Google doesn’t want to populate its SERPs with the same content. It wants to show searchers 10 unique results per page. Otherwise, there wouldn’t really be a “choice” of results to click on 🙂

Every SEO knows that eliminating duplicate content on your site is critical to maximizing your rankings in Google. The quality score of your pages will increase significantly (as will the overall “health” of your site) when you start fixing these issues.
Whether it’s fixing duplicate title tags, meta tags, chunks of content or even an entire page, it’s gotta be done, folks. Track your rankings and results when you fix these problems and you’ll see a big impact.

The most accurate way of doing this is to follow our “snippet search” method in our Avoid Panda PDF – but here are two tools you can use to make your life a little easier, even if the results aren’t quite as accurate.
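
(Want a feel for the snippet-search idea right now? A quoted, exact-match Google query is all it takes. Here’s a tiny Python sketch that simply builds the query URL for you to open in a browser; the snippet text below is made up, so paste in a real sentence from one of your own pages.)

```python
# The "snippet search" idea in miniature: wrap an exact sentence from your
# page in quotes and search Google for it. If the top result isn't your
# page, some other page (yours or someone else's) carries the same text.
from urllib.parse import quote_plus

def snippet_search_url(snippet: str) -> str:
    # Quoting the snippet forces an exact-match search.
    return "https://www.google.com/search?q=" + quote_plus(f'"{snippet}"')

# Hypothetical snippet; use a real sentence from the page you're checking.
print(snippet_search_url("our licensed plumbers are available 24 hours a day"))
```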

(Oh, a BIG thanks to Katie – who is responsible for ALL on-site audits at PosiRank – for writing up the guide further below on URLProfiler.com. Thank you!)

Method #1 – SiteLiner.com 

Pros: Nice interface; you only need to type in the root domain for a full scan; includes information on broken links, which is very useful.

Cons: It’s harder to see all the data / duplicate content areas; you need to go through page by page.


Step 1: Go to SiteLiner.com and type in the domain that you want to analyze.
(I just picked on a random plumbing company)

[Screenshot 1: SiteLiner.com]

SiteLiner.com is busy scanning rotorooter.com and is starting to populate the data…

I just used a free account to show you its limitations (I recommend upgrading to a paid account; it uses a credit system).

Below you can see they have a significant amount of duplicate content. Yikes.


Step 2: Time to dig deeper. Click the “Duplicate Content” link shown below:

[Screenshot 2: SiteLiner.com]

This is the results page from using a free account, which limits you to 250 pages of analysis. 


Step 3: Start clicking on each page to see even more detail (where SiteLiner shows you exactly where the duplications are).

As you can see, there’s a nice breakdown of each page that has some duplication issues.

[Screenshot 3: SiteLiner.com]

SiteLiner.com does a pretty nice job of highlighting the duplicate content.

[Screenshot 4: SiteLiner.com]

Once you find content that is duplicated, you need to take action and clean up the mess.

The most likely course of action is to remove the duplicate content and replace it with high-quality, extensive unique content to boost the quality score of that page. If you just don’t want the page at all, de-index it!
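
If you go the de-index route, the usual mechanism is a robots “noindex” directive, delivered either as a meta tag or an X-Robots-Tag HTTP header. Here’s a minimal Python sketch to confirm a page actually carries the directive; it assumes the requests and beautifulsoup4 packages, and the URL is hypothetical.

```python
# Minimal sketch: does this page tell Google not to index it?
import requests
from bs4 import BeautifulSoup

def is_noindexed(url: str) -> bool:
    resp = requests.get(url, timeout=10)
    # The directive can arrive as an HTTP header...
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return True
    # ...or as a <meta name="robots" content="noindex"> tag in the HTML.
    soup = BeautifulSoup(resp.text, "html.parser")
    tag = soup.find("meta", attrs={"name": "robots"})
    return bool(tag and "noindex" in tag.get("content", "").lower())

# Hypothetical URL; point this at the page you just de-indexed.
print(is_noindexed("https://example.com/thin-duplicate-page/"))
```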

Method #2 – URLProfiler.com

Pros: This is a more accurate method, as it excludes “common content”, and you can see all the data in a spreadsheet instead of clicking from window to window. Essentially, the data is easier to see and manipulate.

Cons: It’s a more time-consuming process.

We’ve recently started using this tool in our site audits and find it does a great job identifying both internal and external duplicate content issues. If you like to work with spreadsheets, you may favor it over our first recommendation.

(You’ll have to work a bit to get the information you’re looking for, but the end results are excellent.)

Before starting:

  • You’ll need a current list of proxies to load into the software. The duplicate content feature of URL Profiler works by searching Google for exact-match snippets of content from the pages you’re testing. If you’re only testing a few URLs, you can probably get by without proxies, but for testing URLs in bulk it’s the only way to go – unless you enjoy manually entering captchas into Google for the rest of the day!
  • Also, make sure you have the list of URLs you want to analyze ready to paste into the tool. You can use Screaming Frog to get that list for any domain, or if you don’t have that tool, use this free one, which crawls your site and returns a list of URLs you can download into Excel (100 URLs without registering, and up to 1,000 if you do register). If you’d rather script it yourself, see the crawler sketch just below.
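
Here’s that bare-bones crawler sketch. It’s deliberately minimal (Screaming Frog will do this far more robustly): it assumes the requests and beautifulsoup4 packages, uses a hypothetical domain, and caps itself at 100 pages.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

start = "https://example.com/"           # hypothetical domain
seen, queue = set(), [start]

while queue and len(seen) < 100:         # cap the crawl for this sketch
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue
    # Collect same-domain links, resolving relative hrefs and dropping anchors.
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        link = urljoin(url, a["href"]).split("#")[0]
        if urlparse(link).netloc == urlparse(start).netloc:
            queue.append(link)

print("\n".join(sorted(seen)))           # paste this list into URL Profiler
```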

Now that we’ve got that out of the way, here’s our process!


Step 1: Paste the URLs you want to analyze (for duplicate content) into URL Profiler by right-clicking in the URL List area and selecting “Paste from clipboard.”

[Screenshot 1: URL Profiler]


Step 2: Under “Content Analysis” select the “Duplicate Content” checkbox.

[Screenshot 2: URL Profiler]

Increase the accuracy of your results by identifying your “CSS selector”. This isolates the HTML element containing the main content of the page (leaving out content in the sidebar, header, footer, etc.).
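
To see what that isolation buys you, here’s a minimal Python sketch that pulls only the main content block of a page. The “.entry-content” selector and the URL are hypothetical; use whatever the Inspect pane (next step) shows for your own theme.

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/some-post/"    # hypothetical URL
html = requests.get(url, timeout=10).text

# select_one() takes the same CSS selector you'd copy out of the Inspect pane.
main = BeautifulSoup(html, "html.parser").select_one(".entry-content")

if main:
    # Only the main content survives; header, sidebar and footer are gone.
    print(main.get_text(separator=" ", strip=True)[:300])
else:
    print("Selector not found; double-check it in the Inspect pane.")
```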

Step 3: To find the CSS selector, right-click on your web page and select “Inspect” (see the example below):

[Screenshot 3: URL Profiler]

Step 4: Next, highlight the content on the page; the corresponding element (and its CSS selector) will be highlighted in the Inspect pane.

[Screenshot 4: URL Profiler]


Step 5: Copy this selector and paste it into the content area in URL Profiler:

[Screenshot 5: URL Profiler]

Step 6: Click “Apply” and then click “Run Profiler.”

Step 7: Save the file, and once the results are ready, simply click “Open.”

Step 8: Once you have your spreadsheet open, expand columns S, T and V to see your results.

  • Column S will contain the first snippet of text that URL Profiler scraped from your site.
  • Column T will contain the second snippet of text the software scraped.
  • Column V contains the URL that Google returned as the first result when searching for “snippet 1” + “snippet 2”. If this URL is the same as your original URL (in Column A), there are no duplicate content issues!

However, if there is a different URL in Column V, you have a duplicate content issue.
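
If your export has hundreds of rows, a few lines of Python can do the Column A vs. Column V comparison for you. A hedged sketch: the filename is made up, the column positions follow the article (A = the URL you tested, V = Google’s first result), and your export layout may differ, so adjust the indexes accordingly.

```python
import csv

with open("url-profiler-results.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

A, V = 0, 21    # column A = index 0, column V = index 21 (zero-based)
for row in rows[1:]:                      # skip the header row
    # Exact string comparison; you may want to normalize trailing slashes.
    if len(row) > V and row[A] and row[V] and row[A] != row[V]:
        print(f"Check this one: {row[A]} -> Google returned {row[V]}")
```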

[Screenshot 6: URL Profiler]

Once you’ve found duplicate content issues, how do you fix them? The same way as with SiteLiner: rewrite the duplicated sections into unique, high-quality content, or de-index (or remove) the pages you don’t need at all.

Conclusion – Which Tool is Best To Use?

I personally recommend using SiteLiner.com if you want to work on fixing a smaller number of pages with duplicate content issues (e.g. 20-50 pages or fewer). You can go through each one, and it’ll effortlessly show you the duplications, which you can then go about fixing.

Any more than 20-50 pages and I’d personally go a little crazy flicking back and forth between the windows; having it all in a spreadsheet might be easier for some (i.e. use URLProfiler.com).

And so, for larger sites and a more accurate “deep dive”, try URLProfiler.com. Ultimately, it’s up to you, and I encourage you to test both!

To wrap up: the whole focus of this article was to help you fix your duplicate content issues – and to give you a range of options.

If this is all too much for you, we offer a full-spectrum onsite audit (wherein we do all of this, and more), either for your own sites or your clients’. You’ll find it by logging into PosiRank and going to Order Services > Onsite SEO > Full-Scale, Comprehensive Site Audits.

But nonetheless, now you know how the pros do it 🙂

Talk soon,

Alex Miller
PosiRank.com

Comments

  1. Simon - July 31, 2016 @ 6:56 am

    Nice writeup.

    I like Siteliner for being able to explain to people what duplicate content actually looks like on their site.

    You can also use Xenu Link Sleuth to crawl a domain for URLs. It will also find broken links and redirected URLs, and generate sitemaps. Great companion to using Screaming Frog.

    You can get around the limit on the free version of Screaming Frog by using Xenu first to get your URLs and then pasting them all into Screaming Frog 🙂

    Simon.

    • Alex Miller - August 2, 2016 @ 3:27 pm

      Great tips, thanks Simon!

  2. frank - August 9, 2016 @ 6:57 pm

    Hi Alex

    Great information. We distribute other companies’ products, and when I use Siteliner it shows all the product names as duplicate content. How do you solve that kind of issue? Should I ignore it?

    • Alex Miller - August 9, 2016 @ 7:49 pm

      Hi Frank, I’m not sure I’m totally understanding. Can you share a link? If not, please email me at alexmiller@posirank.com and I’ll do my best to help you.

  3. Richard - August 10, 2016 @ 4:02 pm

    I attempted to use Siteliner but 99% of the duplicate content it is picking up is from the content being shown on the blog article and then again in the category archive and tag archive. Is that actually bad? Should the category and tag archive not be indexed?

    • Alex Miller - August 10, 2016 @ 4:54 pm

      Great question, Richard. I would not index your tag pages (they offer zero value to Google), and for your category pages I would recommend adding unique content to improve their strength. Category pages can rank well (and be useful!) if they are customized and improved in terms of quality. Hope this helps!

  4. justin brenner - August 30, 2016 @ 3:31 pm

    Why is your blog on a subdomain? I have read that this is not best-practice SEO. You would typically want it more like posirank.com/blog. Could you elaborate on this?

    • Alex Miller - September 28, 2016 @ 9:27 am

      @Justin – both ways are fine. We chose to place the blog as it is so that the authority from the site flows directly to the blog posts (which will help them rank). If you place the blog on a subdomain, there won’t be as much link equity flowing to the posts. The benefit of a subdomain, though, is that it’s its own entity, which can protect your main domain should you run into link issues or quality issues (comments, etc.) on your blog.
