Duplicate content
Duplicate content is loosely defined as being several copies of the same content in different parts of the web. It causes a problem when search engines spider all the copies, but only want to include a single copy in the search engine index to maintain relevancy. Generally, all other copies of this ‘duplicate’ content are ignored in the search results, bunged into the supplemental index, or perhaps not even indexed at all.
Doing the right thing
Fundamentally, search engines want to do the right thing and let the original author rank in the search engines for the content they wrote. We have just said that only one copy of the content can rank in the search engines, and search engines probably want this to be the original.
When you search for something, and some Wikipedia content appears in the search results, you want to see the copy from wikipedia.org, not one of the thousands of copies of the content on various rubbish scraper sites.
I believe that search engines want to give the search result / traffic to the original author of the content.
Determining the original author of the content
Consider the following 4 copies of the same article on different domains. How is Google to know which is the original copy?
[Image: Duplicate content example]
The above example shows 4 copies of the same content. The date indicates when the content was first indexed, and the PageRank bar indicates, um, PageRank. Let's assume for this simplified example that PageRank is an accurate measure of link strength / domain authority / trust, etc. The smaller pages pointing to each larger page represent incoming links from other websites.
* Document 1 was first indexed a couple of weeks after the other copies, so as a search engine you might decide that this is not the original because it wasn’t published first.
* Documents 2 and 3 have the same good PR, were first indexed about the same time, and have the same number of incoming links.
* Document 4 was indexed slightly after documents 2 and 3, and it also has less PR and fewer links, so as a search engine you might conclude this is not the best copy to list either.
As a search engine, we are stuck deciding between document 2 and document 3 as to which is the original / best copy to list. At this point, Google is likely to take its best guess and leave it at that, which will see the original author "penalised" on many occasions.
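To make that reasoning concrete, here is a rough sketch of the tie-break in code. The documents, dates, PageRank values and link counts are all invented for illustration, and the scoring is a deliberate simplification of the signals described above, not a claim about how Google actually does it.

```python
from datetime import date

# Hypothetical duplicate copies of one article, with the signals discussed
# above: first-indexed date, PageRank, and number of incoming links.
copies = [
    {"doc": 1, "first_indexed": date(2007, 3, 20), "pagerank": 4, "inbound_links": 3},
    {"doc": 2, "first_indexed": date(2007, 3, 5),  "pagerank": 6, "inbound_links": 5},
    {"doc": 3, "first_indexed": date(2007, 3, 6),  "pagerank": 6, "inbound_links": 5},
    {"doc": 4, "first_indexed": date(2007, 3, 8),  "pagerank": 2, "inbound_links": 1},
]

def likely_originals(copies):
    """Drop copies indexed well after the earliest one, then keep whichever
    of the rest score highest on PageRank and incoming links. Anything
    still tied after that is returned as-is."""
    earliest = min(c["first_indexed"] for c in copies)
    recent_enough = [c for c in copies
                     if (c["first_indexed"] - earliest).days <= 7]
    best = max((c["pagerank"], c["inbound_links"]) for c in recent_enough)
    return [c["doc"] for c in recent_enough
            if (c["pagerank"], c["inbound_links"]) == best]

print(likely_originals(copies))  # [2, 3]: documents 2 and 3 are still tied
```

Run against the example, the first-indexed date rules out document 1, PR and link count rule out document 4, and documents 2 and 3 remain deadlocked.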
Enter the cheesy scraper sites
Let's recycle that same example, but this time we are going to add an "author credit" link to the bottom of document 4, pointing back to document 2, the page the content was scraped from. Document 4 could be considered a cheesy, low-PR, low-value scraper site, but one that was kind enough to provide a link back to the original document.
[Image: Duplicate content example 2]
All of a sudden, there is a crystal-clear signal to the search engines that document 2 is the original.
When there is a collection of identical pages out there on the web and it's hard to decide who the author is, it's likely that search engines look at how those copies link to each other and use that data to determine the original.
All other things being equal, this seems like a logical assumption to make.
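Continuing the same made-up example, here is a sketch of that tie-break. The only extra input is which copies link to which other copies, and the single author-credit link from document 4 is enough to settle it. Again, this is an illustration of the idea, not a description of what any search engine actually runs.

```python
# Hypothetical links *between* the duplicate copies themselves.
# Document 4 (the scraper) carries an author-credit link to document 2;
# none of the other copies link to one another.
cross_links = {4: [2]}

def break_tie(tied_docs, cross_links):
    """Among the tied copies, prefer the one that other copies of the same
    content point at; those links act as votes for the likely source."""
    votes = {doc: 0 for doc in tied_docs}
    for source, targets in cross_links.items():
        for target in targets:
            if target in votes and source != target:
                votes[target] += 1
    return max(votes, key=votes.get)

print(break_tie([2, 3], cross_links))  # 2, the copy the scraper credits
```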
Duplicating your own content
So, if you know your content is being duplicated on scraper sites, I'm saying you can avoid being penalised by making sure some of the scrapers provide a link back to your original document.
If none of the scrapers are polite enough to do this, then I’m suggesting you should create your own scraper site, scrape your own content, and provide a link back to yourself.
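If you did go down that road, the syndicated copy really only needs one extra element: a link back to your canonical URL. A minimal sketch of what generating such a copy might look like, with the URL, post and helper name invented purely for illustration:

```python
# Hypothetical helper that wraps a syndicated copy of a post with an
# author-credit link pointing back at the canonical URL.
def syndicated_copy(title, body_html, original_url):
    credit = ('<p>Originally published at '
              f'<a href="{original_url}">{original_url}</a></p>')
    return f"<h1>{title}</h1>\n{body_html}\n{credit}"

# Made-up post and URL, purely for illustration.
print(syndicated_copy(
    "Duplicate content",
    "<p>Duplicate content is loosely defined as ...</p>",
    "https://example.com/duplicate-content",
))
```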
As RSS feeds become more popular, and content is recycled all over the web, this problem is only going to get worse.
Disclaimer: I have yet to back this up with any real testing, so don't blame me if you duplicate your own website and find yourself having duplicate content problems. I wouldn't even consider this tactic unless you were having problems with high-PR sites scraping your RSS feed.