I heard the duplicate content penalty applies only to duplicate content within your own site, not to scraped content taken from the web and put on your site.
Is this true or does it apply to both?
Whether Google penalizes this or not, it is still duplicate content. Why would you scrape content from the web anyway? Is this something like a Twitter feed, or are you plagiarizing content?
There are two things you might be getting mixed up here.
What people typically refer to as the "duplicate content penalty" isn't an actual penalty at all; it's just down to badly targeted Google juice. Let's say that you have a page that can be accessed at example.com, example.com/index.htm, www.example.com or www.example.com/index.htm. If you don't have a canonical tag set and you don't redirect or rewrite those URLs to a single address, there's a danger that Googlebot treats them as four separate pages. It might then be the case that a quarter of your incoming links point to each one, in which case you're dividing your link juice four ways. The result is that none of the four URLs gets the full benefit of all the available link juice, so none is likely to rank as well as it would if you were channelling all the link juice into a single URL.
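To make the four-URLs example concrete, here is a rough Python sketch of collapsing those variants into one canonical address and printing the matching canonical tag. The preferred host (www.example.com), the https scheme, the list of default documents and the function names are all my own assumptions for illustration, not something your server or Google prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: which scheme/host you prefer and
# which file names your server treats as the default document.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = ("index.htm", "index.html")


def canonical_url(url: str) -> str:
    """Map the known variants of a page onto a single preferred URL."""
    # Bare addresses like "example.com/index.htm" have no scheme; add one
    # so urlsplit puts the host in netloc rather than in the path.
    if "://" not in url:
        url = f"{PREFERRED_SCHEME}://{url}"
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST

    # Treat "/index.htm" as the same page as "/".
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break

    # Drop any fragment; keep the query string untouched.
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


variants = [
    "example.com",
    "example.com/index.htm",
    "www.example.com",
    "www.example.com/index.htm",
]
for v in variants:
    print(v, "->", canonical_url(v))
```

All four variants come out as the same address, which is the one you would put in the canonical tag or use as the target of a 301 redirect.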
The second issue is about plagiarised, stolen or scraped content. If Google believes that the content on your site has been illegitimately copied from another website, then it may actively penalise you, and in the worst case drop your site from its index altogether.
The real reason many websites get into trouble is that their admins only look for shortcuts and easy ways of publishing content, so they copy material from other sites and paste it onto their own. Since the recent Google updates, a lot of websites with many indexed pages have got into trouble, namely the ones whose content was mostly duplicated from other, original sources.
Try to write your own original content and articles, or if you have to, buy good material from copywriters, because having unique content on your site will pay off in the long term. Duplicate content seems to be one of the newer factors that Google weighs heavily when deciding whether to reward a website or to filter it and lower its ranking.
I had some penalties when my content showed up on another website.
The article was copied 1-2 days after it was written, and on Google my article "disappeared".
On this page you can easily check whether you have duplicate content.
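If you just want a rough do-it-yourself check between two pieces of text, something like the following Python sketch compares overlapping word triples ("shingles") and reports a Jaccard similarity between 0 and 1. This is only a common textbook technique; I am not claiming it is how Google or any particular checker works, and the function names and the shingle size of 3 are my own choices.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Duplicate content can split your link juice across several URLs."
copied = "Duplicate content can split your link juice across several URLs."
unrelated = "Fresh bread needs flour, water, salt and a little patience."

print(jaccard_similarity(original, copied))
print(jaccard_similarity(original, unrelated))
```

A score close to 1 means the two texts are near copies; a score close to 0 means they share almost no phrasing. Where to draw the line in between is a judgment call.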
Please do not do this; your rank will fall dramatically because of Google Panda. The era of duplicate content has been put to an end by it. Please stay away from it.
Your post is definitely food for thought...
Will Google Panda have an effect on their own Google Search? Can we expect in future only to be shown a list of what they consider the top five search items because the rest are considered duplicates?
Also, how about Google AdSense? When surfing, I am guaranteed to see the same adverts repeated on numerous sites. Does this not count as duplicate content?