I manage a large “news” website (500k pages) that took a massive hit from Panda because of duplicated content (70% of it was syndicated). I recommended that all syndicated content be removed and that the website focus on original, high-quality content.
However, this was only partially implemented. All syndicated content was set to NOINDEX (they think it is good for users to see standard news alongside the original HQ content). Of course, it didn’t help at all; there has been no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider such a site to be “cheating” and not worthy of the user.
What do you think about this “theory”? What would you do?
For example, if the editor of The Example Times wants to ensure that the article she is using with permission from The Example Gazette doesn’t get included in Google News, she would implement the following code in that article page’s HTML: <meta name="Googlebot-News" content="noindex">
I’d probably follow your suggestion and delete the old syndicated content, then 404 the old pages. I’m assuming most of those pages are quite old so they’d have very little value anyway.
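If the site runs on Apache, the deletion step could be sketched in an .htaccess rule (a minimal sketch; the /syndicated/ URL prefix is a hypothetical example and would need to match the site’s actual URL structure):

```apache
# .htaccess sketch, assuming Apache with mod_alias enabled.
# Return a 404 for every URL under the hypothetical /syndicated/ prefix
# once the syndicated articles have been deleted.
RedirectMatch 404 ^/syndicated/.*$
```

A 410 Gone status could be used instead of 404 to signal that the pages were removed deliberately and permanently.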
If you make content noindex, you are penalizing yourself by telling search engines not to index those pages. Search engines generally judge content on a page-by-page basis, not the site as a whole.