Our website has 104,000 pages indexed in Google, but a good chunk of these are empty breadcrumb URLs with no content on the page. There is no 404 error or redirect on these pages, just our HTML template without any visible content. Also, when we remove an item from our website, its page stays in Google's index and directs visitors to the empty page. I believe this is the root of the problem. Should we be telling the search engines to de-index our pages when we remove items from our website, and if so, how can this be done sooner rather than later?
We are using a CMS custom-built from scratch. I am not a programmer, but I can pass the information on to our programmer, who can perhaps put some more work into the breadcrumbs or the robots.txt file if that's what is required to resolve this.
With these empty pages removed from Google's index, we are hoping our website's overall SEO will improve too.
I know that you can remove pages from Google using Google Webmaster Tools. You could also use a robots.txt file to disallow search engines from crawling certain parts or pages of your site.
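For example, a robots.txt rule along these lines (the path is hypothetical; it would need to match wherever the empty breadcrumb pages live) stops compliant crawlers from fetching them, though be aware that Disallow alone does not remove URLs that are already indexed:

```
User-agent: *
Disallow: /breadcrumbs/
```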
Is there a way to automate these Google submissions? I notice that forum threads get indexed within minutes of being posted, and when a thread is removed it also disappears from Google pretty quickly. Is that only because the forum is a high-traffic website and Google gives more priority to near-real-time indexing for it?
I will use the manual Google Webmaster Tools method for now, but I'm hoping to automate this in the future for cases like these.
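One common way to get closer to that automation, assuming the CMS can regenerate it whenever items are added or removed, is an XML sitemap with accurate lastmod dates, so Google re-crawls changed URLs sooner. A minimal example (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/widgets/blue</loc>
    <lastmod>2014-01-15</lastmod>
  </url>
</urlset>
```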
The best plan is to reprogram the CMS so that requests for non-existent pages are met with a 404 error, and pages that get removed are either redirected to a suitable alternative or marked as 'gone' (410), because as long as they return 200 OK, Google will continue to index them.
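A minimal sketch (not the real CMS code) of the status-code logic described above — `item_lookup()` is a hypothetical stand-in for however the CMS checks its database:

```php
<?php
// Hypothetical stand-in for the CMS database lookup;
// returns null for URLs that never existed.
function item_lookup(string $path): ?array {
    $catalogue = [                               // made-up example data
        '/widgets/blue' => ['removed' => false],
        '/widgets/red'  => ['removed' => true],  // deleted item
    ];
    return $catalogue[$path] ?? null;
}

// Decide the HTTP status before any HTML is rendered.
function status_for_item(?array $item): int {
    if ($item === null) {
        return 404; // never existed: Not Found
    }
    if (!empty($item['removed'])) {
        return 410; // existed but was deliberately removed: Gone
    }
    return 200;     // normal page
}

// In the front controller, something like:
//   http_response_code(status_for_item(item_lookup($_SERVER['REQUEST_URI'])));
//   ...then render either the normal template or an error page.
```

The key point is that the status code is sent before the template, so a removed item's URL stops answering 200 even though the same HTML template is used for the error page.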
What is the most effective way to mark pages as 'gone'? Just a simple 404 page, a robots.txt rule, something in the <head> tag, or another way? Come to think of it, each page is generated through PHP, so there is no real directory structure to drop per-section robots.txt files into.
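For what it's worth, the <head> option would be the robots meta tag rather than robots.txt, and since the pages are PHP-generated, the template could emit it conditionally. A sketch (the helper name is made up):

```php
<?php
// Hypothetical helper: emit a robots meta tag so thin or removed pages
// ask search engines not to index them, with no per-path robots.txt needed.
function robots_meta(bool $indexable): string {
    return $indexable
        ? '<meta name="robots" content="index,follow">'
        : '<meta name="robots" content="noindex,follow">';
}
// In the template's <head>:  echo robots_meta($pageHasContent);
```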
Google won't like that one little bit, and I wouldn't be surprised if sooner or later it leads to a really low SERP ranking.
I would take a belt-and-braces approach: a block in your robots.txt file for all pages with little or no content, as well as redirects. Doing it on the double wouldn't be a bad idea either.
Set all your CMS-generated pages to noindex/nofollow and then count the indexed pages; that would give you the original number of indexed pages for your site.
From a search engine perspective, they all recognise 404 errors, so you should make sure your server is sending a 404 status code in the HTTP header.
However, from a user perspective you'll want something a bit more long-term, whether that's a 301/302 redirect or a dynamic 404 page that points towards similar pages.
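As a rough illustration of both options in PHP (the helper names and suggestion list are made up, not this site's actual code):

```php
<?php
// Option 1 (hypothetical helper): permanently redirect a removed item
// to a sensible alternative, e.g. its category page.
function redirect_permanently(string $target): void {
    http_response_code(301);
    header('Location: ' . $target);
}

// Option 2 (hypothetical helper): send a real 404 status, but give the
// visitor links to similar pages instead of a dead end.
function not_found_with_suggestions(array $similar): string {
    http_response_code(404);
    $links = '';
    foreach ($similar as $url => $title) {
        $links .= '<li><a href="' . htmlspecialchars($url) . '">'
                . htmlspecialchars($title) . '</a></li>';
    }
    return '<h1>Page not found</h1><p>You might be looking for:</p><ul>'
         . $links . '</ul>';
}
```

Either way, the important part is that the HTTP status code is correct; the friendly HTML on top is purely for the visitor.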
What are those tools? I am not familiar with them... could you please explain further?
Google Webmaster Tools
The answers are pretty easy to find if you try!
Nowadays Google is moving towards manual link review, and if there are empty pages, it may also see them as keyword stuffing. First remove the links to those pages, and then remove the pages themselves, because if links still redirect there, that is also harmful for you. I must say you have a very good website; as you said, lots of pages are indexed, so you need to do this ASAP.