Hiding Sites from Search Engines

Hiya,

I have a development site where clients can view their sites in progress. I need to make sure that these sites aren’t available to search engines until they are ready to go live. I’ve read up on robots.txt files but am confused. Is this a separate web page? If so, what exactly does the page consist of? I gather it’s not just a piece of code which I put on my index page.

If anyone has time, please could you do me a quick idiot’s guide to hiding sites from search engines, and then removing the code when I want to publish?

Many thanks!

Badger.

You’d probably need more than a robots.txt, but that is the place to start. It is a plain text file which you place in the root of the site, and it would most likely contain:

User-agent: *
Disallow: /

That might help hide a linked site from good robots, though I assume you do not publicly link to this test site from anywhere on the internet anyway? If you do have links, then something more than a robots exclusion would be needed, in case people (wetware or liveware) accidentally started linking to the pages from other sites.
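If you want to check what those two lines tell a well-behaved robot before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the dev.example.com address are made up for illustration, and this only shows how a compliant robot reads the rules, not what any particular search engine will actually do.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the reply above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant robot may fetch `url` under these rules.

    Hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# dev.example.com is a placeholder address, not a real development site.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://dev.example.com/client-a/index.html"))
# prints False: every path is off limits to every robot

print(is_allowed("", "Googlebot", "https://dev.example.com/client-a/index.html"))
# prints True: an empty robots.txt blocks nothing
```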

Although Google have said that they can find parts of a site that has no inbound links, it takes time for this to happen. So as long as there are NO other web pages linking to your new development section, search engines are unlikely to find the new pages during the time you are working on a site. I once made a test page to see if this was true, and it was something like six months before it appeared in Google with no inbound links to it. And that was possibly just luck, as a second test page never got into Google even after a year. Even with a link to a new site, it can be many months before Google finds all the site’s pages - they seem to index just the home page on their first visit, then on their next visit try just one or two of the links on it, and so on.

When developing a site, I usually just create a folder within my own web site for the new site, and when it’s done either point the domain name at the folder (if only I will ever be maintaining the site), or move it to a new location. Nothing has ever appeared in Google before completion and going live.

I think this guide probably covers everything you need to know. If/when you decide you are ready for a particular directory to be indexed, you simply delete the appropriate line in the robots.txt file.
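To illustrate the "delete the appropriate line" step, here is a small sketch, again with Python's standard urllib.robotparser. The /client-a/ and /client-b/ folder names and the blocked_paths helper are assumptions invented for the example. Each client folder gets its own Disallow line, and removing one line opens just that folder to compliant robots.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths a compliant robot may NOT fetch (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Assumed layout: one folder per client site under the development root.
BEFORE = """\
User-agent: *
Disallow: /client-a/
Disallow: /client-b/
"""

# Client A goes live, so its Disallow line is deleted.
AFTER = BEFORE.replace("Disallow: /client-a/\n", "")

paths = ["/client-a/index.html", "/client-b/index.html"]

print(blocked_paths(BEFORE, paths))
# prints ['/client-a/index.html', '/client-b/index.html']

print(blocked_paths(AFTER, paths))
# prints ['/client-b/index.html']
```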

Thanks all, that’s brilliant.

That’s what I always thought as well.

But, a few weeks ago, I registered a new domain and applied it to a new site. I didn’t give the URL to anyone or post it anywhere. But within 24 hours the site was visited by Googlebot. One day later, the site was in Google’s index. And now, after three weeks, the site has received about a dozen visitors. And all this time I’ve been the only one who knows the URL (well, except for the hosting company and the domain registrar).

Based on this experience, I wouldn’t rely on the site not showing up just because you don’t publicise the link. As Dr John rightly says, it’s unlikely you’ll get any significant traffic. But if you really want to hide the site, robots.txt is surely the way to go.

Mike

Thanks Mikl. It’s not as if I’m doing anything I don’t want people to know about! I just don’t like the thought of people being able to see my unfinished work. It’s looking like robots.txt is what I need.

Thanks for this, it helped me as well. I was curious about robots.txt :)