How can I filter out Googlebot sessions?

I am using a simple PHP script to estimate the number of online users by counting the PHP session files in my tmp directory. Every now and then the number of users spikes and stays inflated for several days before returning to normal. I checked today during another inflated period, and it looks like most of the sessions are coming from Google's crawler (crawl-66-249-71-56.googlebot.com).

So my questions:

  • Why are there over 100 sessions coming from Google's crawler?
  • Why does this happen periodically over a span of several days and then go away?
  • What can I do to filter these sessions out of my count?

Thanks.

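Because the bot does not send back the session ID cookie your site gave it, your site creates a new session for every page it requests. 100 sessions just means Google has requested 100 pages of your site within the session lifetime, before garbage collection purges the old session files. By default that lifetime (session.gc_maxlifetime) is 1440 seconds, or 24 minutes, and because garbage collection only runs on a random fraction of session starts, expired files can linger somewhat longer than that.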
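Since a file-counting script sees every session file that has not yet been garbage-collected, a file created by a single crawler request keeps counting long after that request is over. A minimal sketch of a tighter estimate, assuming the default `files` save handler (one `sess_<id>` file per session, with its modification time refreshed whenever the session is used), counts only files touched within a recent window. The function name `count_recent_sessions` and the 5-minute window are illustrative choices, not part of the original script.

```php
<?php
// Hypothetical helper: count sess_* files modified within the last
// $windowSeconds. Assumes the default "files" session save handler.
// Requires PHP 7.4+ for the ??= operator.
function count_recent_sessions(string $dir, int $windowSeconds, ?int $now = null): int
{
    $now ??= time();
    $count = 0;
    // glob() may return false on error; treat that as "no files".
    foreach (glob(rtrim($dir, '/') . '/sess_*') ?: [] as $file) {
        $mtime = @filemtime($file);
        if ($mtime !== false && $now - $mtime <= $windowSeconds) {
            $count++;
        }
    }
    return $count;
}

// An empty session.save_path means PHP falls back to the system temp dir.
$dir = session_save_path() ?: sys_get_temp_dir();
echo count_recent_sessions($dir, 5 * 60), " sessions active in the last 5 minutes\n";
```

This narrows the count to recently active sessions but does not by itself exclude Googlebot. If `session.save_path` uses the `N;/path` subdirectory form, the glob pattern would need adjusting.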

Your site is being recrawled so that Google can keep its search results fresh. It will keep coming back periodically unless you tell it not to with a robots.txt rule, which would also stop Google from refreshing those pages in its results.

Alternatively, you could skip starting a session when Googlebot is the agent making the request. This means changing your website code wherever the session is started, i.e. wherever session_start() is called.
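A minimal sketch of that change, with hypothetical helper names (`looks_like_googlebot`, `is_verified_googlebot`, `start_session_unless_googlebot`) that are not from the original site. The User-Agent test is cheap, but the header is supplied by the client and can be spoofed. The optional stricter check follows Google's documented reverse-then-forward DNS verification, which is how a host such as crawl-66-249-71-56.googlebot.com can be confirmed as genuine. It handles IPv4 only, because `gethostbyname()` returns IPv4 addresses.

```php
<?php
// Hypothetical helper: true when the User-Agent header claims to be Googlebot.
// The header comes from the client, so this check can be spoofed.
function looks_like_googlebot(?string $userAgent): bool
{
    return $userAgent !== null && stripos($userAgent, 'Googlebot') !== false;
}

// Optional stricter check (hypothetical helper): reverse-resolve the IP,
// require a googlebot.com or google.com host name, then forward-resolve
// that name and require the same IP back. IPv4 only.
function is_verified_googlebot(string $ip): bool
{
    $host = @gethostbyaddr($ip); // returns the IP unchanged on lookup failure
    if ($host === false || $host === $ip) {
        return false;
    }
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip;
}

// Replacement for a bare session_start() call: requests claiming to be
// Googlebot get no session, so they no longer leave a session file behind.
function start_session_unless_googlebot(): bool
{
    if (looks_like_googlebot($_SERVER['HTTP_USER_AGENT'] ?? null)) {
        return false;
    }
    return session_start();
}
```

Call `start_session_unless_googlebot()` wherever the site currently calls `session_start()`; any code that reads `$_SESSION` must then cope with no session existing for crawler requests. DNS lookups add latency, so if you use `is_verified_googlebot()` it is worth calling it only when the User-Agent already claims to be Googlebot.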