Identifying non-reproducible problems, maybe not in my code... ideas?

A couple of weeks ago my client’s web site became unstable. “Unstable” means several things:

  • The server appears to lose the contents of the $_SERVER and/or $_SESSION variables when certain pages are loaded. This is recurrent but not reproducible (the same user can do the same thing several times and have a problem sometimes but not others).
  • My client has a reproducible problem at one point in the site. I haven’t figured out what is happening; it seems that the site starts to load the page it should, then stops part way through and loads a different one. I cannot get the system to exhibit this problem at all. I’ve been trying to debug it by adding code to write data to the application’s log, then calling the client and asking her to reproduce the problem for me.
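
A minimal sketch of the kind of diagnostic logging described above, assuming PHP's own error_log() is an acceptable destination. The function name describe_request_state and the list of watched $_SERVER keys are hypothetical, not taken from the site's code. It records which keys exist rather than their values, so a "lost" $_SESSION or $_SERVER shows up in the log without leaking private data.

```php
<?php
// Hypothetical helper: builds one log line describing the current request state.
// The arrays are passed in by value, so the superglobals are never modified.
function describe_request_state(array $session, array $server, string $sessionId): string
{
    // Assumed list of $_SERVER keys worth watching; adjust to the pages that fail.
    $watched = ['REQUEST_URI', 'REQUEST_METHOD', 'HTTP_COOKIE', 'REMOTE_ADDR'];
    $missing = array_values(array_diff($watched, array_keys($server)));

    return json_encode([
        'time'           => date('c'),
        'session_id'     => $sessionId,
        'session_keys'   => array_keys($session),   // key names only, no values
        'server_missing' => $missing,               // watched keys that are absent
        'uri'            => $server['REQUEST_URI'] ?? null,
    ]);
}

// Possible usage at the top of a suspect page, after session_start():
// error_log(describe_request_state($_SESSION, $_SERVER, session_id()));
```

A session ID that changes between two consecutive requests from the same user would point toward lost cookies or session storage rather than toward the page logic.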

The instability affects both our staging and production sites, although I’m not sure whether the symptoms are exactly the same. These sites run on the same dedicated virtual server operated by a major hosting company.

I’ve been trying to locate the problem with no luck. Yesterday my client told me to restore the site to its last known good state.

I did that to the staging site last night. While testing, I encountered some unexplained, transient problems which looked suspiciously like the same type of instability. We agreed to restore the production site this morning anyway. While testing, I encountered one unexplained, transient problem which looked very much like the same type of instability.

I’m starting to think seriously that something is wrong with the site itself, not with my code.

I’d like to involve the hosting service at this point, but I don’t expect them to be receptive to a request that amounts to “I can’t find anything in my code so it must be you, figure it out and fix it please!” I don’t have proof that it’s the server, and I don’t know how I might get some.

I’m looking for suggestions on how to approach this problem. Apart from calling the host, I’m fresh out of ideas.

If the site isn’t high-traffic, ask the client to reproduce the errors and note the exact times when they occur, then search the raw server logs (both access log and error log) around those times for clues. The hosting company may also have an issue of its own. To rule that out, deploy to an alternate host on an alternative subdomain (e.g. www2) behind an .htaccess or other authentication barrier, then get the client to test there.
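
Searching the logs around a reported time could look roughly like this. It is only a sketch: lines_near is a hypothetical helper, and it assumes the access log uses Apache's default bracketed timestamp (for example [10/Oct/2011:13:55:36 +0000]), which may not match the host's configuration.

```php
<?php
// Hypothetical helper: returns the log lines whose timestamp falls within
// $windowSeconds of the time the client reported.
// Assumes Apache's default "[10/Oct/2011:13:55:36 +0000]" timestamp format.
function lines_near(array $lines, DateTimeImmutable $reported, int $windowSeconds): array
{
    $hits = [];
    foreach ($lines as $line) {
        // Pull the bracketed timestamp out of the line; skip lines without one.
        if (!preg_match('/\[([^\]]+)\]/', $line, $m)) {
            continue;
        }
        $stamp = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $m[1]);
        if ($stamp === false) {
            continue;   // bracketed text was not a timestamp in the assumed format
        }
        if (abs($stamp->getTimestamp() - $reported->getTimestamp()) <= $windowSeconds) {
            $hits[] = $line;
        }
    }
    return $hits;
}
```

With the log read via file() and a two-minute window, lines_near($lines, new DateTimeImmutable('2011-10-10 13:56:00 +0000'), 120) would return only the entries close to that moment. The same filter can be run over the error log if it uses the same timestamp format.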

Hi,

Do you have any other location that you can push the site onto, to see whether you get the same behaviour? If you don’t, then it is likely not your code, or it is some upgrade/patch that your ISP has applied to your servers that interacts poorly with your site’s code. The best way to find out is to determine whether another host is problem-free. You then have a basis for telling your ISP that it works over here, and it used to work on our servers with you, so something has changed on your end and we need to get it fixed.

Have you looked at the Apache/IIS and system logs? Have you watched for JavaScript errors when the site loads? Have you checked which headers are being sent, using Firebug or Dragonfly?

Ask your host if they recently changed/upgraded the servers - Apache, MySQL, PostgreSQL… Ask if they have made any routing changes that may affect your site. See if they have recently upgraded the kernel of the servers (if they are Linux).
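
Some of this can be checked from PHP itself before contacting the host. The sketch below uses a hypothetical function name, environment_fingerprint, and an assumed choice of settings. It collects values that can be saved and compared between two hosts, or on the same host before and after a suspected change.

```php
<?php
// Hypothetical helper: collects settings worth comparing between two hosts
// or before and after a suspected server-side change.
function environment_fingerprint(): array
{
    $extensions = get_loaded_extensions();
    sort($extensions);   // sorted so two fingerprints can be compared directly

    return [
        'php_version'       => PHP_VERSION,
        'sapi'              => PHP_SAPI,
        'os'                => PHP_OS,
        'extensions'        => $extensions,
        // ini_get() returns false when a setting does not exist on this build.
        'session_handler'   => ini_get('session.save_handler'),
        'session_save_path' => ini_get('session.save_path'),
        'session_lifetime'  => ini_get('session.gc_maxlifetime'),
        'memory_limit'      => ini_get('memory_limit'),
    ];
}
```

A difference in the session settings or in the extension list between a working host and the failing one would be concrete evidence to bring to the hosting company.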

Steve

Installing the site on another host does seem to be the best approach. I suggested it to the client and she thought it was “premature,” but I think I need to change that perception if the problem’s own development doesn’t do so. I’m checking whether one of the people working on this project has access to a suitable host, and if not I’ll find us one.

The only time I came across anything remotely similar was when I used APC - it only affected a small part of the site, but it was conflicting with another caching mechanism I had built into a web app. Might be a dead end, but maybe the host has enabled some kind of mem-cache somewhere?
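
One quick way to check is to ask PHP which caching extensions are loaded. This is only a sketch: loaded_cache_extensions is a hypothetical helper, and the candidate names are an assumption about what a host might enable, not a complete list.

```php
<?php
// Hypothetical helper: reports which common caching extensions are loaded.
// Pass a list of names to check that instead of the running PHP build.
function loaded_cache_extensions(?array $loaded = null): array
{
    $loaded = array_map('strtolower', $loaded ?? get_loaded_extensions());
    // Assumed candidates; extension names are compared case-insensitively.
    $candidates = ['apc', 'apcu', 'zend opcache', 'memcache', 'memcached', 'xcache'];
    return array_values(array_intersect($candidates, $loaded));
}
```

An empty result does not rule out caching elsewhere, for example in a proxy in front of the server.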

Cups, your post sent me off in two directions, both possibly useful.

First, I’ve never heard of APC, and now that I’ve looked it up I have only a vague idea what it does. It seems to concern PHP intermediate code, not data, and since I don’t try to modify my intermediate code (in fact I have no idea how I could), it’s not clear what I might be doing that could create conflicts. Can you give me any insight?

Second, you made me think of something I’m doing that might be a problem: modifying the contents of $_POST in my scripts.

When I thought of doing that, I saw that it was a simple way to solve a problem that would otherwise require a lot of work, so I didn’t wonder whether it was good practice. I just tried it, and it worked, so I did it. Maybe I shouldn’t have. Do you know, does anyone here know anything about that?

Cups, your post sent me off in two directions, both possibly useful.

Both possibly red herrings of course :wink:

…modifying the contents of $_POST in my scripts.

There have been past discussions here about meddling with globals like the $_POST array, and I think most impressions were it is seen as a Bad Thing, mostly because you cannot guarantee every $_POST var has actually passed through your “cleansing gateway” - if that is indeed what you had done.
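
A rough sketch of the alternative: leave $_POST exactly as it arrived and put validated values in a separate array, so it is always obvious which values have been checked. The function name validated_input and the rule format are hypothetical, and it relies on PHP's filter_var().

```php
<?php
// Hypothetical helper: returns validated copies of selected fields and leaves
// the source array (for example $_POST) exactly as it arrived.
// Not suitable for FILTER_VALIDATE_BOOLEAN, where false is a valid result.
function validated_input(array $raw, array $rules): array
{
    $clean = [];
    foreach ($rules as $field => $filter) {
        if (!isset($raw[$field]) || !is_scalar($raw[$field])) {
            $clean[$field] = null;          // not submitted, or not a simple value
            continue;
        }
        $value = filter_var(trim((string) $raw[$field]), $filter);
        $clean[$field] = ($value === false) ? null : $value;   // null marks a failed check
    }
    return $clean;
}

// Possible usage:
// $input = validated_input($_POST, ['age' => FILTER_VALIDATE_INT, 'email' => FILTER_VALIDATE_EMAIL]);
```

Fields that are not named in the rules never reach the clean array, which avoids the "did this pass through the gateway?" question.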

I have to say that many of those discussions went hand in hand with people incorrectly “pre-escaping” data which was likely headed for a database, and that the escaping mechanisms chosen were usually the wrong ones in any case.

In these days of PDO and prepared statements, all it does is add a level of complexity which serves only to trick oneself.

I know only too well this is true, because I thought it was a good idea once, and it took years to eradicate its effect from some old code-bases.

Cups, you mean the eradication of manually escaping data, and not of PDO and prepared statements, which now do this ‘database’ escaping for you, right?

Steve

Instead of resorting to this kind of thing in the past:


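// Legacy approach: escape every $_POST value in place (the mysql_* functions were removed in PHP 7).
$_POST = array_map("mysql_real_escape_string", $_POST);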

the general tendency is nowadays to use the far more explicit:


// $stmt is a PDOStatement returned by PDO::prepare(); bindParam() is a statement method.
$stmt->bindParam(1, $_POST['incoming']);

When you have to access a $_POST['incoming'] var, you do not want to be scratching your head wondering if it was really sanitized or not.
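
For a fuller picture, here is a self-contained sketch of the prepared-statement approach. It assumes the pdo_sqlite extension is available and uses an in-memory database with a hypothetical "notes" table in place of a real schema. The $incoming variable stands in for $_POST['incoming'].

```php
<?php
// Assumption: pdo_sqlite is installed; the in-memory database and the
// "notes" table are placeholders for the real connection and schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)');

// Stand-in for $_POST['incoming']; the quote would break naively built SQL.
$incoming = "it's raw, unescaped input";

// The placeholder keeps the value out of the SQL text, so no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO notes (body) VALUES (?)');
$insert->bindParam(1, $incoming);
$insert->execute();

// Read the value back to confirm it was stored exactly as submitted.
$stored = $pdo->query('SELECT body FROM notes')->fetchColumn();
```

The raw value goes in and comes out unchanged, which is the point: the database layer handles quoting at the moment of use, and the input array never needs to be altered.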

There was also a lot of extract() going on, which left us wondering whether $incoming had been cleansed, filtered or escaped, or where it had originated from. Leave the global arrays as they are. Then, when you need to access them, instead of just seeing $_POST['incoming'] your brain actually sees:

ALERT $_POST['incoming'] ALERT

I am only using this SQL case as an example, which of course does not typify everyone’s code in those days; I am just saying it was more prevalent a few years ago, and the discussions can still be unearthed.

Sorry my previous reply was unclear.

Thanks @Cups,

That is a very clear explanation, I appreciate you clarifying your thoughts :slight_smile: