lemon_juice — 2013-10-07T06:33:13-04:00 — #1
On most of my sites I have a simple logging system of all PHP page requests that I save to the database to have something easier to work with than Apache logs. I collect data such as URL, IP, host name, user agent, sometimes username, session ID, etc. I also collect the time the page took to render. The way I measure time is like this: at the start of the script (common index.php) I store microtime(true) in a global variable and then in my shutdown function I take another measurement and store the time duration. This works well in most cases but this measurement may also include the time the remote client took to download the page. While this knowledge is useful I'd also like to have a measurement of 'pure' PHP execution time excluding the network data travel.
Most of the pages are rendered with template engine (for example Smarty) but there are also pages that are rendered by a separate class (for example, XML reports, generated PDF pages, downloads, etc.). So there's really no uniform way of sending output to the client and there's no one particular final echo statement or ob_get_contents which I could precede with a microtime() measurement to know how much time PHP spent on executing the CPU intensive tasks of the script. Smarty uses output buffering so HTML pages are sent in one go, which makes the measurement accurate in most of the cases, however sometimes the network transfer times can add up to it.
I'm wondering, is there some universal way to measure the script execution time while ignoring the time it takes to send the page to a remote client?
cpradio — 2013-10-07T08:43:28-04:00 — #2
I'm not sure why you think that time would include the downloading of the data to the end user. That's Apache's job and thus it would be difficult for your application to contain that (wouldn't it?)... So unless you have any CURL calls or are accessing databases across the network, the network communication stored within that time would be very minimal.
cpradio — 2013-10-07T08:44:39-04:00 — #3
Another thought (that may prove or disprove my claim)
You can always use time wget <url> to see how it matches up with the time your log states it took. If there is a deviation between the two, yours does not contain the time the client needs to download and load the page.
lemon_juice — 2013-10-07T09:17:18-04:00 — #4
I'm sure in some cases the time includes the downloading of the data, there's no question about it since I've studied the logs enough to see what is going there.
Normally, when you output the page using echo the data gets buffered to some predefined length and then it is sent to the browser and so on and on (that's why you can immediately send output to the browser using flush() or ob_flush() ). Then the PHP script will wait for each chunk of data to be received by the client. Sure, it's Apache's job but here PHP is running under Apache and it's PHP's job to tell Apache what to send :).
But in my case the situation is better because on most pages the template engine buffers all output before sending it to the client, moreover zlib compression is enabled on the server, which further prevents from sending data in chunks so the measurements stay pretty accurate (fractions of a second).
However, I've noticed that scripts that output large XML files (around 3MB) run for quite a while (even >30 seconds) and data transfer is included in the time. I know because for those I have a separate log file where I count how much the generation script runs before it starts outputting and the difference is quite large. For example, generating XML takes 4 seconds (measured up to just before the final echo) while the whole script duration (measured up to the shutdown function execution) takes 20-30 seconds. There's really nothing else that could take so long apart from the sending data to the client.
I've also seen some rare cases when PHP scripts of standard (buffered) html pages took very long to complete. I had a few page requests by some bot from China which took 300 seconds as measured up to the shutdown function. And there was nothing abnormal in my PHP scripts, they didn't do any processing during that time (my hosting provider confirmed there was no CPU usage at all), they simply hanged waiting for the response from the client until the server timeout was reached and ended.
So I'm pretty sure sending data is included in the global measurement, at least in some of the cases. Of course, we are not talking about any CURL requests and the like - there are none in the scripts I'm taking into account.
cpradio — 2013-10-07T09:31:49-04:00 — #5
I see the use of echo/ob_flush as part of your scripts execution, sure apache may start to send that output to the user for "actual download" but the compiling of that output and the memory/streams developed from it are still part of your script's execution.
That still "shouldn't" include the actual processing of that data being sent from apache to the user's browser...
tomb — 2013-10-07T10:04:33-04:00 — #6
I agree with cpradio. It would be a very poor design decision if the PHP Dev team made the script pause every time you called echo for the network activity to finish before proceeding with the script.
To test it, run the same script from the browser and the command line and you shouldn't see any difference.
lemon_juice — 2013-10-07T15:48:36-04:00 — #7
I don't know if I understood what you were trying to say in the first sentence. What I meant was that Apache may start to send the output to the browser while the PHP script is still running and actually waiting for Apache to finish, while Apache is waiting for the browser to receive the data, which in effect results in PHP waiting for the browser. In such cases 'echoing' is included in script's execution time. See my reply to Tom below.
Well, but in fact this is how PHP works - almost. I didn't say the PHP script waits every time after calling echo for the network activity to finish but it does so after the output buffer is filled. And the butter gets filled many times if the output is large - in such cases PHP waits for the network after sending the content of each buffer. Of course, I'm talking about default behaviour - zlib compression and the like will surely change how it works.
Okay, since you both sounded so convincing I had to really test it :). Here is my test script:
ini_set('zlib.output_compression', '0'); // on my server gzip is on by default so I turn it off
$str = str_repeat('A ', 3*1024*1024);
$start = microtime(true);
echo round(microtime(true) - $start, 3);
This effectively sends 6 MB of data to the browser. Opening the script in the browser results in measurement of about 25 seconds on my 2Mbit/s connection. Fetching the script with wget from the same server results in 0.142 s. measurement. So echo CAN take a lot of time waiting for the browser to receive the data.
cpradio — 2013-10-07T19:48:04-04:00 — #8
I get strange results with that test....
Output of microtime 0.063
Command Line (php buffer.php):
Output of microtime 0.576
Command Line (php buffer.php > test.out):
Output of microtime 0.01
wget (wget http://localhost/buffer.php | cat buffer.php):
Output of microtime 0.032
Command line is 9 times slower (probably because it is using stdout)!
I'm still not convinced that what you say is happening, is truly happening... so I found that if you put %D in your apache LogFormat, it tells you how long the request took in microseconds! COOL!
So let's see what apache says for wget http://localhost/buffer.php:
Output from cat buffer.php: 0.031
Output from tail access.log (microseconds in bold):
::1 - - [07/Oct/2013:19:37:48 -0400] "GET /buffer.php HTTP/1.1" 200 6291739 "-" "Wget/1.13.4 (linux-gnu)" 0/56520
Seems the request via apache far exceeds what PHP states and more so lines up with php from the command line.
Okay, let's increase this:
$str = str_repeat('A ', 10*1024*1024);
php from the command line (stdout): 1.898
php from the command line (to file): 0.032 (time php buffer.php > test.out reports 0.260)
apache access.log: ::1 - - [07/Oct/2013:19:42:19 -0400] "GET /buffer.php HTTP/1.1" 200 20971804 "-" "Wget/1.13.4 (linux-gnu)" 0/108153
browser (Chrome): 0.217
apache access.log: ::1 - - [07/Oct/2013:19:43:16 -0400] "GET /buffer.php HTTP/1.1" 200 20703 "http://localhost:12710/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36" 0/237888
As you can see, Apache, wget, and the command line all have additional overhead. The command line actually has the overhead "built in", which is strange and it has a bit more overhead when using time php buffer.php; but it literally is 1.853 versus 1.941 so it is hardly measurable. I still believe this to be due to using stdout (which my data seems to support).
Nonetheless, I still think you are not seeing anything from apache in your time output. I just can't fathom a scripting language knowing when apache is working or apache halting/pausing the script execution whilst it feeds a partial buffer output to the user. That just doesn't make sense.
lemon_juice — 2013-10-09T04:47:24-04:00 — #9
If you test this script in the browser make sure gzip is turned off, otherwise the echo will execute very fast because the output needs to be buffered and compressed first before sending it to the browser. I was able to reproduce the same behaviour (about 25 seconds) on 3 different Apache servers. Sometimes the echo executed in a fraction of a second like in your case so it was enough for me to disable gzip support in my browser and then it changed to 25 seconds. I believe apart from gzip there can be other optional mechanisms buffering the whole output and making echo fast, however this is not the default behaviour.
As you noticed wget and command line may have additional overhead depending on where you direct the output but even a small overhead proves my point that echo waits for the remote client to finish receiving output otherwise you would see the same fraction of a second regardless of how you invoke the script, it's just that sending data to command line on the same server is much faster then sending it to a remote browser.
In my server logs I also don't see the times including the echo but I believe it is because server logs report CPU time. Anyway, this doesn't change anything here.
So what am I seeing then? The script timed the echo to take 25 seconds, how else can you explain this? As far as I remember PHP has always worked like that - waiting for the browser to receive the next chunk of data. And yes, you can prevent the PHP script from finishing at all by pressing the stop button in your browser. Whether this makes sense or not is another subject, however it allows for progressive rendering of a webpage with a lot of content, at least to some extent. This behaviour is often overriden by the use of compression methods so we don't notice it much nowadays and in many cases it is not practical anyway.
Anyway, this discussion sidetracked to me trying to prove that the blue colour I see is blue while I'd like to hear some suggestions as to how to time the script without including outputting to network. I already know that zlib compression is a solution that works, however sometimes there are browsers/clients/bots that do not support gzip and in such cases I notice the network being added to the time.
cpradio — 2013-10-09T05:43:52-04:00 — #10
Sorry, I think my point wasn't clear, or I didn't accurately indicate my point. wget's microtime output from PHP was less than what Apache reported it to take in the logs. Thus the overhead was outside of PHP and was in Apache territory.
You'd have to add the %D to your LogFormats to see the execution time Apache records for your script (which includes its' sending of the output of your script to the client).
Honestly, I think you are seeing PHP fill its stream, request more memory allocation, filling it again, and repeat. That is costly and can easily happen with large output. The only solution to those (that I know of) is gzip or some other compression library for your output. At my old job we used .NET but we had one specific page that took 10 seconds to load when all of the others took 1-2 seconds. gzip dropped it down to 5 seconds, and the other 3 seconds were found to be related to poorly written code (it had nearly 600 SQL queries that I condensed down to 12).
If you profile your script, what does it show as the execution time? Versus the log in Apache? All of the tests I performed indicated that using wget, apache, or CLI with stdout, all produced additional overhead above the microtime reported by the script itself, so their handling of the output added on to the scripts execution time. I really think you are seeing the stream size fill, expand, and then fill again. That would create a "pause" effect because PHP can't write any more output until the stream size is increased. If it is doing that repeatedly, than that makes it take even longer. But I honestly still can't fathom this being an Apache issue. Do you have access to nginix or lightHttpd? Does it happen there too?
lemon_juice — 2013-10-09T06:23:09-04:00 — #11
Ok, I see now. Yes, some additional overhead outside of PHP can be expected.
Unfortunately, I don't have root access to the server so I'm not able to change the log format. There is execution time in the log but the sending output is not included there so it must be a different log format.
I could try profiling the code but I don't think it would make sense because there are certain signs that prove 100% the echo time to be dependent on the network connection. PHP filling its stream and requesting memory allocation does not sound possible - well, it's possible it happens but why so long? Processing 6MB of data in 25 seconds is just incredibly slow, I might believe it if it was 20 years ago. Besides, why would turning on gzip suddenly shorten it to 0.0xx seconds? 25 seconds is just the time it takes to download 6MB over 2 Mbit/s connection plus some minimal overhead, and this is exactly what is happening here.
Also, the echo statement will execute much faster than 25 seconds if I press the stop button in my browser. In such a case PHP will detect the stop button and terminate the script, calling the shutdown function if there is any. When you move the final time calculation to the shutdown function and save the result to a file you can actually test it and see that the script execution time depends on the network and can be stopped at any time by the remote client. Then any statement after the echo will not be executed at all. So it is NOT only a pause effect - there are real pauses happening and also real script termination caused by the remote client. And this can be all measured by microtime() in PHP.
I don't have nginx or lighthttpd so I can't test it. But this behaviour certainly depends on the server, it's even mentioned here. Of course, this page is about flush() behaviour, however by default PHP automatically flushes output after the buffered is filled. And I don't notice this phenomenon on my windows testing computer, which is in line with what the PHP documentation says (Several servers, especially on Win32, will still buffer the output from your script until it terminates before transmitting the results to the browser.).
Edit: I cannot be sure whether it's PHP or Apache that auto-flushes output to the browser in chunks. However, this is what is happening for all practical purposes.