Can I determine how much data was transferred using php?

kreut · April 14, 2013, 2:39pm

Hello,

I use shared hosting with a limited number of bytes that can be transferred per month. I was hoping to replicate the figures using php’s filesize() function; every time a user lands on a page, I can get the file size and then add it to my database, and then sum the per page transfers. However, after thinking about it, I’m wondering if the computation is that straightforward. In particular, where would external css and js files fit into this? Aren’t both of these cached in a user’s browser? And if so, would the data from these files then just be transferred once (or until the cache is cleared)?

Thanks!

-Eric

rpkamp · April 14, 2013, 5:27pm

Do you have access to the server’s access logs? If so, the easiest way would be to just parse those.

If you don’t have access to those, indeed you can’t predict how many assets will be downloaded by the client because you can’t “see” from PHP whether the client already has it cached or not.

OR you need to run everything through PHP, even all static assets, and use the If-Modified-Since request header to respond with 304 (Not Modified) if the file hasn’t changed. See an example here http://stackoverflow.com/questions/1038638/caching-image-requests-through-php-if-modified-since-not-being-sent. This does put extra strain on PHP though and will probably slow down your site a bit as well. I would only use this if all else fails.
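To make the conditional-request idea concrete, here is a minimal sketch of how a PHP script might decide between sending the file and sending a 304. The function names (`isNotModified`, `bytesToSend`, `serveWithCache`) are made up for illustration, and the byte count only covers the response body, not the HTTP headers, so it will be slightly lower than what a host actually meters.

```php
<?php
// Hypothetical helper: true when the client's cached copy is still current.
// $ifModifiedSince is the raw If-Modified-Since request header, or null.
function isNotModified(?string $ifModifiedSince, int $fileMtime): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // no conditional header, so the full file must be sent
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $fileMtime <= $since;
}

// Hypothetical helper: body bytes that would go over the wire for this request.
function bytesToSend(?string $ifModifiedSince, string $file): int
{
    return isNotModified($ifModifiedSince, filemtime($file)) ? 0 : filesize($file);
}

// Hypothetical usage inside a script that serves one static file.
function serveWithCache(string $file, string $contentType): void
{
    $header = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;
    $mtime  = filemtime($file);

    if (isNotModified($header, $mtime)) {
        http_response_code(304); // body is empty, nothing to count
        return;
    }

    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    header('Content-Type: ' . $contentType);
    header('Content-Length: ' . filesize($file));
    readfile($file);
}
```

The value returned by `bytesToSend()` is what you would add to your database for that request; a 304 response counts as zero body bytes.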

Are the limits very tough that you need to know this? Doesn’t the hoster have a panel where you can see your data traffic, like cacti, awstats, etc?

kreut · April 14, 2013, 6:45pm

Thank you for the detailed response. Basically, I’m going to be running a site where User A and User B (representing different) will be assessed a fee based on how much traffic flows through their “section” of the site (there’s a potential for high volume). I was considering running the website through Amazon’s cloud services to ensure scalability. Would the “parse the server logs” option be a possibility were I to run my website through them? (And, this would be the first time that I run a php application through Amazon, so this may be a super noob question…)

Avram · April 14, 2013, 10:33pm

If URL rewriting is not enabled on your website already, you could maybe rewrite everything (html, css, js, images, … but not php files) to a custom PHP script, so that the server loads everything through that script. For example, if yourwebsite.com/index.html is requested, the rewrite will load yourwebsite.com/serve.php?path=/index.html, then index.html requests /style.css and the rewrite loads /serve.php?path=/style.css and so on… Then, serve.php would:

a) check if file exists and if so
b) read its size, add it to the database
c) set correct headers (cache control, content type: text/html, text/css, application/x-javascript, image/jpeg, image/png, …)
d) readfile('./'.trim($_GET['path'], '/'));
e) die;

Be careful around d) though, as someone could easily request /serve.php?path=/serve.php or another critical file, so you should always (double) check what is in the ?path=
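Steps a) to e), together with the path check, could look roughly like the sketch below. Everything named here (`CONTENT_TYPES`, `resolveSafePath`, `serve`, the `$recordBytes` callback) is hypothetical, and the extension whitelist is only an example. The idea is to resolve the requested path with `realpath()`, refuse anything that ends up outside the document root, and refuse any extension that is not whitelisted, which also blocks `serve.php` itself.

```php
<?php
// Hypothetical whitelist: only these extensions are ever served.
const CONTENT_TYPES = [
    'html' => 'text/html',
    'css'  => 'text/css',
    'js'   => 'application/javascript',
    'jpg'  => 'image/jpeg',
    'png'  => 'image/png',
];

// Returns the absolute path of a servable file inside $docRoot, or null.
function resolveSafePath(string $docRoot, string $requested): ?string
{
    $root = realpath($docRoot);
    if ($root === false) {
        return null;
    }
    // realpath() collapses "../" segments and returns false for missing files.
    $full = realpath($root . '/' . ltrim($requested, '/'));
    if ($full === false || !is_file($full)) {
        return null;
    }
    // The resolved file must still live under the document root.
    $prefix = $root . DIRECTORY_SEPARATOR;
    if (strncmp($full, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    // Unknown extensions (including php) are rejected.
    $ext = strtolower(pathinfo($full, PATHINFO_EXTENSION));
    return array_key_exists($ext, CONTENT_TYPES) ? $full : null;
}

// $recordBytes is a placeholder for whatever writes to your database.
function serve(string $docRoot, string $requested, callable $recordBytes): void
{
    $file = resolveSafePath($docRoot, $requested);
    if ($file === null) {
        http_response_code(404);
        return;
    }
    $size = filesize($file);
    $recordBytes($requested, $size);
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    header('Content-Type: ' . CONTENT_TYPES[$ext]);
    header('Content-Length: ' . $size);
    readfile($file);
}
```

In serve.php you would then call something like `serve(__DIR__, $_GET['path'] ?? '', $yourLogger);` and exit. Note that this counts the full file size on every request, so it overstates traffic unless the script also answers conditional requests.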

kreut · April 14, 2013, 10:58pm

Your solution makes sense to me and seems pretty clever! However, wouldn’t viewing the server logs be a lot easier as suggested earlier by ScallioXTX?

Do you have access to the server’s access logs? If so, the easiest way would be to just parse those.

I now think that I can just use PHP to go through the log each night, finding which files were accessed, and sum up the bytes transferred directly from the log. I found this http://beginlinux.com/blog/2010/07/apache-web-server-logs/ to be a helpful start.
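A nightly job along those lines might look like the sketch below. It assumes the log is in Apache's common or combined format, where the field after the status code is the response size in bytes excluding headers (written as `-` when no body was sent). The function names and the rule "the first path segment is the section" are assumptions for illustration only; the file is read line by line so a large log does not need to fit in memory.

```php
<?php
// Parses one common/combined-format log line.
// Returns [requestPath, bodyBytes], or null if the line does not match.
function parseLogLine(string $line): ?array
{
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*" \d{3} (\d+|-)/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [$m[1], $m[2] === '-' ? 0 : (int) $m[2]];
}

// Yields the file one line at a time instead of loading it whole.
function readLines(string $file): Generator
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    } finally {
        fclose($handle);
    }
}

// Sums body bytes per "section", taken here to be the first path segment
// (a hypothetical convention, e.g. /userA/page.html belongs to "userA").
function sumBytesBySection(iterable $lines): array
{
    $totals = [];
    foreach ($lines as $line) {
        $parsed = parseLogLine($line);
        if ($parsed === null) {
            continue; // skip lines that are not in the expected format
        }
        [$path, $bytes] = $parsed;
        $pathOnly = explode('?', $path, 2)[0];
        $section  = explode('/', ltrim($pathOnly, '/'), 2)[0];
        if ($section === '') {
            $section = '(root)';
        }
        $totals[$section] = ($totals[$section] ?? 0) + $bytes;
    }
    return $totals;
}
```

Usage would be something like `sumBytesBySection(readLines('/path/to/access.log'))`, where the log path depends on your host. Since the sizes come from the server itself, cached assets answered with a 304 are already counted as zero.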

Avram · April 15, 2013, 8:23am

Of course you can do that, provided that you have permission to read the access log file. Note that this file can be huge sometimes, as it logs every server request.

kreut · April 15, 2013, 12:38pm

I appreciate the additional input!