Benchmarking Class

I’ve been dealing with a lot of long-running scripts lately, so I started writing a class to handle my benchmarking. It’s not complete yet, but you can see where I’m headed with it. Would you structure it any differently?


class benchmark {
    private $startTime;
    private $endTime;
    private $totalOps;
    
    private $opsCompleted;
    private $intervalOpsCompleted;
    
    private $benchInterval; //how far back in seconds the bench should look to calc
    private $lastBench; //time last bench was calculated
    private $benchResults;
      
    function __construct($totalOps) {
        $this->opsCompleted = 0;
        $this->intervalOpsCompleted = 0;
        $this->benchInterval = 60;
        $this->benchResults = array();
        
        if($totalOps) {
            $this->totalOps = $totalOps;
        }
    }
    
    public function startBench() {
        if(!isset($this->startTime)) {
            $this->startTime = new DateTime("now");
            $this->lastBench = new DateTime("now");
        } else {
            throw new Exception("Bench has already been started in this object.");
        }
    }
    
    public function endBench() {
        if(!isset($this->endTime)) {
            $this->endTime = new DateTime("now");
        } else {
            throw new Exception("Bench has already been ended in this object.");
        }
    }
    
    public function op() {
        $this->opsCompleted++;
        $this->intervalOpsCompleted++;
        
        //if diff between now and lastBench >= benchInterval, add benchmark
        //(compare whole seconds; DateInterval's '%s' is only the 0-59 seconds field)
        $now = new DateTime("now");
        $intervalDiff = $now->getTimestamp() - $this->lastBench->getTimestamp();
        if($intervalDiff >= $this->benchInterval) {
            $this->lastBench = new DateTime("now");
            $this->intervalOpsCompleted = 0;
            array_push($this->benchResults, $this->getStatus());
        }
        
        return $this->getStatus();
    }
    
    public function getStatus() {
        $now = new DateTime("now");
        //calculate overall ops per second (use timestamps: '%s' is only the 0-59 seconds field)
        $overallDiff = $now->diff($this->startTime);
        $overallSeconds = $now->getTimestamp() - $this->startTime->getTimestamp();
        if($overallSeconds == 0) {
            $overallOpsPerSecond = 0;
        } else {
           $overallOpsPerSecond = $this->opsCompleted / $overallSeconds;
        }
        
        //calculate interval ops per second
        $intervalSeconds = $now->getTimestamp() - $this->lastBench->getTimestamp();
        if($intervalSeconds == 0) {
            $intervalOpsPerSecond = 0;
        } else {
           $intervalOpsPerSecond = $this->opsCompleted / $intervalSeconds;
        }
        return array(
            "startTime" => $this->startTime,
            "endTime" => $this->endTime,
            "elapsedTime" => $overallDiff,
            "opsCompleted" => $this->opsCompleted,
            "overallOpsPerSecond" => $overallOpsPerSecond,
            "benchInterval" => $this->benchInterval,
            "intervalOpsCompleted" => $this->intervalOpsCompleted,
            "intervalOpsPerSecond" => $intervalOpsPerSecond
        );
    }
}
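One pitfall worth knowing when timing with DateTime: DateInterval::format('%s') returns only the seconds field of the interval (0 to 59), not the total number of elapsed seconds. A gap of 1 minute 5 seconds therefore reads as 5, and a check like format('%s') > 60 can never be true. A small demonstration with two arbitrarily chosen fixed dates compares it with a plain timestamp difference:

```php
<?php
// Two moments 65 seconds apart (fixed dates so the result is reproducible).
$utc = new DateTimeZone('UTC');
$start = new DateTime('2024-01-01 00:00:00', $utc);
$now = new DateTime('2024-01-01 00:01:05', $utc);

$interval = $now->diff($start);
$secondsComponent = (int) $interval->format('%s');             // only the "seconds" field
$totalSeconds = $now->getTimestamp() - $start->getTimestamp(); // whole elapsed time

echo $secondsComponent, PHP_EOL; // 5
echo $totalSeconds, PHP_EOL;     // 65
```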

EDIT: What’s the best way to handle optional arguments, which are often passed in arrays in the newer PHP classes? Currently I have an unused $totalOps argument, which spits out an ugly warning when not supplied.

function __construct($totalOps = null)

For a large number of arguments, an array can be used for clarity, because when there are many of them it becomes hard to remember the right order.
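A minimal sketch of the options-array approach, assuming PHP 7 or later: the defaults are declared once and merged with whatever the caller passes. OptionsBenchmark is a hypothetical stand-in for the class above, and the option names simply mirror its properties.

```php
<?php
// Hypothetical class showing an options array with defaults.
class OptionsBenchmark
{
    private $totalOps;
    private $benchInterval;

    public function __construct(array $options = array())
    {
        // Caller-supplied values override the defaults; missing keys fall back to them.
        $options = array_merge(array(
            'totalOps'      => null,
            'benchInterval' => 60,
        ), $options);

        $this->totalOps = $options['totalOps'];
        $this->benchInterval = $options['benchInterval'];
    }

    public function getTotalOps()
    {
        return $this->totalOps;
    }

    public function getBenchInterval()
    {
        return $this->benchInterval;
    }
}

$defaults = new OptionsBenchmark();                           // no warning: $options defaults to array()
$custom = new OptionsBenchmark(array('benchInterval' => 10)); // override only what you need
echo $defaults->getBenchInterval(), ' ', $custom->getBenchInterval(), PHP_EOL; // 60 10
```

One trade-off: array_merge() silently keeps unknown or misspelled keys, so you may want to compare the supplied keys against the defaults with array_diff_key(). On PHP 8.0 or later, named arguments are another way to skip optional parameters.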

As to the class, it looks fine, but it would also be good to see a simple example of how it is going to be used. I can see that you are using a lot of DateTime object manipulations, and my own experience with them is that they are SLOW… Not slow in the sense that you have to worry about using them a few times, but creating new DateTime objects in a loop can be troublesome. I once built a forum where each post date was converted from UTC to the user’s time zone, and when there were more than 100 comments I noticed an overhead of more than a second (I don’t remember the exact time, but it was noticeable). What helped was to store one DateTimeZone object in a static variable and reuse it for every conversion.

If you are only going to time long-running scripts, it’s not going to be a problem. But if you want to time many short iterations of code, your benchmarking class will add significant overhead and skew the results; in that case I’d suggest switching to plain time()/microtime().
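A sketch of the static-variable trick described above, assuming the dates are stored as UTC strings. toUserTime() is a hypothetical helper; it caches one DateTimeZone object per zone name so repeated conversions reuse them. It still creates one DateTime per call, so it reduces the overhead rather than removing it.

```php
<?php
// Hypothetical helper: convert a UTC date string to a user's time zone,
// reusing cached DateTimeZone objects instead of creating them on every call.
function toUserTime(string $utcDate, string $zoneName): string
{
    static $utcZone = null;
    static $zones = array();

    if ($utcZone === null) {
        $utcZone = new DateTimeZone('UTC');
    }
    if (!isset($zones[$zoneName])) {
        $zones[$zoneName] = new DateTimeZone($zoneName);
    }

    $date = new DateTime($utcDate, $utcZone);
    $date->setTimezone($zones[$zoneName]);
    return $date->format('Y-m-d H:i');
}

echo toUserTime('2024-01-15 12:00:00', 'Europe/Warsaw'), PHP_EOL; // 2024-01-15 13:00
```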

Yeah, I can confirm the datetime thing too.

About a year ago I had a script that was running very slowly. I was doing something similar to what Lemon Juice described above (I can’t remember the exact details now, but I do remember it involved a loop that created DateTimes for some entities). I wondered why it was running so slowly, so I used KCachegrind to analyse the performance (btw, you may want to check out KCachegrind for benchmarking; it’s a bit of a pain to set up, but it’s AWESOME), and it revealed that DateTime creation was a major bottleneck. In normal usage you might not notice it so much, but it is worth bearing in mind.

Edit: check out this video of kcachegrind in action: https://www.youtube.com/watch?v=8hGHnU8lKFY

First thing I would do is change DateTime to microtime(true). Since the whole point is to measure elapsed time, may as well do it as accurately as possible. And since PHP now supports closures, you could even pass callback functions that are to be measured.
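A minimal sketch of timing a closure with microtime(true). measureOpsPerSecond() and the md5() workload are hypothetical examples, not part of the thread's class.

```php
<?php
// Hypothetical helper: call $fn $iterations times and return operations per
// second, timed with microtime(true) (seconds since the epoch as a float).
function measureOpsPerSecond(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $elapsed = microtime(true) - $start;
    return $elapsed > 0 ? $iterations / $elapsed : 0.0;
}

$rate = measureOpsPerSecond(function () {
    return md5('Hello, World!');
}, 10000);
printf("md5: %.0f ops/sec\n", $rate);
```

microtime(true) follows the wall clock, so a system time adjustment during a run can distort it. On PHP 7.3 or later, hrtime(true) returns a monotonic timestamp in nanoseconds as an integer, which avoids that.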

Ideally, I think I would want a benchmarking class to work something like this:

$suite = new BenchmarkSuite('Echo vs Print');

$suite->add(new Benchmark('echo', function () {
    echo 'Hello, World!';
}));

$suite->add(new Benchmark('print', function () {
    print 'Hello, World!';
}));

// the onCycle event could be invoked after each benchmark
$suite->onCycle(function (Benchmark $benchmark) {
    // using a Benchmark object in a string context could call the magic method __toString,
    // which could use the internal data to create a useful representation of the results.
    // example: echo x 451,106 ops/sec ±0.36% (95 runs sampled)
    fwrite($logFileHandle, $benchmark);
});
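One way the API sketched above could be implemented, under stated assumptions: the names Benchmark, BenchmarkSuite, add(), onCycle() and __toString() come from the example, while run(), the default of 10,000 iterations and the output format are guesses. Anything the measured closures print is discarded with output buffering so it does not flood the output.

```php
<?php
// Sketch only: run(), the iteration count and the string format are assumptions.
class Benchmark
{
    private $name;
    private $fn;
    private $opsPerSecond = 0.0;

    public function __construct(string $name, callable $fn)
    {
        $this->name = $name;
        $this->fn = $fn;
    }

    // Call the closure $iterations times and record ops/sec using microtime(true).
    public function run(int $iterations): void
    {
        $fn = $this->fn;
        ob_start(); // discard anything the closure prints (e.g. echo/print)
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $elapsed = microtime(true) - $start;
        ob_end_clean();
        $this->opsPerSecond = $elapsed > 0 ? $iterations / $elapsed : 0.0;
    }

    public function getOpsPerSecond(): float
    {
        return $this->opsPerSecond;
    }

    // Used when the object is treated as a string, e.g. by fwrite() or echo.
    public function __toString(): string
    {
        return sprintf('%s x %s ops/sec', $this->name, number_format($this->opsPerSecond));
    }
}

class BenchmarkSuite
{
    private $name;
    private $benchmarks = array();
    private $cycleCallbacks = array();

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function add(Benchmark $benchmark): void
    {
        $this->benchmarks[] = $benchmark;
    }

    // Register a callback that is invoked after each benchmark finishes.
    public function onCycle(callable $callback): void
    {
        $this->cycleCallbacks[] = $callback;
    }

    public function run(int $iterations = 10000): void
    {
        foreach ($this->benchmarks as $benchmark) {
            $benchmark->run($iterations);
            foreach ($this->cycleCallbacks as $callback) {
                $callback($benchmark);
            }
        }
    }
}

// Usage: echoes each result instead of writing to a log file.
$suite = new BenchmarkSuite('Echo vs Print');
$suite->add(new Benchmark('echo', function () { echo 'Hello, World!'; }));
$suite->add(new Benchmark('print', function () { print 'Hello, World!'; }));
$suite->onCycle(function (Benchmark $benchmark) {
    echo $benchmark, PHP_EOL;
});
$suite->run();
```

PHP class names are case-insensitive, so this Benchmark class would clash with the thread's lowercase benchmark class if both were loaded in one script. The sketch also records a single run per benchmark; the ± percentage and "runs sampled" figures in the example comment would need repeated runs and some statistics on top of it.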

Not everything here is done as an object at first, so I will want to be able to use it like so:


$b = new benchmark();
while($result = $sql->fetch()) {
    if(!isset($start)) {
        $b->startBench();
        $start = true;
    }
    // do some stuff with my result that's pretty intensive

    $bench = $b->op();

    //display $bench on cmd line
}
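For the "display $bench on cmd line" step, a small sketch that turns a status array into a single line that overwrites itself in most terminals. formatStatusLine() is hypothetical, and it assumes the keys opsCompleted, overallOpsPerSecond and intervalOpsPerSecond.

```php
<?php
// Hypothetical helper: format a status array as one line. The leading "\r"
// returns the cursor to the start of the line so each update overwrites the
// last; str_pad() clears leftovers from a previously longer line.
function formatStatusLine(array $status): string
{
    $line = sprintf(
        'Ops: %d | Overall: %.1f ops/sec | Interval: %.1f ops/sec',
        $status['opsCompleted'],
        $status['overallOpsPerSecond'],
        $status['intervalOpsPerSecond']
    );
    return "\r" . str_pad($line, 70);
}

$status = array(
    'opsCompleted' => 1200,
    'overallOpsPerSecond' => 480.5,
    'intervalOpsPerSecond' => 502.0,
);
echo formatStatusLine($status);
```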

So I’ve updated the class a little bit and wanted to get some input on something that I’m seeing:


<?php
class benchmark {
    private $startTime;
    private $endTime;
    private $totalOps;
    
    private $opsCompleted;
    private $intervalOpsCompleted;
    
    private $benchInterval; //how far back in seconds the bench should look to calc
    private $lastBench; //time last bench was calculated
    private $benchResults;
    
    function __construct($args = array()) {
        $this->opsCompleted = 0;
        $this->intervalOpsCompleted = 0;
        $this->benchInterval = 60;
        $this->benchResults = array();
        
        if(isset($args['totalOps'])) {
            $this->totalOps = $args['totalOps'];
        }
    }
    
    public function startBench() {
        if(!isset($this->startTime)) {
            $this->startTime = microtime(true);
            $this->lastBench = microtime(true);
        } else {
            throw new Exception("Bench has already been started in this object.");
        }
    }
    
    public function endBench() {
        if(!isset($this->endTime)) {
            $this->endTime = microtime(true);
        } else {
            throw new Exception("Bench has already been ended in this object.");
        }
    }
    
    public function op() {
        $this->opsCompleted++;
        $this->intervalOpsCompleted++;
        
        //if diff between now and lastBench >= benchInterval, start a new interval
        if(microtime(true)-$this->lastBench >= $this->benchInterval) {
            $this->lastBench = microtime(true);
            $this->intervalOpsCompleted = 0;
            //array_push($this->benchResults, $this->getStatus());
        }
        return $this->getStatus();
    }
    
    public function getStatus() {
        //calculate overall ops per second
        $overallDiff = microtime(true)-$this->startTime;
        if($overallDiff == 0) {
            $overallOpsPerSecond = 0;
        } else {
           $overallOpsPerSecond = $this->opsCompleted / $overallDiff;
        }
        
        //calculate interval ops per second
        $intDiff = microtime(true) - $this->lastBench;
        if($intDiff == 0) {
            $intervalOpsPerSecond = 0;
        } else {
           $intervalOpsPerSecond = $this->opsCompleted / $intDiff;
        }
        if(isset($this->endTime)) {
            $endTime = date('F j, Y, g:i a',$this->endTime);
        } else {
            $endTime = null;
        }
        return array(
            "startTime" => date('F j, Y, g:i a',$this->startTime),
            "endTime" => $endTime,
            "elapsedTime" => $overallDiff,
            "opsCompleted" => $this->opsCompleted,
            "overallOpsPerSecond" => $overallOpsPerSecond,
            "benchInterval" => $this->benchInterval,
            "intervalOpsCompleted" => $this->intervalOpsCompleted,
            "intervalOpsPerSecond" => $intervalOpsPerSecond
        );
    }
    
    public function printStatus() {
        $status = $this->getStatus();
        return "Ops Completed: $status[opsCompleted] | Overall Ops/Sec $status[overallOpsPerSecond] | Interval Ops/Sec $status[intervalOpsPerSecond]";
    }
}

It seems that when a new interval starts, the ops per second starts out pretty high and then works its way slowly down over the interval. So on a process where I usually see 450-500 a second, a new interval will start out around 1-2k a second and work its way down. Any idea why I’m seeing this behavior? I don’t think it represents an accurate benchmark.
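A likely cause, judging from getStatus() above: the interval rate is computed as $this->opsCompleted / $intDiff, i.e. every op since startBench() divided by the time since $this->lastBench. Right after a reset the divisor is small while the numerator already holds all earlier ops, so the figure spikes and then sinks as the interval ages; $this->intervalOpsCompleted looks like the intended numerator. The sketch below uses illustrative numbers (a steady 500 ops/sec and a simulated clock, not measured data) to show the difference:

```php
<?php
// Simulated clock in seconds: steady 500 ops/sec, a 60 s interval that has
// just rolled over, and a status check 10 s into the new interval.
$rate = 500;
$lastBench = 60.0; // time the current interval started
$now = 70.0;       // time of the status check

$opsCompleted = (int) ($rate * $now);                        // 35000 since the start
$intervalOpsCompleted = (int) ($rate * ($now - $lastBench)); // 5000 since lastBench
$intDiff = $now - $lastBench;                                // 10.0 s

// As in getStatus(): all ops since the start / time since lastBench.
$asWritten = $opsCompleted / $intDiff;
// Only the ops counted in this interval / time since lastBench.
$intervalOnly = $intervalOpsCompleted / $intDiff;

printf("as written: %.0f ops/sec, interval only: %.0f ops/sec\n", $asWritten, $intervalOnly);
// as written: 3500 ops/sec, interval only: 500 ops/sec
```

Very early in an interval the corrected figure will still be noisy, since a handful of ops over a few milliseconds extrapolates poorly, so you may want to skip reporting it until the interval is a few seconds old.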

Thank you very much for this; I wound up changing my designs to avoid DateTime whenever possible. I am still working on my benchmarking class, however. I feel it’s a good exercise, and I believe I’ll get to the point where it will be more useful to me than 3rd-party tools can be.

You’re probably best off using Xdebug. This will give you a full profile of your code. Not only can you profile individual functions, but you can catch problems you wouldn’t otherwise… I’ve often realised that some of my loops are behaving a little strangely when I notice that methods are called far more often than I was expecting. You can also get some proper statistical analysis and see exactly where the bottlenecks are, so you know where it’s worth focusing on optimisation and where it isn’t.

KCachegrind reads the cachegrind-format profiling files that Xdebug writes, so it uses Xdebug’s output to do its thing.

I might eventually throw xdebug on my machine, but I still need a class that can be used in production for large batch monitoring.

You don’t use xdebug? Oh man, make this a priority, and you really won’t regret it.

I use PhpStorm as my IDE, and combined with an Xdebug plugin for Firefox, I’m now able to set breakpoints in my code and literally inspect what happens to every variable on a line-by-line (step through) basis. You can follow the whole flow of your program and see how all variables, arrays and objects change at each line of code. Without Xdebug I’d be lost half the time. Seriously, make this a priority to get set up and running, and once you get there you’ll never turn back :slight_smile:

+1. It would also be very useful for debugging your benchmark code, which only adds to the irony: once you are done debugging, you could profile the benchmark code that is there to profile your actual code… :slight_smile:

Hey man, I heard you like profiling, so I’ve installed a profiler to profile your profiler :slight_smile:

Lol. I might be taking it a little far. But eventually on prod scripts, I’ll set a minimum level to watch for. If it dips below that for x amount of time, or the ETA is outside of my max allowed, it will send out a notification so I can go beat people for slowing down my processes :slight_smile:
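A minimal sketch of that alerting idea, assuming the current rate comes from something like getStatus()'s ops-per-second figures and the total from totalOps. ThroughputWatchdog, its parameters and the alert messages are hypothetical, and actually sending the notification (mail, chat, etc.) is left out.

```php
<?php
// Hypothetical watchdog. Times are passed in explicitly (e.g. from
// microtime(true)) so the logic can be tested with fixed values.
class ThroughputWatchdog
{
    private $minOpsPerSecond;   // alert if the rate stays below this...
    private $graceSeconds;      // ...for at least this long
    private $maxEtaSeconds;     // alert if the projected time remaining exceeds this
    private $belowSince = null; // when the rate first dropped below the minimum

    public function __construct(float $minOpsPerSecond, float $graceSeconds, float $maxEtaSeconds)
    {
        $this->minOpsPerSecond = $minOpsPerSecond;
        $this->graceSeconds = $graceSeconds;
        $this->maxEtaSeconds = $maxEtaSeconds;
    }

    // Seconds left for the remaining ops at the current rate (INF if no progress).
    public static function eta(int $opsCompleted, int $totalOps, float $opsPerSecond): float
    {
        if ($opsPerSecond <= 0) {
            return INF;
        }
        return max(0, $totalOps - $opsCompleted) / $opsPerSecond;
    }

    // Returns alert messages; an empty array means nothing to report.
    public function check(float $now, float $opsPerSecond, int $opsCompleted, int $totalOps): array
    {
        $alerts = array();

        if ($opsPerSecond < $this->minOpsPerSecond) {
            if ($this->belowSince === null) {
                $this->belowSince = $now; // start the grace-period clock
            } elseif ($now - $this->belowSince >= $this->graceSeconds) {
                $alerts[] = sprintf('Rate %.1f ops/sec has been below %.1f for %.0f s',
                    $opsPerSecond, $this->minOpsPerSecond, $now - $this->belowSince);
            }
        } else {
            $this->belowSince = null; // recovered, reset the clock
        }

        $eta = self::eta($opsCompleted, $totalOps, $opsPerSecond);
        if ($eta > $this->maxEtaSeconds) {
            $alerts[] = sprintf('ETA %.0f s exceeds the allowed %.0f s', $eta, $this->maxEtaSeconds);
        }

        return $alerts;
    }
}

// Usage with made-up thresholds: minimum 400 ops/sec, 30 s grace, 1 hour max ETA.
$watchdog = new ThroughputWatchdog(400, 30, 3600);
print_r($watchdog->check(0.0, 300, 1000, 100000));   // rate dips: clock starts, no alert yet
print_r($watchdog->check(40.0, 300, 13000, 100000)); // still low after 40 s: one rate alert
```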