it isn’t really something you can explain to someone because procedural style programming ‘CAN’ do it. Oop doesn’t do it better but differently
I find that to be the case, indeed. Object composition as I described in my previous post is certainly possible with procedural programming, but it requires a lot of thought, and is almost counterintuitive. But it CAN be done. I also agree that OOP is a way of thinking. Myself, I find it extremely hard to write software in a procedural style, because I think in a somewhat object-oriented manner (and not just when it concerns programming).
Okay, let’s talk about databases ;).
The Database interface I gave in my previous post is actually more or less the interface (give or take a few methods) I use every day. As you can see, it offers no way to process a query result. However, when I do this:
$database = new MyDatabase('localhost', 'test'); // MySQL implementation
$database->connect('username', 'password');
$result = $database->query('SELECT * FROM table');
That $result-thingie is not a simple variable, but it’s an instance of class QueryResult, which looks like this:
class QueryResult
{
    // Create a new object; this is done by class Database
    function QueryResult(&$database, $resultId) { }

    // Clear the query result
    function clear() { }

    // Was the query executed successfully?
    function isSuccess() { }

    // Get the error message in case of an error
    function getErrorMessage() { }

    // Get the total number of rows in the result (for a SELECT)
    function getRowCount() { }

    // Get a specific row in the result
    function getRow($index) { }
}
Given an object of class QueryResult, I can traverse all rows in a SELECT-query, and find out if the query executed okay. If I wanted to, I could add methods specific for INSERT-, DELETE- and UPDATE-queries (“How many rows were affected?”, “What is the last used insertion ID?”), but I haven’t had the need for that yet.
Now, classes Database and QueryResult are both interfaces, so they don’t do anything by themselves. If I want to use some specific database, I have to implement both classes, as in MyDatabase and MyQueryResult for MySQL, and PgDatabase and PgQueryResult for PostgreSQL.
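To make the idea of implementing the QueryResult interface concrete, here is a minimal sketch of an in-memory implementation. The class name ArrayQueryResult and its constructor arguments are hypothetical (the real MyQueryResult would wrap a MySQL result ID instead), and the sketch uses current PHP syntax rather than the PHP 4 style used elsewhere in this post.

```php
<?php
// Hypothetical stand-in for a driver-specific class such as MyQueryResult.
// It keeps its rows in a plain array instead of talking to a database.
class ArrayQueryResult
{
    private $rows;
    private $error;

    // $rows: list of associative arrays; $error: message, or null on success
    public function __construct(array $rows, $error = null)
    {
        $this->rows = array_values($rows); // reindex from 0
        $this->error = $error;
    }

    // Clear the query result
    public function clear()
    {
        $this->rows = [];
    }

    // Was the query executed successfully?
    public function isSuccess()
    {
        return $this->error === null;
    }

    // Get the error message in case of an error
    public function getErrorMessage()
    {
        return $this->error === null ? '' : $this->error;
    }

    // Get the total number of rows in the result
    public function getRowCount()
    {
        return count($this->rows);
    }

    // Get a specific row in the result
    public function getRow($index)
    {
        return $this->rows[$index];
    }
}

// Usage: no database object is involved once the result exists.
$books = new ArrayQueryResult([
    ['title' => 'A Book'],
    ['title' => 'Another Book'],
]);
for ($i = 0; $i < $books->getRowCount(); $i++) {
    $row = $books->getRow($i);
    echo $row['title'], "\n";
}
```

A stand-in like this can also be handy for testing code that only consumes query results.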
The philosophy behind both interfaces is that every class should do what it does best. Class Database knows about connecting to databases and executing queries on them, so why bloat it with code to process the results from those queries? That’s a whole different ballgame, which is why I wrote the QueryResult interface. If you compare this to PEAR, you’ll see the following:
- Class DB_common (and thus classes DB_mysql, DB_pgsql, and so on) has many methods. It can be used to connect to databases, execute queries, process the results of these queries, and handle transactions (and that’s not all!).
- There is a class DB_result, but it’s not meant to be subclassed. When using this class to process query results, it internally calls the query handling methods in the database class.
Now what if I want to do something like this (and I do regularly):
// Select all books
$books = $database->query('select * from book order by title');
// Select all authors
$authors = $database->query('select * from author order by lastname');
// Handle both the $books and $authors QueryResults
Executing queries one after another and processing their results at a later time is easy and straightforward. More importantly, it requires no extra coding in the classes to make sure it works. Also, once I’ve got the results from some query in a variable ($authors or $books), I don’t need the Database object anymore! This is unlike PEAR, where class DB_* must be used to access the various rows in the result. Additional administration is required in the DB_*-classes to make sure rows from the right query are selected, and in my code I have to pass my database object around a lot, just to be able to process query results.
Given the query result $books above, I can print booktitles like this:
for ($i = 0; $i < $books->getRowCount(); $i++)
{
    $row = $books->getRow($i);
    echo $row['title'] . "<br>\n";
}
This loop is simple, and there is no reference to the database object whatsoever, which makes sense: what does a database connection have to do with a result set from a query?
In my previous post, I described class Loop, which implements loops on iterators. Iterators are higher-level, generic objects. All iterators have the same interface, so I can use the same loop for arrays, strings, trees and query results. The iterator for QueryResult-objects looks like this:
class QueryIterator
{
    var $queryResult;
    var $current;
    var $size;

    function QueryIterator(&$queryResult)
    {
        $this->queryResult = &$queryResult;
        $this->size = $queryResult->getRowCount();
        $this->reset();
    }

    function reset()
    {
        $this->current = 0;
    }

    function next()
    {
        $this->current++;
    }

    function isValid()
    {
        return $this->current < $this->size;
    }

    function &getCurrent()
    {
        return $this->queryResult->getRow($this->current);
    }
}
Again, you see that class QueryIterator doesn’t need the Database class to do its work. Not only does this make more sense, it also makes software more layered: I don’t need to pass my Database object to many objects or functions. Only the highest-level code needs access to the database. The code at lower levels doesn’t need the Database object, so I don’t have to pass it.
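Because all iterators share the same four methods, an iterator over a plain array can be sketched in the same way. The names PlainArrayIterator and collectTitles below are hypothetical, and the code uses current PHP syntax rather than the PHP 4 style of QueryIterator.

```php
<?php
// Hypothetical iterator over a plain PHP array, exposing the same four
// methods as QueryIterator: reset, next, isValid and getCurrent.
class PlainArrayIterator
{
    private $items;
    private $current = 0;

    public function __construct(array $items)
    {
        $this->items = array_values($items); // reindex from 0
    }

    public function reset()
    {
        $this->current = 0;
    }

    public function next()
    {
        $this->current++;
    }

    public function isValid()
    {
        return $this->current < count($this->items);
    }

    public function getCurrent()
    {
        return $this->items[$this->current];
    }
}

// Lower-level code written against the iterator interface only.
// It neither knows nor cares where the rows come from.
function collectTitles($it)
{
    $titles = [];
    for ($it->reset(); $it->isValid(); $it->next()) {
        $row = $it->getCurrent();
        $titles[] = $row['title'];
    }
    return $titles;
}

$titles = collectTitles(new PlainArrayIterator([
    ['title' => 'First'],
    ['title' => 'Second'],
]));
echo implode(', ', $titles), "\n";
```

The same collectTitles function would accept a QueryIterator unchanged, since it only calls the shared methods.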
To print book-titles, I can now write this:
for ($it = new QueryIterator($books); $it->isValid(); $it->next())
{
    $row = &$it->getCurrent();
    echo $row['title'] . "<br>\n";
}
In this case, this adds little or no benefit over traversing the query results directly from the QueryResult object. This changes if you take into account that I can now layer my code even more: by creating an iterator and passing it on to code at a lower level, that level needs to know nothing about the kind of data it is traversing. More importantly, as all iterators have the same interface, I can reuse that code over and over again. A perfect example is class Loop:
class Loop
{
    // static member!
    function run(&$iterator, &$manipulator)
    {
        $index = 0;
        $iterator->reset();
        if ($iterator->isValid())
        {
            $manipulator->prepare();
        }
        for ( ; $iterator->isValid(); $iterator->next())
        {
            $current = &$iterator->getCurrent();
            if ($index)
            {
                $manipulator->between($index);
            }
            $manipulator->current($current, $index++);
        }
        if ($index)
        {
            $manipulator->finish($index);
        }
        return $index;
    }
}
The one and only method ‘run’ is extremely simple. It gets passed an iterator and implements the iteration loop, so that I don’t have to write it ever again. At certain steps in the iteration, it calls methods on a manipulator, which must be passed as the second argument. The manipulator must implement these four methods:
- prepare(): is called right before the first item in the iteration is processed
- between(): is called in between every two items, thus not before the first or after the last
- current(): is called for every item
- finish(): is called right after the last item in the iteration is processed
If I were to print book-titles, I now have to write a manipulator. To make life easy, I can write this manipulator as a subclass of class LoopManipulator, that implements all four required methods as empty methods. For example:
class BookTitlePrinter extends LoopManipulator
{
    function current(&$row, $index)
    {
        echo $row['title'], "<br>\n";
    }
}
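The four manipulator methods are easier to follow with a manipulator that uses more than just current(). The following is a sketch only: the shape of LoopManipulator is assumed from the description above (four empty methods), TitleListBuilder is a hypothetical name, and the syntax is current PHP rather than PHP 4. The calls at the bottom are made by hand, in the order Loop::run would make them for two rows.

```php
<?php
// Assumed shape of the LoopManipulator base class: four empty methods.
class LoopManipulator
{
    public function prepare() { }
    public function between($index) { }
    public function current($item, $index) { }
    public function finish($total) { }
}

// Hypothetical manipulator: builds a text like "A, B, C." from the
// 'title' field of every row it is handed.
class TitleListBuilder extends LoopManipulator
{
    public $text = '';

    // Called between two items: add the separator
    public function between($index)
    {
        $this->text .= ', ';
    }

    // Called for every item: add the title
    public function current($item, $index)
    {
        $this->text .= $item['title'];
    }

    // Called after the last item: close the sentence
    public function finish($total)
    {
        $this->text .= '.';
    }
}

// The call sequence Loop::run produces for two rows, written out by hand.
$m = new TitleListBuilder();
$m->prepare();
$m->current(['title' => 'First'], 0);
$m->between(1);
$m->current(['title' => 'Second'], 1);
$m->finish(2);
echo $m->text, "\n"; // prints "First, Second."
```

Note that between() is what makes separators trivial: there is no "is this the last item?" check anywhere in the manipulator.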
Now why is this better than the simple loop I had before? I’ll get into that later, after I’ve shown the complete code for printing booktitles:
// Layer 1: create a database connection
$database = new MyDatabase('localhost', 'test');
$database->connect('username', 'password');
// Layer 2: select the books from the database
$result = $database->query('select * from book order by title');
// Layer 3: set up the iterator and the manipulator
$it = new QueryIterator($result);
$manip = new BookTitlePrinter;
// Layer 4: print the book titles
Loop::run($it, $manip);
All layers are shown in one block of code, but in large systems they typically aren’t. Note that all code in layer ‘n’ only requires access to code in layer ‘n - 1’.
If you examine the code above more closely, you’ll see that the variables $result, $it and $manip are all temporary: once the loop is completed, they aren’t needed any more. The question then is: why create them at all? Thanks to the design of the various classes I can write the code above like this:
$database = new MyDatabase('localhost', 'test');
$database->connect('username', 'password');
Loop::run(
    new QueryIterator($database->query('select * from book order by title')),
    new BookTitlePrinter
);
No more temporary variables, and simpler code as a result. On the other hand, this only makes sense if all 4 layers are placed in the same part of the code. Anyway, it does show how the same classes can be used to write layered code as well as compact code.
To answer the question “Why is using manipulators a good idea?”:
- There is now a clear separation between the algorithm (the loop) and the behavior of the algorithm (printing booktitles). In other words: object composition.
- The algorithm (the loop) is implemented just once in a generic way, instead of many times, specialized for specific problems.
- As the manipulator is a class, I can reuse it by subclassing or wrapping it.
- An intelligently written manipulator can be reused for other loops.
- Layering is supported ‘out of the box’.
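The reuse claimed in the list above can be sketched as follows. This is an illustration under assumptions, not the original code: Loop is restated in current PHP syntax with a static method, while PlainArrayIterator and FieldCollector are hypothetical names, with plain arrays standing in for real query results.

```php
<?php
// Restatement of Loop::run in current PHP syntax (same call order).
class Loop
{
    public static function run($iterator, $manipulator)
    {
        $index = 0;
        $iterator->reset();
        if ($iterator->isValid()) {
            $manipulator->prepare();
        }
        for (; $iterator->isValid(); $iterator->next()) {
            $current = $iterator->getCurrent();
            if ($index) {
                $manipulator->between($index);
            }
            $manipulator->current($current, $index++);
        }
        if ($index) {
            $manipulator->finish($index);
        }
        return $index;
    }
}

// Hypothetical iterator over a plain array (stand-in for QueryIterator).
class PlainArrayIterator
{
    private $items;
    private $current = 0;

    public function __construct(array $items) { $this->items = array_values($items); }
    public function reset() { $this->current = 0; }
    public function next() { $this->current++; }
    public function isValid() { return $this->current < count($this->items); }
    public function getCurrent() { return $this->items[$this->current]; }
}

// Hypothetical manipulator that collects one named field from every row.
// The field name is a constructor argument, so one class serves many loops.
class FieldCollector
{
    public $values = [];
    private $field;

    public function __construct($field) { $this->field = $field; }
    public function prepare() { $this->values = []; }
    public function between($index) { }
    public function current($row, $index) { $this->values[] = $row[$this->field]; }
    public function finish($total) { }
}

// One loop implementation, one manipulator class, two different data sets.
$titles = new FieldCollector('title');
$bookCount = Loop::run(
    new PlainArrayIterator([['title' => 'First'], ['title' => 'Second']]),
    $titles
);

$names = new FieldCollector('lastname');
Loop::run(new PlainArrayIterator([['lastname' => 'Doe']]), $names);

echo $bookCount, ' books: ', implode(', ', $titles->values), "\n";
echo 'authors: ', implode(', ', $names->values), "\n";
```

Neither Loop nor FieldCollector mentions books, authors or databases, which is the separation between algorithm and behavior described above.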
To end this post, I’d like to remind you that the example I’ve used here is pretty simple, and therefore may lead you to think that the various classes are a bit ‘over the top’. In larger systems they certainly aren’t, I can vouch for that! Also, by using a set of compact and efficient classes instead of large, bloated ones (PEAR), I find no impact on execution speed whatsoever. But that’s just my experience, of course.
Vincent