PHP Function of the Week - pg_insert - Moreover a lesson on PHP's largest pitfall

Michael_Morris1 · March 6, 2013, 4:39pm

[CENTER]Array of PHP functions shuffle and pop
Tell us which function we’ll study this stop…

[fphp]pg_insert[/fphp][/CENTER]

Uhm, no.

Ok, here’s the deal, PHP has a LOT of functions floating around, some dating back to PHP 3. Many are useful, some are dubious, and some are outright harmful to use in modern code because there are better options out there. pg_insert and it’s brethren in the pg_* family of functions (ah, the days before namespaces) are in that last group.

Use [fphp]PDO[/fphp]. But rather than flatly give that fiat let me explain what PDO is and why you should use it.

PDO was introduced by PHP 5.2 - maybe a little before that - and was a PECL library for some time before being coopted to the PSL (PHP Standard Library). By itself it provides a uniform interface for access to database systems. An argument for PDO is not an argument against your favorite database engine - the whole point of PDO is that you can use it with your favorite database engine. For central library authors it means they can provide solutions without worrying overmuch about which database the end user decides to deploy.

PDO’s most powerful ability is the concept of prepared statements and bound data. Prepared statements help to avoid SQL injection hacks by sending the query data and the variables it is acting on in separate steps. Also prepared statements allow the database engine to preserve and reuse prepared statements.

Can PDO replicate pg_insert? No, but PNL\Database can, and it’s built on PDO. I’m going to present it now as a preview to the rest of that framework (Hey what can I say, it’s my pet) and as a way of presenting an implementation of PDO and an extension of it.

You may have used database classes before or even wrote one. After all, passing connection resource identifiers and query identifiers is a bit of a pain in the tail. PDO saves us the trouble of doing that, but it’s not without a few shortcomings and having some shortcuts to some of its more cryptic commands won’t hurt. PDO is divided into a database core class - PDO - and a statement class - PDO_Statement. To have a truly flexible extension of PDO we must address both.

We’ll begin with the code that extends PDO. The code that follows is PHP 5.4+ safe, so be careful with the bleeding edge there.


<?php
namespace PNL;

/**
 * PNL core database extends the PDO library.
 * @author Michael
 *
 */
class Database extends \\PDO {
	protected $tierCollator = null;
	protected $treeCollator = null;

One nice thing about namespaces - we don’t have to worry with obscure names for things. PDO is in PHP’s core namespace, so it really can’t have a straightforward name like Database without backward compatibility breaks. PNL\Database doesn’t have this problem. As for the collators and what they do - we’ll worry with them for another day or maybe later in the thread but I will say they are kinda awesome.

	
	/**
	 * CONSTRUCT. Convert the configuration array into a DSN and pass that down to
	 * PDO.
	 * @param array $config
	 */
	public function __construct( array $config ) {
		assert (
			((isset($config['database']) && $config['database']) ||
			(isset($config['dsn']) && $config['dsn'])) &&
			isset($config['user']) && isset($config['password'])
		);

[fphp]assert[/fphp] might be the most powerful underused function in PHP. Use it - a well placed assert can avoid hours of bug hunting. Here we are asserting that we have some sort of config.

	
		if (isset($config['dsn'])) {
			// If a DSN is set use it without question.
			parent::__construct($config['dsn'], $config['user'], $config['password']);
		} else {
			// Otherwise hash the vars we were given into a DSN and pass that.
			$params = array(
				'driver' => isset($config['driver']) ? $config['driver'] : 'mysql',
				'database' => $config['database'],
				'user' => $config['user'],
				'password' => $config['password'],
				'server' => isset($config['server']) ? $config['server'] : 'localhost',
				'port' => isset($config['port']) ? $config['port'] : '3306'
			);
		
			try {
				// Start underlying PDO library.
				parent::__construct("{$params['driver']}:dbname={$params['database']};host={$params['server']};port={$params['port']}",
					$params['user'],
					$params['password']
				);
			} catch ( \\PDOException $e ) {
				throw new ConnectivityException($e);
			}
		}

One criticism I have of PDO’s design is creating an instance of the class implicitly calls a connect request on the target database. In my mind that should be a separate step to make testing easier even if it makes end use a little harder. Constructors should get the object to a ready for use state, and that’s all they should do. Then again, the argument can be made that connecting is part of that ready for use state.

		
		// Set the error mode and the two most frequently used collators.
		$this->setAttribute(self::ATTR_ERRMODE, self::ERRMODE_EXCEPTION);
		$this->tierCollator = isset($config['tierCollator']) ? $config['tierCollator'] : __NAMESPACE__.'\\\\TierCollator';
		$this->treeCollator = isset($config['treeCollator']) ? $config['treeCollator'] : __NAMESPACE__.'\\\\TreeCollator';

PDO allows us to choose whether it kicks errors or throws exceptions when a query doesn’t parse. I prefer exceptions so that a try/catch approach can be used to the data handling. Our last statement in the construct is the trickiest.


		// Now set the standard behaviors of the PNL Framework
		$this->setAttribute(self::ATTR_STATEMENT_CLASS, array(isset($config['statement']) ? $config['statement'] : __NAMESPACE__.'\\\\Statement', array($this)));		
	}

The statment class PDO uses is a settable attribute. PNL goes further to make it a configurable one ($config ultimately comes from a config.ini file). If a statement isn’t specified we default to the framework’s own which will be covered later in this post. The final array is the arguments that will be passed to that statement. PNL’s Statement object receives a reference to the database that spawned it.

	
	public function __get( $var ) {
		if ($var == 'tierCollator') {
			return $this->tierCollator;
		} else if ($var == 'treeCollator') {
			return $this->treeCollator;
		} else {
			 trigger_error( "Database::__get: No Access to {$var} or it does not exist", E_USER_NOTICE);
				return null;
		}
	}

	public function __set ( $var, $val ) {
		if ($var == 'tierCollator' || $var == 'treeCollator') {
			if (!class_exists($val)) {
				throw new Exception("Collator must be set to a valid class");
			}
			$this->$var = $val;
		} else {
			 trigger_error( "Database::__set: No Access to {$var} or it does not exist", E_USER_NOTICE);
				return null;
		}
	}
}

PDO has no public variables. The collators can be fetched or set. Their own public properties and modes are accessed in this manner.

So much for the database class. There isn’t much here because PDO already does a lot and there isn’t much I can add to it to make it any better. Now to the statement object. Most of what I have to say about it is in the comment text.


<?php
namespace PNL;

/**
 * PNL core database extends the PDO library to make it more chaining friendly. The
 * result and results methods handle most of the more common fetch cases with PDO.
 * That said, all PDO functionality remains exposed.
 *
 * @author Michael
 *
 */
class Statement extends \\PDOStatement {
	
	/**
	 * The database object that created us.
	 * @var Database
	 */
	protected $db = null;
	
	protected $key = null;
	
	/**
	 * Protected is NOT an error here, the PDOStatement object is a PHP internal
	 * construct and for whatever reason it is protected instead of being public
	 * like all other PHP constructors. Thank you Zend Engine team - this makes
	 * it impossible to test this object without also testing the Database object.
	 *
	 * @param PDO $db
	 */
	protected function __construct( $db ) {
		$this->db = $db;
	}

	/**
	 * Replacement for the execute statement under most circumstances, but parse
	 * returns the statement object itself to allow for chaining.  Also, the
	 * input array does not have to have leading colons on the key names. If present
	 * they are left alone, but any string key without a lead colon will have one
	 * appended.
	 *
	 * @param array input parameters for query.
	 * @see PDOStatement::execute()
	 * @return this
	 */
	public function parse( $params = null ) {
		if (is_array($params) && !hasAllNumericKeys($params)) {
			$params = $this->prepareArray($params);
		}
		
		$this->execute( $params );
		
		return $this;
	}

The sharp eyed will not hasAllNumericKeys isn’t a PHP core function - it is a PNL core function. This is what is known as a dependency, and tracking these things is what keeps programmers up late at night coding something that mysteriously broke. So, as an aside let’s look at that function…


function hasAllNumericKeys( array $array = array() ) {
	foreach( array_keys($array) as $key) {
		if (!is_numeric($key)) {
			return false;
		}
	}
	
	return true;
}

Tiny little bugger. In a Class based world the trend has been in PHP to move all functions into classes. I believe that in a few corner cases that’s a mistake. This little function has nothing to it that calls out “I’m a database function” as opposed to “I’m a controller function”. It’s task as straightforward as its name, so it exists as a separate function from the class in the \PNL namespace where any class of the framework might reference it. Is this a dependency nightmare waiting to happen?

Answer - it could be. Indeed, that ‘call from anywhere approach’ that you can see a lot of in PHP 4 programs is what makes them so tangled. But there are ways to ensure testability with a function like this. First - it has no internal state. That is, it doesn’t remember anything from one call to the next about what it did. Second, it has no global references. global is the single most evil statement in PHP - and the use of global variables will in short order insure that a program becomes utterly untestable. Third, it returns what it returns and doesn’t affect anything else. The array it receives is not altered in any way. The function could perhaps take the array by reference, but even that’s a mistake because the PHP engine already preserves memory by deferring copying the value until an alteration will occur, which with this function, will not.

With these guidelines followed universal functions can be used. There won’t be a lot of them though - PHP already has most of these straightforward situations covered.

We now return to the statement object

	
	/**
	 * Prepares an array for use as a query parameter set.
	 *
	 * @param array $params
	 * @return array
	 */
	protected function prepareArray( $params ) {
		$return = array();
		
		// No pass by reference here because key values will be changing.
		foreach ( $params as $key => $value ) {

			if (!is_numeric($key) && strpos($key, ':') !== 0) {
				$key = ':'.$key;
			}
			
			if (is_array($value)) {
				throw DatabaseException('Array Values not permitted');
			}
			
			$return[$key] = $value;
		}

		return $return;
	}
	
	/**
	 * A chainable #closeCursor.
	 * @see PDOStatement::closeCursor()
	 * @return this
	 */
	public function closeQuery() {
		$this->closeCursor();
		return $this;
	}
	
	/**
	 * A chainable #bindValue;
	 * @see PDOStatement::bindValue()
	 * @return this
	 */
	public function bindVal($parameter, $value, $data_type = \\PDO::PARAM_STR ) {
		parent::bindValue($parameter, $value, $data_type);
		return $this;	
	}

Chainable means you can write code in a sentence form like this

$db->prepare($sql)->bindVal('col1', $col1)->bindVal('col2', $col2)->bindVal('col3', $col3)->results();

You’ll see this coding style a lot more often in Java and Java Script than PHP, but I believe that’s due to lack of support for the technique in frameworks and core code moreso than in comprehension problems. If anything, I find statement chains much easier to read than statement series.

	
	/**
	 * Bind an array to the statement.  If the keys are named you must use named
	 * placeholders in your statement.  If the keys are not named you must use
	 * question mark placeholders.

	 * @param array $array
	 */
	public function bindArray( $array ) {
		$array = $this->prepareArray($array);
		
		foreach ( $array as $key => $value ) {
			$this->bindValue( is_numeric($key) ? $key+1 : $key, $value, is_int($value) ? \\PDO::PARAM_INT : \\PDO::PARAM_STR );
		}
		
		return $this;
	}

This is the function that replicates what pg_insert can do in this manner…

$db->prepare("INSERT INTO myTable ( col1, col2, col3 ) VALUES ( :col1, :col2, col3 )")->bindArray($values)->parse();

And now to a couple of icing functions.


	/**
	 * Return a single value, or a single row, as determined by your query structure.
	 * @return string
	 */
	public function result() {
		return $this->columnCount() == 1 ?
			$this->fetchColumn() :
			$this->fetch( \\PDO::FETCH_ASSOC );
	}	
	
	/**
	 * The return of this function is influenced by your query structure.
	 * If your query only has one column of results, that column is returned.
	 * If your query has two columns of results, the first column is returned as the key
	 * and the second column is returned as the value.
	 * If there are three or more columns to your query the results are indexed by the first
	 * field of the query and then grouped on that field.
	 *
	 * @return array
	 */
	public function results() {
		return $this->columnCount() == 1 ?
			$this->fetchAll( \\PDO::FETCH_COLUMN ) :
			$this->fetchAll(\\PDO::FETCH_GROUP|\\PDO::FETCH_UNIQUE|(  $this->columnCount() == 2 ? \\PDO::FETCH_COLUMN : \\PDO::FETCH_ASSOC));
	}
}

Results isn’t without limitations. For the most part it assumes you’re going to be getting keyed results. If this isn’t the case then it’s appropriate to fall back to PDO’s fetch and fetchAll statements.
Not that there’s anything wrong with PDO’s fetch statements - but these aliases are quicker to read and use in most cases. There are two more functions in the class that deal with the collators, but as they are way outside of scope for a forum post, and since they are still buggy (which is the reason the framework hasn’t been released yet) I’ll omit them.

I hope all of this has been informative and useful.

Lemon_Juice · March 6, 2013, 11:47pm

I don’t see anything powerful in prepared statements in most use cases. And the concept of prepared statements doesn’t belong to PDO - it only adds a nice little convenience to be able to use named placeholders instead of ? characters.

I have tried prepared statements and to me they only make my code less readable. It’s still fairly acceptable when there are few bound parameters but if there are many of them so that the SQL statement takes more than one screen and then it is followed by the binding part which must match then the code becomes segmented. I must admit it is nice to see the clean and neat :vars in SQL but then we have to follow all those :vars with appropriate bindings - why make the code segmented? I find the dirty old-style string concatenation to be easier to maintain in the long run because of the linear nature of code - when I look at my SQL I can immediately see where each field value comes from. However, I don’t see anything powerful in binding values - this is messy at worst and an unnecessary alternative way of doing the same thing at best.

I think PDO promotes overuse of prepared statements. Their purpose is to increase performance when the same statement with different parameters needs to be executed many times so that the db server can cache the execution plan and then simply accept only values for that statement. Then binding values also makes sense because PHP doesn’t need to pass the values from variables - once they are bound there’s no need to pass anything. Other than that I think prepared statements provide no benefit and make the code longer and a little bit more complicated. They are not necessary to guarantee security as there are other ways to do it.

In fact I think pg_insert is a very nice function. I don’t mean to say that it should be used because it’s experimental but the concept is good: to simplify a very common db task. You can insert all values into a table with a single statement - isn’t that great? Technically, PDO is very good but alas it lacks methods to access the db easily and often adds complexity where it is not necessary. Instead, it tries to be like a java-style library where when you want to perform a simple task (like formatting a number) you need to go through instantiating 3 objects… PDO is nice to learn OOP on - but when I’m coding I’d rather send all my data to something like pg_insert and be done with it than prepare my statements and bind values. I would like to see (pg_)insert added to PDO! But I’m 99% certain this won’t happen

This is not a criticism of the code samples and explanations you provided - actually it’s a nice little tutorial. I"m just sharing my personal opinion on the matter. All this unnecessary hassle only in the name of security. There are so many inconveniences authorities throw on us in saying it’s for our ‘security’… But I digress :). Just learning to escape properly is all that is needed for security.

cpradio · March 7, 2013, 12:54am

I’ll be the first to admit, that I was dead set against PDO, then for my latest project I decided I’d bite the bullet and use PDO exclusively to put my money where my mouth is (so to speak). Surprisingly I found that I could abstract 98% of the Database interactions I needed to the point, the developer didn’t even know he was generating SQL statements behind the scenes. The OO nature of PDO allowed me to hide all but 2% of the SQL generation (the 2% was rare edge cases that likely won’t get used often). It took some time but in the end, I ended up with an API that felt natural and the owner of the project (who isn’t a programmer) can even write API calls to get the data he wants.

Now sure, you could do this with the typical mysql_* and mysqli_* functions and hide them within a class, but in essence you just created PDO in doing that. It was actually a very redeeming feeling for me to be able to see PDO in that light, as a way to further abstract your data access and database interaction rather than viewing it in the typical ways I used the mysql_* and mysqli_* functions. It gave me much more clarity as to what could be done when approached with a new mindset versus looking at just putting it in where my calls already exist mimicking what is already happening in a more verbose manner.

Just wanted to share that experience, as I found it enlightening. I kind of wish I could share the end result but the client loved it so much, he bought the rights to it from me, so I can’t. I’d have to reproduce it which would take a few months (plus it was heavily developed around their system and its structure – so it isn’t a one fit all solution).

Lemon_Juice · March 7, 2013, 7:18am

Yeah, abstracting SQL is another thing and I don’t doubt you can do great things with PDO. But let’s focus on the topic we have here - inserting new rows. Can you share some insight how you made database inserts nicely abstracted and in what respect PDO proved superior over mysqli? Of course, as far as you are allowed to

TomB · March 7, 2013, 9:45am

In my opinion, abstracting PDO and then still forcing people to use SQL is entirely redundant and pointless. You’re far, far, better off with a simple ORM that allows you to escape the mess of horribly non-OO inflexible SQL code in your application logic. And whether that ORM uses SQL (or PDO, or MySQLI) is irrelevant. pg_insert. Meh, this is better:


$user = $mapper->create();
$user->name = 'Tom';
$user->website = 'http://r.je';
$mapper->save($user);

Using an abstracted PDO within the mapper is equally pointless, the point of the abstraction of PDO is to make SQL simpler to write, but the mapper prevents you needing to write SQL in the first place.

cpradio · March 7, 2013, 10:20am

Yes, I was able to hide the SQL generation for inserts, updates, deletes, and so forth. Inserts were similar to what TomB describes, but not quite the same either.

Lemon_Juice · March 7, 2013, 10:31am

TomB, you are absolutely correct, a simple ORM is a very good option here and I also don’t think it matters in this case which db library is used behind the scenes - that’s why I’m wondering what benefits cpradio got from PDO as opposed to Mysqli. If his code for inserts is similar to this then I don’t really see any special role of PDO here (unless the goal is to target multiple database types, then it gets a bit easier).

My point is - even when you craft your own db abstraction class then somewhere on a deeper level the code needs to generate SQL. It can be done either by escaping values with a dedicated function or by using prepared statements. I simply have doubts that PDO’s prepared statements are powerful in anyway for this usage - just a different way of doing the same thing with more lines of code. And in this case it’s most probably used only once when coding the abstraction library and then we forget about it using the ready-made mechanisms.

cpradio · March 7, 2013, 10:40am

Lemon_Juice:

TomB, you are absolutely correct, a simple ORM is a very good option here and I also don’t think it matters in this case which db library is used behind the scenes - that’s why I’m wondering what benefits cpradio got from PDO as opposed to Mysqli. If his code for inserts is similar to this then I don’t really see any special role of PDO here (unless the goal is to target multiple database types, then it gets a bit easier).

My point is - even when you craft your own db abstraction class then somewhere on a deeper level the code needs to generate SQL. It can be done either by escaping values with a dedicated function or by using prepared statements. I simply have doubts that PDO’s prepared statements are powerful in anyway for this usage - just a different way of doing the same thing with more lines of code. And in this case it’s most probably used only once when coding the abstraction library and then we forget about it using the ready-made mechanisms.

For one, I got a lot of performance out of it since there was a major need for bulk inserts and bulk updates (where prepared statements helped a lot), then the ability to easily extract the values from the queries and auto bind them later provided a great deal to the abstraction, the user didn’t need to know the table structure, the query syntax, the names of the parameters to bind their values to, etc. They simply needed to know the API which hide just about everything. Barely anything was noticeable in a select fashion, as that data was usually cached and only ran once against the database for days up to a week.

I just found it far easier to build upon a library, such as, PDO easier from an abstraction and API standpoint than I ever did with the mysqli OO and procedural implementations (not sure why entirely, but it felt like it was more accessible to me). Sure you can accomplish it with mysqli, but it didn’t feel as natural to me to extend upon (that could also be because over the last few years, I’ve picked up on a lot more techniques I’ve learned from other languages and how they abstract their data access and so I was able to use those same ideas/implementations).