Separation of database access between objects and their subordinates

Here’s something I often ponder over: If an object contains other objects, which is responsible for grabbing database information for the child objects?

eg we have a Product class, which contains a $supplier of type Supplier class. If we instantiate the Product from an ID, would we also grab the details for its Supplier, or would we wait and see if the Supplier needs to be got? I’m talking in general here, as my system (and most system) are obviously much larger than indicated by this code snippet:

First off, the Supplier does need to be able to get it’s own information from the database:


class Supplier
{
	private $supplierID;
	public $name;//probably define some setters & getters instead of using public
	public $emailAddress;
	
	public __construct($supplierID = null)
	{
		if(is_numeric($supplierID))
		{
			$this->supplierID = $supplierID;
			$this->load();
		}
	}
	
	private function load()
	{
		$database_result = something_database_function("SELECT s.supplierID, s.supplierName, s.emailAddress FROM suppliers s  WHERE s.supplierID=".$this->supplierID);
		
		$this->supplierName = $database_result->supplierName;
		$this->emailAddress = $database_result->emailAddress;
	}
}

When we instantiate the Product from an ID, we could also grab the Supplier:


class Product
{
	private $productID;
	private $productName;
	private $supplier; //instance of Supplier class
	
	public __construct($arg = null)
	{
		if(is_numeric($arg))
		{
			$this->productID = $arg;
			$this->load();
		}
	}
	
	private function load()
	{
		$database_result = something_database_function("SELECT p.productID, p.supplierName, s.supplierID, s.supplierName, s.emailAddress FROM products p JOIN suppliers s ON p.supplierID = s.supplierID WHERE p.productID=".$this->productID);
		
		$this->productName = $database_result->productName;
		
		//create supplier
		//not that we could also allow the Supplier construct to accept these values. Or failing that, make some setter methods in the supplier
		$this->supplier = new Supplier();
		$this->supplier->supplierName = $database_result->supplierName;
		$this->supplier->emailAddress = $database_result->emailAddress;
	}

	public function getSupplier()
	{
		return $this->supplier;
	}
}

Or we could just load the product information and only grab the supplier information when needed:


class Product
{
	private $productID;
	private $name;
	private $supplier;
	private $_loaded_supplier = false;
	
	public __construct($arg = null)
	{
		if(is_numeric($arg))
		{
			$this->productID = $arg;
			$this->load();
		}
	}
	
	private function load()
	{
		$database_result = something_database_function("SELECT p.productID, p.supplierName, p.supplierID FROM products p WHERE p.productID=".$this->productID);
		
		$this->productName = $database_result->productName;
		$this->supplierID = $database_result->supplierID;
	}
	
	public function getSupplier()
	{
		if($!this->_loaded_supplier){
			$this->supplier = new Supplier($this->supplierID);
			$this->_loaded_supplier = true;
		}
		return $this->supplier;
	}
}

These classes are used in many parts of the system, so it’s hard to know now how they will be used. There’s also the situations where a list of (eg) products will be grabbed from the database with a single query, and Products being passed their instance data directly, rather than each one grabbing from the database. How would you handle a reference to a missing class property $product->some_property_we_havent_populated? Throw an error? return null? detect that this hasn’t been given the data and grab it on the fly?

Either is acceptable, or an entirely different approach can be used (see below).

All of what you listed is acceptable.

Now, I’ve always taken a different approach, is it right or wrong? Maybe. But I like it for a few reasons. First, my data access does not exist in my objects. I have separate classes for data access. Why? Because I like to try and make only 1 query to the database for a given action. Maybe this is my .NET background, maybe I’m just crazy (well at least a lot of us were crazy, as this is how we did it at my old job).

The first thing I did was identify the ways in which I needed to retrieve data for my objects. Viewing a Product page, viewing Search results, the Shopping Cart page, etc. What was easily obvious about all of these retrieval processes and each one differed from the other (just slightly, but they were definitely different). So I’d write a query for each, store them in their own Data Access class (or if they were similar enough, store them in the same class as two different functions/methods).

Then I would consider what indexes needed to be associated to the table based on these queries. To this end, I never had to worry about an object getting its data for a second time because the constructor went out and queried the data, or a property would go out and query the data. I knew exactly what data was being sent back for the given page/action/request because it had a dedicated data access for it.

Again, I’m not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn’t permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).

If all the products are stored in the same table, and you add a new field, say “colour”, then someone tries to access $product->colour then if it is “not populated” but does exist, you will get a null value, or whatever the default value is in your table.colour

OR do you mean :

If all the products are stored in the same table, and someone tries to access $product->shape which does not exist you have to decide how to handle that, silently ignore it and carry on (returning those fields which do exist)? Throw an err/exception etc.

If your products can potentially be stored in multiple tables perhaps depending on product types, then the challenge is going to be slightly different.

If your “end-user” is another developer then the exception/error might be a suitable response, if the end-user is a non-developer via a website who may be filtering on data depending on a GET string (which can be fiddled with, or be an old url link) then you might want to silently ignore and carry on displaying what you can.

This seems to suggest that load() be a bit smarter and respond to $ids being an array:
eg.


$arr = array(23,99,876);
$ilelds = array('colour', 'shape');

$kettles = new Product($arr, $fields);
$kettles->load();

Or, create a loadMulti() method.

Then you are then possibly heading into the realms of creating an SQL query string builder.

Once you crack that, you can then refactor that code out into its own class which you can possibly use for both Supplier and Product and others.

Then, when you switch to using PDO, or sqlite or any other rdbms you only alter one class.

Making a generic SQL query string builder class which interacts with one single table is quite easy to do it probably doing the work of a Data Mapper.

The problem arises when you want to do JOINS – which you will be doing, one trick is to leave the door open so that you can create complex sql statements and just passThru($sql) to get the data you want.

If you know you are going to be doing a lot of JOINS then either forget about a query builder, or face the fact that your Product class will contain lots of methods like this:


function loadProductWithSupplier($ids, $fields = null){}

The best solution for your problem will, as ever, turn on your situation such as depending whether you are engineering a CRUD operation for managing data, or, whether you are simply creating simple and fast read-only data views.

cpradio, this sounds interesting, could you post a short example of the Data Access class you are talking about and how you would use it? Let’s take an example of data similar to what the OP presented: we have a Product class and each product has a Supplier and also has Accessories. Now I understand you say that the Product constructor fetches the data using the Data Access class, right? Let’s say we need to fetch Product data from the database in two different scenarios:

  1. Public web page, where we only need Product and its Accessories.
  2. Admin page where we need Product, Accessories and Supplier.

(also in each case different fields need to be fetched from the db)

How do you differentiate between these two cases so that you use sql queries optimized for the given task? Do you pass a parameter to the Product constructor? It would be great if you could illustrate that with sample code!

Sorry, that didn’t come out exactly the way I wanted it to (or at least my original intention is easily mis-understood). I meant to say, my objects constructors would not try and access the data access, though I guess they could if they knew which scenario you needed…

I can build something later today (hopefully), but here is the gist

Typically I would have three layers, My Objects, Business Layer, and Data Access Layer.

So given the above, the Data Access Layer would have either two classes or 1 class with two methods to retrieve the two scenarios described.
Scenario 1, would select the product and its accessories as a data set (result whatever) back to the Business Layer which would perform any validation and convert that data set to the appropriate objects and return them to the callee.

Scenario 2, would do the same calling a different method/class for both Business Layer and Data Access Layer resulting in the Product, Accessories and all Supplier information.

Granted given how closely similar these two are (assuming Supplier is a limited amount of data, name, website, description), I would tend to have Scenario 1 use the same methods/classes as Scenario 2 and drop Scenario 1 altogether, yes it would return more data than it needed, but it is more maintainable that way.

Now when the data is significantly different, Cart Products versus Production and Accessories, then the separation is necessary, as the Cart may only need the name, url, price, and stock of the product, and wouldn’t need the description, accessories, etc that may make a larger result set returned (or cause paging on the table).

Ok, there’s no hurry and thanks for the explanation :). What I’m most interested in is where and how exactly you call the Data Access methods depending on various scenarios. For example, if you have a getSupplier() method in your Product object (or maybe you don’t have and it’s something different?) then how does the getter know which scenario (eventual sql code) needs to be executed. You also mentioned you don’t have to worry about object getting its data for the second time - how is that achieved? I do it with a cache but your approach seems to not need one.

Thanks.

Yeah, I’ll work on that. In the mean time another helping answer, my objects don’t have methods that run any Data Access which is why I don’t have to worry about it. Hopefully in a few hours I can post some code that resembles this technique (granted I primarily have only used it in .NET, but I’m sure I can work up a PHP example).

Okay, here is the general idea (there is a lot of work I didn’t do, such as establishing a real connection or a lot of validation/refactoring for efficiency type stuff. Don’t focus on the fact that the methods/functions are static (that was just to reduce my coding a slight bit).

In essence, your Objects are more like DTOs (Data Transfer Objects). They may contain some useful utility functions for the item, but for the most part they are just a way to transfer that data from a controller to a view, or whatever path you may need to take.

Objects:

<?php
namespace Objects;

class Product
{
	private $productId;
	public function GetProductId()
	{
		return $this->productId;
	}
	public function SetProductId($value)
	{
		if (is_int($value))
		{
			$this->productId = $value;
		}
	}

	private $name;
	public function GetName()
	{
		return $this->name;
	}
	public function SetName($value)
	{
		$this->name = $value;
	}

	private $supplier;
	public function GetSupplier()
	{
		return $this->supplier;
	}
	public function SetSupplier($value)
	{
		$this->supplier = $value;
	}

	private $accessories;
	public function GetAccessories()
	{
		return $this->accessories;
	}
	public function SetAccessories($value)
	{
		$this->accessories = $value;
	}
	public function AddAccessory($accessoryId, $accessoryName)
	{
		if (!is_array($this->accessories))
		{
			$this->accessories = array();
		}

		$this->accessories[] = new Accessory($accessoryId, $accessoryName);
	}
}

class Accessory
{
	public $accessory_id;
	public $name;

	public function __construct($accessory_id, $name)
	{
		$this->accessory_id = $accessory_id;
		$this->name = $name;
	}
}

class Supplier
{
	public $supplier_id;
	public $name;

	public function __construct($supplier_id, $name)
	{
		$this->supplier_id = $supplier_id;
		$this->name = $name;
	}
}
?>

Business Layer: (you will notice I am constantly overwriting the product id, name, supplier id and name, this is because I was lazy in my example)

<?php
namespace Business;

class Product
{
	public static function GetProductInfoAndAccessories($productId)
	{
		if (is_int($productId))
		{
			$results = \\DataAccess\\Product::GetProductInfoAndAccessories($productId);
			if ($results != null)
			{
				// process results into object
				$product = new \\Objects\\Product();
				foreach ($results as $row)
				{
					$product->SetProductId($row['product_id']);
					$product->SetName($row['product_name']);
					$product->AddAccessory($row['accessory_id'], $row['accessory_name']);
				}

				return $product;
			}

			return null; // or throw an exception
		}
		else
		{
			throw new Exception("Must provide a valid product id");
		}
	}

	public static function GetProductInfoAccessoriesAndSupplier($productId)
	{
		if (is_int($productId))
		{
			$results = \\DataAccess\\Product::GetProductInfoAccessoriesAndSupplier($productId);
			if ($results != null)
			{
				// process results into object
				$product = new \\Objects\\Product();
				foreach ($results as $row)
				{
					$product->SetProductId($row['product_id']);
					$product->SetName($row['product_name']);
					$product->AddAccessory($row['accessory_id'], $row['accessory_name']);
					$product->SetSupplier(new \\Objects\\Supplier($row['supplier_id'], $row['supplier_name']));
				}

				return $product;
			}

			return null; // or throw an exception
		}
		else
		{
			throw new Exception("Must provide a valid product id");
		}
	}
}
?>

Data Access: (the Base would typically have a Close function as well to destroy/unset the connection variable)

<?php
namespace DataAccess;

class Product extends Base
{
	public static function GetProductInfoAndAccessories($productId)
	{
		self::GetConnection()->prepare("SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name 
                  FROM Products AS p LEFT JOIN Accessories AS A ON p.product_id = a.product_id WHERE p.product_id = ?");
		return self::GetConnection()->execute(array($productId));
	}

	public static function GetProductInfoAccessoriesAndSupplier($productId)
	{
		self::GetConnection()->prepare("SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name, s.supplier_id, s.name AS supplier_name 
                  FROM Products AS p LEFT JOIN Accessories AS A ON p.product_id = a.product_id LEFT JOIN Suppliers AS S on p.product_id = s.product_id WHERE p.product_id = ?");
		return self::GetConnection()->execute(array($productId));
	}
}

class Base
{
	private static $connection;
	public static function GetConnection()
	{
		if (self::$connection === null)
		{
			self::$connection = ... // define a new connection
		}

		return self::$connection;
	}
}
?>

Page needing Product and Accessories (don’t forget, Business\Product validates/filters the $_GET item

$productInfo = \\Business\\Product::GetProductInfoAndAccessories($_GET['product_id']);
// Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO

Page needing Product, Accessories, and Suppliers

$productInfo = \\Business\\Product::GetProductInfoAccessoriesAndSuppliers($_GET['product_id']);
// Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO

Again, not saying this is a right or wrong way, just a different approach that can/did work well for the applications we maintained at my old job.

cpradio, thanks for your explanation, that was a very detailed and clear description. I can see similarities in your method to what I often do - my object structure is based on Propel ORM (some older version, now they may have refactored their code) where each object (\Object) has a corresponding ‘peer’ object (\Business) with static methods similar to yours. So the Product object is like your \Objects\Product and the ProductPeer object is like \Business\Product and there I can put methods like GetProductInfoAccessoriesAndSuppliers(). The major difference is I don’t use a separate Data Access class, I perform all of that in the peer (\Business) methods. This is convenient when writing the code because all fetching code is in one place but there is less clean separation of tasks - I don’t feel I’m missing this separation, however I haven’t tried it yet so I can’t say if that’s wouldn’t be beneficial. Certainly having a separate Data Access class would allow me to use the same methods for fetching data in different scenarios, for example when I don’t need to feed the data into objects and plain array would suffice.

The thing is that I can actually do the same thing as you in that I do not have to worry about fetching data for the second time except I didn’t always take advantage of this possibility probably because I wanted to do it too properly, in other words fetching as little as possible and then fetching more chunks of data on demand. For example, I first fetch the Product object that would contain data from the product table only and I don’t think at this stage what other related data I might need later. So whenever a method like GetSupplier() is called that method has to worry about fetching the data from the database. My reasoning was that if an object has GetSupplier() method then it needs to always work, no matter how the object was instantiated. In your case GetSupplier() will not work if you don’t call the appropriate method in your Business class so you need to think beforehand what data you will need and call the right business method. In an ideal world I wouldn’t need to think about it beforehand but I can see in practice it’s a good idea to do so at least from performance point of view - fitting relational database access into objects is not possible if you want both the [full] convenience of the objects and optimised database access.

Thanks, that was inspiring and I may now lean more towards the way you fetch objects since it makes sense and I can already do it in my current setup, it’s just a question of being aware of it and putting it to use.

BTW, are your objects writable? Can you fetch a Product object, change something in it and then call $product->save() to save the changes in the database? I’m asking because when you have specialised methods for fetching in your Business Layer and some of them don’t fetch all of the data then save() might not work properly because some fields in the object are empty and may be unintentionally deleted on saving. Sure, there are ways to deal with it but it makes the whole system more complicated.

And that is a valid point, I just dislike how chatty that scenario is, constantly trying to get data that could have been established once earlier. Think about it, you are not trying to get the data across pages, you are still trying to load the same freakin’ page but just realized you need data you didn’t have, so you go back out to the database to get it. In my opinion, this style lacks requirements/clarity as to what the application and each page in the application is supposed to do. It isn’t “wrong”, it just doesn’t seem efficient and could lead to a lot of chattiness. Just my very strong opinion :slight_smile:

Yes. You would could update the product properties and then pass it to a \Business\Products::Save($product) method (actually you can have a $product->save() that then calls \Business\Products::Save($product) too), the idea is your data goes to the Business layer first, the business layer validates all properties, etc.

You can also take it a step further and develop a “isDirty” indicator for each property (or a framework really that the \Objects\Product class could extend) that would track what properties were changed. Then you know what properties need to be updated/inserted/added/deleted, etc. (I haven’t developed this framework yet; though now it seems like a really neat challenge, so maybe in my spare time I might try it).

This is the biggest problem. The original question of Should a product fetch its own data? Probably not is the answer but that’s a case of ActiveRecord vs DataMapper which has been discussed many times before. Separation of concerns implies DataMapper is the better pattern (and I agree).

The problem as you say, comes with joins but your solution presents even more problems: The SoC benefits from DataMapper are lost as soon as you start giving products/supplies access to the database as it tightly couples them to it. Once a product is coupled to the database, any flexibility it once had is lost.

The answer is to handle joins outside the data objects. On the face of it, the solution is:


$product = $productMapper->findById(123);
$supplier = $supplierMapper->findById($product->supplierId);

However, this creates very brittle code due to lack of encapsulation and the need of repetition. Anywhere this code is reused it needs to be repeated. What if “supplierId” becomes supplier_id. Messy! This code is clearly an OO layer over the top of relational data.

In an ideal world you should be able to call $product->getSupplier() from anywhere and it work automatically, but at the same time keeping it so that $product does not have a direct dependency on the database.

I won’t go over old ground as there was a very good and in-depth discussion on how this can be achieved here: http://www.sitepoint.com/forums/showthread.php?687271-New-PHP-Data-Mapper-Library

In this other thread you present this example:

echo $user->orders[0]->items[0]->product->manufacturer->name;

It’s very convenient OOP but how many sql statements are executed under the hood? Also, how many orders are retrieved from the db and how many items of the first order are retrieved? Here we just need the first order and the first item of the first order.

Because it’s using offsetGet the implementation can do anything. In this case (assuming that it’s using a database) it would be able to run a query that used LIMIT 1 OFFSET $offset.

One SQL statement is used per mapper. However, this is often faster than joins.

The main point here, though, is that they don’t have to be rows in the DB, they could be coming from a web service. The users could be in a database and the products could be in an XML file or totally different database/server for instance. This offers far greater flexibility than tying all your data to a single source.

Aren’t you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:

The thing you really need to ask yourself is whether you are looking for a generic framework or just a series of patterns to follow. If you are looking to build a generic framework for handling this layer than you are probably better off using something like Doctrine. However, if you would like the code to be as straight-forward and efficient but as possible than a solution like cpradio presented is a good option. Really that solution is nothing more than a data mapper/orm with an extra layer for mapping the the result set to entities. Nothing is automated though like it would be in doctrine, which can be a huge advantage or disadvantage depending on different circumstances.

The problem with doctrine is it is very complex and a beast to understand and maintain. Powerful of course but it is huge. Not creating a generic automated mapper will always be more code efficient and readable though it requires writing everything from scratch or just about. I have nothing against doctrine myself just find it large and unnecessary at times.

Hey @aaarrrggh ;, have you had any personal experience with Doctrine 2 ORM? I’d like to hear about your personal experience with it (if any). This thread has a lot of potential to be a good resource for many different users, a personal story can easily help sell an ORM.

My personal experience with ORM’s and Frameworks is limited. I have never had the need to use them in PHP, and in the .NET world our company had a strict rule of not using third-party products unless your team had time to vet them (so we would end up creating our own without any bells and whistles, like I did in my prior example – first time I wrote that in PHP too).

When it comes down to it, I still like to roll my own versus using a third party solution for two reasons, 1) I get to learn a lot by having to re-invent the wheel, and 2) Most of the third party solutions I’ve researched using have a lot more to them than what I would need, so why bundle it with an already growing codebase? I still believe ORMs and third party Frameworks are good, for some, but not necessarily for everyone.

@hessodreamy ; Lots of food for thought here, if there is any one aspect that sounds interesting to you, please ask more questions on how it may be suitable to your project (or not suitable). Out of all of the replies here, there is a LOT of good information, different tactics, etc.

Have you checked out Doctrine 2? It’s very different to Doctrine 1. You actually use plain old php objects to map out your domain logic and these classes end up looking very similar to the classes we see in this thread. About the only significant difference is that you don’t have any sql in them (generally, although you can do this if you want to for some reason - although this kinda defeats the purpose of a DBAL). You use meta information in the class to map out the persistance layer, and from there it is dealt with for you. You can end up doing stuff like this:


$comments = $blogPost->getComments();

foreach ($comments as $comment){
 $comment->getText();
 $comment->getUsername();
}

Based on the relations you’ve set up in your own objects.

On top of this you don’t have complex issues to worry about, such as worrying about what will happen if say a comment is updated while you’re working on a blog post with related comments (all tracked automatically for you). It also does lazy loading, so it won’t load the comments in the above instance automatically unless you explicitly need to call the getComments() method.

It also has some exceptionally good stuff for database migrations, and is the first ORM I’ve ever used that works so well with any data source that I’ve actually used this facility in pracitice. In my current application, all of the unit testing is done in a sqlite database, while the production and staging server and my development environment all use mysql - I have had to change literally nothing in my objects at all to make this work (it’s literally a one line thing in a config file that basically says "if in the testing environment, load through sqlite instead).

Overall I’ve found it excellent, not particularly hard to get your head around and it produces massive benefits if you want to map your business domain logic to a persistance layer of any kind.

Here’s another article about the principle behind domain driven design: http://net.tutsplus.com/tutorials/domain-driven-design/

I would highly recommend learning this, because it will prevent you from re-inventing the wheel time and time again. Don’t mistake Doctrine 2 for Doctrine 1 or for Active Record based ORMS. This is on a whole other level, and I believe it solves the original poster’s problems.

Hey CP - yeah, I was using Doctrine 2 just today actually.

As I said in the above post - it’s really great, because you end up simply writing plain old php objects. The only thing that makes them different to standard php objects in anyway is that you attach certain meta tags to fields that helps the Doctrine 2 orm do it’s thing.

I may write something up here later to explain this more fully. Perhaps I could give you some examples from the stuff I’ve been doing recently. I can tell you I’ve been using Doctrine 2 inside the Zend Framework 1 based on this tutorial video that I found: http://www.zendcasts.com/unit-testing-doctrine-2-entities/2011/02/ (Even though the video is Entitled “Unit Testing Doctrine 2 Entities”, it goes beyond that - shows you how to set it up in ZF1, and also shows you how to set it up so you end up using a sqlite db for unit testing while maintaining a mysql database for production/development and staging).

I’m currently doing some work (as always), but in a few weeks I’ll have 2 weeks off due to paternity, so perhaps I could write up an example of something here for people to look at?

It’s really a great tool. It saves you so many headaches when it comes to setting up the legwork for your domain level logic.

That is a strange approach when you use SQL Server, as you say each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.

On a side note, connections in MySQL is very cheap (compared to SQL Server) so this is not a worry here.

The problem with an ORM is that you sacrifice database/server speed/efficiency for the ability to write faster code. Since PHP also compile this data on each run, it also make an ORM less effective in this language compared to Java and C#.

Personally we prefer a plain DOA approach, where we have a class for each table in the database, and then separate methods depending on what data that is required and if it pull or insert, as this allow us to optimize the queries to the fullest. It also makes it very easy to pull a cache solution down on the queries as well, or change to a different database type.

I think the speed argument with regards to the ORM issue is overstated in many cases. Also, Doctrine 2 has methods for dealing with caching built in, so it can actually be far more efficient than writing standard mysql code.

I haven’t really experienced too many issues with regard to speed in practice. In general, when people try to write their own implementations for this sort of thing they tend to end up hitting brick walls and writing all kinds of messy hacks to deal with issues further down the line.

I think I may write up a mini tutorial at some point if I can find the time…