Towards A Better Active Record: retrieving related data objects from main object

Lemon_Juice · May 13, 2013, 10:36pm

I’m trying to improve my implementation of active record to make it easier to unit test. So far I have done this to update a row, a classic example:


$account = AccountPeer::getByPK(12);
$account->access_date = time();
$account->save();

The problem with this is that the $account data object depends on the database to perform the save operation, and that dependency is not injected but looked up. Additionally, the class needs to extend a base class with methods that do the actual insert/update. I have found an article, “The Problem With Active Record”, that explains the problem in detail and suggests a solution. So now I have converted the logic and I do this:


$accountTable = new AccountTable($db);  // (I may use DI in the future instead of new, but this is not important now)
$account = $accountTable->getByPK(12);
$account->access_date = time();
$accountTable->save($account);

Now the row data object ($account) is kept separate from the table object ($accountTable), I pass the database dependency to the table object and all is fine - at least to me this is a good separation according to this article by Miško Hevery - the data object is holding pure data and does not need any dependencies.
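
For readers following along, here is a minimal sketch of what that separation might look like end to end. It assumes $db is a PDO connection (an in-memory SQLite database here, which needs the pdo_sqlite extension) and a hypothetical account table with id, user_id and access_date columns; none of these details come from the original code.

```php
<?php
// Plain data object: no base class and no database dependency.
class Account {
    public $id;
    public $user_id;
    public $access_date;
}

// Table object: the only class that knows about the database.
class AccountTable {
    private $db;

    public function __construct(PDO $db) {
        $this->db = $db;
    }

    // Returns an Account, or null when no row has that primary key.
    public function getByPK($id) {
        $stmt = $this->db->prepare('SELECT id, user_id, access_date FROM account WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return null;
        }
        $account = new Account();
        $account->id = (int) $row['id'];
        $account->user_id = (int) $row['user_id'];
        $account->access_date = (int) $row['access_date'];
        return $account;
    }

    // Inserts when the object has no id yet, updates otherwise.
    public function save(Account $account) {
        if ($account->id === null) {
            $stmt = $this->db->prepare('INSERT INTO account (user_id, access_date) VALUES (?, ?)');
            $stmt->execute([$account->user_id, $account->access_date]);
            $account->id = (int) $this->db->lastInsertId();
        } else {
            $stmt = $this->db->prepare('UPDATE account SET user_id = ?, access_date = ? WHERE id = ?');
            $stmt->execute([$account->user_id, $account->access_date, $account->id]);
        }
    }
}

// Hypothetical schema and seed row, only so the sketch can run.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE account (id INTEGER PRIMARY KEY, user_id INTEGER, access_date INTEGER)');
$db->exec('INSERT INTO account (id, user_id, access_date) VALUES (12, 3, 0)');

$accountTable = new AccountTable($db);
$account = $accountTable->getByPK(12);
$account->access_date = time();
$accountTable->save($account);
```

Because Account has no behaviour tied to the database, it can be created with new in a unit test without any setup.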

But now I want to add more features to get related objects from $account. Suppose each account belongs to a user so I may want to get the user. This is the most logical way to me:


$accountTable = new AccountTable($db);
$account = $accountTable->getByPK(12);

// $user is supposed to be an object corresponding to a row in the user table
$user = $account->getUser();

As you can see the account object needs to retrieve the user object. In order to do this it needs to contact the database but how can it do it if it has no knowledge of a database? I could do this:


$user = $account->getUser($db);

but it looks hackish. What would be the best way to achieve this? Should I break the principle of having a pure data object and inject database dependency into it? What would the getUser() method look like?


class Account {
  public function getUser() {
    $userTable = new UserTable($db);
    $user = $userTable->getByPK($this->user_id);
    return $user;
  }
}

As you can see, I need the $db object here. How do I pass it in?

Lemon_Juice · May 14, 2013, 9:15am

So far the only solution I’ve come up with is using a dependency injection container to construct the required object (in my example above $db was the required object but now I’ve realized I don’t really need $db, I might be better off having UserTable object):


class Account {
  public function getUser() {
    $dic = new DIContainer;
    $userTable = $dic->create("UserTable");
    $user = $userTable->getByPK($this->user_id);
    return $user;
  }
}

When I’m thinking about how to unit test this method I could make my DI container accept a different configuration for testing so that I can mock $userTable if necessary. What do you think?

BTW, after some searching it turns out this is called Repository Pattern, so User/Account is the domain object (holding data) while UserTable/AccountTable is the repository which deals with saving and retrieving data. The problem I’m facing here is that for fetching foreign objects I need to have some link between the two in a way that’s testing-friendly.

TomB · May 14, 2013, 10:58am

Technically this is Datamapper Vs ActiveRecord.

Take a look here: http://www.sitepoint.com/forums/showthread.php?888279-Separation-of-database-access-between-objects-and-their-subordinates
and here: http://www.sitepoint.com/forums/showthread.php?687271-New-PHP-Data-Mapper-Library&p=4640186&viewfull=1#post4640186

Your proposed solution isn’t ideal as it still couples the account to the data source, and your application code should not be aware of the DI container. With a proper DataMapper you have the data object, which contains the data, and the mapper, which fetches the data and knows of the source. The advantage of this is that the data source can be changed and all the application logic still works correctly. I’ve given this example before but it’s quite good. One of the clients where I work wanted to use their internal stock control system to handle the products on their website, rather than having to maintain the products in both. Rather than a messy import/export script or data conversion, we simply adjusted the mapper to connect to their stock control system rather than the database, and all the existing logic for the shopping cart just worked.

The path you want to head down is one of separation of concerns. As with a couple of topics recently, the way to do this is to completely disconnect the data objects/application logic from the data source. Maintain proper encapsulation and this is fairly simple. If you look at the second link I posted, I proposed a solution which is somewhere between AR and DataMapper.

edit: To answer your specific question, you’d do this:


class Account {
  private $userTable;

  public function __construct(UserTable $userTable) {
    $this->userTable = $userTable;
  }

  public function getUser() {
    $user = $this->userTable->getByPK($this->user_id);
    return $user;
  }
}
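
One consequence of injecting the mapper through the constructor is that a unit test can hand Account a stand-in instead of a real UserTable. The sketch below assumes UserTable takes a PDO connection; StubUserTable, the User class and the public $user_id property are hypothetical names added for illustration only.

```php
<?php
class User {
    public $id;
    public $name;
}

// Stand-in for the real mapper; its query code is left out of this sketch.
class UserTable {
    protected $db;

    public function __construct(PDO $db) {
        $this->db = $db;
    }

    public function getByPK($id) {
        throw new LogicException('The real database lookup is omitted from this sketch.');
    }
}

// Test double: serves users from an array and never touches a database.
class StubUserTable extends UserTable {
    private $users;

    public function __construct(array $users) {
        $this->users = $users;
    }

    public function getByPK($id) {
        return isset($this->users[$id]) ? $this->users[$id] : null;
    }
}

class Account {
    public $user_id;
    private $userTable;

    public function __construct(UserTable $userTable) {
        $this->userTable = $userTable;
    }

    public function getUser() {
        return $this->userTable->getByPK($this->user_id);
    }
}

$lemon = new User();
$lemon->id = 7;
$lemon->name = 'Lemon';

$account = new Account(new StubUserTable([7 => $lemon]));
$account->user_id = 7;
$user = $account->getUser();
```

With the service-locator version, the same test would first have to reconfigure the DI container.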

Lemon_Juice · May 14, 2013, 1:22pm

I knew I was getting closer to Datamapper but didn’t realise this was it…

Thanks, I was banging my head against the wall because I thought the domain object should know nothing about the mapper because I’ve read that technically that is what pure Datamapper is. Now having read the other threads I see you have relaxed the rules slightly and allow the mapper to be passed into the domain object - if we can do this then this changes the whole thing.

But now if Account has a dependency how do we create a new Account? Now it’s not as easy as simply doing new Account():


$userTable = new UserTable($db);
$account = new Account($userTable);
$account->login = "Lemon";
$account->password = "123";
$account->user = $someUser;

$accountTable = new AccountTable($db);
$accountTable->save($account);

and if there were more related objects (not only User) then I’d have to add more arguments into the constructor and in order to handle that I’d need to create a DI container for that? Now the idea of keeping domain objects newable has to be abandoned, I suppose.

Also, according to your example I would also pass AccountTable to Account so that I could do save directly on the data object:


class Account {
  private $userTable;
  private $accountTable;

  public function __construct(AccountTable $accountTable, UserTable $userTable) {
    $this->userTable = $userTable;
    $this->accountTable = $accountTable;
  }

  public function getUser() {
    $user = $this->userTable->getByPK($this->user_id);
    return $user;
  }

  public function save() {
    $this->accountTable->save($this);
  }
}

Is this only syntactic sugar for being able to do $account->save() or does it have some other implications?

TomB · May 14, 2013, 1:46pm

Well, I’d make a generic DataMapper interface (or whatever you’d like to call it) and type hint that rather than a specific mapper, as it’s more flexible, but as you say, this technically goes against the DataMapper definition in its strictest sense. As with all these things, how much this is a problem depends on the project at hand. Once you start adding features beyond simple save() methods, such as relationships, you almost certainly need to extend your domain objects from a base class, which really disqualifies it from being a DataMapper in its purest sense. However, as Lachlan and I were contemplating in that thread, the fact that a domain object knows that it can be persistent isn’t a huge problem as long as it doesn’t know how the persistence layer works. There’s nothing wrong with it knowing it’s going to be stored, but it shouldn’t know that it’s going to be stored in a database.
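
A rough sketch of the generic-interface idea, under some assumptions: the interface name DataMapper comes from the post, but its two methods, the InMemoryUserMapper class and the User fields are hypothetical choices made for this example.

```php
<?php
// Generic mapper contract: says nothing about where the data lives.
interface DataMapper {
    public function getByPK($id);
    public function save($object);
}

class User {
    public $id;
    public $name;
}

// One possible implementation; a database-backed mapper would satisfy
// the same interface without Account having to change.
class InMemoryUserMapper implements DataMapper {
    private $rows = [];

    public function getByPK($id) {
        return isset($this->rows[$id]) ? $this->rows[$id] : null;
    }

    public function save($object) {
        $this->rows[$object->id] = $object;
    }
}

class Account {
    public $user_id;
    private $userMapper;

    // Type hint the interface, not a specific mapper class.
    public function __construct(DataMapper $userMapper) {
        $this->userMapper = $userMapper;
    }

    public function getUser() {
        return $this->userMapper->getByPK($this->user_id);
    }
}

$users = new InMemoryUserMapper();
$lemon = new User();
$lemon->id = 3;
$lemon->name = 'Lemon';
$users->save($lemon);

$account = new Account($users);
$account->user_id = 3;
$owner = $account->getUser();
```

Here Account knows it can ask something for its user, but not whether that something is a database, a web service or an array.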

As was stated in the thread I linked to, that implementation is somewhere between DataMapper and ActiveRecord, and in my opinion it is a good compromise. Yes, you lose some of the portability and flexibility a DataMapper offers: with a pure DataMapper implementation, the domain objects know nothing about the mapper, don’t know they’re persistent objects, and have better encapsulation. But with a pure DataMapper implementation you sacrifice a lot of useful AR features such as inbuilt relationships (being able to use $user->address->city; without going through another DataMapper in the client code or having to have mappers fetch entire object trees, for example).

It depends what you need from the project. From an OO purist and theoretical standpoint, that approach isn’t as flexible as it could be. From a practical perspective, however, the benefits outweigh the drawbacks by quite some margin. My main reason for using that middle-ground hybrid approach is that a pure DataMapper implementation requires far too much configuration if you want to fetch object trees.
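
To make the $user->address->city style of access concrete, here is one way a hybrid base class could resolve relationships lazily. This is only a sketch under stated assumptions, not the implementation from the linked thread: DomainObject, addRelation(), ArrayMapper and the address_id field are all hypothetical names.

```php
<?php
interface DataMapper {
    public function getByPK($id);
}

// Array-backed mapper that counts lookups, so the lazy loading is visible.
class ArrayMapper implements DataMapper {
    public $lookups = 0;
    private $objects;

    public function __construct(array $objects) {
        $this->objects = $objects;
    }

    public function getByPK($id) {
        $this->lookups++;
        return isset($this->objects[$id]) ? $this->objects[$id] : null;
    }
}

// Base class: knows it has related objects, not where they are stored.
abstract class DomainObject {
    private $relations = []; // relation name => [mapper, foreign key property]
    private $loaded = [];    // relation name => object fetched earlier

    public function addRelation($name, DataMapper $mapper, $foreignKey) {
        $this->relations[$name] = [$mapper, $foreignKey];
    }

    // Called by PHP only for properties that are not otherwise accessible.
    public function __get($name) {
        if (!isset($this->relations[$name])) {
            throw new OutOfBoundsException("Unknown property or relation: $name");
        }
        if (!array_key_exists($name, $this->loaded)) {
            list($mapper, $foreignKey) = $this->relations[$name];
            $this->loaded[$name] = $mapper->getByPK($this->$foreignKey);
        }
        return $this->loaded[$name];
    }
}

class Address extends DomainObject {
    public $id;
    public $city;
}

class User extends DomainObject {
    public $id;
    public $address_id;
}

$address = new Address();
$address->id = 5;
$address->city = 'Leeds';
$addressMapper = new ArrayMapper([5 => $address]);

$user = new User();
$user->id = 1;
$user->address_id = 5;
$user->addRelation('address', $addressMapper, 'address_id');

$city = $user->address->city;      // first access asks the mapper
$cityAgain = $user->address->city; // second access reuses the loaded object
```

In a fuller version the mapper that builds User would presumably call addRelation() itself, so client code would only ever see $user->address.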

Lemon_Juice:

Is this only syntactic sugar for being able to do $account->save() or does it have some other implications?

I wouldn’t ever pass two different mappers to a domain object for that purpose. Why do you have a user and an account table?

Jeff_Mott · May 14, 2013, 4:15pm

You may be interested in peeking under the hood of the Doctrine project. Doctrine version 1 is an active record ORM, and Doctrine version 2 is a data mapper ORM (http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/tutorials/getting-started.html#what-is-doctrine). In particular, for fetching related objects, you can see how Doctrine approached that problem with proxy objects (http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/advanced-configuration.html#proxy-objects).
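
The general idea behind a proxy object can be sketched by hand as below. This is not Doctrine’s code (Doctrine generates its proxy classes); UserProxy, the loader callback and the getters are hypothetical and only illustrate the technique.

```php
<?php
class User {
    protected $id;
    protected $name;

    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId() {
        return $this->id;
    }

    public function getName() {
        return $this->name;
    }
}

// Looks like a User to client code, but fetches the real one on first use.
class UserProxy extends User {
    private $loader;
    private $real = null;

    public function __construct($id, callable $loader) {
        $this->id = $id; // the id is known up front from the foreign key
        $this->loader = $loader;
    }

    public function getName() {
        if ($this->real === null) {
            $this->real = call_user_func($this->loader, $this->id);
        }
        return $this->real->getName();
    }
}

// The domain object stays a plain, newable class with no mapper inside.
class Account {
    private $user;

    public function setUser(User $user) {
        $this->user = $user;
    }

    public function getUser() {
        return $this->user;
    }
}

$loads = 0;
$loader = function ($id) use (&$loads) {
    $loads++; // stands in for a database query
    return new User($id, 'Lemon');
};

$account = new Account();
$account->setUser(new UserProxy(12, $loader)); // what a mapper might do internally

$id = $account->getUser()->getId();     // answered from the foreign key alone
$loadsBeforeName = $loads;
$name = $account->getUser()->getName(); // first real access triggers the load
```

The point of the pattern is that the knowledge of how to load the user lives in the proxy, which the persistence layer creates, rather than in Account.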

Lemon_Juice · May 14, 2013, 9:27pm

Yes, this compromise makes sense, I wouldn’t want to lose easy fetching of related data, either. I don’t really care about absolute purity of patterns as long as they do well what they do.

Sorry, my example is not the best. This is a one-to-many relationship: a system where a user can have many accounts (so every account has a user). So the question is which mapper or mappers the domain object should require:

An argument list of mappers corresponding to each foreign key? In our example there is only the UserTable mapper, but what if there were more foreign relationships? More arguments in the constructor?
A generic/root mapper that can handle all mapper operations?
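
To compare the two options, here is a sketch of the second one, with the first shown as a comment. MapperRegistry, ArrayMapper and the Plan relation are hypothetical names invented for this example, not something proposed earlier in the thread.

```php
<?php
class User { public $id; }
class Plan { public $id; } // hypothetical second related object

class ArrayMapper {
    private $objects;

    public function __construct(array $objects) {
        $this->objects = $objects;
    }

    public function getByPK($id) {
        return isset($this->objects[$id]) ? $this->objects[$id] : null;
    }
}

// Option 1 would grow one constructor argument per foreign key:
//   new Account($userTable, $planTable, ...)

// Option 2: a single root object that hands out mappers by name.
class MapperRegistry {
    private $mappers = [];

    public function register($name, $mapper) {
        $this->mappers[$name] = $mapper;
    }

    public function get($name) {
        if (!isset($this->mappers[$name])) {
            throw new InvalidArgumentException("No mapper registered for: $name");
        }
        return $this->mappers[$name];
    }
}

class Account {
    public $user_id;
    public $plan_id;
    private $mappers;

    // One dependency, however many relationships the account has.
    public function __construct(MapperRegistry $mappers) {
        $this->mappers = $mappers;
    }

    public function getUser() {
        return $this->mappers->get('user')->getByPK($this->user_id);
    }

    public function getPlan() {
        return $this->mappers->get('plan')->getByPK($this->plan_id);
    }
}

$user = new User();
$user->id = 1;
$plan = new Plan();
$plan->id = 2;

$registry = new MapperRegistry();
$registry->register('user', new ArrayMapper([1 => $user]));
$registry->register('plan', new ArrayMapper([2 => $plan]));

$account = new Account($registry);
$account->user_id = 1;
$account->plan_id = 2;
$foundUser = $account->getUser();
$foundPlan = $account->getPlan();
```

The constructor stays short, but the registry hides which mappers Account actually uses, so it trades explicit dependencies for convenience in much the same way a DI container inside the class would.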

Jeff_Mott, thanks, proxy objects might be a good idea, but if I can get away without them without problems then I’d prefer that solution (I admit I haven’t yet fully studied their advantages). However, Doctrine’s new domain objects are instantiated in a nice way using new:


$product = new Product();
$product->setName($newProductName);

$entityManager->persist($product);

This is clear and concise and I like it.

Tom, how would the same be accomplished in your mapper? Product would probably require mapper objects in the constructor due to relationships so how would you instantiate the object? Would you use a DI container to create the Product object with its proper dependencies?