Loading collections of objects from another collection of objects

Well I am trying to use PDO to load a collection of comments/replies from a collection of posts, which becomes somewhat tricky. The data hierarchy may very well look like this: Post: 1(Comment: 1, 2, 3), Post 2(Comment 4, 5), Post 3(Comment 6, 7, 8, 9), in this case comment with ID 1, 2 and 3 belong to post ID 1.

I know this can very well lead to N+1 problem using ORM, so I am trying to figure out an alternative. Does it make sense to use SQL’s IN clause? Lets say if I load all the posts first, and then all the comments related to these posts. I can use concatenation to easily compile a string of IDs and write an SQL statement like this:

$postIDs = "(1, 2, 3)"; //the values are concatenated by looping from post IDs in a collection.
$sql = "SELECT * FROM comments WHERE post IN {$postIDs} ORDER BY dateposted desc";
$stmt = $pdo->prepare($sql);
$stmt->execute();
$comments = $stmt->fetchAll(PDO::FETCH_CLASS, "Comment");

So the question is, do you think this is a good way to load collections of objects with related collections of object? It does solve the N+1 problem, but how is the performance of IN clause as compared to a simple select by ID statement?

Out of curiosity…

What is the purpose of getting comments for multiple posts?

Scott

The purpose is to load comments for all the posts made by the user. On the user profile page, there is a tab called Posts & notes in which you can see the users posts. Other users can make comments to the users posts, therefore you have multiple comments for each post, and multiple posts for each user.

Ahh ok. A user profile posting system. Couldn’t the posts and comments just all be posts, like making the comments children posts of the actual posts? In other words, all posts and comments (child posts) are within the same profile_post table?

Scott

So… thoughts.

1: Dont select *.
2: Why are we selecting bulk in order to create a hierarchy of unlike things?
3: $comments = fetchall doesnt give you a hierarchy, it gives you a bunch of comments in their own thing. You’ve still got to process the $comments array to shard them out into the objects, which reintroduces your N loop. Unsurprisingly, in order to get an unassociated group of things attached to the things that they attach to is always an N loop. Because thats how it works. If you wanted a one-and-done, you’d need a non-SQL database.

In theory it’s possible, and in this way I will be able to just load all the posts from the user and attach child posts as a collection to each parent post. Or maybe, I will just not load any comments for the posts by default, and use AJAX to selectively load some comments when the user clicks on ‘Show Comment’ button under a post(since the user may not need to see all comments, only comments he/she is interested in a post). I will think about it, it may be a better idea in this case.

Well select * is just an illustration, since I am too lazy to write all the column names in the example code(it’s unnecessary for anyone to know the database structure of my site anyway). It’s not production code, there’s no need to worry.

I understand what you are saying, I do have to manually attach each comment to their containing post, which is not easy. But in this case, how do you write a single SQL statement that loads all comments for all posts for the given user? I know a JOIN statement may work, but I only know how to join data with one to one relationship(like to join User and UserProfile tables for the given user ID, in which a user has only one user profile). I have no idea how to join when the relationship becomes one to many.

You dont.

Here’s a thought though… if you’ve selected posts which the user has authored, why are you loading ALL comments from that post for display on the USER’s profile page?

If I dont, how do I avoid the N+1 problem?

Well this… Isnt it obvious, the other users browsing this user’s profile may want to see the comments below each post he/she makes. Have you used facebook? If yes, you should understand why. The user makes a post to his/her own profile page(actually other users can also post there, but it’s of little difference), and other people may like it, comment it or share it. Below each post made by the user, you see its comments and replies. Here is an example:

This screams to me of premature optimisation. MySQL is very fast at handling ID based lookups, I’m not sure N+1 is as big a problem as you’re making it out to be an as StarLion pointed out, even if you reduce the number of queries you still need the nested loop.

Okay so you’re not making a Profile, you’re making a Wall.

If you were making a USER profile… you’d only show things the USER made. Cause… yaknow, thats what the USER has done. Try looking at your own profile on this site.

How do you solve the N+1 problem? Theoretically, if Posts and Comments have the same fields, you could UNION the results together, but since the second query’s selects would depend on the first query, you’d have to subquery on the second query. Also sorting things would be impossible, so you’d still be walking through the results in a multiple way to construct the result… so… You dont solve the problem. It’s inherent to what you want to do.

And that’s not necessarily a bad thing - it’s simply what has to be done.

Well I am not quite sure. I do believe a running PHP loops are much faster than running MySQL Queries. I can limit the number of posts to display on user profile to be 15 with pagination, and restrict the number of comments to display for each post to be 5 initially(until user clicks on a button to trigger an AJAX request, which loads more comments for a given post this button binds to), then I will have a total of 90 records(15 posts and 75 comments). In this case, I will need a total of 15+1 = 16 trips to database if I do not optimize and solve the N+1 problem. I think, a for loop on 90 records is faster than 16 database trips, or am I wrong?

Well yeah it surely looks like a wall. Another example will be a group system, in which users who have joined the group can make posts to the group’s index page, and subsequently they can make multiple comments in the same post. I am sorry for the confusion about user profile though, didnt realize it makes things harder for you to understand.

My current idea to solve the N+1 problem is to load all posts first, and then comments for the posts using IN operator(and of course, limit the number of records to retrieve so it doesnt get out of hand). I will use PDO’s FETCH_GROUP mode to group comments by posts, therefore it will be easier to loop through the comment collection and add them one by one to post collection. Of course, at this point I have no idea how much time this will cost, but I find it safe to assume that running such a loop in PHP is still faster than going to database N+1 times.

I think this solution is perfectly fine, I’ve used it a few times and works well for limiting the number of queries. IN() is like a series of OR’s and MySQL will use the index so the performance will be very good. I know that some old MySQL versions were not optimized for OR but it was long time ago.

PHP alone will of course be faster, but with prepared statements the difference is going to be less, keep in mind that a query with joins will take longer to execute than a simple primary key based lookup, and that a resultset from a join will have more data (if you’ve selected columns from both tables, the data from the first table is repeated for every row of the joined table) so you get higher memory usage and transfer time between the database and PHP script.

Personally I’d benchmark it and see how it affects your particular usage.

How will you get the IDs that you’ll need to supply to IN without a trip to the database?

Which is doing the N loop…surprise! it still exists. You can hide it inside other functions, but it’s still being done.

I see, thats definitely a good point, I forgot the fact that each database trip also takes different time to execute, depending on the complexity of the query. I think its worth benchmarking, and maybe even to post an article about it at some point.

Well of course I will need a trip to the database, this cannot be avoided. But instead of going N+1 trips, I need only 2 trips. First trip to retrieve posts for a user, and second trip to retrieve comments for all posts with an IN operator.

Also, is your hierarchy enforced to a single-tier? Cause Facebook’s system isnt (You can even see that inside your example picture.)

To be more clear; from your picture:

SportsCenter makes a Post (ID = 1)
Harry Avila makes a Comment on PostID 1 (CommentId = 1)
SportsCenter makes a Comment in response (CommentId = 2).

What is the relation between CommentId 2 and CommentId 1?

Well it will be two-tier, kinda like the forum’s topic and post system(in my case, profile post = forum topic, and profile comment = forum post).

I’d bet money, Facebook’s comments come from a single comment table.

Scott

Maybe I’m not understanding the problem correctly, but is there a reason you can’t just do a join?

SELECT * FROM posts
LEFT JOIN comments on comments.post_id = posts.id

In terms of performance, this is about as fast as you’re going to get, if only for the reason that it gets everything in just one query, and trips to the database are expensive. But it will make more work for you in the PHP layer, because the post rows will appear duplicated, once for every associated comment.

Well I do think LEFT JOIN is a good idea, but the duplicate content from post rows concerns me. I was wondering though, if I just select post IDs rather than other post columns(aka title, author, content, dateposted, etc) from post table, and retaining all columns from comment table, is it gonna be a better solution?

//SQL to fetch posts
SELECT * FROM posts WHERE userid = :userid

//SQL to fetch comments
SELECT post.postid, comments.commentid, comments.userid, comments.message, comments.dateposted  FROM comments
LEFT JOIN comments on comments.postid = posts.postid 
WHERE post.userid = :userid 

In this case, I make a trip to the database to acquire all the posts first, and instantiate these post objects, assigning them to an identity map/collection indexed by post ID. Then I make a second trip to the database with your SQL statement, with only postID from post table and all columns from comment table. My Repository class will then use the identity map to relate each comment to its containing post with such lines subsequently when the comment object is being created:

//postid is the foreign key table in comment table, referencing the containing post for each comment record.
$post = $postCollection[$commentData['postid']];
$comment->setPost($post);
$post->addComment($comment);

Because the identity map stores post by ID, the time complexity for this look up is O(1), and the overall time complexity is O(n) since I have N comments to load still. The downside is that I have to go to database 2 times(still way better than N+1), and may need to manually construct the collection or identity map for post. But I feel its a small price to pay, as I expect there will be a lot more comments than posts. If I load duplicate post records, I will have to create each Post object for Comment object anyway, it may take more time than to fetch existing post object with given ID from a collection I load earlier. What do you think?