Storing Hierarchical Data in a Database

It’s a tricky thing, jerry - I’m actually using the tree as a table in my DB, then referencing other tables to contain the data for each tree row. Somehow, I’m thinking this’d be easier in XML. But hey! This way’s SO much more fun!

Sorry, I wasn’t very clear on that last comment - there are two separate tree structures in my DB, and some of the rows in each tree are tangentially connected to others. It’s a pain.

I am asking because [RDLDB [url=“http://rdldb.remiya.com/index.php?id=0_3”]doc] is hierarchy-related. And I have never felt the need of having right or left and still have never been wondering which child comes first. The first in is the first out (The first child is first, the second is second, the third is third, etc).
That is why I was wondering why would anyone use right and left. And why would anyone need so many computations.

In the quick view I took of RDLDB, you probably wouldn’t need right and left values. There are places where it’s appropriate, this may or may not be one.
The point of this whole exercise seems to have been a hypothetical romp down alternative lane. Rather than recursing your way down a tree structure to access subnodes, you can retrieve all subnodes at a single call via this modified tree recursion trick. That’s all it is - a trick.
There are calculations, and a number of hoops to be jumped through - the relative value is yet do be determined, by my research. I do see that there is an advantage to this - if nothing else, it changed the way I view tree structures. If you don’t need the ability to move nodes, don’t care about time spent in recursion calling a DB yet again for a node’s children, then this is totally irrelevant.
Just my 0.02

The adjacency list has some problems when data is duplicated when using names in place of id values. For example, the items:

fruit-green-apple-green

and

fruit-ripe-apple-green

may not produce expected results. This problem does not exist if row id’s are used instead of names.

Here is a comparison between retrievals on the two methods:

retrieve descendents:
left-right: SELECT title FROM tree WHERE lft > {left value} AND rgt < {right value}
adjacency list: SELECT title FROM tree WHERE ancestorId > {current id} AND ancestorId < ???

retrieve ancestors:
left-right: SELECT title FROM tree WHERE lft < {left value} AND rgt > {right value}
adjacency list: SELECT title FROM tree WHERE ancestorId < {current id} AND ancestorId > ???

The value in the left-right list is that there is not only a starting point - as in ascendency lists - but there is also a stopping point.

I don’t see how - in a simple search without joins - an adjacency list can find any relatives (outside of one level up or down) without multiple searches. It doesn’t matter if there is no use for the left-right structure beyond speeding up the search, since that is the whole goal. To use a hierarchical adjacency list may be better for insertions and human data reading, but it is not better for the most common website function - searching for data.

There is a disadvantage if there are more than a few updates. I do not think this could be used, for example, for storing chat-room messages.

Even still, the left-right can be improved by using the idea of pointers instead of actual numbers. That way, there would be no need for actual numbers to be updated. I am not quite sure if there is anything like this in MySQL.

Or else another table column could be used during insertions. The table would store all insertions along with their insertion row until a certain time, and then do all insertions at once. Still not dynamic, though.

Btw, one could also have a look at pear’s nested set class…

Your diagram with the arrows is much more understandable than JC’s “worm and tree” metaphor!

This is what i looking for thanks.

I’ve used for storing trees the adjacency list method, but after your article I will use the tree traversal method. I really like it, thanks!

Without reading any tutorials I came up with the adjancey method of storing trees. After reading this I’m going to have to change my method. Excellent tutorial even though it’s made my head hurt!

Will have to play with it a bit!

Cheers

Great article! Also, for those who want it, I needed to store the tree and keep everything alphabetized (I was storing category names in a hierarchy). To do so, I used the following (assume a POSTed form with a ‘Parent’ and ‘New Category’ field):

        //get the id of the parent to which we're going to add a child
        $parent = $_POST['parent'];
        $new = $_POST['new'];
        
        //this'll work on a where by ID eventually...
        $sql = "SELECT id, lft FROM categories WHERE name = '$parent'";
        echo "SQL: " . $sql . "&lt;br&gt;";
        $result = mysql_query($sql) or die(mysql_error());
        $row = mysql_fetch_assoc($result) or die(mysql_error());
        
        //get the left value of the parent to which we need to add a child
        $oldleft = $row['lft'];
        //and the id of the parent node
        $parentid = $row['id'];
        
        //before any updates occur, get the left value we're going to
        //use for the new node.
        $sql = " SELECT MIN(lft) as newleft FROM categories "
              ." WHERE (lft&gt;$oldleft AND name &gt; '$new' AND parent = $parentid)"
              ." OR (lft&gt;$oldleft AND parent != $parentid)";
        $result = mysql_query($sql) or die(mysql_error());
        $row = mysql_fetch_assoc($result) or die(mysql_error());
        
        //our new node's left value
        $newleft = $row['newleft'];
        
        //move all the existing nodes (right and left) that fall
        //after this guy up by 2. The comparisons for name and parentid
        //are needed to alphabetize everything, otherwise we could
        //just do the update where rgt&gt;$oldleft and be done.
        $sql = " UPDATE categories SET rgt=rgt+2 "
              ." WHERE "
              ." (rgt&gt;$oldleft AND name &gt; '$new' AND parent = $parentid) "
              ." OR "
              ." (rgt&gt;$oldleft AND parent != $parentid)";
        echo "SQL: " . $sql . "&lt;br&gt;";
        $result = mysql_query($sql) or die(mysql_error());
                                                
        $sql = " UPDATE categories SET lft=lft+2 "
              ." WHERE "
              ." (lft&gt;$oldleft AND name &gt; '$new' AND parent = $parentid) "
              ." OR "
              ." (lft&gt;$oldleft AND parent != $parentid)";
        echo "SQL: " . $sql . "&lt;br&gt;";
        $result = mysql_query($sql) or die(mysql_error());

        //go ahead and put the new node in place
        $sql = " INSERT categories(name,lft,rgt,parent) "
              ." VALUES ('" . $new . "'," . ($newleft) . "," . ($newleft+1) . ",$parentid)";
        echo "SQL: " . $sql . "&lt;br&gt;";
        $result = mysql_query($sql) or die(mysql_error());  

You’ll notice the updates changed slightly, and we had to find for ourselves the left and right values of the new child node. Please note, I have done about 5 minutes of testing – if there are any bugs, feel free to post back…

simple question really…
How do i query only fruit and meat.

I just need to get the child of a certain tree. not the entire tree.

The only way i see was to include the parentID on the columns.
then again, i just want to query n’th decendants.

like from the picture, i want to query up to 2nd generation of food, which should not include cherry and banana.

any help is appriciated.

*im back again. stumping…

This article explains (in MySql, but i’m sure the logic for other dbs would be similar) how to do a number of things with the lft rgt system: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

You may get only every second level by requiring lft to be even or odd, but that’s it.

To get children only or do other depth-related queries you’ll need to maintain another column with depth.

What I did is implement the adjacency list method as a DB function. This way you still have its advantages (elegant design and code, only one update/insert) without the performance penalty: You need only one DB-query for any (sub-)tree - and it is the query overhead that sucks CPU cycles, not the function.
PostgreSQL was capable of recursive functions, not MySQL. This may have changed with the latest 5.x version, but I have not (yet) seen any examples online.
However, I don’t see why anyone should stay away from Postgres once he has digged into DBs that far.

My apologies if this has already been suggested earlier, but this thread is too long for me to go back looking through all the posts.

To store hierarchial data in a database I would recommend using The Nested Set Model. This link also has some very useful sql scripts on extracting hierarchy information from the database.

I have posted some sql scripts on how to move hierarchial nodes.

Very nice article. It was a good read. I found this PHP tree class that has add and delete features and it is pretty interesting. There is no documentation so you might have to dig into the code to find out how to use it.

http://www.husainshabbir.com/2005/11/10/modified-preorder-tree-traversal-php-code/

another possible schema for tree database is like this

tree_table(id, parent, leaf, depth)

The depth attribute is very handy for getting data.

We have developed a method for handling unlimited hierarchies without the use of primary keys but using the RDBMS backend. The response time is faster than any conventional datbase architecture would involve

Great article helps me a lot :slight_smile: