Hierarchical data

markbrown4 · April 23, 2010, 2:51am

Am I right in thinking that this is a really bad idea for a tree with say 100,000 records because you need to update them all if you remove/add a item in the middle?

I’m thinking about the best way to store a Position hierarchy for a large organisation.

Cheers,

DaveMaxwell · April 23, 2010, 1:28pm

No, it wouldn’t be a bad idea. You only need to update the node in the middle

So if for example, you have the following structure:

[LIST]
[*]Company (1)

Division1 (2)
[LIST]
Sub Division 1 (3)
[LIST]
Employee 1 (4)
Employee 2 (5)
Employee 3 (6)
[/LIST]
Sub Division 2 (7)
[LIST]
Employee 4 (8)
Employee 5 (9)
Employee 6 (10)
[/LIST]
Sub Division 3 (11)
[LIST]
Employee 7 (12)
Employee 8 (13)
Employee 9 (14)
[/LIST]
[/LIST]
Division2 (15)
[LIST]
Sub Division 4 (16)
[LIST]
Employee 10 (17)
Employee 11 (18)
Employee 12 (19)
[/LIST]
Sub Division 5 (20)
[LIST]
Employee 13 (21)
Employee 14 (22)
[/LIST]
[/LIST]
Division3 (23)
[LIST]
Sub Division 6 (24)
[LIST]
Employee 15 (25)
Employee 16 (26)
[/LIST]
Sub Division 7 (27)
[LIST]
Employee 17 (28)
Employee 18 (29)
[/LIST]
Sub Division 8 (30)
[LIST]
Employee 19 (31)
Employee 20 (32)
[/LIST]
Sub Division 9 (33)
[LIST]
Employee 21 (34)
Employee 22 (35)
[/LIST]
[/LIST]

[/LIST]And your table structure is something like this: Node (NodeID, NodeName, ParentNodeID)

So your data would be essentially (showing down through sub division2)
1, Company, 0
2, Division1, 1
3, Sub Division 1, 2
4, Employee 1, 3
5, Employee 2, 3
6, Employee 3, 3
7, Sub Division 2, 3
8, Employee 4, 7
9, Employee 5, 7
10, Employee 6, 7

Now, say they decide to re-organize and move sub-division 6 to Division 2, and the employees of Sub Division 9 are moving to Sub Division 8 and Sub Division 9 is being eliminated. You could do all this in three sql statements:


-- move the sub-division
UPDATE Node SET ParentNodeID = 15 WHERE NodeID = 24
 
-- move the employees
UPDATE Node SET ParentNodeID = 30 WHERE ParentNodeID = 33
 
-- delete the sub-division
DELETE FROM Node WHERE NodeID = 33

To me, that’s efficient. Is there a more efficient structure you’re thinking of? How would you structure it?

r937 · April 23, 2010, 4:48pm

dave, you’ve described how to maintain the adjacency list model (recognizable by the use of a “parentid” column) whereas the difficulties/inefficiencies associated with updating are usually mentioned for the nested set model (recognizable by the use of “left” and “right” columns)

the article linked to seems to dismiss the adjacency model in favour of the nested set model

i personally have never experienced the need to move away from the adjacency model

DaveMaxwell · April 23, 2010, 6:51pm

Ugh! I guess I should have read the whole article. It was from 2003, so I didn’t read it all that closely. Just skimmed the beginning of it…

OK, I’ve used both. The nested set model is one which in theory is very good and is theoretically more efficient (though I’d argue that theory as well based on experience), but in practical application is a royal pain to maintain for exactly the reason you listed. If you remove a branch or even move it, it becomes a nightmare. PLUS it’s harder for the human brain to wrap it’s mind around when it’s trying to ensure it’s working correctly. You see this used in taxonomies a lot, but other than that, I haven’t seen it used heavily.

Like Rudy said, I’ve not found a case where the adjacent list model isn’t easier to understand, implement and just slightly less efficient in terms of performance - though again, I’ve seen the adjacent model blow the nested set out of the water in testing.

markbrown4 · April 23, 2010, 9:49pm

Perfect, sorry Dave I should have clarified which model I was referring to.

The nested set model seems more and more impractical the bigger the table gets.

Thanks gents,