What are the options for storing hierarchical data in a relational database?
Good Overviews
Generally speaking you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually you end up with a combination of the options below that best fit your needs. The following provides some in depth reading:
- One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
- Models for hierarchical data: slides with good explanations of tradeoffs and example usage
- Representing hierarchies in MySQL: very good overview of Nested Set in particular
- Hierarchical data in RDBMSs: most comprehensive and well organized set of links I've seen, but not much in the way on explanation
Options
Ones I am aware of and general features:
- Adjacency List:
- Columns: ID, ParentID
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Table combined with level column can solve), path (Lineage Column can solve).
- Use Common Table Expressions in those databases that support them to traverse.
- Nested Set (a.k.a Modified Preorder Tree Traversal)
- Popularized by Joe Celko in numerous articles and his book Trees and Hierarchies in SQL for Smarties
- Columns: Left, Right
- Cheap level, ancestry, descendants
- Volatile encoding - moves, inserts, deletes more expensive.
- Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
- Nested Intervals
- Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
- Have to deal with real/float/decimal representation issues
- A more complex matrix encoding variant adds the benefit of ancestor encoding, like materialized path for "free"
- Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach)
- Columns: ancestor, descendant
- Stands apart from table it describes.
- Can include some nodes in more than one hierarchy.
- Cheap ancestry and descendants (albeit not in what order)
- For complete knowledge of a hierarchy needs to be combined with another option.
- Flat Table
- A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Expensive move and delete
- Cheap ancestry and descendants
- Good Use: threaded discussion - forums / blog comments
- Lineage Column (a.k.a. Materialized Path, Path Enumeration)
- Column: lineage (e.g. /parent/child/grandchild/etc...)
- Limit to how deep the hierarchy can be.
- Descendants cheap (e.g.
LEFT(lineage, #) = '/enumerated/path'
) - Ancestry tricky (database specific queries)
- Multiple lineage columns
- Columns: one for each lineage level, refers to all the parents up to the root, levels down from the items level are set to NULL
- Limit to how deep the hierarchy can be
- Cheap ancestors, descendants, level
- Cheap insert, delete, move of the leaves
- Expensive insert, delete, move of the internal nodes
Database Specific Notes
MySQL
Oracle
- Use CONNECT BY to traverse Adjacency Lists
PostgreSQL
- ltree datatype for Materialized Path
SQL Server
- General summary
- 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.
Answer :
This is kind of a question that is still interesting even after all big three vendors implemented a recursive
WITH
clause. I'd suggest that different readers would be pleased with different answers.- Comprehensive list of references by Troels Arvin.
- For the lack of competition, introductory textbook by Joe Celko "Trees and Hierarchies in SQL for Smarties" can indeed be considered a classic.
- Review of various tree encodings with emphasis to nested intervals.
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database