原文鏈接:www.pilishen.com/posts/an-in…
此文檔是 nestedset-無限分類正確姿勢的擴展閱讀
本文翻譯自維基百科Nested set model
nested set model(嵌套集合模型)是一種在關系型數據庫中表示nested sets(嵌套集合)?的特殊技術。[nested sets]通常指的是關系樹或者層級關系。這個術語是由?Joe Celko清晰的提出來的,還有人使用不同的術語來描述這一技術。
誘因
該技術的出現解決了標準關系代數和關系演算以及基于它們的SQL操作不能直接在層次結構上表示所有期望操作的問題。層級可以用parent-child relation (父子關系)術語來表示 - Celko稱之為?[adjacency list model],但是如果可以有任意的深度,這種模型不能用來展示類似的操作比如比較兩個元素的層級或者確定一個元素是否位于另一個元素的子層級,當一個層級結構是固定的或者有固定的深度,這種操作必須通過每一層的?relational join?(關系連接)來實現。但是這將很低效。這通常被稱為物料清單問題。
通過切換到圖形數據庫,可以很容易地表達層次結構。另外在一些關系型數據庫系統中存在并提供了這種關系模型的解決方案:
- 支持專門的層級結構數據類型,比如SQL的hierarchical query?facility(層級查詢工具)。
- 使用層級操作擴展關系型語言,比如?nested relational algebra。
- 使用transitive closure擴展關系型語言,比如SQL的CONNECT語句;這可以在parent-child relation 使用但是執行起來比較低效。
- 層級結構查詢可以在支持循環且包裹關系的操作的語言中實現。比如?PL/SQL,?T-SQL?or a?general-purpose programming language
當這些解決方案沒被提供或不容易實現,就必須使用另一種方法
技術
嵌套集模型是根據樹遍歷來對節點進行編號,遍歷會訪問每個節點兩次,按訪問順序分配數字,并在兩次訪問中都分配。這將為每個節點留下兩個數字,它們作為節點兩個屬性存儲。這使得查詢變得高效:通過比較這些數字來獲得層級結構關系。但是更新數據將需要給節點重新分配數字,因此變得低效。盡管很復雜但是可以通過不使用整數而是用有理數來改進更新速度。
例子
在衣服庫存目錄中,衣服可能會更加層級機構來分類:
處于層級結構頂端的Clothing分類包含所有的子類,因此它的左值和右值分別賦值為1和22,后面的值即這里的22是展現的所有節點總數的兩倍。下一層級包含Men's和Women's兩子類,各自包含必須被計算在內的層級。每一層的節點都根據它們包含的子層級來給左值和右值賦值。如上表所示。
表現
使用nested sets 將比使用一個遍歷adjacency list的儲存過程更快,對于天生缺乏遞歸的查詢結構也是更快的選擇。比如MySQL.但是遞歸SQL查詢語句也能提供類似“迅速查詢后代”的語句并且在其他深度搜索查詢是更快,所以也是對于提供這一功能的數據庫的更快選擇。例如?PostgreSQL,[5] ?Oracle,[6] ?and?Microsoft SQL Server.[7]
缺點
The use case for a dynamic endless database tree hierarchy is rare. The Nested Set model is appropriate where the tree element and one or two attributes are the only data, but is a poor choice when more complex relational data exists for the elements in the tree. Given an arbitrary starting depth for a category of 'Vehicles' and a child of 'Cars' with a child of 'Mercedes', a foreign key table relationship must be established unless the tree table is naively non-normalized. Attributes of a newly created tree item may not share all attributes with a parent, child or even a sibling. If a foreign key table is established for a table of 'Plants' attributes, no integrity is given to the child attribute data of 'Trees' and its child 'Oak'. Therefore, in each case of an item inserted into the tree, a foreign key table of the item's attributes must be created for all but the most trivial of use cases. If the tree isn't expected to change often, a properly normalized hierarchy of attribute tables can be created in the initial design of a system, leading to simpler, more portable SQL statements; specifically ones that don't require an arbitrary number of runtime, programmatically created or deleted tables for changes to the tree. For more complex systems, hierarchy can be developed through relational models rather than an implicit numeric tree structure. Depth of an item is simply another attribute rather than the basis for an entire DB architecture. As stated in?SQL Antipatterns:[8]
Nested Sets is a clever solution – maybe too clever. It also fails to support referential integrity. It’s best used when you need to query a tree more frequently than you need to modify the tree.[9]
The model doesn't allow for multiple parent categories. For example, an 'Oak' could be a child of 'Tree-Type', but also 'Wood-Type'. An additional tagging or taxonomy has to be established to accommodate this, again leading to a design more complex than a straightforward fixed model. Nested sets are very slow for inserts because it requires updating left and right domain values for all records in the table after the insert. This can cause a lot of database stress as many rows are rewritten and indexes rebuilt. However, if it is possible to store a forest of small trees in table instead of a single big tree, the overhead may be significantly reduced, since only one small tree must be updated. The?nested interval model?does not suffer from this problem, but is more complex to implement, and is not as well known. It still suffers from the relational foreign-key table problem. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d).?[1](//www.sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf)
變體
使用上面描述的nested set modal 在一些特定的樹遍歷操作上有性能限制。比如根據父節點查找直接子節點需要刪選子樹到一個指定的層級如下所示:
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Parent, Tree as Child
WHEREChild.Left BETWEEN Parent.Left AND Parent.RightAND NOT EXISTS ( -- No Middle NodeSELECT *FROM Tree as MidWHERE Mid.Left BETWEEN Parent.Left AND Parent.RightAND Child.Left BETWEEN Mid.Left AND Mid.RightAND Mid.Node NOT IN (Parent.Node AND Child.Node))AND Parent.Left = 1 -- Given Parent Node Left Index
復制代碼
或者:
SELECT DISTINCT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
WHERE Parent.Left < Child.Left AND Parent.Right > Child.Right -- associate Child Nodes with ancestors
GROUP BY Child.Node, Child.Left, Child.Right
HAVING max(Parent.Left) = 1 -- Subset for those with the given Parent Node as the nearest ancestor復制代碼
當查詢不止一層深度的子節點的時候,查詢將更加的復雜,為了突破限制和簡化遍歷樹,在模型上增加一個額外的字段來維護樹內節點的深度:
在這個模型中,找到指定父節點的緊跟直接子節點可以使用下面的SQL語句實現:
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
WHEREChild.Depth = Parent.Depth + 1AND Child.Left > Parent.LeftAND Child.Right < Parent.RightAND Parent.Left = 1 -- Given Parent Node Left Index
復制代碼