In computer science, join-based tree algorithms provide a unified framework for designing parallel self-balancing binary search trees using a single join operation. This operation combines two balanced trees and a key into a new balanced binary tree whose in-order traversal concatenates the left tree, key, and right tree. When applied to search trees that maintain a total ordering on keys, it requires all keys in the left tree to be smaller than the key, and all keys in the right tree greater. Join-based algorithms are applicable across multiple balancing schemes, including AVL trees, red–black trees, weight-balanced trees, and treaps.
History
The join operation was first defined by Tarjan2 on red–black trees, which runs in worst-case logarithmic time. Later Sleator and Tarjan 3 described a join algorithm for splay trees which runs in amortized logarithmic time. Later Adams 4 extended join to weight-balanced trees and used it for fast set–set functions including union, intersection and set difference. In 1998, Blelloch and Reid-Miller extended join on treaps, and proved the bound of the set functions to be O ( m log ( 1 + n m ) ) {\displaystyle O(m\log(1+{\tfrac {n}{m}}))} for two trees of size m {\displaystyle m} and n ( ≥ m ) {\displaystyle n(\geq m)} , which is optimal in the comparison model. They also brought up parallelism in Adams' algorithm by using a divide-and-conquer scheme. In 2016, Blelloch et al. formally proposed the join-based algorithms, and formalized the join algorithm for four different balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps. In the same work they proved that Adams' algorithms on union, intersection and difference are work-optimal on all the four balancing schemes.
Join algorithms
The function join ( t 1 , k , t 2 ) {\displaystyle (t_{1},k,t_{2})} considers rebalancing the tree, and thus depends on the input balancing scheme. If the two trees are balanced, join simply creates a new node with left subtree t1, root k and right subtree t2. Suppose that t1 is heavier (this "heavier" depends on the balancing scheme) than t2 (the other case is symmetric). Join follows the right spine of t1 until a node c which is balanced with t2. At this point a new node with left child c, root k and right child t2 is created to replace c. The new node may invalidate the balancing invariant. This can be fixed with rotations.
The following is the join algorithms on different balancing schemes.
The join algorithm for AVL trees:
function joinRightAVL(TL, k, TR) (l, k', c) := expose(TL) if h(c) ≤ h(TR) + 1 T' := Node(c, k, TR) if h(T') ≤ h(l) + 1 return Node(l, k', T') else return rotateLeft(Node(l, k', rotateRight(T'))) else T' := joinRightAVL(c, k, TR) T := Node(l, k', T') if h(T') ≤ h(l) + 1 return T else return rotateLeft(T) function joinLeftAVL(TL, k, TR) /* symmetric to joinRightAVL */ function join(TL, k, TR) if h(TL) > h(TR) + 1 return joinRightAVL(TL, k, TR) else if h(TR) > h(TL) + 1 return joinLeftAVL(TL, k, TR) else return Node(TL, k, TR)Where:
- h ( v ) {\displaystyle h(v)} is the height of node v {\displaystyle v} .
- expose ( v ) {\displaystyle {\text{expose}}(v)} extracts the left child l {\displaystyle l} , key k {\displaystyle k} , and right child r {\displaystyle r} of node v {\displaystyle v} into a tuple ( l , k , r ) {\displaystyle (l,k,r)} .
- Node ( l , k , r ) {\displaystyle {\text{Node}}(l,k,r)} creates a node with left child l {\displaystyle l} , key k {\displaystyle k} , and right child r {\displaystyle r} .
The join algorithm for red–black trees:
function joinRightRB(TL, k, TR) if TL.color = black and ĥ(TL) = ĥ(TR) return Node(TL, ⟨k, red⟩, TR) else (L', ⟨k', c'⟩, R') := expose(TL) T' := Node(L', ⟨k', c'⟩, joinRightRB(R', k, TR)) if c' = black and T'.right.color = T'.right.right.color = red T'.right.right.color := black return rotateLeft(T') else return T' function joinLeftRB(TL, k, TR) /* symmetric to joinRightRB */ function join(TL, k, TR) if ĥ(TL) > ĥ(TR) T' := joinRightRB(TL, k, TR) if (T'.color = red) and (T'.right.color = red) T'.color := black return T' else if ĥ(TR) > ĥ(TL) /* symmetric */ else if TL.color = black and TR = black return Node(TL, ⟨k, red⟩, TR) else return Node(TL, ⟨k, black⟩, TR)Where:
- h ^ ( v ) {\displaystyle {\hat {h}}(v)} is the black height of node v {\displaystyle v} .
- expose ( v ) {\displaystyle {\text{expose}}(v)} extracts the left child l {\displaystyle l} , key k {\displaystyle k} , color c {\displaystyle c} , and right child r {\displaystyle r} of node v {\displaystyle v} into a tuple ( l , ⟨ k , c ⟩ , r ) {\displaystyle (l,\langle k,c\rangle ,r)} .
- Node ( l , ⟨ k , c ⟩ , r ) {\displaystyle {\text{Node}}(l,\langle k,c\rangle ,r)} creates a node with left child l {\displaystyle l} , key k {\displaystyle k} , color c {\displaystyle c} , and right child r {\displaystyle r} .
The join algorithm for weight-balanced trees:
function joinRightWB(TL, k, TR) (l, k', c) := expose(TL) if w(TL) =α w(TR) return Node(TL, k, TR) else T' := joinRightWB(c, k, TR) (l1, k1, r1) := expose(T') if w(l) =α w(T') return Node(l, k', T') else if w(l) =α w(l1) and w(l)+w(l1) =α w(r1) return rotateLeft(Node(l, k', T')) else return rotateLeft(Node(l, k', rotateRight(T')) function joinLeftWB(TL, k, TR) /* symmetric to joinRightWB */ function join(TL, k, TR) if w(TL) >α w(TR) return joinRightWB(TL, k, TR) else if w(TR) >α w(TL) return joinLeftWB(TL, k, TR) else return Node(TL, k, TR)Where:
- w ( v ) {\displaystyle w(v)} is the weight of node v {\displaystyle v} .
- w 1 = α w 2 {\displaystyle w_{1}=_{\alpha }w_{2}} means weights w 1 {\displaystyle w_{1}} and w 2 {\displaystyle w_{2}} are α-weight-balanced.
- w 1 > α w 2 {\displaystyle w_{1}>_{\alpha }w_{2}} means weight w 1 {\displaystyle w_{1}} is heavier than weight w 2 {\displaystyle w_{2}} with respect to the α-weight-balance.
- expose ( v ) {\displaystyle {\text{expose}}(v)} extracts the left child l {\displaystyle l} , key k {\displaystyle k} , and right child r {\displaystyle r} of node v {\displaystyle v} into a tuple ( l , k , r ) {\displaystyle (l,k,r)} .
- Node ( l , k , r ) {\displaystyle {\text{Node}}(l,k,r)} creates a node with left child l {\displaystyle l} , key k {\displaystyle k} and right child r {\displaystyle r} .
Join-based algorithms
In the following, expose ( v ) {\displaystyle {\text{expose}}(v)} extracts the left child l {\displaystyle l} , key k {\displaystyle k} , and right child r {\displaystyle r} of node v {\displaystyle v} into a tuple ( l , k , r ) {\displaystyle (l,k,r)} . Node ( l , k , r ) {\displaystyle {\text{Node}}(l,k,r)} creates a node with left child l {\displaystyle l} , key k {\displaystyle k} and right child r {\displaystyle r} . " s 1 | | s 2 {\displaystyle s_{1}||s_{2}} " means that two statements s 1 {\displaystyle s_{1}} and s 2 {\displaystyle s_{2}} can run in parallel.
Split
To split a tree into two trees, those smaller than key x, and those larger than key x, we first draw a path from the root by inserting x into the tree. After this insertion, all values less than x will be found on the left of the path, and all values greater than x will be found on the right. By applying Join, all the subtrees on the left side are merged bottom-up using keys on the path as intermediate nodes from bottom to top to form the left tree, and the right part is asymmetric. For some applications, Split also returns a boolean value denoting if x appears in the tree. The cost of Split is O ( log n ) {\displaystyle O(\log n)} , order of the height of the tree.
The split algorithm is as follows:
function split(T, k) if (T = nil) return (nil, false, nil) else (L, m, R) := expose(T) if k < m (L', b, R') := split(L, k) return (L', b, join(R', m, R)) else if k > m (L', b, R') := split(R, k) return (join(L, m, L'), b, R')) else return (L, true, R)Join2
This function is defined similarly as join but without the middle key. It first splits out the last key k {\displaystyle k} of the left tree, and then join the rest part of the left tree with the right tree with k {\displaystyle k} . The algorithm is as follows:
function splitLast(T) (L, k, R) := expose(T) if R = nil return (L, k) else (T', k') := splitLast(R) return (join(L, k, T'), k') function join2(L, R) if L = nil return R else (L', k) := splitLast(L) return join(L', k, R)The cost is O ( log n ) {\displaystyle O(\log n)} for a tree of size n {\displaystyle n} .
Insert and delete
The insertion and deletion algorithms, when making use of join can be independent of balancing schemes. For an insertion, the algorithm compares the key to be inserted with the key in the root, inserts it to the left/right subtree if the key is smaller/greater than the key in the root, and joins the two subtrees back with the root. A deletion compares the key to be deleted with the key in the root. If they are equal, return join2 on the two subtrees. Otherwise, delete the key from the corresponding subtree, and join the two subtrees back with the root. The algorithms are as follows:
function insert(T, k) if T = nil return Node(nil, k, nil) else (L, k', R) := expose(T) if k < k' return join(insert(L,k), k', R) else if k > k' return join(L, k', insert(R, k)) else return T function delete(T, k) if T = nil return nil else (L, k', R) := expose(T) if k < k' return join(delete(L, k), k', R) else if k > k' return join(L, k', delete(R, k)) else return join2(L, R)Both insertion and deletion requires O ( log n ) {\displaystyle O(\log n)} time if | T | = n {\displaystyle |T|=n} .
Set–set functions
Several set operations have been defined on weight-balanced trees: union, intersection and set difference. The union of two weight-balanced trees t1 and t2 representing sets A and B, is a tree t that represents A ∪ B. The following recursive function computes this union:
function union(t1, t2) if t1 = nil return t2 else if t2 = nil return t1 else (l1, k1, r1) := expose(t1) (t<, b, t>) := split(t2, k1) l' := union(l1, t<) || r' := union(r1, t>) return join(l', k1, r')Similarly, the algorithms of intersection and set-difference are as follows:
function intersection(t1, t2) if t1 = nil or t2 = nil return nil else (l1, k1, r1) := expose(t1) (t<, b, t>) = split(t2, k1) l' := intersection(l1, t<) || r' := intersection(r1, t>) if b return join(l', k1, r') else return join2(l', r') function difference(t1, t2) if t1 = nil return nil else if t2 = nil return t1 else (l1, k1, r1) := expose(t1) (t<, b, t>) := split(t2, k1) l' = difference(l1, t<) || r' = difference(r1, t>) if b return join2(l', r') else return join(l', k1, r')The complexity of each of union, intersection and difference is O ( m log ( n m + 1 ) ) {\displaystyle O\left(m\log \left({\tfrac {n}{m}}+1\right)\right)} for two weight-balanced trees of sizes m {\displaystyle m} and n ( ≥ m ) {\displaystyle n(\geq m)} . This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O ( log m log n ) {\displaystyle O(\log m\log n)} .5 When m = 1 {\displaystyle m=1} , the join-based implementation applies the same computation as in a single-element insertion or deletion if the root of the larger tree is used to split the smaller tree.
Build
The algorithm for building a tree can make use of the union algorithm, and use the divide-and-conquer scheme:
function build(A[], n) if n = 0 return nil else if n = 1 return Node(nil, A[0], nil) else l' := build(A, n/2) || r' := (A+n/2, n-n/2) return union(L, R)This algorithm costs O ( n log n ) {\displaystyle O(n\log n)} work and has O ( log 3 n ) {\displaystyle O(\log ^{3}n)} depth. A more-efficient algorithm makes use of a parallel sorting algorithm.
function buildSorted(A[], n) if n = 0 return nil else if n = 1 return Node(nil, A[0], nil) else l' := build(A, n/2) || r' := (A+n/2+1, n-n/2-1) return join(l', A[n/2], r') function build(A[], n) A' := sort(A, n) return buildSorted(A, n)This algorithm costs O ( n log n ) {\displaystyle O(n\log n)} work and has O ( log n ) {\displaystyle O(\log n)} depth assuming the sorting algorithm has O ( n log n ) {\displaystyle O(n\log n)} work and O ( log n ) {\displaystyle O(\log n)} depth.
Filter
This function selects all entries in a tree satisfying a predicate p {\displaystyle p} , and return a tree containing all selected entries. It recursively filters the two subtrees, and join them with the root if the root satisfies p {\displaystyle p} , otherwise join2 the two subtrees.
function filter(T, p) if T = nil return nil else (l, k, r) := expose(T) l' := filter(l, p) || r' := filter(r, p) if p(k) return join(l', k, r') else return join2(l', R)This algorithm costs work O ( n ) {\displaystyle O(n)} and depth O ( log 2 n ) {\displaystyle O(\log ^{2}n)} on a tree of size n {\displaystyle n} , assuming p {\displaystyle p} has constant cost.
Used in libraries
The join-based algorithms are applied to support interface for sets, maps, and augmented maps 6 in libraries such as Hackage, SML/NJ, and PAM.7
Notes
External links
References
Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2016), "Just Join for Parallel Ordered Sets", Symposium on Parallel Algorithms and Architectures, Proc. of 28th ACM Symp. Parallel Algorithms and Architectures (SPAA 2016), ACM, pp. 253–264, arXiv:1602.02120, doi:10.1145/2935764.2935768, ISBN 978-1-4503-4210-0 978-1-4503-4210-0 ↩
Tarjan, Robert Endre (1983), "Data structures and network algorithms", Data structures and network algorithms, Siam, pp. 45–56 ↩
Sleator, Daniel Dominic; Tarjan, Robert Endre (1985), "Self-adjusting binary search trees", Journal of the ACM, Siam ↩
Adams, Stephen (1992), "Implementing sets efficiently in a functional language", Implementing sets efficiently in a functional language, Citeseer, CiteSeerX 10.1.1.501.8427. /wiki/CiteSeerX_(identifier) ↩
Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2016), "Just Join for Parallel Ordered Sets", Symposium on Parallel Algorithms and Architectures, Proc. of 28th ACM Symp. Parallel Algorithms and Architectures (SPAA 2016), ACM, pp. 253–264, arXiv:1602.02120, doi:10.1145/2935764.2935768, ISBN 978-1-4503-4210-0 978-1-4503-4210-0 ↩
Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2018), "PAM: parallel augmented maps", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 290–304 ↩
Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2018), "PAM: parallel augmented maps", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 290–304 ↩