scapegoat tree
BB[\(\alpha\)] tree
Description
BB[\(\alpha\)] tree, also known as scapegoat tree, is a kind of size-balanced tree.
It has a parameter \(\alpha\), which satisfies \(1/2 \leq \alpha < 1\).
For a node \(x\) , when it holds that \(x.left.size \leq \alpha \cdot x.size\) and \(x.right.size \leq \alpha \cdot x.size\), we call the node x is \(\alpha\) balanced .
Theoretical analysis
Rebuild the subtree
In BB[\(\alpha\)] tree, we can rebuild the subtree in time complexity \(\Theta(x.size)\) with a \(O(x.size)\) auxiliary space .
To sort the elements in the subtree of \(x\), we can run a inorder traversal and save the elements in the auxiliary array.
We call this process flatten.
code:
function flatten(u):
// Base case: if node 'u' is null, return immediately
if u is null:
return
// Recursively process the left subtree
flatten(left child of u)
// If the current node has elements (i.e., count is not zero)
if count of u > 0:
// Add the current node to the result array
aux[++tot] = u
// Recursively process the right subtree
flatten(right child of u)
Then we can put the median element on the root, and distribute exactly half of these elements into left subtree and the other half into right subtree. This can be done in a recursive process.
We have the pseudo code:
function rebuild(l, r): // rebuild a balanced subtree
if l > r:
return 0
// find the middle index (integer division by 2)
mid = (l + r) // 2
// recursively rebuild the left subtree
left child of aux[mid] = rebuild(l, mid - 1)
// recursively rebuild the right subtree
right child of aux[mid] = rebuild(mid + 1, r)
return aux[mid] // return the current node
function Rebuild(u):
Set tot to 0
Call the flatten function with u as an argument
Set u to the result of calling rebuild with arguments (1, tot)
In this process every element is put on the root of subtree only once, thus the time complexity if \(\Theta(x.size)\), and auxiliary space is \(O(x.size)\).
Time complexity of the seraching process
Because the tree holds that \(x.left.size \leq \alpha \cdot x.size\) and \(x.right.size \leq \alpha \cdot x.size\), when we go to the subtree of \(x\), the size of the subtree will become \(\alpha\) times less than \(x\).
Suppose that we passed nodes \(x_1,\cdots,x_k\) , the size of \(x_k\) holds that \(1=x_k.size \leq \alpha^{k-1}x_1.size = \alpha^{k-1}N\), where \(n\) denotes the depth of the path.
So we have \(k\leq \log_{\alpha}{\frac1N}+1=\log_{\frac1\alpha}{N}+1\leq \log_2N+1\).
So the time complexity of searching is \(O(\log N)\).
Potential function of the tree
Denote that \(\Delta(x)=|x.left.size-x.right.size|\).
Then the potential function is \(\displaystyle\Phi(T)=c \cdot \sum_{x\in T, \Delta(x)\geq 2} \Delta(x)\).
Because \(\Delta(x)=|x.left.size-x.right.size|\geq 0\), therefore the potential function is non-negative
Also, when the tree is \(\frac12\) balanced, \(\Delta(x)=0\), then \(\Phi(x)=\sum \Delta(x)=0\)
Amortized analysis for rebuilding
Suppose that \(m\) units potential energy can compensate for the cost of rebuilding a subtree of \(m\) nodes.
Therefore, when a subtree with root \(x\) becomes imbalanced, denote that the heavier son is \(y\), the lighter son is \(z\), then we have \(size(y)>\alpha\cdot m\) and \(size(z)+size(y)=m\)
So \(size(z)<(1-\alpha)\cdot m\)
So \(\Delta(x) = size(y)-size(z)>(2\alpha -1 )\cdot m\)
\(\displaystyle\Phi_{n-1}(T)=c \cdot \sum_{x\in T, \Delta(x)\geq 2} \Delta(x) \geq c \cdot \Delta(x)>c \cdot(2\alpha -1 )\cdot m\) and \(\Phi_n(T)=0\)
\(\therefore \hat c_i = c_i + \Phi_n(T)-\Phi_{n-1}(T)<m-c \cdot(2\alpha -1 )\cdot m=(1-c \cdot(2\alpha -1 ))\cdot m=O(1)\)
\(\therefore c>\frac1{2\alpha-1}\)
Amortized analysis for insertion and deletion
The process for insertion and deletion have two steps:
-
Find the position that the value need to be inserted/deleted. This process have the same time complexity of searching, which is \(O(\log N)\)
-
Maintain the balance between current node and root. Bacuase every rebuilding process takes \(O(1)\) time, and the path from current node to the root is \(O(\log N)\), then the time complexity of this process is \(O(\log N)\)
Therefore, total amortized time complexity for insertion and deletion is \(O(\log N)\).