Skip to content

scapegoat tree

BB[\(\alpha\)] tree

Description

BB[\(\alpha\)] tree, also known as scapegoat tree, is a kind of size-balanced tree.

It has a parameter \(\alpha\), which satisfies \(1/2 \leq \alpha < 1\).

For a node \(x\) , when it holds that \(x.left.size \leq \alpha \cdot x.size\) and \(x.right.size \leq \alpha \cdot x.size\), we call the node x is \(\alpha\) balanced .

Theoretical analysis

Rebuild the subtree

In BB[\(\alpha\)] tree, we can rebuild the subtree in time complexity \(\Theta(x.size)\) with a \(O(x.size)\) auxiliary space .

To sort the elements in the subtree of \(x\), we can run a inorder traversal and save the elements in the auxiliary array.

We call this process flatten.

code:

function flatten(u):
    // Base case: if node 'u' is null, return immediately
    if u is null:
        return

    // Recursively process the left subtree
    flatten(left child of u)

    // If the current node has elements (i.e., count is not zero)
    if count of u > 0:
        // Add the current node to the result array
        aux[++tot] = u

    // Recursively process the right subtree
    flatten(right child of u)

Then we can put the median element on the root, and distribute exactly half of these elements into left subtree and the other half into right subtree. This can be done in a recursive process.

We have the pseudo code:

function rebuild(l, r):  // rebuild a balanced subtree
    if l > r:
        return 0
    // find the middle index (integer division by 2)
    mid = (l + r) // 2   

    // recursively rebuild the left subtree
    left child of aux[mid] = rebuild(l, mid - 1) 

    // recursively rebuild the right subtree
    right child of aux[mid] = rebuild(mid + 1, r)  

    return aux[mid]  // return the current node

function Rebuild(u):
    Set tot to 0
    Call the flatten function with u as an argument
    Set u to the result of calling rebuild with arguments (1, tot)

In this process every element is put on the root of subtree only once, thus the time complexity if \(\Theta(x.size)\), and auxiliary space is \(O(x.size)\).

Time complexity of the seraching process

Because the tree holds that \(x.left.size \leq \alpha \cdot x.size\) and \(x.right.size \leq \alpha \cdot x.size\), when we go to the subtree of \(x\), the size of the subtree will become \(\alpha\) times less than \(x\).

Suppose that we passed nodes \(x_1,\cdots,x_k\) , the size of \(x_k\) holds that \(1=x_k.size \leq \alpha^{k-1}x_1.size = \alpha^{k-1}N\), where \(n\) denotes the depth of the path.

So we have \(k\leq \log_{\alpha}{\frac1N}+1=\log_{\frac1\alpha}{N}+1\leq \log_2N+1\).

So the time complexity of searching is \(O(\log N)\).

Potential function of the tree

Denote that \(\Delta(x)=|x.left.size-x.right.size|\).

Then the potential function is \(\displaystyle\Phi(T)=c \cdot \sum_{x\in T, \Delta(x)\geq 2} \Delta(x)\).

Because \(\Delta(x)=|x.left.size-x.right.size|\geq 0\), therefore the potential function is non-negative

Also, when the tree is \(\frac12\) balanced, \(\Delta(x)=0\), then \(\Phi(x)=\sum \Delta(x)=0\)

Amortized analysis for rebuilding

Suppose that \(m\) units potential energy can compensate for the cost of rebuilding a subtree of \(m\) nodes.

Therefore, when a subtree with root \(x\) becomes imbalanced, denote that the heavier son is \(y\), the lighter son is \(z\), then we have \(size(y)>\alpha\cdot m\) and \(size(z)+size(y)=m\)

So \(size(z)<(1-\alpha)\cdot m\)

So \(\Delta(x) = size(y)-size(z)>(2\alpha -1 )\cdot m\)

\(\displaystyle\Phi_{n-1}(T)=c \cdot \sum_{x\in T, \Delta(x)\geq 2} \Delta(x) \geq c \cdot \Delta(x)>c \cdot(2\alpha -1 )\cdot m\) and \(\Phi_n(T)=0\)

\(\therefore \hat c_i = c_i + \Phi_n(T)-\Phi_{n-1}(T)<m-c \cdot(2\alpha -1 )\cdot m=(1-c \cdot(2\alpha -1 ))\cdot m=O(1)\)

\(\therefore c>\frac1{2\alpha-1}\)

Amortized analysis for insertion and deletion

The process for insertion and deletion have two steps:

  1. Find the position that the value need to be inserted/deleted. This process have the same time complexity of searching, which is \(O(\log N)\)

  2. Maintain the balance between current node and root. Bacuase every rebuilding process takes \(O(1)\) time, and the path from current node to the root is \(O(\log N)\), then the time complexity of this process is \(O(\log N)\)

Therefore, total amortized time complexity for insertion and deletion is \(O(\log N)\).