Skip List

Structure and Three Operations

Structure

Every node in the skip list has several properties and methods:

template <typename K, typename V>
class Node
{
public:
    Node<K, V> *forward[MAXL + 1];
    int next[MAXL + 1];
    K key;
    V value;
    int level;
    Node<K, V>() {}
    Node<K, V>(K key, V value) : key(key), value(value)
    {
        for (int i = 0; i <= MAXL; ++i)
            forward[i] = NULL, next[i] = 0;
        level = RandomLevel();
    }
    int RandomLevel()
    {
        int lv = 0;
        int maxn = 1023;
        while (lv < MAXL && rand() % maxn < (int)(p * maxn))
            ++lv;
        return lv;
    }
    friend class SkipList<K, V>;
};

Where RandomLevel() calculates the height of a node, that is, how many layers a node appears in. From the pseudo-code, we see that the node have a probability \(p\) to appear in a higher level, until it has a probability \(1-p\) to stop the loop.

Every node has key, value to store data, level to store the height, forward[] to store pointers to adjacent node, and next[] to store bias to adjacent node. We'll discuss these propoties in the following operations.

The SkipList has a structure:

template <typename K, typename V>
class SkipList
{
public:
    static const int MAXL = 20, INF = 1e9 + 10;
    int level;
    int length;
    Node<K, V> *head;
    SkipList<K, V>() { head = new Node<K, V>(0, 0), level = 0, length = 0; }
};

That is, it has a dummy head at which the operations start, length denoting the number of elements in the list, and level denoting the maximum layer.

find

In the find function, we start finding from dummy head, and from the top-most layer. If current node p has adjacent node and the key of p is smaller than that of the adjacent node, then jump to that node;

Otherwise go to a lower layer and continue this process, until it's the lowest level, then the list becomes original linked list, then traverse the list to find the element.

In pseudo-code we have:

V find(K key)
{
    Node<K, V> *p = head;
    for (int i = level; i >= 0; --i)
    {
        while (p->forward[i] != NULL && p->forward[i]->key < key)
            p = p->forward[i];
    }
    p = p->forward[0];
    if (p->key == key)
        return p->value;
    cerr << "Can't find the key" << endl;
    return INF;
}

Note that after the while loop p becomes the largest element that has a smaller key than the key we search.

insert

Silimar to find, we start from dummy head on the top-most layer, find the largest element that has a smaller key that current key, and go to next node to get the position we need to insert. Create a NewNode there.

Then we need to change all the nodes which are in front of NewNode.

To be specific, suppose p is the updated node, we need to connect NewNode.forward[i] to p.forward[i], then connect p.forward[i] to NewNode

In pseudo-code we have:

void insert(K key, V value)
{
    Node<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    Node<K, V> *p = head;
    int now = 0;
    for (int i = level; i >= 0; --i)
    {
        while (p->forward[i] != NULL && p->forward[i]->key < key)
            now += p->next[i], p = p->forward[i];
        update[i] = p, pos[i] = now;
    }
    p = p->forward[0], now++; // now += p->next[0]
    Node<K, V> *NewNode = new Node<K, V>(key, value);
    if (level < NewNode->level)
    {
        NewNode->level = head->level = ++level;
        update[level] = head, pos[level] = 0;
    }
    for (int i = 0; i <= level; ++i)
    {
        p = update[i];
        if (i <= NewNode->level) // p is connected to NewNode
        {
            NewNode->forward[i] = p->forward[i];
            if (p->forward[i]) 
                NewNode->next[i] = p->next[i] - (now - pos[i]) + 1;
            else
                NewNode->next[i] = 0;
            p->next[i] = now - pos[i];
            p->forward[i] = NewNode;
        }
        else
        {
            if (p->forward[i])
                p->next[i]++;
            else p->next[i] = 0;
        }
    }
    length++;
}

erase

When deleting an element, we first find the position of the element, then update the nodes: traverse every layer, connect the node which are in front of current node to the nodes behind current node, and update the bias of the updated nodes.

Then we need to delete current node to release memory.

Finally check if top layer have nodes. If top layer is empty, then decrement level.

In pseudo-code we have:

bool erase(K key)
{
    Node<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    Node<K, V> *p = head;
    int now = 0;
    for (int i = level; i >= 0; --i)
    {
        while (p->forward[i] != NULL && p->forward[i]->key < key)
            now += p->next[i], p = p->forward[i];
        update[i] = p;
        pos[i] = now;
    }
    p = p->forward[0], now++;
    if (!p || p->key != key)
        return false;
    for (int i = 0; i <= level; ++i)
    {
        if (update[i]->forward[i] != p) // update[i] is not connected to p
        {
            if (update[i]->forward[i])
                update[i]->next[i]--;
            else update[i]->next[i] = 0;
        }
        else
        {
            update[i]->forward[i] = p->forward[i];
            update[i]->next[i] += p->next[i] - 1;
        }
    }
    delete p;
    while (level > 0 && head->forward[level] == NULL)
        level--;
    head->level = level;
    length--;
    return true;
}

Until now we haven't used next[] that we maintained in both insert and erase, but it can help simplify another operation and reduce the complexity of it.

other Operations and Optimizations

kth

If we need to find the \(k\)-th largest element and return key of it in a function, we can brutely traverse the bottom list to find it, but this operation needs \(O(k)\) time.

Alternatively, we can use the next[] to reduce the complexity.

To be specific, when we start from dummy head, we will jump until current rank is smaller than the rank that will be added when jumping to next node.

After this process we get the node with rank - 1. Therefore go to next node to get node with rank .

In pseudo-code we have:

K kth(int rank)
{
    Node<K, V> *p = head;
    for (int i = level; i >= 0; --i)
    {
        while (p->forward[i] != NULL && p->next[i] < rank)
            rank -= p->next[i], p = p->forward[i];
    }
    p = p->forward[0];
    return p->key;
}

split

To split one single SkipList into two parts.

We use rank, which is also the length of the left sublist, as the middle point, and split the list into left and right parts.

in pseudo-code we have:

template <typename K, typename V>
pair<SkipList<K, V>*, SkipList<K, V>*> split(SkipList<K, V>* u, int rank){
    SkipList<K, V> *v = new SkipList<K, V>();
    SLNode<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    SLNode<K, V> *p = u->head;
    int now = 0, level = u->level;
    for (int i = level; i >= 0; --i){
        while (p->forward[i] != NULL && now + p->next[i] < rank)
            now += p->next[i], p = p->forward[i];
        update[i] = p, pos[i] = now;
    }
    p = p->forward[0], now++; // now += p->next[0]
    v->level = level, v->head->level = level;
    v->length = u->length - now + 1, u->length = now - 1;
    for (int i = 0; i <= level; ++i){
        p = update[i];
        v->head->forward[i] = p->forward[i];
        if (p->forward[i]) v->head->next[i] = p->next[i] - (now - pos[i]) + 1;
        else v->head->next[i] = 0;
        p->next[i] = 0;
        p->forward[i] = NULL;
    }
    return make_pair(u, v);
}

merge

Also, given two SkipLists, we need to merge them into one list.

In pseudo-code we have:

template <typename K, typename V>
SkipList<K, V>* merge(SkipList<K, V>* u, SkipList<K, V>* v){
    SLNode<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    SLNode<K, V> *p = u->head;
    int now = 0;
    for (int i = u->level; i >= 0; --i){
        while (p->forward[i] != NULL)
            now += p->next[i], p = p->forward[i];
        update[i] = p, pos[i] = now;
    }
    p = p->forward[0], now++; // now += p->next[0]
    for (int i = 0; i <= v->level; ++i){
        if(i > u->level){
            u->head->forward[i] = v->head->forward[i];
            if(v->head->forward[i]) u->head->next[i] = u->length + v->head->next[i];
            else u->head->next[i] = 0;
        }else{
            update[i]->forward[i] = v->head->forward[i];
            if(v->head->forward[i]) update[i]->next[i] = (u->length - pos[i]) + v->head->next[i];
            else update[i]->next[i] = 0;
        }
        v->head->forward[i] = NULL;
    }
    u->level = max(u->level, v->level);
    u->head->level = u->level;
    u->length += v->length;
    delete v;
    return u;
}

code

#include <bits/stdc++.h>
using namespace std;
const int INF = 1e8 + 10;
int n, tot, o, m;
const int MAXL = 20;
const double p = 0.5;
template <typename K, typename V>
class SkipList;
template <typename K, typename V>
class SLNode{
public:
    SLNode<K, V> **forward;
    int *next;
    K key;
    V value;
    int level;
    SLNode<K, V>() {}
    ~SLNode<K, V>() {
        delete [] forward;
        delete [] next;
    }
    SLNode<K, V>(K key, V value) : key(key), value(value) {
        next = new int [MAXL + 1];
        forward = new SLNode<K, V> *[MAXL + 1]; 
        for (int i = 0; i <= MAXL; ++i) forward[i] = NULL, next[i] = 0;
        level = RandomLevel();
    }
    int RandomLevel(){
        int lv = 0;
        int maxn = 1023;
        while (lv < MAXL && rand() % maxn < (int)(p * maxn)) ++lv;
        return lv;
    }
    friend class SkipList<K, V>;
};
template <typename K, typename V>
class SkipList{
public:
    static const int MAXL = 20, INF = 1e9 + 10;
    int level;
    int length;
    SLNode<K, V> *head;
    SkipList<K, V>() { head = new SLNode<K, V>(0, 0), level = 0, length = 0; }
    ~SkipList<K, V>() {
        SLNode<K, V> *p = head;
        while(p){
            SLNode<K, V> *q = p->forward[0];
            delete p, p = q;
        }   
    }
    SLNode<K, V>* operator [](int rank){ // from 0
        ++rank;
        SLNode<K, V> *p = head;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->next[i] < rank)
                rank -= p->next[i], p = p->forward[i];
        }
        p = p->forward[0];
        return p;
    }
    int getrank(K key){
        SLNode<K, V> *p = head;
        int now = 0;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->forward[i]->key < key)
                now += p->next[i], p = p->forward[i];
        }
        return now + 1;
    }
    K kth(int rank){
        SLNode<K, V> *p = head;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->next[i] < rank)
                rank -= p->next[i], p = p->forward[i];
        }
        p = p->forward[0];
        return p->key;
    }
    K pre(K key){
        SLNode<K, V> *p = head;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->forward[i]->key < key)
                p = p->forward[i];
        }
        return p->key;
    }
    K nxt(K key){
        SLNode<K, V> *p = head;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->forward[i]->key <= key)
                p = p->forward[i];
        }
        p = p->forward[0];
        if(p) return p->key;
        return INF;
    }
    void insert(K key, V value){
        SLNode<K, V> *update[MAXL + 1];
        int pos[MAXL + 1];
        SLNode<K, V> *p = head;
        int now = 0;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->forward[i]->key < key)
                now += p->next[i], p = p->forward[i];
            update[i] = p, pos[i] = now;
        }
        p = p->forward[0], now++; // now += p->next[0]
        SLNode<K, V> *NewNode = new SLNode<K, V>(key, value);
        if (level < NewNode->level){
            NewNode->level = head->level = ++level;
            update[level] = head, pos[level] = 0;
        }
        for (int i = 0; i <= level; ++i){
            p = update[i];
            if (i <= NewNode->level){
                NewNode->forward[i] = p->forward[i];
                if (p->forward[i]) NewNode->next[i] = p->next[i] - (now - pos[i]) + 1;
                else NewNode->next[i] = 0;
                p->next[i] = now - pos[i];
                p->forward[i] = NewNode;
            }else{
                if (p->forward[i]) p->next[i]++;
                else p->next[i] = 0;
            }
        }
        length++;
    }
    bool erase(K key){
        SLNode<K, V> *update[MAXL + 1];
        int pos[MAXL + 1];
        SLNode<K, V> *p = head;
        int now = 0;
        for (int i = level; i >= 0; --i){
            while (p->forward[i] != NULL && p->forward[i]->key < key)
                now += p->next[i], p = p->forward[i];
            update[i] = p;
            pos[i] = now;
        }
        p = p->forward[0], now++;
        if (!p || p->key != key) return false;
        for (int i = 0; i <= level; ++i){
            if (update[i]->forward[i] != p){
                if (update[i]->forward[i]) update[i]->next[i]--;
                else update[i]->next[i] = 0;
            }else{
                update[i]->forward[i] = p->forward[i];
                update[i]->next[i] += p->next[i] - 1;
            }
        }
        delete p;
        while (level > 0 && head->forward[level] == NULL) level--;
        head->level = level;
        length--;
        return true;
    }
    void print(){
        SLNode<K, V> *p = head;
        while (p){
            cout << p->key << " ";
            if (p == head) cout << level << ": ";
            else cout << p->level << ": ";
            for (int i = level; i >= 0; --i){
                if (p->forward[i] == NULL) cout << "NULL ";
                else cout << "(" << p->forward[i]->key << ", " << p->next[i] << ") ";
            }
            cout << endl;
            p = p->forward[0];
        }
        cout << "length: " << length << endl;
    }
};

template <typename K, typename V>
pair<SkipList<K, V>*, SkipList<K, V>*> split(SkipList<K, V>* u, int rank){
    SkipList<K, V> *v = new SkipList<K, V>();
    SLNode<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    SLNode<K, V> *p = u->head;
    int now = 0, level = u->level;
    for (int i = level; i >= 0; --i){
        while (p->forward[i] != NULL && now + p->next[i] < rank)
            now += p->next[i], p = p->forward[i];
        update[i] = p, pos[i] = now;
    }
    p = p->forward[0], now++; // now += p->next[0]
    v->level = level, v->head->level = level;
    v->length = u->length - now + 1, u->length = now - 1;
    for (int i = 0; i <= level; ++i){
        p = update[i];
        v->head->forward[i] = p->forward[i];
        if (p->forward[i]) v->head->next[i] = p->next[i] - (now - pos[i]) + 1;
        else v->head->next[i] = 0;
        p->next[i] = 0;
        p->forward[i] = NULL;
    }
    return make_pair(u, v);
}
template <typename K, typename V>
SkipList<K, V>* merge(SkipList<K, V>* u, SkipList<K, V>* v){
    SLNode<K, V> *update[MAXL + 1];
    int pos[MAXL + 1];
    SLNode<K, V> *p = u->head;
    int now = 0;
    for (int i = u->level; i >= 0; --i){
        while (p->forward[i] != NULL)
            now += p->next[i], p = p->forward[i];
        update[i] = p, pos[i] = now;
    }
    p = p->forward[0], now++; // now += p->next[0]
    for (int i = 0; i <= v->level; ++i){
        if(i > u->level){
            u->head->forward[i] = v->head->forward[i];
            if(v->head->forward[i]) u->head->next[i] = u->length + v->head->next[i];
            else u->head->next[i] = 0;
        }else{
            update[i]->forward[i] = v->head->forward[i];
            if(v->head->forward[i]) update[i]->next[i] = (u->length - pos[i]) + v->head->next[i];
            else update[i]->next[i] = 0;
        }
        v->head->forward[i] = NULL;
    }
    u->level = max(u->level, v->level);
    u->head->level = u->level;
    u->length += v->length;
    delete v;
    return u;
}

Expected Complexity and Worst Case Complexity Analysis

Expected Maximum layer

First, define \(L(n)\) as the layer that the expected number of nodes is \(\frac1p\)

Every layer should have \(p\) times number of nodes than previous layer, therefore \(i\)-th layer expect to have \(p^{i-1}\cdot n\) nodes.

therefore \(p^{L(n)-1}\cdot n = \frac1p\)

\(p^{L(n)} = \frac1n\)

\(\therefore L(n) = \log_p{\frac1n}=\log_{\frac1p}n\)

Therefore we should take \(L(n)\) as maximum layer, then maximum layer \(L(n)=O(\log n)\)

Complexity of Operations

Every operation run on one basic subprocess: traverse every layer, check next key and jump until next key is larger than the key we search.

We can analyse this subprocess from ending to beginning

The subprocess can be divided into two parts: climb from bottom to \(L(n)\)-th layer, and traverse top layer until encounter dummy head.

For the first part, suppose current node \(x\) is on \(i\)-th layer.

If maximum level of \(x\) equals \(i\), then next step we need to jump to previous node in the same layer. The probability is \(1-p\).

If maximum level of \(x\) is larger than \(i\), then next step we need to jump to the higher layer. The probability is \(p\).

Denote \(C(i)\) as the cost of climbing \(i\) layers in a SkipList with infinite length, then we have:

\(\displaystyle C(i)=1+(1-p)C(i)+pC(i-1)\)

\(\displaystyle\therefore C(i)=\frac ip\)

Therefore the expected cost of first part is \(C(L(n)-1)=\frac{L(n)-1}{p}\)

The second part is traversing the top layer, therefore the expected cost is the total number in the top layer, that is, \(\frac1p\).

Therefore the total expected complexity is \(\frac{L(n)-1}{p}+\frac1p = \frac{L(n)}{p}=\frac{\log_{\frac1p}n}{p}=O(\log n)\).

For find, kth, it only has this subprocess, therefore is \(O(\log n)\)

For insert, erase, it has this subprocess and updating process, and updating process also has a complexity \(O(\log n)\), since it traverses every layer and do \(O(1)\) operation.

For split, merge, they are adapted from insert, erase functions, and they have the exact same complexity of insert, merge, which is \(O(\log n)\)

Therefore for every operation, expected complexity is \(O(\log n)\).

What's more, worst case complexity of these operation is \(O(n)\), since they may not jump to any adjacent nodes on higher layers. That is, they may quickly jump to bottom layer and traverse the whole bottom layer and the number of nodes in bottom layer is \(O(n)\).

performance

P3369: 261ms

make B+ tree great again

normal B+ tree

normal B+ tree has a operation complexity of \(O(M \log_M N)=O(M\frac{\log N}{\log M})\), where \(M\) denotes the order of B+ tree, \(N\) denotes the number of nodes.

The \(O(M)\) part comes from traversing every key in a node. Since the nodes are implemented using linked list, the worst case is \(O(M)\), therefore the complexity is \(O(M \log_M N)\), and it's \(M\) related.

B+ tree using skip list to implement their nodes

However, using skip list to implement nodes can reduce this process to \(O(\log M)\), therefore total complexity becomes \(O(\log M\frac{\log N}{\log M})=O(\log N)\), and it's \(M\) irrelated.