I implemented a skip list in C++(following pseudo code in this paper).

The skip list itself provided simple set interface: contains, insert, remove

I also implemented a map interface using skip list.

I compared performance of Skip List with STL set and map. STL’s algorithm is about twice faster than skip list.

Detail of implementation and experiment data comes later…

#### Implementation

The motivation of Skip List is simple. Linked List has* O(n) *random access time, while it can perform fast deletion given node location. Array have *O(1)* random access, while deletion is* O(n)*. Thus it is nature to look for something in between, say, *O(log n) *for random accessing and deleting. Binary search tree is one such thing, while it loss the linear structure, and not so intuitive. Skip List is a more intuitive, and it has *O(log n) *complexity for random accessing and deletion on average.

Skip List is based on a sorted Linked List, thus we can expect efficient deletion. If we can have efficient accessing, then we are done. Efficient accessing is achieved by adding extra pointers to let us “skip” multiple nodes instead of going through one by one. Ideally, as shown in figure above, each node got several levels, each level has a pointer, which pointing to next node reach that level. Most nodes has low height, and few are high. If we can make sure that the number of nodes with height *h* grow reverse exponentially with *h*, then we can access each node in *O(log n)* time on average.

Accessing a node is done by starting from highest list head with level, go along the list at that level until the next element is larger than or equal to the element you need. When the next one is larger, go one level down, and continue going along. Repeat this until you are at level 1. By now, either your next element at level 1 is the element you are looking for, or search fails.

Node* search(const T& x,vector<Node*>* update=0){
Node* cur=head;
for(int i=level-1;i>=0;i--){
while(cur->next[i]!=tail &&
cur->next[i]->value < x)
cur=cur->next[i];
if(update)update->at(i)=cur;
}
cur=cur->next[0];
if(cur==tail || cur->value != x)
cur = NULL;
return cur;
}

This is the *search* method, the core of all features. *next* is a vector of pointers, which correspond to the “pointer tower” at each node.

void insert(const T& x){
vector<Node*> update(level);
Node* cur=search(x,&update);
if(cur){
cur->value=x;
}else{
Node* newNode=new Node(x);
int v=randomLevel();
if(v > level){
level=v;
update.resize(level,head);
}
newNode->next.resize(v);
for(int i=0;i<v;i++) {
newNode->next[i]=update[i]->next[i];
update[i]->next[i]=newNode;
}
}
}

Insertion is easy: first call *search*, if the element already exist, overwrite it with new value(maybe not exactly the same, but == operator returns true). If not exist, then insert the node into the list, and maintain adjacent pointers(by using vector *update*, detail refer to the paper).

int randomLevel(){
int res=1;
int r=(rand()<<16)+rand();
while(res<maxLevel && (r&1)){
res++;
r>>=1;
}
return res;
}

*randLevel* generate random number satisfying the required distribution. Note that to avoid single node has too high level, the random number should be no greater than current level + 1.

void remove(const T& x){
vector<Node*> update(level);
Node* cur=search(x,&update);
if(cur){
for(int i=0;i<level;i++){
if(update[i]->next[i]!=cur)break;
update[i]->next[i]=cur->next[i];
}
delete cur;
while(level>1){
if(head->next[level-1]==tail)
level--;
else
break;
}
}
}

*remove* is similar to *insert*, except that only one side of pointers need to be maintained. The removed node is deleted directly, so no need to maintain out going pointers.

#### Experiment 1

- Insert n(from 10000 to 100000) random numbers, repeat 10 times, find the average time for each n.
- Remove n(from 10000 to 100000) random numbers, repeat 10 times, find the average time for each n.
- Combination of insertion and deletion for n(from 10000 to 100000) random numbers with 100% hit rate(no insert same element, or remove element not exist in the list), repeat 10 times, find the average time for each n.

The image above shows the result of experiment 1. X axis is n, Y axis is average time cost.

- The running time almost grow linearly with n for insertion and deletion, although theoretical complexity is O(log n).
- Deletion is faster than Insertion. This is because insertion need to maintain pointers on both sides of inserted node, while deletion only maintain those nodes pointing to the deleted node(only one side).
- Because of combined operation, the average size of skip list is small, thus the code is faster