Recently I ran into a problem that asks for a data structure that efficiently allocate and recycle integer id in [0, MAX_ID). Each id can be allocated again after being recycled. The data structure runs on single machine in single thread. The goal is to optimize time and memory. An intuitive solution is to use a set to track recycled ids, and use a counter to track maximum allocated id so far.

Time and space complexity of this code is clearly optimal. All operations take O(1) time and we need to track recycled ids in memory which will need O(n) space in the worst case. However, it is possible to optimize constant factor. Notice that all ids fall in [0, MAX_ID), which means we can easily replace the hash set with a bitset which is more space efficient in the worst case. However, one challenge is finding an element in a bitset requires traversing the whole set in the worst case, which is O(n). Is there a way to have a data structure that is compact in space like bitset, while allows you to efficiently find an element in the set? The answer is yes. Imagine a full binary tree whose leaf nodes are labeled with numbers from 0 to n. Each node may have two color: blue or red. All nodes are blue at the beginning. This tree can be used to as a set of integers from 0 to n. A red leaf node indicates that corresponding number is in the set. Blue leaf node means the opposite. Red internal node indicates there is some red leaf in its sub tree. Blue means the opposite. It is easy to insert/remove a number from the set. It is also easy to get one number in the set. All operation takes O(log n) time. When n is INT_MAX, the height of the tree will be 32, which can be usually treated as constant. We don’t want to use transitional binary tree node class to implement this data structure because of its memory overhead. Two key observations give us compact implementation: each node has 2 states, which fits into 1 bit; binary tree is full. We can totally use heap like array representation of binary tree and each element is a single bit, which means we just need a bit set with around 2*n bits to store this tree.

The memory usage of _BitTree _is around 2x to 4x of transitional bitset, which is still very compact. getElement() returns the smallest element in the set. It is easy to make it return the largest element instead. It is also not hard to find a successor of a number in the set in O(log n) time, which means traversing this bitset can be done much more efficiently than transitional bitset.