Recently I ran into a problem that asks for a data structure that efficiently allocate and recycle integer id in [0, MAX_ID). Each id can be allocated again after being recycled. The data structure runs on single machine in single thread. The goal is to optimize time and memory. An intuitive solution is to use a set to track recycled ids, and use a counter to track maximum allocated id so far.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Allocator {
private static final int MAX_ID = 100000;

private final Set<Integer> recycled = new HashSet<>();
private final int maxAllocated = 0;

int allocate() {
if (maxAllocated < MAX_ID) {
return maxAllocated++;
}
if (!recycled.isEmpty()) {
int id = recycled.iterator().next();
recycled.remove(id);
return id;
}
throw new RuntimeException("...");
}

int recycle(int id) {
if (recycle.contains(id)) {
throw new RuntimeException("...");
}
recycled.add(id);
}
}

Time and space complexity of this code is clearly optimal. All operations take O(1) time and we need to track recycled ids in memory which will need O(n) space in the worst case. However, it is possible to optimize constant factor. Notice that all ids fall in [0, MAX_ID), which means we can easily replace the hash set with a bitset which is more space efficient in the worst case. However, one challenge is finding an element in a bitset requires traversing the whole set in the worst case, which is O(n). Is there a way to have a data structure that is compact in space like bitset, while allows you to efficiently find an element in the set? The answer is yes. Imagine a full binary tree whose leaf nodes are labeled with numbers from 0 to n. Each node may have two color: blue or red. All nodes are blue at the beginning. This tree can be used to as a set of integers from 0 to n. A red leaf node indicates that corresponding number is in the set. Blue leaf node means the opposite. Red internal node indicates there is some red leaf in its sub tree. Blue means the opposite. It is easy to insert/remove a number from the set. It is also easy to get one number in the set. All operation takes O(log n) time. When n is INT_MAX, the height of the tree will be 32, which can be usually treated as constant. We don’t want to use transitional binary tree node class to implement this data structure because of its memory overhead. Two key observations give us compact implementation: each node has 2 states, which fits into 1 bit; binary tree is full. We can totally use heap like array representation of binary tree and each element is a single bit, which means we just need a bit set with around 2*n bits to store this tree.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
class BitTree {

private final int\[\] arr;
private final int height;
// starting index of leave bits
private final long base;

// stores int value in \[0, n)
BitTree(int n) {
// find the number of bits needed to encode the tree
long size = 1;
int height = 0;
while (size < n) {
size <<= 1;
height++;
}
size *= 2;
// number of ints in array
int len = Math.max(1, (int) (size >> 5));
arr = new int\[len\];
this.height = height;
this.base = 1L << height;
}

void insert(int v) {
long cur = v + base;
while (cur >= 1) {
set(cur, true);
cur >>= 1;
}
}

void remove(int v) {
set(v + base, false);
long cur = (v + base) >> 1;
while (cur >= 1) {
set(cur, get(cur << 1) || get((cur << 1) + 1));
cur >>= 1;
}
}

int getElement() {
if (!get(1)) {
throw new NoSuchElementException();
}
int cur = 1;
for (int i = 0; i < height; i++) {
cur <<= 1;
if (!get(cur)) {
cur++;
}
}
return (int) (cur - base);
}

boolean isEmpty() {
return !get(1);
}

boolean contains(int v) {
return get(base + v);
}

private boolean get(long i) {
int idx = (int) (i >> 5);
int offset = (int) (i & 31);
return (arr\[idx\] & (1 << offset)) != 0;
}

private void set(long i, boolean v) {
int idx = (int) (i >> 5);
int offset = (int) (i & 31);
if (v) {
arr\[idx\] |= 1 << offset;
} else {
arr\[idx\] &= ~(1 << offset);
}
}

The memory usage of _BitTree _is around 2x to 4x of transitional bitset, which is still very compact. getElement() returns the smallest element in the set. It is easy to make it return the largest element instead. It is also not hard to find a successor of a number in the set in O(log n) time, which means traversing this bitset can be done much more efficiently than transitional bitset.