BitTree: a more powerful bitset

Recently I ran into a problem that asks for a data structure that efficiently allocates and recycles integer ids in [0, MAX_ID). Each id can be allocated again after being recycled. The data structure runs on a single machine in a single thread. The goal is to optimize time and memory.

An intuitive solution is to use a set to track recycled ids, and a counter to track the maximum allocated id so far.

class Allocator {
  private static final int MAX_ID = 100000;

  private final Set<Integer> recycled = new HashSet<>();
  private int maxAllocated = 0;

  int allocate() {
    if (maxAllocated < MAX_ID) {
      return maxAllocated++;
    }
    if (!recycled.isEmpty()) {
      int id = recycled.iterator().next();
      recycled.remove(id);
      return id;
    }
    throw new RuntimeException("...");
  }

  void recycle(int id) {
    if (id >= maxAllocated || recycled.contains(id)) {
      throw new RuntimeException("...");
    }
    recycled.add(id);
  }
}

The time and space complexity of this code is clearly optimal: all operations take O(1) time, and tracking recycled ids in memory needs O(n) space in the worst case. However, it is still possible to optimize the constant factors.

Notice that all ids fall in [0, MAX_ID), which means we can easily replace the hash set with a bitset, which is more space efficient in the worst case. However, one challenge is that finding an element in a bitset requires traversing the whole set in the worst case, which is O(n). Is there a data structure that is as compact as a bitset while allowing you to efficiently find an element in the set? The answer is yes.

Imagine a full binary tree whose leaf nodes are labeled with the numbers 0 to n. Each node has one of two colors: blue or red. All nodes are blue at the beginning. This tree can be used as a set of integers from 0 to n.

A red leaf node indicates that the corresponding number is in the set; a blue leaf node means the opposite.

A red internal node indicates there is some red leaf in its subtree; blue means the opposite.

It is easy to insert/remove a number, and it is also easy to get one number from the set. All operations take O(log n) time. When n is INT_MAX, the height of the tree is 32, which can usually be treated as constant.

We don’t want to implement this data structure with a traditional binary tree node class because of its memory overhead. Two key observations give us a compact implementation: each node has 2 states, which fits into 1 bit, and the binary tree is full. We can use the heap-like array representation of a binary tree where each element is a single bit, which means we just need a bitset of around 2*n bits to store this tree.

class BitTree {

    private final int[] arr;
    private final int height;
    // starting index of leaf bits
    private final long base;

    // stores int value in [0, n)
    BitTree(int n) {
        // find the number of bits needed to encode the tree
        long size = 1;
        int height = 0;
        while (size < n) {
            size <<= 1;
            height++;
        }
        size *= 2;
        // number of ints in array
        int len = Math.max(1, (int) (size >> 5));
        arr = new int[len];
        this.height = height;
        this.base = 1L << height;
    }

    void insert(int v) {
        long cur = v + base;
        while (cur >= 1) {
            set(cur, true);
            cur >>= 1;
        }
    }

    void remove(int v) {
        set(v + base, false);
        long cur = (v + base) >> 1;
        while (cur >= 1) {
            set(cur, get(cur << 1) || get((cur << 1) + 1));
            cur >>= 1;
        }
    }

    int getElement() {
        if (!get(1)) {
            throw new NoSuchElementException();
        }
        long cur = 1;
        for (int i = 0; i < height; i++) {
            cur <<= 1;
            if (!get(cur)) {
                cur++;
            }
        }
        return (int) (cur - base);
    }

    boolean isEmpty() {
        return !get(1);
    }

    boolean contains(int v) {
        return get(base + v);
    }

    private boolean get(long i) {
        int idx = (int) (i >> 5);
        int offset = (int) (i & 31);
        return (arr[idx] & (1 << offset)) != 0;
    }

    private void set(long i, boolean v) {
        int idx = (int) (i >> 5);
        int offset = (int) (i & 31);
        if (v) {
            arr[idx] |= 1 << offset;
        } else {
            arr[idx] &= ~(1 << offset);
        }
    }
}

The memory usage of BitTree is around 2x to 4x that of a traditional bitset, which is still very compact. (The tree needs about twice as many bits as leaves, and the leaf count is rounded up to a power of two, hence up to 4x.)

getElement() returns the smallest element in the set. It is easy to make it return the largest element instead (descend into the right child first whenever it is red).

It is also not hard to find the successor of a number in the set in O(log n) time, which means traversing this bitset can be done much more efficiently than with a traditional bitset.
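As an illustration, a successor lookup might be sketched like the following. This is not part of the class above: for clarity it uses a plain boolean[] heap-style tree (tree[1] is the root, leaves start at index base) instead of the packed bit array, and the class and method names are made up.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: find the smallest element strictly greater than v.
// tree[i] == true means node i is "red"; leaves start at index `base`.
class BitTreeSuccessor {
    static int successor(boolean[] tree, int base, int v) {
        long cur = v + base;
        // Walk up until we are a left child whose right sibling contains a red leaf.
        while (cur > 1) {
            if ((cur & 1) == 0 && tree[(int) cur + 1]) {
                cur++;                // step into the right sibling...
                while (cur < base) {
                    cur <<= 1;        // ...and descend to its smallest red leaf
                    if (!tree[(int) cur]) cur++;
                }
                return (int) (cur - base);
            }
            cur >>= 1;
        }
        throw new NoSuchElementException("no element greater than " + v);
    }
}
```

With a successor query, iterating over all elements in sorted order costs O(k log n) for k elements, instead of scanning all n bits.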

Skip List in LevelDB

In an earlier post, we presented a skip list implementation in C++.

I am going to go through a skip list implementation in real-world software: LevelDB.

What is LevelDB

A key-value store library open-sourced by Google. It borrowed ideas from BigTable.

LevelDB is currently used in many software projects, such as the Chrome browser.

Project page: http://leveldb.org

Github: https://github.com/google/leveldb

How is Skip List used in LevelDB

The skip list is used as a buffer that tracks recent writes. The buffer efficiently serves reads for keys that were recently touched. When the buffer is full, its content is flushed into an SSTable (sorted string table) for persistence.

Why Not Hash Table

A hash table is good for serving reads, but not great for flushing data into an SSTable. As its name indicates, SSTable content is sorted byte strings, which is good for looking things up with binary search. Flushing buffer content to an SSTable requires traversing the buffer in sorted order, which is hard for a hash table but easy for a skip list.

Why Not Binary Search Tree

A binary search tree (BST) is a good data structure that supports efficient lookup and efficient traversal in sorted order. I think the biggest advantage of a skip list over a BST is its concurrency properties: a skip list supports concurrent reads while a write is ongoing (though writes have to be sequential). I may create a separate post to go into more detail.

Code Structure

There are two key sub-classes: Node and Iterator.

Node Code

// Implementation details follow
template<typename Key, class Comparator>
struct SkipList<Key,Comparator>::Node {
  explicit Node(const Key& k) : key(k) { }

  Key const key;

  // Accessors/mutators for links.  Wrapped in methods so we can
  // add the appropriate barriers as necessary.
  Node* Next(int n) {
    assert(n >= 0);
    // Use an 'acquire load' so that we observe a fully initialized
    // version of the returned Node.
    return reinterpret_cast<Node*>(next_[n].Acquire_Load());
  }
  void SetNext(int n, Node* x) {
    assert(n >= 0);
    // Use a 'release store' so that anybody who reads through this
    // pointer observes a fully initialized version of the inserted node.
    next_[n].Release_Store(x);
  }

  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n) {
    assert(n >= 0);
    return reinterpret_cast<Node*>(next_[n].NoBarrier_Load());
  }
  void NoBarrier_SetNext(int n, Node* x) {
    assert(n >= 0);
    next_[n].NoBarrier_Store(x);
  }

 private:
  // Array of length equal to the node height.  next_[0] is lowest level link.
  port::AtomicPointer next_[1];
};

What is AtomicPointer

A pointer object that supports atomic reads and writes. Each atomic read/write is guarded by a memory barrier, which ensures that changes become visible to other threads in the correct order.

What’s the difference between AtomicPointer and volatile pointer

My guess is that AtomicPointer supports reads and writes both with and without memory barriers, which is more flexible and more efficient when a barrier is unnecessary.

Why next_ has length 1

Nodes are constructed using factory method.

template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node*
SkipList<Key,Comparator>::NewNode(const Key& key, int height) {
  char* mem = arena_->AllocateAligned(
      sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
  return new (mem) Node(key);
}

This was my first time seeing this syntax (placement new), but I think it creates a Node object with next_ pointing to an array of length height. This hack avoids the space overhead of vector or other dynamic arrays, which is neat.

Iterator Code

class Iterator {
  public:
   // Initialize an iterator over the specified list.
   // The returned iterator is not valid.
   explicit Iterator(const SkipList* list);

   // Returns true iff the iterator is positioned at a valid node.
   bool Valid() const;

   // Returns the key at the current position.
   // REQUIRES: Valid()
   const Key& key() const;

   // Advances to the next position.
   // REQUIRES: Valid()
   void Next();

   // Advances to the previous position.
   // REQUIRES: Valid()
   void Prev();

   // Advance to the first entry with a key >= target
   void Seek(const Key& target);

   // Position at the first entry in list.
   // Final state of iterator is Valid() iff list is not empty.
   void SeekToFirst();

   // Position at the last entry in list.
   // Final state of iterator is Valid() iff list is not empty.
   void SeekToLast();

  private:
   const SkipList* list_;
   Node* node_;
   // Intentionally copyable
 };

This iterator is different from a traditional iterator in C++ or Java because it supports seeking to a key.

Difference between mine and LevelDB Skip List implementation

Template and comparator

The biggest difference is probably that LevelDB uses a template and a comparator to make the code work with any key type.

Random height

template<typename Key, class Comparator>
int SkipList<Key,Comparator>::RandomHeight() {
  // Increase height with probability 1 in kBranching
  static const unsigned int kBranching = 4;
  int height = 1;
  while (height < kMaxHeight && ((rnd_.Next() % kBranching) == 0)) {
    height++;
  }
  assert(height > 0);
  assert(height <= kMaxHeight);
  return height;
}
int randomLevel(){
  int res=1;
  int r=(rand()<<16)+rand();
  while(res<maxLevel && (r&1)){
    res++;
    r>>=1;
  }
  return res;
}

My code generates height 1 with probability 1/2, height 2 with 1/4, height 3 with 1/8, and so on.

LevelDB chose a different distribution: height 1 with probability 3/4, height 2 with (1/4)(3/4), height 3 with (1/4)^2(3/4), and so on.

This means the LevelDB skip list is more “flat”, and probably less efficient for lookups. However, it stores fewer pointers overall (about 4/3 pointers per node in expectation, versus 2 for p = 1/2), and is more space efficient than my code.

The LevelDB implementation calls the random number generator once per level, which is less efficient than my version, which draws random bits once and consumes one bit per level.
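For the p = 1/2 distribution, one way to avoid repeated generator calls is to draw a single random word and count its trailing one-bits, each of which is set with probability 1/2. This is a hypothetical sketch, not code from either implementation; MAX_HEIGHT is an arbitrary cap.

```java
import java.util.Random;

// Sketch: derive a geometric(1/2) height from one random draw
// instead of calling the generator once per level.
class RandomHeight {
    static final int MAX_HEIGHT = 12;       // illustrative cap
    static final Random rnd = new Random();

    static int randomHeight() {
        int r = rnd.nextInt();
        // Each trailing one-bit of r continues to the next level with probability 1/2.
        int h = 1 + Integer.numberOfTrailingZeros(~r);
        return Math.min(h, MAX_HEIGHT);
    }
}
```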

Search

template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindGreaterOrEqual(const Key& key, Node** prev)
    const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    Node* next = x->Next(level);
    if (KeyIsAfterNode(key, next)) {
      // Keep searching in this list
      x = next;
    } else {
      if (prev != nullptr) prev[level] = x;
      if (level == 0) {
        return next;
      } else {
        // Switch to next list
        level--;
      }
    }
  }
}
Node* search(const T& x,vector<Node*>* update=0){
  Node* cur=head;
  for(int i=level-1;i>=0;i--){
    while(cur->next[i]!=tail &&
        cur->next[i]->value < x)
      cur=cur->next[i];
    if(update)update->at(i)=cur;
  }
  cur=cur->next[0];
  if(cur==tail || cur->value != x)
    cur = NULL;
  return cur;
}

There is really not much difference, except that the LevelDB version returns the first element that is greater than or equal to the given key.

Insert

template<typename Key, class Comparator>
void SkipList<Key,Comparator>::Insert(const Key& key) {
  // TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
  // here since Insert() is externally synchronized.
  Node* prev[kMaxHeight];
  Node* x = FindGreaterOrEqual(key, prev);

  // Our data structure does not allow duplicate insertion
  assert(x == nullptr || !Equal(key, x->key));

  int height = RandomHeight();
  if (height > GetMaxHeight()) {
    for (int i = GetMaxHeight(); i < height; i++) {
      prev[i] = head_;
    }
    //fprintf(stderr, "Change height from %d to %d\n", max_height_, height);

    // It is ok to mutate max_height_ without any synchronization
    // with concurrent readers.  A concurrent reader that observes
    // the new value of max_height_ will see either the old value of
    // new level pointers from head_ (nullptr), or a new value set in
    // the loop below.  In the former case the reader will
    // immediately drop to the next level since nullptr sorts after all
    // keys.  In the latter case the reader will use the new node.
    max_height_.NoBarrier_Store(reinterpret_cast<void*>(height));
  }

  x = NewNode(key, height);
  for (int i = 0; i < height; i++) {
    // NoBarrier_SetNext() suffices since we will add a barrier when
    // we publish a pointer to "x" in prev[i].
    x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
    prev[i]->SetNext(i, x);
  }
}
void insert(const T& x){
  vector<Node*> update(level);
  Node* cur=search(x,&update);
  if(cur){
    cur->value=x;
  }else{
    Node* newNode=new Node(x);
    int v=randomLevel();
    if(v > level){
      level=v;
      update.resize(level,head);
    }
    newNode->next.resize(v);
    for(int i=0;i<v;i++) {
      newNode->next[i]=update[i]->next[i];
      update[i]->next[i]=newNode;
    }
  }
}

There is not much difference, except that the LevelDB version uses atomic pointer methods to perform reads/writes with/without memory barriers to make concurrent reads safe.

How Skip List is used in LevelDB

A MemTable abstraction layer is built on top of the skip list and exposes a key-value store interface. The skip list stores concatenated key-value byte strings (tagged by length), which guarantees that MemTable data is sorted by key.

class MemTableIterator: public Iterator {
 public:
  explicit MemTableIterator(MemTable::Table* table) : iter_(table) { }

  virtual bool Valid() const { return iter_.Valid(); }
  virtual void Seek(const Slice& k) { iter_.Seek(EncodeKey(&tmp_, k)); }
  virtual void SeekToFirst() { iter_.SeekToFirst(); }
  virtual void SeekToLast() { iter_.SeekToLast(); }
  virtual void Next() { iter_.Next(); }
  virtual void Prev() { iter_.Prev(); }
  virtual Slice key() const { return GetLengthPrefixedSlice(iter_.key()); }
  virtual Slice value() const {
    Slice key_slice = GetLengthPrefixedSlice(iter_.key());
    return GetLengthPrefixedSlice(key_slice.data() + key_slice.size());
  }
  .....
}
void MemTable::Add(SequenceNumber s, ValueType type,
                   const Slice& key,
                   const Slice& value) {
  // Format of an entry is concatenation of:
  //  key_size     : varint32 of internal_key.size()
  //  key bytes    : char[internal_key.size()]
  //  value_size   : varint32 of value.size()
  //  value bytes  : char[value.size()]
  size_t key_size = key.size();
  size_t val_size = value.size();
  size_t internal_key_size = key_size + 8;
  const size_t encoded_len =
      VarintLength(internal_key_size) + internal_key_size +
      VarintLength(val_size) + val_size;
  char* buf = arena_.Allocate(encoded_len);
  char* p = EncodeVarint32(buf, internal_key_size);
  memcpy(p, key.data(), key_size);
  p += key_size;
  EncodeFixed64(p, (s << 8) | type);
  p += 8;
  p = EncodeVarint32(p, val_size);
  memcpy(p, value.data(), val_size);
  assert((p + val_size) - buf == encoded_len);
  table_.Insert(buf);
}
bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
  Slice memkey = key.memtable_key();
  Table::Iterator iter(&table_);
  iter.Seek(memkey.data());
  if (iter.Valid()) {
    // entry format is:
    //    klength  varint32
    //    userkey  char[klength]
    //    tag      uint64
    //    vlength  varint32
    //    value    char[vlength]
    // Check that it belongs to same user key.  We do not check the
    // sequence number since the Seek() call above should have skipped
    // all entries with overly large sequence numbers.
    const char* entry = iter.key();
    uint32_t key_length;
    const char* key_ptr = GetVarint32Ptr(entry, entry+5, &key_length);
    if (comparator_.comparator.user_comparator()->Compare(
            Slice(key_ptr, key_length - 8),
            key.user_key()) == 0) {
      // Correct user key
      const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
      switch (static_cast<ValueType>(tag & 0xff)) {
        case kTypeValue: {
          Slice v = GetLengthPrefixedSlice(key_ptr + key_length);
          value->assign(v.data(), v.size());
          return true;
        }
        case kTypeDeletion:
          *s = Status::NotFound(Slice());
          return true;
      }
    }
  }
  return false;
}

The MemTable serves as a “buffer” that stores recent writes.

Flushing the buffer to disk is done by traversing the MemTable with its iterator.

Status BuildTable(const std::string& dbname,
                  Env* env,
                  const Options& options,
                  TableCache* table_cache,
                  Iterator* iter,
                  FileMetaData* meta) {
...
  iter->SeekToFirst();

  std::string fname = TableFileName(dbname, meta->number);
  if (iter->Valid()) {
    WritableFile* file;
    s = env->NewWritableFile(fname, &file);
    if (!s.ok()) {
      return s;
    }

    TableBuilder* builder = new TableBuilder(options, file);
    meta->smallest.DecodeFrom(iter->key());
    for (; iter->Valid(); iter->Next()) {
      Slice key = iter->key();
      meta->largest.DecodeFrom(key);
      builder->Add(key, iter->value());
    }
....
  }
}

A DB read is implemented by reading from the memtable first. If the key is not found there, it reads from the SSTable files.

Status DBImpl::Get(const ReadOptions& options,
                   const Slice& key,
                   std::string* value) {
....
  MemTable* mem = mem_;
  MemTable* imm = imm_;
....
    // First look in the memtable, then in the immutable memtable (if any).
    LookupKey lkey(key, snapshot);
    if (mem->Get(lkey, value, &s)) {
      // Done
    } else if (imm != NULL && imm->Get(lkey, value, &s)) {
      // Done
    } else {
      s = current->Get(options, lkey, value, &stats);
      have_stat_update = true;
    }
....
}

Note that there are two memtables: mem_ and imm_.

mem_ is the buffer we talked about.

imm_ is the memtable being flushed to disk, if any. It allows writes to happen while that memtable is being flushed to disk.

Inorder traversal in constant space – Morris algorithm explained

The well-known recursive binary tree inorder traversal needs O(H) space, where H is the height of the tree.

static void inorder(Node root, List<Integer> elements) {
    if (root == null) {
        return;
    }
    inorder(root.left, elements);
    elements.add(root.value);
    inorder(root.right, elements);
}

It is not hard to figure out an iterative implementation by simulating the backtracking behavior with an explicit stack, which still needs O(H) space.
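For completeness, the stack-based version could look like the following sketch (a minimal Node class is included to keep it self-contained):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class InorderIterative {
    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    // Simulate the recursion with an explicit stack: push the whole left
    // spine, then visit the top and switch to its right subtree. Still O(H) space.
    static List<Integer> inorder(Node root) {
        List<Integer> elements = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node node = root;
        while (node != null || !stack.isEmpty()) {
            while (node != null) {
                stack.push(node);
                node = node.left;
            }
            node = stack.pop();
            elements.add(node.value);
            node = node.right;
        }
        return elements;
    }
}
```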

If parent pointers are present, they naturally form a stack structure, which makes backtracking feasible in O(1) space.

I recently learnt about the Morris algorithm, which does inorder traversal in O(1) extra space without using parent pointers. The source code follows.

static void inorder(Node node, List<Integer> elements) {
    while (node != null) {
        if (node.left != null) {
            Node t = node.left;
            while (t.right != null && t.right != node) {
                t = t.right;
            }
            if (t.right == node) {
                // visiting back
                elements.add(node.value);
                t.right = null;
                node = node.right;
            } else {
                t.right = node;
                node = node.left;
            }
        } else {
            elements.add(node.value);
            node = node.right;
        }
    }
}
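As a quick sanity check, the method above can be exercised on a small tree. The snippet below redeclares a minimal Node class so it is self-contained; the traversal body mirrors the listing above.

```java
import java.util.ArrayList;
import java.util.List;

class MorrisDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    // Same logic as the listing above.
    static void inorder(Node node, List<Integer> elements) {
        while (node != null) {
            if (node.left != null) {
                Node t = node.left;
                while (t.right != null && t.right != node) {
                    t = t.right;
                }
                if (t.right == node) {
                    elements.add(node.value); // visiting back
                    t.right = null;           // revert the mutation
                    node = node.right;
                } else {
                    t.right = node;           // thread the predecessor to the current node
                    node = node.left;
                }
            } else {
                elements.add(node.value);
                node = node.right;
            }
        }
    }
}
```

Running it on a tree with root 4, left child 2 (with leaves 1 and 3), and right child 5 produces [1, 2, 3, 4, 5], and all right pointers are back to their original values afterwards.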
Why mutate pointers?

The Morris algorithm uses some null pointers (only right null pointers) in the tree as locations to store temporary data. All mutations are guaranteed to be reverted when the traversal finishes.

It is kinda cheating to claim it uses O(1) extra space, but it is still an impressive algorithm.

why does it work?

The contract of the algorithm is intuitive: it takes a binary tree and performs inorder traversal on it, with all mutations reverted.

If the left subtree is empty, it is trivial: visit the root and start over on the right subtree.

The algorithm will inorder-traverse the right subtree by contract.

Let’s focus on the case where the left subtree is not empty.

Find the predecessor (the red node) of the root and set its right pointer to the root. Start over on the left subtree.

Now from the left subtree perspective, the tree looks like following.

Let’s ignore the weird upward pointer for now. Some things we know for sure:

  • the blue node will be visited after the red node when traversing the tree above.
  • if we try to find the predecessor of the blue node, we are going to hit the blue node itself again in the process.

These facts give us the opportunity to revert the mutation after traversing the left subtree:

When finding the predecessor of the root in the previous step, if we find some node whose right pointer points at the root, set its right pointer to null. Now the tree looks the same as at the beginning.

Visit the root and start over on the right subtree.

The same algorithm will deal with the right subtree for you.

Note that the mutation in the red step is reverted in the green step. Given that the left and right subtrees are handled by the same algorithm, which reverts all mutations by contract, all mutations are guaranteed to be reverted after the algorithm finishes.

When an iteration starts, it might either do or undo a mutation. The way to tell which is to look at the right pointer of the max node in the left subtree.

The key idea is that the algorithm performs a specific mutation that allows “backtracking” to exactly the right node after finishing the traversal of a subtree.

A full example follows.

 

Why O(n) time?

It is clear that everything except the inner while loop (finding the last node in the left subtree) sums up to O(n).

How long does it take to perform “find the last node in the left subtree” across all nodes?

It is not hard to observe that each edge in the tree is visited at most twice in this process (once when creating the thread, once when removing it), so O(n) is the answer.

Therefore, overall time complexity is still O(n).

 

Beautiful Scheme – Macros

Macros are not covered in SICP. I didn’t know how to use macros in Scheme until I read Peter Norvig’s JScheme code. Only then did I start to understand why “symbol” and “quote” are important to Scheme.

Scheme macros are similar to C macros: both transform one pattern into another. Scheme macros are more powerful because expansion/transformation happens at run time, and the input/output patterns are structured Scheme data that represents Scheme code.

how are macros used?

Let’s look at an example. We know that a let-expression can be replaced by an equivalent lambda application.

(let ((x 1)(y 2)) (+ x y))
; is the same as 
((lambda (x y) (+ x y)) 1 2)

That transformation can easily be defined as a macro.

(define my-let 
          (macro (bindings . body)
                 (append (list (append 
                                (list 'lambda (map car bindings))
                                body))
                         (map cadr bindings))))

What the code does is basically transform input like (my-let ((x 1)(y 2)) (+ x y)) into ((lambda (x y) (+ x y)) 1 2).

A macro is very similar to a function, except that its parameters are quoted code, and it returns quoted code as well.

When the Scheme interpreter sees a macro application, it will

  1. expand macro by applying macro body with parameter bound to quoted parameter expressions (without evaluating them).
  2. evaluate result of macro expansion.
why are macros important to scheme?

One thing I like about macros is that they make a Scheme interpreter/compiler much easier to implement.

To write a Scheme interpreter, you only need to implement a few core features (begin, define, lambda/macro, function call, quote, if expressions, built-in functions). The rest of the language, like let, cond, and, or, can all be implemented as macros. JScheme takes this approach: its source code contains a file that defines all these language features with macros. These definitions are reusable when writing a Scheme interpreter/compiler in the future.

It is amazing that adding a language feature can make the language easier to implement.

What are quasiquote, unquote and unquote-splicing?

Quasiquote is the same as quote, except that you can “unquote” part of the expression.

(quote (1 (+ 2 3)))                
; usually written as '(1 (+ 2 3))
; => (1 (+ 2 3))

(quasiquote (1 (unquote (+ 2 3)))) 
; usually written as `(1 ,(+ 2 3))
; => (1 5)

Unquoted code is evaluated instead of being treated as data.

The best way to explain unquote-splicing is by example.

(quasiquote (1 (unquote-splicing (append (list 2) (list 3))) 4 ))
; usually written as `(1 ,@(append (list 2) (list 3)) 4)
; => (1 2 3 4)

; Compare to unquote
`(1 ,(append (list 2) (list 3)) 4)
; => (1 (2 3) 4)

These are utilities that make macros much easier to write.

“my-let” above can be written as

(define my-let 
          (macro (bindings . body)
                 `((lambda ,(map car bindings) . ,body)
                   ,@(map cadr bindings))))

It might be brain-twisting when looking at these for the first time, but it will start to make more sense once you start writing more macros.

Surprisingly, quasiquote, unquote and unquote-splicing can be implemented with macros. The following implementation was written by Darius Bacon, and I copied it from the JScheme code.

(define quasiquote 
  (macro (x) 
    (define (constant? exp)
      (if (pair? exp) (eq? (car exp) 'quote) (not (symbol? exp))))
    (define (combine-skeletons left right exp)
      (cond
       ((and (constant? left) (constant? right)) 
  (if (and (eqv? (eval left) (car exp))
     (eqv? (eval right) (cdr exp)))
      (list 'quote exp)
      (list 'quote (cons (eval left) (eval right)))))
       ((null? right) (list 'list left))
       ((and (pair? right) (eq? (car right) 'list))
  (cons 'list (cons left (cdr right))))
       (else (list 'cons left right))))
    (define (expand-quasiquote exp nesting)
      (cond
       ((vector? exp)
  (list 'apply 'vector (expand-quasiquote (vector->list exp) nesting)))
       ((not (pair? exp)) 
  (if (constant? exp) exp (list 'quote exp)))
       ((and (eq? (car exp) 'unquote) (= (length exp) 2))
  (if (= nesting 0)
      (second exp)
      (combine-skeletons ''unquote 
             (expand-quasiquote (cdr exp) (- nesting 1))
             exp)))
       ((and (eq? (car exp) 'quasiquote) (= (length exp) 2))
  (combine-skeletons ''quasiquote 
         (expand-quasiquote (cdr exp) (+ nesting 1))
         exp))
       ((and (pair? (car exp))
       (eq? (caar exp) 'unquote-splicing)
       (= (length (car exp)) 2))
  (if (= nesting 0)
      (list 'append (second (first exp))
      (expand-quasiquote (cdr exp) nesting))
      (combine-skeletons (expand-quasiquote (car exp) (- nesting 1))
             (expand-quasiquote (cdr exp) nesting)
             exp)))
       (else (combine-skeletons (expand-quasiquote (car exp) nesting)
        (expand-quasiquote (cdr exp) nesting)
        exp))))
    (expand-quasiquote x 0)))

To be honest, I still don’t fully understand the code, but I did some tests and was convinced it works.

What is a hygienic macro?

Macros are powerful, but there is an issue when they are used in practice. Consider the following example:

(define plus-one
          (macro (x)
                 `(let ((one 1))
                    (+ ,x one))))

(plus-one (begin (set! one 2) 1))
; (let ((one 1)) (+ (begin (set! one 2) 1) one))

User code might modify variables defined in the macro body and cause unexpected behavior. Hygienic macros address this problem by dynamically replacing variables defined in the macro body with unique identifiers, so that users don’t have to worry about naming collisions.

R5RS Scheme has the syntax-rules syntax, which supports hygienic macros and pattern matching. There are many resources on the web about syntax-rules, like this, and I still don’t understand its pattern matching mechanism well. But, surprisingly again, syntax-rules can be implemented with regular macros as well! I found such an implementation on the JScheme home page.

What else can macros do?

It is possible to extend Scheme language features with macros.

One example is stream (infinite list) processing. Scheme does not natively support stream processing, but it is easy to add streaming support using macros.

(define stream-cons (macro (a b) `(cons ,a (lambda () ,b))))
(define stream-car car)
(define stream-cdr (lambda (s) ((cdr s))))
(define stream-null? null?)

Here is a program that creates an infinite stream of prime numbers using the Sieve of Eratosthenes.

(define (stream-map f s)
  (if (stream-null? s)
      '()
      (stream-cons (f (stream-car s)) (stream-map f (stream-cdr s)))))

(define (stream-filter f s)
  (cond
    ((stream-null? s) '())
    ((f (stream-car s)) (stream-cons (stream-car s) (stream-filter f (stream-cdr s))))
    (else (stream-filter f (stream-cdr s)))))

(define ones (stream-cons 1 ones))

(define ints (stream-cons 1 (stream-map (lambda (x) (+ x 1)) ints)))

(define evens (stream-filter (lambda (x) (= 0 (modulo x 2))) ints))

(define (sieve s)
  (let ((fst (stream-car s)))
    (stream-cons fst
                 (sieve (stream-filter (lambda (x) (not (= 0 (modulo x fst)))) (stream-cdr s))))))

(define prime (sieve (stream-cdr ints)))

Macros are such a simple yet powerful feature that they make Scheme a beautiful language. I am still learning and being surprised by what macros can do.

Beautiful Scheme – quote

Scheme is a beautiful language.

— prof Ben Leong

Recently I read some Scheme articles, and I find myself starting to appreciate its beauty more. I am going to start a series of posts to share my thoughts and the problems I had while learning.

Code is data

Any technology sufficiently advanced is indistinguishable from magic.

— Arthur C. Clarke

A programming language interpreter is magic to beginners.

However, writing a Scheme interpreter in Scheme is surprisingly simple. There are many reasons, but the most important one is probably the fact that Scheme code can be naturally represented as a Scheme list.

The boundary between code and data has been blurring since the Von Neumann architecture, and many programming languages like JavaScript support evaluating code in string form at runtime. Scheme goes one step further: any Scheme code can be converted to a Scheme data value with the “quote” function.

The “quote” function confused me a lot when I learnt Scheme. I listed some questions I had, and my answers, below.

why is it called “quote”?

Think about how you use JavaScript’s eval(). You write some JS code, quote it into a string, and call eval on the string. It is the same for Scheme, except that quoted Scheme code is structured data, which is much easier to process.

what does “quote” exactly do?

It converts Scheme code into data.

code         data
'123         [integer] 123
'#t          [boolean] #t
'"hello"     [string] "hello"
'define      [symbol] define
'(1 (2 3))   [list] ([integer] 1, [list]([integer] 2, [integer] 3))
''(1 2)      [list] ([symbol] quote, [list]([integer] 1, [integer] 2))

''(1 2) is actually (quote (quote (1 2))), which converts (quote (1 2)) into data.

Unlike "" in JavaScript, you don’t need to escape “quote” inside your code because there is no ambiguity.

why does scheme need “quote”?

Scheme without “quote” is still Turing complete. I think the main reason for introducing “quote” is macros, which make Scheme flexible and extensible.

why do we need symbol?

A symbol seems to be the same as an interned string. However, we cannot replace symbols with strings because '"define" and 'define would not be distinguishable.

To make 'define meaningful, a new first-class type has to be introduced, and that is the symbol. Note that introducing the symbol data type does not add any syntax to Scheme: 'define is just syntactic sugar for (quote define).

This may sound hacky, but it is worth it because it gives Scheme the power to process code the same way it processes lists. We will see more about this in another post about macros.

 

Fancy-coder on raspberry pi

I created this blog to share thoughts and learning. This blog is hosted on a poor Raspberry Pi that sits under my table, to save on electricity bills. Please bear with me if you find the site slow.

I imported some posts I wrote in school which I still find interesting. Source code in the existing posts is not pretty-printed for some unknown reason, and I am too lazy to fix it.

GCC Allows Arrays of Variable Length!!

This is very surprising. When I was learning C, I was convinced that array sizes must be determined at compile time, and I never wrote a line of code that violated this. Today, my friend (a C beginner) asked me to help him debug his code. His code contains the following lines:

[sourcecode lang="c"]
…..
int T, R, C, i, j;
…..
char tile[R][C];
…..
[/sourcecode]

To my great surprise, this code compiles with no error. According to this post and this page, C99 allows an array size to be a variable, and GCC supports this. This feature may free programmers from many new, delete, malloc and free calls, and from memory leaks, in many cases.

Javascript tips

I am reading Secrets of the JavaScript Ninja these days. The book introduces many JS tips/patterns, and I wish to summarize what I find useful in this post.

Force the “new” operator

JS is not that object oriented. In JS, any function acts as a constructor when it is called with the “new” operator, and the “this” variable in the function is bound to the newly constructed object automatically. This was considered a harmful pattern, because if I forget to add “new”, the function will not act as a constructor, and the “this” variable refers to the global object. However, this situation can be avoided by adding the “new” operator automatically when we forget. The following code does this:

[sourcecode language="javascript"]
function User(first, last){
  if ( !(this instanceof arguments.callee) )
    return new User(first, last);
  ....
}
[/sourcecode]

In this case, “User” cannot be used as a normal function.
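A quick check of the guard in action; the body fields here are my own illustration, and either call form yields a proper User object:

```javascript
function User(first, last){
  // Guard: if called without "new", call again correctly.
  if ( !(this instanceof User) )
    return new User(first, last);
  this.first = first;
  this.last = last;
}

var a = new User("Ada", "Lovelace"); // with "new"
var b = User("Ada", "Lovelace");     // "new" forgotten, still safe
// both a and b are User instances with the expected fields
```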

Get the function itself inside function execution

JS allows you to write code in a functional programming style. In other functional programming languages like Scheme, it is not easy to do recursion with a lambda function, because you don’t know how to call the function itself: it has no name. In JS, you can always call the current function through arguments.callee.

[sourcecode language="javascript"]
var factorial=function(n){return n==0?1:n*arguments.callee(n-1);}
[/sourcecode]
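Worth noting: arguments.callee is disallowed in ES5 strict mode. A named function expression achieves the same effect — the name is visible only inside the function body, so the lambda can still recurse by name:

```javascript
// "fact" names the function expression itself; it is not defined in the
// enclosing scope, so the function stays effectively anonymous outside.
var factorial = function fact(n){ return n == 0 ? 1 : n * fact(n - 1); };
```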

Use setTimeout to help break time-consuming scripts

If a script is running when a page is loaded, the page will only be displayed after the script finishes running. However, we can use setTimeout to break a long script into small pieces. When the script is constructing the page, page view updates are not done immediately. Instead, they are pushed into a queue, and everything in the queue is handled after the script block finishes running. The setTimeout call breaks the current script block, and lets the tasks in the queue have a chance to be handled.
[sourcecode language="javascript"]
bd=document.getElementsByTagName("body")[0];
tb=document.createElement("table");
bd.appendChild(tb);
for(var i=0;i<10000;i++){
  var tr=document.createElement("tr");
  var td=document.createElement("td");
  td.appendChild(document.createTextNode(i));
  tr.appendChild(td);
  td=document.createElement("td");
  td.appendChild(document.createTextNode(i*i));
  tr.appendChild(td);
  tb.appendChild(tr);
}
[/sourcecode]
[sourcecode language="javascript"]
bd=document.getElementsByTagName("body")[0];
tb=document.createElement("table");
bd.appendChild(tb);
var j=10;
(function(){
  for(var i=0;i<1000;i++){
    var tr=document.createElement("tr");
    var td=document.createElement("td");
    td.appendChild(document.createTextNode(i));
    tr.appendChild(td);
    td=document.createElement("td");
    td.appendChild(document.createTextNode(i*i));
    tr.appendChild(td);
    tb.appendChild(tr);
  }
  if(j--)setTimeout(arguments.callee,0);
})();
[/sourcecode]

Refer to groups in regular expressions

Like regular expressions in many other languages, JS uses “\1” to refer to the first matching group, “\2” for the second… However, this only works inside the regular expression itself. One application is the primality testing in my earlier post.
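For example (my own snippet), \1 inside the pattern can detect a word repeated twice in a row:

```javascript
// \1 must match exactly what the first group captured.
var doubled = /\b(\w+)\s+\1\b/;
var a = doubled.test("hello hello"); // true: "hello" repeated
var b = doubled.test("hello world"); // false: second word differs
```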

A JS string also has a “replace” method, which supports regular expression replacement. To refer to a matching group in the second argument of replace, we need to use “$1”, “$2” instead of “\1”, “\2”.

[sourcecode language="javascript"]
"hello world".replace(/(\w+)\s+(\w+)/,"$2 $1");
[/sourcecode]

Remove empty text nodes from HTML

When using the DOM to get HTML nodes, text nodes containing only white space are very annoying. The following code removes white-space text nodes using DFS.

[sourcecode language="javascript"]
function cleanWhitespace( element ) {
  // If no element is provided, do the whole HTML document
  element = element || document;
  // Use the first child as a starting point
  var cur = element.firstChild;
  // Go until there are no more child nodes
  while ( cur != null ) {
    // Grab the next sibling first: a removed node loses its siblings
    var next = cur.nextSibling;
    // If the node is a text node, and it contains nothing but whitespace
    if ( cur.nodeType == 3 && ! /\S/.test(cur.nodeValue) ) {
      // Remove the text node
      element.removeChild( cur );
    // Otherwise, if it is an element
    } else if ( cur.nodeType == 1 ) {
      // Recurse down through the document
      cleanWhitespace( cur );
    }
    cur = next; // Move through the child nodes
  }
}
[/sourcecode]

Get elements by class name

[sourcecode language="javascript"]
function hasClass(name,type) {
  var r = [];
  var e = document.getElementsByTagName(type || "*");
  for ( var j = 0; j < e.length; j++ )
    if ( e[j].classList.contains(name) )
      r.push( e[j] );
  return r;
}
[/sourcecode]

Expressive Javascript

When I was learning Scheme, I was told that Scheme is an expressive language, because it can implement higher-order functions, closures, and even OOP.

Now I am learning Javascript, and I find that JS can be even more expressive than Scheme. First, you can write C-style code in JS, which is difficult or impossible in Scheme. Moreover, JS has Scheme’s functional programming flavor: it can also implement higher-order functions, closures, and even OOP.

Let’s implement a classic Scheme example: print all permutations of a set of numbers.

[sourcecode language="javascript"]
cons=function(a,b){return {fst:a,rst:b};};
car=function(p){return p.fst;};
cdr=function(p){return p.rst;};
range=function(n){
  var res=null;
  for(var i=n;i>0;i--){
    res=cons(i,res);
  }
  return res;
};
list=function(){
  var arg=[].slice.call(arguments);
  if(arg.length==0)return null;
  return cons(arg[0],list.apply(undefined,arg.slice(1)));
};
print=function(l){
  var cur=l;
  document.write("(");
  while(cur){
    var v=car(cur);
    cur=cdr(cur);
    if(typeof v == "number")
      document.write(v+(cur?", ":""));
    else
      print(v);
  }
  document.write(")<br>");
};
map=function(f,l){
  return l==null?null:cons(f(car(l)),map(f,cdr(l)));
};
filter=function(f,l){
  if(l==null)return null;
  if(f(car(l)))return cons(car(l),filter(f,cdr(l)));
  return filter(f,cdr(l));
};
append=function(l1,l2){
  if(l1==null)return l2;
  return cons(car(l1),append(cdr(l1),l2));
};
flat=function(l){
  if(l==null)return null;
  if(typeof car(l) == 'number')return cons(car(l),flat(cdr(l)));
  return append(car(l),flat(cdr(l)));
};
length=function(l){
  return l==null?0:1+length(cdr(l));
};
perm=function(l){
  if(length(l)==1)return list(l);
  return flat(map(function(x){
    return map(function(rst){return cons(x,rst);},
               perm(filter(function(t){return t!=x;},l)));
  },l));
};
print(perm(list(1,2,3,4)));
[/sourcecode]

We start by simulating Scheme’s Pair and List, then implement operations on lists, and finally implement the “perm” operation. All implementations follow the Scheme style (recursion instead of loops).

An impressive feature of Scheme is its higher-order functions, which take functions as arguments or return functions. A higher-order function is a more general function, and a good example is memoization.

Memoization is one implementation of Dynamic Programming: it keeps the DP formula’s recursive style in the code, which is easy to understand, while avoiding repeated calculations by memoizing the result of each state.

All memoization code has a similar format:

if state in memo: 
   return memo[state]
if state is base case: 
   get result for state
else:
   calculate result using recursive formula
add result into memo
return result

Although all memoization code looks similar, we need to rewrite it from scratch every time in languages like C. However, with higher-order functions, memoization can be generalized into a template: we only pass the recursive formula and the base case into it, and the higher-order function handles the rest automatically.

[sourcecode language="javascript"]
var memo=function(base,dp){
  var f=[];
  return function(){
    var arg=[].slice.apply(arguments),res;
    if(f[arg]!=undefined)return f[arg];
    res=base.apply(undefined,arg);
    if(res!=undefined)return f[arg]=res;
    return f[arg]=dp.apply(undefined,arg);
  };
};
[/sourcecode]

The higher-order memoization can be implemented in a few lines of code in JS. The base argument is a function that returns an answer only for base-case parameters, and returns undefined otherwise. dp is a function that implements the DP formula using recursive calls.

Some interesting examples:
[sourcecode language="javascript"]
var fact=memo(function(n){if(n==0)return 1;},function(n){return fact(n-1)*n;});
var fib =memo(function(n){if(n<=1)return n;},function(n){return fib(n-1)+fib(n-2);});
var comb=memo(
  function(n,m){
    if(n<0 || m<0 || m>n)
      return 0;
    if(n==m || m==0)
      return 1;
  },
  function(n,m){
    return comb(n-1,m-1)+comb(n-1,m);
  });
[/sourcecode]
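A quick sanity run of these definitions, restating the memo helper from above so the snippet is self-contained (an object replaces the array as the cache; arrays are stringified into its keys):

```javascript
var memo=function(base,dp){
  var f={}; // cache keyed by the stringified argument list
  return function(){
    var arg=[].slice.apply(arguments),res;
    if(f[arg]!=undefined)return f[arg];
    res=base.apply(undefined,arg);
    if(res!=undefined)return f[arg]=res;
    return f[arg]=dp.apply(undefined,arg);
  };
};
var fact=memo(function(n){if(n==0)return 1;},function(n){return fact(n-1)*n;});
var comb=memo(
  function(n,m){
    if(n<0 || m<0 || m>n)return 0;
    if(n==m || m==0)return 1;
  },
  function(n,m){return comb(n-1,m-1)+comb(n-1,m);});
var f5=fact(5);    // 120
var c52=comb(5,2); // 10
```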

DP functions can also call each other:

[sourcecode language="javascript"]
var even=memo(function(n){if(n==0)return true;},function(n){return odd(n-1);})
var odd=memo(function(n){if(n==0)return false;},function(n){return even(n-1);})
[/sourcecode]

Javascript is also able to do “currying”. Currying is how Haskell handles functions: “+” is a function that takes 2 arguments and returns the sum; “1+” is a function that takes 1 argument x and returns x+1. That is to say, if you pass a function only part of its arguments in Haskell, you get back a function that takes the rest of the arguments and returns the result for the complete argument list.

[sourcecode language="javascript"]
Function.prototype.curry=function(){
  var args=[].slice.apply(arguments),f=this;
  return function(){
    return f.apply(this,args.concat([].slice.apply(arguments)));
  };
};
add=function(a,b){return a+b;};
addOne=add.curry(1);
[/sourcecode]
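And a quick check that partial application behaves as expected (restating curry so the snippet is self-contained, with a three-argument function of my own as the example):

```javascript
Function.prototype.curry=function(){
  var args=[].slice.apply(arguments),f=this;
  return function(){
    // prepend the pre-supplied arguments to the new ones
    return f.apply(this,args.concat([].slice.apply(arguments)));
  };
};
var add3=function(a,b,c){return a+b+c;};
var addOne=add3.curry(1);       // still waits for two more arguments
var six=addOne(2,3);            // 1+2+3 = 6
var alsoSix=add3.curry(1,2)(3); // any prefix of the arguments works
```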

Recursion to Loop

Sometimes we need to convert recursive functions into loops for efficiency considerations or stack size constraints. Converting tail recursion into iteration is trivial. The method we discuss in this post is a general method to convert any recursive function into an iterative one.

The idea is to simulate stack behavior: we push a frame record as a structure onto a simulated stack when calling a function, and restore the top record of the stack when a function returns.

The record should contain two parts of data:

  • local variables, including parameters
  • the next instruction to execute (used when restoring the record)

We also need a global variable to store the return value (playing the same role as the eax register).

The “next instruction to execute” can be stored using a label, and we can “goto” the corresponding instruction after restoring the record. However, the “goto” statement is not supported in some languages, and is also not recommended. Thus, we use an integer to represent the label id, and use a switch statement to jump to different parts of the code.

The following example shows the recursive and iterative versions of the factorial function. The iterative version is significantly longer in code length, and slightly slower in practice. However, the iterative version can work on large inputs where the recursive version may get a stack overflow.

We select javascript to demonstrate because:

  • Dynamic typing saves declaration code
  • Closures can simulate the behavior of a frame record perfectly (stack elements are actually functions)
  • Powerful literal value representation

 

var fact=function(n){
  return n==0?1:n*fact(n-1);
};
var Fact=function(n){
  var _ret;
  var stk=[];
  var construct=function(n){
    var l=0;
    return function(){
      switch(l){
        case 0:
          l=1;
          if(n==0){
            _ret=1;
            stk.pop();
            return;
          }
          stk.push(construct(n-1));
          return;
        case 1:
          _ret*=n;
          stk.pop();
          return;
      }
    };
  };
  stk.push(construct(n));
  while(true){
    if(stk.length==0)return _ret;
    stk[stk.length-1]();
  }
};

 

Now let’s look at more complicated examples:

Fibonacci number
var Fib=function(n){
  var _ret;
  var stk=[];
  var construct=function(n){
    var l=0,a,b;
    return function(){
      switch(l){
        case 0:
          l=1;
          if(n<=1){
            _ret=n;
            stk.pop();
            return;
          }
          stk.push(construct(n-1));
          return;
        case 1:
          l=2;
          a=_ret;
          stk.push(construct(n-2));
          return;
        case 2:
          b=_ret;
          _ret=a+b;
          stk.pop();
          return;
      }
    };
  };
  stk.push(construct(n));
  while(true){
    if(stk.length==0)return _ret;
    stk[stk.length-1]();
  }
}

 

Hanoi tower
var Hanoi=function(a,b,c,n){
  var stk=[];
  var construct=function(a,b,c,n){
    var l=0;
    return function(){
      switch(l){
        case 0:
          l=1;
          if(n>0){
            stk.push(construct(a,c,b,n-1));
            return;
          }
          stk.pop();
          return;
        case 1:
          l=2;
          document.write(a+"->"+c+"<br>");
          stk.push(construct(b,a,c,n-1));
          return;
        case 2:
          stk.pop();
          return;
      }
    };
  };
  stk.push(construct(a,b,c,n));
  while(true){
    if(stk.length==0)return;
    stk[stk.length-1]();
  }
};
Merge sort
var MergeSort=function(a,i,j){
  if(!i)i=0;
  if(!j)j=a.length;
  var stk=[];
  var construct=function(a,i,j){
    var l=0;
    var mid,t=[],x,y;
    return function(){
      switch(l){
        case 0:
          l++;
          if(j-i<=1){
            stk.pop();
            return;
          }
          mid=Math.floor((i+j)/2);
          stk.push(construct(a,i,mid));
          return;
        case 1:
          l++;
          stk.push(construct(a,mid,j));
          return;
        case 2:
          x=i, y=mid;
          while(x<mid && y<j){
            if(a[x]<a[y]){
              t.push(a[x]);
              x++;
            }else{
              t.push(a[y]);
              y++;
            }
          }
          while(x<mid)t.push(a[x++]);
          while(y<j)t.push(a[y++]);
          for(var k=i;k<j;k++)
            a[k]=t[k-i];
          stk.pop();
          return;
      }
    };
  };
  stk.push(construct(a,i,j));
  while(true){
    if(stk.length==0)return;
    stk[stk.length-1]();
  }
};