Example for TreeList and BinaryIndexedTree

We mentioned BinaryIndexedTree and TreeList in previous post. Now I will give an example problem where both data structure are needed.

Given array a[0…n-1], which stores a permutation of 0 to n-1. We define array b[0…n-1] using following equation:

b[i]=|{x|x occurs before i in a[], and x > i}|

The problem is: given a[], compute b[] in O(nlogn) time; given b[], compute a[] in O(nlogn) time.

To convert a[] into b[], assume we have an array t[0…n](initially all 0). We go through each elements in a[], and for each element a[i] we encounter, b[a[i]]=t[0]+…+t[a[i]], and then we set t[a[i]]=1. The correctness is obvious. Implementing this method naively gives O(n^2). Notice that binary indexed tree is designed for this kind of prefix sum operation:

[sourcecode language=”cpp”]
vector<int> a2b(vector<int> a){
vector<int> b(a.size());
BITWraper t(a.size(),0);
for (int i = 0; i < a.size(); i++) {
return b;

By default, binary indexed tree’s array index starts from 1. Thus, we implemented a wrapper class, which allows arbitrary starting index.

How to computing a[] given b[]? Let’s look at large elements first. Let’s first write down the largest element, for each elements i, there are b[i] elements greater than it. Since we write down elements in descending order, all b[i] elements must have been written down. Thus, we insert i into b[i] position. After all elements are inserted, a[] is constructed. Again, implementing using array or linked list will result in O(n^2) complexity, while TreeList only need O(nlogn) time to do all insertions.

[sourcecode language=”cpp”]
vector<int> b2a(vector<int> b){
TreeList<int> a;
for(int i=b.size()-1;i>=0;i–)
return a.toVector();

C code that print its own source

Very interesting. The trick is that use s to print itself, thus s should not contain any character like “\n”, “\”” which will be displayed differently as its literal value. Thus, we use numbers to represent those special characters.
[sourcecode language=”cpp”]
#include <stdio.h>
int main(){const char *s="#include<stdio.h>%cint main(){const char *s=%c%s%c;printf(s,10,34,s,34);}";printf(s,10,34,s,34);}


As we know, array data structure has O(1) access time and O(n) insertion/deletion(kth element); linked list need O(n) for all operations, and O(1) insertion/deletion only when node pointer is given.

We also know that binary search tree and skip-list provide O(log n) for accessing, insertion and deletion.

Now what if we want a data structure, which behaves like a list, which can insert/delete element at kth position efficiently?

One of my friend suggest “blocked linked list”, which is a combination of array and linked list. Assume the input data size N is known, we can use a list of arrays to store data. List has length sqrt(N), each array also has length sqrt(N). Insertion and deletion has complexity O(sqrt(N)), and array is split when length exceed sqrt(N), and adjacent blocks are merged when length smaller than sqrt(N). This data structure is very popular in high school programming contest, where data size is known in advance. However, O(sqrt(N)) is not good enough: we want O(log n) for all operations.

Another method is to use binary search tree: binary search tree with “size” field can find “kth” element in O(log n) time. Elements have a floating point value as key for comparison. When inserting kth element, we first look for element k-1 and element k, and take their mean as key for new element. Thus new element is inserted into kth position. However, this method is not that elegant: size of the list depends on floating point precision.

Finally, comes our “TreeList” data structure. I don’t think I am the first guy who invent it, and I don’t know whether there is any other name for such data structure. The TreeList support 3 operations:

  • insert element into kth position
  • remove element in kth position
  • get kth element

The implementation is based on treap. Actually, any binary search tree can be used. The complexity for all 3 operations are O(log n). Source code is shown below.

When do we need this data structure? Follow my latest post

template<class T>
class TreeList{
  // list structure, support 3 operations:
  // -insert element into kth position
  // -remove element into kth position
  // -get kth element
  // implemented using treap
  struct node{
    T value;
    int pri,size;
    node *left,*right;
    node(const T& v,node* l,node* r){
  void resize(node* cur){
  void left_rotate(node* &cur){
    node *res=cur->left;
  void right_rotate(node* &cur){
    node *res=cur->right;
  void remove_node(node* &cur){
    if(cur->left == null && cur->right == null){
    }else if(cur->left->pri < cur->right->pri){

  void insert_to_kth_pos(const T& x,node* &cur,int k){
    if(cur == null && k==0){
      cur = new node(x,null,null);
    }else if(k <= cur->left->size){
      if(cur->pri > cur->left->pri)
    }else if(k <= cur->size){
      if(cur->pri > cur->right->pri)
  void remove_kth_element(node* & cur,int k){
    if(k == cur->left->size){
    }else if(k < cur->left->size){
      if(cur->pri > cur->left->pri)
    }else if(k < cur->size){
      if(cur->pri > cur->right->pri)
  void print(node* cur){
    if(cur == null)return;
    cout<<cur->value<<" ";
  TreeList (){
    null=new node();
    null->pri = 0x7fffffff;
    null->size = 0;
  void print(){
  void insert(const T& x,int k){
  void remove(int k){
  T get(int k){
    node* cur=root;
      if(k < cur->left->size){
      }else if(k == cur->left->size){
    return cur->value;



Warning: this post is very technical. You may find it very hard to understand, or boring.

One of my friend asked me about this problem.

The description of this problem is simple: given a string pattern, find the expected position of first occurrence of such pattern in random strings.

The problem looks like dynamic programming, because there seems exists some recursive relations between its subproblems.

Let’s first assume there exists a random string generator, which generate random strings letter by letter. Left F[n] be the number of expected letters to be generated before the pattern occur, given that most recently generated n letters matches first n letters of the pattern. For example, assume the pattern is ABCDE, and the random string generator generates ….ABC, then F[3] would be the expected number of letters needed before ABCDE is generated.

We can consider “N-prefix” of the pattern as “state” for DP(described by N), and F[0] would be the answer we need: expected number of letters needed before pattern is observed, given no prefix match(random string generator hasn’t generated anything yet).

Now let’s see how one state transfer into another state. Given that most recent generated N letters match N-prefix of patter, if one more letters is generated, how would be the suffix-prefix match goes? In the best case, it will become N+1, if the newly generated letter matches the N+1th letter in the pattern. In other cases, the state would transfer to some number no larger than N, and that number can be found easily: find the largest prefix-suffix match between generated string and pattern.

Now the recursive relationship between states becomes clear: F[i] = 1+\sum_{c=’A’}^{‘Z’}F[next(i,c)]/26. Base case would be F[pattern.length]=0.
In the original problem, number of letters is a variable, rather than fixed 26. next(i,c) is state transition function.

Now comes another problem: this DP cannot be solved because states graph is not DAG, and no topological order can be found. Actually, this is not a DP problem. After listing out all equations of form F[i] = 1+\sum_{c=’A’}^{‘Z’}F[next(i,c)]/26, we can already solve the linear system: using Gaussian-Elimination.

By now, the problem is mostly solved. However, when I submit the code to the judge, I got “wrong answer”. I guess this is because of precision problem: I use double data type, since the expected length can be real number. However, the required output is integer, thus I cast the result into “long long” data type. Real number data type can store 15-16 significant digits. However, long long data type can represent numbers with almost 20 significant digits. Thus, in extreme case, converting double may lose necessary precision when the answer is very large.

I fixed this problem by using “long double” instead of “double”. To my best knowledge, “long double” is some system dependent data type in C. It has higher precision than normal double: on my machine(gcc), sizeof(long double) gives 12, and actually 80 bits are used to store data, 16 more bits are for padding. I am still not sure how many significant digits “long double” can have, but this problem shows that at least it is useful in some cases.

#include <vector>
#include <iostream>
#include <algorithm>
#include <string>
using namespace std;
const long double EPI=1e-6;
string s;
int N;
long double abs(long double a){return a>0?a:-a;}
int g(int i,char c){
  string t=s.substr(0,i)+c;
  for(int j=0;j<i+1;j++){
      return i+1-j;
  return 0;
void sub_from(vector<long double> & a,vector<long double> &b, long double k){
  // a=a-bk
  for(int i=0;i<a.size();i++)
vector<long double> solve_linear_system(vector<vector<long double> >& a){

  for(int i=0;i<a.size();i++){
    int j;
      if(abs(a[j][i]) > EPI)
  for(int i=a.size()-1;i>=0;i--){
    for(int j=i-1;j>=0;j--)
  vector<long double> ans;
  for(int i=0;i<a.size();i++)
  return ans;
long double solve(){
  int m=s.size();
  vector<vector<long double> > a;
  for(int i=0;i<m;i++){
    a.push_back(vector<long double>(m+2,0));
    for(int j=0;j<N;j++)
  a.push_back(vector<long double>(m+2,0));
  a[m][m]=1; //xm=0
  vector<long double> ans=solve_linear_system(a);
  return ans[0]; //x0
int main(int argc, const char *argv[])
  int times;
  for(int i=0;i<times;i++){
    cout<<"Case "<<i+1<<":"<<endl<<(long long)solve()<<endl;
  return 0;


My grep

Thanks to Robert Sedgewick(Princeton)’s lecture notes, I just implemented a ‘grep’ program, which perform regular expression matching.
The code is only about 100 lines long, with the help of c++ STL.
Matching is done by converting RE into NFA. The claimed complexity is O(NM), N is the length of string, M is the length of RE.

Now it only support very basic regular expression matching:

a-z,A-Z,0-9 literal meaning
. match any character
* repeat multiple times
() group regular expression
| logic or, must appear inside ()

Can do some improvement in the future:
Support RE like a|b, i.e. | without parentheses. This can be done by adding ( ) around the input.
Add support for RE like [a-z], [^a-z], [abc].(may need a parser for this)
Add support for meta characters like \d: may need an abstract character type, which can match certain set of characters.
To support ^ and $, I think we need another abstract type for input string characters.

GDB tips

gdb is quite a useful tool. Despite it’s text interface, it can do more than GUI debuggers. I am still learning it, and this post is used as a memo.

  • ptype is useful to print type, structure(class) definition. This is useful to check user defined data types(sometimes deeply defined in header files).
  • disassemble is useful to see assembly code of function/statement.
  • info registers shows registers information.
  • print a@10 shows 10 elements start from address of variable a.
  • x command examine memory content. x/8xw $eip displays 8 words since where eip register points to. x/i $eip
  • shows instruction eip register points to.

extern “C”

You may have seen following C++ code somewhere.

[sourcecode language=”cpp”]
extern "C"{

Why do we need the extern “C”?

The answer is available in this post. The idea is, C++ compiler treat c++ differently from C. A function void f(int i) in C code is translated into _f in assembly, while the same function in C++ code will be translated into __Z1fi(the i at the end means first parameter is i, but I am not sure what does __Z1 seems represent return type). Compiler determin whether the code is C or C++ by looking at file extension only(.c for c file, .cpp for c++ file). C++ code is treated differently because C++ language support function overloading. That is to say, void f(int n) and void f(char c) are two different function because they have different parameter type. Compiler implement this by using different names for these two functions: including parameter type into function name. This method works fine, but when you want to call a C function in your C++ code, and the C function is already compilede as C code, then you have trouble.

[sourcecode language=”cpp”]
// sqr.c
int sqr(int n){
return n*n;
[sourcecode language=”cpp”]
// main.c
int sqr(int n); //declaration
int main(){
return 0;
The code above works fine. However, if you rename “main.c” into “main.cpp”, then you have problem: compiler(actually linker) complains that “int sqrt(int) is not defined”. This is because, as a c++ function, “sqr” in “main.cpp” is translated into “__Z3sqri”, while as C code, “sqr” in “sqr.c” is translated into “_sqr”. Thus, linker cannot find the function “int sqrt(int)”. To solve this problem, we can use extern “C”{} to enclose sqr function declaration, so that the function name is translated into C style.
[sourcecode language=”cpp”]
// main.cpp
extern "C"{
int sqr(int n); //declaration
int main(){
return 0;

Please compile and link above code using g++ instead of gcc, or else you may get “undefined reference to `__gxx_personality_v0′” problem.

You may ask why functions declared in “stdio.h” works well for both C and C++ code? You can check out preprocessor’s output:

[sourcecode language=”cpp”]
//dummy c++ code
int main(){
return 0;
[sourcecode language=”cpp”]
extern "C" {
int __attribute__((__cdecl__)) fscanf (FILE *, const char *, …) __attribute__ ((__format__ (__scanf__, 2, 3)));

int __attribute__((__cdecl__)) printf (const char *, …) __attribute__ ((__format__ (__printf__, 1, 2)));

int __attribute__((__cdecl__)) scanf (const char *, …) __attribute__ ((__format__ (__scanf__, 1, 2)));

The output of preprocessor use extern “C” to enclose all standard C library function declaration. However, you will not be able to find extern “C” if you change file extension to .c. Stanfdard library uses following trick to do this:

[sourcecode language=”cpp”]
// /usr/include/sys/cdefs.h
#ifdef __cplusplus
# define __BEGIN_DECLS extern "C" {
# define __END_DECLS }
# define __BEGIN_DECLS
# define __END_DECLS

All C library header files include this “cdefs.h”, and use __BEGIN_DECLS and __END_DECLS to enclose function declarations.

Python tips

  • When write to global variable in function, declare the variable as global ... at the first line of function definition.
  • Conditions can be chained(!!). 1 < a < 3 checks that a is both less than 3 and more than 1.
  • If you want to execute python code without starting a new console(in windows), change python file extension’s name to “pyw”. This is useful for GUI applications.
  • __import__ function can import modules dynamically.
  • multi parameter function:

    • def f(*a): all parameters in list a
    • def f(**a): all parameters in dict a (should be called in this way: f(a=1,b=2))
    • When using python’s iterative shell, _ variable stores return value of last command.

“Regular Expression” matches all non-prime numbers

I found a “regular expression” which matches all strings consists of non-prime number of 1s.

This is quite surprising because it can be proven that non-prime number of 1s is not a regular language. That is to say, no regular expression can match that language.

However, this regular expression works.
Following python code will print out all prime numbers within [0,99]

[sourcecode language=”python”]
import re
filter(lambda n:not re.match(r"^1?$|^(111*)\1+$","1"*n),range(100))

So why it works? Let’s look at the regular expression: ^1?$ matches “” and “1”, which is trivial. In ^(111*)\1+$, (111*) matches 2 or more 1s, so (111*)\1+ matches n*k 1s, k>=2, n>=2, which is exactly all composite numbers.

Then what’s wrong with the theory? Actually, nothing is wrong. The “regular expression” in most programming language is much more expressive than the formal regular expression defined in computational theory. In this case, we do not allow “\1” in formal regular language. Actually, this “\1” may potentially require unbounded amount of memory, because the program need to remember the matched string in “\1”, which can be arbitrarily long. In formal regular language, the memory cost should be bounded when regular expression is given: bounded by the size of corresponding DFA. Thus, “\1” kind of stuff gives regular expression much more power.

Next question is: is “regular expression” in programming language equivalent to Turing Machine?

burrows wheeler algorithm for data compression

Burrows Wheeler algorithm is another popular data compression algorithm. My implementation follows the guideline here.

The consists of three parts:

  1. Burrows Wheeler transformation
  2. Move to front transformation
  3. Huffman conpression
  4. key step of this algorithm is Burrows Wheeler transformation. Details will be given describe this later.

    Experiment result
    data huffman lzw Burrows Wheeler
    all_same_char(best case) .139 .048 .165
    equal_freq(a-z) .686 .188 0.269
    random_printable_char .847 1.089 .870
    Google.htm .724 .626 .695
    random_byte 1.169 1.639 1.189
    huffman.exe .671 .611 .692
    NIPS2009_0300.pdf .997 1.280 1.006
    jquery.js .683 .523 0.685
    long_cn_utf-8.txt .732 .561 .769
    long_cn_unicode.txt .889 .729 0.936
    Wikipedia-logo.png 1.043 1.505 1.062
    Wallpapers.jpg 1.001 1.247 1.022
    google_logo.gif 1.190 1.494 1.215
    lena.bmp 0.94 .957 0.906
    compressed file 1.028 1.461 1.043

    It’s hard to say whether it’s good or bad. Discussion session comes later.