Sunday, December 14, 2014

Scala Bility

In the Coursera 'Functional Programming Principles in Scala' course there's an example of representing a set of integers as a binary tree. In the lecture, Dr Odersky defines a union method that returns a new set containing the elements of both the receiver and the argument. For the NonEmpty set it's defined as:

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  ...
  def union(other: IntSet): IntSet =
    ((left union right) union other) incl elem
  ...
}

There's another method, incl, that returns a new set with the given element included. In the exercise that follows the lectures, the task is to create a TweetSet class with similar functionality.
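
For reference, the full picture from the lecture looks roughly like this (reconstructed from memory, so treat it as a sketch rather than the exact course code):

abstract class IntSet {
  def contains(x: Int): Boolean
  def incl(x: Int): IntSet
  def union(other: IntSet): IntSet
}

object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  // adding an element to the empty set creates a one-element tree
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  // the union of the empty set with anything is just the other set
  def union(other: IntSet): IntSet = other
}

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left contains x
    else if (x > elem) right contains x
    else true

  // rebuild the path down to the insertion point; the rest of the tree is shared
  def incl(x: Int): IntSet =
    if (x < elem) new NonEmpty(elem, left incl x, right)
    else if (x > elem) new NonEmpty(elem, left, right incl x)
    else this

  def union(other: IntSet): IntSet =
    ((left union right) union other) incl elem
}

Note that Empty's union returns the other set in a single call, which is where the T(0) = 1 base case in the analysis below comes from.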

Unfortunately, defining union as described above causes the program to hang at the first access of the test dataset. It turns out that version is so slow that taking the union of 7 sets, each with 100 elements, effectively never finishes. One solution is to replace the union function with the following (just rearranging the brackets):

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  ...
  def union(other: TweetSet): TweetSet =
    left union (right union (other incl elem))
  ...
}
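
To see that the two definitions really are functionally identical, it helps to unfold a tiny example by hand (this trace is my own, using the IntSet version for brevity). Take A = NonEmpty(2, NonEmpty(1, Empty, Empty), Empty) and any set O; with the second definition:

A union O
  = left union (right union (O incl 2))                 // elem = 2
  = NonEmpty(1, Empty, Empty) union (Empty union (O incl 2))
  = NonEmpty(1, Empty, Empty) union (O incl 2)          // Empty union s is s
  = Empty union (Empty union ((O incl 2) incl 1))
  = (O incl 2) incl 1

Every element of A gets incl'd into O exactly once, which is the intuition behind the linear behaviour derived below.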

After this change it evaluates almost immediately. So why is there such a difference between the two? We can compare performance by counting the number of times the union function gets called for varying set sizes, where Sn is a set with n elements, u is the union operation, and L, R, O and E stand for left, right, other and elem:


Method                 | S10 u S10 | S10 u S1 | S20 u S1 | S50 u S1
((L u R) u O) incl E   |        90 |       90 |    2,179 | 2,131,662
L u (R u (O incl E))   |        10 |       10 |       20 |       50

We can see here that the only thing affecting the number of times union gets called is the size of the left-hand set: S10 u S10 takes the same number of calls as S10 u S1, so I used S1 on the right-hand side for the remaining tests. We can also see how much less efficient the first method is: it takes over 2 million calls for a set of 50 elements, versus only 50 for the second method.
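
One simple way to gather counts like these (my own sketch; Count and the set names below are made up, not from the course) is a global counter bumped at the top of NonEmpty's union:

object Count {
  var unionCalls = 0L
}

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  ...
  def union(other: IntSet): IntSet = {
    Count.unionCalls += 1    // one increment per NonEmpty union call
    ((left union right) union other) incl elem
  }
  ...
}

// for each measurement, reset and run:
Count.unionCalls = 0
s50 union s1                 // s50, s1: prebuilt sets of 50 and 1 elements
println(Count.unionCalls)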

For the following, we consider A u B, where A has size N and A's left subtree has size M (0 <= M <= N-1), leaving the right subtree with N-M-1 elements. We define T(N) as the cost of unioning two trees where the left-hand side has size N, so A u B costs T(N); the cost of calling union once is 1, and T(0) = 1.

In the first method, the first step (L u R) takes T(M) and returns a tree of size N-1, so (L u R) u O then takes T(N-1). This gives us T(N) = T(M) + T(N-1) + 1. In the best case, M = 0 and T(M) = T(0) = 1, so T(N) = T(N-1) + 2 and we get only O(N) union calls. In the worst case, M = N-1 and we have T(N) = 2*T(N-1) + 1, giving O(2^N) execution time. From the data above, I empirically get something like 1.4^N.
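
As a quick sanity check (this snippet is mine, not from the course), we can evaluate the recurrence directly for a few tree shapes, memoised so the exponential case is still cheap to compute:

import scala.collection.mutable

object RecurrenceCheck extends App {
  // T(N) = T(M) + T(N-1) + 1 with T(0) = 1, where M, the size of the
  // left subtree, is supplied as a function of the current size.
  def t(n: Int, m: Int => Int): BigInt = {
    val memo = mutable.Map(0 -> BigInt(1))
    def go(k: Int): BigInt = memo.get(k) match {
      case Some(v) => v
      case None =>
        val v = go(m(k)) + go(k - 1) + 1
        memo(k) = v
        v
    }
    go(n)
  }

  for (n <- Seq(10, 20, 50))
    println(s"N=$n: best ${t(n, _ => 0)}, worst ${t(n, k => k - 1)}, balanced ${t(n, k => (k - 1) / 2)}")
}

The best case is linear and the worst case works out to 2^(N+1) - 1; both the balanced shape and the counts measured above fall between the two extremes.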

For the second method, R u (O incl E) takes T(N-M-1) and L u (R u (O incl E)) takes T(M), giving T(N) = T(N-M-1) + T(M) + 1 with T(0) = 1. This works out at O(N) execution time, as we see above.
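
Spelling out the induction: assume T(k) = 2k + 1 for all k < N (true for the base case T(0) = 1). Then

T(N) = T(N-M-1) + T(M) + 1
     = (2(N-M-1) + 1) + (2M + 1) + 1
     = 2N + 1

The split M cancels out, so the call count is linear whatever shape the tree has.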

There's a lot of hype surrounding functional programming at the moment. I was quite impressed by the ability to define data structures from really basic, almost trivially correct functions. Unfortunately, it's not possible to implement things blindly without considering the performance overhead, especially in recursive functions. Whether or not smart compilers will ever optimize away problems like this, for the moment writing efficient functional programs relies on very careful analysis, and hacks like tail recursion. As seen above, two functionally identical procedures can have wildly different execution times after a simple rearrangement of parentheses, and predicting which one is more efficient isn't (to me at least) intuitive.