Thursday, March 28, 2013

Converting Grayscale to RGB with Numpy

There's a lot of scientific two-dimensional data out there, and if it's grayscale, sooner or later you need to convert it to RGB (or RGBA). I couldn't find any information about the best way to do this in numpy. A typical scenario is converting an x by y array of floats into an x by y by 3 array of 8-bit ints. Numpy offers a few ways to do it (that I could think of), and I figured testing them is the only real way to see which is fastest.
In the following, I'm assuming that any processing steps have already been taken, and all that's left is to truncate the values to unsigned 8-bit.

The file containing the tests can be downloaded from github.

Method 1: Basic

The first way I tried was to simply create the image and populate it with the array three times:

def to_rgb1(im):
    # I think this will be slow
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 0] = im
    ret[:, :, 1] = im
    ret[:, :, 2] = im
    return ret

This can be slightly modified (in Python, chained assignment assigns the rightmost expression to each of the other targets in turn, so the two methods here are not equivalent):

def to_rgb1a(im):
    # This should be faster than 1, as we only
    # truncate to uint8 once (?)
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 2] = ret[:, :, 1] = ret[:, :, 0] = im
    return ret

def to_rgb1b(im):
    # I would expect this to be identical to 1a
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 0] = im
    ret[:, :, 1] = ret[:, :, 2] = ret[:, :, 0]
    return ret
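
To spell out the chained-assignment semantics (a minimal sketch, not from the original post): the rightmost expression is evaluated once and assigned to each target in turn, from left to right, so 1a still performs a float-to-uint8 cast once per channel, while 1b performs it only once.

import numpy as np

im = np.random.uniform(256, size=(4, 4))   # float64 input
ret = np.empty((4, 4, 3), dtype=np.uint8)

# 1a-style: 'im' (float64) is assigned to each of the three uint8 slices in
# turn (left to right), so the float-to-uint8 truncation happens once per slice
ret[:, :, 2] = ret[:, :, 1] = ret[:, :, 0] = im

# 1b-style: only the first statement truncates from float64; the chained
# assignment then copies the already-truncated uint8 slice into the other two
ret[:, :, 0] = im
ret[:, :, 1] = ret[:, :, 2] = ret[:, :, 0]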

Method 2: Broadcasting


We can create the array as before and assign to it in one line, so I'd hoped this would be faster than method 1:

def to_rgb2(im):
    # as 1, but we use broadcasting in one line
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, :] = im[:, :, np.newaxis]
    return ret
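
As a quick illustration (not part of the original benchmarks), the newaxis view has shape (w, h, 1), and broadcasting stretches the length-1 trailing axis across the three output channels:

im = np.random.uniform(256, size=(4, 4))    # float64, shape (4, 4)
col = im[:, :, np.newaxis]                  # a view of im with shape (4, 4, 1)
out = np.empty((4, 4, 3), dtype=np.uint8)
out[:, :, :] = col                          # the length-1 axis is broadcast to 3 channels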

Method 3: Using dstack

The dstack method stacks arrays along the third dimension and, as the docs say, 'This is a simple way to stack 2D arrays (images) into a single 3D array for processing'.

def to_rgb3(im):
    # we can use dstack and an array copy
    # this has to be slow, we create an array with
    # 3x the data we need and truncate afterwards
    return np.asarray(np.dstack((im, im, im)), dtype=np.uint8)


def to_rgb3a(im):
    # we can use the same array 3 times, converting to
    # uint8 first
    # this explicitly converts to np.uint8 once and is short
    return np.dstack([im.astype(np.uint8)] * 3)

But there's a problem with the above: the returned array is not contiguous. This can be seen by comparing np.frombuffer(to_rgb3a(im)) with np.frombuffer(to_rgb1(im)). Since most display code needs a contiguous buffer of RGB pixels, this won't work. Luckily the copy function will fix it for us, but it requires an extra step:

def to_rgb3b(im):
    # as 3a, but we add an extra copy to contiguous 'C' order
    # data
    return np.dstack([im.astype(np.uint8)] * 3).copy(order='C')
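
If you want to check the layout for yourself, the flags attribute reports whether an array is C-contiguous (a minimal check, not part of the original benchmark; it assumes numpy is imported as np and the functions above are defined):

im = np.random.uniform(256, size=(64, 64))
print to_rgb3a(im).flags['C_CONTIGUOUS']    # layout of the plain dstack result
print to_rgb3b(im).flags['C_CONTIGUOUS']    # True, thanks to the explicit C-order copy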

Method 4: Using weave

Another way to do this is to use the weave module (included with SciPy) to write C code inline, which gets compiled to native code. Since many of the numpy functions are native code anyway it's not obvious this is going to make any difference, but it's something to try.

def to_rgb4(im):
    # we use weave to do the assignment in C code
    # this only gets compiled on the first call
    import scipy.weave as weave
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    code = """
    int impos=0;
    int retpos=0;
    for(int j=0; j<Nim[1]; j++)
    {
        for (int i=0; i<Nim[0]; i++)
        {
            unsigned char d=im[impos++];
            ret[retpos++] = d;
            ret[retpos++] = d;
            ret[retpos++] = d;
        }
    }
    """
    weave.inline(code, ["im", "ret"])
    return ret

Method 5: Using repeat

This was suggested by greger:

def to_rgb5(im):
    # view the input with a trailing length-1 axis (a reshape, so the
    # caller's array isn't modified in place), then repeat along it
    im = im.reshape(im.shape[0], im.shape[1], 1)
    return np.repeat(im.astype(np.uint8), 3, 2)

Testing and timing

To test the above methods, I used introspection to find all functions whose names begin with 'to_rgb' and ran them on data returned from the numpy.random.uniform function (which returns float64 data) for various image sizes, after checking that all the functions gave identical output.

The full code is here:

import numpy as np
import time
import sys


def to_rgb1(im):
    # I think this will be slow
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 0] = im
    ret[:, :, 1] = im
    ret[:, :, 2] = im
    return ret

def to_rgb1a(im):
    # This should be faster than 1, as we only
    # truncate to uint8 once (?)
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 2] = ret[:, :, 1] = ret[:, :, 0] = im
    return ret

def to_rgb1b(im):
    # I would expect this to be identical to 1a
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, 0] = im
    ret[:, :, 1] = ret[:, :, 2] = ret[:, :, 0]
    return ret

def to_rgb2(im):
    # as 1, but we use broadcasting in one line
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    ret[:, :, :] = im[:, :, np.newaxis]
    return ret


def to_rgb3(im):
    # we can use dstack and an array copy
    # this has to be slow, we create an array with
    # 3x the data we need and truncate afterwards
    return np.asarray(np.dstack((im, im, im)), dtype=np.uint8)


def to_rgb3a(im):
    # we can use the same array 3 times, converting to
    # uint8 first
    # this explicitly converts to np.uint8 once and is short
    return np.dstack([im.astype(np.uint8)] * 3)


def to_rgb3b(im):
    # as 3a, but we add an extra copy to contiguous 'C' order
    # data
    return np.dstack([im.astype(np.uint8)] * 3).copy(order='C')


def to_rgb4(im):
    # we use weave to do the assignment in C code
    # this only gets compiled on the first call
    import scipy.weave as weave
    w, h = im.shape
    ret = np.empty((w, h, 3), dtype=np.uint8)
    code = """
    int impos=0;
    int retpos=0;
    for(int j=0; j<Nim[1]; j++)
    {
        for (int i=0; i<Nim[0]; i++)
        {
            unsigned char d=im[impos++];
            ret[retpos++] = d;
            ret[retpos++] = d;
            ret[retpos++] = d;
        }
    }
    """
    weave.inline(code, ["im", "ret"])
    return ret

def to_rgb5(im):
    # view the input with a trailing length-1 axis (a reshape, so the
    # caller's array isn't modified in place), then repeat along it
    im = im.reshape(im.shape[0], im.shape[1], 1)
    return np.repeat(im.astype(np.uint8), 3, 2)

funcs = dict(((x,eval(x)) for x in list(globals()) if "to_rgb" in x))
print "testing Numpy v",np.version.version
print "on Python ", sys.version
for size in [64,256,1024,2048]:
    s = np.random.uniform(256, size=(size,size))
    op = [funcs[f](s) for f in funcs]
    assert(all(map(lambda x: np.array_equal(x, op[0]), op)))
    times=min(3*10**7/size**2, 10**5) or 1
    print "\n\nFor Size",size,"\n==============="
    for name, func in sorted(funcs.items()):
        start = time.time()
        for i in range(times):
            func(s)
        end=time.time()
        print "%s took \t%0.3f ms average for %d times" % (
            name, (1000*(end-start)/times),times)

Results

The results were a little unexpected, and they highlight the adage that measuring is the only real way to gauge performance. The following table lists the average time in ms to process a 1024x1024 image. The Windows results used a version of numpy compiled against MKL, as found here. All results were from Python 2.7.3.

Architecture                 to_rgb1  to_rgb1a  to_rgb1b  to_rgb2  to_rgb3  to_rgb3a  to_rgb3b  to_rgb4  to_rgb5
MacBookPro (Win64/Python32)     14.8      14.8       8.9     15.9     28.8       8.7       9.5      5.9     26.7
MacBookPro (Win64/Python64)      3.3       3.2       4.7      6.8     17.4       3.8       4.8      3.7     21.1
MacBookPro (OSX64/Python32)      2.6       2.6       3.7      9.2     20.1       2.9       3.4      1.4     19.9
MacBookPro (OSX64/Python64)      2.7       2.6       3.6      5.7     16.7       2.8       4.1      2.1     18.4
Xeon W3520 (Win64/Python32)      9.2       9.2       8.4     22.4     54.1      12.3      40.3      6.8        -
Xeon W3520 (Win64/Python64)      5.7       5.3       7.2     13.7     51.3      10.9      30.0      4.4        -

The first thing to notice is just how varied these different methods are. to_rgb3 is the only obviously sub-optimal implementation, and it shows, being the slowest in every configuration. There is a surprising difference in performance under Windows 7 vs MacOSX on the same hardware, with the Mac performing noticeably better. to_rgb3a works well under most configurations, but the overhead of the extra copy in to_rgb3b varies from relatively minor (~10% for Win64/Python32) to awful (~200% for the Xeon Win64/Python64). The Python32 results under Win64 are awful for the MacBook, but relatively better for the Xeon machine.

Methods 1 and 1a produced identical results, with 1b noticeably faster under Win64/Python32 but slower in the other configurations. Method 2 was surprisingly slow, especially considering it's a single call into C code.

In general the results above are just confusing, and they vary significantly across platforms and architectures. Methods 1/1a give generally decent results, and I can't think of a good reason not to recommend them, so I'd go with 1a as the most generally applicable solution, and the one I'd tentatively recommend. The weave results are generally good, but they do require a compiler to be installed and set up on the machine in use. Another thing to note is that converting to 8-bit doesn't clamp out-of-range values, so you need to be careful to do that yourself if needed. In that case, using weave to clamp and assign all at once may well give the best performance.
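
As a hedged sketch (not one of the benchmarked functions, and the name is just illustrative), the clamping can be done with np.clip before the conversion:

def to_rgb_clipped(im):
    # clamp to the valid 8-bit range first, since a plain astype(np.uint8)
    # does not saturate out-of-range values
    im8 = np.clip(im, 0, 255).astype(np.uint8)
    return np.dstack([im8] * 3)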

Monday, March 25, 2013

Koch Snowflakes and CFDG

CFDG ('Context-Free Design Grammar', but don't let that put you off) is a fun little program that allows you to build up complex pictures using elementary shapes, often repeated thousands or even millions of times. It makes it very easy to do recursive shapes, and also random branching (for instance, you can specify that shape 'A' does one thing 99 times out of 100, at random, and something else the other time). It has an online gallery of user-submitted pictures and the code used to generate them.

What I really like about CFDG is the organic quality some of the pictures have, and also the complexity that's possible with a very small amount of code. It's fun to look through the most popular images and try to guess how long the code is just by looking at the output.

Koch Snowflake

Because CFDG is good at recursive code, it's easy to make certain kinds of fractals. For example, one side of a Koch snowflake can be made from repetitions of a simple basic shape.
To make this in CFDG, we could use the following code:

startshape spike

sqrt3over4 = sqrt(3)/4
one_third = 1/3

shape line
{
SQUARE[s 1 0.01]
}

shape spike
{
line[x -1]
line[r 60 x -0.25 y sqrt3over4]
line[r -60 x 0.25 y sqrt3over4]
line[x 1]
}

Here we define two shapes: 'line', which draws a line (we're actually using a squished square, scaled 1 in the x dimension and 0.01 in the y), and 'spike', which draws four lines (the x and y values specify relative locations, and r is a rotation in degrees). To generate the fractal, we just need to make one change to the line shape: call spike again, reduced in size by a factor of 3:

startshape spike

sqrt3over4 = sqrt(3)/4
one_third = 1/3

shape line
{
SQUARE[s 1 0.01]
spike[s one_third]
}

shape spike
{
line[x -1]
line[r 60 x -0.25 y sqrt3over4]
line[r -60 x 0.25 y sqrt3over4]
line[x 1]
}

Which gives us the following:

There's a small problem with this approach: when we scale the next generation of the shape, it scales in all dimensions, including the line width. Later generations become harder to see, so we don't get a good view of what we really want, which is the shape the final generations tend towards.

What's fun about the code above is that, since we're only interested in the final generations, and since those generations eventually end up about a pixel big, it really doesn't matter what shape we use in the line shape. It helps to make the shape transparent (we can use the 'a' (alpha) parameter for that) so that we don't block subsequent generations, but it doesn't have to be a line at all. For example, if we replace the SQUARE[s 1 0.01] with CIRCLE[a -0.9] we get the following:
We can use all kinds of shapes in the line shape to produce different early generations, but all tending towards the same snowflake perimeter.
SQUARE[s 2 a -0.9]
TRIANGLE [s 1.5 alpha -0.9]
CIRCLE [s .5 alpha -0.8]
CIRCLE [s .5 alpha -0.8 y -.288675]
SQUARE[s 1 x 1 y -1 a -0.9]


You can find my small selection of CFDG scripts here.

Sunday, March 17, 2013

LocoWordFinder, Part III

In parts I and II we made a simple client-based anagram solver. It can handle multiple blank tiles and regular-expression-based filtering, and it all happens in real time as the user types.

In the final part, we're going to look back at the program and make a few changes to our dictionary javascript file.

The files can be found on github.

Dictionary File

The source dictionary file is 1.66Mb, but once we've converted it into a javascript object it has ballooned to 4.14Mb. The reason is that we're doing some preprocessing (with the python script from part I) and giving javascript both a sorted set of letters and all anagrams of those letters. This has a large overhead, as most words have few anagrams, and the javascript object code ends up looking like:
var v={...,'aeeeinpsssssv':['passivenesses'],'ellmpsuu':['plumules'],'acmnooottxyy':['cytotaxonomy'],...}

We're essentially making the file roughly twice as big as it needs to be. How about if we returned a list of lists of anagrams, and let the client code create the object by working out the sorted list of component letters itself? That way, our javascript object becomes:

var vk=[..,['discerned','rescinded'],['electric'],['backspaced'],['fir','rif'],['ifs'],..]

All the client needs to do is sort the letters of the first element in each list and make that the property name for that list. This makes our anagram_keyless.js file only 2.30Mb, only slightly larger than the source file. It takes an extra 4s on a relatively powerful PC to sort the letters and rebuild the keys compared with the previous version, but it uses a lot less bandwidth.
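
As a sketch of the corresponding change to the Part I python script (the exact code isn't shown in this post; it assumes the anagrams defaultdict built there), the keyless file can be written out as a plain list of lists:

osource = open('anagram_keyless.js', 'w')
osource.write("var vk=[")
for i, words in enumerate(anagrams.values()):
    if i > 0:
        osource.write(",")
    # write only the list of anagrams; the client re-derives the key by
    # sorting the letters of the first word in each list
    osource.write("[%s]" % ','.join("'" + w + "'" for w in words))
    if i % 20 == 0:
        osource.write("\n")
osource.write("]")
osource.close()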

There's also a second advantage to this method: previously we used strings as the keys, then converted to a sorted list of characters. This list was easier to analyse and allowed us to add and remove elements as needed, but we then had to convert back to a string before looking up the result. When we use the client to sort the letters, we can keep the key as a list of characters instead of a string, and we don't have to worry about converting back and forth between them any more. (I'd expected this to make a substantial speed difference, but when I tried it out I couldn't see any such difference. It's still an advantage if it makes the code simpler, which it does.)

Conclusion

So here's the final version, with the keyless dictionary file. Again, it hasn't been tested on other browsers, so take care. Overall, javascript performance was better than I had hoped. It does take a while to load and parse a 2.3Mb list, but once it has, looking up keys is very fast, and it makes for a very usable, real-time anagram-finding tool.

Sunday, March 10, 2013

LocoWordFinder, Part II

In Part I we looked at converting a dictionary text file into a format suitable for javascript. Then we created a simple client website where the user is presented with anagrams of words, live, as they are typed in. Next we're going to add blanks, and make the client look a little nicer.

The final files can be found on github.

Blanks


We want to allow the user to use '?' in the search term to represent any character. If we ask for the anagrams of 'cv?' we want the program to return 'vac' (the only three-letter word containing v and c).

To do this, instead of looking up just one request, we look up 26^n candidates, where n is the number of blanks (so 'cv?' expands to 26 sorted strings). We achieve this by changing the client code slightly. Instead of searching for the requested string directly, we add a new function, get_strings(word), which returns a list of strings for a word where any '?'s are replaced with all possible letters (if there are no '?'s, we simply return a list containing just the original string).

The function looks like:
function get_strings(word)
{
 //we return a list of the strings to check, with no blanks, it's simply the sorted letters in word
 var ret=[];
 var bytes=str_to_chars(word)

 var blanks=[]; //index of our blank squares
 for(var i=0; i<bytes.length; i++)
 {
  if(bytes[i]=='?')
   blanks.push(i);
 }

 if(blanks.length>0)
 {
  for(var i=0; i<Math.pow(26, blanks.length); i++)
  {
   var local_bytes=bytes.slice(0);
   var modi=i;
   for(var bi=0; bi<blanks.length; bi++)
   {
    local_bytes[blanks[bi]] = String.fromCharCode(97+modi%26); //97 == 'a'
    modi = Math.floor(modi/26); //integer divide ready for the next blank
   }
   local_bytes.sort();
   ret.push(local_bytes)
  }
 }
 else
 {
  bytes.sort()
  ret.push(bytes);
 }
 return ret;
}
We then call our recursive anagram function, recuranagram, repeatedly with each returned string.

Scoring, Sorting and Formatting

We also want to score the words, and, to make things look nicer, I decided to put the words in a table, with each column containing words with the same number of letters. Another nice feature is to indicate whether a found word has a blank in it, so there's a function to put html tags around any blanks in a word. Finally, we add a simple regex filter: if this is specified, only words that match the regex will be returned.

I've added some CSS to the top of the file to make the table look a little nicer.

The file can be found here. It needs the anagram.js file from the previous post to work.

In the final part we'll look at making the anagram.js file a little smaller.

Monday, March 4, 2013

LocoWordFinder, Part I

I've started playing Words With Friends with, well, a friend. Words With Friends (WWF from now on) has a 'feature' where it only allows you to play a word if it's in its dictionary, and you can try as many times as you like. This has two drawbacks: first, the whole element of challenging words you believe to be invalid has been removed from the game; second, it's hard to justify not using some kind of word-search program to help you. After all, if you can just sit there trying every combination anyway, why not skip that part and let a computer program do it for you?

So, my friend and I decided to play, allowing each other to look up words as we wanted. He recommended the scrabblecheat website, which is perfectly functional, the only problem being that, to me, it seemed a little slow as all requests are sent to the server. So I looked around to see if there were any equivalent programs out there that stored the word list locally and did all the processing on the client side.

I couldn't find any, so I decided to try and write one. I wasn't sure the large dictionaries needed could be stored and accessed quickly enough using Javascript, but it turns out (for Chrome at least) that this wasn't a problem.

The final files for the project can be found on github.

1. Getting the Words

The first step is working out how to store all the words. I wanted to store the words in javascript in such a way that, for a given alphabetized list of letters, I could get all the words which are anagrams of those letters. Once this functionality is there, it's just a matter of javascript working out all possible alphabetized combinations of letters for a given set of letters and then displaying all the matching words. In Python this would be a dictionary, but in javascript it's simply an object, created with a similar syntax.

The next step is to get the dictionary. There doesn't seem to be an official WWF word list, but google finds a few sources, for instance here. The dictionary is in a text format with a single word on each line:
aa
aah
aahed
aahing
aahs
...

And we need to convert it into a javascript object where the property names are the alphabetized letters of a word, and the value of each property is a list of all the anagrams of those letters. We define it as:
var v = {'aa':['aa'], 'aah':['aah','aha']...
so that afterwards, for example,  v.aa = ['aa']  and v.einsttw = ['entwist','twinset']. To do this I wrote a simple script in python to convert the WWF wordlist into a javascript object:
from collections import defaultdict
use_subset = False
dsource = open("wordswithfriends.txt")
words = [x.strip() for x in dsource.readlines()]
if use_subset:
    words = words[:1000]
dsource.close()
# create a dictionary where
# dict['string in order']=[all words with those letters]
anagrams = defaultdict(list)
for w in words:
    anagrams[''.join(sorted(w))].append(w)
# we now export to a json compatible format. Could use json module,
# but it's simple enough to do ourselves
osource = open('anagram.js', 'w')
osource.write("var v={")
for i, key in enumerate(anagrams):
    if i > 0:
        osource.write(",")
    osource.write(
        "'%s':[%s]" % (key, ','.join(["'" + x + "'" for x in anagrams[key]])))
    if i % 20 == 0:
        osource.write("\n")
osource.write("}")
osource.close()

This creates a file, anagram.js, containing a single javascript statement creating the object we need. The size of the file is 4.3Mb (we'll see how we can get this a little smaller by portioning off some of the work to the client in the next part).

2. Writing the client

Next we need to write a simple client that loads our anagram.js file. We then allow the user to type in a word, find all combinations of letters in that word, and see if any of them are contained in the large object, v. If they are, we add them to a list. Finally, we display the list to the user.
<html><body>
<script src="anagram.js"></script>
<script>
function str_to_chars(s)
{
 var bytes=[]
 for(var i=0; i<s.length; i++)
 {
  bytes.push(s[i]);
 }
 return bytes;
}
function chars_to_str(b)
{
 var ret=""
 for(var i=0; i<b.length; i++)
  ret+=b[i];
 return ret;
 //return String.fromCharCode.apply(null, b)
}


function recuranagram(s)
{
 //s is a sorted array of chars, we have to convert to string if needed
 if(s.length<2) return 0;
 if(searched[s])
  return;


 searched[s]=true;

 var s_string = chars_to_str(s)
 if(v[s_string])
 {
  for(var i=0;  i<v[s_string].length; i++)
  {
   found[v[s_string][i]]=true;
   //console.log("found"+v[s_string][i])
  }
 }

 var lastchar="";
 for(var i=0; i<s.length; i++)
 {
  if(lastchar==s[i])
   continue; //searching for aabob and removing first or second a is equivalent
  lastchar=s[i];
  //var bs=str_to_chars(s)
  //console.log(bs.length)
  var locals = s.slice(0); //returns a copy
  locals.splice(i,1)
  //console.log(bs.length)

  recuranagram(locals)
 }
}


found={};
searched={};
lastrequest="";

function myscript()
{

 document.getElementById("outp").innerHTML="";
 var request = document.getElementById("fname").value;
 if(request.indexOf(lastrequest)==-1 || lastrequest=="")
 {
  found={}
  searched={}
 }
 lastrequest=request; //remember the request so the caches can be reused as the user keeps typing
 recuranagram(str_to_chars(request).sort())
 
 //console.log(bytes)
 
 var final_words = Object.keys(found);
 var foundhtml="<b>"+final_words.length+" matches for '"+request+"':</b><br>"
 for(var i=0; i<final_words.length; i++)
 {
  foundhtml+=final_words[i]+"<br>"
 }
 document.getElementById("outp").innerHTML =foundhtml

}

</script>


word:<input type="text" id="fname" onkeyup = "myscript()" /><br />
<span id="outp"></span>
</body>
</html>

This works quite nicely now: The user types in a word, and an instant word list is created for all anagrams in that word. In fact, it works so quickly that it should be possible to add blanks, and search 26 times for every blank entered. We'll be doing that in part II.

If you want to try this out, you will need to download anagram.js and anagramv1.html, put them both in the same folder, and point your browser at the html file. I've only tested this in Chrome, and it may very well break in other browsers.

In Part II we'll work on improving the client, including adding blank letters, scores and using CSS to make it look a little nicer.