Sunday, March 17, 2013

LocoWordFinder, Part III


LocoWordFinder, Part III

In parts I and II we made a simple client-based anagram solver. It has the ability to handle multiple blank tiles, regular expression based filtering and it all happens in real time as the user types.

In the final part, we're going to look back at the program and make a few changes to our dictionary javascript file.

The files can be found on github.

Dictionary File

The source dictionary file is 1.66Mb, but once we've converted it into a javascript object, it has ballooned to 4.14Mb, a lot larger. The reason for this is that we're doing some preprocessing (with the python script from part I) and giving jaavscript both a sorted set of letters and all anagrams of those letters. This has a large overhead, as many words don't share that many anagrams, and the javascript object code ends up looking like:
var v={...,'aeeeinpsssssv':['passivenesses'],'ellmpsuu':['plumules'],'acmnooottxyy':['cytotaxonomy'],...}

Where we're essentially making the file roughly twice as big as it needs to be. How about if we returned a list of  list of anagrams, and let the client code create the object by working out the sorted list of component letters itself? That way, our javascript object now becomes:

var vk=[..,['discerned','rescinded'],['electric'],['backspaced'],['fir','rif'],['ifs'],..]

And all the client needs to do is sort the letters from the first element in each list and make that the method name for that list. This makes our anagram_keyless.js file only 2.30Mb, only slightly larger than the source file. It takes an extra 4s on a relatively powerful PC to sort the lists over the previous version, but uses a lot less bandwidth.

There's also a second advantage to this method: In the previous method we used strings as the keys, then converted to a sorted list of characters. This list was easier to analyse and allowed us to add and remove elements as needed. But we then had to finally convert to a string again before looking up the result. When we use the client to sort the letters, we can keep the key as a list of characters instead of a string, and we don't have to worry about converting back and forth between them anymore (I'd expected this to make a substantial speed difference, but trying it out, I couldn't see any such difference. It's still an advantage if it makes the code simpler, which it does).

Conclusion

So here's the final version, with the keyless dictionary file. Again, it hasn't been tested on other browsers, so take care. Overall, javascript performance was better than I had hoped. It does take a while for it to load and parse a 2.3Mb list, but once it does searching for keys is very fast, and it makes a very usable, realtime anagram finding tool.

No comments:

Post a Comment