Monday, March 4, 2013

Loco Word Finder Part I

LocoWordFinder, Part I

I've started playing Words With Friends with, well, a friend. Words With Friends (WWF from now on) has a 'feature' where it only allows you to play a word if it's in its dictionary, and you can try as many times as you like. This has two drawbacks, one being that the whole element of challenging words that you believe to be false has been removed from the game, and the second that it's hard to justify not using some kind of word search program to help you. After all, if you can just sit there trying every combination of words anyway, why not just skip that part and let a computer program do it for you?

So, my friend and I decided to play, allowing each other to look up words as we wanted. He recommended the scrabblecheat website, which is perfectly functional, the only problem being that, to me, it seemed a little slow as all requests are sent to the server. So I looked around to see if there were any equivalent programs out there that stored the word list locally and did all the processing on the client side.

I couldn't find any, so I decided to try and write one. I wasn't sure the large dictionaries needed could be stored and accessed quickly enough using Javascript, but it turns out (for Chrome at least) that this wasn't a problem.

The final files for the project can be found on github.

1. Getting the Words

The first step is working out how to store all the words. I wanted to store the words in javascript in such a way that for a given, alphabetized list of letters, I would be able to get all words which were anagrams of those letters. Once this functionality is there, it's just a matter of javascript working out all possible alphabetical combinations of letters for a given set of letters and then displaying all matching words. In Python, this would be a dictionary, but in javascript it's simply an object, created with a similar syntax.

The next step is to then get the dictionary. It doesn't look like there's an official WWF word list, but google finds a few sources, for instance here. The dictionary is in a text format with a single word on each line:
aa
aah
aahed
aahing
aahs
...

And we need to convert it into a javascript object where the member names are the alphabetized order of the letters, and the value of that member is a list of all the anagrams of those letters. We define it as:
var v = {'aa':['aa'], 'aah':['aah','aha']...
so that afterwards, for example,  v.aa = ['aa']  and v.einsttw = ['entwist','twinset']. To do this I wrote a simple script in python to convert the WWF wordlist into a javascript object:
from collections import defaultdict
from collections import defaultdict
use_subset = False
dsource = open("wordswithfriends.txt")
words = [x.strip() for x in dsource.readlines()]
if use_subset:
    words = words[:1000]
dsource.close()
# create a dictionary where
# dict['string in order']=[all words with those letters]
anagrams = defaultdict(list)
for w in words:
    anagrams[''.join(sorted(w))].append(w)
# we now export to a json compatible format. Could use json module,
# but it's simple enough to do ourselves
osource = open('anagram.js', 'w')
osource.write("var v={")
for i, key in enumerate(anagrams):
    if i > 0:
        osource.write(",")
    osource.write(
        "'%s':[%s]" % (key, ','.join(["'" + x + "'" for x in anagrams[key]])))
    if i % 20 == 0:
        osource.write("\n")
osource.write("}")
osource.close()

This creates a file, anagram.js, containing a single javascript statement creating the object we need. The size of the file is 4.3Mb (we'll see how we can get this a little smaller by portioning off some of the work to the client in the next part).

2. Writing the client

Next we need to write a simple client that loads our anagram.js file. We then allow the user to type in a word, find all combinations of letters in that word, and see if any of them are contained in the large object, v. If they are, we add them to a list. Finally, we display the list to the user.
<html><body>
<script src="anagram.js"></script>
<script>
function str_to_chars(s)
{
 var bytes=[]
 for(var i=0; i<s.length; i++)
 {
  bytes.push(s[i]);
 }
 return bytes;
}
function chars_to_str(b)
{
 var ret=""
 for(var i=0; i<b.length; i++)
  ret+=b[i];
 return ret;
 //return String.fromCharCode.apply(null, b)
}


function recuranagram(s)
{
 //s is a sorted array of chars, we have to convert to string if needed
 if(s.length<2) return 0;
 if(searched[s])
  return;


 searched[s]=true;

 var s_string = chars_to_str(s)
 if(v[s_string])
 {
  for(var i=0;  i<v[s_string].length; i++)
  {
   found[v[s_string][i]]=true;
   //console.log("found"+v[s_string][i])
  }
 }

 var lastchar="";
 for(var i=0; i<s.length; i++)
 {
  if(lastchar==s[i])
   continue; //searching for aabob and removing first or second a is equivalent
  lastchar=s[i];
  //var bs=str_to_chars(s)
  //console.log(bs.length)
  var locals = s.slice(0); //returns a copy
  locals.splice(i,1)
  //console.log(bs.length)

  recuranagram(locals)
 }
}


found={};
searched={};
lastrequest="";

function myscript()
{

 document.getElementById("outp").innerHTML="";
 var request = document.getElementById("fname").value;
 if(request.indexOf(lastrequest)==-1 || lastrequest=="")
 {
  found={}
  searched={}
 }
 recuranagram(str_to_chars(request).sort())
 
 //console.log(bytes)
 
 var final_words = Object.keys(found);
 var foundhtml="<b>"+final_words.length+" matches for '"+request+"':</b><br>"
 for(var i=0; i<final_words.length; i++)
 {
  foundhtml+=final_words[i]+"<br>"
 }
 document.getElementById("outp").innerHTML =foundhtml

}

</script>


word:<input type="text" id="fname" onkeyup = "myscript()" /><br />
<span id="outp"></span>
</body>
</html>

This works quite nicely now: The user types in a word, and an instant word list is created for all anagrams in that word. In fact, it works so quickly that it should be possible to add blanks, and search 26 times for every blank entered. We'll be doing that in part II.

If you want to try this out, you will need to download anagram.js and anagramv1.html, put them both in the same folder, and point your browser to the html file. I've only tested this in Chrome, and it may very well break other browsers.

In Part II we'll work on improving the client, including adding blank letters, scores and using CSS to make it look a little nicer.

No comments:

Post a Comment