Saturday, August 19, 2017

Solving a Sunday Puzzle with Python and NLTK

Solving a Sunday Puzzle with Python and NLTK

NPR (the US version of Radio 4, if you like) has a regular Sunday puzzle (starring Will Shortz, the NYTimes puzzle editor).

The 2017-07-09 puzzle goes as follows:
Take a certain 7-letter word. Remove the first letter and you get a 6-letter synonym of that word. And the letter you removed is an abbreviation for the opposite of both words.
I spent quite a bit of time trying, unsuccessfully,  to come up with the answer - I kept thinking it would be a inflammable/flammable type word but the only 1-letter abbreviations I could think of were thinks like l, m, s for large, medium, small.

So I figured I'd use a word list to just look through all the seven letter words whose last six letters also made a word. I decided to use the Python natural language toolkit, or NLTK, purely as a dictionary. It turned out to be a wise choice as it wasn't as easy as I'd hoped:

from nltk.corpus import words
set(w for w in words.words() if len(w) == 7)
seven = set(w for w in words.words() if len(w) == 7)
print("Found {} 7 letter words".format(len(seven)))
six = set(w for w in words.words() if len(w) == 6)
print("Found {} 6 letter words".format(len(six)))

Which gives

Found 23870 7 letter words
Found 17708 6 letter words

So we have about 24k 7-letter words. How many have a 6-letter word as a suffix:

sub_is_six = set(w for w in seven if w[1:] in six)
print("Found {} 7 letter words where word[1:] is also a word".format(len(sub_is_six)))

Found 1621 7 letter words where word[1:] is also a word

Ok, there are 1621 7-letter words that have this property: slotter, apathic, zincite, barrack, hattery, etc, etc. I tried manually exploring the list, but it was just too many.

Luckily with NLTK, it's possible to get a list of synonyms for words using the wordnet module. Here's a function that returns a set of synonyms for a given word:

def get_synonyms(word):
    return {n for x in nltk.wordnet.wordnet.synsets(word) 
            for n in x.lemma_names()}

Now we can use this to filter our 7-letter words:

sub_is_syn = {w for w in sub_is_six if w[1:] in get_synonyms(w)}
sorted(sub_is_syn)

Which gives us a list of 11 words (much more manageable): arouser, asquint, challah, crumple, factual, grumble, opossum, orotund, screaky, spatter, twinkle.

Factual is the correct answer, as it's the only one whose first letter is an abbreviation for its antonym: 'f' for false.

After doing this it's easy to extend to words with more or less letters. For example amongst 9-letter words whose 8-letter suffix is a synonym of the original we can find the python keyword enumerate, and the old word for odometer, hodometer. Interestingly, hodometer comes from the greek word hodos (ancient Greek for for path) and when combined with meta (meaning development in ancient Greek) it produces the word 'method'.