Solving a Sunday Puzzle with Python and NLTK
NPR (the US version of Radio 4, if you like) has a regular Sunday puzzle (starring Will Shortz, the NYTimes puzzle editor).
The 2017-07-09 puzzle goes as follows:
Take a certain 7-letter word. Remove the first letter and you get a 6-letter synonym of that word. And the letter you removed is an abbreviation for the opposite of both words.

I spent quite a bit of time trying, unsuccessfully, to come up with the answer. I kept thinking it would be an inflammable/flammable type of word, but the only 1-letter abbreviations I could think of were things like l, m, s for large, medium, small.
So I figured I'd use a word list to just look through all the seven letter words whose last six letters also made a word. I decided to use the Python natural language toolkit, or NLTK, purely as a dictionary. It turned out to be a wise choice as it wasn't as easy as I'd hoped:
    from nltk.corpus import words

    # The word list must be downloaded once beforehand: nltk.download("words")
    # Sets make the membership tests in the later steps fast.
    seven = set(w for w in words.words() if len(w) == 7)
    print("Found {} 7 letter words".format(len(seven)))
    six = set(w for w in words.words() if len(w) == 6)
    print("Found {} 6 letter words".format(len(six)))
Which gives
Found 23870 7 letter words
Found 17708 6 letter words
So we have about 24k 7-letter words. How many of them have a 6-letter word as a suffix?
    sub_is_six = set(w for w in seven if w[1:] in six)
    print("Found {} 7 letter words where word[1:] is also a word".format(len(sub_is_six)))
Found 1621 7 letter words where word[1:] is also a word
OK, there are 1621 7-letter words that have this property: slotter, apathic, zincite, barrack, hattery, etc. I tried exploring the list manually, but there were just too many.
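The suffix filter above can be wrapped in a small helper so that it works for any word length. This is only a sketch: the name `words_with_word_suffix` and the toy word list are made up for illustration, and the list stands in for `words.words()`.

```python
def words_with_word_suffix(words, length):
    """Return the words of `length` letters whose tail (first letter dropped) is also a word."""
    vocabulary = set(words)  # set membership tests are O(1) on average
    return {
        w for w in vocabulary
        if len(w) == length and w[1:] in vocabulary
    }


# Toy word list for illustration only; not the NLTK corpus.
toy_words = ["factual", "actual", "slotter", "lotter", "puzzle", "zincite"]
print(sorted(words_with_word_suffix(toy_words, 7)))
# ['factual', 'slotter']
```

Here "zincite" is dropped because "incite" is not in the toy list, even though it is a real word; the result is only ever as good as the dictionary behind it.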
Luckily with NLTK, it's possible to get a list of synonyms for words using the wordnet module. Here's a function that returns a set of synonyms for a given word:
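    from nltk.corpus import wordnet

    def get_synonyms(word):
        # Collect every lemma name from every WordNet synset that contains the word.
        # WordNet must be downloaded once beforehand: nltk.download("wordnet")
        return {n for x in wordnet.synsets(word) for n in x.lemma_names()}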
Now we can use this to filter our 7-letter words:
    sub_is_syn = {w for w in sub_is_six if w[1:] in get_synonyms(w)}
    sorted(sub_is_syn)
Which gives us a list of 11 words (much more manageable): arouser, asquint, challah, crumple, factual, grumble, opossum, orotund, screaky, spatter, twinkle.
Factual is the correct answer, as it's the only one whose first letter is an abbreviation for its antonym: 'f' for false.
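The final check was done by eye, but it could be roughly automated. The sketch below approximates "the removed letter abbreviates the opposite" as "some antonym starts with the removed letter", which is an assumption and not the puzzle's exact rule. The helper `first_letter_matches_antonym` and the antonym table are hypothetical and hand-written for illustration.

```python
def first_letter_matches_antonym(word, antonyms):
    """True if any listed antonym of `word` starts with the letter dropped from `word`."""
    return any(a.startswith(word[0]) for a in antonyms.get(word, ()))


# Hypothetical, hand-written antonym table; not taken from WordNet.
toy_antonyms = {
    "factual": {"false", "fictional"},
    "crumple": {"smooth", "flatten"},
    "grumble": {"praise"},
}
candidates = ["crumple", "factual", "grumble", "twinkle"]
print([w for w in candidates if first_letter_matches_antonym(w, toy_antonyms)])
# ['factual']
```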
After doing this, it's easy to extend to words with more or fewer letters. For example, amongst 9-letter words whose 8-letter suffix is a synonym of the original, we can find the Python built-in enumerate and the old word for odometer, hodometer. Interestingly, hodometer comes from the Greek word hodos (ancient Greek for path), which, when combined with meta (expressing development in ancient Greek), produces the word 'method'.
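One way to generalize the search to any length is to pass the word length and the synonym lookup in as parameters. This is a minimal sketch: `drop_first_letter_synonyms` is a made-up name, and the toy vocabulary and synonym table below are hand-written stand-ins for the NLTK word list and WordNet.

```python
def drop_first_letter_synonyms(words, length, get_synonyms):
    """Return sorted words of `length` letters whose tail is both a word and a synonym."""
    vocabulary = set(words)
    return sorted(
        w for w in vocabulary
        if len(w) == length
        and w[1:] in vocabulary        # cheap check first
        and w[1:] in get_synonyms(w)   # synonym lookup only for surviving candidates
    )


# Hand-written toy data for illustration only; not WordNet output.
toy_synonyms = {
    "enumerate": {"enumerate", "numerate", "count"},
    "hodometer": {"hodometer", "odometer", "mileometer"},
    "factual": {"factual", "actual"},
}
toy_words = [
    "enumerate", "numerate", "hodometer", "odometer",
    "factual", "actual", "slotter", "lotter",
]


def toy_get_synonyms(word):
    return toy_synonyms.get(word, set())


print(drop_first_letter_synonyms(toy_words, 9, toy_get_synonyms))
# ['enumerate', 'hodometer']
```

To run it against real data, pass `words.words()` and the `get_synonyms` function from earlier in place of the toy arguments.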