Word Search in Python
This implementation of Word Search was, in most part, an experiment. An experiment to see how I utilize a tool, Python, to try and solve the problem of implementing basic word search solving algorithm.
- What is Word Search?
- How and where do we start?
- Which tool do we use?
- Python installation
- Pen and paper
- Onto the code!
- Closing remarks
Word search is a puzzle we usually see in newspapers, and in some magazines, located along the crossword puzzles. They can be located sometimes in bookstores, around the trivia area, as a standalone puzzle book, in which the sole content is a grid of characters and a set of words per page.
How a traditional word search puzzle works is, for a given grid of different characters, you have to find the hidden words inside the grid. The word could be oriented vertically, horizontally, diagonally, and also inversely in the previously mentioned directions. After all the words are found, the remaining letters of the grid expose a secret message.
In some word search puzzles, a limitation exists in the length of the hidden words, in that it should contain more than 2 characters, and the grid should have a fixed size of 10 x 10, or an equivalent length and width proportion (which this python implementation doesn’t have).
Before going deeper into the computer side of the algorithm, let’s first clarify how we tend to solve a word search puzzle:
- We look at a hidden word and its first letter then we proceed to look for the first letter inside the grid of letters.
- Once we successfully find the first letter of the hidden word inside the grid, we then check the neighboring characters of that successful match and check whether the second letter of our word matches any of the neighbors of the successful match.
- After confirming a successful match for the second letter of the hidden word through its neighbors, we proceed to a much narrower step. After the successful matching of the second letter of the word in the successful second match’s neighbors, we then follow-through to a straight line from there, hoping to get a third match (and so on) of the hidden word’s letters.
To realize this series of steps in solving a word search puzzle, we will utilize a programming language known for having a syntax similar to pseudo-code—Python.
There are two main versions of Python—versions 2.x and 3.x. For this project, we would be utilizing version 2.7.
For the installation part, we’ll be covering installation instructions for both Windows, Unix, and Linux.
First, determine whether you’re running a 32- or 64-bit operating system. To do that, click Start, right-click Computer, then click Properties. You should see whether you’re running on 32-bit or 64-bit under System type. If you’re running on 32-bit, click on this link then start the download; if you’re on 64-bit, click this one. Again, take note that we will be utilizing version 2.7 of Python.
Download this file then extract. After extraction, go inside the extracted directory then run the following:
$ ./configure $ make $ make install
In Linux/Unix, to make sure that we can actually run Python when we enter the
python command in a terminal, let’s make sure that the installed Python files can be located by the system.
Type the following, then press Enter on your corresponding shell:
Bash, Sh, Zsh, Ash, or Ksh:
$ export PATH="$PATH:/usr/local/bin/python"
Csh or Tcsh:
$ setenv PATH "$PATH:/usr/local/bin/python"
As problems go in software development or in programming in general, it is better to tackle the problem with a clear head—going at it with the problem statement and constraints clear in our minds. What we are going to do first is to outline the initial crucial steps in a word search puzzle.
First, write the word
dog, then on the space immediately below it, draw a grid of characters on the paper, like the following:
dog d o g g o o g o g o g d
To start the hunt, we look at the first letter of the word
dog, which is the letter
d. If, somehow, the first letter of
dog doesn’t exist within the grid, it means that we won’t be able to find the word in it! If we successfully find a character match for the first letter of
dog, we then proceed to look at the second letter of
dog. This time, we are now restricted to look around among the adjacent letters of the first letter match. If the second letter of
dog can’t be located around the adjacent letters of
d inside the grid, this means that we have to proceed to the next occurrence of the letter
d inside the grid.
If we find a successful match around the adjacent letters of the next occurrence of
d inside the grid, then the next steps are literally straightforward. For example:
* o d
In the previous grid, the first letter
d matched on the corner of the grid, and the word’s second letter
o which is adjacent to
d, also successfully matched. If that’s the case, the next location in the grid to check for the subsequent matches of the remaining letters of the word
dog, will now be in a straight line with the direction drawn from the first letter to the second letter. In this case, we will check the letter directly above
o for the third letter of the word
dog, which is
g. If instead of the asterisk, the grid showed:
d o d
This means that we don’t have a match, and we should be going to the next occurrence of the first letter, inside the grid. If the asterisk is replaced by the correct missing letter:
g o d
We have a match! However, for our version of word search, we will not stop there. Instead, we will count for all the adjacent letters of the letter
d, then look for the matches of the letter
o! For example, if we are presented with the following grid:
g g o o d
Then so far, for the word
dog, we found 2 matches! After all the neighbors of the letter
d have been checked for a possible match, we then move to the next occurrence of the letter in the grid.
With the basic algorithm in mind, we can now start implementing the algorithm from the previous section.
def matrixify(grid, separator='\n'): return grid.split(separator)
The purpose of this function is to return a list whose elements are lines of string. This provides us the ability to index individual elements of the grid through accessing them by their row and column indices:
>>> grid = 'dogg oogo gogd' >>> matrix = matrixify(grid, ' ') ['dogg', 'oogo', 'gogd'] >>> matrix 'g'
def coord_char(coord, matrix): row_index, column_index = coord return matrix[row_index][column_index]
Given a coordinate ((row_index, column_index) structure) and the matrix where this coordinate is supposedly located in, this function returns the element located at that row and column:
>>> matrix ['dogg', 'oogo', 'gogd'] >>> coord_char((0, 2), matrix) 'g'
def convert_to_word(coord_matrix, matrix): return ''.join([coord_char(coord, matrix) for coord in coord_matrix])
This function will run through a list of coordinates through a
for loop and gets the single length strings using
>>> [coord_char(coord, matrix) for coord in [(0, 0), (0, 1), (0, 2)]] ['d', 'o', 'g']
and then uses the
join() method of strings to return one single string. The
'' before the
join() method is the separator to use in between the strings, but in our case, we want one single word so we used an empty string separator.
def find_base_match(char, matrix): base_matches = [(row_index, column_index) for row_index, row in enumerate(matrix) for column_index, column in enumerate(row) if char == column] return base_matches
The value of
base_matches above is computed by a
list comprehension. A list comprehension is just another way of constructing a list, albeit a more concise one. The above list comprehension is roughly equivalent to the following:
base_matches =  for row_index, row in enumerate(matrix): for column_index, column in enumerate(row): if char == column: base_matches.append((row_index, column_index))
I used the
enumerate() function because it appends a counter to an iterable, and that is handy because the counter’s value could correspond to either the row or column indices of the matrix!
To show that the above code indeed scrolls through the individual characters of
grid, let’s modify the body of our
for loop in order to display the characters and their corresponding coordinates:
>>> for row_index, row in enumerate(matrix): ... for column_index, column in enumerate(row): ... print column, (row_index, column_index) ... d (0, 0) o (0, 1) g (0, 2) g (0, 3) o (1, 0) o (1, 1) g (1, 2) o (1, 3) g (2, 0) o (2, 1) g (2, 2) d (2, 3)
Giving our function
find_base_match the arguments
grid, respectively, we get the following:
>>> find_base_match('d', grid) [(0, 0), (2, 3)]
As you can see from the previous
for loop output, the coordinates output by our function are indeed the coordinates where the character
By calling this function, we can determine whether or not to continue with the further steps. If we deliberately give
find_base_match a character that is not inside
>>> find_base_match('c', grid) 
The function returns an empty list! This means, that inside the encompassing function that will call
find_base_match, one of the conditions could be:
if not find_base_match(char, grid): pass
def matched_neighbors(coord, char, matrix, row_length, col_length): row_num, col_num = coord neighbors_coords = [(row, column) for row in xrange(row_num - 1, row_num + 2) for column in xrange(col_num - 1, col_num + 2) if row_length > row >= 0 and col_length > column >= 0 and coord_char((row, column), matrix) == char and not (row, column) == coord] return neighbors_coords
This function finds the adjacent coordinates of the given coordinate, wherein the character of that adjacent coordinate matches the
neighbors_coords, we’re trying to create a list of all the coordinates adjacent the one we gave, but with some conditions to further filter the resulting coordinate:
[(row, column) for row in xrange(row_num - 1, row_num + 2) for column in xrange(col_num - 1, col_num + 2) if row_length > row >= 0 and col_length > column >= 0 and coord_char((row, column), matrix) == char and not (row, column) == coord]
In the above code snippet, we are creating a list of adjacent coordinates (through
(row, column)). Because we want to get the immediate neighbors of a certain coordinate, we deduct 1 from our starting range then add 2 to our end range, so that, if given a row of 0, we will be iterating through
xrange(-1, 2). Remember that the
xrange() functions is not inclusive of the end range, which means that it doesn’t include the end range in the iteration (hence, the 2 that we add at the end range, not only 1):
>>> list(xrange(-1, 2)) [-1, 0, 1]
We do the same to the column variable, then later, we filter the contents of the final list through an
if clause inside the list comprehension. We do that because we don’t want this function to return coordinates that are out of bounds of the matrix.
To further hit the nail in the coffin, we also give this function a character as its second argument. That is because we want to further filter the resulting coordinate. We only want a coordinate whose string equivalent matches the second character argument that we give the function!
If we want to get the neighbors of the coordinate
(0, 0), whose adjacent character in the matrix should be
c, call this function with
(0, 0) as the first argument, the string
c as the second, the matrix itself, and the matrix’s row length and column length, respectively:
>>> matched_neighbors((0, 0), 'c', matrix, 4, 3) 
Notice that it returns an empty list, because in the neighbors of the coordinate
(0, 0), there is no coordinate in there that has the string
c as its string equivalent!
If we replace
>>> matched_neighbors((0, 0), 'a', matrix, 4, 3) [(0, 1), (1, 0), (1, 1)]
This function returns a list of the adjacent coordinates that match the given character.
def complete_line(base_coord, targ_coord, word_len, row_length, col_len): if word_len == 2: return base_coord, targ_coord line = [base_coord, targ_coord] diff_1, diff_2 = targ_coord - base_coord, targ_coord - base_coord for _ in xrange(word_len - 2): line += [(line[-1] + diff_1, line[-1] + diff_2)] if 0 <= line[-1] < row_length and 0 <= line[-1] < col_len: return line return 
We are now at the stage where functions seem a bit hairier to comprehend! I will attempt to discuss the thoughts I had before creating this function.
In the Pen and paper section, after matching the first and second letters of the word inside the matrix, I mentioned that the next matching steps become narrower. It becomes narrower in the sense that, after matching the first and second letters of the word, the only thing you need to do after that is to go straight in the direction that the first and second letters created.
In the above grid, once the letters
o are found, one only need to go straight in a line from the first letter
d to the second letter
o, then take the direction that
d took to get to
o. In this case, we go upwards of
o to check for the third letter match:
* <- Check this next. o d
The direction that the above matches create is north-east. This means that we have to check the place north-east of 'o':
* <- This one. o d
With that being said, I wanted a function to give me all the coordinates forming a straight line, when given two coordinates.
The first problem I had to solve was—Given two coordinates, how do I compute the coordinate of the third one, which will later form a straight line in the matrix?
To solve this problem, I tried plotting all the expected goal coordinates, if for example, the first coordinate match is
(1, 1) and the second coordinate match is
first (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) second (0, 0) (0, 1) (0, 2) (1, 2) (2, 2) (2, 1) (2, 0) (1, 0) expected (-1,-1)(-1, 1)(-1, 3) (1, 3) (3, 3) (3, 1) (3,-1) (1,-1)
While looking at the above plot, an idea came into my mind. What I wanted to get was the amount of step needed to go from the second coordinate to the third. In hopes of achieving that, I tried subtracting the row and column values of the first from the second:
second (0, 0) (0, 1) (0, 2) (1, 2) (2, 2) (2, 1) (2, 0) (1, 0) first (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) (1, 1) diff (-1,-1)(-1, 0)(-1, 1) (0, 1) (1, 1) (1, 0) (1,-1) (0,-1)
After that, I tried adding the values of the
diff row to the values of
second (0, 0) (0, 1) (0, 2) (1, 2) (2, 2) (2, 1) (2, 0) (1, 0) diff (-1,-1)(-1, 0)(-1, 1) (0, 1) (1, 1) (1, 0) (1,-1) (0,-1) sum (-1,-1)(-1, 1)(-1, 3) (1, 3) (3, 3) (3, 1) (3,-1) (1,-1)
If you look closely, the values of the
sum row match those of the
expected row! To summarize, I get the difference by subtracting values of the first coordinate from the values of the second coordinate, then I add the difference to the second coordinate to arrive at the expected third!
Now, back to the function:
def complete_line(base_coord, targ_coord, word_len, row_length, col_len): if word_len == 2: return base_coord, targ_coord line = [base_coord, targ_coord] diff_1, diff_2 = targ_coord - base_coord, targ_coord - base_coord for _ in xrange(word_len - 2): line += [(line[-1] + diff_1, line[-1] + diff_2)] if 0 <= line[-1] <= row_length and 0 <= line[-1] <= col_len: return line return 
For this function, however, I decided pass it the length of the word as an argument for mainly two reasons—to check for double length words, and for the length of the final list output. We check for double length words because with words that have lengths of 2, we no longer need to compute for a third coordinate because the word only needs two coordinates to be complete.
For the second reason, this serves as the quirk of my algorithm. Instead of checking the third coordinate for a match of the third character (and the subsequent ones), I instead create a list of coordinates, forming a straight line in the matrix, whose length is equal to the length of the word.
I first create the
line variable which already contains the coordinates of the first match and the second match of the word. After that, I get the difference of the second coordinates values and the first. Finally, I create a
for loop whose loop count is the length of the word minus 2 (because
line already has two values inside). Inside the loop, I append to the
line list variable a new coordinate by getting
line’s last variable values then adding the difference of the second and first match coordinates.
Finally, to make sure that the created coordinate list can be found inside the matrix, I check the last coordinate of the
line variable if it’s within the bounds of the matrix. If it is, I return the newly created coordinate list, and if not, I simply return an empty list.
Let’s say we want a complete line when given coordinate matches
(0, 0) and
(1, 1), and the length of our word is 3:
>>> core.complete_line((0, 0), (1, 1), 3, 4, 3) [(0, 0), (1, 1), (2, 2)]
If we give the function a word length of 4:
>>> core.complete_line((0, 0), (1, 1), 4, 4, 3) 
it returns an empty list because the last coordinate of the created list went out of bounds.
def complete_match(word, matrix, base_match, word_len, row_len, col_len): new = (complete_line(base, n, word_len, row_len, col_len) for base in base_match for n in matched_neighbors(base, word, matrix, row_len, col_len)) return [ero for ero in new if convert_to_word(ero, matrix) == word]
This is the
complete_line function on steroids. The goal of this function is to apply
complete_line to all the neighbors of the first match. After that, it creates a lists of coordinates whose word equivalent is the same as the word we’re trying to look for inside the matrix.
For the value of the
new variable, I utilize a generator comprehension. These are like list comprehensions, except, they release their values one by one, only upon request, in contrast to list comprehensions which return all the contents of the list in one go.
To accomplish the application of
complete_line to all the neighbors of the first match, I iterate through all the first matches:
for base in base_match
then inside that
for loop, I iterate through all the neighbors that
matched_neighbors gave us:
for n in matched_neighbors(base, word, matrix, row_len, col_len)
I then put the following statement in the first part of the generator comprehension:
complete_line(base, n, word_len, row_len, column_len)
The above generator comprehension is roughly equivalent to:
for base in base_match: for n in matched_neighbors(base, word, matrix, row_len, col_len): yield complete_line(base, n, word_len, row_len, col_len)
After the creation of the
new variable, we now start going through its values one by one:
[ero for ero in new if convert_to_word(ero, matrix) == word]
This list comprehension above will filter the
new and the resulting list will only contain coordinates that, when converted to its word counterpart, match the original word we wanted to find.
Attempting to find the word
dog inside our matrix returns a list of lists containing matched coordinates:
>>> base_match = find_base_match('dog', matrix) >>> core.complete_match('dog', matrix, base_match, 3, 3, 4) [[(0, 0), (0, 1), (0, 2)], [(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)], [(2, 3), (1, 3), (0, 3)]]
def find_matches(word, grid, separator='\n'): word_len = len(word) matrix = matrixify(grid, separator) row_len, column_len = len(matrix), len(matrix) base_matches = find_base_match(word, matrix) if column_len < word_len > row_len or not base_matches: return  elif word_len == 1: return base_matches return complete_match(word, matrix, base_matches, word_len, row_len, column_len)
This function will serve as the helper of our main function. Its ultimate goal is to output a list containing the coordinates of all the possible matches of
grid. For general purposes, I defined four variables:
word_lenvariable whose value is the length of the
wordargument, which will generally be useful throughout the script
matrixvariable whose value we get through giving
matrixifyfunction, which will allow us to later be able to index contents of the matrix through its row and column indices.
base_matcheswhich contain the coordinates of all the first letter matches of
After the variables, we will do some sanity checks:
if column_len < word_len > row_len or not base_matches: return  elif word_len == 1: return base_matches
if elif statement will check if the length of
word is longer than both the
row_len and also checks if
base_matches returns an empty list. If that condition is not satisfied, it means that
word can fit inside the matrix, and
base_matches found a match! However, if the length of
word is 1, we simply return
If the word is longer than 1, we then pass the local variables to
complete_match for further processing.
dog, the string chain
dogg oogo gogd, and the
' ' separator as arguments:
>>> find_matches('dog', 'dogg oogo gogd', ' ') [[(0, 0), (0, 1), (0, 2)], [(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)], [(2, 3), (1, 3), (0, 3)]]
Voila! This is the list, which contain lists of coordinates where the word
dog matched inside
dogg oogo gogd!
def wordsearch(word, string_grid, separator='\n'): return len(find_matches(word, string_grid, separator))
This function simply returns the number of matches of running
find_matches(word, string_grid, separator='\n'):
>>> wordsearch('dog', 'dogg oogo gogd', ' ') 4
There are 4 matches of
dogg oogo gogd!
Remember, it’s never a bad idea to go back to using pen and paper to solve programming problems. Sometimes, we express ideas better using our bare hands, and to top it off, a good ol’ break from the monitor and from the walls of code could just be what you need for a breakthrough—just like when I got stuck thinking about how I should implement my complete_line function!