Implementing a rhyme finder

Introduction

A useful rhyme finder is usually built on pronunciation, not spelling. English spelling is too irregular for suffix matching alone, so the practical approach is to map words to phonemes and then compare the portion of the pronunciation that begins at the last stressed vowel.

What Counts as a Rhyme

For a simple “perfect rhyme” finder, two words rhyme when their pronunciations match from the last stressed vowel onward. For example, “time” and “rhyme” rhyme because their phoneme tails are identical, while words with similar spelling but different pronunciation, such as “cough” and “though”, do not.

That means:

string endings are not enough
a pronunciation dictionary is extremely helpful
one word can have multiple pronunciations

The CMU Pronouncing Dictionary is a common starting point for English because it gives words in phoneme form.
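To make the spelling problem concrete, the sketch below compares a naive three-letter suffix check with a comparison of phoneme tails. The pronunciations are a small hand-typed sample in CMU-style notation, and `SAMPLE`, `spelling_match`, and `sound_match` are illustrative names assumed for this example, not part of any library.

```python
# Hand-typed sample in CMU-style notation (an assumption for illustration;
# a real tool would load the full pronouncing dictionary instead).
SAMPLE = {
    "time": ["T", "AY1", "M"],
    "rhyme": ["R", "AY1", "M"],
    "cough": ["K", "AO1", "F"],
    "though": ["DH", "OW1"],
}


def spelling_match(a, b, n=3):
    """Naive check: do the last n letters agree?"""
    return a[-n:] == b[-n:]


def sound_match(a, b):
    """Compare pronunciations from the stressed vowel onward.

    Every sample word has exactly one stressed vowel (digit 1),
    so locating it is a simple forward search.
    """
    def tail(word):
        phones = SAMPLE[word]
        start = next(i for i, p in enumerate(phones) if p.endswith("1"))
        return phones[start:]

    return tail(a) == tail(b)


for a, b in [("time", "rhyme"), ("cough", "though")]:
    print(a, b, "spelling:", spelling_match(a, b), "sound:", sound_match(a, b))
```

In this sample the suffix check rejects “time”/“rhyme” and accepts “cough”/“though”, while the phoneme comparison gets both pairs right.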

Build on a Pronouncing Dictionary

With NLTK, you can access the CMU dictionary directly:

python

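import nltk
from nltk.corpus import cmudict

# Fetch the CMU Pronouncing Dictionary (skipped if it is already up to date).
nltk.download("cmudict")

# Maps each lowercase word to a list of pronunciations.
pronunciations = cmudict.dict()

print(pronunciations["time"])
print(pronunciations["rhyme"])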

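Each word maps to one or more pronunciations, and each pronunciation is a list of phoneme tokens. The digit at the end of each vowel phoneme marks stress (0 for unstressed, 1 for primary stress, 2 for secondary stress), which is what rhyme detection relies on. Consonants carry no digit.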
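As a small illustration of that notation, the sketch below separates a phoneme token into its base symbol and stress level. `split_stress` is a hypothetical helper written for this example, and the pronunciation of “water” is typed in by hand rather than loaded from the dictionary.

```python
def split_stress(phoneme):
    """Return (base, stress); stress is None for consonants."""
    if phoneme[-1].isdigit():
        return phoneme[:-1], int(phoneme[-1])
    return phoneme, None


# Hand-typed pronunciation of "water" in CMU-style notation (assumed sample).
water = ["W", "AO1", "T", "ER0"]

LABELS = {
    None: "consonant",
    0: "unstressed vowel",
    1: "primary stress",
    2: "secondary stress",
}

for phoneme in water:
    base, stress = split_stress(phoneme)
    print(f"{phoneme:4} {base:3} {LABELS[stress]}")
```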

Extract a Rhyme Key

One simple approach is to define a rhyme key as everything from the last stressed vowel to the end of the pronunciation:

python

import nltk
from nltk.corpus import cmudict

nltk.download("cmudict")

entries = cmudict.dict()


def rhyme_key(pronunciation):
    # Scan backward for the last vowel with primary (1) or secondary (2) stress.
    # Checking for any digit would also match unstressed vowels (0).
    for i in range(len(pronunciation) - 1, -1, -1):
        phoneme = pronunciation[i]
        if phoneme[-1] in "12":
            return tuple(pronunciation[i:])
    # No stressed vowel found: fall back to the whole pronunciation.
    return tuple(pronunciation)


print(rhyme_key(entries["time"][0]))
print(rhyme_key(entries["rhyme"][0]))

This gives you a normalized suffix for rhyme lookup. Once you can compute that key, the next step is to index the dictionary by it.

Index Words by Their Rhyme Key

Precomputing an index makes lookup fast:

python

from collections import defaultdict

# Maps each rhyme key to the set of words that have it.
rhyme_index = defaultdict(set)

# Index every pronunciation, not just the first one per word.
for word, pronunciations in entries.items():
    for pronunciation in pronunciations:
        rhyme_index[rhyme_key(pronunciation)].add(word)


def find_rhymes(word):
    word = word.lower()
    if word not in entries:
        return []

    # Collect matches for each pronunciation of the query word.
    results = set()
    for pronunciation in entries[word]:
        results.update(rhyme_index[rhyme_key(pronunciation)])

    # A word should not be reported as rhyming with itself.
    results.discard(word)
    return sorted(results)


print(find_rhymes("time")[:10])

This is a practical rhyme finder already. It is fast enough for many command-line tools, small APIs, or educational apps.

Handle Multiple Pronunciations and Missing Words

Many English words have multiple pronunciations. A good rhyme finder should consider all of them instead of assuming one canonical pronunciation.
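A minimal sketch of that idea: treat two words as rhyming if any pronunciation of one shares a rhyme key with any pronunciation of the other. The entries are hand-typed in CMU-style notation, `rhymes` is a hypothetical helper, and `rhyme_key` is redefined here only so the example runs on its own.

```python
def rhyme_key(pronunciation):
    """Tail starting at the last vowel with primary or secondary stress."""
    for i in range(len(pronunciation) - 1, -1, -1):
        if pronunciation[i][-1] in "12":
            return tuple(pronunciation[i:])
    return tuple(pronunciation)


# Assumed sample entries; "read" has two pronunciations.
SAMPLE = {
    "read": [["R", "EH1", "D"], ["R", "IY1", "D"]],
    "bed": [["B", "EH1", "D"]],
    "need": [["N", "IY1", "D"]],
}


def rhymes(a, b, entries=SAMPLE):
    """True if any pronunciation of a shares a rhyme key with any of b."""
    keys_a = {rhyme_key(p) for p in entries.get(a, [])}
    keys_b = {rhyme_key(p) for p in entries.get(b, [])}
    return bool(keys_a & keys_b)


print(rhymes("read", "bed"))   # matches through R EH1 D
print(rhymes("read", "need"))  # matches through R IY1 D
print(rhymes("bed", "need"))   # no shared key
```

Had only the first pronunciation of “read” been considered, the match with “need” would have been lost.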

Out-of-vocabulary words are another issue. If the word is not in the dictionary, you have a few options:

return no result
fall back to spelling-based heuristics
run a grapheme-to-phoneme model

For a first implementation, returning no result is acceptable and keeps the behavior honest.
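If you do add a spelling fallback later, one way to keep the behavior honest is to label where each answer came from. The sketch below is an assumption-laden illustration: `lookup`, its `suffix_len` parameter, and the source labels are hypothetical, and the sample entries are hand-typed in CMU-style notation.

```python
def rhyme_key(pronunciation):
    """Tail starting at the last vowel with primary or secondary stress."""
    for i in range(len(pronunciation) - 1, -1, -1):
        if pronunciation[i][-1] in "12":
            return tuple(pronunciation[i:])
    return tuple(pronunciation)


# Assumed sample entries standing in for the full dictionary.
SAMPLE = {
    "time": [["T", "AY1", "M"]],
    "lime": [["L", "AY1", "M"]],
    "rhyme": [["R", "AY1", "M"]],
}


def lookup(word, entries=SAMPLE, suffix_len=3):
    """Return (source, rhymes) so callers can tell facts from guesses."""
    word = word.lower()
    others = [w for w in entries if w != word]

    if word in entries:
        keys = {rhyme_key(p) for p in entries[word]}
        hits = [w for w in others
                if keys & {rhyme_key(p) for p in entries[w]}]
        return "dictionary", sorted(hits)

    # Fallback: a deliberately crude spelling heuristic, labeled as a guess.
    hits = [w for w in others if w[-suffix_len:] == word[-suffix_len:]]
    return "spelling-guess", sorted(hits)


print(lookup("Time"))   # found in the sample after lowercasing
print(lookup("grime"))  # not in the sample, so the result is labeled a guess
```

Note that the guess for “grime” misses “rhyme”, which is exactly the kind of error a spelling heuristic introduces.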

Decide Whether You Want Perfect or Loose Rhymes

The algorithm above finds exact phoneme-tail matches, which is what you want for perfect rhymes. But many applications also want near rhymes or slant rhymes. That is a harder problem because the match is no longer exact, so words cannot simply be grouped into buckets by key.

At that point you might:

compare only final vowel and consonant classes
score phoneme similarity instead of exact equality
keep a looser ranking system rather than exact match buckets

The important design choice is to define what “rhyme” means for your application before tuning the algorithm.
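As one possible reading of “score phoneme similarity”, the sketch below gives full credit for identical phonemes (ignoring stress digits) and partial credit for consonants in the same coarse class. The class table, the 0.5 weight, and the function names are all assumptions chosen for illustration, not an established scoring scheme.

```python
# Coarse consonant classes (an assumed, simplified grouping).
CLASSES = {
    "nasal": {"M", "N", "NG"},
    "stop": {"P", "B", "T", "D", "K", "G"},
    "fricative": {"F", "V", "S", "Z", "SH", "ZH", "TH", "DH", "HH"},
}


def strip_stress(phoneme):
    """Drop a trailing stress digit, if any."""
    return phoneme.rstrip("012")


def phoneme_score(a, b):
    """1.0 for the same phoneme, 0.5 for the same consonant class, else 0.0."""
    a, b = strip_stress(a), strip_stress(b)
    if a == b:
        return 1.0
    for members in CLASSES.values():
        if a in members and b in members:
            return 0.5
    return 0.0


def tail_score(tail_a, tail_b):
    """Average phoneme score over two tails; 0.0 if the lengths differ."""
    if not tail_a or len(tail_a) != len(tail_b):
        return 0.0
    total = sum(phoneme_score(a, b) for a, b in zip(tail_a, tail_b))
    return total / len(tail_a)


print(tail_score(["AY1", "M"], ["AY1", "M"]))  # tails of "time" and "rhyme"
print(tail_score(["AY1", "M"], ["AY1", "N"]))  # tails of "time" and "line"
print(tail_score(["AY1", "M"], ["OW1", "K"]))  # unrelated tails
```

A score like this supports ranking candidates rather than bucketing them, at the cost of comparing the query against many tails instead of doing a single index lookup.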

Common Pitfalls

The most common mistake is using spelling suffixes alone. That produces many false positives and false negatives because English spelling and sound do not line up consistently.

Another pitfall is ignoring multiple pronunciations. If you index only the first pronunciation of each word, you silently miss valid rhymes for words with alternate pronunciations.

It is also easy to forget case normalization and dictionary coverage. A user may enter “Time” or a proper noun not present in the pronunciation dictionary, and the tool should handle both cases deliberately.

Finally, do not assume “perfect rhyme” and “what users expect from poetry” are identical. Once the tool becomes user-facing, you may need ranking, slant rhymes, or phrase support rather than only exact phoneme-tail matches.

Summary

A reliable rhyme finder should compare pronunciations, not spelling endings.
The last stressed vowel onward is a useful rhyme key for perfect rhymes.
The CMU Pronouncing Dictionary is a practical data source for English.
Precomputing an index by rhyme key makes lookup fast.
Decide early whether the application needs exact rhymes, near rhymes, or both.