Tries (Prefix Trees)

Course

DSA Fundamentals

Tries (Prefix Trees)

Topics Covered

Trie Structure and Mental Model

Why a Trie Instead of a Hash Set

Node Structure

Memory Trade-Offs

Visualizing a Trie

What Tries Are Not Good For

Insert and Search Operations

Insert

Search for Exact Word

Prefix Search

Practice: Implement Trie

Wildcard Search with DFS

Practice: Word Dictionary with Wildcards

Trie + Backtracking on a Grid

The Problem

The Trie Insight

Why This Beats Per-Word DFS

Storing Words at Terminal Nodes

Pruning Dead Trie Branches

Practice: Word Search II

Trie Variants and When to Use Them

Compressed Tries (Radix Tries / Patricia Tries)

Suffix Tries

Aho-Corasick

When to Use Each Variant

Common Trie Pitfalls

Trie Structure and Mental Model

A trie (pronounced "try") is a tree-shaped dictionary keyed on the characters of a string. Every word you insert traces a path from the root downward, one node per character. Words sharing a prefix share that prefix's path through the tree, which is exactly why tries dominate prefix-based problems: looking up "auto", "automobile", and "automatic" all start at the same nodes.

Why a Trie Instead of a Hash Set

A hash set lets you check whether a word exists in O(L) time, where L is the word length. So does a trie. The difference is what each structure can answer beyond exact-match queries:

Hash set: only answers "is this exact word in the set?"
Trie: answers "is this exact word in the set?", "does any word start with this prefix?", "list all words with this prefix", and "does any word match this pattern with wildcards?"

Hash sets are blind to prefixes - the hash of "app" tells you nothing about whether "apple" exists. A trie's structure naturally encodes prefix relationships because shared prefixes share nodes. This is why autocomplete, spell-check, and IP routing tables all use tries (or close cousins like radix trees and patricia tries).

Trie with shared prefixes

Words "apple", "app", and "apricot" share the prefix "ap" and branch where their characters diverge.

Node Structure

Every trie node holds two pieces of information: a map from characters to child nodes, and a flag marking whether this node terminates a complete word. The root node represents the empty string. The flag is essential because the same path can represent both a complete word and a prefix of longer words - for example, "app" is a complete word and also a prefix of "apple".

Some implementations use a fixed-size array of 26 entries (one per lowercase letter) instead of a hash map. The array version is faster on lookups (constant-time character indexing instead of hash overhead) and has predictable memory usage, but it wastes space for sparse alphabets and cannot handle Unicode without modification.

Memory Trade-Offs

The main cost of a trie is memory. A trie storing n words of average length L has at worst n * L nodes if no prefixes are shared, and each node carries a hash map or array of children. A hash set for the same data uses roughly n * L bytes for the strings themselves plus hash table overhead - typically less.

The break-even point is shared prefixes. If your dataset contains many words with common prefixes (URLs, file paths, dictionary words), the trie's shared structure pays for itself. If your dataset is random strings with no common prefixes, the trie wastes memory and a hash set is better.

Interview Tip

Tries shine when your queries are about prefixes, not just exact words. If the only operation you need is 'does this exact string exist', use a hash set - it is simpler and uses less memory. The moment you need 'all words starting with X' or 'autocomplete this fragment', the trie's structure pays for itself many times over.

Visualizing a Trie

Inserting "cat", "car", "cart", and "dog" produces the following structure. The dollar sign marks is_end = True:

1        root
2       /    \
3      c      d
4      |      |
5      a      o
6     / \     |
7    t$  r    g$
8        |
9        t$

"cat" terminates at the t node (marked $). "car" terminates at the r node (marked $). "cart" extends "car" with another t (marked $). "dog" terminates at g (marked $). The path from root to any $ node spells out a complete word, and the path from root to any internal node spells out a prefix shared by all words below it.

What Tries Are Not Good For

Tries are not the right answer for problems about substrings (use suffix arrays or Aho-Corasick), problems where you need ordered iteration of all words (BSTs are simpler), or problems where memory is tight and prefixes are not shared. They are also overkill for tiny alphabets - a problem about binary strings can usually use a simpler bitmask approach.

Insert and Search Operations

Both insert and search walk a path from the root, character by character. Insert creates new nodes when no path exists; search returns false when no path exists. Both operations are O(L) where L is the word length, completely independent of how many words are stored in the trie. This is the second major advantage over a hash set - hash sets are O(L) on average but O(n*L) worst case due to hash collisions, while tries are O(L) deterministically.

Insert

Walk from the root following the characters of the word. At each step, if the current character does not exist as a child of the current node, create a new node. Move to that child. After processing the last character, mark the current node's is_end = True.

Inserting "apple" into an empty trie creates 5 new nodes (one for each letter), and the final node has is_end = True. Inserting "app" afterward reuses the first 3 nodes (a, p, p) and only marks the second p's is_end = True. Inserting "apply" reuses 4 nodes (a, p, p, l) and creates one new node for y.

Search for Exact Word

Walk from the root following the characters of the word. If at any step the current character does not exist as a child of the current node, return false. After processing the last character, return the current node's is_end value.

The crucial detail is the final return node.is_end rather than return True. If you return true unconditionally after walking the path, you would incorrectly report that "app" exists in a trie that only contains "apple" - the path exists, but no word terminates at the end of "app".

Prefix Search

Prefix search is identical to exact search, except that the final step does not check is_end. As long as the path exists, the prefix exists in the trie.

Trie insert and search

Insert creates nodes along the path; search follows existing nodes and checks the end-of-word flag.

The contrast between search and starts_with is the entire point of using a trie. They differ by one line of code, but starts_with is impossible to implement efficiently with a hash set.

Practice: Implement Trie

Build the basic trie with insert, search, and startsWith. The implementation directly mirrors the code above. Pay special attention to the final return in search - it must check is_end, not just whether the path exists.

Loading problem...

Watch insert, search, and startsWith operations walk the trie character by character:

Loading animator...

Wildcard Search with DFS

Some problems extend the basic search to support wildcards - for example, the dot character . matching any single letter. The naive approach (test all possible substitutions) is exponential. The trie approach uses depth-first search: when you hit a non-wildcard character, follow that exact child. When you hit a wildcard, recursively try every child of the current node.

The recursion explores at most all possible matches but prunes aggressively because nonexistent paths short-circuit immediately. Worst case (all wildcards) the cost approaches the size of the trie, but typical queries with one or two wildcards are very fast.

Interview Tip

When implementing wildcard search, the natural instinct is to write a loop. Resist it - the recursive DFS is much cleaner because each wildcard fans out into multiple branches that need their own state. A loop forces you to maintain a stack manually, which is what recursion gives you for free.

Practice: Word Dictionary with Wildcards

Extend the basic trie to support a dot wildcard in search queries. The dot matches any single character, so search(".at") should return true if "cat" or "bat" or "hat" was added previously. The implementation is the wildcard DFS pattern shown above.

Loading problem...

Step through wildcard search - watch the dot character fan out into multiple trie branches:

Loading animator...

Trie + Backtracking on a Grid

The most powerful application of tries in interview problems is searching for many words in a 2D grid. Without a trie, you would run a separate DFS for every word, leading to massive duplicated work because words sharing prefixes traverse the same paths repeatedly. Building a trie of all target words lets you do one combined DFS that follows multiple words in lockstep.

The Problem

Given a 2D grid of letters and a list of target words, find all words that can be formed by walking adjacent cells (up, down, left, right) without reusing any cell. This is the Word Search II problem. A naive approach runs DFS from every cell for every word - if you have 100 words and 144 cells, that is 14,400 separate searches.

The Trie Insight

Build a trie containing all the target words. Now run a single DFS from each grid cell. As the DFS walks the grid, it simultaneously walks the trie. At each step, the current grid letter must correspond to a child of the current trie node - otherwise no target word matches this path and we can stop immediately. This pruning is the key efficiency gain: paths that do not match any prefix of any target word are eliminated in O(1).

Trie-guided grid DFS

The trie prunes grid exploration: if the current path is not a prefix of any target word, backtrack immediately.

Why This Beats Per-Word DFS

Suppose your target words include "oath", "oats", and "oat". A per-word DFS would walk the o-a-t prefix three times - once for each word. The trie-guided DFS walks the o-a-t prefix exactly once, at which point the trie node has children h (for oath), s (for oats), and is_end set (for oat). All three matches are detected in one pass. The savings scale linearly with the number of words sharing each prefix.

Storing Words at Terminal Nodes

A common trick in this pattern is to store the actual word string at the terminal trie node (instead of just is_end = True). When the DFS reaches a terminal node, you immediately know which word was matched without having to reconstruct it from the path. After matching, set the stored word to None to prevent duplicate detection if the same word can be formed from multiple grid paths.

Common Pitfall

A common bug in trie-guided grid search is forgetting to mark cells as visited during DFS. Without marking, the search can revisit the same cell within a single word match - for example, finding 'ball' in a grid where there is only one 'l' but the algorithm walks back to it twice. The fix is to temporarily mark the cell (e.g., set it to '#') before recursing and restore it after.

Pruning Dead Trie Branches

An advanced optimization removes a trie node when it has no children remaining and is not a word terminator. After the DFS finds and removes the word "oath", if the h node has no children, it can be pruned from its parent. This shrinks the trie as words are found, making subsequent DFS steps even faster. The optimization is especially valuable when the trie has many words and the grid is small.

Practice: Word Search II

Combine trie construction with grid backtracking to find all words from a target list that can be formed on the board. The trie organizes the target words; the DFS walks both the grid and the trie in lockstep. This problem is one of the hardest in the trie chapter, but the solution falls out naturally once you see how the trie eliminates duplicate work.

Loading problem...

Watch the trie-guided DFS walk the grid and trie in lockstep, pruning paths that match no target word:

Loading animator...

Trie Variants and When to Use Them

The standard trie is the right starting point for most prefix-based problems, but several variants exist that trade memory or simplicity for additional capabilities. Knowing which variant to reach for is what separates a working solution from an optimal one.

Compressed Tries (Radix Tries / Patricia Tries)

A standard trie uses one node per character, which wastes space when long stretches of the trie have no branching. A compressed trie collapses each unbranching chain of nodes into a single node holding the full substring. The resulting structure has the same query semantics but uses far less memory and pointer chasing.

For example, the words "automobile" and "automotive" share the prefix "automo". A standard trie uses 6 separate nodes for that prefix, then branches. A compressed trie uses one node holding "automo" and then branches into two nodes holding "bile" and "tive". Lookups and inserts are slightly more complex (you must compare against substring labels, not single characters) but memory usage drops dramatically for sparse tries.

Patricia tries are used in IP routing tables for exactly this reason - the routing table is sparse (only a tiny fraction of all possible IP prefixes are populated), so collapsing unbranching chains is essential.

Standard vs compressed trie

Unbranching chains of single-child nodes collapse into single nodes holding the full substring, saving memory and pointer hops.

Suffix Tries

A suffix trie stores every suffix of a single string. Inserting "banana" gives you the suffixes "banana", "anana", "nana", "ana", "na", "a". Now any substring of "banana" is a prefix of some suffix, so you can answer "does this substring exist" in O(L) where L is the substring length. This is the trie analog of the suffix array, and it's the foundation for substring search problems.

The downside is memory: a string of length n produces a suffix trie with O(n squared) nodes in the worst case. Suffix trees (a compressed variant) reduce this to O(n) but are notoriously tricky to construct. For most problems where you need substring search, a suffix array is the more practical choice.

Aho-Corasick

For multi-pattern string matching ("find all occurrences of any of these k words in this text"), Aho-Corasick augments a trie with failure links - pointers that say "if matching fails here, jump to this other node which represents the longest matching suffix". This lets you scan the text in O(n + total matches) time, regardless of how many patterns you are searching for.

Aho-Corasick is the algorithm behind grep with multiple patterns, malware signature scanning, and many intrusion detection systems. It is overkill for interview problems but worth knowing exists.

When to Use Each Variant

Exact word lookup, prefix search, autocomplete: standard trie
Same as above, but memory-tight on sparse data: compressed trie / radix trie
Substring queries on a single text: suffix array (preferred) or suffix trie
Find all occurrences of many patterns in text: Aho-Corasick
Wildcard search with .: standard trie + DFS
Multi-word search on a 2D grid: standard trie + grid backtracking

Interview Tip

In interviews, the standard trie covers 95% of trie problems. Compressed tries and suffix structures are worth knowing exist, but you will rarely be asked to implement them from scratch under time pressure. Focus your practice on the standard trie operations - insert, search, startsWith, and trie-guided DFS - and you will be ready for almost any trie question.

Common Trie Pitfalls

Forgetting is_end: Returning true unconditionally from search makes it behave like startsWith. Always check is_end at the end of the path.

Sharing nodes incorrectly: Each TrieNode must be a fresh object. If you reuse a single TrieNode instance across multiple positions (a common bug when initializing with [TrieNode()] * 26), all "different" children point to the same node and the trie collapses. Use a list comprehension instead: [TrieNode() for _ in range(26)].

Forgetting to restore visited cells: In trie-guided grid search, marking cells as visited is essential, but forgetting to restore them after the recursive call returns means subsequent searches see the wrong grid state. The pattern is "mark before recurse, restore after recurse" - both halves are required.

Storing too much per node: A node with children, is_end, word, count, frequency, etc., uses much more memory than necessary. Only store what your specific problem requires. For most problems, children and is_end (or children and word) are enough.

Course

DSA Fundamentals

Arrays and Hashing

Two Pointers and Sliding Window

Stacks and Queues

Linked Lists

Trees and Graphs

Dynamic Programming

Advanced Data Structures

Advanced Patterns