Trie Construction

Trees

Builds a trie (prefix tree) by inserting words one character at a time.

A Trie (pronounced "try"), also called a prefix tree or digital tree, is a tree data structure used for efficient storage and retrieval of strings over a known alphabet. Each path from the root to a marked node represents a stored word, and strings that share a common prefix share the same initial path in the trie. The name "trie" was coined by Edward Fredkin in 1960, derived from the word "retrieval," though Fredkin himself pronounced it "tree." To avoid confusion with the general concept of trees, most computer scientists now pronounce it "try." The underlying idea dates back even further to Axel Thue's work on string combinatorics in the early 1900s and was independently explored by Rene de la Briandais in 1959.

How It Works

  1. Start at root: The root node represents the empty string and has no character associated with it.
  2. Process each character: For each character of the word being inserted, check whether the current node already has a child edge labeled with that character.
    • If a matching child exists, follow it -- the prefix is already present in the trie.
    • If no matching child exists, create a new node and link it as a child with that character label.
  3. Mark end of word: After processing all characters, set an end-of-word flag on the final node. This flag distinguishes complete words from prefixes that merely lie on the path to longer words.
  4. Repeat: Insert the next word starting from the root.
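The steps above can be sketched in Python. This is a minimal illustration; names like `TrieNode` and `is_end_of_word` are conventional choices, not part of any standard library:

```python
class TrieNode:
    """One trie node; children maps a character to the next node."""
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

def insert(root, word):
    """Insert a word by walking from the root, creating nodes as needed."""
    node = root
    for ch in word:
        if ch not in node.children:          # step 2: no matching child
            node.children[ch] = TrieNode()   # create and link a new node
        node = node.children[ch]             # follow the (new or existing) edge
    node.is_end_of_word = True               # step 3: mark end of word
```

Each call starts again from the root (step 4), so shared prefixes reuse the nodes created by earlier insertions.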

Consider inserting the words "CAR," "CARD," and "CARE" into an empty trie. The first word "CAR" creates nodes for C, A, and R, marking R as an end of word. When "CARD" is inserted, the algorithm follows the existing C-A-R path and only creates a new node for D. Similarly, "CARE" shares C-A-R and adds E. The shared prefix "CAR" is stored only once, which is the core efficiency advantage of tries.
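The space saving from the shared "CAR" prefix can be made concrete by counting nodes. A hedged sketch, using a hypothetical `count_nodes` helper:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end_of_word = True

def count_nodes(node):
    """Count all nodes in the trie, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())
```

After inserting "CAR," "CARD," and "CARE," the trie holds only six nodes (the root plus C, A, R, D, E), whereas storing the three words separately would require eleven character nodes.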

Complexity

| Operation     | Time                           |
|---------------|--------------------------------|
| Insert word   | O(L), where L = word length    |
| Search word   | O(L)                           |
| Prefix search | O(P), where P = prefix length  |
| Delete word   | O(L)                           |
| Space         | O(N × A) worst case            |

Here N is the total number of characters stored and A is the alphabet size (e.g., 26 for lowercase English letters). Each node may have up to A children. In the worst case with no shared prefixes, space consumption can be significant, but in practice natural language data shares many prefixes, yielding substantial compression.
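The O(L) search and O(P) prefix-search bounds follow directly from the walk: one child lookup per character of the query, independent of how many words are stored. A minimal sketch (function names are illustrative, not a standard API):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end_of_word = True

def search(root, word):
    """Exact-word lookup: O(L), one dictionary access per character."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end_of_word     # must be a marked end of word

def starts_with(root, prefix):
    """Prefix query: O(P), the same walk, but any reachable node counts."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return False
    return True
```

Note the difference at the final node: "CA" is a path in a trie containing "CAR," so `starts_with` accepts it, but `search` rejects it because the A node is not marked as an end of word.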

Implementation Choices

Trie nodes can store children in different ways, each with trade-offs. An array of size A provides O(1) child lookup but wastes space for sparse nodes. A hash map at each node reduces space for sparse alphabets while maintaining average O(1) lookup. A sorted list of children saves space but requires O(log A) lookup via binary search. Production systems choose based on alphabet size and memory constraints. For small alphabets like DNA sequences (A=4), arrays are ideal. For Unicode text, hash maps are practically required.
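For a small fixed alphabet, the array representation looks like this. A sketch assuming a four-letter DNA alphabet; the base-to-index mapping is a design choice, not a standard encoding:

```python
BASES = "ACGT"   # fixed alphabet, A = 4

class DNATrieNode:
    def __init__(self):
        # One slot per base: child lookup is a direct index, O(1),
        # but every node pays for four slots even if most are empty.
        self.children = [None] * len(BASES)
        self.is_end_of_word = False

def insert(root, sequence):
    node = root
    for base in sequence:
        i = BASES.index(base)            # map base to its fixed slot
        if node.children[i] is None:
            node.children[i] = DNATrieNode()
        node = node.children[i]
    node.is_end_of_word = True
```

Swapping the list for a dict (the hash-map option) or a sorted list of (character, child) pairs changes only the node representation; the insertion walk itself is identical.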

Comparison with Other Data Structures

Compared to hash tables, tries offer several unique capabilities. A hash table can check whether a word exists in O(1) average time, but it cannot efficiently enumerate all words with a given prefix. Tries support prefix queries naturally: navigate to the node representing the prefix, then enumerate all words in that subtree. Compared to balanced BSTs storing strings, tries avoid repeated string comparisons and provide lookup time proportional to the query string length, O(L), rather than O(L log n), where n is the number of stored strings.
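The prefix-enumeration capability described above can be sketched as a walk to the prefix node followed by a depth-first collection of its subtree (the helper name `words_with_prefix` is illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end_of_word = True

def words_with_prefix(root, prefix):
    """Navigate to the prefix node, then list every word in its subtree."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []                    # no stored word has this prefix
    results = []
    def collect(n, path):
        if n.is_end_of_word:
            results.append(prefix + path)
        for ch, child in sorted(n.children.items()):
            collect(child, path + ch)
    collect(node, "")
    return results
```

This is exactly the operation a hash table cannot do without scanning every key, and it is the primitive that autocomplete systems are built on.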

Practical Applications

Tries are the backbone of autocomplete systems in search engines, text editors, and mobile keyboards. Spell checkers use tries to quickly verify whether a word exists in the dictionary and to suggest corrections by exploring nearby paths. IP routers use a specialized form called a Patricia trie (or radix tree) for longest prefix match routing. T9 predictive text input on older mobile phones used tries to map numeric key sequences to words.

When to Use

Choose a trie when your problem involves prefix-based operations -- autocomplete, prefix search, spell checking, or dictionary storage with prefix queries. If you only need exact key lookups with no prefix operations, a hash table is simpler and more space-efficient.