anagram algorithm
algorithm optimization
computational complexity
string manipulation
efficient algorithms

Anagram algorithm with minimum complexity

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

To check whether two strings are anagrams, the minimum practical time complexity is usually linear in the length of the strings, assuming you can count character frequencies efficiently. A sorting-based solution works in O(n log n), but a counting-based solution can reduce the main work to O(n) for the common case.

Define the problem precisely

Two strings are anagrams if they contain the same characters with the same counts, possibly in a different order.

Examples:

  • 'listen and silent are anagrams'
  • 'rail safety and fairy tales can be anagrams if spaces are ignored'
  • 'Listen and silent are anagrams only if case is normalized'

So before choosing an algorithm, define whether you are ignoring:

  • case
  • spaces
  • punctuation
  • Unicode normalization differences

That decision affects both correctness and preprocessing cost.

Sorting-based solution

The simplest widely known algorithm is:

  1. normalize both strings if needed
  2. sort both character sequences
  3. compare the results
python
1def are_anagrams_sort(a: str, b: str) -> bool:
2    return sorted(a) == sorted(b)
3
4print(are_anagrams_sort("listen", "silent"))

This is easy to read, but sorting costs O(n log n).

Counting-based O(n) approach

A more efficient approach counts character frequencies.

python
1from collections import Counter
2
3
4def are_anagrams_count(a: str, b: str) -> bool:
5    return Counter(a) == Counter(b)
6
7print(are_anagrams_count("listen", "silent"))

This is effectively linear in the length of the input strings, assuming hash operations are well-behaved.

The reasoning is simple:

  • if the strings have different lengths, they cannot be anagrams
  • if every character count matches, they are anagrams

Manual frequency-array version

If the alphabet is fixed and small, such as lowercase English letters only, you can use an array instead of a hash map.

python
1def are_anagrams_lowercase(a: str, b: str) -> bool:
2    if len(a) != len(b):
3        return False
4
5    counts = [0] * 26
6
7    for ch1, ch2 in zip(a, b):
8        counts[ord(ch1) - ord('a')] += 1
9        counts[ord(ch2) - ord('a')] -= 1
10
11    return all(count == 0 for count in counts)

This is also O(n) and uses constant extra space relative to the fixed alphabet size.

Why O(n) is the real target

You have to inspect every character at least once, so no correct algorithm can beat O(n) time in the general case. That means a counting method is asymptotically optimal for ordinary anagram checking.

So if the question is “minimum complexity,” the practical answer is:

  • 'O(n) time with character counting'
  • 'O(1) extra space for a fixed-size alphabet'
  • or O(k) extra space when counting over k distinct characters

Normalization often matters more than the core algorithm

Real anagram problems are often not about raw letters only. For example:

python
1import re
2from collections import Counter
3
4
5def normalize(text: str) -> str:
6    return re.sub(r'[^a-z0-9]', '', text.lower())
7
8
9def are_anagrams(text1: str, text2: str) -> bool:
10    a = normalize(text1)
11    b = normalize(text2)
12    return Counter(a) == Counter(b)

This kind of preprocessing is often necessary if phrases, punctuation, or case should be ignored.

Choosing the right solution

Use sorting when:

  • simplicity matters more than asymptotic optimality
  • input sizes are small
  • readability is the top priority

Use counting when:

  • performance matters
  • strings may be large
  • you want the asymptotically optimal check

Common Pitfalls

A common mistake is calling a permutation-based approach “the direct solution.” Generating permutations is computationally terrible for this problem.

Another mistake is forgetting to normalize case or spaces when the problem statement expects that.

A third mistake is assuming an O(1) space claim is universal when it really depends on a fixed alphabet assumption.

Summary

  • The optimal practical time complexity for anagram checking is O(n).
  • Counting character frequencies is better than sorting when performance matters.
  • Sorting gives a simple O(n log n) solution.
  • Real correctness often depends on normalization rules such as case and whitespace handling.
  • Avoid permutation-based solutions; they are drastically less efficient.

Course illustration
Course illustration

All Rights Reserved.