Anagram algorithm with minimum complexity

anagram algorithm

algorithm optimization

computational complexity

string manipulation

efficient algorithms

Anagram algorithm with minimum complexity

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

To check whether two strings are anagrams, the minimum practical time complexity is usually linear in the length of the strings, assuming you can count character frequencies efficiently. A sorting-based solution works in O(n log n), but a counting-based solution can reduce the main work to O(n) for the common case.

Define the problem precisely

Two strings are anagrams if they contain the same characters with the same counts, possibly in a different order.

Examples:

'listen and silent are anagrams'
'rail safety and fairy tales can be anagrams if spaces are ignored'
'Listen and silent are anagrams only if case is normalized'

So before choosing an algorithm, define whether you are ignoring:

case
spaces
punctuation
Unicode normalization differences

That decision affects both correctness and preprocessing cost.

Sorting-based solution

The simplest widely known algorithm is:

normalize both strings if needed
sort both character sequences
compare the results

python

1def are_anagrams_sort(a: str, b: str) -> bool:
2    return sorted(a) == sorted(b)
3
4print(are_anagrams_sort("listen", "silent"))

This is easy to read, but sorting costs O(n log n).

Counting-based `O(n)` approach

A more efficient approach counts character frequencies.

python

1from collections import Counter
2
3
4def are_anagrams_count(a: str, b: str) -> bool:
5    return Counter(a) == Counter(b)
6
7print(are_anagrams_count("listen", "silent"))

This is effectively linear in the length of the input strings, assuming hash operations are well-behaved.

The reasoning is simple:

if the strings have different lengths, they cannot be anagrams
if every character count matches, they are anagrams

Manual frequency-array version

If the alphabet is fixed and small, such as lowercase English letters only, you can use an array instead of a hash map.

python

1def are_anagrams_lowercase(a: str, b: str) -> bool:
2    if len(a) != len(b):
3        return False
4
5    counts = [0] * 26
6
7    for ch1, ch2 in zip(a, b):
8        counts[ord(ch1) - ord('a')] += 1
9        counts[ord(ch2) - ord('a')] -= 1
10
11    return all(count == 0 for count in counts)

This is also O(n) and uses constant extra space relative to the fixed alphabet size.

Why `O(n)` is the real target

You have to inspect every character at least once, so no correct algorithm can beat O(n) time in the general case. That means a counting method is asymptotically optimal for ordinary anagram checking.

So if the question is “minimum complexity,” the practical answer is:

'O(n) time with character counting'
'O(1) extra space for a fixed-size alphabet'
or O(k) extra space when counting over k distinct characters

Normalization often matters more than the core algorithm

Real anagram problems are often not about raw letters only. For example:

python

1import re
2from collections import Counter
3
4
5def normalize(text: str) -> str:
6    return re.sub(r'[^a-z0-9]', '', text.lower())
7
8
9def are_anagrams(text1: str, text2: str) -> bool:
10    a = normalize(text1)
11    b = normalize(text2)
12    return Counter(a) == Counter(b)

This kind of preprocessing is often necessary if phrases, punctuation, or case should be ignored.

Choosing the right solution

Use sorting when:

simplicity matters more than asymptotic optimality
input sizes are small
readability is the top priority

Use counting when:

performance matters
strings may be large
you want the asymptotically optimal check

Common Pitfalls

A common mistake is calling a permutation-based approach “the direct solution.” Generating permutations is computationally terrible for this problem.

Another mistake is forgetting to normalize case or spaces when the problem statement expects that.

A third mistake is assuming an O(1) space claim is universal when it really depends on a fixed alphabet assumption.

Summary

The optimal practical time complexity for anagram checking is O(n).
Counting character frequencies is better than sorting when performance matters.
Sorting gives a simple O(n log n) solution.
Real correctness often depends on normalization rules such as case and whitespace handling.
Avoid permutation-based solutions; they are drastically less efficient.

Anagram algorithm with minimum complexity

Master System Design with Codemia

Introduction

Define the problem precisely

Sorting-based solution

Counting-based O(n) approach

Manual frequency-array version

Why O(n) is the real target

Normalization often matters more than the core algorithm

Choosing the right solution

Common Pitfalls

Summary

Counting-based `O(n)` approach

Why `O(n)` is the real target