Choosing a multiplier for a string hash function

hash function

string hashing

programming

algorithm design

computer science

Choosing a multiplier for a string hash function

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

If you are building a simple polynomial-style string hash, the multiplier controls how strongly earlier characters influence the final value. There is no single perfect constant, but good choices are usually odd numbers with decent mixing properties, and the best multiplier depends on the rest of the hash design, especially the modulus or overflow behavior.

The Role Of The Multiplier

A common string hash uses a recurrence like this:

python

1def hash_string(text, multiplier=31, modulus=1_000_000_007):
2    h = 0
3    for ch in text:
4        h = (h * multiplier + ord(ch)) % modulus
5    return h
6
7
8print(hash_string("hello"))

Each new character shifts the influence of the previous characters by multiplying the accumulated hash before adding the next code point.

If the multiplier is too small or poorly matched to the modulus, different strings can collapse into similar patterns and cause more collisions.

Why Constants Like 31 And 33 Show Up So Often

Certain multipliers became popular because they worked reasonably well in practice and were cheap to compute on older hardware.

Examples:

'31 in Java's classic string hash'
'33 in the djb2 family'
'131 in some BKDR-style hashes'

These constants are not magic. They are simply serviceable tradeoffs that produce acceptable mixing for many workloads.

Historically, small odd numbers were attractive because they avoided trivial even-number patterns and could sometimes be optimized efficiently. For example, multiplying by 31 can be rewritten as a shift and subtraction in low-level code, though modern compilers usually handle this for you.

The Multiplier Is Not The Whole Story

A strong hash depends on more than one constant.

You also need to consider:

whether arithmetic wraps naturally or uses a modulus
the expected character set
the table resizing strategy
whether a final mixing step is applied

For example, if your hash table size is a power of two, using an even multiplier is especially poor because low bits can lose variation quickly. Odd multipliers are usually safer in that setting.

If you are using a large prime modulus for rolling hashes, the multiplier is often called the base. In that case, many developers choose a base larger than the alphabet size and smaller than the modulus, then verify collision behavior empirically for the target workload.

Compare A Few Multipliers

A quick experiment can show how the constant affects bucket distribution.

python

1from collections import Counter
2
3
4def hash_string(text, multiplier, buckets):
5    h = 0
6    for ch in text:
7        h = (h * multiplier + ord(ch)) % buckets
8    return h
9
10
11words = ["cat", "cot", "cut", "cart", "coat", "cute", "at", "ta", "tac"]
12
13for multiplier in (3, 31, 33, 131):
14    counts = Counter(hash_string(word, multiplier, 16) for word in words)
15    print(multiplier, sorted(counts.items()))

This does not prove one multiplier is universally best, but it illustrates the real point: the constant changes how strings cluster into buckets, especially when the bucket count is small.

Practical Selection Rules

If you are implementing a basic non-cryptographic string hash, these rules are usually good enough:

Prefer an odd multiplier.
Avoid tiny values such as 1 or 2.
Match the choice to the modulus or bucket strategy.
Test on realistic input rather than trusting folklore alone.

If you are implementing a rolling hash for substring algorithms, choose:

a large modulus, often prime
a base that is not too small
sometimes multiple independent hashes if collision risk matters

For hash tables in production systems, it is often better to use a proven existing hash rather than hand-tune one multiplier in isolation.

When Not To Design Your Own Hash

If the goal is general-purpose key hashing for a production data structure, you usually should not invent a custom string hash unless you have a specific reason.

Well-tested implementations already account for:

adversarial collision patterns
platform-specific integer behavior
final mixing quality
performance tradeoffs on modern CPUs

Choosing a multiplier is a useful algorithm exercise, but it is not a substitute for a mature hashing strategy when correctness or security matters.

Common Pitfalls

The biggest mistake is assuming the multiplier alone guarantees a good hash. Poor modulus choice or poor bucket sizing can ruin the distribution even with a famous constant.

Another mistake is copying constants such as 31 or 33 without understanding the arithmetic model. A multiplier that behaves reasonably with one modulus or machine-word overflow pattern may behave differently in another setup.

People also underestimate the effect of input distribution. A multiplier that looks fine on random strings may perform badly on short identifiers, similar prefixes, or structured keys.

Finally, remember that simple polynomial hashes are not cryptographic. A good multiplier reduces accidental collisions, not deliberate attacks.

Summary

The multiplier in a string hash controls how earlier characters influence later positions.
Good choices are usually odd constants such as 31, 33, or 131, but none is universally best.
The right constant depends on the modulus, overflow behavior, and input distribution.
Measure collision behavior on realistic data instead of relying only on tradition.
For production systems, a proven hash function is often better than tuning one multiplier by hand.