Average of two strings in alphabetical/lexicographical order
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
The "average" of two strings is not a standard operation unless you first define the alphabet and the representation. Once that is fixed, the usual interpretation is the lexicographic midpoint: convert both strings into numbers in a positional alphabet, average those numbers, and convert the result back to a string of the same length.
Why the Problem Needs Constraints
For numbers, averaging is obvious. For strings, it is only meaningful if you answer questions like:
- which alphabet is allowed
- are uppercase and lowercase distinct
- do all strings have the same length
- do you want a midpoint or just any string between them
The cleanest version of the problem is:
- lowercase
athroughz - both strings have the same length
- find the midpoint under lexicographic order
With those rules, lexicographic order behaves like base-26 number order.
Convert Strings to Base-26 Numbers
Treat a as 0, b as 1, and so on until z as 25.
For example, with length three:
- '
aaais0' - '
aabis1' - '
abais26'
That means you can map a string to an integer, average the integers, and map the result back.
This computes the floor of the numerical midpoint.
Example Walkthrough
Take the strings aaaa and aaaz.
The four-character string space behaves like numbers from 0 to 26^4 - 1. Since:
- '
aaaamaps to0' - '
aaazmaps to25'
their midpoint is 12, which maps back to aaam.
That is the lexicographic midpoint under the chosen encoding.
What If the Lengths Differ
This is where the problem gets messy. Plain lexicographic order over variable-length strings is not a simple fixed-width number system. For example:
- '
acomes beforeaa' - '
aacomes beforeab'
To define an average cleanly, you need an extra rule such as:
- pad both strings to a fixed maximum length
- use a sentinel character
- restrict the problem to equal-length strings
In most programming problems, the last option is the sensible one because it keeps the mapping unambiguous.
Any String Between Two Strings Is Easier
Sometimes you do not need the exact midpoint. You just need a string strictly between two given strings. That can be easier than computing a full numeric midpoint.
For example, between cat and caz, a valid intermediate string is cau.
But if the title says "average," the midpoint interpretation is the most defensible one because it gives a deterministic answer instead of an arbitrary string that merely lies in the interval.
Integer Division Detail
When the numeric interval length is odd, the midpoint is not exact in integer space. Then (left + right) // 2 returns the lower midpoint.
That is usually fine, but it means the result may equal the left bound if the two strings are adjacent. For example:
- '
aaa' - '
aab'
The lower midpoint is still aaa.
So if your actual requirement is "strictly between," midpoint alone may not guarantee that such a string exists. Sometimes no equal-length string lies strictly between the two inputs.
Complexity
For a string of length n, the conversion and reconstruction both take O(n) time. That is efficient enough for typical use, and the algorithm is conceptually simple once the alphabet and length rules are fixed.
Common Pitfalls
- Trying to average strings character by character. Lexicographic order is positional, so independent per-character averaging does not preserve the ordering correctly.
- Ignoring string-length differences and assuming variable-length lexicographic order behaves like fixed-width base-26 numbers.
- Forgetting to define the alphabet explicitly. The answer changes if uppercase letters, digits, or locale-specific characters are allowed.
- Expecting the midpoint to be strictly between the inputs in every case. Adjacent strings may have no strict interior point.
- Solving "find any string between" with a midpoint algorithm when the problem only needed an easier separator string.
Summary
- An average of two strings only makes sense after you define alphabet and length rules.
- For equal-length lowercase strings, lexicographic midpoint can be computed by base-26 conversion.
- Convert both strings to integers, average the integers, and convert back.
- Variable-length strings need extra rules and are not as cleanly handled.
- If you only need a string between two bounds, that is a different and often easier problem than computing the true midpoint.

