Lexicographical Order
Alphabetical Sorting
String Manipulation
Average of Strings
Programming Techniques

Average of two strings in alphabetical/lexicographical order

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

The "average" of two strings is not a standard operation unless you first define the alphabet and the representation. Once that is fixed, the usual interpretation is the lexicographic midpoint: convert both strings into numbers in a positional alphabet, average those numbers, and convert the result back to a string of the same length.

Why the Problem Needs Constraints

For numbers, averaging is obvious. For strings, it is only meaningful if you answer questions like:

  • which alphabet is allowed
  • are uppercase and lowercase distinct
  • do all strings have the same length
  • do you want a midpoint or just any string between them

The cleanest version of the problem is:

  • lowercase a through z
  • both strings have the same length
  • find the midpoint under lexicographic order

With those rules, lexicographic order behaves like base-26 number order.

Convert Strings to Base-26 Numbers

Treat a as 0, b as 1, and so on until z as 25.

For example, with length three:

  • 'aaa is 0'
  • 'aab is 1'
  • 'aba is 26'

That means you can map a string to an integer, average the integers, and map the result back.

python
1def to_int(s):
2    value = 0
3    for ch in s:
4        value = value * 26 + (ord(ch) - ord("a"))
5    return value
6
7def from_int(value, length):
8    chars = ["a"] * length
9    for i in range(length - 1, -1, -1):
10        chars[i] = chr(ord("a") + value % 26)
11        value //= 26
12    return "".join(chars)
13
14def midpoint(a, b):
15    assert len(a) == len(b)
16    left = to_int(a)
17    right = to_int(b)
18    assert left <= right
19    return from_int((left + right) // 2, len(a))
20
21print(midpoint("aaaa", "aaaz"))
22print(midpoint("abcz", "abdf"))

This computes the floor of the numerical midpoint.

Example Walkthrough

Take the strings aaaa and aaaz.

The four-character string space behaves like numbers from 0 to 26^4 - 1. Since:

  • 'aaaa maps to 0'
  • 'aaaz maps to 25'

their midpoint is 12, which maps back to aaam.

That is the lexicographic midpoint under the chosen encoding.

What If the Lengths Differ

This is where the problem gets messy. Plain lexicographic order over variable-length strings is not a simple fixed-width number system. For example:

  • 'a comes before aa'
  • 'aa comes before ab'

To define an average cleanly, you need an extra rule such as:

  • pad both strings to a fixed maximum length
  • use a sentinel character
  • restrict the problem to equal-length strings

In most programming problems, the last option is the sensible one because it keeps the mapping unambiguous.

Any String Between Two Strings Is Easier

Sometimes you do not need the exact midpoint. You just need a string strictly between two given strings. That can be easier than computing a full numeric midpoint.

For example, between cat and caz, a valid intermediate string is cau.

But if the title says "average," the midpoint interpretation is the most defensible one because it gives a deterministic answer instead of an arbitrary string that merely lies in the interval.

Integer Division Detail

When the numeric interval length is odd, the midpoint is not exact in integer space. Then (left + right) // 2 returns the lower midpoint.

That is usually fine, but it means the result may equal the left bound if the two strings are adjacent. For example:

  • 'aaa'
  • 'aab'

The lower midpoint is still aaa.

So if your actual requirement is "strictly between," midpoint alone may not guarantee that such a string exists. Sometimes no equal-length string lies strictly between the two inputs.

Complexity

For a string of length n, the conversion and reconstruction both take O(n) time. That is efficient enough for typical use, and the algorithm is conceptually simple once the alphabet and length rules are fixed.

Common Pitfalls

  • Trying to average strings character by character. Lexicographic order is positional, so independent per-character averaging does not preserve the ordering correctly.
  • Ignoring string-length differences and assuming variable-length lexicographic order behaves like fixed-width base-26 numbers.
  • Forgetting to define the alphabet explicitly. The answer changes if uppercase letters, digits, or locale-specific characters are allowed.
  • Expecting the midpoint to be strictly between the inputs in every case. Adjacent strings may have no strict interior point.
  • Solving "find any string between" with a midpoint algorithm when the problem only needed an easier separator string.

Summary

  • An average of two strings only makes sense after you define alphabet and length rules.
  • For equal-length lowercase strings, lexicographic midpoint can be computed by base-26 conversion.
  • Convert both strings to integers, average the integers, and convert back.
  • Variable-length strings need extra rules and are not as cleanly handled.
  • If you only need a string between two bounds, that is a different and often easier problem than computing the true midpoint.

Course illustration
Course illustration

All Rights Reserved.