Does anyone have a good Proper Case algorithm?

Introduction

A good proper-case algorithm is not just "capitalize every word." Real title-casing rules need to handle small connector words, punctuation, apostrophes, hyphenated words, and acronyms. The best algorithm therefore depends on whether you want a simple house style or a linguistically polished title-casing rule set.

Define What "Proper Case" Means

People use the phrase loosely. It can mean at least two different things:

  • capitalize the first letter of every word
  • apply title-style capitalization rules such as lowercasing short articles and prepositions unless they begin the title

For example:

  • simple word capitalization: The Wind In The Willows
  • title-style capitalization: The Wind in the Willows

A useful algorithm starts by choosing which behavior you actually want.
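
For reference, the first behavior (simple word capitalization) fits in a few lines. This is only a minimal sketch; the name capitalize_words is an illustrative choice, and it deliberately ignores small words, acronyms, and punctuation.

```python
def capitalize_words(text: str) -> str:
    """Capitalize the first letter of every whitespace-separated word."""
    # Hypothetical helper for illustration: no small-word or acronym handling.
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split())


print(capitalize_words("the wind in the willows"))  # The Wind In The Willows
```

Anything beyond this, such as lowercasing "in" and "the", is title-style casing and needs an explicit rule set.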

A Practical Rule Set

A common title-style algorithm works like this:

  • capitalize the first and last word
  • lowercase common short words such as and, of, in, the, unless they are first or last
  • preserve acronyms that are already uppercase
  • capitalize both sides of a hyphenated word

That does not solve every language nuance, but it is good enough for many applications.

A Runnable Python Example

```python
SMALL_WORDS = {
    "a", "an", "and", "as", "at", "but", "by", "for",
    "in", "nor", "of", "on", "or", "per", "the", "to",
}


def proper_case(title: str) -> str:
    words = title.split()
    last = len(words) - 1
    result = []

    for index, word in enumerate(words):
        # Preserve tokens that are already all uppercase (likely acronyms).
        if word.isupper() and len(word) > 1:
            result.append(word)
            continue

        # Capitalize every segment of a hyphenated word, then move on so the
        # general rule below does not lowercase the later segments again.
        if "-" in word:
            parts = word.split("-")
            result.append("-".join(p[:1].upper() + p[1:].lower() for p in parts))
            continue

        lower = word.lower()
        # Small words stay lowercase unless they are the first or last word.
        if 0 < index < last and lower in SMALL_WORDS:
            result.append(lower)
        else:
            result.append(lower[:1].upper() + lower[1:])

    return " ".join(result)


print(proper_case("the wind in the willows"))
print(proper_case("an analysis of AI systems"))
print(proper_case("state-of-the-art design"))
```

Example output:

```text
The Wind in the Willows
An Analysis of AI Systems
State-Of-The-Art Design
```

That last example reveals a style choice: some house styles capitalize every hyphenated segment, while others do not. The algorithm should follow your chosen rulebook.
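
One way to keep that choice explicit is to isolate hyphen handling behind a flag. The sketch below assumes just two styles (capitalize every segment, or only the first); the function name case_hyphenated and the capitalize_all parameter are hypothetical, and real style guides may define further variants.

```python
def case_hyphenated(word: str, capitalize_all: bool = True) -> str:
    """Case a hyphenated word according to one of two assumed house styles."""
    parts = word.lower().split("-")
    if capitalize_all:
        # Style A: capitalize every segment.
        return "-".join(p[:1].upper() + p[1:] for p in parts)
    # Style B: capitalize only the first segment.
    first = parts[0][:1].upper() + parts[0][1:]
    return "-".join([first] + parts[1:])


print(case_hyphenated("state-of-the-art"))                        # State-Of-The-Art
print(case_hyphenated("state-of-the-art", capitalize_all=False))  # State-of-the-art
```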

Preserve What Should Stay Uppercase

One easy way to damage text is by lowercasing acronyms or brand names.

Examples:

  • API should usually stay API
  • NASA should not become Nasa
  • iOS should not be blindly changed to Ios

That means a production-grade proper-case function often needs an exceptions dictionary or a preserve-as-is rule for known tokens.
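
A minimal sketch of such an exceptions dictionary follows. The table PRESERVE and the helper case_word are hypothetical names, and the three entries are only the examples listed above; a real table would come from your own style guide or product list.

```python
# Hypothetical exceptions table: lowercase key -> preferred spelling.
PRESERVE = {"api": "API", "nasa": "NASA", "ios": "iOS"}


def case_word(word: str) -> str:
    """Return the preferred spelling for known tokens, else capitalize."""
    known = PRESERVE.get(word.lower())
    if known is not None:
        return known
    return word[:1].upper() + word[1:].lower()


print(" ".join(case_word(w) for w in "nasa ships ios api".split()))
# NASA Ships iOS API
```

Looking tokens up by their lowercase form means the exception still applies when the input arrives in the wrong case, which an "already uppercase" check alone cannot handle.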

Apostrophes and Other Punctuation

Words with apostrophes are another edge case. For example:

  • o'reilly
  • rock 'n' roll
  • don't stop

A naive split-and-capitalize approach may mishandle these. Depending on your input, you may need tokenization that treats punctuation more carefully than split() does.

If your text is user-facing and messy, regular expressions or a real text-processing pipeline may be better than a tiny utility function.
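
As a rough illustration of the regular-expression route, the sketch below treats letters joined by an apostrophe as a single token, so the letter after the apostrophe is not capitalized. The pattern and the names WORD and capitalize_tokens are assumptions made for this example; the pattern covers ASCII letters only. Python's built-in str.title() is shown for contrast, since it restarts capitalization after every non-letter.

```python
import re

# Assumed token pattern: runs of ASCII letters, optionally joined by
# apostrophes (straight ' or curly ’). Everything else is left untouched.
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def _capitalize(match: re.Match) -> str:
    token = match.group(0)
    return token[:1].upper() + token[1:].lower()


def capitalize_tokens(text: str) -> str:
    """Capitalize each matched token, keeping spacing and punctuation as-is."""
    return WORD.sub(_capitalize, text)


print(capitalize_tokens("don't stop"))  # Don't Stop
print("don't stop".title())             # Don'T Stop
```

This still does not produce O'Reilly from o'reilly (it yields O'reilly), so names like that remain a job for an exceptions list.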

Why There Is No Universally Perfect Algorithm

Title-casing conventions vary by:

  • publisher style guide
  • language
  • domain-specific branding
  • whether the text is a headline, title, or ordinary name

That is why the phrase "good proper case algorithm" always needs a little context. A clean house-style implementation is often better than chasing a mythical universal algorithm.

Common Pitfalls

The biggest mistake is capitalizing every word indiscriminately and calling it proper case. That usually ignores connector words and produces awkward titles.

Another mistake is lowercasing everything first and destroying acronyms, brand names, or intentional mixed-case tokens.

People also forget hyphenated words and apostrophes, which can make a simple algorithm look broken on realistic input.

Finally, avoid pretending the algorithm is language-neutral. Casing rules that work for English titles may be wrong for other languages or editorial styles.

Summary

  • Proper case can mean simple capitalization or fuller title-style casing; define the target behavior first.
  • A useful algorithm usually treats small connector words differently from major words.
  • Preserve acronyms and known mixed-case names instead of normalizing them blindly.
  • Hyphens, apostrophes, and punctuation are where naive implementations usually fail.
  • The best algorithm is often one that matches your style guide, not a universal one-size-fits-all rule.
