Does anyone have a good Proper Case algorithm?

Introduction

A good proper-case algorithm is not just "capitalize every word." Real title-casing rules need to handle small connector words, punctuation, apostrophes, hyphenated words, and acronyms. The best algorithm therefore depends on whether you want a simple house style or a linguistically polished title-casing rule set.

Define What "Proper Case" Means

People use the phrase loosely. It can mean at least two different things:

capitalize the first letter of every word
apply title-style capitalization rules such as lowercasing short articles and prepositions unless they begin the title

For example:

simple word capitalization: The Wind In The Willows
title-style capitalization: The Wind in the Willows

A useful algorithm starts by choosing which behavior you actually want.
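As a point of reference, here is a minimal sketch of the first behavior only. The helper name `capitalize_words` is an assumption made for illustration. Python's built-in `str.title()` looks like a shortcut, but it treats an apostrophe as a word boundary, so a hand-rolled version is shown beside it.

```python
def capitalize_words(text: str) -> str:
    """Capitalize the first letter of every whitespace-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split())


print(capitalize_words("the wind in the willows"))  # The Wind In The Willows
print("don't stop".title())                         # Don'T Stop
print(capitalize_words("don't stop"))               # Don't Stop
```

This covers simple word capitalization; title-style casing needs the extra rules described next.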

A Practical Rule Set

A common title-style algorithm works like this:

capitalize the first and last word
lowercase common short words such as and, of, in, the, unless they are first or last
preserve acronyms that are already uppercase
capitalize both sides of a hyphenated word

That does not solve every language nuance, but it is good enough for many applications.

A Runnable Python Example

```python
# Short connector words kept lowercase unless they are first or last.
SMALL_WORDS = {
    "a", "an", "and", "as", "at", "but", "by", "for",
    "in", "nor", "of", "on", "or", "per", "the", "to",
}


def proper_case(title: str) -> str:
    words = title.split()
    result = []

    for index, word in enumerate(words):
        lower = word.lower()

        # Preserve tokens that are already all uppercase, such as acronyms.
        if word.isupper() and len(word) > 1:
            result.append(word)
            continue

        # Capitalize every segment of a hyphenated word, then move on so the
        # generic branch below does not lowercase the later segments again.
        if "-" in word:
            parts = word.split("-")
            result.append("-".join(part[:1].upper() + part[1:].lower() for part in parts))
            continue

        is_first_or_last = index == 0 or index == len(words) - 1
        if not is_first_or_last and lower in SMALL_WORDS:
            result.append(lower)
        else:
            result.append(word[:1].upper() + word[1:].lower())

    return " ".join(result)


print(proper_case("the wind in the willows"))
print(proper_case("an analysis of AI systems"))
print(proper_case("state-of-the-art design"))
```

Example output:

```text
The Wind in the Willows
An Analysis of AI Systems
State-Of-The-Art Design
```

That last example reveals a style choice: some house styles capitalize every hyphenated segment, while others do not. The algorithm should follow your chosen rulebook.
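One way to make that choice explicit is a flag on the hyphen handling. The sketch below is an illustration, not a standard: the function name `case_hyphenated` and the parameter `capitalize_all_parts` are hypothetical, and the two branches stand for two possible house styles.

```python
def case_hyphenated(word: str, capitalize_all_parts: bool = True) -> str:
    """Case a hyphenated word under one of two hypothetical house styles."""
    parts = word.lower().split("-")
    if capitalize_all_parts:
        # Style A: every segment starts with a capital letter.
        return "-".join(part[:1].upper() + part[1:] for part in parts)
    # Style B: only the first segment is capitalized.
    return "-".join([parts[0][:1].upper() + parts[0][1:]] + parts[1:])


print(case_hyphenated("state-of-the-art"))                              # State-Of-The-Art
print(case_hyphenated("state-of-the-art", capitalize_all_parts=False))  # State-of-the-art
```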

Preserve What Should Stay Uppercase

One easy way to damage text is by lowercasing acronyms or brand names.

Examples:

API should usually stay API
NASA should not become Nasa
iOS should not be blindly changed to Ios

That means a production-grade proper-case function often needs an exceptions dictionary or a preserve-as-is rule for known tokens.
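A minimal sketch of such an exceptions dictionary follows. The table name `PRESERVE`, its three entries, and the helper names are assumptions for illustration; a real list would come from your own domain.

```python
# Hypothetical exceptions table: lowercase key -> exact spelling to emit.
PRESERVE = {"api": "API", "nasa": "NASA", "ios": "iOS"}


def case_word(word: str) -> str:
    """Return the preserved spelling if the word is known, else capitalize it."""
    known = PRESERVE.get(word.lower())
    if known is not None:
        return known
    return word[:1].upper() + word[1:].lower()


def case_words(text: str) -> str:
    return " ".join(case_word(word) for word in text.split())


print(case_words("nasa releases ios api"))  # NASA Releases iOS API
```

Because the lookup uses the whole whitespace-separated token, a word with attached punctuation such as `api,` would not match; that is one reason tokenization matters.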

Apostrophes and Other Punctuation

Words with apostrophes are another edge case. For example:

o'reilly
rock 'n' roll
don't stop

A naive split-and-capitalize approach may mishandle these. Depending on your input, you may need tokenization that treats punctuation more carefully than split() does.

If your text is user-facing and messy, regular expressions or a real text-processing pipeline may be better than a tiny utility function.
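As one possible starting point, the sketch below uses the standard `re` module to treat a word with an internal apostrophe as a single token. The pattern is an assumption: it only covers ASCII letters and the straight apostrophe, and the function name `capitalize_tokens` is hypothetical.

```python
import re

# A word is a run of letters, optionally followed by apostrophe-plus-letters
# groups, so "don't" is one token rather than "don" and "t".
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def capitalize_tokens(text: str) -> str:
    """Capitalize each matched word, leaving spacing and punctuation as is."""
    return WORD.sub(lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(), text)


print(capitalize_tokens("don't stop"))       # Don't Stop
print(capitalize_tokens("o'reilly, media"))  # O'reilly, Media
print(capitalize_tokens("rock 'n' roll"))    # Rock 'N' Roll
```

The last two results show the limits of pattern matching alone: getting O'Reilly or a lowercase 'n' still takes an exceptions list or extra rules.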

Why There Is No Universally Perfect Algorithm

Title-casing conventions vary by:

publisher style guide
language
domain-specific branding
whether the text is a headline, title, or ordinary name

That is why the phrase "good proper case algorithm" always needs a little context. A clean house-style implementation is often better than chasing a mythical universal algorithm.
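One way to keep a house style explicit is to pass the rule set in rather than hard-coding it. The sketch below assumes a style is fully described by its small-word list, which is a simplification; `make_title_caser` and both example styles are hypothetical.

```python
from typing import Callable, FrozenSet


def make_title_caser(small_words: FrozenSet[str]) -> Callable[[str], str]:
    """Build a title-casing function for one house style's small-word list."""

    def title_case(text: str) -> str:
        words = text.split()
        last = len(words) - 1
        cased = []
        for index, word in enumerate(words):
            if 0 < index < last and word.lower() in small_words:
                cased.append(word.lower())
            else:
                # Uppercase only the first letter; the rest is left as typed.
                cased.append(word[:1].upper() + word[1:])
        return " ".join(cased)

    return title_case


# Two hypothetical house styles that disagree about "with".
style_a = make_title_caser(frozenset({"a", "an", "and", "of", "the", "with"}))
style_b = make_title_caser(frozenset({"a", "an", "and", "of", "the"}))

print(style_a("coding with the masters"))  # Coding with the Masters
print(style_b("coding with the masters"))  # Coding With the Masters
```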

Common Pitfalls

The biggest mistake is capitalizing every word indiscriminately and calling it proper case. That usually ignores connector words and produces awkward titles.

Another mistake is lowercasing everything first and destroying acronyms, brand names, or intentional mixed-case tokens.

People also forget hyphenated words and apostrophes, which can make a simple algorithm look broken on realistic input.

Finally, avoid pretending the algorithm is language-neutral. Casing rules that work for English titles may be wrong for other languages or editorial styles.

Summary

Proper case can mean simple capitalization or fuller title-style casing; define the target behavior first.
A useful algorithm usually treats small connector words differently from major words.
Preserve acronyms and known mixed-case names instead of normalizing them blindly.
Hyphens, apostrophes, and punctuation are where naive implementations usually fail.
The best algorithm is often one that matches your style guide, not a universal one-size-fits-all rule.