Does anyone have a good Proper Case algorithm?
Introduction
A good proper-case algorithm is not just "capitalize every word." Real title-casing rules need to handle small connector words, punctuation, apostrophes, hyphenated words, and acronyms. The best algorithm therefore depends on whether you want a simple house style or a linguistically polished title-casing rule set.
Define What "Proper Case" Means
People use the phrase loosely. It can mean at least two different things:
- capitalize the first letter of every word
- apply title-style capitalization rules such as lowercasing short articles and prepositions unless they begin the title
For example:
- simple word capitalization: The Wind In The Willows
- title-style capitalization: The Wind in the Willows
A useful algorithm starts by choosing which behavior you actually want.
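As a baseline, the first behavior (capitalize every word) can be sketched in a few lines. The function name `capitalize_words` is made up for illustration, and the sketch assumes words are separated by whitespace:

```python
def capitalize_words(text: str) -> str:
    """Uppercase the first character of each whitespace-separated word."""
    # The rest of each word is left untouched, so an acronym such as
    # "NASA" is not lowercased along the way.
    return " ".join(word[:1].upper() + word[1:] for word in text.split())


print(capitalize_words("the wind in the willows"))  # The Wind In The Willows
```

Note that this collapses runs of whitespace into single spaces, which may or may not be acceptable for your input.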
A Practical Rule Set
A common title-style algorithm works like this:
- capitalize the first and last word
- lowercase common short words such as "and", "of", "in", and "the", unless they are first or last
- preserve acronyms that are already uppercase
- capitalize both sides of a hyphenated word
That does not solve every language nuance, but it is good enough for many applications.
A Runnable Python Example
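The following is a minimal sketch of the rule set above, not a definitive implementation. The names `SMALL_WORDS`, `_cap`, and `title_case` are illustrative, the small-word list is an assumed house style, and the code assumes whitespace-separated English text without surrounding punctuation:

```python
# Assumed house-style list of short connector words; adjust to your style guide.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "of", "on", "or", "the", "to"}


def _cap(word: str) -> str:
    """Capitalize one word, leaving multi-letter all-uppercase tokens alone."""
    if len(word) > 1 and word.isupper():
        return word  # treated as an acronym, e.g. NASA
    return word[:1].upper() + word[1:].lower()


def title_case(text: str) -> str:
    words = text.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        if 0 < i < last and word.lower() in SMALL_WORDS:
            # Small connector word that is neither first nor last.
            result.append(word.lower())
        else:
            # Capitalize each side of a hyphenated word.
            result.append("-".join(_cap(part) for part in word.split("-")))
    return " ".join(result)


print(title_case("the wind in the willows"))       # The Wind in the Willows
print(title_case("the history of NASA missions"))  # The History of NASA Missions
print(title_case("what is it for"))                # What Is It For
print(title_case("state-of-the-art design"))       # State-Of-The-Art Design
```

One limitation of this sketch: input that is entirely uppercase is treated as a run of acronyms and comes back unchanged.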
Hyphenated words expose a style choice: for an input such as "state-of-the-art design", some house styles capitalize every hyphenated segment ("State-Of-The-Art Design"), while others do not ("State-of-the-Art Design"). The algorithm should follow your chosen rulebook.
Preserve What Should Stay Uppercase
One easy way to damage text is by lowercasing acronyms or brand names.
Examples:
- "API" should usually stay "API"
- "NASA" should not become "Nasa"
- "iOS" should not be blindly changed to "Ios"
That means a production-grade proper-case function often needs an exceptions dictionary or a preserve-as-is rule for known tokens.
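One way to sketch an exceptions dictionary is a lookup keyed by the lowercased token. The table `PRESERVE` and the function names here are hypothetical, and the entries are only the three examples above:

```python
# Hypothetical exceptions table: lowercase key -> canonical spelling.
PRESERVE = {"api": "API", "nasa": "NASA", "ios": "iOS"}


def cap_with_exceptions(word: str) -> str:
    """Return the canonical spelling for known tokens, else capitalize."""
    canonical = PRESERVE.get(word.lower())
    if canonical is not None:
        return canonical
    return word[:1].upper() + word[1:].lower()


def proper_case(text: str) -> str:
    return " ".join(cap_with_exceptions(w) for w in text.split())


print(proper_case("nasa releases ios api"))  # NASA Releases iOS API
```

Because the lookup uses the whole whitespace-separated token, a word with attached punctuation such as "api," would miss the table; stripping or tokenizing punctuation first would be needed for messy input.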
Apostrophes and Other Punctuation
Words with apostrophes are another edge case. For example:
- "o'reilly"
- "rock 'n' roll"
- "don't stop"
A naive split-and-capitalize approach may mishandle these. Depending on your input, you may need tokenization that treats punctuation more carefully than split() does.
If your text is user-facing and messy, regular expressions or a real text-processing pipeline may be better than a tiny utility function.
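To illustrate the difference tokenization makes, the sketch below compares Python's built-in `str.title()` with a regular expression that keeps internal apostrophes inside a word. The pattern `WORD` and the function `capitalize_tokens` are illustrative, and the pattern assumes ASCII letters and the plain `'` apostrophe only:

```python
import re

# A word is a run of ASCII letters, optionally with internal apostrophes
# (don't, o'reilly). Typographic apostrophes are not handled here.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def capitalize_tokens(text: str) -> str:
    # Uppercase only the first letter of each matched word; spacing and
    # punctuation outside the matches are left as they are.
    return WORD.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:], text)


print("don't stop".title())                # Don'T Stop
print(capitalize_tokens("don't stop"))     # Don't Stop
print(capitalize_tokens("o'reilly"))       # O'reilly
print(capitalize_tokens("rock 'n' roll"))  # Rock 'N' Roll
```

The last two results show the limits of pattern matching alone: getting "O'Reilly" or a lowercase 'n' would still take name-specific rules or an exceptions list.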
Why There Is No Universally Perfect Algorithm
Title-casing conventions vary by:
- publisher style guide
- language
- domain-specific branding
- whether the text is a headline, title, or ordinary name
That is why the phrase "good proper case algorithm" always needs a little context. A clean house-style implementation is often better than chasing a mythical universal algorithm.
Common Pitfalls
The biggest mistake is capitalizing every word indiscriminately and calling it proper case. That usually ignores connector words and produces awkward titles.
Another mistake is lowercasing everything first and destroying acronyms, brand names, or intentional mixed-case tokens.
People also forget hyphenated words and apostrophes, which can make a simple algorithm look broken on realistic input.
Finally, avoid pretending the algorithm is language-neutral. Casing rules that work for English titles may be wrong for other languages or editorial styles.
Summary
- Proper case can mean simple capitalization or fuller title-style casing; define the target behavior first.
- A useful algorithm usually treats small connector words differently from major words.
- Preserve acronyms and known mixed-case names instead of normalizing them blindly.
- Hyphens, apostrophes, and punctuation are where naive implementations usually fail.
- The best algorithm is often one that matches your style guide, not a universal one-size-fits-all rule.

