Count the number of occurrences of a string in a VARCHAR field?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Counting how many times a substring appears inside a VARCHAR column is a common reporting task. The cleanest SQL solution depends on whether you want simple substring counting, case-insensitive matching, or whole-word matching.
The Standard Length-Difference Trick
A portable way to count non-overlapping substring occurrences is to compare the string length before and after removing the target text.
If description is Red apple and green apple, then removing apple shortens the string by ten characters. Dividing by the length of apple, which is five, gives a count of two.
This technique is simple and works well when:
- the target is a literal string
- overlapping matches do not matter
- your database supports
LENGTHandREPLACE
Why It Works
Suppose the original value is banana apple apple.
- original length includes both copies of
apple - '
REPLACE(description, 'apple', '')removes both copies' - the difference in length equals
count * length(target)
That is why division gives the number of occurrences.
A Runnable Example
Expected result:
Case-Insensitive Counting
Many databases treat REPLACE as case-sensitive. If you want Apple, APPLE, and apple to count the same, normalize both sides.
This works because both the source text and the search text are converted to lower case before replacement.
Whole-Word Matching
The length-difference trick counts substrings, not words. That means searching for app would also match inside apple. If you need word boundaries, the better choice is a regular expression function, if your database supports one.
For PostgreSQL, for example, you could use a regex-based approach with regexp_matches and count the returned rows. Database-specific syntax varies, so check the functions available in your engine.
Important Limitation: Overlapping Matches
The length-difference approach does not count overlapping occurrences. For example, counting ana in banana should arguably produce two overlapping matches. The replace method returns one because the first replacement consumes characters needed by the second overlap.
If overlapping matches matter, you need a regex engine or iterative string logic specific to your SQL dialect.
Performance Considerations
String functions run per row, so they can become expensive on large text columns. If this query is used often in analytics, consider whether the count should be precomputed, materialized, or handled in a text-search system instead of being recalculated repeatedly.
Also note that applying functions such as LOWER to the column generally prevents normal index use on that expression unless you created a function-based index.
Common Pitfalls
The most common mistake is forgetting the division by LENGTH(target). Without that step, you get removed character count, not occurrence count.
Another mistake is assuming the query counts overlapping matches. It does not.
Case sensitivity is another easy source of wrong answers. If the data has inconsistent capitalization, normalize it explicitly.
Summary
- A standard SQL technique is
LENGTH(col) - LENGTH(REPLACE(col, target, '')), divided byLENGTH(target). - This counts non-overlapping literal substring matches.
- Use
LOWERif you need case-insensitive matching. - Use regex features when you need whole-word or overlapping-match logic.
- For large datasets, watch the cost of per-row string processing.

