SQL
string manipulation
VARCHAR
database query
text occurrences

Count the number of occurrences of a string in a VARCHAR field?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Counting how many times a substring appears inside a VARCHAR column is a common reporting task. The cleanest SQL solution depends on whether you want simple substring counting, case-insensitive matching, or whole-word matching.

The Standard Length-Difference Trick

A portable way to count non-overlapping substring occurrences is to compare the string length before and after removing the target text.

sql
1SELECT
2    description,
3    (LENGTH(description) - LENGTH(REPLACE(description, 'apple', ''))) / LENGTH('apple') AS apple_count
4FROM products;

If description is Red apple and green apple, then removing apple shortens the string by ten characters. Dividing by the length of apple, which is five, gives a count of two.

This technique is simple and works well when:

  • the target is a literal string
  • overlapping matches do not matter
  • your database supports LENGTH and REPLACE

Why It Works

Suppose the original value is banana apple apple.

  • original length includes both copies of apple
  • 'REPLACE(description, 'apple', '') removes both copies'
  • the difference in length equals count * length(target)

That is why division gives the number of occurrences.

A Runnable Example

sql
1CREATE TABLE products (
2    id INT PRIMARY KEY,
3    description VARCHAR(100)
4);
5
6INSERT INTO products (id, description) VALUES
7(1, 'Red apple and green apple'),
8(2, 'Banana, apple, and orange'),
9(3, 'Grapes and apple pie'),
10(4, 'No fruit here');
11
12SELECT
13    id,
14    description,
15    (LENGTH(description) - LENGTH(REPLACE(description, 'apple', ''))) / LENGTH('apple') AS apple_count
16FROM products;

Expected result:

text
11 | Red apple and green apple | 2
22 | Banana, apple, and orange | 1
33 | Grapes and apple pie      | 1
44 | No fruit here             | 0

Case-Insensitive Counting

Many databases treat REPLACE as case-sensitive. If you want Apple, APPLE, and apple to count the same, normalize both sides.

sql
1SELECT
2    id,
3    (LENGTH(LOWER(description)) - LENGTH(REPLACE(LOWER(description), 'apple', ''))) / LENGTH('apple') AS apple_count
4FROM products;

This works because both the source text and the search text are converted to lower case before replacement.

Whole-Word Matching

The length-difference trick counts substrings, not words. That means searching for app would also match inside apple. If you need word boundaries, the better choice is a regular expression function, if your database supports one.

For PostgreSQL, for example, you could use a regex-based approach with regexp_matches and count the returned rows. Database-specific syntax varies, so check the functions available in your engine.

Important Limitation: Overlapping Matches

The length-difference approach does not count overlapping occurrences. For example, counting ana in banana should arguably produce two overlapping matches. The replace method returns one because the first replacement consumes characters needed by the second overlap.

If overlapping matches matter, you need a regex engine or iterative string logic specific to your SQL dialect.

Performance Considerations

String functions run per row, so they can become expensive on large text columns. If this query is used often in analytics, consider whether the count should be precomputed, materialized, or handled in a text-search system instead of being recalculated repeatedly.

Also note that applying functions such as LOWER to the column generally prevents normal index use on that expression unless you created a function-based index.

Common Pitfalls

The most common mistake is forgetting the division by LENGTH(target). Without that step, you get removed character count, not occurrence count.

Another mistake is assuming the query counts overlapping matches. It does not.

Case sensitivity is another easy source of wrong answers. If the data has inconsistent capitalization, normalize it explicitly.

Summary

  • A standard SQL technique is LENGTH(col) - LENGTH(REPLACE(col, target, '')), divided by LENGTH(target).
  • This counts non-overlapping literal substring matches.
  • Use LOWER if you need case-insensitive matching.
  • Use regex features when you need whole-word or overlapping-match logic.
  • For large datasets, watch the cost of per-row string processing.

Course illustration
Course illustration

All Rights Reserved.