Redshift. Convert comma delimited values into rows

Astronomy

Cosmology

Galaxy Expansion

Universe

Physics

Redshift. Convert comma delimited values into rows

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Despite the mistaken tags, this article’s title is clearly about Amazon Redshift string processing: turning a comma-delimited field into one row per value. Redshift does not give you a magical one-call “split this string into rows” operator in the same way some other SQL systems do, so the usual solution is to combine SPLIT_PART with a generated sequence of positions.

Example Problem

Suppose you have a table like this:

sql

1create table orders (
2    order_id int,
3    item_codes varchar(100)
4);
5
6insert into orders values
7(1, 'A1,B2,C3'),
8(2, 'X9'),
9(3, 'K1,K2');

You want the result:

text

Core Idea: Split by Position

Redshift’s SPLIT_PART(string, delimiter, position) returns one field at a given position. So if you can generate positions 1, 2, 3, ..., you can expand a delimited string into rows.

Count How Many Parts Exist

A simple way to estimate the number of comma-separated pieces is to count commas and add one.

sql

select order_id,
       regexp_count(item_codes, ',') + 1 as parts
from orders;

This works for simple well-formed comma-delimited data.

Use a Numbers CTE

Then join each row against a small numbers table or CTE.

sql

1with numbers as (
2    select 1 as n
3    union all select 2
4    union all select 3
5    union all select 4
6    union all select 5
7)
8select
9    o.order_id,
10    split_part(o.item_codes, ',', n.n) as item_code
11from orders o
12join numbers n
13  on n.n <= regexp_count(o.item_codes, ',') + 1
14order by o.order_id, n.n;

This is the standard pattern: generate candidate positions and use SPLIT_PART for each valid one.

Make the Output Cleaner

If your source data may contain spaces such as A1, B2, C3, trim the extracted value.

sql

1with numbers as (
2    select 1 as n
3    union all select 2
4    union all select 3
5    union all select 4
6    union all select 5
7)
8select
9    o.order_id,
10    trim(split_part(o.item_codes, ',', n.n)) as item_code
11from orders o
12join numbers n
13  on n.n <= regexp_count(o.item_codes, ',') + 1
14order by o.order_id, n.n;

That keeps downstream joins and filters cleaner.

Use a Permanent Numbers Table for Real Workloads

For production workloads, a permanent numbers table is usually better than hardcoding a tiny CTE. It is more reusable and scales to longer lists.

sql

create table dim_numbers (n int);

Populate it once with enough integers for your expected maximum list length, then reuse it anywhere you need position-based expansion logic.

Why Not Keep Comma-Delimited Data?

A comma-separated column is often a sign that the data should really be normalized into a child table. Splitting into rows during query time works, but it is usually a workaround, not the ideal long-term schema.

If you control the data model, storing one value per row is almost always simpler for filtering, joining, and analytics.

Common Pitfalls

A common mistake is assuming Redshift supports the exact same row-splitting functions as PostgreSQL or another SQL engine. Another is forgetting to trim whitespace after splitting. Developers also often underestimate how messy comma-delimited data becomes when values can be empty, malformed, or contain commas themselves. Finally, if the list can be longer than the hardcoded numbers CTE, rows silently disappear from the result.

Summary

In Redshift, a common way to turn comma-delimited text into rows is SPLIT_PART plus generated positions.
'REGEXP_COUNT helps determine how many parts a row contains.'
A reusable numbers table is better than a tiny hardcoded sequence for real workloads.
'TRIM is useful when delimiters are followed by spaces.'
If possible, normalize the data model so you do not need to split delimited strings at query time.