Add a string prefix to each value in a pandas string column

Introduction

Adding a prefix to every value in a pandas string column is a common data transformation. The simplest approach is string concatenation with the + operator: df['col'] = 'prefix_' + df['col']. The .str accessor's cat() method is better suited to joining columns, while apply() or map() with a function handles custom formatting. Concatenation with + operates on the whole column at once; apply() and map() call a Python function for each element, which is slower.

Method 1: String Concatenation with +

python
import pandas as pd

df = pd.DataFrame({'id': ['001', '002', '003'], 'name': ['Alice', 'Bob', 'Charlie']})

# Add prefix to id column
df['id'] = 'USR-' + df['id']
print(df)
#         id     name
# 0  USR-001    Alice
# 1  USR-002      Bob
# 2  USR-003  Charlie

This is typically the fastest and most readable approach for simple prefixes.

Method 2: .str.cat() (Concatenation Accessor)

python
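# str.cat() appends its argument to the calling Series,
# so the prefix has to be the caller for it to end up in front
prefix = pd.Series('Dr. ', index=df.index)
df['name'] = prefix.str.cat(df['name'])

# Equivalent and simpler (use instead of the lines above, not in addition):
# df['name'] = 'Dr. ' + df['name']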

str.cat() is more useful for joining two columns or adding a separator:

python
# Combine two columns with a separator
# (assumes df has 'first_name' and 'last_name' columns)
df['full'] = df['first_name'].str.cat(df['last_name'], sep=' ')
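
As a self-contained illustration of joining two columns, here is a minimal sketch. The people DataFrame and its values are made up for the example; the column names mirror the snippet above. It also shows the na_rep parameter, which controls what happens when one side is missing.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the snippet above
people = pd.DataFrame({
    'first_name': ['Ada', 'Grace', None],
    'last_name': ['Lovelace', 'Hopper', 'Turing'],
})

# Join the two columns with a space. By default, a missing value
# on either side makes the whole result missing for that row.
people['full'] = people['first_name'].str.cat(people['last_name'], sep=' ')

# na_rep substitutes a placeholder for missing values instead
people['full_filled'] = people['first_name'].str.cat(
    people['last_name'], sep=' ', na_rep='?'
)

print(people)
```

Whether a placeholder or a missing result is preferable depends on how the combined column is used downstream.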

Method 3: apply() with Lambda

python
df['id'] = df['id'].apply(lambda x: f'USR-{x}')

# Or with a named function. Run this instead of the line above,
# not after it, otherwise the two prefixes stack.
def add_prefix(value):
    return f'ID_{value.upper()}'

df['id'] = df['id'].apply(add_prefix)

apply() calls the function once per element, so it is slower than column-wise concatenation, but it is useful for transformations that need custom logic.
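
To illustrate the kind of transformation where apply() earns its keep, here is a minimal sketch in which the prefix depends on the value itself. The rule (numeric IDs get 'USR-', everything else gets 'EXT-') and the function name label_id are hypothetical, chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame({'id': ['001', '002', 'x17']})

def label_id(value):
    # Hypothetical rule: purely numeric IDs get 'USR-', anything else 'EXT-'
    prefix = 'USR-' if value.isdigit() else 'EXT-'
    return prefix + value.upper()

df['id'] = df['id'].apply(label_id)
print(df['id'].tolist())
```

A rule this simple could also be written with np.where; apply() becomes more attractive as the per-value logic grows.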

Method 4: map() with Format String

python
df['id'] = df['id'].map('USR-{}'.format)
print(df)
#         id     name
# 0  USR-001    Alice
# 1  USR-002      Bob
# 2  USR-003  Charlie

# f-string equivalent (Python 3.6+); use instead of the first line, not after it
# df['id'] = df['id'].map(lambda x: f'USR-{x}')

Adding Both Prefix and Suffix

python
df['name'] = '[' + df['name'] + ']'
# [Alice], [Bob], [Charlie]

# Or with apply and an f-string (an alternative, here using parentheses)
# df['name'] = df['name'].apply(lambda x: f'({x})')

Conditional Prefix

Add a prefix only to rows that meet a condition:

python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'role': ['admin', 'user', 'admin']
})

# Add prefix only for admins
is_admin = df['role'] == 'admin'
df.loc[is_admin, 'name'] = 'Admin: ' + df.loc[is_admin, 'name']
print(df)
#              name   role
# 0    Admin: Alice  admin
# 1             Bob   user
# 2  Admin: Charlie  admin

# Alternative using np.where (use instead of the .loc assignment above)
# df['name'] = np.where(is_admin, 'Admin: ' + df['name'], df['name'])
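
When several conditions each need their own prefix, one possible approach is to look the prefix up per row and then concatenate. The sketch below assumes a hypothetical role_prefix lookup table and an extra 'moderator' role that does not appear in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'role': ['admin', 'user', 'moderator'],
})

# Hypothetical lookup table; roles not listed here get no prefix
role_prefix = {'admin': 'Admin: ', 'moderator': 'Mod: '}

# map() returns a missing value for unlisted roles, so fill those with ''
prefixes = df['role'].map(role_prefix).fillna('')
df['name'] = prefixes + df['name']
print(df)
```

This keeps the conditions in one dictionary rather than in a chain of .loc assignments.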

Handling Non-String Columns

If the column contains numbers, convert to string first:

python
df = pd.DataFrame({'id': [1, 2, 3]})

# This fails with a TypeError:
# df['id'] = 'USR-' + df['id']

# Convert to string first
plain = 'USR-' + df['id'].astype(str)
# USR-1, USR-2, USR-3

# With zero-padding (built from the original integer column,
# so the prefix is not applied twice)
padded = 'USR-' + df['id'].astype(str).str.zfill(3)
# USR-001, USR-002, USR-003
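
For integer columns, conversion, padding, and prefixing can also be folded into a single format specification passed to map(). This is a sketch of an alternative, not a replacement for the approach above: it calls the formatter once per element, so it trades some speed for brevity.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3]})

# '{:03d}' formats an integer zero-padded to a width of 3
labels = df['id'].map('USR-{:03d}'.format)
print(labels.tolist())
```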

Handling NaN Values

python
df = pd.DataFrame({'name': ['Alice', None, 'Charlie']})

# Concatenation with a missing value produces NaN
result = 'Hello ' + df['name']
# 0      Hello Alice
# 1              NaN
# 2    Hello Charlie

# Option 1: fill missing values first
filled = 'Hello ' + df['name'].fillna('Unknown')
# Hello Alice, Hello Unknown, Hello Charlie

# Option 2: prefix only the non-missing rows and leave the rest untouched
mask = df['name'].notna()
df.loc[mask, 'name'] = 'Hello ' + df.loc[mask, 'name']

Performance Comparison

python
import pandas as pd

# %timeit is an IPython/Jupyter magic; run these lines in a notebook or IPython shell.
# The timings in the comments are approximate and vary by machine and pandas version.
df = pd.DataFrame({'col': [f'val_{i}' for i in range(100_000)]})

# Fastest: string concatenation
%timeit 'prefix_' + df['col']          # ~1.5 ms

# Fast: map with format
%timeit df['col'].map('prefix_{}'.format)  # ~15 ms

# Slower: apply with lambda
%timeit df['col'].apply(lambda x: f'prefix_{x}')  # ~25 ms

# Slowest: list comprehension (plain Python loop)
%timeit [f'prefix_{x}' for x in df['col']]  # ~30 ms

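In these timings, string concatenation with + is roughly 10-20x faster than apply, because it operates on the whole column at once instead of calling a Python function for each element. Exact numbers depend on hardware, the pandas version, and the column's dtype, so measure on your own data before optimizing.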
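
Outside IPython, where %timeit is unavailable, the same comparison could be run with the standard timeit module. The following is a minimal sketch: the candidates dictionary and the repeat counts are illustrative choices, and no particular timings are implied.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col': [f'val_{i}' for i in range(100_000)]})

# Each candidate is a zero-argument callable so timeit can invoke it directly
candidates = {
    'concatenation': lambda: 'prefix_' + df['col'],
    'map + format': lambda: df['col'].map('prefix_{}'.format),
    'apply + lambda': lambda: df['col'].apply(lambda x: f'prefix_{x}'),
    'list comprehension': lambda: [f'prefix_{x}' for x in df['col']],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported as milliseconds per call
    best = min(timeit.repeat(func, number=5, repeat=3))
    timings[name] = best / 5 * 1000
    print(f'{name:<20} {timings[name]:8.2f} ms')
```

Taking the minimum over repeats reduces the influence of background noise on the measurement.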

Common Pitfalls

  • TypeError with non-string columns: 'prefix' + df['int_column'] raises TypeError. Convert the column to string first with .astype(str) before concatenation.
  • NaN propagation: String concatenation with NaN produces NaN, not 'prefix_nan'. Use .fillna('') before concatenation if you want to preserve the prefix for missing values.
  • Using apply for simple prefix: apply(lambda x: 'prefix_' + x) works but calls a Python function for every element, which was 10-20x slower than 'prefix_' + df['col'] in the timings above. Use column-wise string operations for simple transformations.
  • Forgetting to assign the result: 'prefix_' + df['col'] and .str methods such as df['col'].str.upper() return a new Series. Assigning back with df['col'] = ... is required. Without assignment, the original DataFrame is unchanged.
  • Mixed types in column: If a column has both strings and numbers (object dtype with mixed types), string concatenation may fail on numeric rows. Use df['col'].astype(str) to normalize the column first.
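
The mixed-type pitfall can be reproduced with a small sketch. The Series below is made up for illustration; it holds strings and an integer in an object column.

```python
import pandas as pd

# Hypothetical object column mixing strings and an integer
mixed = pd.Series(['a1', 42, 'c3'], dtype=object)

# Direct concatenation fails because 'ID_' + 42 is not defined
try:
    'ID_' + mixed
    failed = False
except TypeError:
    failed = True

# Normalizing to strings first makes the concatenation succeed
normalized = 'ID_' + mixed.astype(str)
print(failed, normalized.tolist())
```

Note that astype(str) may also turn missing values into literal text (such as 'nan') depending on the pandas version, so it is safer to handle missing values separately before normalizing.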

Summary

  • Use 'prefix_' + df['col'] for the fastest and simplest prefix operation
  • Use .astype(str) first if the column contains non-string values
  • Use .fillna() to handle NaN values before concatenation
  • Use np.where or df.loc for conditional prefix application
  • Avoid apply() for simple string operations; column-wise concatenation was 10-20x faster in the timings above
