How to apply a function to two columns of Pandas dataframe

Introduction

Applying logic across two DataFrame columns is a common pandas task. The right solution depends on whether your function can work on whole Series objects at once or whether it truly needs to inspect one row at a time.

Prefer Vectorized Operations When Possible

If your function performs element-wise arithmetic, comparisons, or string processing through the .str accessor, pass the two columns directly. This is usually faster than DataFrame.apply(axis=1) because pandas operates on whole arrays instead of building a Series for every row and calling a Python function on each one.

python

import pandas as pd

df = pd.DataFrame(
    {
        "price": [10.0, 25.0, 12.5],
        "quantity": [2, 1, 4],
    }
)

def line_total(price, quantity):
    # price and quantity are whole Series, so this multiplies element-wise
    return price * quantity

df["total"] = line_total(df["price"], df["quantity"])
print(df)

Output:

text

   price  quantity  total
0   10.0         2   20.0
1   25.0         1   25.0
2   12.5         4   50.0

Even though line_total looks like a normal Python function, it works because pandas Series support vectorized arithmetic. For simple expressions, you can skip the helper and write df["total"] = df["price"] * df["quantity"].
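The same idea carries over to string columns through the .str accessor. The sketch below uses a hypothetical `people` frame and a hypothetical `full_name` helper, neither of which comes from the example above, to show two text columns combined without a row-wise loop.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
people = pd.DataFrame(
    {
        "first": [" ada", "grace ", "alan"],
        "last": ["lovelace", "hopper", "turing"],
    }
)

def full_name(first, last):
    # Both arguments are whole Series; .str applies each string method element-wise.
    return first.str.strip().str.title() + " " + last.str.upper()

people["name"] = full_name(people["first"], people["last"])
print(people["name"].tolist())
```

Plain Python string methods such as `strip()` do not exist on a Series, which is why the helper goes through `.str` at each step.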

Use `apply(axis=1)` for Row-Specific Logic

Sometimes the calculation depends on branching or multiple conditions that are easier to express per row. In that case, use apply with axis=1.

python

import pandas as pd

df = pd.DataFrame(
    {
        "score_a": [91, 72, 65, 88],
        "score_b": [85, 70, 80, 90],
    }
)

def choose_best_label(row):
    # row is a Series holding one row's values, indexed by column name
    best = max(row["score_a"], row["score_b"])
    return "pass" if best >= 75 else "review"

df["status"] = df.apply(choose_best_label, axis=1)
print(df)

This style is more flexible because the function receives the entire row. The tradeoff is speed. On large datasets, row-wise apply can be much slower than a vectorized expression.
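To make the tradeoff concrete, simple branching like the pass/review rule can often be rewritten without apply. The following is one possible vectorized version, assuming NumPy is available. It is a sketch of an alternative, not a replacement for every row-wise function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "score_a": [91, 72, 65, 88],
        "score_b": [85, 70, 80, 90],
    }
)

# Row-wise maximum of the two columns, computed without a Python-level loop.
best = df[["score_a", "score_b"]].max(axis=1)

# np.where picks "pass" where the condition holds and "review" elsewhere.
df["status"] = np.where(best >= 75, "pass", "review")
print(df)
```

Both versions produce the same labels. The vectorized one avoids one Python function call per row, which matters mostly on large frames.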

Passing Two Columns into a Custom Function

If you want a custom function that clearly accepts two inputs, you can call it with the column Series directly:

python

import pandas as pd

df = pd.DataFrame(
    {
        "start": [3, 10, 8],
        "end": [7, 12, 9],
    }
)

def span_length(start, end):
    # Works on two aligned Series (or two scalars) and subtracts element-wise
    return end - start

df["length"] = span_length(df["start"], df["end"])
print(df)

This pattern is often the cleanest answer to the question "How do I apply a function to two columns?" The function does not need to know anything about rows. It simply receives two aligned pandas Series and returns another Series.
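Not every two-argument function can take Series. One that branches with `if` needs scalar inputs, because the truth value of a whole Series is ambiguous. The sketch below uses a hypothetical `describe_span` function and shows two ways to feed it values from two columns one pair at a time.

```python
import pandas as pd

df = pd.DataFrame({"start": [3, 10, 8], "end": [7, 12, 9]})

def describe_span(start, end):
    # The `if` needs a single True/False, so this function only accepts scalars.
    if end - start >= 3:
        return "long"
    return "short"

# Option 1: unpack the two values from each row inside a lambda.
df["kind"] = df.apply(lambda row: describe_span(row["start"], row["end"]), axis=1)

# Option 2: iterate over the two columns in lockstep; no per-row Series is built.
df["kind_zip"] = [describe_span(s, e) for s, e in zip(df["start"], df["end"])]
print(df)
```

The list comprehension with `zip` skips the per-row Series construction that apply performs, so it is often the lighter choice when only two columns are involved.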

Handling Missing Values

Real datasets often contain missing entries. If either column can contain NaN, decide up front how the function should behave. You can fill missing values before applying the function or handle them inside the function.

python

import pandas as pd

df = pd.DataFrame(
    {
        "a": [5, None, 9],
        "b": [2, 4, None],
    }
)

def safe_sum(a, b):
    # Treat a missing value on either side as 0 before adding
    return a.fillna(0) + b.fillna(0)

df["sum"] = safe_sum(df["a"], df["b"])
print(df)

Being explicit here avoids hard-to-debug output later, especially when downstream code assumes the new column has numeric values.
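Filling with 0 on both sides means a row where both inputs are missing becomes 0.0, which may hide the gap. As an alternative sketch, `Series.add` with `fill_value` substitutes the default only when exactly one side is missing. The sample values here are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [5, None, None],
        "b": [2, 4, None],
    }
)

# fill_value=0 replaces a missing operand only if the other operand is present;
# when both are missing, the result stays NaN.
df["sum"] = df["a"].add(df["b"], fill_value=0)
print(df)
```

Which behavior is correct depends on whether a fully missing row should count as zero or remain visibly missing.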

Common Pitfalls

The most common mistake is using df.apply when plain vectorized arithmetic would do the job. That makes the code slower and harder to read.

Another frequent error is forgetting axis=1. Without it, pandas uses the default axis=0 and applies the function column by column, so the function receives each column as a Series instead of each row, and a lookup such as row["score_a"] typically fails with a KeyError.
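A quick way to see what the function receives is to return the `name` of the Series passed in. This small sketch, using a shortened version of the score data, compares the default axis with axis=1.

```python
import pandas as pd

df = pd.DataFrame({"score_a": [91, 72], "score_b": [85, 70]})

# Default axis=0: the function is called once per column, so .name is the column label.
per_column = df.apply(lambda s: s.name)

# axis=1: the function is called once per row, so .name is the row's index label.
per_row = df.apply(lambda s: s.name, axis=1)

print(per_column.tolist())
print(per_row.tolist())
```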

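Shape mismatches are also common. If a custom function returns more than one value per row, the result is no longer a single flat column: returning a Series per row makes apply produce a DataFrame, while returning a list or tuple produces one column whose cells hold those lists or tuples. When you want one new column, make sure the function returns a single scalar for each row.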
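When several values per row are actually wanted, the `result_type="expand"` option of `DataFrame.apply` spreads them across columns. The sketch below assumes a hypothetical `span_stats` function and the column names `length` and `midpoint`, all chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"start": [3, 10, 8], "end": [7, 12, 9]})

def span_stats(row):
    # Returns two values per row: the span length and the midpoint.
    return row["end"] - row["start"], (row["start"] + row["end"]) / 2

# Default behaviour: a single Series whose cells are tuples.
as_tuples = df.apply(span_stats, axis=1)

# result_type="expand" turns each tuple into separate columns,
# which can then be assigned to two new column names at once.
df[["length", "midpoint"]] = df.apply(span_stats, axis=1, result_type="expand")
print(df)
```

For arithmetic this simple, two vectorized assignments would be faster. The pattern is more useful when the per-row computation cannot be vectorized.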

Missing values deserve attention as well. Arithmetic involving NaN usually produces NaN, so decide whether that is correct or whether you should fill defaults first.

Summary

For element-wise logic on two columns, pass the Series directly into a custom function.
Use vectorized expressions whenever possible because they are usually faster than row-wise apply.
Use df.apply(..., axis=1) only when the function genuinely needs full row context.
Handle missing values deliberately so the new column does not fill with unexpected NaN values.
Keep functions simple and return one value per row when assigning to a single output column.
