How to Apply a Function to Two Columns of a pandas DataFrame
Introduction
Applying logic across two DataFrame columns is a common pandas task. The right solution depends on whether your function can work on whole Series objects at once or whether it truly needs to inspect one row at a time.
Prefer Vectorized Operations When Possible
If your function performs element-wise arithmetic or string operations that Series support directly, pass the two columns to it as Series. This is usually faster than DataFrame.apply(axis=1) because pandas operates on whole arrays instead of constructing a Series and calling a Python function for every row.
Even though line_total looks like a normal Python function, it works because pandas Series support vectorized arithmetic. For simple expressions, you can skip the helper and write df["total"] = df["price"] * df["quantity"].
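A minimal sketch of the vectorized approach, assuming a small DataFrame with price and quantity columns. The sample values are made up for illustration, and line_total is written here as a plain two-argument helper consistent with the description above.

```python
import pandas as pd

# Illustrative data; the values are invented for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0, 15.5], "quantity": [2, 3, 4]})

def line_total(price, quantity):
    # The * operator is vectorized, so this works on scalars and on whole Series.
    return price * quantity

# Pass the two columns directly; the result is a Series aligned on df's index.
df["total"] = line_total(df["price"], df["quantity"])
print(df)
```

Because the helper only uses an operator that Series implement, the same function can also be called with two plain numbers, which makes it easy to unit test.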
Use apply(axis=1) for Row-Specific Logic
Sometimes the calculation depends on branching or multiple conditions that are easier to express per row. In that case, use apply with axis=1.
This style is more flexible because the function receives the entire row. The tradeoff is speed. On large datasets, row-wise apply can be much slower than a vectorized expression.
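The row-wise style might look like the sketch below. The discount rule (10% off for ten or more units) and the name discounted_total are hypothetical, chosen only to show branching on values from two columns.

```python
import pandas as pd

# Illustrative data; the values are invented for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0, 15.5], "quantity": [2, 12, 4]})

def discounted_total(row):
    # row is a Series holding one row; columns are looked up by name.
    total = row["price"] * row["quantity"]
    # Hypothetical rule: 10% discount when ten or more units are ordered.
    if row["quantity"] >= 10:
        return total * 0.9
    return total

# axis=1 tells pandas to call the function once per row.
df["total"] = df.apply(discounted_total, axis=1)
print(df)
```

For a rule this simple, the same condition could likely be expressed without a row loop (for example with numpy.where), so row-wise apply is best reserved for logic that does not vectorize cleanly.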
Passing Two Columns into a Custom Function
If you want a custom function that clearly accepts two inputs, you can call it with the column Series directly:
This pattern is often the cleanest answer to the question "How do I apply a function to two columns?" The function does not need to know anything about rows. It simply receives two aligned pandas Series and returns another Series.
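As an illustration of a two-input function that is not pure arithmetic, the sketch below combines two string columns. The column names first and last and the helper full_name are assumptions made for this example.

```python
import pandas as pd

# Illustrative data; column names and values are invented for this sketch.
df = pd.DataFrame({"first": ["ada", "alan"], "last": ["lovelace", "turing"]})

def full_name(first, last):
    # The .str accessor and + both operate on whole Series, aligned by index.
    return first.str.title() + " " + last.str.title()

# The function receives two Series and returns a new Series of the same length.
df["name"] = full_name(df["first"], df["last"])
print(df)
```

The function signature documents exactly which columns it needs, and nothing in its body refers to the DataFrame itself.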
Handling Missing Values
Real datasets often contain missing entries. If either column can contain NaN, decide up front how the function should behave. You can fill missing values before applying the function or handle them inside the function.
Being explicit here avoids hard-to-debug output later, especially when downstream code assumes the new column has numeric values.
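Both options can be sketched as follows. The default of 0 for missing prices and quantities is an assumption for this example, not a recommendation, and safe_total is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing price and one missing quantity.
df = pd.DataFrame({"price": [10.0, np.nan, 15.5], "quantity": [2, 3, np.nan]})

# Without any handling, NaN in either input propagates to the result.
raw = df["price"] * df["quantity"]

# Option 1: fill defaults before the calculation (assumed default: 0).
df["total_filled"] = df["price"].fillna(0.0) * df["quantity"].fillna(0)

# Option 2: handle missing values inside the function.
def safe_total(price, quantity, default=0.0):
    total = price * quantity
    # Replace results that are NaN because an input was missing.
    return total.fillna(default)

df["total_safe"] = safe_total(df["price"], df["quantity"])
print(df)
```

Whether a missing input should produce 0, NaN, or an error depends on what the column means downstream; the point is to make that choice visible in the code.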
Common Pitfalls
The most common mistake is using df.apply when plain vectorized arithmetic would do the job. That makes the code slower and harder to read.
Another frequent error is forgetting axis=1. Without it, pandas applies the function column by column instead of row by row, so the function receives each whole column as a Series rather than one row at a time.
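The difference between the two axes can be seen in a small sketch. The data is illustrative, and the failing call is wrapped in try/except only to show the kind of error a row-oriented function tends to raise when it is handed a column.

```python
import pandas as pd

# Illustrative data; the values are invented for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0], "quantity": [2, 3]})

# Default axis=0: the function is called once per column.
per_column = df.apply(lambda col: col.sum())

# axis=1: the function is called once per row.
per_row = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# A row-oriented function without axis=1 receives a column indexed by row
# labels, so looking up a column name fails.
try:
    df.apply(lambda row: row["price"] * row["quantity"])
    failed = False
except KeyError:
    failed = True

print(per_column)
print(per_row)
```

Here per_column has one entry per column name, while per_row has one entry per row of the DataFrame.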
Shape mismatches are also common. If a row-wise function returns a Series with several values, pandas expands the result into a DataFrame with one column per value instead of a single column. When you want one new column, make sure the function returns a single scalar for each row.
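A short sketch of how the result shape follows the return value. The helper total_and_tax and its 25% tax rate are hypothetical and exist only to produce two values per row.

```python
import pandas as pd

# Illustrative data; the values are invented for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0], "quantity": [2, 3]})

def total_and_tax(row):
    total = row["price"] * row["quantity"]
    # Returning a Series makes pandas expand the result into columns.
    # The 25% rate is a made-up figure for illustration.
    return pd.Series({"total": total, "tax": total * 0.25})

# One scalar per row gives a Series, suitable for a single new column.
single = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# One Series per row gives a DataFrame with one column per returned label.
wide = df.apply(total_and_tax, axis=1)

# If both values are wanted, attach the wide result explicitly.
result = df.join(wide)
print(result)
```

Attaching the wide result with join keeps the intent explicit; assigning it to a single column name would not produce the layout most readers expect.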
Missing values deserve attention as well. Arithmetic involving NaN usually produces NaN, so decide whether that is correct or whether you should fill defaults first.
Summary
- For element-wise logic on two columns, pass the Series directly into a custom function.
- Use vectorized expressions whenever possible because they are usually faster than row-wise apply.
- Use df.apply(..., axis=1) only when the function genuinely needs full row context.
- Handle missing values deliberately so the new column does not fill with unexpected NaN values.
- Keep functions simple and return one value per row when assigning to a single output column.

