Python: columns must be same length as key when splitting a column

Question

I have two address columns, and I want to extract the last word from the first column and the first word from the second column. In the provided example no value in column 'Address2' has two words, but I want the code to work regardless of what the dataset looks like: sometimes Address2 will be one word, sometimes two, and so on.

import pandas as pd

data = {
    'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
    'Address2': ['Saltmarket', 'Castlemilk', 'Blackhill']
}

df = pd.DataFrame(data)

I have no problem with column 'Address1':

df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)

The problem comes with column 'Address2': if I apply the above code to it, I get an error: ValueError: Columns must be same length as key

I understand where the problem is coming from: I am trying to split a column whose values contain only one word into two columns. I am sure there is a way to handle this so that the split still works, returning null (NaN) where there is no second word and the value where there is.
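A minimal reproduction of the error, assuming a hypothetical frame `demo` where no 'Address2' value contains a space: `str.split(..., expand=True)` then produces a single column, so assigning it to two column names fails the length check.

```python
import pandas as pd

demo = pd.DataFrame({'Address2': ['Saltmarket', 'Castlemilk', 'Blackhill']})

# No value contains a space, so the expanded result has only one column (0)
parts = demo['Address2'].str.split(' ', n=1, expand=True)
print(parts.shape)  # (3, 1)

try:
    # Two target columns, but only one column of values to put in them
    demo[['FirstWord', 'Rest']] = parts
    error = None
except ValueError as err:
    error = str(err)

print(error)
```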

Subir Chowdhury · Accepted Answer · 2025-05-01 12:20:10Z

3

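Using str.extract() might be better here: with expand=True (the default) it returns one column per capture group, so the number of output columns is fixed by the regular expression rather than by the data, and the assignment can no longer fail with a length mismatch. Rows that don't match the pattern get NaN in every group column.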

import pandas as pd

data = {
    'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
    'Address2': ['Saltmarket', 'Castlemilk East', 'Blackhill']
}
df = pd.DataFrame(data)

df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)

df[['FirstWord_Address2', 'Remaining_Address2']] = (
    df['Address2'].str.extract(r'^(\S+)\s*(.*)$')
)

print(df)

Or:

df[['Address1_Prefix', 'Address1_LastWord']] = df['Address1'].str.extract(r'^(.*\b)\s+(\S+)$')

df[['Address2_FirstWord', 'Address2_Remaining']] = df['Address2'].str.extract(r'^(\S+)\s*(.*)$')

Output:

                        Address1         Address2          StringStart   LastWord FirstWord_Address2 Remaining_Address2
0                 3 Steel Street       Saltmarket              3 Steel     Street         Saltmarket
1            1 Arnprior Crescent  Castlemilk East           1 Arnprior   Crescent         Castlemilk               East
2  40 Bargeddie Street Blackhill        Blackhill  40 Bargeddie Street  Blackhill          Blackhill
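A small variation on the same idea, not part of the original answer: `str.extract` names its output columns after named groups (`(?P<name>...)`), so the result can be joined directly. The frame `demo`, the one-word 'Address1' value 'Glasgow', and the column names are hypothetical, chosen to show that a row which cannot match comes back as NaN.

```python
import pandas as pd

demo = pd.DataFrame({
    'Address1': ['3 Steel Street', 'Glasgow'],      # 'Glasgow' is a hypothetical one-word row
    'Address2': ['Saltmarket', 'Castlemilk East'],
})

# Named groups become the column names of the extracted DataFrame
first = demo['Address2'].str.extract(r'^(?P<FirstWord>\S+)\s*(?P<Rest>.*)$')

# A one-word Address1 has no whitespace for \s+, so both groups are NaN for that row
last = demo['Address1'].str.extract(r'^(?P<Prefix>.*\b)\s+(?P<LastWord>\S+)$')

demo = demo.join(first).join(last)
print(demo)
```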

answered May 1 at 12:20


4 Comments

MariaT May 1 at 12:25

Hi, thanks for this, it worked! Could you please explain what this bit does: r'^(\S+)\s*(.*)$'? These are regular expressions, but what are we actually saying in the brackets? Also, where do we specify that we want the first word in the column? I want to understand this so that I can handle it in the future.

Subir Chowdhury May 1 at 12:33

^(\S+) = grabs the first word (e.g., "Saltmarket" from "Saltmarket", or "Castlemilk" from "Castlemilk East"). The ^ anchor ties this group to the start of the string, which is what makes it the first word. \s*(.*)$ = matches the whitespace (if any), then captures the remaining text up to the end of the string (e.g., "East" in "Castlemilk East"). If no text follows, it captures an empty string. This is saved in the Remaining column. Please don't hesitate to ask me more if I am not clear. Thank you. @MariaT
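The two groups can be checked outside pandas with Python's `re` module; because the pattern starts with `^`, `re.match` gives the same groups here as a search would. The third sample string is hypothetical, added to show that everything after the first word lands in group 2.

```python
import re

PATTERN = re.compile(r'^(\S+)\s*(.*)$')

samples = ['Saltmarket', 'Castlemilk East', 'Castlemilk East Glasgow']
results = {}
for text in samples:
    # group 1: first run of non-whitespace; group 2: whatever follows (possibly '')
    results[text] = PATTERN.match(text).groups()
    print(repr(text), results[text])
```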

MariaT May 1 at 12:35

Hi, thanks for this. I am also wondering: if the ^ symbol marks the start of the string, why doesn't this extract the first word from column Address1: df['Address1'].str.extract(r'^(.*\b)\s+(\S+)$')

Subir Chowdhury May 1 at 12:47

^ does match the start of "3 Steel Street". str.extract(r'^(.*\b)\s+(\S+)$') is not omitting the first word; it splits the string into two parts, and the first word ends up inside Group 1. Group 1, (.*\b), is everything from the start up to the last word boundary before the final whitespace. Group 2, (\S+), is the last word: a run of non-whitespace characters that must reach the end of the string ($). Because \S+ cannot contain spaces and is anchored to the end, Group 2 can only ever be the last word. Thank you. @MariaT
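A quick check with Python's `re` module. The lazy variant `LAZY` is added here only for comparison (it is not from the answer): it produces the same split, which suggests the end anchor `$` together with `\S+`, rather than greediness, is what forces Group 2 to be the last word. A one-word string does not match at all, which `str.extract` reports as NaN.

```python
import re

PATTERN = re.compile(r'^(.*\b)\s+(\S+)$')
LAZY = re.compile(r'^(.*?\b)\s+(\S+)$')  # hypothetical comparison pattern

samples = ['3 Steel Street', '40 Bargeddie Street Blackhill', 'Blackhill']
results = {}
for text in samples:
    m = PATTERN.match(text)
    # One-word strings have no whitespace for \s+ to consume, so m is None
    results[text] = m.groups() if m else None
    print(repr(text), results[text])
```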

alec_djinn · Accepted Answer · 2025-05-01 12:18:43Z

I would use df.apply() with a custom function.

This is a straightforward example.

import numpy as np
import pandas as pd
from functools import partial

def split_addresses(row, col):
    # Split off the last word; everything before it is the "start"
    words = row[col].split(' ')
    if len(words) < 2:
        start = " ".join(words)
        last_word = np.nan  # no second word, so the last-word slot is missing
    else:
        start = " ".join(words[:-1])
        last_word = words[-1]
    return start, last_word

_fun = partial(split_addresses, col='Address2')  # choose which column you want to process

splits = df.apply(_fun, axis=1)  # Series of (start, last_word) tuples
# Build both columns at once, aligned on df's own index
df[["StringStart", "StringEnd"]] = pd.DataFrame(splits.tolist(), index=df.index)

print(df)

                        Address1    Address2 StringStart   LastWord  StringEnd
0                 3 Steel Street  Saltmarket  Saltmarket     Street        NaN
1            1 Arnprior Crescent  Castlemilk  Castlemilk   Crescent        NaN
2  40 Bargeddie Street Blackhill   Blackhill   Blackhill  Blackhill        NaN
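For the question's exact goal (first word of 'Address2' plus whatever follows), a variant of the same `apply` idea can return a tuple per row and let `result_type='expand'` turn it into columns. This is a sketch rather than the answerer's code; `first_and_rest`, `demo`, `FirstWord` and `Rest` are hypothetical names.

```python
import numpy as np
import pandas as pd

def first_and_rest(text):
    # Non-strings (e.g. NaN) produce two missing values
    if not isinstance(text, str):
        return np.nan, np.nan
    # Split once on any whitespace: ['first'] or ['first', 'rest of the text']
    parts = text.split(maxsplit=1)
    first = parts[0] if parts else np.nan
    rest = parts[1] if len(parts) > 1 else np.nan
    return first, rest

demo = pd.DataFrame({'Address2': ['Saltmarket', 'Castlemilk East', 'Blackhill']})

# result_type='expand' turns each returned tuple into columns 0 and 1
expanded = demo.apply(lambda row: first_and_rest(row['Address2']), axis=1, result_type='expand')
demo[['FirstWord', 'Rest']] = expanded
print(demo)
```

Because every row returns a 2-tuple, the expanded frame always has two columns, so the assignment cannot hit the length mismatch.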

ouroboros1 · Accepted Answer · 2025-05-02 08:28:16Z

1

TL;DR

You can use .reindex to add missing columns:

import pandas as pd

(
    pd.Series(['Hello', 'world'])
      .str.split(n=1, expand=True)
      .reindex(pd.RangeIndex(2), axis=1)
)

       0   1
0  Hello NaN
1  world NaN

With expand=True, both Series.str.split and Series.str.rsplit return a pd.DataFrame whose columns are a default pd.RangeIndex. With n=1, the result therefore has either one column (0), when no value contains the separator, or two (0 and 1, i.e. pd.RangeIndex(n+1)).

Realizing this, you can use df.reindex with axis=1 to ensure a consistent number of output columns. Missing columns get added with NaN values. Here's a wrapper:

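def split_expand(series, n=1, rsplit=False):
    # Pick the splitting direction: rsplit counts the n splits from the right
    splitter = series.str.rsplit if rsplit else series.str.split
    result = splitter(n=n, expand=True)
    # Fewer than n+1 columns means no row split into n+1 pieces: pad with NaN columns
    if result.shape[1] < n+1:
        return result.reindex(pd.RangeIndex(n+1), axis=1)
    return result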

df[['StringStart', 'LastWord']] = split_expand(df['Address1'], rsplit=True)
df[['FirstWord', 'StringEnd']] = split_expand(df['Address2'])

Output:

                        Address1    Address2          StringStart   LastWord  \
0                 3 Steel Street  Saltmarket              3 Steel     Street   
1            1 Arnprior Crescent  Castlemilk           1 Arnprior   Crescent   
2  40 Bargeddie Street Blackhill   Blackhill  40 Bargeddie Street  Blackhill   

    FirstWord  StringEnd  
0  Saltmarket        NaN  
1  Castlemilk        NaN  
2   Blackhill        NaN
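The same padding works for any `n`. The sketch below uses a condensed form of the wrapper above (it always calls `reindex`, which leaves an already full result unchanged) and a hypothetical Series `s` to show that the raw column count depends on the data while the padded one is always `n + 1`.

```python
import pandas as pd

def split_expand(series, n=1, rsplit=False):
    # Condensed form of the wrapper above: always pad to exactly n+1 columns
    splitter = series.str.rsplit if rsplit else series.str.split
    return splitter(n=n, expand=True).reindex(pd.RangeIndex(n + 1), axis=1)

s = pd.Series(['Saltmarket', 'Castlemilk East', '40 Bargeddie Street Blackhill'])

# Without padding, the column count depends on the data
raw_all = s.str.split(n=2, expand=True)
raw_one = s.iloc[:1].str.split(n=2, expand=True)
print(raw_all.shape, raw_one.shape)  # (3, 3) (1, 1)

# With padding, the count is always n + 1, so assigning to n + 1 keys is safe
padded = split_expand(s.iloc[:1], n=2)
print(padded.shape)  # (1, 3)
```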

edited May 2 at 8:28

answered May 1 at 20:21


1 Comment

MariaT May 2 at 12:46

Hi, thanks for providing a different solution with such a detailed explanation!
