Python Pandas: Split column and add new column next to current [duplicate]

Question

I have an Excel sheet similar to this, although with many more columns:

Team	Members
Team1 (553)	95435
Team2 (443)	872

I want to split the team column into Team and a new column, named Team ID. I currently do this with the following code:

df[['Team', 'Team ID']] = df['Team'].str.split(r"\s\(+(?=\S*$)", expand=True)
df['Team ID'] = df['Team ID'].str[:-1]

This works fine (note that the team name can include numbers, spaces and parentheses). So while this might not be perfect, it gets the job done.

My issue is that the new column, "Team ID", is placed at the end of the dataset, so the order is "Team - Members - Team ID". That is not a problem with 3 columns, but sometimes there are 10 columns, 7 of which need to be split.

So the question: is there any way to split a column in two and place the newly created column next to the old one?

@jezrael not sure this is a dupe, one can insert directly on the correct spot
– mozway, Commented Feb 4, 2022 at 9:27
@mozway - I think you can use a list of column names at the end to put the columns in the correct order.
– jezrael, Commented Feb 4, 2022 at 9:28
Or using df.insert(df.columns.get_loc('Team')+1, 'Team ID', df.pop('Team ID')) should work.
– jezrael, Commented Feb 4, 2022 at 9:30
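
A minimal sketch of the pop-and-insert idea from the last comment, applied to the sample data from the question. The split is written in two steps here rather than as the question's single multi-column assignment, which is only an assumption made for portability across pandas versions; the regex is the question's own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Team": ["Team1 (553)", "Team2 (443)"],
                   "Members": [95435, 872]})

# Split on the last " (" as in the question, then drop the trailing ")"
parts = df["Team"].str.split(r"\s\(+(?=\S*$)", expand=True)
df["Team"] = parts[0]
df["Team ID"] = parts[1].str[:-1]   # appended as the last column at this point

# pop() removes 'Team ID' and returns it; insert() places it right after 'Team'
df.insert(df.columns.get_loc("Team") + 1, "Team ID", df.pop("Team ID"))

print(df.columns.tolist())
```

Because 'Team' sits to the left of 'Team ID', popping the new column does not shift the position computed by get_loc.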

mozway · Accepted Answer · 2022-02-04 09:24:19Z

You can use str.extract with a regex.

To insert at the correct position you could use insert:

out = df['Team'].str.extract(r'(\w+) \((\d+)\)')

df['Team'] = out[0]
df.insert(df.columns.get_loc('Team')+1, 'Team ID', out[1])

output:

    Team  Team ID  Members
0  Team1      553    95435
1  Team2      443      872

regex:

(\w+)      # match word
\((\d+)\)  # match digits surrounded by parentheses
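
Since the question mentions several columns that need the same treatment, the extract-and-insert steps could be wrapped in a small helper and applied in a loop. This is only a sketch: the function name split_id_column and the extra "Project" column are hypothetical and do not come from the question.

```python
import pandas as pd

def split_id_column(df, col, id_col, pattern=r"(\w+) \((\d+)\)"):
    """Split `col` into name and ID in place; put `id_col` right after `col`."""
    out = df[col].str.extract(pattern)
    df[col] = out[0]
    df.insert(df.columns.get_loc(col) + 1, id_col, out[1])
    return df

# Hypothetical frame with two columns that follow the "name (id)" layout
df = pd.DataFrame({
    "Team": ["Team1 (553)", "Team2 (443)"],
    "Project": ["Alpha (1)", "Beta (2)"],
    "Members": [95435, 872],
})

for col in ["Team", "Project"]:
    split_id_column(df, col, f"{col} ID")

print(df.columns.tolist())
```

Each ID column is inserted relative to the current position of its source column, so earlier inserts do not throw off later ones.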

edited Feb 4, 2022 at 9:24

answered Feb 4, 2022 at 9:18

5 Comments

suprimos Over a year ago

How would that regex work if the Team name is "Team1 (Boston)"? Or even "Team1 (99)" followed by ID?

mozway Over a year ago

Can you provide an exhaustive example of full names? If you really want to match anything initially and know that the ID is the last thing, use an anchor: '(.*) \((\d+)\)$'

suprimos Over a year ago

Team might have been oversimplified. An example of a string could be "BE-AMZ-V34489-Ford Motors (58837)-Web-Standard-New product range (Demo name) (12345679)". But the ID is always last, in parentheses, and after a space. Your last regex seems to work fine. Thanks. Your solution to add the new column next to the old one also works. Thanks again!
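
For illustration, here is how the anchored pattern, read as `(.*) \((\d+)\)$`, behaves on the long example string from this comment. The greedy `(.*)` takes everything up to the last space-plus-parenthesised-digits group at the end of the string.

```python
import pandas as pd

s = pd.Series([
    "BE-AMZ-V34489-Ford Motors (58837)-Web-Standard-New product range (Demo name) (12345679)",
    "Team1 (553)",
])

# Column 0: everything before the final " (digits)"; column 1: the digits
out = s.str.extract(r"(.*) \((\d+)\)$")
print(out)
```

Earlier parenthesised parts such as "(58837)" and "(Demo name)" stay in the name, because only the group directly before the end of the string satisfies the `$` anchor.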

suprimos Over a year ago

Sorry to bring this back again, but after using this for a while, an issue has arisen. In some rare instances, the ID is not present in the string, for instance when no name has been given to the "team", so the name is simply the string "NoName" (with no ID) in the dataframe. With this setup both the Team and Team ID columns return blank. Is there any way to keep the Team column as "NoName"? (Or set both Team and Team ID to "NoName")

mozway Over a year ago

Maybe open a new question with a reproducible example?
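
No answer to this follow-up appears in the thread. One possible approach, offered only as a sketch and not as the answerer's method, is to make the ID group optional and the name group lazy, so that a string without a trailing ID still fills the name column. The pattern below is an assumption and should be checked against the real data.

```python
import pandas as pd

s = pd.Series(["Team1 (553)", "NoName"])

# (.*?) is lazy, and the " (digits)" group is optional before the end anchor
out = s.str.extract(r"^(.*?)(?: \((\d+)\))?$")
print(out)
```

Rows without an ID get NaN in the second column, which could then be filled with something like out[1].fillna("NoName") if both columns should carry the same placeholder.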
