1

I have a excel sheet similar to this, although with a lot more columns:

Team Members
Team1 (553) 95435
Team2 (443) 872

I want to split the team column into Team and a new column, named Team ID. I currently do this with the following code:

df[['Team', 'Team ID']] = df['Team'].str.split(r"\s\(+(?=\S*$)", expand=True)
df['Team ID'] = df['Team ID'].str[:-1]

This works fine (note that Team name can include numbers, spaces and paranthesis). So while this might not be perfect, I gets the job done.

My issue is that the new column, "Team ID" is placed at the end of the dataset. So it would be "Team - Members - Team ID". While not an issue with 3 columns, sometimes there is 10 columns where 7 needs to be split.

So the question: Is there any way to split a column in 2, and place the newly created column next to the old one?

3
  • @jezrael not sure this is a dupe, one can insert directly on the correct spot Commented Feb 4, 2022 at 9:27
  • @mozway - I think using list of columns names for last ordering in correct way. Commented Feb 4, 2022 at 9:28
  • Or using df.insert(df.columns.get_loc('Team')+1, 'Team ID', df.pop('Team ID')) should working Commented Feb 4, 2022 at 9:30

1 Answer 1

2

You can use str.extract with a regex.

To insert on the correct position you could use insert:

out = df['Team'].str.extract('(\w+) \((\d+)\)')

df['Team'] = out[0]
df.insert(df.columns.get_loc('Team')+1, 'Team ID', out[1])

output:

    Team  Team ID  Members
0  Team1      553    95435
1  Team2      443      872

regex:

(\w+)      # match word
\((\d+)\)  # match digits surrounded by parentheses
Sign up to request clarification or add additional context in comments.

5 Comments

How would that regex work if the Team name is "Team1 (Boston)"? Or even "Team1 (99)" followed by ID?
can you provide exhaustive example of full names? If you really want to match anything initially and know that the ID is the last thing, use an anchor '(.*) \((\d+)\)$'
Team might have been over simplified. An example of a string could be "BE-AMZ-V34489-Ford Motors (58837)-Web-Standard-New product range (Demo name) (12345679)". But ID is always last, in parenthesis, and after a space. Your last regex seems to work fine. Thanks. Your solution to add the new column next to the old one also works. Thanks again!
Sorry to bring this back again, but after using this for a while, an issue have arised. In some rare instances, the ID is not present in the string, for instance no name has been given to the "team", so the name is simply the string "NoName" (with no ID) in the dataframe. With this setup both Team and Team ID columns return blank. Any way to keep Team column as "NoName"? (Or set both Team and Team ID to "NoName")
Maybe open a new question with a reproducible example?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.