I have a pandas DataFrame with 28 columns. Each column name ends in a unique number. I want to drop the numbers from the column names but keep the text. What is the best way to do that?

Here is an example of the columns:

Miscellaneous group | 00002928
Alcoholic Beverages | 0000292
Animal fats group | 000029

I already tried .rename(), but writing out the mapping by hand for all 28 columns is tedious and time-consuming. It also creates a very long code cell in my Google Colab notebook.

2 Answers

Using df.columns.str.split:

columns = ["Miscellaneous group | 00002928",  
           "Alcoholic Beverages | 0000292",
           "Animal fats group | 000029"]

df = pd.DataFrame(columns=columns)

df.columns = df.columns.str.split(r'\s+\|', regex=True).str[0]

Or df.columns.str.replace:

df.columns = df.columns.str.replace(r'\s+\|.*$', '', regex=True)

Also possible via map and re.sub:

import re

df.columns = list(map(lambda x: re.sub(r'\s+\|.*$', '', x), df.columns))

With df.rename you could apply logic like:

df = df.rename(columns=lambda x: x.split(' |')[0])

Or indeed via re.split:

df = df.rename(columns=lambda x: re.split(r'\s+\|', x)[0])

For the regex pattern, see regex101.
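
If you want to check the pattern on plain strings before touching the dataframe, the lambda can be swapped for a small named function. This is only a minimal sketch: the name strip_code is made up for illustration and is not part of pandas.

```python
import re

# Hypothetical helper (the name is an assumption, not a pandas function):
# remove everything from the whitespace before the first "|" to the end.
_SUFFIX = re.compile(r"\s+\|.*$")


def strip_code(name: str) -> str:
    return _SUFFIX.sub("", name)


columns = [
    "Miscellaneous group | 00002928",
    "Alcoholic Beverages | 0000292",
    "Animal fats group | 000029",
]

cleaned = [strip_code(c) for c in columns]
print(cleaned)  # ['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']
```

Since df.rename accepts a callable for columns, the same function should then work as df = df.rename(columns=strip_code).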

Assuming you're starting off with, e.g.

df.columns = ["Miscellaneous group | 00002928",  "Alcoholic Beverages | 0000292",   "Animal fats group | 000029"]

The simplest solution is probably a list comprehension that splits each column name on " | " and keeps the first part of the resulting list:

df.columns = [col.split(" | ")[0] for col in df.columns]

The column names are then:

['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']

Alternatively, you could do this with a regex:

import re

df.columns = [re.sub(r'\s*\|.*', '', col) for col in df.columns]

This matches optional whitespace, followed by |, followed by everything up to the end of the string, and replaces the whole match with an empty string.
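
To see what the optional whitespace buys you, here is a small sketch comparing \s* with \s+. The name "Spices|000012" is invented for the comparison and is not one of your columns.

```python
import re

# "Spices|000012" is a made-up name with no space before the pipe,
# included only to show how \s* and \s+ behave differently.
names = ["Alcoholic Beverages | 0000292", "Spices|000012"]

optional_ws = [re.sub(r"\s*\|.*", "", n) for n in names]
required_ws = [re.sub(r"\s+\|.*", "", n) for n in names]

print(optional_ws)  # ['Alcoholic Beverages', 'Spices']
print(required_ws)  # ['Alcoholic Beverages', 'Spices|000012']
```

With \s+ the second name is left untouched because there is no whitespace in front of the |, so \s* is the more forgiving choice if the spacing in your headers is inconsistent.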

Final alternative:

df.columns = [re.sub(r'\s*\d+$', '', col) for col in df.columns]

This matches optional whitespace followed by digits at the end of each string, so it removes the trailing digits regardless of what precedes them (useful if the | isn't always present). Note that the | itself is left in place, so for the example columns it produces:

['Miscellaneous group |', 'Alcoholic Beverages |', 'Animal fats group |']
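
If you'd rather not keep the trailing |, one possible variation (a sketch, not something tested against your real data) is to make the pipe an optional part of the match. The second name below, without a pipe, is hypothetical.

```python
import re

# Optional whitespace, an optional "|", optional whitespace, then digits at the end.
# Caveat: a name that itself ends in digits would lose those digits too.
TRAILING_CODE = re.compile(r"\s*\|?\s*\d+$")

names = [
    "Miscellaneous group | 00002928",
    "Animal fats group 000029",  # hypothetical: same style of name, but no pipe
]

cleaned = [TRAILING_CODE.sub("", n) for n in names]
print(cleaned)  # ['Miscellaneous group', 'Animal fats group']
```

The resulting list can be assigned back the same way as above, e.g. df.columns = [TRAILING_CODE.sub("", col) for col in df.columns].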
