
The data I have to work with is a bit messy: it has header names inside its data. How can I choose a row from an existing pandas DataFrame and make it (rename it to) a column header?

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header

6 Answers

313
In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels to equal the values in the 2nd row (index location 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the 2nd row using:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6

If the index is not unique, you could use:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it's often better to take care that the index is unique (even though Pandas does not require it).
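To see the pitfall concretely, here is a small sketch that gives the example frame a deliberately duplicated index label (an assumption for illustration, not the data above) and compares dropping by label with dropping by position:

```python
import pandas as pd

# Same rows as the example, but index label 1 is used twice (made up for illustration)
df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)], index=[0, 1, 1])
df.columns = df.iloc[1]  # header taken from the row at position 1

by_label = df.drop(df.index[1])  # drops EVERY row labelled 1, not just the header row
by_position = df.iloc[pd.RangeIndex(len(df)).drop(1)]  # drops only position 1

print(len(by_label), len(by_position))  # 1 2
```

The label-based drop silently loses the data row (4, 5, 6) as well, while the positional version keeps it.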


3 Comments

Thank you so much for your quick response! How can I choose a row by value instead of by index location to make it the header? For your example, something like: df.columns = df[df[0] == 'foo']
The problem with that is there could be more than one row which has the value "foo". One way around that problem is to explicitly choose the first such row: df.columns = df.iloc[np.where(df[0] == 'foo')[0][0]].
Ah, I see why you did it that way. In my case I know there is only one row with the value "foo", so it is OK. I did it this way, which I guess is the same as what you gave me above: idx_loc = df[df[0] == 'foo'].index.tolist()[0]; df.columns = df.iloc[idx_loc]
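A hedged sketch of the by-value approach that works with row positions throughout, so it does not rely on index labels matching positions (using np.flatnonzero and raising an error when nothing matches are my own choices, not something from the thread):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)])

# Row positions (not index labels) where column 0 equals 'foo'
positions = np.flatnonzero((df[0] == 'foo').to_numpy())
if positions.size == 0:
    raise ValueError("no row has 'foo' in column 0")

header_pos = positions[0]  # first match wins if there are several
df.columns = df.iloc[header_pos]
# Keep every row except the header row, selecting by position
df = df.iloc[np.arange(len(df)) != header_pos]
```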
125

This works (pandas 0.19.2):

df.rename(columns=df.iloc[0])

3 Comments

You can remove the "header" row by adding .drop(df.index[0])
I like this better than the actual accepted answer. I love the short oneline solutions.
Please keep in mind that after dropping the first row, the index will start from 1, so you will probably want to add .reset_index(drop=True).
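Putting the answer and both comments together, a minimal sketch on a small made-up frame whose first row holds the header:

```python
import pandas as pd

# Made-up frame: row 0 holds the intended header
df = pd.DataFrame([('a', 'b', 'c'), (1, 2, 3), (4, 5, 6)])

tidy = (
    df.rename(columns=df.iloc[0])  # map old labels 0, 1, 2 to 'a', 'b', 'c'
      .drop(df.index[0])           # remove the header row (by label)
      .reset_index(drop=True)      # renumber the remaining rows from 0
)
print(tidy)
```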
41

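It would be easier to recreate the data frame. Because the old columns held mixed types, df.values is an object array, so the new columns stay object dtype; call infer_objects() on the result to interpret the column types from scratch.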

headers = df.iloc[0]
new_df  = pd.DataFrame(df.values[1:], columns=headers)
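A minimal sketch of the full round trip, assuming a small made-up frame whose first row holds the header; infer_objects() is the step that turns the object columns back into numeric ones:

```python
import pandas as pd

# Made-up frame: row 0 holds the header, the rest is numeric data
df = pd.DataFrame([('a', 'b', 'c'), (1, 2, 3), (4, 5, 6)])

headers = df.iloc[0]
new_df = pd.DataFrame(df.values[1:], columns=headers)  # built from an object array
new_df = new_df.infer_objects()  # re-infer dtypes from the remaining values
print(new_df.dtypes)
```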

1 Comment

Simple and easy. Nice!
20

To rename the header without reassigning df:

df.rename(columns=df.iloc[0], inplace=True)

To drop the row without reassigning df:

df.drop(df.index[0], inplace=True)


8

You can specify the row index in the read_csv or read_html functions via the header parameter, which gives the "row number(s) to use as the column names, and the start of the data". This has the advantage of automatically dropping all the preceding rows, which are presumably junk.

import pandas as pd
from io import StringIO

csv = '''junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
'''

# header=2: the third line holds the column names; the two lines above it are dropped
df = pd.read_csv(StringIO(csv), header=2)
print(df)

Output:

   pears   apples   lemons   plums   other
0     40       50       61      72      85
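The CSV above separates fields with ", ", so the column names keep a leading space (" apples", " lemons", ...), which is why the header prints offset. A small sketch adding read_csv's skipinitialspace parameter to strip that space:

```python
import pandas as pd
from io import StringIO

csv = '''junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
'''

# header=2 picks the third line as the header and discards the junk lines above it;
# skipinitialspace=True ignores the space that follows each comma
df = pd.read_csv(StringIO(csv), header=2, skipinitialspace=True)
print(list(df.columns))  # ['pears', 'apples', 'lemons', 'plums', 'other']
```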

2 Comments

This does not address the question itself, which is asking about an already existing DataFrame.
Some of the users who find this question (possibly the majority) will have a more generic use case than the OP; this answer is for that group.
0

Keep it simple with plain Python

Pandas DataFrames accept a columns argument, so why not use it with plain Python lists? It makes it much clearer what you are doing:

table = [['name', 'Rf', 'Rg', 'Rf,skin', 'CRI'],
 ['testsala.cxf', '86', '95', '92', '87'],
 ['testsala.cxf: 727037 lm', '86', '95', '92', '87'],
 ['630.cxf', '18', '8', '11', '18'],
 ['Huawei stk-lx1.cxf', '86', '96', '88', '83'],
 ['dedo uv no filtro.cxf', '52', '93', '48', '58']]

import pandas as pd
data = pd.DataFrame(table[1:],columns=table[0])

Or, if the header is not the first row but, for instance, the row at index 10:

columns = table.pop(10)
data = pd.DataFrame(table,columns=columns)
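Note that table.pop(10) keeps any rows above the header as data and modifies table in place. A sketch of a hypothetical helper (frame_from_rows is an illustrative name, not a pandas function) that instead discards everything above the header row and leaves the list untouched:

```python
import pandas as pd


def frame_from_rows(rows, header_idx):
    """Hypothetical helper: use rows[header_idx] as the header.

    Rows above the header are assumed to be junk and are discarded;
    slicing means the input list is not modified.
    """
    return pd.DataFrame(rows[header_idx + 1:], columns=rows[header_idx])


# Made-up table with one junk row above the header
table = [['junk', '', ''],
         ['name', 'Rf', 'Rg'],
         ['630.cxf', '18', '8'],
         ['testsala.cxf', '86', '95']]

data = frame_from_rows(table, 1)
print(data)
```

The values are still strings, as in the original table; pd.to_numeric can convert the numeric columns afterwards if needed.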

2 Comments

Tested for performance: although we know that creating a new DataFrame is "time-consuming", this approach took 40x more time.
@gbox thanks for your comment! If you want, edit the answer.
