Python Pandas: Sorting Columns

Question

I'm trying to sort the columns of a .csv file. These are the names and the order of the columns:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', 
'44Ca BLK', '44Ca 1', '44Ca 2', 
'137Ba BLK', '137Ba 1', '137Ba 2', 
'25Mg 3', '25Mg 4', '25Mg 5', 
'44Ca 3', '44Ca 4', '44Ca 5', 
'137Ba 3', '137Ba 4', '137Ba 5',

This is the order I would like to have:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5',
'44Ca BLK', '44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5',
'137Ba BLK', '137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5',

Currently my code looks like this:

import pandas as pd

df = pd.read_csv("real_data.csv", header=2)

df2 = df.reindex_axis(sorted(df.columns), axis=1)

print(df2)

df2.to_csv("sorted.csv")

With my current code I get the following result for the order of the columns:

'137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5', '137Ba BLK',
'25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5', '25Mg BLK', 
'44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5', '44Ca BLK'

I have already figured out that I need to pass a key function to sorted to specify the order I want, but I can't work out a function that does that.

Any input is highly appreciated!

Can you explain the logic behind your sorting more? Why does 137Ba BLK come before 137Ba 1? Unless you specify a clear sorting logic, it's hard for us (or for you) to write a good sorting function.
– ASGM, Commented Oct 30, 2017 at 14:36
The file is the output of a device which measures different isotopes. Here 137Ba is the specific isotope. BLK stands for blank or background value, and 1, 2, 3, ... is a series of measurements for that isotope.
– qawert, Commented Oct 30, 2017 at 15:02

jezrael · Accepted Answer · 2017-10-30 15:06:31Z

3

Use a helper DataFrame: split each column name into its parts, sort the helper by those parts, and then reindex the original columns by a.index:

c = df.columns
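# split names such as '25Mg BLK' into mass number, element and measurement number
a = c[2:].to_series().str.extract(r'(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)
# convert the mass number to int
a[0] = a[0].astype(int)
# convert the measurement number to float; missing numbers (BLK) become NaN
a[2] = pd.to_numeric(a[2], errors='coerce')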
a = a.sort_values([0,1,2], na_position='first')
print (a)
             0   1    2
25Mg BLK    25  Mg  NaN
25Mg 1      25  Mg  1.0
25Mg 2      25  Mg  2.0
25Mg 3      25  Mg  3.0
25Mg 4      25  Mg  4.0
25Mg 5      25  Mg  5.0
44Ca BLK    44  Ca  NaN
44Ca 1      44  Ca  1.0
44Ca 2      44  Ca  2.0
44Ca 3      44  Ca  3.0
44Ca 4      44  Ca  4.0
44Ca 5      44  Ca  5.0
137Ba BLK  137  Ba  NaN
137Ba 1    137  Ba  1.0
137Ba 2    137  Ba  2.0
137Ba 3    137  Ba  3.0
137Ba 4    137  Ba  4.0
137Ba 5    137  Ba  5.0

# reindex_axis was removed in pandas 1.0; reindex(columns=...) is the equivalent call
df = df.reindex(columns=c[:2].tolist() + a.index.tolist())
print (df)
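
The same ordering can also be written as a key function for sorted, which is what the question originally asked about. This is only a minimal sketch: it assumes every data column name looks like '<mass><element> <BLK or number>', and column_key is a hypothetical helper name, not part of pandas.

```python
import re

# Assumed name shape: mass number, element symbol, whitespace, then BLK or a number.
_PATTERN = re.compile(r"(\d+)([A-Za-z]+)\s+(\w+)")


def column_key(name):
    """Hypothetical helper: map '25Mg BLK' to a sortable tuple."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name!r}")
    mass, element, suffix = match.groups()
    # The blank (BLK) should come before the numbered measurements.
    measurement = -1 if suffix == "BLK" else int(suffix)
    return int(mass), element, measurement


columns = [
    "25Mg BLK", "25Mg 1", "25Mg 2",
    "44Ca BLK", "44Ca 1", "44Ca 2",
    "137Ba BLK", "137Ba 1", "137Ba 2",
    "25Mg 3", "44Ca 3", "137Ba 3",
]
ordered = sorted(columns, key=column_key)
print(ordered)
```

With a DataFrame, the result could then be applied along the lines of df.reindex(columns=list(df.columns[:2]) + sorted(df.columns[2:], key=column_key)).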

edited Oct 30, 2017 at 15:06

answered Oct 30, 2017 at 14:42

jezrael


4 Comments

jezrael Over a year ago

Oops, I forgot about that; you need c[:2].tolist() + a.index.tolist()

qawert Over a year ago

Thanks for your response! In the line a = c[2:].to_series().str.extract('(\d+)([a-zA-Z]+)\s+(\d*)', expand=True), what is c?

jezrael Over a year ago

c = df.columns

qawert Over a year ago

Works exactly the way I wanted! Thanks a lot!

Keith Cargill · Answer · 2017-10-30 14:43:09Z

1

See this answer here: https://stackoverflow.com/a/33555435/8239103 It seems to do what you want. For clarity I'll post the code here.

sequence = [Your sequence as a list as above]
your_dataframe = your_dataframe.reindex(columns=sequence)

answered Oct 30, 2017 at 14:43

Keith Cargill


1 Comment

qawert Over a year ago

Thanks for your response. I would like to have a program which sorts the column without any input, as the files I'm working with may have different numbers of elements.

BENY · Answer · 2017-10-30 15:04:09Z

1

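from natsort import natsorted

# l1 is assumed to hold the data column names, e.g. l1 = df.columns[2:].tolist()
# temporarily replace 'BLK' with zeros so the blank sorts before measurement 1
l1 = [x.replace('BLK', '0000000') for x in l1]
l1 = natsorted(l1)
l1 = [x.replace('0000000', 'BLK') for x in l1]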

l1
Out[1125]: 
['25Mg BLK',
 '25Mg 1',
 '25Mg 2',
 '25Mg 3',
 '25Mg 4',
 '25Mg 5',
 '44Ca BLK',
 '44Ca 1',
 '44Ca 2',
 '44Ca 3',
 '44Ca 4',
 '44Ca 5',
 '137Ba BLK',
 '137Ba 1',
 '137Ba 2',
 '137Ba 3',
 '137Ba 4',
 '137Ba 5']

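Then reindex the columns with df.reindex(columns=l1), prepending any leading columns you want to keep (such as the two 'Unnamed' ones). Note that df.reindex(l1) without the columns keyword reindexes the rows instead.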
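
If natsort is not installed, the same placeholder trick can be sketched with a small natural-sort key built on the standard library instead. This swaps natsorted for a hand-written key, so treat it as an illustration only: natural_key is a hypothetical helper, the tiny DataFrame is made-up sample data, and the sketch assumes all data column names share the '<mass><element> <BLK or number>' shape.

```python
import re

import pandas as pd


def natural_key(text):
    """Hypothetical helper: split into digit and non-digit runs so '137' compares as a number."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", text)]


# Made-up sample data with the column layout described in the question.
df = pd.DataFrame(
    [[0, 1, 2, 3, 4, 5, 6, 7]],
    columns=[
        "Unnamed: 0", "Unnamed: 1",
        "25Mg BLK", "25Mg 1",
        "137Ba BLK", "137Ba 1",
        "25Mg 2", "137Ba 2",
    ],
)

fixed = df.columns[:2].tolist()  # leading columns that keep their position
names = [c.replace("BLK", "0000000") for c in df.columns[2:]]  # blank sorts as 0
names = sorted(names, key=natural_key)
names = [c.replace("0000000", "BLK") for c in names]

# columns= is required; a bare df.reindex(names) would reindex the rows.
result = df.reindex(columns=fixed + names)
print(result.columns.tolist())
```

Keeping the two leading columns in a separate list means they never pass through the sort key, which would otherwise have to handle names like 'Unnamed: 0'.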

edited Oct 30, 2017 at 15:04

answered Oct 30, 2017 at 14:48

BENY

