4

I'm trying to sort the columns of a .csv file. These are the names and the order of the columns:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', 
'44Ca BLK', '44Ca 1', '44Ca 2', 
'137Ba BLK', '137Ba 1', '137Ba 2', 
'25Mg 3', '25Mg 4', '25Mg 5', 
'44Ca 3', '44Ca 4', '44Ca 5', 
'137Ba 3', '137Ba 4', '137Ba 5',

This is the order I would like to have:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5',
'44Ca BLK', '44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5',
'137Ba BLK', '137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5',

Currently my code looks like this:

import pandas as pd

df = pd.read_csv("real_data.csv", header=2)

df2 = df.reindex(columns=sorted(df.columns))

print(df2)

df2.to_csv("sorted.csv")

With my current code I get the following result for the order of the columns:

'137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5', '137Ba BLK',
'25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5', '25Mg BLK', 
'44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5', '44Ca BLK'

I have already figured out that I need to pass a key function to sorted() to specify the ordering, but I can't work out a function that would do that.

Any input is highly appreciated!

2
  • Can you explain the logic behind your sorting more? Why does 137Ba BLK come before 137Ba 1? Unless you specify a clear sorting logic, it's hard for us (or for you) to write a good sorting function. Commented Oct 30, 2017 at 14:36
  • The file is the output of a device which measures different isotopes. Here 137Ba is the specific isotope. BLK stands for blank or background value, and 1, 2, 3, ... are a series of measurements for that isotope. Commented Oct 30, 2017 at 15:02

3 Answers

3

Build a helper DataFrame from the column names, sort it, and then reindex the columns by a.index:

c = df.columns
a = c[2:].to_series().str.extract(r'(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)
# convert the mass numbers to ints
a[0] = a[0].astype(int)
# convert to floats; missing numbers (BLK) become NaN
a[2] = pd.to_numeric(a[2], errors='coerce')
a = a.sort_values([0,1,2], na_position='first')
print (a)
             0   1    2
25Mg BLK    25  Mg  NaN
25Mg 1      25  Mg  1.0
25Mg 2      25  Mg  2.0
25Mg 3      25  Mg  3.0
25Mg 4      25  Mg  4.0
25Mg 5      25  Mg  5.0
44Ca BLK    44  Ca  NaN
44Ca 1      44  Ca  1.0
44Ca 2      44  Ca  2.0
44Ca 3      44  Ca  3.0
44Ca 4      44  Ca  4.0
44Ca 5      44  Ca  5.0
137Ba BLK  137  Ba  NaN
137Ba 1    137  Ba  1.0
137Ba 2    137  Ba  2.0
137Ba 3    137  Ba  3.0
137Ba 4    137  Ba  4.0
137Ba 5    137  Ba  5.0

df = df.reindex(columns=c[:2].tolist() + a.index.tolist())
print (df)
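
As an alternative that follows the question's idea of passing a function to sorted, here is a minimal sketch that uses only the standard library. The name isotope_key and the fallback for names that do not match the isotope pattern (such as 'Unnamed: 0') are my own assumptions, not part of the code above.

```python
import re

def isotope_key(name):
    """Sort key: (mass number, element symbol, measurement index), BLK first."""
    match = re.match(r"(\d+)([A-Za-z]+)\s+(\S+)$", name)
    if match is None:
        # Assumption: names that do not look like '25Mg 1' go first and,
        # because sorted() is stable, keep their original relative order.
        return (-1, "", -1)
    mass, element, suffix = match.groups()
    # 'BLK' (or any other non-numeric suffix) sorts before 1, 2, 3, ...
    index = int(suffix) if suffix.isdigit() else -1
    return (int(mass), element, index)

# Column names in the order given in the question
columns = [
    'Unnamed: 0', 'Unnamed: 1',
    '25Mg BLK', '25Mg 1', '25Mg 2',
    '44Ca BLK', '44Ca 1', '44Ca 2',
    '137Ba BLK', '137Ba 1', '137Ba 2',
    '25Mg 3', '25Mg 4', '25Mg 5',
    '44Ca 3', '44Ca 4', '44Ca 5',
    '137Ba 3', '137Ba 4', '137Ba 5',
]

ordered = sorted(columns, key=isotope_key)
```

With a DataFrame, the same key could be applied as df.reindex(columns=sorted(df.columns, key=isotope_key)). Sorting on the integer mass number rather than the raw string is what puts 25Mg before 137Ba.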

4 Comments

Oops, I forgot about that; you need c[:2].tolist() + a.index.tolist()
Thanks for your response! a = c[2:].to_series().str.extract('(\d+)([a-zA-Z]+)\s+(\d*)', expand=True) What is the c in this line?
c = df.columns
Works exactly the way I wanted! Thanks a lot!
1

See this answer here: https://stackoverflow.com/a/33555435/8239103 It seems to do what you want. For clarity I'll post the code here.

sequence = [Your sequence as a list as above]
your_dataframe = your_dataframe.reindex(columns=sequence)

1 Comment

Thanks for your response. I would like to have a program which sorts the column without any input, as the files I'm working with may have different numbers of elements.
1
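from natsort import natsorted

# l1 is assumed to be the list of isotope column names, e.g. df.columns[2:].tolist()
# temporarily replace 'BLK' with zeros so it sorts before 1, 2, 3, ...
l1=list(map(lambda x: x.replace('BLK', '0000000'), l1))
l1=natsorted(l1)
l1=list(map(lambda x: x.replace('0000000', 'BLK'), l1))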

l1
Out[1125]: 
['25Mg BLK',
 '25Mg 1',
 '25Mg 2',
 '25Mg 3',
 '25Mg 4',
 '25Mg 5',
 '44Ca BLK',
 '44Ca 1',
 '44Ca 2',
 '44Ca 3',
 '44Ca 4',
 '44Ca 5',
 '137Ba BLK',
 '137Ba 1',
 '137Ba 2',
 '137Ba 3',
 '137Ba 4',
 '137Ba 5']

Then reindex the columns with df.reindex(columns=l1), prepending any leading columns that were left out of l1.
