Sorting multiple keys with Unix sort

Question

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

One or two lines of example data would be really helpful for creating an example command line. Also, does "1-n" keys mean that you need to sort by a variable number of keys? Doing that without scripting is gonna be fun...
– Ken Gentle, Commented Dec 10, 2008 at 20:58
I have a PHP wrapper around the sort command to enable the 1-n feature.
– Chris Kloberdanz, Commented Dec 10, 2008 at 21:28

ndemou · Accepted Answer · 2020-01-20 17:12:17Z

370

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile, which sorts the file by the string running from the beginning of field 3 to the end of the line (which is potentially unique, so the second key may never be consulted).

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)
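To make the difference concrete, here is a minimal sketch using two made-up lines (the data and the variable names are assumptions for illustration, not from the question). LC_ALL=C is set only to keep the comparison byte-wise and reproducible.

```shell
# Hypothetical sample data: four blank-separated fields per line.
data='a 2 x a
b 1 x z'

# Keys limited to one field each: field 3 ties ("x"), so field 2 decides.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3 -k 2,2)

# Open-ended keys: the first key runs from field 3 to the end of the line,
# so the trailing field decides before field 2 is ever consulted.
unbounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 -k 2)

printf 'bounded:\n%s\n\nunbounded:\n%s\n' "$bounded" "$unbounded"
```

With the bounded keys, "b 1 x z" comes first because field 2 breaks the tie; with the open-ended keys, "a 2 x a" comes first because " x a" sorts before " x z".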

edited Jan 20, 2020 at 17:12

ndemou

answered Jul 15, 2011 at 15:26

andras

4 Comments

Arun Over a year ago

Nice! Now, what if I want field 3 to be sorted numerically in reverse, and field 2 to be sorted non-numerically in normal (ascending) order? :)

andras Over a year ago

@Arun POS is explained at the end of the man page. You just append the ordering options to the field number like this: sort -k 3,3nr -k 2,2

android.weasel Over a year ago

Aargh. What a counterintuitive interface: -k2 should be -k2,2 and a trailing comma -k2, should be 'magical default end of line or whatever'.

HongboZhu Over a year ago

why the angle bracket <? Should sort -k3,3 -k2,2 inputfile not do the job?
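The per-key options and the redirection question raised in the comments above can be tried out with a small sketch. The sample lines and variable names are hypothetical, and LC_ALL=C is used only to make the result reproducible.

```shell
# Hypothetical sample data: three blank-separated fields per line.
inputfile=$(mktemp)
printf '%s\n' 'x b 10' 'x a 10' 'x c 9' 'x a 100' > "$inputfile"

# Field 3: numeric (n) and reversed (r). Field 2: plain ascending.
from_stdin=$(LC_ALL=C sort -k 3,3nr -k 2,2 < "$inputfile")

# Same command with the file passed as an argument instead of redirected.
from_arg=$(LC_ALL=C sort -k 3,3nr -k 2,2 "$inputfile")

rm -f "$inputfile"
printf '%s\n' "$from_stdin"
```

Both invocations should give the same ordering: 100 first, then the two lines with 10 ordered by field 2, then 9. sort accepts the file name as an argument, so the redirection is a matter of preference here.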

Andy · Accepted Answer · 2013-09-06 11:12:17Z

105

The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.
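As a smaller sketch of the same idea, assume fixed-width records with no blanks at all, so the whole line is unambiguously field 1. The record layout below (3-character tag, 2-digit number, 3-character code) is invented for illustration.

```shell
# Hypothetical fixed-width records: chars 1-3 tag, 4-5 number, 6-8 code.
records='abc07zzz
abc12aaa
abc07bbb
abc03mmm'

# First key: characters 4-5 of field 1, compared numerically.
# Second key: characters 6-8 of field 1, compared as text in reverse.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -k 1.4,1.5n -k 1.6,1.8r)

printf '%s\n' "$sorted"
```

The two records with 07 tie on the first key, so the reversed second key puts zzz ahead of bbb.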

edited Sep 6, 2013 at 11:12

Andy

answered Dec 10, 2008 at 21:03

Clinton Pierce

3 Comments

Jonathan Leffler Over a year ago

It is only one field if there are no blanks in the input data. Nevertheless, your example is useful.

Clinton Pierce Over a year ago

Correction: if there are no /tabs/ in the input data. In DOS's 'dir' command output, there are no tabs.

msb Over a year ago

The examples on how to use the options (numeric, reverse) are extremely helpful, as it's nearly impossible to find out how to use just from the man page and the other answers didn't mention it. I wish I could +2 for this. ;)

Ken Gentle · Accepted Answer · 2008-12-10 20:54:59Z

74

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for numeric sort).

answered Dec 10, 2008 at 20:54

Ken Gentle

3 Comments

Adam Rosenfield Over a year ago

From the sort man page: "POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1." See man page for full documentation.

ron Over a year ago

Also see andras's answer if you don't want to go insane.

Ken Gentle Over a year ago

Both comments above are accurate and additive. Thanks, gentlemen.
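The F[.C][OPTS] notation quoted from the man page in the comments above can be illustrated with a short sketch. The colon-delimited data is an assumption made up for this example.

```shell
# Hypothetical colon-delimited data: field 2 is a letter followed by two digits.
data='x:a10:q
y:a02:q
z:b05:q'

# POS = F.C plus options: start at field 2, character 2; end at field 2,
# character 3; compare that two-digit slice numerically (n).
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 2.2,2.3n)

printf '%s\n' "$sorted"
```

Only the digits inside field 2 take part in the comparison, so the leading letter of that field has no effect on the order.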

Patryk · Accepted Answer · 2016-10-16 15:39:07Z

25

Here is an example that sorts the columns of a CSV file in a mix of numeric and dictionary order, with column 5 and everything after it in dictionary order:

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note that -k1,1n means a numeric key that starts at column 1 and ends at column 1. If I had used -k1,2n as below, the key would have spanned columns 1 and 2, so 1,10 would have been sorted as 110:

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga

edited Oct 16, 2016 at 15:39

Patryk

answered Mar 7, 2014 at 21:50

EdW

1 Comment

xaxa Over a year ago

This is the best answer because it shows how to use different switches for different columns

Dong Hoon · Accepted Answer · 2008-12-10 21:11:40Z

12

I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator; make sure it is a character that appears nowhere in the input. Your input is then treated as consisting of a single column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.
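A minimal sketch of this approach, assuming fixed-width records that do contain blanks and never contain @ (the layout and sample lines are hypothetical):

```shell
# Hypothetical fixed-width records: chars 1-2 state, 4-7 number, 9-11 name.
records='NY 0042 bob
CA 0007 amy
NY 0003 cat'

# -t@ names a separator that never occurs, so each line is one field
# and the character offsets count from the start of the line.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t@ -k1.1,1.2 -k1.4,1.7n)

printf '%s\n' "$sorted"
```

Records are grouped by the first two characters, and within each group ordered numerically by characters 4-7.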

answered Dec 10, 2008 at 21:11

Dong Hoon

1 Comment

Brad Dre Over a year ago

Even though the default separator according to the docs (gnu.org/software/coreutils/manual/html_node/…) is space, sometimes the field count is not what you'd expect. Perhaps, as others have said here, that is because of the LC_CTYPE locale setting. When in doubt, count from the beginning of the line!

ron · Accepted Answer · 2011-08-30 08:52:27Z

8

Note that it may also be desirable to stabilize the sort with the -s switch, so that equally ranked lines keep their original relative order in the output.
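A small sketch of the effect, assuming a sort that supports -s (GNU coreutils does); the sample lines are made up for illustration:

```shell
# Hypothetical sample data: two blank-separated fields per line.
data='b 9
a 5
b 1
a 2'

# Stable: lines with equal field 1 keep their input order.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 1,1)

# Default: ties on the key fall back to comparing the whole line.
fallback=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1)

printf 'stable:\n%s\n\ndefault:\n%s\n' "$stable" "$fallback"
```

In the stable run "a 5" stays ahead of "a 2" because that is the input order; in the default run the whole-line comparison puts "a 2" first.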

answered Aug 30, 2011 at 8:52

ron

jianpx · Accepted Answer · 2011-12-30 16:08:04Z

2

I just want to add a tip: when you use sort, be careful about your locale, because it affects the order of the key comparison. I usually set LC_ALL=C explicitly to get the ordering I want.
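A minimal sketch of what LC_ALL=C does to the ordering, using four made-up input lines:

```shell
# In the C locale, comparison is byte-wise, so all uppercase ASCII
# letters sort before all lowercase ones.
bytewise=$(printf '%s\n' b A a B | LC_ALL=C sort)

printf '%s\n' "$bytewise"
```

Under other locales the same input may come out with upper and lower case interleaved, which is why pinning the locale makes scripts behave consistently across machines.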

answered Dec 30, 2011 at 16:08

jianpx

1 Comment

mat kelcey Over a year ago

LC_ALL=C can also result in quite a speedup!
