161

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

2
  • One or two lines of example data would be really helpful for creating an example command line. Also, does "1-n" keys mean that you need to sort by a variable number of keys? Doing that without scripting is gonna be fun... Commented Dec 10, 2008 at 20:58
  • I have a PHP wrapper around the sort command to enable the 1-n feature. Commented Dec 10, 2008 at 21:28

7 Answers 7

370

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile which sorts the file by the string from the beginning of field 3 to the end of line (which is potentially unique).

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)
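To make the difference concrete, here is a minimal sketch. The two-line sample data is made up for illustration, and LC_ALL=C is set only so the comparison is plain byte order:

```shell
# Hypothetical sample data: four blank-separated fields per line.
data='x b 1 a
x a 1 z'

# Bounded keys: compare field 3 only, then field 2 only.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3 -k 2,2)

# Unbounded keys: "-k 3" runs from field 3 to the end of the line,
# so field 4 decides the order before field 2 is ever consulted.
unbounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 -k 2)

printf 'bounded:\n%s\nunbounded:\n%s\n' "$bounded" "$unbounded"
```

With the bounded keys the "x a 1 z" line comes first (field 3 ties, field 2 breaks the tie). With the unbounded keys "x b 1 a" comes first, because the first key already includes field 4.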

4 Comments

Nice! Now, what if I want field 3 to be sorted numerically in reverse, and field 2 to be sorted non-numerically in normal (ascending) order? :)
@Arun POS is explained at the end of the man page. You just append the ordering options to the field number like this: sort -k 3,3nr -k 2,2
Aargh. What a counterintuitive interface: -k2 should be -k2,2 and a trailing comma -k2, should be 'magical default end of line or whatever'.
why the angle bracket <? Should sort -k3,3 -k2,2 inputfile not do the job?
105

The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.
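A small sketch of those two keys on made-up fixed-width records. The layout (a number in columns 4-5 and another in columns 14-15) and the data are assumptions for illustration, and the records deliberately contain no blanks so the whole line is unambiguously field 1:

```shell
# Hypothetical 15-character records:
#   columns 1-3   label
#   columns 4-5   first numeric key
#   columns 6-13  filler
#   columns 14-15 second numeric key
data='abc10xxxxxxxx01
def09xxxxxxxx02
ghi10xxxxxxxx00'

# Sort numerically on columns 4-5, then numerically on columns 14-15.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.4,1.5n -k 1.14,1.15n)

printf '%s\n' "$sorted"
```

The "def" record sorts first (09), and the two records with 10 in columns 4-5 are then ordered by columns 14-15 (00 before 01).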

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by day of the month (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Every key is addressed as a character offset into field 1, which starts at the beginning of the line.

3 Comments

It is only one field if there are no blanks in the input data. Nevertheless, your example is useful.
Correction: if there are no /tabs/ in the input data. In DOS's 'dir' command output, there are no tabs.
The examples on how to use the options (numeric, reverse) are extremely helpful, as it's nearly impossible to find out how to use just from the man page and the other answers didn't mention it. I wish I could +2 for this. ;)
74

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for numeric sort) appended to it; these override the global options for that key.
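As a rough illustration of options attached to individual keys, the sketch below sorts made-up data by field 3 numerically in reverse, then by field 2 as plain text in ascending order. The sample lines are hypothetical:

```shell
# Hypothetical sample data: three blank-separated fields per line.
data='a x 5
b y 10
c w 10
d z 9'

# "nr" applies only to the first key; the second key has no options
# of its own, so it is compared as ordinary text, ascending.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3nr -k 2,2)

printf '%s\n' "$sorted"
```

The two lines with 10 come first, ordered w before y by the second key, followed by 9 and then 5.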

3 Comments

From the sort man page: "POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1." See man page for full documentation.
Also see andras's answer if you don't want to get insane.
Both comments above are accurate and additive. Thanks, gentlemen.
25

Here is an example that sorts the columns of a CSV file in a mix of numeric and dictionary order, with column 5 and everything after it in dictionary order:

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note that -k1,1n means a numeric key starting at column 1 and ending at column 1. If I had done the below, the numeric key would have spanned columns 1 and 2, so 1,10 was sorted as 110 (presumably because the comma is read as a thousands separator in my locale):

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga
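A related point that is easy to miss is why the n is needed on a column like column 2 at all. This is a minimal sketch on two made-up rows, comparing the same bounded key with and without n:

```shell
# Hypothetical two-row CSV where column 2 holds 10 and 2.
data='1,10,b
1,2,a'

# Column 2 compared as text: "10" sorts before "2".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1,1n -k2,2)

# Column 2 compared as a number: 2 sorts before 10.
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1,1n -k2,2n)

printf 'text:\n%s\nnumeric:\n%s\n' "$as_text" "$as_number"
```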

1 Comment

This is the best answer because it shows how to use different switches for different columns
12

I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator; make sure it is a character that appears nowhere in the file. Then your input is treated as a single column.
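Here is a small sketch of that idea on made-up fixed-width records that do contain blanks. The layout (name in columns 1-4, code in columns 5-7) and the data are assumptions for illustration:

```shell
# Hypothetical 7-character records: name in columns 1-4 (blank padded),
# code in columns 5-7. "@" never occurs in the data.
data='ann 200
bob 100
ann 100'

# With -t@ the whole line is field 1, so 1.1,1.4 and 1.5,1.7 are
# plain column ranges regardless of the blanks inside the record.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -t@ -k1.1,1.4 -k1.5,1.7)

printf '%s\n' "$sorted"
```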

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.

1 Comment

Even though the default separator according to the docs gnu.org/software/coreutils/manual/html_node/… is space, sometimes the field count is not what you'd expect. Perhaps, as others have said here, because of the LC_CTYPE locale setting. When in doubt, count from the beginning of the line!
8

Note that it may also be desirable to stabilize the sort with the -s switch, so that lines that compare equal on all keys keep their original relative order in the output.
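A minimal sketch of the effect, using made-up data and assuming a sort that supports -s (GNU sort does). Without -s, lines whose keys tie are ordered by a last-resort comparison of the whole line:

```shell
# Hypothetical sample data: the first two lines tie on field 2.
data='b 1 first
a 1 second
c 0 third'

# Default: ties on the key fall back to comparing the entire line.
unstable=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

# With -s: ties keep their input order.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 2,2n)

printf 'default:\n%s\nstable:\n%s\n' "$unstable" "$stable"
```

In the default run "a 1 second" moves ahead of "b 1 first"; with -s the two stay in the order they appeared in the input.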

Comments

2

I just want to add a tip: when you use sort, be careful about your locale, because it affects how keys are compared. I usually set LC_ALL=C explicitly to get the ordering I want.
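For example, with LC_ALL=C the comparison is by byte value, so all uppercase ASCII letters sort before lowercase ones. This is a tiny sketch on made-up input; what other locales produce depends on the system, so only the C result is shown:

```shell
# Hypothetical input mixing upper and lower case.
data='a
B
b
A'

# Byte-order comparison: uppercase ASCII sorts before lowercase.
c_order=$(printf '%s\n' "$data" | LC_ALL=C sort)

printf '%s\n' "$c_order"
```

In many other locales the same input would typically come out with upper and lower case interleaved, which is why pinning the locale makes scripts more predictable.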

1 Comment

LC_ALL=C can also result in quite a speedup!
