Unix "sort" command for a CSV file

Question

I have a .csv file with entries that look like:

"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
"22 February 2016 12:08 EST","Shawn Baker",...

The first CSV field (date/time) is assigned by the system and always has exactly five words. The second CSV field (name) consists of one or more words.

I want to sort by the final word in the second field. For this example, the desired order after sorting would be:

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

No doubt, with a little effort, one could come up with a bash, awk, or Python script to perform this kind of sort. But is there a way to use the sort command directly?

The specific Unix version I am using (from /proc/version) is:

Linux version 3.13.0-79-generic (buildd@lcy01-11) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #123-Ubuntu SMP Fri Feb 19 14:28:32 UTC 2016

karakfa · Accepted Answer · 2016-02-24 20:35:45Z

2

awk to the rescue! Use the decorate/sort/undecorate pattern:

$ awk -F, '{t=$2; sub(/.+ /,"",t); print t"\t"$0}' file | sort | cut -f2-

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

Print the last word of the second field as a key, sort, then remove the dummy key.
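A caveat with sub(/.+ /,"",t): a one-word name such as "Cher" has no space to match, so its key keeps the leading quote, and a name with a trailing space (the problem raised in the comments) ends up with a key of just ". Depending on the locale's collation, those rows can land in the wrong place. Below is a minimal sketch of the same decorate/sort/undecorate pipeline that strips the quotes and then takes the last blank-separated word. It still assumes, as the answer does, that the first two fields contain no commas; the function name sort_by_surname and the last two demo rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sort CSV lines by the last word of the second field.
# Assumes the first two fields contain no commas (same as the answer above).
# sort_by_surname is a hypothetical helper: reads stdin, writes stdout.
sort_by_surname() {
    awk -F, '{
        key = $2
        gsub(/"/, "", key)          # remove quotes so "Cher" yields the key Cher
        n = split(key, w, " ")      # " " splits on runs of blanks, ignoring leading/trailing ones
        print w[n] "\t" $0          # decorate: key, a tab, then the untouched line
    }' | sort -k1,1 | cut -f2-      # sort on the key only, then strip it off again
}

# Sample rows from the question plus two made-up rows:
# a one-word name and a name with a trailing space.
printf '%s\n' \
    '"29 January 2016 19:33 EST","Mary Z Allen",...' \
    '"01 February 2016 16:29 EST","Kendra A Zimmerman",...' \
    '"22 February 2016 12:08 EST","Shawn Baker",...' \
    '"23 February 2016 09:15 EST","Cher",...' \
    '"24 February 2016 10:00 EST","Dana Cole ",...' |
    sort_by_surname
```

Using split(key, w, " ") rather than a regex substitution is what makes both edge cases behave: it never returns an empty last word for trailing blanks, and it works the same whether the name has one word or several.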

edited Feb 24, 2016 at 20:35

answered Feb 24, 2016 at 15:59

karakfa



3 Comments

dannysauer Over a year ago

How did it take me three hours to type essentially the same response? That'll teach me to get distracted by a long lunch. :) I'll just contribute that the cut should probably be "-f2-" in case there's a tab in the data.

Curious George Over a year ago

Thanks for the answer; it probably would have taken me 3+ hours to work it out.

Curious George Over a year ago

Tried it out on the full database and realized that some of the records had space(s) at the end of the second field. Taking a hint from your answer, I changed the awk script to '{t=$2; sub(/ +"/,"",t); sub(/.+ /,"",t); print t"\t"$0}' (note the space at the beginning of the first regexp). Seems to work well.

dannysauer · Accepted Answer · 2016-02-24 19:58:43Z

No. The sort command can split lines into fields, so if you just wanted to sort by the whole name, you could do something like sort -t, -k2. But sort can only key on whole fields (or character positions within them), not on "the last word of a field", so for this you'll have to extract the key yourself. Here's a very simplistic example of extracting the thing you want to sort on, prepending it to the line, sorting on only the first field, then removing that field.

user@machine[/home/user/dev]
$ cat testfile
"22 February 2016 12:08 EST","Shawn Baker",...
"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
user@machine[/home/user/dev]
$ paste <(cut -d, -f2 testfile | awk '$0=$NF') testfile | sort -k1,1 | cut -f2-
"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

Note that this code to extract the desired field makes the bad assumption that the first and second fields won't contain a comma: cut -d, -f2 testfile | awk '$0=$NF'. If they might, you'll want to replace it with something smarter. The rest of the code should be fine, as paste and cut default to tabs, and sort/awk split on whitespace.
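One possible "something smarter", sketched under the assumption that the first two fields are always double-quoted (as in the sample) and never contain the three-character sequence ",": use "," as awk's field separator instead of a bare comma, so a comma inside the date or the name no longer shifts the field numbers. The helper name last_name_keys is hypothetical, and the comma in the first demo row's date is added purely for illustration. The helper also always prints exactly one key per input line, which keeps paste pairing each key with the right row.

```shell
#!/usr/bin/env bash
# Hypothetical drop-in for: cut -d, -f2 testfile | awk '$0=$NF'
# Assumes fields 1 and 2 are double-quoted, so "," (quote, comma, quote)
# only appears between fields, never inside them.
last_name_keys() {
    awk -F'","' '{
        name = $2
        sub(/".*/, "", name)       # drop the closing quote and anything after it
        n = split(name, w, " ")    # split the name on runs of blanks
        print w[n]                 # exactly one output line per input line
    }' "$1"
}

# Demo file; the comma in the first date would break cut -d, -f2.
testfile=$(mktemp)
printf '%s\n' \
    '"22 February 2016, 12:08 EST","Shawn Baker",...' \
    '"29 January 2016 19:33 EST","Mary Z Allen",...' \
    '"01 February 2016 16:29 EST","Kendra A Zimmerman",...' > "$testfile"

# Same paste/sort/cut pipeline as above (bash process substitution).
paste <(last_name_keys "$testfile") "$testfile" | sort -k1,1 | cut -f2-
rm -f "$testfile"
```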

Walter A · Accepted Answer · 2016-02-24 21:10:17Z

0

You can use sed to copy the last word of the name to the front of your line. That way sorting is easy, and afterwards you only need to delete the extra data. The sed command needs to match strings without a double quote using [^"]*, resulting in:

sed 's/\("[^"]*","[^"]* \)\([^"]*"\)/\2=\1\2/' testfile | sort | cut -d= -f2-
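For reference, a lightly annotated sketch of the same sed approach wrapped in a hypothetical function, sort_by_last_word_sed. In the replacement, & stands for the whole match, which here is exactly \1\2, so \2=& is equivalent to \2=\1\2. The cut step keeps every field from the second on (-f2-), so an = later in a row survives. Like the one-liner, it assumes the name contains at least one space, has no trailing spaces, and that the last name itself contains no =.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed approach; reads stdin, writes stdout.
# Regex pieces:
#   \("[^"]*","[^"]* \)  group 1: quoted first field, the "," separator, and
#                        the name up to and including its last space
#   \([^"]*"\)           group 2: last word of the name plus its closing quote
# The replacement puts group 2 and "=" in front of the whole match (&).
sort_by_last_word_sed() {
    sed 's/\("[^"]*","[^"]* \)\([^"]*"\)/\2=&/' |
        sort |
        cut -d= -f2-    # keep everything after the first "=", including later ones
}

printf '%s\n' \
    '"29 January 2016 19:33 EST","Mary Z Allen",...' \
    '"01 February 2016 16:29 EST","Kendra A Zimmerman",...' \
    '"22 February 2016 12:08 EST","Shawn Baker",...' |
    sort_by_last_word_sed
```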

answered Feb 24, 2016 at 21:10

Walter A

