0

I have a .csv file with entries that look like:

"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
"22 February 2016 12:08 EST","Shawn Baker",...

The first CSV field (date/time) is assigned by the system and always has exactly five words. The second CSV field (name) consists of one or more words.

I want to sort by the final word in the second field. For this example, the desired order after sorting would be:

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

No doubt, with a little effort, one could come up with a bash, awk, or python script to perform this kind of sort. But is there a way to use the sort command directly?

The specific Unix version I am using (from /proc/version) is

Linux version 3.13.0-79-generic (buildd@lcy01-11) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #123-Ubuntu SMP Fri Feb 19 14:28:32 UTC 2016

3 Answers

2

awk to the rescue, with the decorate/sort/undecorate pattern.

$ awk -F, '{t=$2; sub(/.+ /,"",t); print t"\t"$0}' file | sort | cut -f2-

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

Print the last word of the second field as a key, sort, then remove the dummy key.
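To make the decoration step concrete, here is a small sketch that runs only the awk part on one line. The `decorate` function name and the `"x"` third field (standing in for the elided columns) are assumptions for illustration.

```shell
# Decoration step only: prefix each line with its sort key and a tab.
decorate() {
    awk -F, '{t=$2; sub(/.+ /,"",t); print t"\t"$0}' "$@"
}

# Hypothetical one-line input; "x" stands in for the remaining columns.
printf '%s\n' '"29 January 2016 19:33 EST","Mary Z Allen","x"' | decorate
```

The key comes out as Allen" (with the closing quote still attached), followed by a tab and the untouched original line. Every key carries that same trailing quote, so it should not disturb the ordering in typical data.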


3 Comments

How did it take me three hours to type essentially the same response? That'll teach me to get distracted by a long lunch. :) I'll just contribute that the cut should probably be "-f2-" in case there's a tab in the data.
thanks for the answer - probably would have taken me 3+ hours to work it out.
Tried it out on the full database, realized that some of the records had space(s) at the end of the second field. Taking a hint from your answer, changed the awk script to '{t=$2; sub(/ +"/,"",t);sub(/.+ /,"",t); print t"\t"$0}' (space at the beginning of the first regexp). Seems to work well.
0

No. The sort command can split into fields, so if you just wanted to sort by name, you could do something like sort -t, -k2. But for this, what you'll have to do is to split the lines out. Here's a very simplistic example of extracting the thing you want to sort upon, prepending it to the line, sorting on only the first field, then removing that field.

user@machine[/home/user/dev]
$ cat testfile
"22 February 2016 12:08 EST","Shawn Baker",...
"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
user@machine[/home/user/dev]
$ paste <(cut -d, -f2 testfile | awk '$0=$NF') testfile | sort -k1,1 | cut -f2-
"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

Note that this code to extract the desired field makes the bad assumption that the first and second fields won't contain a comma: cut -d, -f2 testfile | awk '$0=$NF'. If they might, you'll want to replace it with something smarter. The rest of the code should be fine, as paste and cut default to tabs, and sort and awk split on whitespace.
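One possible "something smarter", as a rough sketch: split on the three-character sequence "," rather than on a bare comma, so a comma inside a quoted field does not shift the columns. This assumes every field is double-quoted and that no field contains an embedded double quote. The function name `sort_by_last_word` and the sample data are made up for illustration.

```shell
# Sort a quoted CSV file by the last word of its second field.
sort_by_last_word() {
    awk -F'","' '{
        name = $2
        sub(/".*/, "", name)       # strip a closing quote and anything after it
        sub(/ +$/, "", name)       # drop trailing spaces
        n = split(name, w, / +/)   # break the name into words
        print w[n] "\t" $0         # decorate: key, tab, original line
    }' "$1" | sort -t "$(printf '\t')" -k1,1 | cut -f2-
}

# Hypothetical sample: one name contains a comma, another has a trailing space.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"22 February 2016 12:08 EST","Shawn, J Baker","b"
"29 January 2016 19:33 EST","Mary Z Allen ","a"
"01 February 2016 16:29 EST","Kendra A Zimmerman","c"
EOF

sort_by_last_word "$sample"
```

The sketch still relies on the data containing no tab characters, since the tab separates the temporary key from the original line.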


0

You can use sed to copy the last name to the front of each line. That way sorting is easy and you only need to delete the extra data afterwards. The sed command needs to match strings that contain no double quote, using [^"]*, resulting in

sed 's/\("[^"]*","[^"]* \)\([^"]*"\)/\2=\1\2/' testfile | sort | cut -d= -f2-
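A quick way to try this on throwaway data is sketched below. The function name `sort_by_last_name_sed` and the sample rows are assumptions for illustration. The sketch ends with `cut -d= -f2-` (note the trailing dash) so that an `=` appearing in a later column is kept rather than truncated.

```shell
# Decorate with sed, sort, then strip the key up to the first "=".
sort_by_last_name_sed() {
    sed 's/\("[^"]*","[^"]* \)\([^"]*"\)/\2=\1\2/' "$1" | sort | cut -d= -f2-
}

# Hypothetical sample: the third column of the first row contains an "=".
sample_sed=$(mktemp)
cat > "$sample_sed" <<'EOF'
"22 February 2016 12:08 EST","Shawn Baker","a=b"
"29 January 2016 19:33 EST","Mary Z Allen","c"
"01 February 2016 16:29 EST","Kendra A Zimmerman","d"
EOF

sort_by_last_name_sed "$sample_sed"
```

The pattern requires a space inside the name, so a single-word name is left undecorated and sorts by its date field instead. A name with a trailing space yields a bare quote as its key.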
