1

I have the following dataset. I want to sort it by the second column (LFrom) and then by ID.

dat <- read.table(header=TRUE, text="
                  ID  LFrom LTo It1 It2 It3 It4
                  ab7    1   2   47  152 259 140
                  ab8   1.1   2.1   88  236 251 145
                  ab21   1.2   2.1  72  263 331 147
                  ab3    1   2   71  207 290 242
                  ab300    1   2   47  152 259 140
                  ab4    1.2   2.1  72  263 331 147
                  ab10    1.1   2   71  207 290 242
                  ab501    1   2   47  152 259 140
                  ")

dat
     ID LFrom LTo It1 It2 It3 It4
1   ab7   1.0 2.0  47 152 259 140
2   ab8   1.1 2.1  88 236 251 145
3  ab21   1.2 2.1  72 263 331 147
4   ab3   1.0 2.0  71 207 290 242
5 ab300   1.0 2.0  47 152 259 140
6   ab4   1.2 2.1  72 263 331 147
7  ab10   1.1 2.0  71 207 290 242
8 ab501   1.0 2.0  47 152 259 140

Using the following code, I get:

dat[with(dat, order(LFrom, ID)),]
     ID LFrom LTo It1 It2 It3 It4
4   ab3   1.0 2.0  71 207 290 242
5 ab300   1.0 2.0  47 152 259 140
8 ab501   1.0 2.0  47 152 259 140
1   ab7   1.0 2.0  47 152 259 140
7  ab10   1.1 2.0  71 207 290 242
2   ab8   1.1 2.1  88 236 251 145
3  ab21   1.2 2.1  72 263 331 147
6   ab4   1.2 2.1  72 263 331 147

The IDs are not sorted according to their numeric part. I rewrote the data by manually padding the numbers with zeros (00 and 0), like the following:

dat1 <- read.table(header=TRUE, text="
                  ID  LFrom LTo It1 It2 It3 It4
                  ab007    1   2   47  152 259 140
                  ab008   1.1   2.1   88  236 251 145
                  ab021   1.2   2.1  72  263 331 147
                  ab003    1   2   71  207 290 242
                  ab300    1   2   47  152 259 140
                  ab004    1.2   2.1  72  263 331 147
                  ab010    1.1   2   71  207 290 242
                  ab501    1   2   47  152 259 140
                  ")
dat1
     ID LFrom LTo It1 It2 It3 It4
1 ab007   1.0 2.0  47 152 259 140
2 ab008   1.1 2.1  88 236 251 145
3 ab021   1.2 2.1  72 263 331 147
4 ab003   1.0 2.0  71 207 290 242
5 ab300   1.0 2.0  47 152 259 140
6 ab004   1.2 2.1  72 263 331 147
7 ab010   1.1 2.0  71 207 290 242
8 ab501   1.0 2.0  47 152 259 140

Now the following code works fine:

dat1[with(dat1, order(LFrom, ID)), ]
     ID LFrom LTo It1 It2 It3 It4
4 ab003   1.0 2.0  71 207 290 242
1 ab007   1.0 2.0  47 152 259 140
5 ab300   1.0 2.0  47 152 259 140
8 ab501   1.0 2.0  47 152 259 140
2 ab008   1.1 2.1  88 236 251 145
7 ab010   1.1 2.0  71 207 290 242
6 ab004   1.2 2.1  72 263 331 147
3 ab021   1.2 2.1  72 263 331 147

I have a large list of datasets, so manually changing the ID is tough. All I need is to get the ID sorted by its numeric part (as if it were padded with 00 and 0).

4
  • It orders the columns by the order of the arguments, LFrom first, then ID. It looks like it's working fine. Not sure what you are asking. Commented Sep 18, 2015 at 18:56
  • Does ID follow a consistent format that you know in advance? i.e., a set of character values followed by a set of numeric values. Commented Sep 18, 2015 at 18:57
  • @mispelled, The numeric part of ID has at most 3 digits, and the ID always begins with "ab". Commented Sep 18, 2015 at 18:59
  • @RichardScriven, I think he is asking for a way to programmatically pad the string so that he gets consistent sorting behavior. He mentioned that manually changing the ID is tough with a large list of datasets. Commented Sep 18, 2015 at 18:59

2 Answers

3

You can change the ID with a combination of substr and sprintf as follows:

dat$ID <- paste0(substr(dat$ID,1,2),sprintf("%03d",as.numeric(substr(dat$ID,3,5))))

This gives:

> dat[with(dat, order(LFrom, ID)), ]
     ID LFrom LTo It1 It2 It3 It4
4 ab003   1.0 2.0  71 207 290 242
1 ab007   1.0 2.0  47 152 259 140
5 ab300   1.0 2.0  47 152 259 140
8 ab501   1.0 2.0  47 152 259 140
2 ab008   1.1 2.1  88 236 251 145
7 ab010   1.1 2.0  71 207 290 242
6 ab004   1.2 2.1  72 263 331 147
3 ab021   1.2 2.1  72 263 331 147
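If you would rather not rely on fixed character positions, or want to leave the ID column unchanged, a minimal sketch along the same lines is below. It assumes every ID is a non-digit prefix followed by digits; pad_id is a hypothetical helper name, not something from base R, and the padded values are used only as a sort key.

```r
# Same data as in the question
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID  LFrom LTo It1 It2 It3 It4
ab7    1   2   47  152 259 140
ab8   1.1   2.1   88  236 251 145
ab21   1.2   2.1  72  263 331 147
ab3    1   2   71  207 290 242
ab300    1   2   47  152 259 140
ab4    1.2   2.1  72  263 331 147
ab10    1.1   2   71  207 290 242
ab501    1   2   47  152 259 140
")

# Hypothetical helper: zero-pad the trailing digits of an ID to a fixed width.
# Assumes each ID is a non-digit prefix followed by digits (e.g. "ab21").
pad_id <- function(id, width = 3) {
  id <- as.character(id)
  prefix <- sub("[0-9]+$", "", id)              # drop the trailing digits
  number <- as.integer(sub("^[^0-9]+", "", id)) # drop the leading non-digits
  paste0(prefix, sprintf(paste0("%0", width, "d"), number))
}

# Use the padded ID only as a sort key; dat$ID itself is left as it was
sorted <- dat[order(dat$LFrom, pad_id(dat$ID)), ]
sorted
```

The width argument defaults to 3 because the asker said the numeric part has at most three digits; a larger value would be needed for longer numbers.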

1

Use data.table:

library(data.table)

dat <- read.table(header=TRUE, text="
                 ID  LFrom LTo It1 It2 It3 It4
                  ab7    1   2   47  152 259 140
                  ab8   1.1   2.1   88  236 251 145
                  ab21   1.2   2.1  72  263 331 147
                  ab3    1   2   71  207 290 242
                  ab300    1   2   47  152 259 140
                  ab4    1.2   2.1  72  263 331 147
                  ab10    1.1   2   71  207 290 242
                  ab501    1   2   47  152 259 140
                  ")
DT = as.data.table(dat)

# strip the "ab" prefix and convert to numeric so that 7 < 21 < 300
DT[, newID := as.numeric(gsub("ab", "", ID))]
DT[order(LFrom, newID), ]
      ID LFrom LTo It1 It2 It3 It4 newID
1:   ab3   1.0 2.0  71 207 290 242     3
2:   ab7   1.0 2.0  47 152 259 140     7
3: ab300   1.0 2.0  47 152 259 140   300
4: ab501   1.0 2.0  47 152 259 140   501
5:   ab8   1.1 2.1  88 236 251 145     8
6:  ab10   1.1 2.0  71 207 290 242    10
7:   ab4   1.2 2.1  72 263 331 147     4
8:  ab21   1.2 2.1  72 263 331 147    21

Or just

library(data.table)
DT = as.data.table(dat)
DT[order(LFrom, as.numeric(gsub("ab", "", ID))), ]

Without data.table it would be:

dat[with(dat, order(LFrom, as.numeric(gsub("ab", "", ID)))), ]
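Since the question mentions a large list of datasets, one way to avoid repeating the call is to wrap the base R ordering in a small function and apply it with lapply. This is only a sketch: sort_by_id is a hypothetical helper name, the two data frames are made-up stand-ins for the real list, and it assumes every ID starts with "ab" followed by digits.

```r
# Hypothetical helper: order a data frame by LFrom, then by the numeric part of ID.
# Assumes every ID has the form "ab" followed by digits.
sort_by_id <- function(d) {
  id_num <- as.numeric(gsub("ab", "", d$ID))
  d[order(d$LFrom, id_num), ]
}

# Two small made-up data frames standing in for the real list of datasets
datasets <- list(
  a = data.frame(ID = c("ab21", "ab3", "ab100"), LFrom = c(1, 1, 1),
                 stringsAsFactors = FALSE),
  b = data.frame(ID = c("ab10", "ab8"), LFrom = c(1.1, 1.1),
                 stringsAsFactors = FALSE)
)

# Apply the same ordering to every data frame in the list
sorted <- lapply(datasets, sort_by_id)
sorted
```

Converting to numeric rather than comparing the stripped strings matters here: as text, "100" sorts before "21" and "3", which is the same problem the original IDs had.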