
Consider the following data in the wide format

df<-data.frame("id"=c(1,2,3,4),
           "ex"=c(1,0,0,1),
           "aQL"=c(5,4,NA,6),
           "bQL"=c(5,7,NA,9),
           "cQL"=c(5,7,NA,9),
           "bST"=c(3,7,8,9),
           "cST"=c(8,7,5,3),
           "aXY"=c(1,9,4,4),
           "cXY"=c(5,3,1,4))

I want to keep the column (or variable) names "id" and "ex" and rename the remaining columns, e.g. "aQL", "bQL" and "cQL" as "QL.1", "QL.2" and "QL.3", respectively. The columns with names ending in "ST" and "XY" should be renamed in the same manner, with the prefixes a, b and c mapping to .1, .2 and .3. Note that "aST" and "bXY" are missing from the data set; I want them to be included as ST.1 and XY.2, filled with NAs. The expected output would look like

df
  id ex QL.1 QL.2 QL.3 ST.1 ST.2 ST.3 XY.1 XY.2 XY.3
1  1  1    5    5    5   NA    3    8    1   NA    5
2  2  0    4    7    7   NA    7    7    9   NA    3
3  3  0   NA   NA   NA   NA    8    5    4   NA    1
4  4  1    6    9    9   NA    9    3    4   NA    4

The main data set has many variables, so I would like the renaming to be done in an automated manner. I tried the following code

renameCol <- function(x) {
setNames(x, paste0("QL.", seq_len(ncol(x))))
}
renameCol(df)

but it does not work as expected: it also renames "id" and "ex", which I want to keep, and it cannot handle several groups of variables (i.e. QL, ST, XY). Any help is greatly appreciated.

5 Answers

I would suggest a tidyverse approach, which needs no custom function. Reshape the data to long format, extract the first letter of each variable name as a group key, and assign each group a number with cur_group_id() so that the order is kept. Then build the new names from the rest of the original name plus that number, and reshape back to wide format to obtain the output below:

library(tidyverse)
#Data
df<-data.frame("id"=c(1,2,3,4),
               "ex"=c(1,0,0,1),
               "aQL"=c(5,4,NA,6),
               "bQL"=c(5,7,NA,9),
               "cQL"=c(5,7,NA,9),
               "bST"=c(3,7,8,9),
               "cST"=c(8,7,5,3),
               "aXY"=c(1,9,4,4),
               "cXY"=c(5,3,1,4))
#Reshape
df %>% pivot_longer(cols = -c(1,2)) %>%
  #Extract first letter as id
  mutate(id2=substring(name,1,1)) %>%
  #Create the number id
  group_by(id2) %>%
  mutate(id3=cur_group_id()) %>%
  #Clean name
  mutate(name=substring(name,2,nchar(name))) %>%
  #Create final var
  mutate(name2=paste0(name,'.',id3)) %>% ungroup() %>%
  dplyr::select(-c(name,id2,id3)) %>%
  #Format to wide
  pivot_wider(names_from = name2,values_from=value)

Output:

# A tibble: 4 x 9
     id    ex  QL.1  QL.2  QL.3  ST.2  ST.3  XY.1  XY.3
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     5     5     5     3     8     1     5
2     2     0     4     7     7     7     7     9     3
3     3     0    NA    NA    NA     8     5     4     1
4     4     1     6     9     9     9     3     4     4
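
One caveat that may be worth checking, offered as a sketch rather than a statement about the answer above: a group index numbers only the groups that are actually present, so it equals the alphabet position only when every prefix letter occurs somewhere in the data. The base R comparison below uses as.integer(factor()) as a stand-in for a group index (an assumption made for illustration, not the dplyr call itself) and compares it with match() against letters. The prefix vector is hypothetical.

```r
# Hypothetical prefixes in which no column starts with "a"
prefix <- c("b", "b", "c")

# Index of each value among the values present (stand-in for a group index)
group_index <- as.integer(factor(prefix))

# Position of each prefix in the alphabet
alphabet_pos <- match(prefix, letters)

group_index   # 1 1 2
alphabet_pos  # 2 2 3
```

In the question's data all of a, b and c occur, so both numberings agree there.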

In base R you could do:

names(df) <- sub("(\\d)([A-Z]{2})$","\\2.\\1", chartr("abc","123",names(df)))
 df
  id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
1  1  1    5    5    5    3    8    1    5
2  2  0    4    7    7    7    7    9    3
3  3  0   NA   NA   NA    8    5    4    1
4  4  1    6    9    9    9    3    4    4

If you need the NA columns:

names(df) <- sub("(\\d)([A-Z]{2})$","\\2.\\1", chartr("abc","123",names(df)))
a <- read.table(text=grep("\\.\\d",names(df),value = TRUE), sep=".")
b <- subset(aggregate(.~V1, a, function(x) setdiff(1:3,x)), V2>0)
df[do.call(paste, c(sep = ".", b))] <- NA
(df1 <- df[c(1, 2, order(names(df)[-(1:2)]) + 2)])

  id ex QL.1 QL.2 QL.3 ST.1 ST.2 ST.3 XY.1 XY.2 XY.3
1  1  1    5    5    5   NA    3    8    1   NA    5
2  2  0    4    7    7   NA    7    7    9   NA    3
3  3  0   NA   NA   NA   NA    8    5    4   NA    1
4  4  1    6    9    9   NA    9    3    4   NA    4

2 Comments

Thanks! Could you please explain how the renaming works, e.g. if one wanted to rename as a.QL, b.QL, c.QL, a.ST, etc.?
@TRichard chartr translates characters one for one, here a, b, c into 1, 2, 3. Try chartr("aeiou", "12345", "you are my king"): the vowels a, e, i, o, u are replaced by 1, 2, 3, 4, 5 respectively. sub then captures a digit (\\d) followed by two capital letters ([A-Z]{2}) and switches the order of the captured groups, placing a point between them. For example, try sub("(\\d)([A-Z]{2})", "\\2-\\1", "2DR").
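
The two steps described in the comment can be tried in isolation. This is a minimal sketch on three example names taken from the question; the variable names step1 and step2 are made up for illustration.

```r
# Step 1: chartr() maps each character of "abc" to the character
# at the same position in "123"
step1 <- chartr("abc", "123", c("aQL", "bST", "cXY"))
step1  # "1QL" "2ST" "3XY"

# Step 2: sub() captures the digit and the two capitals, then writes
# them back in swapped order with a dot in between
step2 <- sub("(\\d)([A-Z]{2})$", "\\2.\\1", step1)
step2  # "QL.1" "ST.2" "XY.3"
```

Note that chartr() translates every a, b or c it finds, wherever it sits in a name. "id" and "ex" survive only because they contain none of those letters, so other untouched column names would need checking first.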

Another way you can try, with the numbers hard-coded per group (str_c comes from the stringr package):

library(stringr)

colnames(df)[grepl("QL", colnames(df))] <- str_c("QL.", 1:3)

colnames(df)[grepl("ST", colnames(df))] <- str_c("ST.", 2:3)

colnames(df)[grepl("XY", colnames(df))] <- str_c("XY.", c(1,3))

#   id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
# 1  1  1    5    5    5    3    8    1    5
# 2  2  0    4    7    7    7    7    9    3
# 3  3  0   NA   NA   NA    8    5    4    1
# 4  4  1    6    9    9    9    3    4    4
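
Since the main data set has many variables, the numbers could also be derived from the prefix letter instead of being typed per group. The following is a base R sketch under the assumption that every column to rename consists of exactly one lower-case letter followed by two capitals; nm, hit and num are names chosen for illustration.

```r
df <- data.frame(id = c(1, 2, 3, 4), ex = c(1, 0, 0, 1),
                 aQL = c(5, 4, NA, 6), bQL = c(5, 7, NA, 9), cQL = c(5, 7, NA, 9),
                 bST = c(3, 7, 8, 9), cST = c(8, 7, 5, 3),
                 aXY = c(1, 9, 4, 4), cXY = c(5, 3, 1, 4))

nm <- colnames(df)
# Columns to rename: one lower-case letter followed by exactly two capitals
hit <- grepl("^[a-z][A-Z]{2}$", nm)
# Alphabet position of the prefix letter: a -> 1, b -> 2, c -> 3
num <- match(substr(nm[hit], 1, 1), letters)
# New name = two-letter suffix, a dot, then the number
colnames(df)[hit] <- paste0(substr(nm[hit], 2, 3), ".", num)

colnames(df)
# "id" "ex" "QL.1" "QL.2" "QL.3" "ST.2" "ST.3" "XY.1" "XY.3"
```

"id" and "ex" are left alone because they do not match the pattern.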


Here is a solution that uses regular expressions via the stringr package:

library(stringr)

df<-data.frame("id"=c(1,2,3,4),
               "ex"=c(1,0,0,1),
               "aQL"=c(5,4,NA,6),
               "bQL"=c(5,7,NA,9),
               "cQL"=c(5,7,NA,9),
               "bST"=c(3,7,8,9),
               "cST"=c(8,7,5,3),
               "aXY"=c(1,9,4,4),
               "cXY"=c(5,3,1,4))

renameCol <- function(x) {
  col_names <- colnames(x)
  index_ql <- str_detect(col_names,
                         "^[a-z]{1}QL")
  index_st <- str_detect(col_names,
                         "^[a-z]{1}ST")
  index_xy <- str_detect(col_names,
                         "^[a-z]{1}XY")
  
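  # match() returns each letter's alphabet position in input order, as text for str_replace
  replace_fun <- function(x) {as.character(match(x, letters))}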
  
  col_names[index_ql] <- paste0("QL.", str_replace(substr(col_names[index_ql], 1, 1),
                                                  "[a-z]", replace_fun))
  col_names[index_st] <- paste0("ST.", str_replace(substr(col_names[index_st], 1, 1),
                                                   "[a-z]", replace_fun))
  col_names[index_xy] <- paste0("XY.", str_replace(substr(col_names[index_xy], 1, 1),
                                                   "[a-z]", replace_fun))
  
  col_names
  
}

colnames(df) <- renameCol(df)

df
#>   id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
#> 1  1  1    5    5    5    3    8    1    5
#> 2  2  0    4    7    7    7    7    9    3
#> 3  3  0   NA   NA   NA    8    5    4    1
#> 4  4  1    6    9    9    9    3    4    4

Created on 2020-09-07 by the reprex package (v0.3.0)

Edit

The function above was adapted so that it takes the order into account.

1 Comment

Thanks! There is an order: the prefixes a, b, c of the variables take on the order .1, .2, .3. Is it possible to recode "bST" and "cXY" as ST.2 and XY.3?

Using pattern matching with regmatches (the helper below uses str_extract from the stringr package):

First define a function that does what you want on a single column name:

library(stringr)

f <- function(x) {
  beg <- str_extract(x,"[a-z](?=[A-Z]{2})")
  num <- which(letters == beg)
  output <- paste0(str_extract(x,"(?<=[a-z])[A-Z]{2}"),".",num)
  return(output)
}

This extracts the lower-case letter when it is followed by two upper-case letters, finds its position in the alphabet, and pastes that number after the upper-case letters.

> f("cQL")
[1] "QL.3"

You can then use regmatches and a regular expression directly on the names of your data frame:

m <- gregexpr("[a-z][A-Z]{2}", names(df), perl = TRUE)
regmatches(names(df), m) <- lapply(regmatches(names(df), m), f)
names(df)

> names(df)
[1] "id"   "ex"   "QL.1" "QL.2" "QL.3" "ST.2" "ST.3" "XY.1" "XY.3"

It solves only the renaming part, not the "including missing column number" part of your question.
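
For the remaining part, one possible base R sketch is shown here. It assumes that every group should have the numbers 1 to 3 and that the columns to rename are one lower-case letter followed by two capitals. The names hit, suffixes, wanted and missing_cols are made up for illustration.

```r
df <- data.frame(id = c(1, 2, 3, 4), ex = c(1, 0, 0, 1),
                 aQL = c(5, 4, NA, 6), bQL = c(5, 7, NA, 9), cQL = c(5, 7, NA, 9),
                 bST = c(3, 7, 8, 9), cST = c(8, 7, 5, 3),
                 aXY = c(1, 9, 4, 4), cXY = c(5, 3, 1, 4))

# Rename: prefix letter becomes its alphabet position after the suffix
nm <- names(df)
hit <- grepl("^[a-z][A-Z]{2}$", nm)
names(df)[hit] <- paste0(substr(nm[hit], 2, 3), ".",
                         match(substr(nm[hit], 1, 1), letters))

# Every suffix/number combination that should exist (assumption: 1 to 3)
suffixes <- unique(substr(nm[hit], 2, 3))
wanted <- paste0(rep(suffixes, each = 3), ".", 1:3)

# Add the combinations that are absent as all-NA columns
missing_cols <- setdiff(wanted, names(df))
df[missing_cols] <- NA_real_

# Put id and ex first, then the groups in order
df <- df[c("id", "ex", wanted)]
names(df)
# "id" "ex" "QL.1" "QL.2" "QL.3" "ST.1" "ST.2" "ST.3" "XY.1" "XY.2" "XY.3"
```

If the real data has more than three prefixes, the 1:3 and the each = 3 would both need to change to match.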
