How to automate renaming of columns in wide data using R

Question

Consider the following data in the wide format

df<-data.frame("id"=c(1,2,3,4),
           "ex"=c(1,0,0,1),
           "aQL"=c(5,4,NA,6),
           "bQL"=c(5,7,NA,9),
           "cQL"=c(5,7,NA,9),
           "bST"=c(3,7,8,9),
           "cST"=c(8,7,5,3),
           "aXY"=c(1,9,4,4),
           "cXY"=c(5,3,1,4))

I want to keep the column (or variable) names "id" and "ex" and rename the remaining columns, e.g. "aQL", "bQL" and "cQL" as "QL.1", "QL.2" and "QL.3", respectively. The columns with names ending in "ST" and "XY" should be renamed in the same manner, with the prefixes a, b and c mapping to .1, .2 and .3. Note that "aST" and "bXY" are missing from the data set; I want them to be included as ST.1 and XY.2, with NA in every row. The expected output would look like

df
  id ex QL.1 QL.2 QL.3 ST.1 ST.2 ST.3 XY.1 XY.2 XY.3
1  1  1    5    5    5   NA    3    8    1   NA    5
2  2  0    4    7    7   NA    7    7    9   NA    3
3  3  0   NA   NA   NA   NA    8    5    4   NA    1
4  4  1    6    9    9   NA    9    3    4   NA    4

The main data set has many variables, so I would like the renaming to be done in an automated manner. I tried the following code

renameCol <- function(x) {
setNames(x, paste0("QL.", seq_len(ncol(x))))
}
renameCol(df)

but it does not work as expected: it also renames "id" and "ex", which I want to keep, and it cannot handle several variable groups (i.e. QL, ST, XY). Any help is greatly appreciated.

Duck · Accepted Answer · 2020-09-07 19:22:32Z

I would suggest a tidyverse approach, which needs no custom function. In this solution you extract the first letter of each variable name as an id and then assign it a number with cur_group_id, so that the order is kept. With this number you rebuild the variable names, and finally you reshape back to wide. Note that this renames the existing columns only; the missing ST.1 and XY.2 columns are not added:

library(tidyverse)
#Data
df<-data.frame("id"=c(1,2,3,4),
               "ex"=c(1,0,0,1),
               "aQL"=c(5,4,NA,6),
               "bQL"=c(5,7,NA,9),
               "cQL"=c(5,7,NA,9),
               "bST"=c(3,7,8,9),
               "cST"=c(8,7,5,3),
               "aXY"=c(1,9,4,4),
               "cXY"=c(5,3,1,4))
#Reshape
df %>% pivot_longer(cols = -c(1,2)) %>%
  #Extract first letter as id
  mutate(id2=substring(name,1,1)) %>%
  #Create the number id
  group_by(id2) %>%
  mutate(id3=cur_group_id()) %>%
  #Clean name
  mutate(name=substring(name,2,nchar(name))) %>%
  #Create final var
  mutate(name2=paste0(name,'.',id3)) %>% ungroup() %>%
  dplyr::select(-c(name,id2,id3)) %>%
  #Format to wide
  pivot_wider(names_from = name2,values_from=value)

Output:

# A tibble: 4 x 9
     id    ex  QL.1  QL.2  QL.3  ST.2  ST.3  XY.1  XY.3
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     5     5     5     3     8     1     5
2     2     0     4     7     7     7     7     9     3
3     3     0    NA    NA    NA     8     5     4     1
4     4     1     6     9     9     9     3     4     4
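
If the missing columns are also needed, one possible extension of this pipeline is sketched below. It is not part of the original answer and makes two assumptions: the prefix is converted with match(prefix, letters) instead of cur_group_id, and tidyr's complete() is used to add the absent stem/number combinations before reshaping back to wide. The names prefix, stem, num and wide are only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2, 3, 4), ex = c(1, 0, 0, 1),
                 aQL = c(5, 4, NA, 6), bQL = c(5, 7, NA, 9), cQL = c(5, 7, NA, 9),
                 bST = c(3, 7, 8, 9), cST = c(8, 7, 5, 3),
                 aXY = c(1, 9, 4, 4), cXY = c(5, 3, 1, 4))

wide <- df %>%
  # Split each name into its lower-case prefix and its upper-case stem
  pivot_longer(cols = -c(id, ex),
               names_to = c("prefix", "stem"),
               names_pattern = "^([a-z])([A-Z]+)$") %>%
  # a -> 1, b -> 2, c -> 3 (position in the alphabet; assumption, see above)
  mutate(num = match(prefix, letters)) %>%
  select(-prefix) %>%
  # Add every stem/number combination that is absent, with value = NA
  complete(nesting(id, ex), stem, num) %>%
  arrange(id, stem, num) %>%
  # Back to wide, building names such as "QL.1"
  pivot_wider(names_from = c(stem, num), names_sep = ".", values_from = value)

wide
```

Here the numbers range over the values that occur somewhere in the data (1 to 3), so ST.1 and XY.2 appear as all-NA columns.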

Onyambu · Accepted Answer · 2020-09-07 22:26:36Z

In base R you could do:

names(df) <- sub("(\\d)([A-Z]{2})$","\\2.\\1", chartr("abc","123",names(df)))
df
  id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
1  1  1    5    5    5    3    8    1    5
2  2  0    4    7    7    7    7    9    3
3  3  0   NA   NA   NA    8    5    4    1
4  4  1    6    9    9    9    3    4    4

If you need the NA columns:

names(df) <- sub("(\\d)([A-Z]{2})$","\\2.\\1", chartr("abc","123",names(df)))
a <- read.table(text=grep("\\.\\d",names(df),value = TRUE), sep=".")
b <- subset(aggregate(.~V1, a, function(x) setdiff(1:3,x)), V2>0)
df[do.call(paste, c(sep = ".", b))] <- NA
(df1 <- df[c(1, 2, order(names(df)[-(1:2)]) + 2)])

  id ex QL.1 QL.2 QL.3 ST.1 ST.2 ST.3 XY.1 XY.2 XY.3
1  1  1    5    5    5   NA    3    8    1   NA    5
2  2  0    4    7    7   NA    7    7    9   NA    3
3  3  0   NA   NA   NA   NA    8    5    4   NA    1
4  4  1    6    9    9   NA    9    3    4   NA    4

2 Comments

T Richard Over a year ago

Thanks! Could you please explain how the renaming works? For example, what if I wanted to rename the columns as a.QL, b.QL, c.QL, a.ST, etc.?

Onyambu Over a year ago

@TRichard what happens here is that chartr changes a, b, c into 1, 2, 3. Try chartr("abcde", "12345", "you are my king"): you will note that the letters a, b, c, d, e have respectively changed to 1, 2, 3, 4, 5. Then sub captures a digit (\\d) followed by two capital letters ([A-Z]{2}) and switches the order of the captured groups while placing a point between them. For example, try sub("(\\d)([A-Z]{2})", "\\2-\\1", "2DR")
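
The two steps described in this comment can be run one at a time on the column names. The sketch below is only an illustration: the vector old stands in for names(df), and the last call is one possible answer to the a.QL question, assuming the names always consist of one lower-case letter followed by two capital letters.

```r
old <- c("id", "ex", "aQL", "bQL", "cQL", "bST", "cST", "aXY", "cXY")

# Step 1: chartr() translates characters one-to-one: a -> 1, b -> 2, c -> 3.
# It acts on every a, b or c in every name, so it is only safe when the
# columns to keep (here "id" and "ex") contain none of those letters.
step1 <- chartr("abc", "123", old)
step1

# Step 2: capture the digit and the two capital letters, then swap them
# around a point. Names that do not match the pattern are left unchanged.
step2 <- sub("(\\d)([A-Z]{2})$", "\\2.\\1", step1)
step2

# Variant from the comment: keep the letter and only insert a point.
dotted <- sub("^([a-z])([A-Z]{2})$", "\\1.\\2", old)
dotted
```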

Mike V · Accepted Answer · 2020-09-07 19:31:42Z

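Another way you can try. Note that str_c comes from the stringr package and that the suffix numbers are written by hand for this data:

library(stringr)

colnames(df)[grepl("QL", colnames(df))] <- str_c("QL.", 1:3)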

colnames(df)[grepl("ST", colnames(df))] <- str_c("ST.", 2:3)

colnames(df)[grepl("XY", colnames(df))] <- str_c("XY.", c(1,3))

#   id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
# 1  1  1    5    5    5    3    8    1    5
# 2  2  0    4    7    7    7    7    9    3
# 3  3  0   NA   NA   NA    8    5    4    1
# 4  4  1    6    9    9    9    3    4    4
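
With many variables, writing the numbers by hand does not scale. A minimal base R sketch of the same idea, where the number is derived from the prefix letter instead, could look as follows. It assumes that every column to rename is one lower-case letter followed by one of the known stems; the names cols, stems and hit are illustrative.

```r
cols <- c("id", "ex", "aQL", "bQL", "cQL", "bST", "cST", "aXY", "cXY")
stems <- c("QL", "ST", "XY")

for (stem in stems) {
  # Columns made of exactly one lower-case letter followed by this stem
  hit <- grepl(paste0("^[a-z]", stem, "$"), cols)
  # Position of the prefix letter in the alphabet: a -> 1, b -> 2, c -> 3
  cols[hit] <- paste0(stem, ".", match(substr(cols[hit], 1, 1), letters))
}

cols
```

The result can then be assigned back with colnames(df) <- cols.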

starja · Accepted Answer · 2020-09-07 19:50:02Z

Here is a solution that uses regular expressions via the stringr package:

library(stringr)

df<-data.frame("id"=c(1,2,3,4),
               "ex"=c(1,0,0,1),
               "aQL"=c(5,4,NA,6),
               "bQL"=c(5,7,NA,9),
               "cQL"=c(5,7,NA,9),
               "bST"=c(3,7,8,9),
               "cST"=c(8,7,5,3),
               "aXY"=c(1,9,4,4),
               "cXY"=c(5,3,1,4))

renameCol <- function(x) {
  col_names <- colnames(x)
  index_ql <- str_detect(col_names,
                         "^[a-z]{1}QL")
  index_st <- str_detect(col_names,
                         "^[a-z]{1}ST")
  index_xy <- str_detect(col_names,
                         "^[a-z]{1}XY")
  
  replace_fun <- function(x) {which(letters %in% x)}
  
  col_names[index_ql] <- paste0("QL.", str_replace(substr(col_names[index_ql], 1, 1),
                                                  "[a-z]", replace_fun))
  col_names[index_st] <- paste0("ST.", str_replace(substr(col_names[index_st], 1, 1),
                                                   "[a-z]", replace_fun))
  col_names[index_xy] <- paste0("XY.", str_replace(substr(col_names[index_xy], 1, 1),
                                                   "[a-z]", replace_fun))
  
  col_names
  
}

colnames(df) <- renameCol(df)

df
#>   id ex QL.1 QL.2 QL.3 ST.2 ST.3 XY.1 XY.3
#> 1  1  1    5    5    5    3    8    1    5
#> 2  2  0    4    7    7    7    7    9    3
#> 3  3  0   NA   NA   NA    8    5    4    1
#> 4  4  1    6    9    9    9    3    4    4

Created on 2020-09-07 by the reprex package (v0.3.0)

Edit

The function above was adapted so that it takes the order into account.

Thanks! There is some order: the prefixes a, b, c of the variables map to .1, .2, .3. Is it possible to recode "bST" and "cXY" as ST.2 and XY.3?

denis · Accepted Answer · 2020-09-07 19:45:09Z

Using base R pattern matching (regmatches) together with str_extract from the stringr package:

First, you need to define a function that does what you want on a single column name:

library(stringr)

f = function(x){
  beg <- str_extract(x,"[a-z](?=[A-Z]{2})")
  num <- which(letters == beg)
  output <- paste0(str_extract(x,"(?<=[a-z])[A-Z]{2}"),".",num)
  return(output)
}

This extracts the lower-case letter when it is followed by two upper-case letters, finds its position in the alphabet, and pastes that number after the upper-case letters.

> f("cQL")
[1] "QL.3"

You can then use regmatches and regular expression directly on the name of your data frame:

m <- gregexpr("[a-z][A-Z]{2}", names(df), perl = TRUE)
regmatches(names(df), m) <- lapply(regmatches(names(df), m), f)
names(df)

> names(df)
[1] "id"   "ex"   "QL.1" "QL.2" "QL.3" "ST.2" "ST.3" "XY.1" "XY.3"

This solves only the renaming part, not the "including the missing column numbers" part of your question.
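
For the remaining part, a minimal base R sketch is given below. It is not from the answer above; it assumes the columns have already been renamed to the STEM.number form and that every stem should run from 1 up to the largest number seen in the data. The function name add_missing and its keep argument are hypothetical.

```r
# Data as it looks after the renaming step
df <- data.frame(id = c(1, 2, 3, 4), ex = c(1, 0, 0, 1),
                 QL.1 = c(5, 4, NA, 6), QL.2 = c(5, 7, NA, 9), QL.3 = c(5, 7, NA, 9),
                 ST.2 = c(3, 7, 8, 9), ST.3 = c(8, 7, 5, 3),
                 XY.1 = c(1, 9, 4, 4), XY.3 = c(5, 3, 1, 4))

add_missing <- function(d, keep = c("id", "ex")) {
  renamed <- setdiff(names(d), keep)
  # Part before the final ".number", e.g. "QL"
  stems <- unique(sub("\\.\\d+$", "", renamed))
  # Largest number after the point across all renamed columns
  top <- max(as.integer(sub("^.*\\.", "", renamed)))
  # Full set of expected names: QL.1, QL.2, ..., XY.<top>
  wanted <- paste0(rep(stems, each = top), ".", seq_len(top))
  # Create the absent columns, filled with NA
  for (nm in setdiff(wanted, names(d))) d[[nm]] <- NA
  # Kept columns first, then the groups in order
  d[c(keep, wanted)]
}

full <- add_missing(df)
full
```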
