Combine two or more columns in a dataframe into a new column with a new name

Question

For example, if I have this:

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 
df = data.frame(n, s, b)

  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE

Then how do I combine the two columns n and s into a new column named x such that it looks like this:

  n  s     b     x
1 2 aa  TRUE  2 aa
2 3 bb FALSE  3 bb
3 5 cc  TRUE  5 cc

thelatemail · Accepted Answer · 2013-08-07 23:46:50Z

179

Use paste.

 df$x <- paste(df$n,df$s)
 df
#   n  s     b    x
# 1 2 aa  TRUE 2 aa
# 2 3 bb FALSE 3 bb
# 3 5 cc  TRUE 5 cc
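As a hedged aside, paste is vectorised and accepts any number of columns, coercing each one to character. A minimal sketch reusing the question's data (the column name x3 is only illustrative):

```r
# Rebuild the example data from the question.
df <- data.frame(
  n = c(2, 3, 5),
  s = c("aa", "bb", "cc"),
  b = c(TRUE, FALSE, TRUE)
)

# Two columns, default separator (a single space).
df$x <- paste(df$n, df$s)

# Three columns; numbers and logicals are coerced to character.
df$x3 <- paste(df$n, df$s, df$b)

df$x   # "2 aa" "3 bb" "5 cc"
df$x3  # "2 aa TRUE" "3 bb FALSE" "5 cc TRUE"
```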

edited Aug 7, 2013 at 23:46

thelatemail


answered Aug 7, 2013 at 23:40

mnel



3 Comments

Chetan Arvind Patil Over a year ago

.@thelatemail - How to add a special character between data points using paste()? For above example, x column should have data as 2-aa, then 3-bb and 5-cc.

Chetan Arvind Patil Over a year ago

.@thelatemail - This worked for me: paste(df$n,df$s,sep="-")

Cina Over a year ago

how can you omit NA if column s has NA value? (I don't like to see 3 NA if df$s[2]=NA)

zx8754 · Answer · 2019-03-27 07:31:05Z

57

For inserting a separator:

df$x <- paste(df$n, "-", df$s)
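Note that passing "-" as an extra argument still inserts the default space on both sides of it. A small sketch of the difference, using the vectors n and s from the question (the result names spaced, tight and tight0 are made up for illustration):

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")

# "-" is treated as one more string to join, so the default sep = " " surrounds it.
spaced <- paste(n, "-", s)

# Passing the separator via sep gives a tight join.
tight <- paste(n, s, sep = "-")

# paste0 uses no separator, so the literal "-" is the only thing in between.
tight0 <- paste0(n, "-", s)

spaced  # "2 - aa" "3 - bb" "5 - cc"
tight   # "2-aa" "3-bb" "5-cc"
```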

edited Mar 27, 2019 at 7:31

zx8754


answered Feb 27, 2017 at 21:10

Little Bee


4 Comments

Chetan Arvind Patil Over a year ago

.@LittleBee - This adds a space between two data. Final output for example is like: A - B instead of A-B. Is it possible to remove this extra space?

Chetan Arvind Patil Over a year ago

.@LittleBee - This worked for me: paste(df$n,df$s,sep="-")

Ferroao Over a year ago

use paste0 instead of paste

Cath Over a year ago

This won't give the desired output : OP asks for a space in between the elements, not another separator (which, by the way, would be better put as the sep argument...). The other answer, posted almost 4 years prior to yours, is however perfectly answering the question.

Quentin Perrier · Answer · 2018-04-16 14:58:15Z

35

As already mentioned in comments by Uwe and UseR, a general solution in the tidyverse format would be to use the command unite:

library(tidyverse)

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 

df = data.frame(n, s, b) %>% 
  unite(x, c(n, s), sep = " ", remove = FALSE)
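A hedged sketch of two further unite options, assuming tidyr 1.0.0 or later (where the na.rm argument is available): it accepts more than two columns, and na.rm = TRUE drops missing values instead of pasting "NA". The data frame df_na and the column names x and y are made up for illustration.

```r
library(tidyr)

df_na <- data.frame(
  n = c(2, 3, 5),
  s = c("aa", NA, "cc"),
  b = c(TRUE, FALSE, TRUE)
)

# Unite three columns at once; remove = FALSE keeps the originals.
out <- unite(df_na, x, n, s, b, sep = " ", remove = FALSE)

# With na.rm = TRUE the NA in s is skipped rather than written as "NA".
out_narm <- unite(df_na, y, n, s, sep = " ", remove = FALSE, na.rm = TRUE)

out$x       # "2 aa TRUE" "3 NA FALSE" "5 cc TRUE"
out_narm$y  # "2 aa" "3" "5 cc"
```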

answered Apr 16, 2018 at 14:58

Quentin Perrier


4 Comments

Levi Over a year ago

What is x in this example?

Vesanen Over a year ago

@Levi, that x represents the name of the new column that contains the combined values. Think of dplyr's mutate: df %>% dplyr::mutate(x = "your operations")

jdcode Over a year ago

Could you please explain why mutate is incorrect but unite is correct? I think this had been explained in comments by Uwe and UseR, but I can't seem to find those comments; I think they were deleted. Thank you!

jdenn0514 Over a year ago

@jdcode If you are still wondering, it's not that one is right and one is wrong; they simply do different things. tidyr::unite() unites multiple columns together in a new column. dplyr::mutate() just makes a new column. If you want to use dplyr::mutate() you have to use paste(), glue::glue(), or stringr::str_c() inside the call. I think people may say it's "wrong" because dplyr::mutate() on its own doesn't solve OP's question. However, you can definitely use it in order to answer the question. Hope this helps!

sbha · Answer · 2018-03-10 17:17:15Z

22

Using dplyr::mutate:

library(dplyr)
df <- mutate(df, x = paste(n, s)) 

> df
  n  s     b    x
1 2 aa  TRUE 2 aa
2 3 bb FALSE 3 bb
3 5 cc  TRUE 5 cc

answered Mar 10, 2018 at 17:17

sbha


4 Comments

zx8754 Over a year ago

No, as in the already existing answers, you are using paste, not mutate.

sbha Over a year ago

I thought I was demonstrating how columns could be combined as a part of a dplyr::mutate(). Sorry, just trying to be helpful - I won't pollute the site anymore and abstain from future postings.

zx8754 Over a year ago

Sorry if it came out as rude. OP's problem is not solved by using mutate; the question is not about how to use dplyr, but how to combine column values. I am simply pointing out that they need paste, not mutate. If we want to demonstrate the tidyverse way, the correct approach is the function unite (from tidyr).

jdcode Over a year ago

@zx8754, why is mutate incorrect but unite correct? The answer you shared has referenced comments by Uwe and UseR, but it looks like those comments have been deleted.

Ferroao · Answer · 2019-03-11 20:32:19Z

16

Some examples with NAs and their removal using apply

n = c(2, NA, NA) 
s = c("aa", "bb", NA) 
b = c(TRUE, FALSE, NA) 
c = c(2, 3, 5) 
d = c("aa", NA, "cc") 
e = c(TRUE, NA, TRUE) 
df = data.frame(n, s, b, c, d, e)

paste_noNA <- function(x,sep=", ") {
gsub(", " ,sep, toString(x[!is.na(x) & x!="" & x!="NA"] ) ) }

sep=" "
df$x <- apply( df[ , c(1:6) ] , 1 , paste_noNA , sep=sep)
df
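A simpler variation on the same idea, offered as a sketch rather than a drop-in replacement: coerce each column to character first, then paste only the non-missing entries of every row. The helper name paste_na_rm is hypothetical.

```r
# Hypothetical helper: row-wise paste that skips NA values.
paste_na_rm <- function(..., sep = " ") {
  # Coerce every input to character so no number formatting or padding occurs.
  cols <- lapply(list(...), as.character)
  mat <- do.call(cbind, cols)
  # For each row, keep the non-missing entries and join them with sep.
  apply(mat, 1, function(r) paste(r[!is.na(r)], collapse = sep))
}

n <- c(2, NA, NA)
s <- c("aa", "bb", NA)

paste_na_rm(n, s)             # "2 aa" "bb" ""
paste_na_rm(n, s, sep = "-")  # "2-aa" "bb" ""
```

A row in which every value is missing yields an empty string, not NA; whether that is desirable depends on how the combined column is used later.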

edited Mar 11, 2019 at 20:32

answered Dec 6, 2016 at 11:58

Ferroao


1 Comment

malajisi Over a year ago

@Ferroao Thanks, you saved my life. pls move paste_noNA function before df$x <-apply.

zx8754 · Answer · 2019-03-27 07:33:33Z

14

Use paste0 if you do not want any padding space introduced in the concatenated field (here x and y stand for the two columns to combine):

df$combField <- paste0(df$x, df$y)

This is more useful if you plan to use the combined field as a unique id that represents combinations of two fields.
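One caveat worth illustrating when the combined field is meant to be a unique id: without a separator, different pairs can collapse to the same string. A minimal sketch with made-up values, assuming the chosen separator never occurs in the data:

```r
x <- c("a", "ab")
y <- c("bc", "c")

# Both rows produce "abc", so the id no longer distinguishes them.
id_plain <- paste0(x, y)
anyDuplicated(id_plain) > 0   # TRUE

# A separator that cannot appear in the data keeps the ids distinct.
id_sep <- paste(x, y, sep = "_")
id_sep                        # "a_bc" "ab_c"
```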

edited Mar 27, 2019 at 7:33

zx8754


answered Apr 8, 2017 at 0:25

yanes


avallecam · Answer · 2020-04-10 13:48:35Z

8

Instead of

paste (inserts a space by default),
paste0 (turns a missing value into the string "NA") or
unite (a single separator for all columns),

I'd suggest an alternative as flexible as paste0 but more careful with NA: stringr::str_c

library(tidyverse)

# check the missing value!!
df <- tibble(
  n = c(2, 2, 8),
  s = c("aa", "aa", NA_character_),
  b = c(TRUE, FALSE, TRUE)
)

df %>% 
  mutate(
    paste = paste(n,"-",s,".",b),
    paste0 = paste0(n,"-",s,".",b),
    str_c = str_c(n,"-",s,".",b)
  ) %>% 

  # convert missing value to ""
  mutate(
    s_2=str_replace_na(s,replacement = "")
  ) %>% 
  mutate(
    str_c_2 = str_c(n,"-",s_2,".",b)
  )
#> # A tibble: 3 x 8
#>       n s     b     paste          paste0     str_c      s_2   str_c_2   
#>   <dbl> <chr> <lgl> <chr>          <chr>      <chr>      <chr> <chr>     
#> 1     2 aa    TRUE  2 - aa . TRUE  2-aa.TRUE  2-aa.TRUE  "aa"  2-aa.TRUE 
#> 2     2 aa    FALSE 2 - aa . FALSE 2-aa.FALSE 2-aa.FALSE "aa"  2-aa.FALSE
#> 3     8 <NA>  TRUE  8 - NA . TRUE  8-NA.TRUE  <NA>       ""    8-.TRUE

Created on 2020-04-10 by the reprex package (v0.3.0)

extra note from str_c documentation

Like most other R functions, missing values are "infectious": whenever a missing value is combined with another string the result will always be missing. Use str_replace_na() to convert NA to "NA"
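For readers without stringr, the same NA-propagating behaviour can be approximated in base R. This is only a sketch; the helper name paste0_na is hypothetical and not part of any package.

```r
# Hypothetical helper: like paste0, but the result is NA whenever any input is NA.
paste0_na <- function(...) {
  parts <- list(...)
  out <- do.call(paste0, parts)
  # Flag positions where at least one (recycled) input is missing.
  any_na <- Reduce(`|`, lapply(parts, function(a) rep_len(is.na(a), length(out))))
  out[any_na] <- NA_character_
  out
}

n <- c(2, 2, 8)
s <- c("aa", "aa", NA)

paste0(n, "-", s)     # "2-aa" "2-aa" "8-NA"
paste0_na(n, "-", s)  # "2-aa" "2-aa" NA
```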

edited Apr 10, 2020 at 13:48

answered Aug 14, 2018 at 15:42

avallecam


4 Comments

Axeman Over a year ago

paste0(n,"-",s,".",b) and str_c(n,"-",s,".",b) are exactly the same, both use a default separator that is the empty string ''. I also don't know why paste is "tidy", you mean you don't like spaces?

avallecam Over a year ago

paste0 and str_c are not exactly the same. take a look to these links: (1) rdocumentation.org/packages/stringr/versions/1.3.1/topics/str_c (2) stackoverflow.com/questions/53118271/…

Axeman Over a year ago

Ah I see! Thanks! How they are different would be a good addition to this answer (and the str_c documentation could be more explicit too!).

avallecam Over a year ago

@Axeman thanks for your suggestion. I've simplified the answer plus added an extra note on the issue

Ben Ernest · Answer · 2020-04-15 03:28:34Z

6

There are other great answers, but in the case where you don't know the column names or the number of columns you want to concatenate beforehand, the following is useful.

df = data.frame(x = letters[1:5], y = letters[6:10], z = letters[11:15])
colNames = colnames(df) # could be any number of column names here
df$newColumn = apply(df[, colNames, drop = FALSE], MARGIN = 1, FUN = function(i) paste(i, collapse = ""))
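One thing to watch for with apply on a data frame, sketched here with made-up data: the frame is first converted to a character matrix, and when numeric and character columns are mixed the numbers are formatted to a common width, which can introduce leading spaces. Converting each column with as.character beforehand avoids this.

```r
df <- data.frame(n = c(2, 10), s = c("aa", "bb"))
colNames <- colnames(df)

# apply() goes through as.matrix(), which pads the numeric column to equal width.
padded <- apply(df[, colNames, drop = FALSE], MARGIN = 1,
                FUN = function(i) paste(i, collapse = ""))

# Coercing column by column keeps each value as as.character() would give it.
chars <- lapply(df[colNames], as.character)
clean <- do.call(paste0, unname(chars))

padded  # " 2aa" "10bb"
clean   # "2aa" "10bb"
```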

answered Apr 15, 2020 at 3:28

Ben Ernest


Iyar Lin · Answer · 2022-07-10 08:26:44Z

I'd like to also propose a method for concatenating a large/unknown number of columns. The solution proposed by Ben Ernest can be pretty slow on large datasets.

Below is my proposed solution:

# setup data.frame - Making it large for the time benchmarking
n = rep(c(2, 3, 5), 1000000)
s = rep(c("aa", "bb", "cc"), 1000000)
b = rep(c(TRUE, FALSE, TRUE), 1000000) 
df = data.frame(n, s, b)

# The proposed solution:
colNames = c("n", "s") # could be any number of column names here
df$x <- do.call(paste, c(df[, colNames, drop = FALSE], sep = " "))

# running system.time on this yields:
# user  system elapsed 
# 1.861   0.005   1.865 

# compare with alternative method:
df$x <- apply(df[, colNames, drop = FALSE], MARGIN = 1,
              FUN = function(i) paste(i, collapse = ""))
# running system.time on this yields:
# user  system elapsed 
#  16.127   0.147  16.304
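How the do.call approach works, as a small sketch: c() appends sep to the list of columns, and do.call then calls the given function (paste in this sketch) with each list element as a separate argument, so the call is equivalent to writing the columns out by hand. The unname() step is an extra precaution added here, not part of the answer above, so that a column that happens to be named sep or collapse is not mistaken for an argument.

```r
df <- data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc"), b = c(TRUE, FALSE, TRUE))
colNames <- c("n", "s")

# Build the argument list: the selected columns plus the separator.
paste_args <- c(unname(as.list(df[, colNames, drop = FALSE])), sep = " ")

# Equivalent to paste(df$n, df$s, sep = " ").
via_do_call <- do.call(paste, paste_args)
by_hand <- paste(df$n, df$s, sep = " ")

identical(via_do_call, by_hand)  # TRUE
```

Because paste is called once on whole columns instead of once per row, this form stays vectorised, which is consistent with the timing difference reported above.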
