How to filter rows in R in a dataframe with multiple columns based on names in a column from another dataframe?

Question

I have two dataframes of names: the first contains a single column of names, and the second contains multiple columns of names. How can I filter the second dataframe so that it only contains rows where at least one value matches a name in the first dataframe?

Example:

df1

names
John
Joan

df2

Column A	Column B	Column C	Column D
Gerry	Jim	Brian	Joan
John	John	John	John
Ron	Greg	Sam	Maisy

desired output

names	Column A	Column B	Column C	Column D
Joan	Gerry	Jim	Brian	Joan
John	John	John	John	John

I've tried using left_join() from dplyr, but the output was not what I needed.

In your desired output, including the first column from df1 doesn't make a lot of sense since if you have "Joan" in column B row 2 instead of "John", and "John" instead of "Jim" in row 1, then there is no unique correspondence to the rows of df2 (you could easily argue that the first column should have "John" and "Joan" instead of "Joan" and "John").
– Edward, Commented Sep 10, 2024 at 6:42

Edward · Accepted Answer · 2024-09-10 03:57:53Z

1

One way is to find the rows of df2 that match the names of df1 and use the unique indices to index df2.

df2[unique(unlist(lapply(1:ncol(df2), \(x) which(df2[,x] %in% df1$name)))),]

      A    B     C    D
2  John John  John John
1 Gerry  Jim Brian Joan

This gives you a filtered dataframe that "only contains rows that have a match in the first dataframe", as requested.

Data:

df1 <- structure(list(name = c("John", "Joan")), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(A = c("Gerry", "John", "Ron"), B = c("Jim", "John", 
"Greg"), C = c("Brian", "John", "Sam"), D = c("Joan", "John", 
"Maisy")), class = "data.frame", row.names = c(NA, -3L))
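Note that the one-liner above returns the rows in the order in which matches are first found column by column (row 2 before row 1 here). If the original row order matters, a minimal sketch of a variant is to build one logical vector per column with %in% and combine them with an element-wise OR. The names keep and result are only illustrative, and the data is redefined so the snippet runs on its own:

```r
# Same data as above, redefined so the snippet is self-contained
df1 <- data.frame(name = c("John", "Joan"))
df2 <- data.frame(
  A = c("Gerry", "John", "Ron"),
  B = c("Jim", "John", "Greg"),
  C = c("Brian", "John", "Sam"),
  D = c("Joan", "John", "Maisy")
)

# One logical vector per column: TRUE where the value is one of df1$name
matches <- lapply(df2, `%in%`, df1$name)

# Element-wise OR across columns: TRUE if any column in that row matched
keep <- Reduce(`|`, matches)

# Logical indexing keeps the rows in their original order
result <- df2[keep, , drop = FALSE]
print(result)
```

Because the rows are selected with a logical vector rather than a vector of indices, no unique() call is needed and rows 1 and 2 come back in their original order.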

Yuriy Saraykin · Accepted Answer · 2024-09-10 13:48:58Z

library(tidyverse)

df1 <- data.frame(
  stringsAsFactors = FALSE,
             names = c("John", "Joan")
)


df2 <- data.frame(
  stringsAsFactors = FALSE,
          Column.A = c("Gerry", "John", "Ron"),
          Column.B = c("Jim", "John", "Greg"),
          Column.C = c("Brian", "John", "Sam"),
          Column.D = c("Joan", "John", "Maisy")
)

df2 %>% 
  rownames_to_column() %>% 
  pivot_longer(-rowname) %>% 
  filter(any(df1$names %in% value), .by = rowname) %>% 
  pivot_wider(id_cols = rowname, names_from = name, values_from = value)
#> # A tibble: 2 x 5
#>   rowname Column.A Column.B Column.C Column.D
#>   <chr>   <chr>    <chr>    <chr>    <chr>   
#> 1 1       Gerry    Jim      Brian    Joan    
#> 2 2       John     John     John     John

Created on 2024-09-10 with reprex v2.0.2
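If reshaping to long format and back feels heavy for this task, a shorter sketch is possible with dplyr's if_any(), assuming a dplyr version that provides it (1.0.4 or later). This is an alternative to the pipeline above, not a description of it, and the name result is only illustrative:

```r
library(dplyr)

df1 <- data.frame(names = c("John", "Joan"))

df2 <- data.frame(
  Column.A = c("Gerry", "John", "Ron"),
  Column.B = c("Jim", "John", "Greg"),
  Column.C = c("Brian", "John", "Sam"),
  Column.D = c("Joan", "John", "Maisy")
)

# Keep a row if the value in any column appears in df1$names
result <- df2 %>%
  filter(if_any(everything(), ~ .x %in% df1$names))

print(result)
```

This keeps the original column types and row order and avoids the extra rowname column, at the cost of requiring a reasonably recent dplyr.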
