I have two dataframes of names: the first contains a single column of names, and the second contains multiple columns of names. How can I filter the second dataframe so that it only contains rows that have a match in the first dataframe?

Example:

df1

names
John
Joan

df2

Column A   Column B   Column C   Column D
Gerry      Jim        Brian      Joan
John       John       John       John
Ron        Greg       Sam        Maisy

desired output

names   Column A   Column B   Column C   Column D
Joan    Gerry      Jim        Brian      Joan
John    John       John       John       John

I tried left_join() from dplyr, but the output was not what I needed.

    In your desired output, including the first column from df1 doesn't make a lot of sense since if you have "Joan" in column B row 2 instead of "John", and "John" instead of "Jim" in row 1, then there is no unique correspondence to the rows of df2 (you could easily argue that the first column should have "John" and "Joan" instead of "Joan" and "John"). Commented Sep 10, 2024 at 6:42
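
The ambiguity described in this comment can be checked with a minimal base R sketch. It assumes the question's data with column B changed as the comment suggests ("John" in row 1, "Joan" in row 2); the helper name `matched` is made up for illustration.

```r
# Names to look for (column called `names`, as in the question)
df1 <- data.frame(names = c("John", "Joan"))

# Question data, with column B modified as the comment describes:
# row 1 has "John" instead of "Jim", row 2 has "Joan" instead of "John"
df2 <- data.frame(
  A = c("Gerry", "John", "Ron"),
  B = c("John", "Joan", "Greg"),
  C = c("Brian", "John", "Sam"),
  D = c("Joan", "John", "Maisy")
)

# For each row of df2, collect every name from df1 that appears in that row
matched <- lapply(seq_len(nrow(df2)), function(i) {
  row_values <- unlist(df2[i, ], use.names = FALSE)
  df1$names[df1$names %in% row_values]
})

# Rows 1 and 2 each match both "John" and "Joan", so there is no single
# value that a leading `names` column could hold for them
matched
```

With this variant of the data, the first two rows each contain both names, so a leading `names` column would have to pick one arbitrarily.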

2 Answers

1

One way is to find the rows of df2 that match the names of df1 and use the unique indices to index df2.

df2[unique(unlist(lapply(1:ncol(df2), \(x) which(df2[,x] %in% df1$name)))),]

      A    B     C    D
2  John John  John John
1 Gerry  Jim Brian Joan

This gives you a filtered dataframe that "only contains rows that have a match in the first dataframe", as requested.
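
Note that the one-liner above returns rows in the order they were first matched (row 2 before row 1 here). If the original row order matters, one possible variation is to build a logical vector instead of a list of indices. This is only a sketch under the same data assumptions as this answer (the column in df1 is called `name`); the variable names `hits`, `keep`, and `filtered` are illustrative.

```r
# Data as defined in this answer
df1 <- data.frame(name = c("John", "Joan"))
df2 <- data.frame(
  A = c("Gerry", "John", "Ron"),
  B = c("Jim", "John", "Greg"),
  C = c("Brian", "John", "Sam"),
  D = c("Joan", "John", "Maisy")
)

# One logical vector per column: TRUE where the cell is one of the names
hits <- lapply(df2, `%in%`, df1$name)

# Combine the per-column vectors with OR: TRUE for rows with at least one match
keep <- Reduce(`|`, hits)

# Logical indexing keeps the matching rows in their original order
filtered <- df2[keep, , drop = FALSE]
filtered
```

Because `keep` has one entry per row, this also works when df2 has a single row or a single column.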


Data:

df1 <- structure(list(name = c("John", "Joan")), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(A = c("Gerry", "John", "Ron"), B = c("Jim", "John", 
"Greg"), C = c("Brian", "John", "Sam"), D = c("Joan", "John", 
"Maisy")), class = "data.frame", row.names = c(NA, -3L))

0
library(tidyverse)

df1 <- data.frame(
  stringsAsFactors = FALSE,
             names = c("John", "Joan")
)


df2 <- data.frame(
  stringsAsFactors = FALSE,
          Column.A = c("Gerry", "John", "Ron"),
          Column.B = c("Jim", "John", "Greg"),
          Column.C = c("Brian", "John", "Sam"),
          Column.D = c("Joan", "John", "Maisy")
)

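# Requires dplyr >= 1.1.0 for the .by argument of filter()
df2 %>% 
  rownames_to_column() %>%   # keep each original row number as an id
  pivot_longer(-rowname) %>%   # long format: one row per cell
  filter(any(df1$names %in% value), .by = rowname) %>%   # keep ids with at least one matching cell
  pivot_wider(id_cols = rowname, names_from = name, values_from = value)   # back to wide format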
#> # A tibble: 2 x 5
#>   rowname Column.A Column.B Column.C Column.D
#>   <chr>   <chr>    <chr>    <chr>    <chr>   
#> 1 1       Gerry    Jim      Brian    Joan    
#> 2 2       John     John     John     John

Created on 2024-09-10 with reprex v2.0.2
