I have a function to deduplicate a data frame so that each person (indexed by PatID) is represented once by the latest record (largest RecID):

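dedupit <- function(x) {
  # Sort by PatID, then by RecID descending, so each patient's latest record comes first
  x <- x[order(x$PatID, -x$RecID), ]
  # Keep only the first (latest) row for each PatID
  x <- x[!duplicated(x$PatID), ]
  return(x)
}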

It can deduplicate and replace a dataframe if I do:

df <- dedupit(df)

But I have multiple data frames that need deduplication. Rather than repeat the line above for each individual data frame, I would like to apply the dedupit function across multiple data frames at once, so that each original data frame is replaced by its deduplicated version.

I was able to make a list of the dataframes and lapply the function across each element in the list with:

listofdfs <- list(df1, df2, ....)
listofdfs <- lapply(listofdfs, function(x) dedupit(x))

However, this only modifies the elements of the list; it does not replace the original data frames. How do I apply this function so that it modifies and replaces multiple data frames?

  • This is the recommended way of handling multiple dataframes. Keeping them in a list is cleaner than filling your global environment with dataframes. Commented May 1, 2014 at 19:54
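As a rough illustration of the list-based approach this comment recommends, here is a minimal sketch. The toy data frames and the name `dfs` are hypothetical and are not taken from the question; `dedupit` is the asker's function, repeated so the example runs on its own.

```r
# The asker's function, repeated here so the sketch is self-contained.
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]
  x[!duplicated(x$PatID), ]
}

# Hypothetical toy data, kept in one named list instead of separate variables.
dfs <- list(
  df1 = data.frame(PatID = c(1, 1, 2), RecID = c(10, 11, 12)),
  df2 = data.frame(PatID = c(3, 3, 3), RecID = c(5, 7, 6))
)

# Deduplicate every element; the names of the list are preserved.
dfs <- lapply(dfs, dedupit)

# Individual data frames remain reachable by name.
dfs$df1

# Summaries across all data frames are a one-liner.
sapply(dfs, nrow)
```

With this layout nothing has to be written back to the global environment, because the list itself is the single object being updated.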

1 Answer

Does this work for you? Name your data frames when creating the list, so that you can recover them by name afterwards:

list.df <- list(df1 = df1, df2 = df2, df3 = df3)

list2env(lapply(list.df, dedupit), .GlobalEnv)

As a result, your data frames df1, df2, and df3 will be replaced by their deduplicated versions.
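To make the effect concrete, here is a small end-to-end sketch with made-up data. The contents of `df1` and `df2` are assumptions for illustration only; `dedupit` is the function from the question, and the code is meant to be run at top level, where variables live in `.GlobalEnv`.

```r
# The function from the question, repeated so the sketch is self-contained.
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]
  x[!duplicated(x$PatID), ]
}

# Hypothetical toy data with duplicated PatID values.
df1 <- data.frame(PatID = c(1, 1, 2), RecID = c(10, 11, 12))
df2 <- data.frame(PatID = c(3, 3, 3), RecID = c(5, 7, 6))

# The list names decide which variables get overwritten.
list.df <- list(df1 = df1, df2 = df2)

# list2env() assigns each list element to the variable of the same name.
# invisible() only suppresses printing of the returned environment.
invisible(list2env(lapply(list.df, dedupit), envir = .GlobalEnv))

# df1 and df2 now hold one row per PatID, the one with the largest RecID.
df1
df2
```

Note that this works only because the list is named. With an unnamed list such as list(df1, df2), list2env has no variable names to assign to.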

See also the related question: unlist a list of dataframes
