21

I want to create a new column in a data.frame where its value is equal to the value in another data.frame where a particular condition is satisfied between two columns in each data frame.

The R pseudo-code being something like this:

DF1$Activity <- DF2$Activity where DF2$NAME == DF1$NAME

In each data.frame values for $NAME are unique in the column.

2
  • Please share a minimal reproducible example along with the expected output. Commented Jan 11, 2016 at 4:25
  • Okay, for this problem, you can do some sort of operation in R that will give the expected output. Commented Jan 11, 2016 at 5:31

3 Answers

26

Use the ifelse function. Here, I put NA when the condition is not met. However, you may choose any value or values from any vector. Recycling rules apply.

DF1$Activity_new <- ifelse(DF2$NAME == DF1$NAME, DF2$Activity, NA)
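
As a hedged illustration of what this does, here is a minimal sketch on made-up data (the data frames and their values are assumptions, not taken from the question). Because `==` compares the two NAME columns position by position, ifelse only copies a value where the same name sits in the same row of both data frames. For comparison, the last line shows a position-independent lookup with base R's match(), which is a different technique from the one in this answer.

```r
# Hypothetical toy data: same names in both, but DF2 is in a different row order
DF1 <- data.frame(NAME = c("a", "b", "c"), stringsAsFactors = FALSE)
DF2 <- data.frame(NAME = c("a", "c", "b"),
                  Activity = c("run", "swim", "bike"),
                  stringsAsFactors = FALSE)

# Element-wise comparison: only row 1 has the same name in the same position
DF1$Activity_new <- ifelse(DF2$NAME == DF1$NAME, DF2$Activity, NA)

# Alternative (not from the answer): look each DF1 name up in DF2 by value
DF1$Activity_matched <- DF2$Activity[match(DF1$NAME, DF2$NAME)]

print(DF1)
```

If the rows of the two data frames are already aligned, both columns come out the same; they differ only when the row order differs, as in this toy data.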

2 Comments

I think OP wants to not change the left-hand column when the condition is not met. So the NA should be DF1$Activity
He said it was to be "a new column".
7

I'm not sure this one actually needs an example. What happens when you create a column with a set of NA values and then assign the required rows with the same logical vector on both sides:

DF1$Activity <- NA
DF1$Activity[DF2$NAME == DF1$NAME] <- DF2$Activity[DF2$NAME == DF1$NAME]
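
A small sketch of the same idea on made-up data (the values and the helper variable `same` are assumptions for illustration). Storing the logical vector once avoids writing the condition twice; this is just one way to tidy it up, not necessarily what the answer's author had in mind.

```r
# Hypothetical toy data: rows are aligned, but row 2 has a different name
DF1 <- data.frame(NAME = c("a", "b", "c"), stringsAsFactors = FALSE)
DF2 <- data.frame(NAME = c("a", "x", "c"),
                  Activity = c("run", "swim", "bike"),
                  stringsAsFactors = FALSE)

# Compute the row-wise condition once and reuse it on both sides
same <- DF2$NAME == DF1$NAME

DF1$Activity <- NA                         # start with all NA
DF1$Activity[same] <- DF2$Activity[same]   # fill only the matching rows

print(DF1)
```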

5 Comments

I think the OP is asking for a more canonical way to write this, ideally without repeating the condition.
The lack of a reproducible example is damnable, but my understanding is that line 1 isn't necessary. Activity already exists in DF1, and DF1 and DF2 are the same size. This is more of a "case where" solution in SQL than a join. This should be the canonical R answer.
@AdamO I thought (7 years ago) that an assignment to a non-existent column with indexing might fail, but I guess I was wrong about that. I just convinced myself that you were correct.
@IRTFM most certainly a useful feature, although built-in redundancies such as assigning "NA" or 0 or "" are still good ideas. The point is... regarding this mystery application... I believe the OP had a pre-existing value for ACTIVITY in DF1, so the problem boils down to re-assigning some values but not others. Trivial R. In fact, ifelse as proposed in the top answer, while intuitive, is not optimized and can bog down R very badly.
We can agree to disagree about the stated underlying situation. I still read the request as implying that there was no Activity column in DF1.
4

Without an example it's quite hard to tell, but from your description it sounds like a base::merge or dplyr::inner_join operation. Those are quite fast in comparison to if statements.
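
A minimal sketch of the merge approach, using made-up data (the data frames and values are assumptions). By default merge keeps only the names found in both data frames, like an inner join; passing all.x = TRUE keeps every row of DF1 and fills unmatched names with NA.

```r
# Hypothetical toy data: DF2 is in a different order and has no entry for "b"
DF1 <- data.frame(NAME = c("a", "b", "c"), stringsAsFactors = FALSE)
DF2 <- data.frame(NAME = c("c", "a"),
                  Activity = c("swim", "run"),
                  stringsAsFactors = FALSE)

# Inner join: only names present in both data frames survive
inner <- merge(DF1, DF2, by = "NAME")

# Left join: keep all rows of DF1; Activity is NA where DF2 has no match
left <- merge(DF1, DF2, by = "NAME", all.x = TRUE)

print(inner)
print(left)
```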

Cheers

2 Comments

Both methods (base::merge and dplyr::inner_join) worked. I had a slight hiccup with mismatched column names, which I resolved either by calling rename(DF2, c("NAME"="xy.NAME")) before merging and using by="xy.NAME" in the merge call, or by using the by parameter of inner_join, i.e. ij <- inner_join(DF1, DF2, by = c("xy.NAME" = "NAME")).
merge is going to give you an "adventure" when DF1$NAME = c('a', 'a') and DF2$NAME = c('a', 'a').
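
To illustrate the duplicate-key case, here is a short sketch with made-up data (the id1 column and the Activity values are assumptions). When a key value is repeated on both sides, merge pairs every matching row of one data frame with every matching row of the other, so the result has more rows than either input. The question states that NAME is unique in each data frame, so this should not arise there, but a quick check with anyDuplicated before merging is cheap.

```r
# Hypothetical toy data with a repeated key on both sides
DF1 <- data.frame(NAME = c("a", "a"), id1 = 1:2, stringsAsFactors = FALSE)
DF2 <- data.frame(NAME = c("a", "a"),
                  Activity = c("run", "swim"),
                  stringsAsFactors = FALSE)

# anyDuplicated returns 0 when all keys are unique, otherwise the index
# of the first duplicate
has_dupes <- anyDuplicated(DF1$NAME) > 0 || anyDuplicated(DF2$NAME) > 0

# Each of the 2 rows in DF1 pairs with each of the 2 rows in DF2
merged <- merge(DF1, DF2, by = "NAME")

print(has_dupes)
print(nrow(merged))
```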
