0

I have a dataset where I would like to count the occurences of a number of variables for multiple subsets. Ideally, this would be automatic.

My dataset looks somewhat like this:

var1 <- c("Checked", "Checked", "Unchecked")
var2 <- c("Unchecked", "Checked", "Unchecked")
var3 <- c("Checked", "Unchecked", "Unchecked")

varA <- c("Unchecked", "Checked", "Checked")
varB <- c("Unchecked", "Checked", "Checked")
varC <- c("Checked", "Unchecked", "Checked")


dummy <- cbind(var1,var2,var3,varA,varB,varC) 

For everyone who checked box var1, I would like to count the amount of "checked" boxes for varA, varB and varC. Same for var2 and var3.

The ideal end result would be a dataframe that looks somewhat like this: where the rows indicate the subsets and the rows the counts of "Checked" for varA, varB and varC respectively.

        varA   varB  varC
var1      1     1     1
var2      1     1     0
var3      0     0     1

Bonus points for being able to easily convert this to proportions (eta: of checked vs unchecked)!

I figured out I should convert the "Checked" "Unchecked" to 0 and 1, and these should be (converted to) numeric:

dummy[dummy == "Checked"] <- 1
dummy[dummy == "Unchecked"] <- 0

dummy <- as.data.frame(apply(dummy, 2, as.numeric))

dummy now looks like this, so far so good.

  var1 var2 var3 varA varB varC
1    1    0    1    0    0    1
2    1    1    0    1    1    0
3    0    0    0    1    1    1

However, now I am stuck. I can of course manually calculate the sum of columns 4:6 with the subset function and compile all of that in a new dataframe, but since my real dataset has way more variables and subsets, this is not an ideal solution.

Thanks! First Q here, so I tried to be precise but will fine-tune the Q if needed :)

2
  • Re your proportions questions. What proportions do you want? Checked compared to unchecked or proportions of checked per column? … Commented Feb 14, 2022 at 12:48
  • proportion of checked vs unchecked indeed. So for row 1 of the intended result, this would be "0.5 0.5 0.5" and for row 2 "1 1 0" row three "0 0 1" Commented Feb 14, 2022 at 13:14

1 Answer 1

1

You can do:

dummy <- data.frame(var1,var2,var3,varA,varB,varC)

dummy %>%
  pivot_longer(cols = matches('\\d$')) %>%
  group_by(name) %>%
  summarize(across(starts_with('var'), ~sum(. == 'Checked' & value == 'Checked')))

# A tibble: 3 x 4
  name   varA  varB  varC
  <chr> <int> <int> <int>
1 var1      1     1     1
2 var2      1     1     0
3 var3      0     0     1
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.