R automatically counting values for multiple variables and multiple subsets

Question

I have a dataset in which I would like to count occurrences across a number of variables for multiple subsets. Ideally, this would be automatic.

My dataset looks somewhat like this:

var1 <- c("Checked", "Checked", "Unchecked")
var2 <- c("Unchecked", "Checked", "Unchecked")
var3 <- c("Checked", "Unchecked", "Unchecked")

varA <- c("Unchecked", "Checked", "Checked")
varB <- c("Unchecked", "Checked", "Checked")
varC <- c("Checked", "Unchecked", "Checked")


dummy <- cbind(var1,var2,var3,varA,varB,varC)

For everyone who checked box var1, I would like to count the number of "Checked" boxes for varA, varB and varC. Same for var2 and var3.

The ideal end result would be a data frame like the one below, where the rows indicate the subsets and the columns the counts of "Checked" for varA, varB and varC respectively:

        varA   varB  varC
var1      1     1     1
var2      1     1     0
var3      0     0     1

Bonus points for being able to easily convert this to proportions (edit: the proportion of Checked vs. Unchecked)!

I figured out that I should convert "Checked"/"Unchecked" to 1 and 0, and that these should be (converted to) numeric:

dummy[dummy == "Checked"] <- 1
dummy[dummy == "Unchecked"] <- 0

dummy <- as.data.frame(apply(dummy, 2, as.numeric))
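If it helps, the two replacements plus apply() can be collapsed into a single comparison. A minimal base R sketch, assuming the character matrix built with cbind() above; the name dummy01 is only illustrative, and its columns come out as integer rather than double:

```r
var1 <- c("Checked", "Checked", "Unchecked")
var2 <- c("Unchecked", "Checked", "Unchecked")
var3 <- c("Checked", "Unchecked", "Unchecked")
varA <- c("Unchecked", "Checked", "Checked")
varB <- c("Unchecked", "Checked", "Checked")
varC <- c("Checked", "Unchecked", "Checked")
dummy <- cbind(var1, var2, var3, varA, varB, varC)  # character matrix

# dummy == "Checked" is a logical matrix that keeps the column names;
# multiplying by 1L turns TRUE/FALSE into 1/0 (integer)
dummy01 <- as.data.frame(1L * (dummy == "Checked"))
dummy01
```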

dummy now looks like this; so far, so good:

  var1 var2 var3 varA varB varC
1    1    0    1    0    0    1
2    1    1    0    1    1    0
3    0    0    0    1    1    1

However, now I am stuck. I can of course manually calculate the sum of columns 4:6 with the subset function and compile all of that in a new dataframe, but since my real dataset has way more variables and subsets, this is not an ideal solution.
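With everything coded as 0/1, the per-subset sums do not need a loop: for a subset column s and a target column t, sum(s * t) counts the rows where both are checked, and the whole table of those sums is the matrix product t(S) %*% T, which base R's crossprod() computes. A sketch under one assumption: subset variables are recognised by a trailing digit and target variables by a trailing capital letter, so the two grepl() patterns would need adapting to the real column names. Dividing each row by the number of respondents who checked that subset variable gives the proportions asked about in the comments.

```r
var1 <- c("Checked", "Checked", "Unchecked")
var2 <- c("Unchecked", "Checked", "Unchecked")
var3 <- c("Checked", "Unchecked", "Unchecked")
varA <- c("Unchecked", "Checked", "Checked")
varB <- c("Unchecked", "Checked", "Checked")
varC <- c("Checked", "Unchecked", "Checked")
dummy <- cbind(var1, var2, var3, varA, varB, varC)

# 0/1 integer matrix: 1 where "Checked", 0 otherwise
m <- 1L * (dummy == "Checked")

# Assumed naming scheme: subsets end in a digit, targets in a capital letter
subsets <- m[, grepl("\\d$", colnames(m)), drop = FALSE]
targets <- m[, grepl("[A-Z]$", colnames(m)), drop = FALSE]

# crossprod(x, y) is t(x) %*% y: entry [i, j] counts the rows where
# subset variable i and target variable j are both checked
counts <- crossprod(subsets, targets)
counts

# Proportions: divide row i by the number of rows where subset i is checked
# (a subset variable that is never checked would give NaN)
props <- counts / colSums(subsets)
props
```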

Thanks! First Q here, so I tried to be precise but will fine-tune the Q if needed :)

Re your proportions question: what proportions do you want? Checked compared to unchecked, or the proportion of checked per column? … – deschen, Commented Feb 14, 2022 at 12:48
Proportion of checked vs. unchecked indeed. So for row 1 of the intended result this would be "0.5 0.5 0.5", for row 2 "1 1 0" and for row 3 "0 0 1". – Emma, Commented Feb 14, 2022 at 13:14

deschen · Accepted Answer · 2022-02-14 12:46:24Z


You can do:

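library(dplyr)
library(tidyr)

# start from the original "Checked"/"Unchecked" vectors, not the 0/1 version
dummy <- data.frame(var1, var2, var3, varA, varB, varC)

dummy %>%
  # stack var1..var3 (names ending in a digit) into name/value pairs
  pivot_longer(cols = matches('\\d$')) %>%
  group_by(name) %>%
  # count rows where both the subset variable and the target are checked
  summarize(across(starts_with('var'), ~sum(. == 'Checked' & value == 'Checked')))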

# A tibble: 3 x 4
  name   varA  varB  varC
  <chr> <int> <int> <int>
1 var1      1     1     1
2 var2      1     1     0
3 var3      0     0     1
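For the proportions asked about in the comments, one variant of the same pipeline (not part of the accepted answer) keeps only the rows where the subset variable is checked and averages the comparison instead of summing it, since the mean of a logical vector is the share of TRUE values. A sketch assuming dplyr and tidyr are installed; note that a subset variable nobody checked would simply drop out of the result rather than showing NaN.

```r
library(dplyr)
library(tidyr)

var1 <- c("Checked", "Checked", "Unchecked")
var2 <- c("Unchecked", "Checked", "Unchecked")
var3 <- c("Checked", "Unchecked", "Unchecked")
varA <- c("Unchecked", "Checked", "Checked")
varB <- c("Unchecked", "Checked", "Checked")
varC <- c("Checked", "Unchecked", "Checked")
dummy <- data.frame(var1, var2, var3, varA, varB, varC)

props <- dummy %>%
  pivot_longer(cols = matches('\\d$')) %>%
  # keep only respondents who checked the subset variable named in `name`
  filter(value == 'Checked') %>%
  group_by(name) %>%
  # mean of a logical vector = proportion of "Checked" within the subset
  summarize(across(starts_with('var'), ~ mean(. == 'Checked')))
props
```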

answered Feb 14, 2022 at 12:46

deschen
