Sum up different columns for different groups

Question

I have a dataframe containing information about the activities of some organisations in different countries. The column orga contains the name of the organisation, c1 to c4 are country columns containing the number of activities an organisation carries out in each country, and home is the organisation's country of residence. Values in home correspond to the numbers in the column names c1 to c4.

orga <- c("AA", "AB", "AC", "BA", "BB", "BC", "BD")
c1 <- c(3,1,0,0,2,0,1)
c2 <- c(0,2,2,0,1,0,1)
c3 <- c(1,0,0,1,0,2,0)
c4 <- c(0,1,1,0,0,0,0)
home <- c(1,2,3,2,1,3,1)
df <- data.frame(orga, c1, c2, c3, c4, home)

I now want to add a column foreign containing the total of each organisation's foreign activities, i.e. the sum of c1 to c4 excluding the column of its own home country. So the function should not sum up all the country columns, only the ones that are not the home country. For example, if home = 1 it should leave out c1, if home = 2 leave out c2, and so on.

In this example, foreign should look like this:

df$foreign <- c(1,2,3,1,1,0,1)

Is there a way to sum up columns for different groups, leaving out a different column for every group, and add the sums as a new column to a dataframe?

I already looked at the group_by function of the dplyr package, as well as aggregate and tapply in base R, but couldn't come up with a solution. I would thus very much appreciate your help. Thank you!

Did you get an answer to your question? If so, you can mark one of the answers as accepted. – bhansa, commented Mar 18, 2017 at 10:32

Sotos · Accepted Answer · 2017-03-18 10:25:40Z

3

One way to do it using rowSums: subtract each row's home-country count from its row total.

diag(as.matrix(rowSums(df[2:5])- df[2:5][df$home]))
#[1] 1 2 3 1 1 0 1
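The diag() call works because subtracting the 7-column data frame df[2:5][df$home] from the row totals builds a full n-by-n matrix, whose diagonal holds the wanted values; for large data that matrix grows quadratically. A minimal sketch of the same subtraction that instead picks each row's home cell directly with a two-column (row, column) index matrix, assuming home always gives a position within c1 to c4 (the names m and home_vals are just illustrative):

```r
# Example data from the question
df <- data.frame(
  orga = c("AA", "AB", "AC", "BA", "BB", "BC", "BD"),
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

m <- as.matrix(df[2:5])  # numeric matrix of the country columns c1..c4

# Each row of the index matrix is (row i, column home[i]),
# so this returns one value per row: the home-country count
home_vals <- m[cbind(seq_len(nrow(m)), df$home)]

df$foreign <- rowSums(m) - home_vals
df$foreign
#[1] 1 2 3 1 1 0 1
```

This only ever touches n cells, so no n-by-n intermediate is created.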

answered Mar 18, 2017 at 10:25

Sotos

www · Accepted Answer · 2017-03-18 11:33:41Z

1

Here is a solution using the dplyr and tidyr packages.

library(dplyr)
library(tidyr)

df2 <- df %>%
  # Turn home from a number into a column name (1 -> "c1", 2 -> "c2", ...)
  # so it can be compared with the country column names c1 to c4
  mutate(home = paste0("c", home)) %>%
  # Convert the data frame from wide format to long format
  # (in tidyr >= 1.0.0, gather() is superseded by pivot_longer())
  # activity contains the column names c1 to c4 as labels
  # number holds the corresponding activity count
  gather(activity, number, -orga, -home) %>%
  # Drop the rows that belong to the home country
  filter(home != activity) %>%
  # Group by the organization
  group_by(orga) %>%
  # Calculate the total number of activities, call it foreign
  summarise(foreign = sum(number)) %>%
  # Join the results back with df by organization
  left_join(df, by = "orga") %>%
  # Reorder the columns
  select(orga, c1:home, foreign)

Here is the end result. The information you want is in the foreign column of the data frame df2.

# A tibble: 7 × 7
    orga    c1    c2    c3    c4  home foreign
  <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1     AA     3     0     1     0     1       1
2     AB     1     2     0     1     2       2
3     AC     0     2     0     1     3       3
4     BA     0     0     1     0     2       1
5     BB     2     1     0     0     1       1
6     BC     0     0     2     0     3       0
7     BD     1     1     0     0     1       1
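The same long-format idea (drop the home-country rows, then sum per organisation) can also be written in base R with stack() and tapply(), which the question mentions trying. A sketch, assuming the country columns are named c1 to c4 and that home holds the matching number; the names cols, long, abroad and sums are illustrative:

```r
# Example data from the question
df <- data.frame(
  orga = c("AA", "AB", "AC", "BA", "BB", "BC", "BD"),
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

cols <- c("c1", "c2", "c3", "c4")

# Long format: one row per organisation and country column.
# stack() puts the columns one after another (all of c1, then all of c2, ...),
# so orga and home are repeated once per column to line up with those rows.
long <- data.frame(
  orga = rep(df$orga, times = length(cols)),
  home = rep(paste0("c", df$home), times = length(cols)),
  stack(df[cols])  # adds the columns 'values' and 'ind'
)

# Keep only the foreign rows, then sum them per organisation
abroad <- long[as.character(long$ind) != long$home, ]
sums <- tapply(abroad$values, abroad$orga, sum)

# tapply() returns the sums named by organisation; reorder them to match df
df$foreign <- as.vector(sums[as.character(df$orga)])
df$foreign
#[1] 1 2 3 1 1 0 1
```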

edited Mar 18, 2017 at 11:33

answered Mar 18, 2017 at 11:22

www

1 Comment

uyanik Over a year ago

This is brilliant, as it seems the most flexible solution to me. Thank you for the nice explanation!

akrun · Accepted Answer · 2017-03-18 11:06:07Z

1

Here is another option using rowSums. Using row/column indexing, we set the home-country cell of each row to NA in a copy of the dataset; rowSums with na.rm = TRUE then adds up the remaining country columns, which excludes the 'home' column.

df1 <- df
# df1[-1] holds c1..c4 and home; the cell (i, home[i]) is row i's home-country count
df1[-1][cbind(1:nrow(df), df$home)] <- NA
df$foreign <- rowSums(df1[2:5], na.rm = TRUE)
df$foreign
#[1] 1 2 3 1 1 0 1
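Instead of writing NA into a copy of the data, the home cells can also be masked with a logical matrix built from col(). A minimal sketch, again assuming home gives the position of the home column within c1 to c4 (m and not_home are illustrative names):

```r
# Example data from the question
df <- data.frame(
  orga = c("AA", "AB", "AC", "BA", "BB", "BC", "BD"),
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

m <- as.matrix(df[2:5])

# col(m)[i, j] is j. df$home is recycled down the columns, so row i is
# compared with df$home[i]: TRUE everywhere except in the home-country cell
not_home <- col(m) != df$home

# Multiplying by the logical mask zeroes out the home cells before summing
df$foreign <- rowSums(m * not_home)
df$foreign
#[1] 1 2 3 1 1 0 1
```

The original data frame is left untouched apart from the new column.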

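Or using apply (selecting c1 to c4 and home explicitly with df[2:6], so that a foreign column created earlier is not picked up):

# x[1:4] are c1..c4 and x[5] is home, so x[1:4][-x[5]] drops the home column
df$foreign <- apply(df[2:6], 1, function(x) sum(x[1:4][-x[5]]))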
df$foreign
#[1] 1 2 3 1 1 0 1
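All of the approaches above rely on home matching the position of a column within c1 to c4. If the columns might be reordered, the home column can be looked up by name instead. A sketch of that idea as a small helper; foreign_activities() and its prefix argument are hypothetical names, not part of any package:

```r
# Example data from the question
df <- data.frame(
  orga = c("AA", "AB", "AC", "BA", "BB", "BC", "BD"),
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

# Hypothetical helper: for each row, sums every country column except the
# one named paste0(prefix, home). Country columns are found by the pattern
# prefix followed by digits, e.g. c1, c2, ...
foreign_activities <- function(data, home, prefix = "c") {
  country_cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  m <- as.matrix(data[country_cols])
  # Position of each row's home column, looked up by name
  home_col <- match(paste0(prefix, home), country_cols)
  if (anyNA(home_col)) stop("some home values have no matching country column")
  rowSums(m) - m[cbind(seq_len(nrow(m)), home_col)]
}

df$foreign <- foreign_activities(df, df$home)
df$foreign
#[1] 1 2 3 1 1 0 1

# Same result with the country columns in a different order
shuffled <- df[c("orga", "c3", "c1", "c4", "c2", "home")]
foreign_activities(shuffled, shuffled$home)
#[1] 1 2 3 1 1 0 1
```

Because the pattern only matches names like c1 to c4, a foreign column added earlier is ignored on later calls.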

answered Mar 18, 2017 at 11:06

akrun
