
I have a data frame containing information about the activities of some organisations in different countries. The column orga contains the names of the organisations, c1 to c4 are country columns containing the number of activities an organisation carries out in each country, and home is the organisation's country of residence. Values in home correspond to the numbers in the column names c1 to c4.

orga <- c("AA", "AB", "AC", "BA", "BB", "BC", "BD")
c1 <- c(3,1,0,0,2,0,1)
c2 <- c(0,2,2,0,1,0,1)
c3 <- c(1,0,0,1,0,2,0)
c4 <- c(0,1,1,0,0,0,0)
home <- c(1,2,3,2,1,3,1)
df <- data.frame(orga, c1, c2, c3, c4, home)

I now want to add an additional column foreign, containing the total of each organisation's foreign activities: the sum of all activities in c1 to c4 except those in the column of its own country. So the function should not sum all the country columns, only the ones that are not the home country. For example, if home=1 it should leave out c1, if home=2 it should leave out c2, etc.

In the example-case foreign should look like this:

df$foreign <- c(1,2,3,1,1,0,1)

Is there a way to sum up columns for different groups, leaving out a different column for every group, and add the sums as a new column to a data frame?

I already looked at the group_by function of the dplyr package, as well as aggregate and tapply in base R, but couldn't come up with a solution. I would thus very much appreciate your help. Thank you!

  • Did you get an answer to your question? If so, you can mark that answer as accepted. Commented Mar 18, 2017 at 10:32

3 Answers


One way to do it is with rowSums: subtract each row's home-country value from its row total.

diag(as.matrix(rowSums(df[2:5])- df[2:5][df$home]))
#[1] 1 2 3 1 1 0 1
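
The diag() call above builds an intermediate result with one column per row of df and then keeps only its diagonal. As a sketch of the same subtraction without that intermediate, a two-column matrix index can pick exactly one cell per row. The names counts, home_idx and foreign are illustrative, and this assumes every value in home is a valid position among c1 to c4.

```r
orga <- c("AA", "AB", "AC", "BA", "BB", "BC", "BD")
df <- data.frame(
  orga,
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

# Country columns as a plain numeric matrix
counts <- as.matrix(df[2:5])
# One (row, column) pair per organisation, pointing at its home-country cell
home_idx <- cbind(seq_len(nrow(df)), df$home)
# Row total minus the home-country value
foreign <- unname(rowSums(counts) - counts[home_idx])
foreign
#[1] 1 2 3 1 1 0 1
```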


Here is a solution using the dplyr and tidyr packages.

library(dplyr)
library(tidyr)

df2 <- df %>%
  # Convert home from a number to a character label ("c1" to "c4")
  # so that it matches the column names c1 to c4
  mutate(home = paste0("c", home)) %>%
  # Convert the data frame from wide format to long format
  # activity contains the columns names from c1 to c4 as labels
  # number is the original number for each
  gather(activity, number, -orga, -home) %>%
  # Remove rows where the activity column is the organisation's home country
  filter(home != activity) %>%
  # Group by the organization
  group_by(orga) %>%
  # Calculate the total number of activities, call it foreign
  summarise(foreign = sum(number)) %>%
  # Join the results back with df by organization
  left_join(df, by = "orga") %>%
  # Reorder the columns
  select(orga, c1:home, foreign)

Here is the end result. The information you want is in the foreign column of the data frame df2.

# A tibble: 7 × 7
    orga    c1    c2    c3    c4  home foreign
  <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1     AA     3     0     1     0     1       1
2     AB     1     2     0     1     2       2
3     AC     0     2     0     1     3       3
4     BA     0     0     1     0     2       1
5     BB     2     1     0     0     1       1
6     BC     0     0     2     0     3       0
7     BD     1     1     0     0     1       1
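
In recent tidyr releases gather() is superseded by pivot_longer(). The following is a sketch of the same reshape, filter and summarise steps with pivot_longer(), assuming tidyr 1.0.0 or later; the intermediate name foreign_tbl is illustrative and not part of the answer above.

```r
library(dplyr)
library(tidyr)

orga <- c("AA", "AB", "AC", "BA", "BB", "BC", "BD")
df <- data.frame(
  orga,
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

foreign_tbl <- df %>%
  # Wide to long: one row per organisation and country column
  pivot_longer(c1:c4, names_to = "activity", values_to = "number") %>%
  # Drop the row that belongs to the organisation's home country
  filter(activity != paste0("c", home)) %>%
  group_by(orga) %>%
  summarise(foreign = sum(number))

# Attach the sums to the original data frame, keeping its row order
df2 <- left_join(df, foreign_tbl, by = "orga")
df2$foreign
```

Joining onto df, rather than the other way round, keeps the original column order, so no final select() is needed.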

1 Comment

This is brilliant, as it seems to be the most flexible solution to me. Thank you for the nice explanation!

Here is another option using rowSums. Using row/column indexing, we replace each row's home-country value with NA in a copy of the dataset, and then rowSums with na.rm=TRUE sums each row while excluding the 'home' column.

df1 <- df
df1[-1][cbind(1:nrow(df), df$home)] <- NA
df$foreign <- rowSums(df1[2:5],na.rm=TRUE) 
df$foreign
#[1] 1 2 3 1 1 0 1

Or using apply

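# Select the columns by name so the result does not change once foreign has been added to df
df$foreign <- apply(df[c("c1", "c2", "c3", "c4", "home")], 1, function(x) sum(x[1:4][-x[5]]))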
df$foreign
#[1] 1 2 3 1 1 0 1
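
The snippets above rely on the country columns sitting in positions matching the values in home. If that ordering cannot be guaranteed, the home column can be looked up by name instead. The helper below is a minimal sketch: sum_foreign and its arguments are hypothetical names, and it assumes the country columns are named with a common prefix followed by the home number.

```r
# Hypothetical helper (not from the answers above): sums the columns in
# `cols` for each row, skipping the column named paste0(prefix, home).
sum_foreign <- function(data, cols, home_col = "home", prefix = "c") {
  counts <- as.matrix(data[cols])
  # Look the home column up by name rather than by position
  home_pos <- match(paste0(prefix, data[[home_col]]), cols)
  if (anyNA(home_pos)) stop("some home values have no matching column")
  home_idx <- cbind(seq_len(nrow(counts)), home_pos)
  unname(rowSums(counts) - counts[home_idx])
}

# Same data as in the question, with the country columns deliberately shuffled
df <- data.frame(
  orga = c("AA", "AB", "AC", "BA", "BB", "BC", "BD"),
  c4 = c(0, 1, 1, 0, 0, 0, 0),
  c2 = c(0, 2, 2, 0, 1, 0, 1),
  c1 = c(3, 1, 0, 0, 2, 0, 1),
  c3 = c(1, 0, 0, 1, 0, 2, 0),
  home = c(1, 2, 3, 2, 1, 3, 1)
)

df$foreign <- sum_foreign(df, cols = c("c1", "c2", "c3", "c4"))
df$foreign
#[1] 1 2 3 1 1 0 1
```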
