Add row to a data frame with total sum for each column

Question

I have a data frame where I would like to add an additional row that totals up the values for each column. For example, let's say I have this data:

x <- data.frame(Language=c("C++", "Java", "Python"), 
                Files=c(4009, 210, 35), 
                LOC=c(15328,876, 200), 
                stringsAsFactors=FALSE)

Data looks like this:

  Language Files   LOC
1      C++  4009 15328
2     Java   210   876
3   Python    35   200

My instinct is to do this:

y <- rbind(x, c("Total", colSums(x[,2:3])))

This works in the sense that it computes the totals:

> y
  Language Files   LOC
1      C++  4009 15328
2     Java   210   876
3   Python    35   200
4    Total  4254 16404

The problem is that the Files and LOC columns have all been converted to strings:

> y$LOC
[1] "15328" "876"   "200"   "16404"

I understand that this is happening because I created a vector c("Total", colSums(x[,2:3])) with inputs that are both numbers and strings, and R converts all the elements to a common type so that every element of the vector has the same type. Then the same thing happens to the Files and LOC columns.
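A minimal sketch of the coercion described above, using only base R and the data frame from the question. The name total_row is introduced here purely for illustration:

```r
x <- data.frame(Language = c("C++", "Java", "Python"),
                Files = c(4009, 210, 35),
                LOC = c(15328, 876, 200),
                stringsAsFactors = FALSE)

# c() builds an atomic vector, so mixing a string with numbers
# coerces every element to character.
total_row <- c("Total", colSums(x[, 2:3]))
class(total_row)

# rbind() then has to store character values in the numeric columns,
# which turns those whole columns into character.
y <- rbind(x, total_row)
sapply(y, class)
```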

What's a better way to do this?

Sam Firke · Accepted Answer · 2018-04-13 14:12:19Z

127

See adorn_totals() from the janitor package:

library(janitor)
x %>%
  adorn_totals("row")

#>  Language Files   LOC
#>       C++  4009 15328
#>      Java   210   876
#>    Python    35   200
#>     Total  4254 16404

The numeric columns remain of class numeric.

Disclaimer: I created this package, including adorn_totals() which is made for precisely this task.

answered Apr 13, 2018 at 14:12

Sam Firke



1 Comment

Barry DeCicco Over a year ago

Note that one problem with this (the only one) is that it's now hard to sort by row totals, which I usually want to do. The 'Total' row ends up on top.

Matifou · Accepted Answer · 2020-11-27 20:06:00Z

85

A tidyverse way to do this would be to use bind_rows (or alternatively add_row) and summarise to compute the sums. The issue here is that we want sums for all columns but one, so a trick would be:

summarise_all(x, ~if(is.numeric(.)) sum(.) else "Total")

In one line:

x %>%
  bind_rows(summarise_all(., ~if(is.numeric(.)) sum(.) else "Total"))

Edit with dplyr >=1.0

One can also use across(), which is slightly more verbose in this case:

x %>%
  bind_rows(summarise(.,
                      across(where(is.numeric), sum),
                      across(where(is.character), ~"Total")))

edited Nov 27, 2020 at 20:06

answered May 14, 2018 at 2:02

Matifou


8 Comments

petzi Over a year ago

Thanks, you were right: My solution was not the required answer. Your answer is a correct one. I voted you up and deleted my entry.

Mako212 Over a year ago

Nice, I appreciate keeping it in the tidyverse, seems silly to load another package just for this.

user2493970 Over a year ago

Nice answer. How about summing only certain columns, since other columns might need an average instead of a sum?

Matifou Over a year ago

I am afraid that if you want to use different functions on different columns, you would need to run manually summarise(var1=mean(var1), var2= sum(var2), var = "Total")
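A rough sketch of the manual summarise() approach described in the comment above, assuming dplyr is installed. Taking the mean of LOC is an arbitrary choice made only to illustrate mixing functions, and total_row is a hypothetical name:

```r
library(dplyr)

x <- data.frame(Language = c("C++", "Java", "Python"),
                Files = c(4009, 210, 35),
                LOC = c(15328, 876, 200),
                stringsAsFactors = FALSE)

# Name each column explicitly and pick its summary function by hand.
total_row <- summarise(x,
                       Language = "Total",
                       Files = sum(Files),
                       LOC = mean(LOC))

# bind_rows() matches columns by name and keeps their types.
y <- bind_rows(x, total_row)
y
```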

Sasha Poda Over a year ago

Nice solution. How would you write this with across() instead of summarise_all()?


Jaap · Accepted Answer · 2020-05-12 08:41:25Z

30

Here's a way that gets you what you want, but there may very well be a more elegant solution.

rbind(x, data.frame(Language = "Total", t(colSums(x[, -1]))))

For the record, I prefer Chase's answer if you don't absolutely need the Language column.
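As a quick check of why this avoids the coercion problem, here is a small base R sketch. Building the totals as a one-row data frame keeps a separate type for each column; total_row is a name introduced only for this example:

```r
x <- data.frame(Language = c("C++", "Java", "Python"),
                Files = c(4009, 210, 35),
                LOC = c(15328, 876, 200),
                stringsAsFactors = FALSE)

# t() turns the named vector from colSums() into a 1 x 2 matrix,
# which data.frame() expands into the numeric columns Files and LOC.
total_row <- data.frame(Language = "Total", t(colSums(x[, -1])),
                        stringsAsFactors = FALSE)

y <- rbind(x, total_row)
sapply(y, class)  # Files and LOC stay numeric
```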

edited May 12, 2020 at 8:41

Jaap


answered Feb 9, 2011 at 15:43

Joshua Ulrich


Jaap · Accepted Answer · 2020-05-12 08:43:15Z

26

Do you need the Language column in your data, or is it more appropriate to think of that column as the row.names? That would change your data.frame from 4 observations of 3 variables to 4 observations of 2 variables (Files & LOC).

x <- data.frame(Files = c(4009, 210, 35), LOC = c(15328,876, 200),
                row.names = c("C++", "Java", "Python"), stringsAsFactors = FALSE)    
x["Total" ,] <- colSums(x)


> x
       Files   LOC
C++     4009 15328
Java     210   876
Python    35   200
Total   4254 16404

edited May 12, 2020 at 8:43

Jaap


answered Feb 9, 2011 at 15:41

Chase


3 Comments

hadley Over a year ago

Personally, I don't recommend storing data in rownames - that's what variables are for!

Chase Over a year ago

In general, I agree. I also tend to follow @csgillespie's advice of not mixing raw data and summary statistics in the same object. As the OP pointed out however, it isn't really an issue in this instance since the question revolves around the presentation of data, not any further analysis.

thadk Over a year ago

What is the tidyverse equivalent?

nstjhp · Accepted Answer · 2021-06-08 10:38:09Z

14

Extending the answer of Nicolas Ratto, if you were to have a lot more columns you could use

x %>% add_row(Language = "Total", summarise(., across(where(is.numeric), sum)))

answered Jun 8, 2021 at 10:38

nstjhp


3 Comments

Angelo Over a year ago

This solution is good, but what if we don't know the name of the first column at the time of execution?

nstjhp Over a year ago

@Angelo Not sure how robust this is, or if there is a far simpler way, but it seems to work for this example at least x %>% add_row(!!rlang::as_name(names(.)[1]) := "Total", summarise(., across(where(is.numeric), sum)))

nstjhp Over a year ago

In fact no need for rlang::as_name() i.e. !!names(.)[1] := "Total" works

Tunaki · Accepted Answer · 2016-07-30 12:08:11Z

11

Try this

y[4,] = c("Total", colSums(y[,2:3]))

edited Jul 30, 2016 at 12:08

Tunaki


answered Jul 30, 2016 at 7:46

Prateek Joshi


G. Grothendieck · Accepted Answer · 2017-11-09 15:36:41Z

7

If (1) we don't need the "Language" heading on the first column then we can represent it using row names and if (2) it is ok to label the last row as "Sum" rather than "Total" then we can use addmargins like this:

rownames(x) <- x$Language
addmargins(as.table(as.matrix(x[-1])), 1)

giving:

       Files   LOC
C++     4009 15328
Java     210   876
Python    35   200
Sum     4254 16404

If we do want the first column labelled "Language" and the total row labelled "Total" then it's a bit longer:

rownames(x) <- x$Language
Total <- sum
xa <- addmargins(as.table(as.matrix(x[-1])), 1, FUN = Total)
data.frame(Language = rownames(xa), as.matrix(xa[]), row.names = NULL)

giving:

  Language Files   LOC
1      C++  4009 15328
2     Java   210   876
3   Python    35   200
4    Total  4254 16404

edited Nov 9, 2017 at 15:36

answered Feb 9, 2011 at 16:04

G. Grothendieck


Nicolas Ratto · Accepted Answer · 2020-07-24 00:55:14Z

5

Try this

library(tibble)
x %>% add_row( Language="Total",Files = sum(.$Files),LOC = sum(.$LOC) )

answered Jul 24, 2020 at 0:55

Nicolas Ratto


Dharman · Accepted Answer · 2020-12-31 16:30:27Z

4

df %>% bind_rows(purrr::map_dbl(.,sum))

edited Dec 31, 2020 at 16:30

Dharman♦


answered Dec 31, 2020 at 12:31

Manish


1 Comment

BMLopes Over a year ago

Good and elegant solution, but you have to drop the first column, and then pass it to map_dbl. A way to do that is to use the [] operator. x %>% bind_rows(x[,-1] %>% map_dbl(.,sum))

csgillespie · Accepted Answer · 2011-02-09 16:33:47Z

1

Are you sure you really want to have the column totals in your data frame? To me, the data frame's interpretation now depends on the row. For example,

Rows 1-(n-1): how many files are associated with a particular language
Row n: how many files are associated with all languages

This gets more confusing if you start to subset your data. For example, suppose you want to know which languages have more than 100 Files:

> x = data.frame(Files=c(4009, 210, 35), 
                LOC=c(15328,876, 200), 
                row.names=c("C++", "Java", "Python"), 
                stringsAsFactors=FALSE)    
> x["Total" ,] = colSums(x)
> x[x$Files > 100,]
      Files   LOC
C++    4009 15328
Java    210   876
Total  4254 16404  # But this refers to all languages!

The Total row is now wrong!

Personally I would work out the column sums and store them in a separate vector.
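A minimal sketch of that suggestion in base R. The names totals and big are chosen here for illustration only:

```r
x <- data.frame(Files = c(4009, 210, 35),
                LOC = c(15328, 876, 200),
                row.names = c("C++", "Java", "Python"))

# Keep the summary in its own named vector, separate from the raw data.
totals <- colSums(x)

# Subsetting now only ever returns languages, never a stale Total row.
big <- x[x$Files > 100, ]

# Totals for the subset can be recomputed whenever they are needed.
colSums(big)
```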

answered Feb 9, 2011 at 16:33

csgillespie


1 Comment

Lorin Hochstein Over a year ago

Typically I wouldn't do this for analysis, but this is for presentation. This is the last step before I generate a table in a LaTeX document with Sweave.

BobD59 · Accepted Answer · 2014-11-09 04:19:32Z

1

Since you mention this is a last step before exporting for presentation, you may have column names that include spaces for clarity (e.g. "Grand Total"). If so, the following will ensure that the created data.frame will rbind to the original dataset without an error caused by mismatched column names:

dfTotals <- data.frame(Language="Total", t(colSums(x[,-1])))

colnames(dfTotals) <- names(x)  

rbind(x, dfTotals)

answered Nov 9, 2014 at 4:19

BobD59


Brandon Bertelsen · Accepted Answer · 2011-02-09 17:32:49Z

0

Your original instinct would work if you coerced your columns to numeric:

y$LOC <- as.numeric(y$LOC)
y$Files <- as.numeric(y$Files)

And then apply colSums() and rbind().
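One possible reading of this suggestion, sketched in base R: build y as in the question, then convert the affected columns back to numeric afterwards. This is an interpretation of the answer rather than code taken from it:

```r
x <- data.frame(Language = c("C++", "Java", "Python"),
                Files = c(4009, 210, 35),
                LOC = c(15328, 876, 200),
                stringsAsFactors = FALSE)

# The approach from the question: Files and LOC end up as character.
y <- rbind(x, c("Total", colSums(x[, 2:3])))

# Coerce the two columns back to numeric.
y$Files <- as.numeric(y$Files)
y$LOC <- as.numeric(y$LOC)

sapply(y, class)
```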

answered Feb 9, 2011 at 17:32

Brandon Bertelsen
