
I'm writing code to analyze ecological data. Right now I'm calculating a diversity index for certain stretches of a stream. I have a data frame covering 5 stream sections (Site) over three survey periods (Survey), and the diversity is calculated from the number (Count) of each species (Species) we caught at a given site during a given survey.

This is the code I'm currently running.

library(tidyverse)
library(vegan)

# SHANNON DIVERSITY ----

# Species count by survey and site
count_sum_norun <- df %>% 
  group_by(Survey, Site, Species) %>%
  summarise(Count = sum(Count))%>%
  ungroup

## SPECIFY survey and site ---- 
sp_count <- count_sum_norun %>%
  filter(Survey == 1, Site == "B")

## Calculate Diversity ----
shannon_diversity_vegan <- diversity(sp_count$Count, index="shannon")

This is an example of my starting data frame:

Survey Site    Run Species Count
    <dbl> <chr> <dbl> <chr>   <dbl>
 1      1 A         1 rbt         1
 2      1 A         1 rbt         1
 3      1 A         1 rbt         1
 4      1 A         1 rbt         1
 5      1 A         1 rbt         1
 6      1 A         1 rbt         1
 7      1 A         1 rbt         1
 8      1 A         1 rbt         1
 9      1 A         1 rbt         1
10      1 A         1 rbt         1
# ℹ 1,963 more rows

This is the kind of dataframe that first chunk of code gives me:

A tibble: 92 × 4
   Survey Site  Species Count
    <dbl> <chr> <chr>   <dbl>
 1      1 A     brt         1
 2      1 A     cmm         1
 3      1 A     lnd         1
 4      1 A     rbt        95
 5      1 A     scul       92
 6      1 A     ws          2
 7      1 B     bnm         1
 8      1 B     brt         6
 9      1 B     rbt        95
10      1 B     scul       35
# ℹ 82 more rows

This is the kind of dataframe I get from the second chunk of code, which specifies the section of stream, and the time period:

 A tibble: 7 × 4
  Survey Site  Species Count
   <dbl> <chr> <chr>   <dbl>
1      3 B     brt       294
2      3 B     bsb         2
3      3 B     coho      176
4      3 B     rbt       381
5      3 B     scul      327
6      3 B     wbnd        1
7      3 B     ws          5

This is the kind of output I get when calculating the diversity from the selected data above:

shannon_diversity_vegan
[1] 1.388683
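
As a sanity check, the value can be reproduced by hand. This is a minimal base R sketch, assuming the seven counts in the Survey 3, Site B table above are the input and that `diversity()` is using its default natural logarithm; the names `counts` and `shannon_by_hand` are only illustrative.

```r
# Counts copied from the Survey 3, Site B table above
counts <- c(brt = 294, bsb = 2, coho = 176, rbt = 381,
            scul = 327, wbnd = 1, ws = 5)

# Shannon index: H = -sum(p * log(p)), where p is each species' share of the catch
p <- counts / sum(counts)
shannon_by_hand <- -sum(p * log(p))

# Rounded for comparison with the vegan result
round(shannon_by_hand, 4)
```

The result should agree with the 1.388683 reported by `diversity()` to the printed precision, which confirms that only the Count column (one value per species) drives the index.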

All of the above is correct, and runs great.

But I have further analyses to do with the diversity data I get out of this and I'd like to be able to have a dataframe of the diversity value per every survey and site combination, so I can easily plot it against other data points I have.

Something that looks like this ideally, but with that last column being my Diversity values:

 Survey Site  Richness
    <dbl> <chr>   <int>
 1      1 A           6
 2      1 B           5
 3      1 C           4
 4      1 D           3
 5      2 A          11
 6      2 B           7
 7      2 C           6
 8      2 D           3
 9      2 E           9
10      3 A          13
11      3 B           7
12      3 D           5
13      3 E          13

So my question is, is there a straightforward way to run that second chunk of code I have

## SPECIFY survey and site ---- 
sp_count <- count_sum_norun %>%
  filter(Survey == 1, Site == "B")

over and over with the different combinations of Survey and Site filtered, so I can collect the results in one data frame to work with? At the moment I manually change the parameters each time and copy-paste the diversity values into Excel, which is annoying and inefficient.

It seems like there should be a relatively easy way to do this, but I'm very new to R and don't know where to start.

EDIT: I've been told that having a better dataframe in here as an example would be helpful (sorry, I did not know how to create a dataframe that would work with my code, but I've puzzled it out), so I think this code will be able to be fully run as a simplified example with some random numbers:

library(tidyverse)
library(vegan)

set.seed(1)     

df <- data.frame(
  Species = rep(c("brt", "rbt", "scul"), times = 15),
  Survey = rep(c(1,2,3), each = 15),
  Site = rep(c("a","b","c","d","e"), times = 3, each = 3),
  Count = sample(1:15)
)

## SPECIFY survey and site ---- 
sp_count_pe <- df %>%
  filter(Survey == 1, Site == "b")


## Calculate Diversity ----
shannon_diversity_vegan <- diversity(sp_count_pe$Count, index="shannon")

So it's still this code section that I'd like to repeat automatically for all the different site and survey combos, without having to do it manually:

## SPECIFY survey and site ---- 
sp_count_pe <- df %>%
  filter(Survey == 1, Site == "b")

with the goal of being able to throw the diversity values I can get from each combination into a data frame

5 Comments

  • It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Share data with dput() or generate some sample data in the question itself so we can run the code and test it ourselves. Commented Nov 18 at 18:54
  • Maybe count_sum_norun %>% dplyr::mutate(Diversity = diversity(Count, index="shannon"), .by=c(Survey, Site)) does what you want? Commented Nov 18 at 19:00
  • @MrFlick Thank you, I've edited my post and added modified code at the end so hopefully it can be run like you asked. I'll try your suggestion to see if it helps. Commented Nov 18 at 20:37
  • If you use sample() to set up example data, make sure to use set.seed() to make it reproducible. Commented Nov 18 at 20:49
  • @Friede Ok, thank you! Commented Nov 18 at 21:13

1 Answer

Aggregate by the Survey and Site combinations present in the data. Both versions below use the constructed data X shown further down.

# base R
> aggregate(cbind(div_index=Count)~Survey+Site, X,  
+           vegan::diversity, index='shannon')
  Survey Site div_index
1      1    A 0.9002561
2      2    A 1.0114043
3      3    A 0.6365142
4      1    B 0.6829081
5      2    B 0.5004024
6      3    B 0.5982696
7      1    C 0.9002561
8      2    C 0.6730117
9      3    D 0.0000000
> 
> # dplyr
> dplyr::summarise(X, div_index=vegan::diversity(Count, index='shannon'), 
+                  .by=c(Survey, Site))
# A tibble: 9 × 3
  Survey Site div_index
   <dbl> <chr>    <dbl>
1      1 A        0.900
2      1 B        0.683
3      1 C        0.900
4      2 A        1.01 
5      2 B        0.500
6      2 C        0.673
7      3 A        0.637
8      3 B        0.598
9      3 D        0  

Constructed Data

X = tibble::tribble(
  ~Survey, ~Site, ~Species, ~Count,
  1, "A", "rbt", 5,
  1, "A", "bnc", 2,
  1, "A", "sts", 1,
  1, "B", "rbt", 3,
  1, "B", "scu", 4,
  1, "C", "rbt", 1,
  1, "C", "scb", 2,
  1, "C", "sts", 5,
  2, "A", "rbt", 2,
  2, "A", "bnc", 1,
  2, "A", "scu", 3,
  2, "B", "rbt", 4,
  2, "B", "scb", 1,
  2, "C", "sts", 3,
  2, "C", "scu", 2,
  3, "A", "rbt", 1,
  3, "A", "bnc", 2,
  3, "B", "rbt", 2,
  3, "B", "scb", 5,
  3, "D", "sts", 3
)
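
If the raw data has several rows per species within a site and survey (as in the question's original data frame with its Run column), the counts likely need to be summed per species first, otherwise each row would be treated as a separate species. A minimal sketch on the question's simulated `df`, assuming dplyr 1.1.0 or later for `.by`; the result name `diversity_by_site` and the `Richness`/`Shannon` column names are illustrative choices, not from the question.

```r
library(dplyr)
library(vegan)

# Simulated data from the question's EDIT
set.seed(1)
df <- data.frame(
  Species = rep(c("brt", "rbt", "scul"), times = 15),
  Survey  = rep(c(1, 2, 3), each = 15),
  Site    = rep(c("a", "b", "c", "d", "e"), times = 3, each = 3),
  Count   = sample(1:15)
)

diversity_by_site <- df %>%
  # Step 1: one row per species within each Survey/Site
  summarise(Count = sum(Count), .by = c(Survey, Site, Species)) %>%
  # Step 2: one row per Survey/Site with richness and Shannon diversity
  summarise(
    Richness = n_distinct(Species),
    Shannon  = diversity(Count, index = "shannon"),
    .by = c(Survey, Site)
  )

diversity_by_site
```

This should give one row per Survey and Site combination, which can then be joined to other per-site data (for example with `left_join()` on Survey and Site) for plotting.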

(If this is identified as a simple aggregation question, it should be closed as a duplicate. The question is about an hour old and has no duplicate votes yet.)


1 Comment

This worked for me, thank you!
