I'm writing some code to analyse ecological data. What I'm currently doing is calculating a diversity index for certain stretches of a stream. I have a data frame of information for 5 stream sections (Site) over three periods of time (Survey), and the diversity is calculated from the number (Count) of each species (Species) we caught at a given site during a given survey.
This is the code I'm currently running.
library(tidyverse)
library(vegan)
# SHANNON DIVERSITY ----
# Species count by survey and site
count_sum_norun <- df %>%
  group_by(Survey, Site, Species) %>%
  summarise(Count = sum(Count)) %>%
  ungroup()
## SPECIFY survey and site ----
sp_count <- count_sum_norun %>%
  filter(Survey == 1, Site == "B")
## Calculate Diversity ----
shannon_diversity_vegan <- diversity(sp_count$Count, index="shannon")
This is an example of my starting dataframe:
   Survey Site    Run Species Count
    <dbl> <chr> <dbl> <chr>   <dbl>
 1      1 A         1 rbt         1
 2      1 A         1 rbt         1
 3      1 A         1 rbt         1
 4      1 A         1 rbt         1
 5      1 A         1 rbt         1
 6      1 A         1 rbt         1
 7      1 A         1 rbt         1
 8      1 A         1 rbt         1
 9      1 A         1 rbt         1
10      1 A         1 rbt         1
# ℹ 1,963 more rows
This is the kind of dataframe that first chunk of code gives me:
# A tibble: 92 × 4
   Survey Site  Species Count
    <dbl> <chr> <chr>   <dbl>
 1      1 A     brt         1
 2      1 A     cmm         1
 3      1 A     lnd         1
 4      1 A     rbt        95
 5      1 A     scul       92
 6      1 A     ws          2
 7      1 B     bnm         1
 8      1 B     brt         6
 9      1 B     rbt        95
10      1 B     scul       35
# ℹ 82 more rows
This is the kind of dataframe I get from the second chunk of code, which filters to one section of stream and one time period (Survey 3, Site B in this example):
# A tibble: 7 × 4
  Survey Site  Species Count
   <dbl> <chr> <chr>   <dbl>
1      3 B     brt       294
2      3 B     bsb         2
3      3 B     coho      176
4      3 B     rbt       381
5      3 B     scul      327
6      3 B     wbnd        1
7      3 B     ws          5
This is the kind of output I get when calculating the diversity from the selected data above:
shannon_diversity_vegan
[1] 1.388683
All of the above is correct, and runs great.
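For reference, here is a minimal base-R sketch of what that number represents, assuming the Shannon index is computed with the natural log (the default `base` in vegan's `diversity()`): H = -sum(p * log(p)), where p is each species' share of the total catch. The `counts` vector is simply the Count column of the Survey 3, Site B table above typed in by hand, and `shannon_by_hand` is a name chosen for illustration.

```r
# Counts for Survey 3, Site B, copied from the table above
counts <- c(brt = 294, bsb = 2, coho = 176, rbt = 381,
            scul = 327, wbnd = 1, ws = 5)

# Proportion of the total catch made up by each species
p <- counts / sum(counts)

# Shannon index with natural log: H = -sum(p * log(p))
shannon_by_hand <- -sum(p * log(p))
shannon_by_hand  # about 1.3887, in line with the diversity() output above
```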
But I have further analyses to do with these diversity values, and I'd like a dataframe with the diversity value for every survey and site combination, so I can easily plot it against other data points I have.
Something that looks like this ideally, but with that last column being my Diversity values:
   Survey Site  Richness
    <dbl> <chr>    <int>
 1      1 A            6
 2      1 B            5
 3      1 C            4
 4      1 D            3
 5      2 A           11
 6      2 B            7
 7      2 C            6
 8      2 D            3
 9      2 E            9
10      3 A           13
11      3 B            7
12      3 D            5
13      3 E           13
So my question is, is there a straightforward way to run that second chunk of code I have
## SPECIFY survey and site ----
sp_count <- count_sum_norun %>%
  filter(Survey == 1, Site == "B")
over and over with the different combinations of Survey and Site filtered, so I can collect the results in a nifty little dataframe to work with? Right now I'm manually changing the parameters each time and copy-pasting the diversity values I get into Excel, which is super annoying and inefficient.
It seems like there should be a relatively easy way to do this, but I'm very new to R and don't know where to start.
EDIT: I've been told that having a better dataframe in here as an example would be helpful (sorry, I did not know how to create a dataframe that would work with my code, but I've puzzled it out), so I think this code can be run in full as a simplified example with some random numbers:
library(tidyverse)
library(vegan)
set.seed(1)
df <- data.frame(
  Species = rep(c("brt", "rbt", "scul"), times = 15),
  Survey = rep(c(1, 2, 3), each = 15),
  Site = rep(c("a", "b", "c", "d", "e"), times = 3, each = 3),
  Count = sample(1:15)
)
## SPECIFY survey and site ----
sp_count_pe <- df %>%
  filter(Survey == 1, Site == "b")
## Calculate Diversity ----
shannon_diversity_vegan <- diversity(sp_count_pe$Count, index="shannon")
So it's still this section of code that I'm trying to replicate automatically across all the different site and survey combinations, without having to do it manually:
## SPECIFY survey and site ----
sp_count_pe <- df %>%
  filter(Survey == 1, Site == "b")
with the goal of being able to put the diversity value from each combination into a data frame.
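One possible way to get there, sketched against the example `df` from the EDIT: dplyr's `group_by()` followed by `summarise()` calls `diversity()` once per Survey and Site combination and returns one row per combination. The names `diversity_by_site` and `Shannon` are placeholders chosen for illustration. This assumes each Species appears at most once per combination, as in the example data; for the real data that would presumably mean running it on the summed `count_sum_norun` rather than on the raw rows.

```r
library(dplyr)
library(vegan)

# Same example data as in the EDIT above
set.seed(1)
df <- data.frame(
  Species = rep(c("brt", "rbt", "scul"), times = 15),
  Survey  = rep(c(1, 2, 3), each = 15),
  Site    = rep(c("a", "b", "c", "d", "e"), times = 3, each = 3),
  Count   = sample(1:15)
)

# One Shannon value per Survey x Site combination.
# Inside summarise(), Count refers only to the rows of the current group,
# so diversity() sees one vector of species counts at a time.
diversity_by_site <- df %>%
  group_by(Survey, Site) %>%
  summarise(Shannon = diversity(Count, index = "shannon"), .groups = "drop")

diversity_by_site
```

The result has the same Survey/Site layout as the Richness table, so the two could probably be combined with something like `left_join(richness_table, diversity_by_site, by = c("Survey", "Site"))`, where `richness_table` is a hypothetical name for that table.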
dput() or generate some sample data in the question itself so we can run the code and test it ourselves.
count_sum_norun %>% dplyr::mutate(Diversity = diversity(sp_count$Count, index="shannon"), .by=c(Survey, Site)) does what you want?
sample to set up example data, make sure to use set.seed() to make it reproducible.