2

I'm currently teaching myself R, and I'm attempting to write code to give me a population estimate of different fish species in different stretches of a stream (Site) over during different periods of time (Survey). The code I'm using to do so is this:

library(tidyverse)
library(FSA)

set.seed(1)     

#random sample dataframe
df <- data.frame(
  Survey = rep(c(1,2,3), each = 45),
  Site = rep(c("a","b","c","d","e"), times = 3, each = 9),
  Run = rep (c(1,2,3), 15, each = 3),
  Species = rep(c("brt", "rbt", "scul"), times = 45),
  Count = sample(1:15))


# get number of species caught at each run per survey per site
count_summary <- df %>% 
  group_by(Survey, Site, Run, Species) %>%
  summarise(n = sum(Count))%>%
  rename("Count" = "n") %>%
  ungroup

# SPECIFY catch data 
sp_count <- count_summary %>%
  filter(Species == "scul", Survey == 3, Site =="b")

# Make catch a vector so removal() will run
catch <- as.vector(sp_count$Count) 

# Calculate population estimate
pop_est <- removal(catch)

#pulls the relevant list, transforms to a dataframe, transposes, 
#transforms to a dataframe again
df_pop_est <- as.data.frame(t(as.data.frame(pop_est$est)))

# keeps only the relevant numbers (pop estimate, lower CI, upper CI)
pe_ci <- df_pop_est %>%
  select(No, No.UCI, No.LCI)%>%
  rename(PE = No, LCI = No.LCI, UCI = No.UCI)

pe_ci

This works fairly well, and will return a population estimate, and the upper and lower confidence interval for the estimate:

PE      UCI       LCI
pop_est$est 55 142.1708 -32.17078

However, I need to be able to compare these numbers across different sites, surveys, and species. I would like to be able to automatically run all parameter combinations, instead of having to manually change the parameters over and over in here:

# SPECIFY catch data 
sp_count <- count_summary %>%
  filter(Species == "scul", Survey == 3, Site =="b")

so I can just run it once and have a data frame of PE, UCI, LCI, for every site, survey, and species combination.

I've asked a similar question on here and was able to figure out how to do something like this for some much simpler code. But for this I have vectors and lists and dataframes, as well as whatever this is:

pop_est
$est
         No       No.se      No.LCI      No.UCI           p        p.se       p.LCI       p.UCI 
 55.0000000  44.4757062 -32.1707823 142.1707823   0.1942446   0.1949422  -0.1878351   0.5763243 

$int
 k  T  X 
 3 27 26 

$CS.prior
alpha  beta 
    1     1 

$CS.se
[1] "Zippin"

$catch
[1]  6 14  7

$lbl
[1] "Carle & Strub (1978) K-Pass Removal Method"

$method
[1] "CarleStrub"

$conf.level
[1] 0.95

attr(,"class")
[1] "removal"

all playing together, which I struggle to understand the differences of and how to work with them to begin with, so I'm tearing my hair out trying to figure out how to get all that to run repeatedly for different parameters, while also throwing the outputs in a dataframe.

Something like this is ultimately what I'm trying for, I don't know if it's possible to get it looking like this or not

Survey  Site  Species  PE      UCI       LCI
3       b     scul     55      142.1708  -32.17078
3       a     scul     x       y         z
2       a     rbt      x       y         z
1       e     brt      x       y         z
1       d     scul     x       y         z

I've googled how to loop things? Since that sounded like it might be what I'm trying to do, but that didn't clear anything up for me and I'm not sure if it's even actually what I'm looking for.

I've also tried figuring out how to make functions to see if that could help, but that didn't go well and I don't know if it would have helped me anyway.

Any thoughts are appreciated, even if it's just giving me something to look up that will help me better understand what I'm trying to do.

New contributor
Ray is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
3
  • 1
    It follows a quite similar logic, cp. your last question. Where is removal() exported from? Commented 2 days ago
  • It's from the FSA package: cran.r-project.org/web/packages/FSA/index.html Commented 2 days ago
  • Like @Friede says, try aggregate(Count ~ Survey + Site + Species, count_summary, \(x) removal(x)$est). Commented 2 days ago

2 Answers 2

3

This answer is meant to tell the OP that they don't even have to run the first dplyr pipe and create count_summary.
Run the aggregation function removal on the original data.

agg1 <- aggregate(Count ~ Survey + Site + Species, count_summary, \(x) removal(x)$est)
agg2 <- aggregate(Count ~ Survey + Site + Species, df, \(x) removal(x)$est)

identical(agg1, agg2)
#> [1] TRUE

Then, set the names as wanted.

Sign up to request clarification or add additional context in comments.

1 Comment

That's very helpful, thank you!
2

1) That would be base R

aggregate(Count~Survey+Site+Species, x=count_summary, 
          \(x) FSA::removal(x)$est[c('No', 'No.LCI', 'No.UCI')]) |>
  do.call(what='data.frame') |>
  setNames(c('Survey', 'Site', 'Species', 'PE', 'LCI', 'UCI'))
   Survey Site Species  PE         LCI       UCI
1       1    a     brt  43  -36.585786 122.58579
2       2    a     brt  43  -36.585786 122.58579
3       3    a     brt  43  -36.585786 122.58579
4       1    b     brt  61    4.371596 117.62840
5       2    b     brt  61    4.371596 117.62840
6       3    b     brt  61    4.371596 117.62840
7       1    c     brt 107 -476.879941 690.87994
8       2    c     brt 107 -476.879941 690.87994
9       3    c     brt 107 -476.879941 690.87994
10      1    d     brt  25   22.776879  27.22312
<...>

2) and I suspect in {dplyr} they do something along the lines

count_summary |>
  reframe({i = FSA::removal(Count)$est
  tibble::tibble(PE = i[['No']], LCI = i[['No.LCI']], UCI = i[['No.UCI']])}, 
  .by=c(Survey, Site, Species))
# A tibble: 45 × 6
   Survey Site  Species    PE     LCI   UCI
    <dbl> <chr> <chr>   <dbl>   <dbl> <dbl>
 1      1 a     brt        43  -36.6  123. 
 2      1 a     rbt        10    3.91  16.1
 3      1 a     scul       57  -31.6  146. 
 4      1 b     brt        61    4.37 118. 
 5      1 b     rbt        29   -1.24  59.2
 6      1 b     scul       55  -32.2  142. 
 7      1 c     brt       107 -477.   691. 
 8      1 c     rbt        19  -29.5   67.5
 9      1 c     scul       33   19.4   46.6
10      1 d     brt        25   22.8   27.2

2a) An alternative (which might be more efficient and logical)

count_summary |>
  reframe(bind_rows(FSA::removal(Count)$est),
          .by=c(Survey, Site, Species)) |>
  select(Survey:Species, PE=No, LCI=No.LCI, UCI=No.UCI)

Currently it seems to make no difference whether we apply on df or count_summary. This might be different for your actual data/workflow.

3 Comments

That works!! I've spent days trying to figure this out, thank you so much
If you are having this type of problems frequently, you should read this JSS paper by Hadley Wickham.
This is my first project using R, or any kind of code, so pretty much everything is a frequent problem to me right now. But that paper looks super helpful and like it it might help me wrap my head around some things. I'll give it a read! Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.