I have the following data frame of student records. I want to identify students who joined a certain program for the first time in 2014, when they were in 9th grade.

names.first<-c('a','a','b','b','c','d')
names.last<-c('c','c','z','z','f','h')
year<-c(2014,2013,2014,2015,2015,2014)
grade<-c(9,8,9,10,10,10)

df<-data.frame(names.first,names.last,year,grade)
df

To do this, I used the following statement to flag rows where the program year == 2014 and the grade == 9.

df$first.cohort <- ifelse(df$year == 2014 & df$grade == 9, 1, 0)
df

  names.first names.last year grade first.cohort
1           a          c 2014     9            1
2           a          c 2013     8            0
3           b          z 2014     9            1
4           b          z 2015    10            0
5           c          f 2015    10            0
6           d          h 2014    10            0

However, as you can see, this also flags students who did not enter the program in 2014, such as student a, who started in 2013. How do I write an ifelse statement that captures only students who were in 9th grade and started the program for the first time in 2014, so that the df looks like this:

  names.first names.last year grade first.cohort
1           a          c 2014     9            0
2           a          c 2013     8            0
3           b          z 2014     9            1
4           b          z 2015    10            0
5           c          f 2015    10            0
6           d          h 2014    10            0

2 Answers

We can use first after arranging by 'names' and 'year' to create the logical expression

library(dplyr)
df %>% 
   arrange(names, year) %>% 
   group_by(names) %>% 
   mutate(first.cohort = as.integer(grade == 9 & first(year) == 2014))
# A tibble: 6 x 4
# Groups:   names [4]
#  names  year grade first.cohort
#  <fct> <dbl> <dbl>        <int>
#1 a      2013     8            0
#2 a      2014     9            0
#3 b      2014     9            1
#4 b      2015    10            0
#5 c      2015    10            0
#6 d      2014    10            0

To keep the same order as in the input dataset, we can create a row-number column first and then arrange by that column after the mutate

df %>% 
   mutate(rn = row_number()) %>%
   arrange(names, year) %>% 
   group_by(names) %>% 
   mutate(first.cohort = as.integer(grade == 9 & first(year) == 2014)) %>%
   ungroup %>%
   arrange(rn) %>%
   select(-rn)

Or use the same logic with data.table, which has the additional advantage of keeping the same order as in the input dataset

library(data.table)
setDT(df)[order(names, year), first.cohort := as.integer(grade == 9 &
           first(year) == 2014), names]

Update

With the new example in the OP's post, we group by both name columns ('names.first' and 'names.last')

df %>% 
   arrange(names.first, names.last, year) %>%
   group_by(names.first, names.last) %>%
   mutate(first.cohort = as.integer(grade == 9 & first(year) == 2014))
# A tibble: 6 x 5
# Groups:   names.first, names.last [4]
#  names.first names.last  year grade first.cohort
#  <fct>       <fct>      <dbl> <dbl>        <int>
#1 a           c           2013     8            0
#2 a           c           2014     9            0
#3 b           z           2014     9            1
#4 b           z           2015    10            0
#5 c           f           2015    10            0
#6 d           h           2014    10            0
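
For comparison, here is a minimal base R sketch of the same idea that needs no packages and leaves the rows in their original order. It is not part of the answer above. It assumes the two-column data from the updated question, and first.year is a helper name introduced only for this illustration. It also adds an explicit year == 2014 check, so that only the 2014 row itself is flagged.

```r
# Rebuild the example data from the question.
df <- data.frame(
  names.first = c("a", "a", "b", "b", "c", "d"),
  names.last  = c("c", "c", "z", "z", "f", "h"),
  year        = c(2014, 2013, 2014, 2015, 2015, 2014),
  grade       = c(9, 8, 9, 10, 10, 10)
)

# One grouping factor per student; drop = TRUE avoids empty name combinations.
student <- interaction(df$names.first, df$names.last, drop = TRUE)

# Earliest program year of each student, repeated on every row of that student.
# first.year is a hypothetical helper name used only in this sketch.
first.year <- ave(df$year, student, FUN = min)

# Flag 9th-grade rows in 2014 only when 2014 is also the student's first year.
df$first.cohort <- as.integer(df$year == 2014 & df$grade == 9 & first.year == 2014)
df$first.cohort
# [1] 0 0 1 0 0 0
```

Because ave returns a vector aligned with the input rows, no sorting or re-sorting step is needed.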

5 Comments

For data.table, rowid like with(df, rowid(names) == 1 & year == 2014 & grade == 9)
@Frank Thanks for the message. Are you getting two TRUE with that approach setDT(df)[, (rowid(names) == 1) & (year == 2014) & (grade == 9)]# [1] TRUE FALSE TRUE FALSE FALSE FALSE
@akrun my actual data set has first and last names and I updated the example to reflect that. how can I do the ordering when I have both first and last names?
@Nathan123 Updated the answer
@akrun Oh, I see, OP's data is not sorted for some reason, so ... df[order(year), v := rowid(names.first, names.last) == 1 & year == 2014 & grade == 9]

Using dplyr

library(dplyr)
df %>%
  group_by(names) %>%
  dplyr::mutate(Fc = as.numeric((year == 2014 & grade == 9) & (min(year) == 2014)))
# A tibble: 6 x 4
# Groups:   names [4]
   names  year grade    Fc
  <fctr> <dbl> <dbl> <dbl>
1      a  2014     9     0
2      a  2013     8     0
3      b  2014     9     1
4      b  2015    10     0
5      c  2015    10     0
6      d  2014    10     0
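
The code above groups by a single names column. With the two name columns from the updated question, the same min(year) idea could be sketched as follows. This is an adaptation, not the answerer's own code. It assumes the dplyr package is installed, and res is a name introduced here for illustration.

```r
library(dplyr)

# Rebuild the updated example data from the question.
df <- data.frame(
  names.first = c("a", "a", "b", "b", "c", "d"),
  names.last  = c("c", "c", "z", "z", "f", "h"),
  year        = c(2014, 2013, 2014, 2015, 2015, 2014),
  grade       = c(9, 8, 9, 10, 10, 10)
)

# Group by both name columns so each student is one group, then flag
# 9th-grade 2014 rows whose group's earliest year is also 2014.
res <- df %>%
  group_by(names.first, names.last) %>%
  mutate(Fc = as.numeric(year == 2014 & grade == 9 & min(year) == 2014)) %>%
  ungroup()

res$Fc
# [1] 0 0 1 0 0 0
```

Since min(year) does not depend on row order, no arrange step is required and the rows stay in their input order.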
