As far as I can see, there are several problems and misconceptions here beyond your question.
Presumably your observations are identified by interview_id and speaker_num together, so each group defined by by(interview_id speaker_num) contains just one observation, and the "mean" and "median" reported for it are simply that observation's value. I guess that you should specify interview_id only. The general idea is that by() should name the variable(s) that define the groups you want.
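A quick way to check that premise is sketched below with made-up data (the values are hypothetical; only the variable names come from your code). `isid` exits with an error unless the listed variables jointly identify observations, and when they do, every group that by() forms from them holds exactly one observation.

```stata
* Hypothetical toy data: two interviews, two speakers each
clear
input interview_id speaker_num varA
1 1 2
1 2 4
2 1 1
2 2 3
end

* Errors out if interview_id speaker_num do not uniquely identify observations
isid interview_id speaker_num

* So each interview_id x speaker_num group contains a single observation ...
bysort interview_id speaker_num: assert _N == 1

* ... whereas grouping by interview_id alone pools the speakers
bysort interview_id: generate n_in_group = _N
list
```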
Otherwise, I have four comments on your code.
- You could slim it down. The indirection of creating a local macro only to use its contents immediately afterwards serves no good purpose.
```stata
use appended_data, clear
// Speaker-Interview Level Averages and Medians
preserve
foreach var in varA varB {
    collapse (mean) mean_`var' = `var' ///
        (median) median_`var' = `var', ///
        by(interview_id speaker_num)
}
tempfile collapsed_file
save `collapsed_file', replace
restore
```
- You are collapsing twice, so the second collapse works on the already collapsed data, in which varB no longer exists (collapse keeps only the by() variables and the results it creates). For your intended purpose, you would I think need to read in the original data again, and the restore comes too late to do that. Or (better) collapse both variables in the same command.
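Collapsing both variables in one command could look like this minimal sketch. The small dataset is hypothetical and stands in for appended_data; the by() variable follows the suggestion to group by interview_id only.

```stata
* Hypothetical toy data standing in for appended_data
clear
input interview_id speaker_num varA varB
1 1 2 10
1 2 4 20
1 3 9 30
2 1 1  5
2 2 3  7
end

* A single collapse handles both variables, so nothing is collapsed twice
collapse (mean) mean_varA = varA mean_varB = varB ///
    (median) median_varA = varA median_varB = varB, ///
    by(interview_id)

list
```

Each source variable can appear more than once in the same collapse, as long as each result gets its own new name.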
- Creating a new file is not obviously helpful. One solution is to create new variables and tag just one observation in each group, something like this:
```stata
use appended_data, clear
// Interview-level means and medians of varA and varB
foreach var in varA varB {
    egen mean_`var' = mean(`var'), by(interview_id)
    egen median_`var' = median(`var'), by(interview_id)
}
// Tag one observation per interview
egen tag = tag(interview_id)
```
Now you can compare means and medians `if tag`, that is, using just one observation per interview.
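As a minimal sketch of that comparison, again on hypothetical toy data, restricting to `if tag` gives one row per interview; the count of interviews whose mean exceeds the median is just one illustrative comparison.

```stata
* Hypothetical toy data
clear
input interview_id speaker_num varA
1 1 2
1 2 4
1 3 9
2 1 1
2 2 3
end

egen mean_varA = mean(varA), by(interview_id)
egen median_varA = median(varA), by(interview_id)
egen tag = tag(interview_id)

* One row per interview
list interview_id mean_varA median_varA if tag

* For example, how many interviews have mean above median?
count if tag & mean_varA > median_varA
```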
- Putting results in temporary files may fit some wider strategy, but you have to be careful: a temporary file is deleted when the do-file or program that created it ends. You are usually better served by using a more permanent filename.
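For contrast, a minimal sketch of saving to a named file instead. `interview_summaries` is a hypothetical filename, written to the current working directory; unlike a tempfile, it still exists after the do-file finishes and can be reopened later.

```stata
* Hypothetical toy data
clear
input interview_id varA
1 2
1 4
2 1
end

collapse (mean) mean_varA = varA (median) median_varA = varA, by(interview_id)

* A named .dta file persists after the do-file ends
save interview_summaries, replace

* Later, possibly in another session
use interview_summaries, clear
list
```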