Replies to comments
So you've raised a couple more questions in comments. I'll add the answers here for clarity and completeness.
First up: does the channel buffer in my example limit the number of routines? The short answer is no. Earlier on, I included an example of the processURLs function with channels buffered to a size of 10. The assumption being made here is that connecting to the host and checking the certs will take longer than pushing the result onto the channel, and that the routine consuming said channel data will do so faster than the requests themselves complete (otherwise, using routines the way you are doing doesn't make all that much sense). In my example, I'm checking all URLs in a single routine. I didn't explicitly say otherwise, but in reality, each URL would be checked in its own routine. I'll rewrite the processURLs function here, this time adding the "1 routine per URL" bit. In doing so, I'll also give an example of when you should pass an argument to an anonymous function:
func processURLs(ctx context.Context, uCh <-chan string, timeout time.Duration, verifySSL bool) (<-chan JSONSuccess, <-chan JSONErr) {
	sCh, eCh := make(chan JSONSuccess, 10), make(chan JSONErr) // make channels (sCh with some buffer)
	wg := sync.WaitGroup{} // create a waitgroup
	go func() {
		defer func() {
			wg.Wait() // wait for all routines, so as to not close the channels too soon
			// cleanup when this routine exits
			close(sCh)
			close(eCh)
		}()
		// read the URLs. This loop will break once the channel is both empty and closed
		for URL := range uCh {
			wg.Add(1) // we're adding a new routine
			// start the routine for this particular URL
			go func(URL string) { // takes the current value as an argument
				defer wg.Done() // decrease waitgroup
				select {
				case <-ctx.Done():
					return // the context was cancelled, indicating we're shutting down
				default:
					// we have a URL, check it using a context with a timeout
					// WithTimeout returns two values, so it can't be passed inline as an argument
					tctx, cancel := context.WithTimeout(ctx, timeout)
					defer cancel() // release the resources held by the timeout context
					resp, err := checkHTTPSCert(tctx, URL, verifySSL)
					// depending on the result, write to the appropriate channel
					if err != nil {
						eCh <- err
					} else {
						sCh <- resp
					}
				}
			}(URL) // pass in URL value
		}
	}()
	return sCh, eCh // return channels right away
}
So we've added a waitgroup. Its purpose is not to make this function wait for all the work to be done; we need it to make sure that we don't close the channels before all routines have returned (avoiding writes to a closed channel).
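To make both points concrete (the buffer size does not cap the number of routines, and the waitgroup only guards the close), here is a minimal, self-contained sketch. The squareAll function and its buffer size of 2 are made up for this illustration; they are not part of the code in your question:

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll (a hypothetical helper) starts one routine per input and
// closes the result channel only after every routine has returned.
func squareAll(nums []int) <-chan int {
	out := make(chan int, 2) // buffer is deliberately smaller than the number of routines
	wg := sync.WaitGroup{}
	go func() {
		defer func() {
			wg.Wait() // all writers are done once this returns
			close(out)
		}()
		for _, n := range nums {
			wg.Add(1)
			go func(n int) { // one routine per value, regardless of the buffer size
				defer wg.Done()
				out <- n * n // blocks while the buffer is full, until the consumer reads
			}(n)
		}
	}()
	return out // return right away, the caller consumes the channel
}

func main() {
	sum := 0
	for v := range squareAll([]int{1, 2, 3, 4, 5}) {
		sum += v
	}
	fmt.Println(sum) // prints 55
}
```

Five routines are started even though the buffer only holds two values. The buffer just determines how many routines can finish their write without waiting for the consumer.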
You can also see that the URL variable is passed in as an argument to our inner routine. If we didn't do this, the routine would just reference the outer URL variable, which (in Go versions before 1.22, where a loop variable is shared between iterations) could get reassigned before the routine gets executed. We need to provide the inner routine with the specific value for URL we want to use. If we didn't do this, we could end up in situations that, in pseudo-code, boil down to something like this:
v := <-ch
go func() { fmt.Println(v) }()
v = <-ch // update v
// the previously scheduled routine only starts now, so it prints the new value of v
By passing the value in as an argument, each routine gets its own variable that masks the outer one (URL, or v in the pseudo-code), set to the specific value at the point the routine was started:
v := <-ch
go func(v1 any) { fmt.Println(v1) }(v)
v = <-ch // update v
go func(v2 any) { fmt.Println(v2) }(v)
Now we'll get the expected output. It's still possible to get the output of v2 first, and then v1, but both will be printed regardless.
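A runnable version of the same idea, as a sketch: the collect function below is a made-up helper that starts one routine per value, passes the value in as an argument, and sorts the results, because the order in which the routines run is not guaranteed:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect (a hypothetical helper) starts one routine per value and
// gathers what each routine saw.
func collect(values []string) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		got []string
	)
	for _, v := range values {
		wg.Add(1)
		go func(v string) { // v is this routine's own copy
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			got = append(got, v)
		}(v) // the argument is evaluated here, before the routine runs
	}
	wg.Wait()
	sort.Strings(got) // execution order is arbitrary, so sort for a stable result
	return got
}

func main() {
	fmt.Println(collect([]string{"b", "a", "c"})) // prints [a b c]
}
```

Every value shows up exactly once, whatever order the scheduler picks.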
Another, more niche, situation where you may want to pass in variables as arguments is when you're masking imports. 99% of the time, this is the result of unfortunate package/variable naming, but it can sometimes be used as a safeguard, preventing you from making certain calls in the wrong place. This can be used in very rare (again: 99% of the time, this is a code smell), highly concurrent bits of code:
package foo
import (
// a bunch of packages
"my.complex.mod/concurrent/niche"
"my.complex.mod/concurrent/danger"
)
Say the niche and danger packages should only be called in very, very special situations. In particular, our danger module could contain some CGO stuff, or make use of reflection and the unsafe package. We need to provide it with some values to work on, but we have to ensure that this only happens in one place in the code. We can protect ourselves from accidentally using these packages in the wrong place like this:
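func DoStuff(niche args.Niche, danger args.Danger) args.DangerRet {
	// this function can't call functions on the niche and danger imports, because the variables mask the package names
	// niche.SomeFuncFromPackage will look for a method on args.Niche. Same for danger.Foo()
	// the masking also applies inside function literals declared in this body, so the calls into
	// the packages live in a package-level helper (callDanger, a name made up for this example)
	// some code
	// we have determined we need the niche/danger packages:
	ch := make(chan args.DangerRet) // unbuffered
	defer close(ch)
	go func(narg args.Niche, darg args.Danger) {
		ch <- callDanger(narg, darg) // call the packages via the helper and push to channel
	}(niche, danger)
	return <-ch // wait for routine and return
}

// callDanger is the one place where the imports are not masked
func callDanger(narg args.Niche, darg args.Danger) args.DangerRet {
	nRet := niche.ProcessArgs(narg)
	return danger.DoStuff(darg, nRet) // call danger package
}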
Situations like these are extremely rare, but sometimes you want to make sure some object isn't being used, not even for read access, in any other routine (of course, you'd have to implement a bunch of other code for that), so you can then use it in a very specific way that goes beyond the normal framework the Go runtime provides for safe, concurrent code. I am still using a fairly simple setup, because the point here is not the use of a routine. I'm leveraging the routine mostly to ensure that I'm doing everything in as contained a way as possible. The overall function is still blocking: because I am using an unbuffered channel, the routine and the return statement sync up. You could just call a plain anonymous function here, without the routine, but considering that this is, as I said repeatedly, a very rare thing to do, chances are you'll still use a routine that first does everything it needs to do to ensure the values you're about to operate on are not being used anywhere else, etc.
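If you want to see the masking mechanism in isolation, here is a minimal sketch that uses only the standard library. The shout and upper functions are made up for this illustration. A parameter named strings masks the strings import for the whole function body, nested function literals included, so the call into the package has to sit in a package-level function:

```go
package main

import (
	"fmt"
	"strings"
)

// upper lives at package level, so here the name strings still refers to the import.
func upper(s string) string {
	return strings.ToUpper(s)
}

// shout takes a parameter named strings, which masks the import inside its body.
func shout(strings []string) []string {
	out := make([]string, 0, len(strings))
	for _, s := range strings {
		// strings.ToUpper(s) would not compile here: strings is a []string in this scope
		out = append(out, upper(s))
	}
	return out
}

func main() {
	fmt.Println(shout([]string{"a", "b"})) // prints [A B]
}
```

The compiler enforces the restriction for you: any attempt to reach the package from inside shout fails at build time rather than at runtime.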