edited Nov 18, 2022 at 10:47

Replies to comments

So you've raised a couple more questions in comments. I'll add the answers here for clarity and completeness.

First up: does the channel buffer in my example limit the number of routines? The short answer is no. Earlier on, I included an example of the processURLs function with a channel buffered to a size of 10. The assumption being made here is that connecting to the host and checking the certs will take longer than pushing the result onto the channel, and that the routine consuming said channel data will do so faster than the requests themselves complete (otherwise, using routines in the way you are doing doesn't make all that much sense). In my example, I'm checking all URLs in a single routine. I didn't explicitly say otherwise, but in reality, each URL would be checked in its own routine. I'll rewrite the processURLs function here, this time adding the "1 routine per URL" bit. In doing so, I'll also give an example of when you should pass an argument to an anonymous function:

func processURLs(ctx context.Context, uCh <-chan string, timeout time.Duration, verifySSL bool) (<-chan JSONSuccess, <-chan JSONErr) {
    sCh, eCh := make(chan JSONSuccess, 10), make(chan JSONErr) // sCh gets some buffer, eCh is unbuffered
    wg := sync.WaitGroup{} // create a waitgroup
    go func() {
        defer func() {
            wg.Wait() // wait for all routines, so as to not close the channels too soon
            // cleanup when this routine exits
            close(sCh)
            close(eCh)
        }()
        // read the URLs. This loop will break once the channel is both empty and closed
        for URL := range uCh {
            wg.Add(1) // we're adding a new routine
            // start the routine for this particular URL
            go func(URL string) { // takes the current value as an argument
                defer wg.Done() // decrease waitgroup
                select {
                case <-ctx.Done():
                    return // the context was cancelled, indicating we're shutting down
                default:
                    // we have a URL, check it. WithTimeout returns a context and a cancel func
                    tctx, cancel := context.WithTimeout(ctx, timeout)
                    defer cancel() // release the timer once this routine returns
                    resp, err := checkHTTPSCert(tctx, URL, verifySSL)
                    // depending on the result, write to the appropriate channel
                    if err != nil {
                        eCh <- err
                    } else {
                        sCh <- resp
                    }
                }
            }(URL) // pass in URL value
        }
    }()
    return sCh, eCh // return channels right away
}

So we've added a waitgroup. Not to have this function wait for all the work to be done, but more because we need to make sure that we don't close the channels before all routines have returned (avoiding writes to a closed channel).

You can also see that the URL variable is passed in as an argument to our inner routine. If we didn't do this, the routine would just reference the outer URL variable, which could get reassigned before the routine gets created/executed. (Since Go 1.22, each loop iteration gets its own copy of the loop variable, so a range loop like this one is no longer affected. A variable that you reassign yourself still is.) We need to provide the inner routine with the specific value for URL we want to use. If we didn't do this, we could end up in situations that, in pseudo-code, boil down to something like this:

v := <-ch
go func() { fmt.Println(v) }()
v = <-ch // update v
// only now does the previously scheduled routine start, and it prints the new value of v

By masking the outer variable (URL, or v in the pseudo-code) with an argument, each routine has its own variable, set to the specific value at the point the routine was started:

v := <-ch
go func(v1 any) { fmt.Println(v1) }(v)
v = <-ch // update v
go func(v2 any) { fmt.Println(v2) }(v)

Now we'll get the expected output. It's still possible to get the output of v2 first, and then v1, but both will be printed regardless.

Another, more niche, situation where you may want to pass in variables as arguments is when you're masking imports. 99% of the time, this is the result of unfortunate package/variable naming, but it can sometimes be used as a safeguard, preventing you from making certain calls in the wrong place. This can be used in very rare (again: 99% of the time, this is a code smell), highly concurrent bits of code:

package foo

import (
    // a bunch of packages
    "my.complex.mod/concurrent/niche"
    "my.complex.mod/concurrent/danger"
)

Say the niche and danger packages should only be called in very, very special situations. In particular, our danger module could contain some CGO stuff, or make use of reflection and unsafe stuff. We need to provide it with some values to work on, but we have to ensure that this only happens in one place in the code. We can protect ourselves from accidentally using these packages in the wrong place like this:

func DoStuff(niche args.Niche, danger args.Danger) args.DangerRet {
    // this function can't call functions on the niche and danger imports, because the variables mask the package names
    // niche.SomeFuncFromPackage will look for a method on args.Niche. Same for danger.Foo()
    // the masking also applies inside closures declared in this function, so...

    // some code
    // we have determined we need the niche/danger packages:
    ch := make(chan args.DangerRet) // unbuffered
    defer close(ch)
    go func(narg args.Niche, darg args.Danger) {
        ch <- callDanger(narg, darg) // the one place that reaches the packages
    }(niche, danger)
    return <-ch // wait for routine and return
}

// callDanger lives at package level, where niche and danger still refer to the imports
func callDanger(narg args.Niche, darg args.Danger) args.DangerRet {
    nRet := niche.ProcessArgs(narg)
    return danger.DoStuff(darg, nRet) // call danger package
}

Situations like these are extremely rare, but sometimes you want to make sure some object isn't being used, not even read access, in any other routine (of course, you'd have to implement a bunch of other code for that), so you can then use it in a very specific way that goes beyond the normal framework the go runtime provides for safe, concurrent code. I am still using a fairly simple setup, because the point here is not the use of a go routine. I'm leveraging the routine mostly to ensure that I'm doing everything in as contained a way as possible. The overall function is still blocking, because I am using an unbuffered channel, so the routine and the return statement sync up. You could just use an anonymous function here, but considering that this is, as I said repeatedly, a very rare thing to do, chances are you'll still use a routine that will first do everything it needs to do to ensure the values you're about to operate on are not being used anywhere else, etc...

answered Nov 17, 2022 at 19:31

Elias Van Ootegem

At first glance

Having just skimmed some of the code you posted, a couple of things jumped out right away:

printHelp: If I were to provide a description of what my tool does, I'd use the flag.Usage function, and I'd certainly avoid multiple calls to fmt.Println or fmt.Printf. Go has raw string literals that can span multiple lines (using backtick as delimiter).
runtime.GOMAXPROCS(int(maxThreads)) is something that is fairly common in code written by people relatively new to golang. There seems to be some uncertainty/confusion as to what go routines actually are, and how much of a say you get in whether a routine spawns a thread or not. The long and short of it is that you don't get too much of a say. GOMAXPROCS limits how many threads execute Go code at the same time, but more threads can still be spawned (due to blocking syscalls). It's useful if you want to run a process in the background without hogging too many resources, but for CLI tools that you run and wait for a result, there's really little to no point in doing this. The go runtime isn't perfect, but it handles concurrency very well, and yes, that's concurrency: not every routine is tied to its own thread, so setting GOMAXPROCS will not limit the number of routines that are concurrently running anyway.
Coding standards are important. Back in the early days a lot of people criticised golang for being "too opinionated" (the whole gofmt enforcing tabs, K&R style brackets etc...). This has proven to be a great thing, increasing readability and collaboration across the ecosystem. Aside from the gofmt stuff, it's therefore strongly advised to stick to the Go Code Review Comments. Most notably: initialisms (which I see scattered throughout your code) should be capitalised accordingly: not Url or Json, but rather URL and JSON. You can see this in golang's own standard library in places like net/http with functions like ListenAndServeTLS.
There are more substantial issues which I'll cover next, most notably in the rather unfortunately named processUrlsParallel. Unfortunate because Parallel should be Concurrent, and the initialism should be URLs. The bigger issue there, though, is what we'll tackle now.

Use of buffered channels

The processUrlsParallel function uses buffered channels. That's great. Buffered channels are awesome, and very useful. However, you're resorting to using them because, from the caller's point of view, processUrlsParallel is not a parallel call, but rather a blocking call. You're returning channels, but by the time the caller can set to work with them, they've been filled to the brim already. Why bother? Why not return 2 slices instead? Functionally, they are interchangeable in your case. Using channels here is perfectly valid, but I'd change a couple of things:

I'd return directional channels, in this case I'd return channels that can only be used to read from. This serves to write self-documenting code, and in larger projects avoids silly bugs caused by someone mistakenly writing some value onto the wrong channel.
The URLs are CLI args, so you can pass them in as a slice, but if you were to read them from a file, you might want to pass them through a channel of their own. This will allow you to start processing the URLs while reading them. The function would then look something like this:
```
func processURLs(ctx context.Context, uCh <-chan string, timeout time.Duration, verifySSL bool) (<-chan JSONSuccess, <-chan JSONErr)
```

Now you'll notice I've shortened the variable names. This is common in go code. I personally like it this way, because it becomes a barrier that makes it harder to write incredibly long and verbose functions. It keeps my code tidier. The timeout type likewise communicates to the caller, and more importantly to the reader/maintainer, what this value is meant to express. The channels are all directional, which immediately tells me this function will read data from the uCh channel, and return 2 channels that I'm expected to read from. Now because I'm passing in a channel, it's impossible to do the same thing you are doing (which is to use a waitgroup, fill the channel buffers and close the channels). I have no way of knowing how many URLs I'm expected to process. Not to worry, we have a context, and we know we're done once our uCh is closed and empty. So let's implement that:

func processURLs(ctx context.Context, uCh <-chan string, timeout time.Duration, verifySSL bool) (<-chan JSONSuccess, <-chan JSONErr) {
    sCh, eCh := make(chan JSONSuccess, 10), make(chan JSONErr) // sCh gets some buffer, eCh is unbuffered
    go func() {
        defer func() {
            // cleanup when this routine exits
            close(sCh)
            close(eCh)
        }()
        // read the URLs. This loop will break once the channel is both empty and closed
        for URL := range uCh {
            select {
            case <-ctx.Done():
                return // the context was cancelled, indicating we're shutting down
            default:
                // we have a URL, check it. WithTimeout returns a context and a cancel func
                tctx, cancel := context.WithTimeout(ctx, timeout)
                resp, err := checkHTTPSCert(tctx, URL, verifySSL)
                cancel() // release the timer before moving on to the next URL
                // depending on the result, write to the appropriate channel
                if err != nil {
                    eCh <- err
                } else {
                    sCh <- resp
                }
            }
        }
    }()
    return sCh, eCh // return channels right away
}

That's it. Nice and easy. To use a slice, you can just replace the argument, and the loop in the routine, but the rest works all the same. If you want to support both, you can just create a helper function that makes a channel, and pushes the slice on there.

As far as the function itself goes, I suppose the biggest difference is here:

tctx, cancel := context.WithTimeout(ctx, timeout) // WithTimeout returns a context and a cancel func
checkHTTPSCert(tctx, URL, verifySSL)

Rather than passing through the timeout value, I'm providing a context, wrapped around the context that this function is using, and configured to have a timeout. When using the net/http package (and indeed many other packages that handle connections), a cancelled context (or a timed-out one) will result in pending requests/transactions getting cancelled. We don't really have to worry about that anymore. What's more: these contexts are wrapped. If the caller (in your case the main function) passes in a context that will be cancelled in case the application receives an INT/TERM signal (KILL can't be caught), that context cancellation is propagated automatically. Ongoing requests, routines, etc... can all be notified and gracefully return. For a CLI tool like this, it's not a massive deal, but again: for larger projects, this quickly becomes a necessity. Next up is your checkHttpsCertificate function (which receives, but seemingly doesn't use, the timeout argument!).

Other issues

The bulk of this function could be rewritten to be a bit nicer, and I might go through it in a bit more detail should I find some more spare time, but for now I'll use this particular function to point out some gripes I have looking at your code. Some of these issues are found throughout your code, like not returning early or having pointless else clauses:

return early - only if, no else

This function in particular is rather smelly, the way it ends with:

if len(connectionState().PeerCertificates) > 0 {
    // Relies on ordering: "The first element is the leaf certificate that the connection is verified against."
    firstCertificate := connectionState().PeerCertificates[0]

    toReturn.NotAfter = firstCertificate.NotAfter
    toReturn.NotBefore = firstCertificate.NotBefore
    return toReturn, nil
} else {
    return JsonSuccessResponse{}, errors.New("Encountered HTTPS certificate with no peer certificates (internal invariant broken: a9f6135d-93fc-4319-891d-c67bf5a149fc) for URL: " + url)
}

The first if ends in a return, why have that else there? It's completely pointless!

if len(connectionState().PeerCertificates) > 0 {
    // do stuff
    return toReturn, nil
}
return JsonSuccessResponse{}, errors.New("Encountered HTTPS certificate with no peer certificates (internal invariant broken: a9f6135d-93fc-4319-891d-c67bf5a149fc) for URL: " + url)

Likewise, this entire function keeps calling connectionState() over and over again. Just assign the return value to a variable, avoid redundant calls like this. So instead of:

connectionState := tlsConnection.ConnectionState

just write

cState := tlsConnection.ConnectionState()
if len(cState.PeerCertificates) > 0 {
}

Further up in this same function, you have a similar bit of code smell in this bit:

if err != nil {
    if terr, ok := err.(net.Error); ok && terr.Timeout() {
        return JsonSuccessResponse{}, errors.New("I/O Timeout (DF95AE47-F677-40D0-B1F3-209DA7266AAC). Failed to connect to " + url + " in " + strconv.FormatUint(timeoutInSeconds, 10) + " seconds.")
    } else {
        return JsonSuccessResponse{}, fmt.Errorf("SSL certificate err (%w): "+err.Error(), err)
    }

}

Again, the else is completely unnecessary. I also fail to see the point in wrapping an error, and at the same time concatenating its string representation to the format string. What if the error string contains %s, for example? This is not only pointless (errors can be unwrapped after all), it's also fragile. The first error here is a nightmare of concatenation, with manual formatting for things that you can have golang format for you by just using the correct type (time.Duration).

Additionally, golang errors should start with a lower-case character as per convention. Most linters will complain about this code. With that said, let's rewrite both errors:

errors.New("I/O Timeout (DF95AE47-F677-40D0-B1F3-209DA7266AAC). Failed to connect to " + url + " in " + strconv.FormatUint(timeoutInSeconds, 10) + " seconds.")
// becomes (fmt.Errorf, because errors.New doesn't take format arguments)
fmt.Errorf("connection to URL %s failed (timeout limit %s)", URL, timeout) // assumes timeout is of type time.Duration, will print 10s for 10 seconds...

fmt.Errorf("SSL certificate err (%w): "+err.Error(), err)
// becomes either one of:
fmt.Errorf("certificate error: %w", err) // wrapped, the caller can unwrap it
fmt.Errorf("certificate error: %s", err) // not wrapped, but err.Error() gets added

Force error checks

Now this brings me to the biggest issue I have with this function. People often complain how go forces you to check errors as return values, which leads to more verbose code. There's something to be said for this, for sure, but the way you're communicating errors is the worst of both worlds. Because, even in error cases, you're returning a valid JSONSuccessResponse object, it's very easy to picture a scenario where someone writes:

resp, _ := checkSSLCert(...) // error discarded, resp gets used as if all went well

Whether this call is successful or not, the resp variable will be set. Compare this to a version of this function which returns a pointer to JSONSuccess on success, and nil otherwise. All of your return statements look nicer:

return nil, fmt.Errorf("connection to URL %s failed (timeout limit %s)", URL, timeout)

And the caller is forced to check the error. Failing to do so will cause their code to crash on error:

resp, _ := checkSSLCert(...) // fails
resp.AccessField // runtime panic, resp is nil
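A minimal sketch of the pointer-returning variant, with the caller checking the error before touching the result. The JSONSuccess fields and the checkCert stand-in (which only validates its input and makes no network call) are assumptions for the sake of the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// JSONSuccess mirrors the shape implied by the review; the field set is assumed.
type JSONSuccess struct {
	NotBefore, NotAfter time.Time
}

// checkCert is a stand-in that only validates its input.
// On failure it returns nil, so there is no usable value to misuse.
func checkCert(url string) (*JSONSuccess, error) {
	if url == "" {
		return nil, errors.New("empty URL")
	}
	now := time.Now()
	return &JSONSuccess{NotBefore: now, NotAfter: now.Add(24 * time.Hour)}, nil
}

func main() {
	resp, err := checkCert("")
	if err != nil {
		fmt.Println("check failed:", err) // check failed: empty URL
		return
	}
	fmt.Println(resp.NotAfter)
}
```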

URGENT CHANGE

Just because I've scrolled through your code a few times, and each time felt like somewhere, somehow, a unicorn was dying, please, please, please fix outputResult. Change its current:

if outputToJsonP {
    mJson, err := json.Marshal(jsonResponse)
    if err != nil {
        return err
    }
    fmt.Println(string(mJson))
} else {

To remove that bloody else that wraps the entire function. Simply write

if outputToJsonP {
    mJson, err := json.Marshal(jsonResponse)
    if err != nil {
        return err
    }
    fmt.Println(string(mJson))
    // return early HERE
    return nil
}
// rest of the function here
return nil

Right, I thought this was going to be a short review. I digressed quite a bit, but I do hope that, despite my just typing out what essentially is a stream of consciousness, you gain something from this.

I am aware that I can sometimes come across as harsh/blunt. Just keep in mind that none of this criticism is in any way meant as a personal attack, or meant to discourage you. I've always felt that I learned more/faster after a particularly brutally honest review of my work.