Suppose I have a workload that branches out like a tree.
I have to process n A items.
Processing each A item requires processing of m B items.
This goes on for another level or two.
And I have the following functions:
func handler() {
	var aList []A
	var wg sync.WaitGroup
	for _, a := range aList {
		wg.Add(1)
		go func(a A) { // pass a as an argument to avoid loop-variable capture (pre-Go 1.22)
			defer wg.Done()
			processA(a) // NOTE: the error is currently discarded
		}(a)
	}
	wg.Wait()
}

func processA(a A) error {
	var wg sync.WaitGroup
	for _, b := range a.BList {
		wg.Add(1)
		go func(b B) {
			defer wg.Done()
			processB(b) // NOTE: the error is currently discarded
		}(b)
	}
	wg.Wait()
	return nil
}

func processB(b B) error {
	var wg sync.WaitGroup
	for _, c := range b.CList {
		wg.Add(1)
		go func(c C) {
			defer wg.Done()
			processC(c)
		}(c)
	}
	wg.Wait()
	return nil
}
Now the nature of all of these tasks is that they're BSP (Bulk Synchronous Parallel).
By that I mean that they need no communication amongst themselves.
And more importantly, if I had an infinite number of cores, there would be NO waiting in any thread/goroutine.
Now let's come back to Earth: I'm running this on a Lambda function that will have 2, 3, or 4 cores to offer.
My workload is still such that no memory limits will be hit.
Should I change my code to limit the number of goroutines, if what I want is speedup?
Is my code slower due to too much context switching?
Or is there no such thing as "too many goroutines"?
The answer is unsatisfying: It depends.
You need to incorporate profiling into your development cycle to understand the best way forward — CPU and trace profiles work well for this. We can't easily predict your workload.
If the work is entirely CPU-bound, having a number of goroutines equal to the number of CPUs will likely perform best. Then your outer logic sends work to each worker over a channel.
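A minimal sketch of that worker-pool pattern. The item type, the `processItem` function, and the fixed list of inputs are all placeholders for your real tree of A/B/C work; the point is the shape: a fixed number of goroutines, fed over a channel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processItem stands in for the real per-item work (hypothetical).
func processItem(n int) int {
	return n * n
}

func main() {
	items := []int{1, 2, 3, 4, 5, 6, 7, 8}

	numWorkers := runtime.NumCPU() // one worker goroutine per CPU
	jobs := make(chan int)
	results := make(chan int, len(items))

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- processItem(n)
			}
		}()
	}

	// The outer logic feeds work to the fixed pool.
	for _, n := range items {
		jobs <- n
	}
	close(jobs)
	wg.Wait()
	close(results)

	sum := 0
	for r := range results {
		sum += r
	}
	fmt.Println(sum)
}
```

For a tree-shaped workload like yours, you'd typically flatten the leaves (the C items) into the jobs channel rather than spawning a goroutine per node.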
Trace profiles will show how efficiently work is being scheduled for each CPU. CPU profiling could highlight where the hot-spots are.
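One way to wire both kinds of profile into a binary, assuming you can run it locally (on Lambda you'd write to /tmp and ship the files out). `doWork` is a stand-in for your real workload; the pprof and trace calls are the standard library APIs.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"runtime/trace"
)

// doWork is a stand-in for the real tree-shaped workload (hypothetical).
func doWork() {
	sum := 0
	for i := 0; i < 1e8; i++ {
		sum += i
	}
	_ = sum
}

func main() {
	// CPU profile: inspect later with `go tool pprof cpu.out`.
	cpu, err := os.Create("cpu.out")
	if err != nil {
		log.Fatal(err)
	}
	defer cpu.Close()
	if err := pprof.StartCPUProfile(cpu); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// Execution trace: inspect with `go tool trace trace.out`.
	tf, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer tf.Close()
	if err := trace.Start(tf); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	doWork()
}
```

The trace view in particular will show each P (processor) and whether your goroutines are actually running in parallel or mostly waiting to be scheduled.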
There's a talk by Dave Cheney where he shows three profiling techniques, including one where too many goroutines slowed down a CPU-bound program. It's worth a watch: https://www.youtube.com/watch?v=nok0aYiGiYA