In Chapter 8 of thinking with types I learned that the fmap Sum part of
fastSum :: [Int] -> Int
fastSum = getSum . mconcat . fmap Sum
has a O(n) runtime cost, whereas using coerce instead avoids that overhead.
I know newtypes have no representation overhead, but what I do not understand is what is the runtime effect of mapping a newtype constructor over a list. I thought this would only have a compile-time overhead, and it should be just O(1), since the compiler only needs to know the type of the fmap SomeNewtypeCtr expression.
 
                        
It's hard to understand what Haskell exactly does in such cases, because of optimization. Haskell only mandates what is the result, not how it is obtained.
Some possibilities:
fmap Sumperforms a list scan, and copies each cell and each element;fmap Sumperforms a list scan, and copies each cell but not the elements (the new cells point to the old elements);fmap Sumdoes not scan the list at all and is automatically optimized to a no-op.I tried godbolt.org to inspect the generated Core and assembly. Note that it still uses an old GHC 8.6.2. Still, I turned optimization on (
-O2) and compiledand obtained
Hence
foobecomes the identity function, suitably coerced, producing the assemblywhich should be, roughly speaking, the equivalent of an immediate return in the GHC runtime system.
Concluding:
Data.Coerce.coerceis very nice since it ensures a no-op, but even plain Haskell, after optimization, can be surprisingly efficient.