In the source code on Hackage I read this:

    instance Profunctor (->) where
      dimap ab cd bc = cd . bc . ab
      {-# INLINE dimap #-}
      lmap = flip (.)
      {-# INLINE lmap #-}
      rmap = (.)
      {-# INLINE rmap #-}

but the default implementations of `dimap`/`lmap`/`rmap` for `Profunctor` mean that one only needs to define either both `lmap` and `rmap`, or `dimap`; defining all of them is unnecessary.
Is there a reason why they are all defined, instead?
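
For reference, the defaults being discussed can be sketched with a local copy of the class. This is a stand-in written for illustration rather than an import of the real `Data.Profunctor`, so treat the exact layout as an assumption; the instance deliberately defines only `dimap` to show that `lmap` and `rmap` are then filled in by the defaults.

```haskell
module Main where

-- Local stand-in for the Profunctor class (illustrative copy, not the
-- library module itself). Each method has a default in terms of the others.
class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g

  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id

  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id

  -- Either dimap alone, or both lmap and rmap, is a complete definition.
  {-# MINIMAL dimap | (lmap, rmap) #-}

-- Only dimap is given here; lmap and rmap come from the defaults above.
instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab

main :: IO ()
main = do
  print (lmap (+ 1) show (41 :: Int))          -- prints "42"
  print (rmap length show (12345 :: Int))      -- prints 5
  print (dimap (+ 1) length show (99 :: Int))  -- prints 3
```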
As @FyodorSoikin comments, the intention was probably that the hand-coded `lmap` and `rmap` definitions would be more efficient than the default definitions based on `dimap`.

However, using the test program below, I tried defining the instance with all three of
`dimap`/`rmap`/`lmap`, `dimap` only, and `rmap`/`lmap` only, and the Core for the test functions `l`, `r`, and `b` was precisely the same in all three cases when compiled with `-O2`.

While it's possible that for more complicated examples the compiler will fail to optimize the default definitions of
`lmap f = dimap f id` and `rmap = dimap id`, it strikes me as exceedingly unlikely, and so the hand-coded `lmap` and `rmap` don't make any difference.

I think the explanation is that even extremely skilled Haskell programmers like Edward Kmett still underestimate the compiler and perform unnecessary hand-optimizations of their code.
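
To see why the optimizer has an easy job here, it may help to unfold the defaults by hand against the explicit `dimap`. The sketch below uses plain functions instead of the class; the names `dimapFn`, `lmapDefault`, `rmapDefault`, `lmapExplicit`, and `rmapExplicit` are hypothetical and exist only for this illustration. The comments carry the equational steps, and `main` spot-checks that both versions agree on a few inputs.

```haskell
module Main where

-- The explicit dimap from the instance for (->), as a plain function.
dimapFn :: (a -> b) -> (c -> d) -> (b -> c) -> (a -> d)
dimapFn ab cd bc = cd . bc . ab

-- Default-style definitions, written in terms of dimapFn.
-- lmapDefault f g = dimapFn f id g = id . g . f = g . f
lmapDefault :: (a -> b) -> (b -> c) -> (a -> c)
lmapDefault f = dimapFn f id

-- rmapDefault f g = dimapFn id f g = f . g . id = f . g
rmapDefault :: (b -> c) -> (a -> b) -> (a -> c)
rmapDefault = dimapFn id

-- Hand-coded definitions, as in the instance quoted in the question.
-- lmapExplicit f g = flip (.) f g = g . f
lmapExplicit :: (a -> b) -> (b -> c) -> (a -> c)
lmapExplicit = flip (.)

-- rmapExplicit f g = f . g
rmapExplicit :: (b -> c) -> (a -> b) -> (a -> c)
rmapExplicit = (.)

main :: IO ()
main = do
  let xs = [-3 .. 3] :: [Int]
  -- Both lines print True: the two styles agree on every sample input.
  print (map (lmapDefault (* 2) (+ 1)) xs == map (lmapExplicit (* 2) (+ 1)) xs)
  print (map (rmapDefault (* 2) (+ 1)) xs == map (rmapExplicit (* 2) (+ 1)) xs)
```

Once `dimapFn` and `id` are inlined, both styles reduce to the same composition, which is consistent with the identical `-O2` Core reported above.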
Update: In a comment, @4castle asked what happens without optimization. With the caveat that "because it improves `-O0` code" doesn't strike me as a sound argument for anything, I took a look.

In unoptimized code, the explicit `rmap` definition produces better Core by avoiding an extra composition with `id`, while the explicit `lmap` definition ends up producing Core that's about the same, or arguably worse.

As a consequence of the above definitions, the explicit `dimap` is better than the default because of the extra `flip`.

In another comment, @oisdk scolded me for my unrealistic test. I will point out that failure to inline recursion isn't really an issue here, since none of
`dimap`, `lmap`, or `rmap` is recursive. In particular, simply "using" one of these in a recursive manner, like `foo = foldr rmap id`, doesn't interfere with inlining or optimization, and the generated code for `foo` is the same with the explicit and default `rmap`.

Also, splitting the class/instance from the
`l`/`r` definitions into separate modules makes no difference to my test program, nor does splitting it up into three modules (the class, the instance, and `l`/`r`), so it doesn't seem like inlining across module boundaries is much of a problem here.

For unspecialized polymorphic usage, I guess it'll come down to the
`Profunctor (->)` dictionary that's generated. What I see there seems to show that an explicit `dimap` with default `lmap` and `rmap` produces better code than the alternatives. The problem seems to be that `flip (.)` isn't being properly optimized here, so the explicit `lmap` definition is counterproductive.

If someone has an example where these explicit definitions generate better `-O2` code, it would make a great alternative answer.

Here's my test program. I compiled with `ghc -O2 Profunctor.hs -fforce-recomp -ddump-simpl -dsuppress-all -dsuppress-uniques`.
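
A minimal sketch of what such a test file might look like follows. The names `l`, `r`, `b`, and `foo` come from the discussion above, but their exact definitions, the local copy of the class, and the `main` function are assumptions made for illustration, not the original listing. To compare the three variants, delete the `lmap`/`rmap` lines or the `dimap` line from the instance and diff the resulting Core.

```haskell
-- Profunctor.hs (hypothetical reconstruction for experimentation)
-- No export list, so l, r, b and foo stay visible in the dumped Core.
module Main where

-- Local stand-in for the class, with the usual defaults.
class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g
  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id
  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id
  {-# MINIMAL dimap | (lmap, rmap) #-}

-- Variant 1: all three methods hand-coded, as in the question.
-- Variant 2: keep only dimap. Variant 3: keep only lmap and rmap.
instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  {-# INLINE dimap #-}
  lmap = flip (.)
  {-# INLINE lmap #-}
  rmap = (.)
  {-# INLINE rmap #-}

-- Test functions specialised to (->); inspect their Core in the dump.
l :: (a -> b) -> (b -> c) -> (a -> c)
l = lmap

r :: (b -> c) -> (a -> b) -> (a -> c)
r = rmap

b :: (a -> b) -> (c -> d) -> (b -> c) -> (a -> d)
b = dimap

-- "Recursive use" of rmap via a fold, as mentioned in the update.
foo :: [a -> a] -> a -> a
foo = foldr rmap id

main :: IO ()
main = do
  print (l (+ 1) show (41 :: Int))         -- prints "42"
  print (r length show (12345 :: Int))     -- prints 5
  print (b (+ 1) length show (99 :: Int))  -- prints 3
  print (foo [(+ 1), (* 2)] (5 :: Int))    -- prints 11
```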