I have been working on a pathtracer using the Repa library. I recently refactored it to incorporate parallelism by using the monadic computeP. However, I found that the performance increases were negligible. Moreover, monitoring htop, it seemed like the program was still only using one CPU. To drill down on the problem, I opened ghci and ran the following:
~
❯ stack ghci --package repa
Configuring GHCi with the following packages:
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /tmp/ghci12667/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array
No dice. repa still seems to use only one CPU core, as indicated by htop.
Moreover, execution time barely varies between sumP and sumS, slightly favoring sumS:
Prelude Data.Array.Repa System.Random> array = fromListUnboxed (Z :. 1000000) $ take 1000000 $ randoms (mkStdGen 0)
(0.01 secs, 0 bytes)
Prelude Data.Array.Repa System.Random> sumP array
AUnboxed Z [500140.92257232184]
(0.99 secs, 1,916,158,952 bytes)
Prelude Data.Array.Repa System.Random> sumS array
AUnboxed Z [500140.92257232184]
(0.93 secs, 2,348,156,248 bytes)
What am I missing? In case it matters, I am using Arch Linux:
~
❯ uname -a
Linux roskolnikov 4.11.9-1-ARCH #1 SMP PREEMPT Wed Jul 5 18:23:08 CEST 2017 x86_64 GNU/Linux
Update
Some of the comments indicate that I should use the -threaded option for ghci, as indicated in the repa docs. I was under the (mis?)impression that ghci used -threaded by default. In any case, my program was already using these flags -- this is the snippet from the .cabal file:
executable write
  hs-source-dirs:      app
  main-is:             Write.hs
  ghc-options:         -Odph
                       -rtsopts
                       -threaded
                       -fno-liberate-case
                       -funfolding-use-threshold1000
                       -funfolding-keeness-factor1000
                       -fllvm
                       -optlo-O3
  build-depends:       base
                     , pathtracer
                     , repa
                     , JuicyPixels
  default-language:    Haskell2010
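A quick way to check what those flags actually give the program at run time is to print the runtime's capability count from inside it. The following is a minimal sketch using only base; rtsSummary is a made-up helper name, not something from the pathtracer, and treating rtsSupportsBoundThreads as "linked with -threaded" is the usual reading rather than something confirmed in this post.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- Hypothetical helper: describe what the runtime system can use.
-- Arguments: threaded RTS?, current capabilities, processors on the machine.
rtsSummary :: Bool -> Int -> Int -> String
rtsSummary threaded caps procs
  | not threaded = "non-threaded RTS: link with -threaded"
  | caps < 2     = "threaded RTS, 1 capability, " ++ show procs ++ " processors: try +RTS -N"
  | otherwise    = "threaded RTS, " ++ show caps ++ " capabilities, " ++ show procs ++ " processors"

main :: IO ()
main = do
  caps  <- getNumCapabilities   -- how many Haskell threads can run at once
  procs <- getNumProcessors     -- how many processors the OS reports
  putStrLn (rtsSummary rtsSupportsBoundThreads caps procs)
```

Built with -threaded alone, GHC's runtime defaults to a single capability, so the second message is the one to expect unless -N is passed, either on the command line as +RTS -N (which needs -rtsopts) or baked in with -with-rtsopts=-N.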
Moreover, I reran the commands in ghci using (I think) the correct ghci options:
~
❯ stack ghci \
    --package repa \
    --ghc-options -Odph \
    --ghc-options -rtsopts \
    --ghc-options -with-rtsopts=-N \
    --ghc-options -threaded \
    --ghc-options -fno-liberate-case \
    --ghc-options -funfolding-use-threshold1000 \
    --ghc-options -funfolding-keeness-factor1000 \
    --ghc-options -fllvm \
    --ghc-options -optlo-O3
Configuring GHCi with the following packages:
when making flags consistent: warning:
-O conflicts with --interactive; -O ignored.
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /tmp/ghci31252/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array
Still no dice:
I deeply appreciate any further assistance with this matter.
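Since GHCi itself reports that -O is ignored in interactive mode, a compiled program is probably a fairer test than the session above. Here is a minimal sketch of the same experiment as a standalone file, assuming the repa and random packages are installed; the file name SumBench.hs and the helper sums are made up for illustration, and it uses sumAllP/sumAllS (which return a plain number) instead of sumP/sumS.

```haskell
-- SumBench.hs (hypothetical file name)
import qualified Data.Array.Repa as R
import Data.Array.Repa (Z (..), (:.) (..))
import System.Random (mkStdGen, randoms)

-- Build an n-element unboxed array of random Doubles, as in the GHCi
-- session, then reduce it once in parallel and once sequentially.
sums :: Int -> IO (Double, Double)
sums n = do
  let xs  = take n (randoms (mkStdGen 0)) :: [Double]
      arr = R.fromListUnboxed (Z :. n) xs
  par <- R.sumAllP arr          -- parallel reduction (monadic)
  let sq = R.sumAllS arr        -- sequential reduction (pure)
  pure (par, sq)

main :: IO ()
main = do
  (par, sq) <- sums 1000000
  putStrLn ("sumAllP: " ++ show par)
  putStrLn ("sumAllS: " ++ show sq)
```

Compiling this with -O2 -threaded -rtsopts and running it as ./SumBench +RTS -N -s should make it easier to see in htop, and in the runtime statistics, whether more than one core is being used. The two results may differ in the last few digits, because the parallel version adds the numbers in a different order.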
For whatever reason, it appears that ghci ignores certain input options, and therefore monadic computations like sumP will only use one CPU core. However, the purpose of this experiment was to use multiple cores for a personal project that I was working on, and I was successful in that objective. The key, I think, was adding -with-rtsopts=-N in my .cabal file under ghc-options. The final ghc-options are as follows: