Why does (Haskell) Repa use only one CPU?


I have been working on a pathtracer using the Repa library. I recently refactored it to incorporate parallelism by using the monadic computeP. However, the performance gains were negligible, and htop showed the program still using only one CPU. To narrow down the problem, I opened ghci and ran the following:

~
❯ stack ghci --package repa
Configuring GHCi with the following packages: 
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /tmp/ghci12667/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array

No dice. According to htop, Repa still seems to use only one CPU core.

Moreover, execution time barely varies between sumP and sumS, slightly favoring sumS:

Prelude Data.Array.Repa System.Random> array = fromListUnboxed (Z :. 1000000) $ take 1000000 $ randoms (mkStdGen 0)
(0.01 secs, 0 bytes)
Prelude Data.Array.Repa System.Random> sumP array
AUnboxed Z [500140.92257232184]
(0.99 secs, 1,916,158,952 bytes)
Prelude Data.Array.Repa System.Random> sumS array
AUnboxed Z [500140.92257232184]
(0.93 secs, 2,348,156,248 bytes)

What am I missing? In case it matters, I am using Arch Linux:

~
❯ uname -a
Linux roskolnikov 4.11.9-1-ARCH #1 SMP PREEMPT Wed Jul 5 18:23:08 CEST 2017 x86_64 GNU/Linux

Update

Some of the comments indicate that I should use the -threaded option for ghci as indicated in the repa docs. I was under the (mis?)impression that ghci used -threaded by default. In any case, my program was already using these flags -- this is the snippet from the .cabal file:

executable write
  hs-source-dirs:      app
  main-is:             Write.hs
  ghc-options:         -Odph 
                       -rtsopts 
                       -threaded 
                       -fno-liberate-case 
                       -funfolding-use-threshold1000 
                       -funfolding-keeness-factor1000 
                       -fllvm 
                       -optlo-O3
  build-depends:       base 
                     , pathtracer
                     , repa
                     , JuicyPixels
  default-language:    Haskell2010

Moreover, I reran the commands in ghci using (I think) the correct ghci options:

~
❯ stack ghci\
 --package repa\
 --ghc-options -Odph\
 --ghc-options -rtsopts\
 --ghc-options -with-rtsopts=-N\
 --ghc-options -threaded\
 --ghc-options -fno-liberate-case\
 --ghc-options -funfolding-use-threshold1000\
 --ghc-options -funfolding-keeness-factor1000\
 --ghc-options -fllvm\
 --ghc-options -optlo-O3

Configuring GHCi with the following packages: 

when making flags consistent: warning:
    -O conflicts with --interactive; -O ignored.
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /tmp/ghci31252/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array

Still no dice.

I deeply appreciate any further assistance with this matter.

Answer (ethanabrooks)

GHCi appears to ignore some of these options (it explicitly warns that -O is ignored in interactive mode), so parallel computations like sumP only used one CPU core there. However, the purpose of this experiment was to use multiple cores for a personal project, and I was successful in that objective. The key, I think, was adding -with-rtsopts=-N to ghc-options in my .cabal file: -threaded links the threaded runtime, but without -N that runtime still defaults to a single core. The final ghc-options are as follows:

executable write
  hs-source-dirs:      app
  main-is:             Write.hs
  ghc-options:         -Odph 
                       -rtsopts 
                       -with-rtsopts=-N
                       -threaded 
                       -fno-liberate-case 
                       -funfolding-use-threshold1000 
                       -funfolding-keeness-factor1000 
                       -fllvm 
                       -optlo-O3
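
To confirm that a compiled executable really got the threaded runtime and more than one capability, a small check like the one below may help. It is only a sketch that uses base, not Repa; the helper name diagnose is made up for illustration and its messages are my own wording.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- | Hypothetical helper (not part of Repa or base): describe whether
-- parallel combinators such as sumP can be spread over several cores,
-- given whether the threaded RTS is linked in and the capability count.
diagnose :: Bool -> Int -> String
diagnose threaded caps
  | not threaded = "not linked with -threaded: Haskell code runs on one core"
  | caps <= 1    = "threaded, but only 1 capability: pass +RTS -N"
  | otherwise    = "threaded with " ++ show caps ++ " capabilities"

main :: IO ()
main = do
  caps  <- getNumCapabilities   -- what -N (or its absence) resolved to
  procs <- getNumProcessors     -- cores the runtime can see
  putStrLn ("processors seen by the RTS: " ++ show procs)
  -- rtsSupportsBoundThreads is True when the program uses the threaded RTS
  putStrLn (diagnose rtsSupportsBoundThreads caps)
```

Assuming the file is saved as Check.hs (a hypothetical name), building it with ghc -threaded -rtsopts -with-rtsopts=-N Check.hs should report several capabilities on a multi-core machine, while dropping -with-rtsopts=-N should report one unless the program is started with +RTS -N.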