I have a file which contains a matrix of numbers, as follows:
0 10 24 10 13 4 101 ...
6 0 52 10 4 5 0 4 ...
3 4 0 86 29 20 77 294 ...
4 1 1 0 78 100 83 199 ...
5 4 9 10 0 58 8 19 ...
6 58 60 13 68 0 148 41 ...
. .
. .
. .
What I am trying to do is sum each row and output the sum of each row to a new file (with the sum of each row on a new line).
I have tried doing it in Haskell using ByteStrings, but it is about 3 times as slow as the Python implementation. Here is the Haskell implementation:
import qualified Data.ByteString.Char8 as B
-- This function is for summing a row
sumrows r = foldr (\x y -> (maybe 0 (*1) $ fst <$> (B.readInt x)) + y) 0 (B.split ' ' r)
-- This function is for mapping the sumrows function to each line
sumfile f = map (\x -> (show x) ++ "\n") (map sumrows (B.split '\n' f))
main = do
    contents <- B.readFile "telematrix"
    -- I get the sum of each line, and then pack up all the results so that it can be written
    B.writeFile "teleDensity" $ (B.pack . unwords) (sumfile contents)
    print "complete"
This takes about 14 seconds for a 25 MB file.
Here is the Python implementation:
fd = open("telematrix", "r")
nfd = open("teleDensity", "w")
for line in fd:
    nfd.write(str(sum(map(int, line.split(" ")))) + "\n")
fd.close()
nfd.close()
This takes about 5 seconds for the same 25 MB file.
Any suggestions on how to improve the performance of the Haskell implementation?
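One variant that might be worth trying is sketched here. It is a rough illustration that has not been benchmarked, so it is not a measured fix. It keeps the whole pipeline in ByteStrings, replaces the lazy foldr with the strict left fold foldl', and uses B.words, B.lines and B.unlines instead of B.split and unwords. The names sumRow and sumFile are hypothetical. The output format differs slightly from the code above: each sum sits on its own line with no leading space, and a trailing newline in the input does not produce an extra 0 line.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Sum the integers on one line. Tokens that do not parse as an Int
-- are skipped, which has the same effect as counting them as 0.
sumRow :: B.ByteString -> Int
sumRow = foldl' (+) 0 . map fst . mapMaybe B.readInt . B.words

-- Produce one sum per input line, each terminated by a newline.
sumFile :: B.ByteString -> B.ByteString
sumFile = B.unlines . map (B.pack . show . sumRow) . B.lines

main :: IO ()
main = do
    -- File names are taken from the program above.
    contents <- B.readFile "telematrix"
    B.writeFile "teleDensity" (sumFile contents)
    putStrLn "complete"
```

For example, under these definitions sumRow (B.pack "6 0 52 10 4 5 0 4") evaluates to 81, the sum of the second sample row.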
The main reason for the poor performance was that I was running the program with runhaskell instead of compiling it first and then running the resulting executable. So I switched from:
to