Haskell - Lempel-Ziv 78 Compression - very slow, why?

165 views Asked by At

So I've finished my Lempel-Ziv compression/decompression, but compared to C++ its unbelievably slow, I've tried it on http://norvig.com/big.txt file, but my program can't process it, while in C++ it took only 1 second. Could any of you Haskell Gurus look at my code and tell me if there are some obvious flaws?

After adding prepending instead of appending, I managed to reduce the time from 16 seconds to 0.4!

Haskell's laziness is so deceiving, it was printing 'compression finished' almost immediately, but in fact it was the compression that made the program run so slow

import System.IO  
import Control.Monad
import qualified Data.Map as Map
import Debug.Trace
main = do  
  contents <- readFile "plik.txt"  
  let compressed = reverse $ compress contents Map.empty 1 "" []  
  let decompressed = reverse $ decompress compressed Map.empty 1 ""
  --print $ contents
  print $ length compressed
  print $ length decompressed
  --print $ contents == decompressed
compress :: String -> Map.Map String Int -> Int -> String -> [(Int,Char)]-> [(Int,Char)]
compress (s:x) m i c out = do
    if Map.member (c++[s]) m == False 
        then do 
            if c == ""
                then do
                    let newMap = Map.insert [s] i m
                    compress x newMap (i+1) c ((0,s):out)
                else do
                    let newMap = Map.insert (c++[s]) i m
                    compress x newMap (i+1) "" ((newMap Map.! c, s):out)
        else compress x m i (c++[s]) out
    
compress s m i c out = compress2 out

compress2 ::  [(Int,Char)]-> [(Int,Char)]
compress2 out = trace("COMPRESSION FINISHED") out



decompress :: [(Int,Char)] -> Map.Map Int String -> Int -> String -> String
decompress (s:x) m i out = do
    if fst s ==  0
        then do
            let newMap = Map.insert i [snd s] m
            decompress x newMap (i+1) ((snd s):out)
        else do
            let newMap =  Map.insert i ((m Map.! fst s)++[snd s]) m
            decompress x newMap (i+1) ((snd s):(reverse (newMap Map.! fst s))++out)
    
decompress s m i out = out

decompress2 ::  String -> String
decompress2 out =  trace("DECOMPRESSION FINISHED") out
1

There are 1 answers

1
MathematicalOrchid On

As others have already said in the comments:

  • Avoid using String; it uses a lot of RAM and it's quite slow to process. Use ByteString (if you're expecting to process raw binary data) or Text (if you're expecting Unicode text).

  • Don't append to lists. (String is also a list — but don't use String in the first place.) Prepend if it's easy. If not, use Data.Sequence, Data.Set or Data.Map as appropriate. Or maybe even Text, ByteString or Vector.

  • If you see a Haskell program that's really slow and uses buckets of RAM even though the input and output files are tiny, you probably have something somewhere being too lazy.

As an example, I had a program to produce a byte histogram. It took about 20 minutes and consumed 8 GB of RAM. I changed a data constructor to be strict, simply by adding a single ! to it. Now the program takes a fraction of a second! Laziness can make an absurdly large difference if you get it wrong.

Looking at the code you've posted, that probably covers it all. (I'm not sure exactly what changes you've tried and where you're up to now.)