LZW compression generation file bigger than original

1.9k views Asked by At

I have a text file and I think I implemented the LZW algorithm correctly, but the compressed file is getting bigger than the original.

I'm not running the LZW in the bytes of the text, but in the string instead.

I build a dictionary [string:int] and run it. I wonder if I should make it with the bytes instead of strings.

It also runs line by line of the file, instead of building just one dictionary for the whole file.

This is my LZW

map<string, int> D;                      //dictionary

int init(){                              //init dictionary with all single chars
    D.clear(); rD.clear();
    f(i,,256){
        D[string(1, char(i))] = i + 1;
    }
    return 257;
}

void encode(char* file){                 //LZW encoding method
    ifstream in(file);
    if (!in.is_open()) {cout<<"Could not open file"<<endl; return;}
    else {
        ofstream out("compressed.txt");
        for(string text; getline(in, text); ){

            int value = init();
            vector<int> idx;
            string p = "", c = "", pc = "";

            for(int i = 0; i < text.size(); i++){
                c = text[i];
                let s = p + c;
                if(D.find(s) != D.end()){
                    p = p + c;


          }
            else{
                idx.push_back(D[p]);
                D[s] = value++;
                p = c;
            }
        }
        idx.push_back(D[p]);
        int len = idx.size();
        f(i,,len) {out<<idx[i]; if(i == len-1) out<<" 0"<<endl; else out<<" ";}
    }
    in.close();
    out.close();
    cout<<"File compressed successfully"<<endl;

}

}

it just receives the address of the file and compresses it to "compressed.txt" file.

1

There are 1 answers

2
Tiger Hwang On

The heart of LZW is translate duplicated bytes into symbol, then write the symbols to bitstream. The more duplicated bytes you have, the higher compression ratio you will get. And the packed bits will save many space.

When you write a symbol as int to ofstream in that way its likely to use more than 4 bytes. But with the packed bit, it should occupy from 9 bits to 16 bits depending on how you set it. I think this is the main reason your output is bigger than expected.

Good luck.