How are LSTMs used for data compression?


The following link shows the best-performing ML architectures for text compression on the Text8 dataset.

What does the output metric (BPC) mean? And how can an LSTM compress data?

Answer by maxy (accepted):

A language model predicts the next character (or token) given the text so far. It doesn't predict a single character but a probability for each possible next character. (A categorical probability distribution.)

If you have 256 possible characters, and the model predicts that the next character will be (50% "e", 6% "a", 1% "b", etc.), then you can compress the next character into a single bit (1) if it is actually "e", or into more than a single bit (0 followed by more bits) otherwise. If the model's predictions are good, you will on average need fewer than 8 bits per character (BPC).
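To make the "more than a single bit" idea concrete: an ideal entropy coder (arithmetic coding, for example) spends about -log2(p) bits on a symbol the model assigned probability p. A minimal sketch, using the illustrative probabilities from the paragraph above (the distribution is hypothetical, not from a real model):

```python
import math

# Hypothetical next-character distribution, as in the example above.
probs = {"e": 0.50, "a": 0.06, "b": 0.01}

# Ideal code length for a symbol with probability p is -log2(p) bits.
for ch, p in probs.items():
    print(ch, -math.log2(p))
# "e" at 50% costs exactly 1 bit; rarer characters cost more bits.
```

Averaging these code lengths, weighted by how often each character actually occurs, gives the bits-per-character figure.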

So if you always know the correct probabilities for the next character, you can come up with a near-perfect encoding scheme (arithmetic coding gets arbitrarily close to this ideal) to compress the text stream. Predicting the next character is a classification problem (the next character is the correct class). So if you have solved the classification problem, you have solved the compression problem (and the other way around).
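This equivalence can be seen directly in the numbers: the model's average cross-entropy loss on the text, measured in bits, is exactly the BPC an ideal coder would achieve. A small sketch with made-up per-step predictions (the probabilities are hypothetical):

```python
import math

# For each position in a toy text, the probability the model assigned
# to the character that actually occurred (hypothetical values).
p_true = [0.5, 0.25, 0.125, 0.125]

# Average cross-entropy in bits per character = BPC of an ideal coder.
bpc = sum(-math.log2(p) for p in p_true) / len(p_true)
print(bpc)  # → 2.25
```

A better classifier assigns higher probability to the true next character, so this average drops, and the compressed output shrinks accordingly.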

To understand this better, you have to learn about entropy, cross-entropy, and coding. For a good explanation, see MacKay's book on information theory, especially Chapter 4:

https://www.inference.org.uk/mackay/itila/book.html (PDF can be downloaded)