Reconstruct the source file from string output


I use stepic3 to hide some data. Multiple files are compressed into a zip file, which will be the hidden message. However, when I use the following code

from PIL import Image
import stepic

def enc_():
    im = Image.open("secret.png")
    text = str(open("source.zip", "rb").read())
    im = stepic.encode(im, text)
    im.save('stegolena.png','PNG')

def dec_():
    im1=Image.open('stegolena.png')
    out = stepic.decode(im1)
    plaintext = open("out.zip", "w")
    plaintext.write(out)
    plaintext.close()

I get the error

Complete traceback:

Traceback (most recent call last):
  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\simple.py", line 28, in <module>
    enc_()
  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\simple.py", line 8, in enc_
    im = stepic.encode(im, text)
  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 89, in encode
    encode_inplace(image, data)
  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 75, in encode_inplace
    for pixel in encode_imdata(image.getdata(), data):
  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 58, in encode_imdata
    byte = ord(data[i])
TypeError: ord() expected string of length 1, but int found

I tried two ways to convert the file contents to a string.

text = open("source.zip", "r", encoding='utf-8', errors='ignore').read()

with output

PKn!K\Z

sec.txt13 byte 1.10mPKn!K\Z

sec.txtPK52

or

text = str(open("source.zip", "rb").read())

with output

b'PK\x03\x04\x14\x00\x00\x00\x00\x00n\x8f!K\\\xac\xdaZ\r\x00\x00\x00\r\x00\x00\x00\x07\x00\x00\x00sec.txt13 byte 1.10mPK\x01\x02\x14\x00\x14\x00\x00\x00\x00\x00n\x8f!K\\\xac\xdaZ\r\x00\x00\x00\r\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xb6\x81\x00\x00\x00\x00sec.txtPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x005\x00\x00\x002\x00\x00\x00\x00\x00'

I used the second and I got the same string back from the retrieval.

In order to reconstruct the zip file (output is string), I use the code

plaintext = open("out.zip", "w")
plaintext.write(output)
plaintext.close()

but the written file is reported as corrupted when I try to open it. When I try to reproduce what was written to it, with either

output = output.encode(encoding='utf_8', errors='strict')

or

output = bytes(output, 'utf_8')

the output is

b"b'PK\\x03\\x04\\x14\\x00\\x00\\x00\\x00\\x00n\\x8f!K\\\\\\xac\\xdaZ\\r\\x00\\x00\\x00\\r\\x00\\x00\\x00\\x07\\x00\\x00\\x00sec.txt13 byte 1.10mPK\\x01\\x02\\x14\\x00\\x14\\x00\\x00\\x00\\x00\\x00n\\x8f!K\\\\\\xac\\xdaZ\\r\\x00\\x00\\x00\\r\\x00\\x00\\x00\\x07\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xb6\\x81\\x00\\x00\\x00\\x00sec.txtPK\\x05\\x06\\x00\\x00\\x00\\x00\\x01\\x00\\x01\\x005\\x00\\x00\\x002\\x00\\x00\\x00\\x00\\x00'"

which is different from the source file.

What do I have to do to reconstruct the embedded file faithfully?

Answer by Reti43 (best answer)

When you read a file in rb mode, you get a bytes object. If you print it, it may look like a string, but each individual element is actually an integer.

>>> my_bytes = b'hello'
>>> my_bytes
b'hello'
>>> my_bytes[0]
104

This explains the error

  File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 58, in encode_imdata
    byte = ord(data[i])
TypeError: ord() expected string of length 1, but int found

ord() expects a string of length one, so you have to convert all the bytes to strings. Unfortunately, str(some_byte_array) doesn't do what you think it does. It creates a literal string representation of your bytes object, including the preceding "b" and the surrounding quotes.

>>> string = str(my_bytes)
>>> string[0]
'b'
>>> string[1]
"'"
>>> string[2]
'h'
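This also accounts for the corrupted out.zip in the question. A minimal sketch, using four made-up sample bytes rather than the real archive, shows that the str() form is longer than the data and that encoding it back does not recover the original bytes:

```python
# Illustrative sample only: the first four bytes of a zip local file header.
raw = b'PK\x03\x04'

# str() returns the printable representation of the bytes object,
# including the leading b, the quotes, and the backslash escapes.
as_text = str(raw)

print(len(raw))                   # 4
print(len(as_text))               # 13
print(as_text.encode('utf_8'))    # b"b'PK\\x03\\x04'"
print(as_text.encode('utf_8') == raw)  # False
```

Writing as_text to a file therefore stores the representation of the data, not the data itself.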

What you want instead is to convert each byte (integer) to a string individually. map(chr, some_byte_array) will do this for you. We have to do this simply because stepic expects a string. When it embeds a character, it does ord(data[i]), which converts a string of length one to its Unicode code point (an integer).

Furthermore, we can't leave our string as a map object, because the code needs to calculate the length of the whole string before embedding it. Therefore, ''.join(map(chr, some_byte_array)) is what we have to use for our input secret.
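As a small illustration of the conversion just described, the sketch below uses a hypothetical six-byte sample in place of the contents of source.zip. It checks that the joined string has exactly one character per byte and that ord() recovers every original value. The comparison with decode('latin-1') is an aside, not something stepic requires: that codec happens to map each byte to the code point with the same value.

```python
# Hypothetical sample standing in for open("source.zip", "rb").read().
data = bytes([80, 75, 3, 4, 0, 255])

# One character per byte; join() turns the lazy map object into a real str,
# so len() and indexing work on it.
text = ''.join(map(chr, data))

print(len(text) == len(data))                 # True
print([ord(c) for c in text] == list(data))   # True
print(text == data.decode('latin-1'))         # True
```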

For extraction, stepic does the opposite. It extracts the secret byte by byte and turns each byte into a one-character string with chr(byte). In order to reverse that, we need to get the ordinal value of each character individually. map(ord, out) should do the trick. And since we want to write our file in binary, further feeding that into bytearray() will take care of everything.
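The reverse step can be sketched the same way. Here out is a hypothetical string standing in for what stepic.decode() returns, chosen so that every character has a code point below 256:

```python
# Hypothetical stand-in for the string returned by stepic.decode().
out = 'PK\x03\x04\x00\xff'

# ord() gives the integer value of each character;
# bytearray() packs those integers back into binary data.
recovered = bytearray(map(ord, out))

print(bytes(recovered) == b'PK\x03\x04\x00\xff')  # True
```

Note that bytearray() raises a ValueError if any character has a code point above 255, so this only works on strings that were built from bytes in the first place.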

Overall, these are the changes you should make to your code.

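from PIL import Image
import stepic

def enc_():
    im = Image.open("secret.png")
    # Read the zip as bytes, then map each byte to a one-character string.
    with open("source.zip", "rb") as f:
        text = ''.join(map(chr, f.read()))
    im = stepic.encode(im, text)
    im.save('stegolena.png', 'PNG')

def dec_():
    im1 = Image.open('stegolena.png')
    out = stepic.decode(im1)
    # Map each character back to its byte value and write in binary mode.
    with open("out.zip", "wb") as plaintext:
        plaintext.write(bytearray(map(ord, out)))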
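To check the two conversions without involving an image at all, something like the following sketch may help. It leaves stepic out entirely and only exercises the bytes-to-string and string-to-bytes steps on a small zip archive built in memory. The helper names bytes_to_text and text_to_bytes and the payload are made up for this example.

```python
import io
import zipfile

def bytes_to_text(data):
    # Hypothetical helper: one character per byte, as done before embedding.
    return ''.join(map(chr, data))

def text_to_bytes(text):
    # Hypothetical helper: the inverse, as done after extraction.
    return bytes(map(ord, text))

# Build a small zip archive in memory instead of reading source.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('sec.txt', 'example payload')
original = buffer.getvalue()

# Round trip through the string form that would be handed to the library.
restored = text_to_bytes(bytes_to_text(original))
print(restored == original)  # True

# The restored bytes still open as a valid archive.
with zipfile.ZipFile(io.BytesIO(restored)) as archive:
    print(archive.testzip())        # None means no corrupt members were found
    print(archive.read('sec.txt'))  # b'example payload'
```

If the first comparison prints True but the extracted out.zip is still corrupted, the problem lies in the embedding or extraction step rather than in these conversions.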