I have the following Python code:
import hashlib
msg = 'abc'
print msg
sha256_hash = hashlib.sha256()
sha256_hash.update(msg)
hash_digest = sha256_hash.digest()
print hash_digest
And the corresponding Node.js version:
var crypto = require('crypto');
var msg = 'abc';
var shasum = crypto.createHash('sha256').update(msg);
var hashDigest = shasum.digest();
console.log(hashDigest);
However, the binary output differs slightly between the two:
- Node : �x����AA@�]�"#�a��z���a��
- Python:�x���AA@�]�"#�a��z���a��
The hex representations, though, are identical between the two libraries. Am I doing something wrong here?
TL;DR
The difference lies in how the two languages treat binary data and string types. Judged by the final binary output, both of your examples produce the same values. So let's examine the output of your two examples in hex:
In Python:
In Node:
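As a concrete reference point, here is a minimal sketch assuming Python 3, where hashlib digests are bytes (the counterpart of Python 2's str) and text is a separate str type (the counterpart of Python 2's u'' strings). It shows that the digest is just 32 values in the range 0-255 and prints it in hex. On the Node side, hashDigest.toString('hex') (or digest('hex')) should print the same hex string.

```python
import hashlib

msg = b"abc"  # Python 3: the hash input must be bytes, not str

digest = hashlib.sha256(msg).digest()

# The digest is bytes: an immutable array of integers in 0-255,
# the same kind of value that Python 2's str and Node's Buffer hold.
print(type(digest).__name__, len(digest))  # bytes 32
print(list(digest[:4]))                    # [186, 120, 22, 191]

# Hex is a printable rendering of exactly those bytes.
hex_digest = digest.hex()
print(hex_digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# hexdigest() is a shortcut for the same value.
assert hashlib.sha256(msg).hexdigest() == hex_digest
```

The printed hex value is the standard SHA-256 test vector for "abc", so both libraries agreeing on it confirms the underlying bytes are identical.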
The main thing to notice is that Python returns the result as a string. In Python 2, a str is simply an array of byte values (0-255). Node, however, stores the value in a Buffer, which also represents an array of values (0-255). That is the key difference here: Node does not return a string, because strings in Node are not arrays of single-byte characters but sequences of UTF-16 code units. Python 2 supports Unicode through a separate string type, written with the u'' prefix.

So now compare your two examples of printing the output, shortened for readability:
vs
The Python code says: write this array of bytes to the terminal. The Node code says something quite different: convert this array of bytes into a string, then write that string to the terminal. But the buffer holds binary data, not UTF-8 encoded text, so it cannot be decoded into a meaningful string; bytes that are not valid UTF-8 turn into replacement characters (�), which is the garbled result you see. If you want to compare the raw binary values as your terminal displays them, you need to give equivalent instructions in both languages.
vs
Here, process.stdout.write is a way to write binary values to the terminal rather than strings. Really, though, you should just compare the hashes in hex: hex is already a string representation of the binary value, and it is far easier to read than improperly decoded Unicode characters.
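To make the three options above concrete, here is a minimal Python 3 sketch. It assumes io.BytesIO as a stand-in for the terminal's byte stream (sys.stdout.buffer) so the result can be checked. Raw bytes pass through unchanged, decoding as UTF-8 either fails or silently substitutes replacement characters, and hex round-trips exactly.

```python
import hashlib
import io

digest = hashlib.sha256(b"abc").digest()

# 1. Raw write: a binary stream passes the bytes through untouched. This is
#    the Python 3 analogue of process.stdout.write(buffer); io.BytesIO stands
#    in for sys.stdout.buffer so the result can be checked.
sink = io.BytesIO()
sink.write(digest)
raw_ok = sink.getvalue() == digest

# 2a. Strict UTF-8 decoding: the digest is not valid UTF-8 (its first byte,
#     0xba, is a continuation byte and cannot start a character), so it raises.
try:
    digest.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 2b. Lenient decoding swaps invalid sequences for U+FFFD ("\ufffd"). That
#     loses information: re-encoding does not give the original bytes back.
garbled = digest.decode("utf-8", errors="replace")
utf8_ok = garbled.encode("utf-8") == digest

# 3. Hex: a lossless text form of the same bytes that round-trips exactly.
hex_ok = bytes.fromhex(digest.hex()) == digest

print(raw_ok, strict_ok, utf8_ok, hex_ok)  # True False False True
```

Only the raw write and the hex form preserve the digest, which is why hex is the safer format for comparing results across languages.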