Python-like Byte Array String representation in C#


Here is the method I am using to get this string representation:

    public static string ByteArrayToString(byte[] ba, string prefix)
    {
        StringBuilder hex = new StringBuilder(ba.Length * 2);
        foreach (byte b in ba)
        {
            if (prefix != null)
            {
                hex.Append(prefix);
            }
            hex.AppendFormat("{0:x2}", b);
        }
        return hex.ToString();
    }

Here is a sample string representation of a byte array (ByteArrayToString(arr, "\\x")):

\x00\x00\x00\x80\xca\x26\xff\x56\xbf\xbf\x49\x5b\x94\xed\x94\x6e\xbb\x7a\xd0\x9d
\xa0\x72\xe5\xd2\x96\x31\x85\x41\x78\x1c\xc9\x95\xaf\x79\x62\xc4\xc2\x8e\xa9\xaf
\x08\x22\xde\x22\x48\x65\xda\x1d\xca\x12\x99\x42\xb3\x56\xa7\x99\xca\x27\x7b\x2b
\x45\x77\x14\x5b\xe1\x75\x04\x3d\xdb\x68\x45\x46\x72\x61\x20\xa9\xa2\xd9\x50\xd0
\x63\x9b\x4e\x7b\xa4\xa4\x48\xd7\xa9\x01\xd1\x8a\x69\x78\x6c\x79\xa8\x84\x39\x42
\x32\xb3\xb1\x1f\x04\x4d\x06\xca\x2c\xd5\xa0\x45\x8d\x10\x44\xd5\x73\xdf\x89\x0c
\x25\x1d\xcf\xfc\xb8\x07\x6b\x1f\xfa\xae\x67\xf9\x00\x00\x00\x03\x01\x00\x01

Here is the representation I want (this is Python's output; ignore the different newline positions, since it is really all on a single line):

\x00\x00\x00\x80\xca&\xffV\xbf\xbfI[\x94\xed\x94n\xbbz\xd0\x9d\xa0r\xe5\xd2\x961
\x85Ax\x1c\xc9\x95\xafyb\xc4\xc2\x8e\xa9\xaf\x08"\xde"He\xda\x1d\xca\x12\x99B\xb
3V\xa7\x99\xca\'{+Ew\x14[\xe1u\x04=\xdbhEFra \xa9\xa2\xd9P\xd0c\x9bN{\xa4\xa4H\x
d7\xa9\x01\xd1\x8aixly\xa8\x849B2\xb3\xb1\x1f\x04M\x06\xca,\xd5\xa0E\x8d\x10D\xd
5s\xdf\x89\x0c%\x1d\xcf\xfc\xb8\x07k\x1f\xfa\xaeg\xf9\x00\x00\x00\x03\x01\x00\x0
1

The Python representation appears to convert bytes between (decimal) 32 and 126 to their ASCII representations instead of escaping all the bytes uniformly. How would I get the C# version to produce the same string output? I am reliant on a hash of this string output, so they need to be exactly identical.


There are 2 answers

Matt Johnson-Pint (BEST ANSWER)

Well, if you're certain of the logic of the encoding, then you can just implement it:

foreach (byte b in ba)
{
    if (b >= 32 && b <= 126)
    {
        hex.Append((char) b);
        continue;
    }

    ...

If you're looking for performance though, you should check out this answer, and possibly make some adjustments to one of the methods listed there.
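
For reference, here is a minimal sketch of how the loop above could be completed while also avoiding `AppendFormat`, assuming a small hex-digit lookup string is acceptable. The class name `ByteFormatting`, the method name `ByteArrayToMixedString`, and the `HexDigits` constant are made up for illustration. Note that this only handles the printable range: backslashes, quotes, tabs, and newlines are not treated specially here, so it may still differ from Python's output for inputs that contain them.

```csharp
using System;
using System.Text;

public static class ByteFormatting
{
    // Lowercase hex digits, indexed by nibble value (hypothetical helper constant).
    private const string HexDigits = "0123456789abcdef";

    // Printable ASCII (32..126) is copied through unchanged;
    // every other byte is written as prefix + two lowercase hex digits.
    public static string ByteArrayToMixedString(byte[] ba, string prefix)
    {
        // Worst case: every byte is escaped as prefix + 2 hex digits.
        int prefixLength = prefix == null ? 0 : prefix.Length;
        var sb = new StringBuilder(ba.Length * (prefixLength + 2));

        foreach (byte b in ba)
        {
            if (b >= 32 && b <= 126)
            {
                sb.Append((char)b);
                continue;
            }

            if (prefix != null)
            {
                sb.Append(prefix);
            }
            sb.Append(HexDigits[b >> 4]);    // high nibble
            sb.Append(HexDigits[b & 0x0F]);  // low nibble
        }

        return sb.ToString();
    }
}
```

Calling `ByteFormatting.ByteArrayToMixedString(new byte[] { 0x00, 0x80, 0xca, 0x26, 0xff, 0x56 }, "\\x")` should give `\x00\x80\xca&\xffV`.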

Michael B.

Matt's answer is correct and pointed me in the right direction for my own usage, but there are a few cases that need to be accounted for, which I had to figure out by trial and error.

Edit: This is for matching the output of str() on a bytes object in Python 3.12. It may not have been the case in 2015, but this was still the top answer when I was looking for help in 2023.

So far I have found four bytes that must be written as literal escape sequences rather than as hex: backslash (\\), newline (\n), carriage return (\r), and tab (\t).

The final format of your string may change based on the presence of single quotes and double quotes within the final hex string.

The default output wraps the string as b'...'. If the string contains a single quote and no double quote, it is wrapped as b"..." instead. If the string contains both ' and ", it is wrapped as b'...' and every single quote in the string is escaped as \'.

Examples:

  • default: b'x\xc1o'
  • single quote/s present, no double quote: b"x\xc1'o"
  • single quote/s and double quote/s present: b'x\xc1\'"o' (single quotes must be escaped)

foreach (byte b in ba)
{
    if (b == 92) hex.Append("\\\\");       // backslash
    else if (b == 10) hex.Append("\\n");   // newline
    else if (b == 13) hex.Append("\\r");   // carriage return
    else if (b == 9) hex.Append("\\t");    // tab
    else if (b >= 32 && b <= 126)
    {
        // Printable ASCII is emitted as-is.
        hex.Append((char) b);
    }
    else
    {
        // Everything else becomes a \xNN hex escape.
        hex.Append("\\x");
        hex.AppendFormat("{0:x2}", b);
    }
}

string hexformat = hex.ToString();

if (hexformat.Contains("'") && !hexformat.Contains("\""))
{
    // Only single quotes present: wrap in double quotes.
    hexformat = "b\"" + hexformat + "\"";
}
else if (hexformat.Contains("'") && hexformat.Contains("\""))
{
    // Both quote types present: escape single quotes, wrap in single quotes.
    hexformat = hexformat.Replace("'", "\\'");
    hexformat = "b'" + hexformat + "'";
}
else
{
    hexformat = "b'" + hexformat + "'";
}

// Could be optimized by checking the bytearray first for the presence of ' 
// and " instead of doing a Replace at the end.
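
The optimization mentioned in that last comment can be sketched as a single pass, assuming the quoting rules described above are the complete set. The class name `PyBytes`, the method name `Repr`, and the `HexDigits` constant are hypothetical names chosen for this example. The quote character is picked by scanning the raw bytes first, so no `Replace` is needed afterwards. Treat it as a sketch to check against your own Python output, not as a verified port.

```csharp
using System;
using System.Text;

public static class PyBytes
{
    // Lowercase hex digits, indexed by nibble value (hypothetical helper constant).
    private const string HexDigits = "0123456789abcdef";

    // Builds a b'...' / b"..." string following the rules described in this answer.
    public static string Repr(byte[] ba)
    {
        // Decide the wrapping quote up front by scanning the raw bytes.
        bool hasSingle = Array.IndexOf(ba, (byte)'\'') >= 0;
        bool hasDouble = Array.IndexOf(ba, (byte)'"') >= 0;
        char quote = (hasSingle && !hasDouble) ? '"' : '\'';

        // Worst case is 4 characters per byte (\xNN) plus b and two quotes.
        var sb = new StringBuilder(ba.Length * 4 + 3);
        sb.Append('b').Append(quote);

        foreach (byte b in ba)
        {
            if (b == quote || b == (byte)'\\')
            {
                // The wrapping quote and the backslash get a backslash escape.
                sb.Append('\\').Append((char)b);
            }
            else if (b == 9) sb.Append("\\t");
            else if (b == 10) sb.Append("\\n");
            else if (b == 13) sb.Append("\\r");
            else if (b >= 32 && b <= 126) sb.Append((char)b);
            else
            {
                // Everything else becomes a \xNN hex escape.
                sb.Append("\\x").Append(HexDigits[b >> 4]).Append(HexDigits[b & 0x0F]);
            }
        }

        sb.Append(quote);
        return sb.ToString();
    }
}
```

For example, `PyBytes.Repr(new byte[] { 0x78, 0xc1, 0x27, 0x6f })` should give `b"x\xc1'o"`. If you need the bare string without the b'' wrapper, as in the question, strip the first two characters and the last one.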