When debugging an operation on UTF-8 strings, I sometimes want to see the string representation of a given ReadOnlySpan<byte>, so I created a static function to help with that. However, one of the approaches it uses doesn't work as expected, and I wonder why the resulting string is incomprehensible.
//#define FORCE_NOT_UTF8
using MemoryMarshal = System.Runtime.InteropServices.MemoryMarshal;
using Unsafe = System.Runtime.CompilerServices.Unsafe;
using Encoding = System.Text.Encoding;
static string ForgeString(ReadOnlySpan<byte> utf8Runes)
{
    Span<char> buffer = utf8Runes.Length > 1024
        ? new char[utf8Runes.Length]
        : stackalloc char[1024];
#if FORCE_NOT_UTF8
    Encoding.UTF8.GetChars(utf8Runes, buffer);
#else
    if (Encoding.Default.BodyName != Encoding.UTF8.BodyName)
    {
        Encoding.UTF8.GetChars(utf8Runes, buffer);
    }
    else if (buffer.Length is <= 1024)
    {
        MemoryMarshal.Cast<byte, char>(utf8Runes).CopyTo(buffer);
    }
    else
    {
        ref readonly var elmnt0 = ref utf8Runes[0];
        ref var ptrSrc = ref Unsafe.AsRef(in elmnt0);
        ref var ptrDst = ref buffer[0];
        for (int i = 0; ptrSrc is not default(byte) && i < utf8Runes.Length; i++)
        {
            ptrDst = (char) ptrSrc;
            ptrSrc = ref Unsafe.Add(ref ptrSrc, 1);
            ptrDst = ref Unsafe.Add(ref ptrDst, 1);
        }
    }
#endif
    Index end = buffer.IndexOf(default(char)) is int index and not -1 ? new(index) : Index.End;
    return new(buffer[..end]);
}
string result1 = default!;
string result2 = default!;
result1 = ForgeString("foobar"u8);
result2 = ForgeString("james james james (...repeating 166 times)"u8);
Console.WriteLine(result1);
Console.WriteLine(result2);
// In order to get string result3, it's necessary to recompile with the compiler symbol FORCE_NOT_UTF8 defined.
The for loop prints normally, 'james' a bunch of times, but with the marshal cast, 'foobar' produces '潦扯牡'.
What's happening behind Cast<TFrom, TTo> to create this unexpected sequence? I thought the idea was for it to literally (T)-cast each element of the given span.
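To isolate the behavior, here is a smaller reproduction, written as a sketch and assuming a little-endian machine. It compares the span returned by MemoryMarshal.Cast with a plain per-element (char) conversion. The variable names (bytes, viewed, widened) are illustrative only and not part of the function above.

```csharp
using System;
using System.Runtime.InteropServices;

// Six ASCII bytes: 66 6F 6F 62 61 72
ReadOnlySpan<byte> bytes = "foobar"u8;

// Cast returns a view over the same memory with a different element type,
// so six 1-byte elements show up as three 2-byte chars.
ReadOnlySpan<char> viewed = MemoryMarshal.Cast<byte, char>(bytes);
Console.WriteLine(viewed.Length);                    // 3
Console.WriteLine(((int)viewed[0]).ToString("X4"));  // 6F66 on little-endian ('f' = 0x66, 'o' = 0x6F)

// Per-element conversion, which is what the for loop in ForgeString does.
char[] widened = new char[bytes.Length];
for (int i = 0; i < bytes.Length; i++)
{
    widened[i] = (char)bytes[i];
}
Console.WriteLine(new string(widened));              // foobar
```

U+6F66 is '潦', the first character of the unexpected output, so the cast path appears to pair up adjacent bytes rather than widen each one.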