The following is from Visual Studio's C# Interactive window:
> BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes("😀"))
"D8-3D-DE-00"
> BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes("🏴"))
"D8-3C-DF-F4"
> BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes("😀🏴󠁧󠁢󠁥󠁮󠁧󠁿"))
"D8-3D-DE-00-D8-3C-DF-F4-DB-40-DC-67-DB-40-DC-62-DB-40-DC-65-DB-40-DC-6E-DB-40-DC-67-DB-40-DC-7F"
Emoji smiley's code units are a surrogate pair as expected - "D8-3D-DE-00"
Emoji flag's code units are a surrogate pair as expected - "D8-3C-DF-F4"
Given that, shouldn't the code units of emoji smiley followed by emoji flag have been - "D8-3D-DE-00-D8-3C-DF-F4"?
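The latter isn't a simple black flag emoji but an Emoji Tag Sequence: the black flag (U+1F3F4) followed by the tag characters U+E0067 U+E0062 U+E0065 U+E006E U+E0067 (spelling "gbeng") and the cancel tag U+E007F, which together encode the flag of England. Each tag character lies outside the BMP, so each one takes its own surrogate pair (DB-40-DC-xx).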
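To see this from C# itself, here is a minimal sketch that decodes the bytes from your third result back into code points. It assumes you paste it into C# Interactive (or run it as top-level statements); the variable names `hex`, `bytes` and `text` are mine, and the byte string is copied from the output in the question.

```csharp
using System;
using System.Text;

// UTF-16BE bytes copied from the third result in the question.
string hex = "D8-3D-DE-00-D8-3C-DF-F4-DB-40-DC-67-DB-40-DC-62-DB-40-DC-65-DB-40-DC-6E-DB-40-DC-67-DB-40-DC-7F";

// Turn "D8-3D-..." back into a byte array, then into a .NET string.
byte[] bytes = Array.ConvertAll(hex.Split('-'), h => Convert.ToByte(h, 16));
string text = Encoding.BigEndianUnicode.GetString(bytes);

// Walk the string one code point at a time: a surrogate pair occupies
// two chars, anything else occupies one.
for (int i = 0; i < text.Length; i += char.IsSurrogatePair(text, i) ? 2 : 1)
{
    int codePoint = char.ConvertToUtf32(text, i);
    Console.WriteLine($"U+{codePoint:X4}");
}
```

This should list eight code points rather than two: U+1F600, U+1F3F4, then the six tag characters U+E0067, U+E0062, U+E0065, U+E006E, U+E0067 and U+E007F, which accounts for the 32 bytes you got.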
I wrote a PowerShell cmdlet, Get-CharInfo, some time ago; here's the result for your string (the column CodePoint contains the Unicode code point (U+hhhh) and the UTF-8 bytes; the column Description contains the surrogate pair, if any):