Receiving Unicode text with ASP.NET Core WebSockets

I'm trying to build a WebSocket server that can handle text messages, using the ASP.NET Core 1.1 WebSocket middleware. My strategy is to use a fixed-size buffer and keep reading and decoding until the WebSocket message ends. The middleware setup looks like this:

    public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory)
    {
        loggerFactory.AddConsole(Configuration.GetSection("Logging"));
        loggerFactory.AddDebug();

        var logger = loggerFactory.CreateLogger("websocket");

        app.UseWebSockets();

        app.Use(async (http, next) =>
        {
            if (!http.WebSockets.IsWebSocketRequest)
            {
                await next();
                return;
            }

            var websocket = await http.WebSockets.AcceptWebSocketAsync();

            while (websocket.State == WebSocketState.Open)
            {
                var buffer = new ArraySegment<byte>(new byte[32]);
                var charbuffer = new char[32];
                try
                {
                    var sb = new StringBuilder();
                    var decoder = Encoding.UTF8.GetDecoder();

                    //aha, we got a message
                    var detectResult = await websocket.ReceiveAsync(buffer, CancellationToken.None);
                    var receiveResult = detectResult;

                    while (!receiveResult.EndOfMessage)
                    {
                        var charLen = decoder.GetChars(buffer.Array, 0, receiveResult.Count, charbuffer, 0);
                        logger.LogInformation($"Decoded {charLen} char(s) from wire");
                        sb.Append(charbuffer, 0, charLen);
                        receiveResult = await websocket.ReceiveAsync(buffer, CancellationToken.None);
                    }
                    var charLenFinal = decoder.GetChars(buffer.Array, 0, receiveResult.Count, charbuffer, 0);
                    logger.LogInformation($"Decoded {charLenFinal} char(s) from wire");
                    sb.Append(charbuffer, 0, charLenFinal);

                    var message = sb.ToString();
                    logger.LogInformation($"decoded message: {message}");
                    await websocket.SendAsync(new ArraySegment<byte>(Encoding.UTF8.GetBytes("got it")), WebSocketMessageType.Text, true, CancellationToken.None);
                }
                catch (Exception ex)
                {
                    logger.LogError(ex.Message);
                    logger.LogError(ex.InnerException?.Message ?? string.Empty);
                }
            }
        });
    }

The code works well for text that includes only ASCII characters. But when I send a Unicode text message (Vietnamese) whose encoded length is longer than the buffer size, an exception occurs:

fail: websocket[0]
  The remote party closed the WebSocket connection without completing the close handshake.
fail: websocket[0]
     at System.Net.WebSockets.ManagedWebSocket.<ReceiveAsyncPrivate>d__60.MoveNext()
  --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
     at MvcApp.Startup.<>c__DisplayClass5_0.<<Configure>b__0>d.MoveNext()

The exception occurs at this line:

var detectResult = await websocket.ReceiveAsync(buffer, CancellationToken.None);

What could be the reason for this error?

There is 1 answer

vtortola

I think this:

var charLenFinal = decoder.GetChars(buffer.Array, 0, receiveResult.Count, charbuffer, 0);

should be:

var charLenFinal = decoder.GetChars(buffer.Array, 0, receiveResult.Count, charbuffer, 0, true);

since:

https://msdn.microsoft.com/en-us/library/125z2etb(v=vs.110).aspx

"Remember that the Decoder object saves state between calls to GetChars. When the application is done with a stream of data, it should set the flush parameter to true to make sure that the state information is flushed. With this setting, the decoder ignores invalid bytes at the end of the data block and clears the internal buffer."
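To illustrate the flush behaviour without a live socket, here is a minimal sketch that decodes a UTF-8 payload in fixed-size chunks, the way the receive loop in the question does. The names ChunkedUtf8Demo and DecodeInChunks are made up for this example, and the 32-byte chunk size just mirrors the buffer in the question. Two assumptions worth calling out: flush is set to true only for the last chunk, and the char buffer is sized with Encoding.UTF8.GetMaxCharCount rather than the byte count, because bytes carried over from the previous chunk can make a single call produce one more char than the chunk has bytes.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the question's code.
public static class ChunkedUtf8Demo
{
    // Decodes a UTF-8 payload chunk by chunk, as if each chunk were one
    // partial WebSocket read into a fixed-size buffer.
    public static string DecodeInChunks(byte[] payload, int chunkSize)
    {
        // One decoder for the whole message, so a character split across
        // two chunks is completed on the next call.
        var decoder = Encoding.UTF8.GetDecoder();

        // Worst-case char count for chunkSize bytes, including bytes the
        // decoder kept from the previous call.
        var chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var sb = new StringBuilder();

        for (int offset = 0; offset < payload.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, payload.Length - offset);

            // Plays the role of receiveResult.EndOfMessage in the question.
            bool isLast = offset + count >= payload.Length;

            // flush: true only on the final chunk, so the decoder state is
            // cleared before the next message.
            int charLen = decoder.GetChars(payload, offset, count, chars, 0, isLast);
            sb.Append(chars, 0, charLen);
        }

        return sb.ToString();
    }

    public static void Main()
    {
        string original = "Tiếng Việt có dấu, dài hơn bộ đệm 32 byte";
        byte[] payload = Encoding.UTF8.GetBytes(original);

        string decoded = DecodeInChunks(payload, 32);

        // Compare the chunked result with the original string.
        Console.WriteLine(decoded == original);
    }
}
```

In the middleware, the same idea would mean passing receiveResult.EndOfMessage as the flush argument on every GetChars call, which also removes the need for a separate final call after the loop. Whether this alone explains the exception from ReceiveAsync is not something the sketch can confirm.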