I am developing an application in Python that communicates with a device over two-wire, half-duplex RS-485. I have enough of the application working that I can run some performance tests. I am using a laptop with a USB-to-485 converter, and the communication is set up as 9600,N,8,1.
For my speed test I send a message with a total length of 10 bytes, including the check byte, then wait for the 13-byte reply, decoding it as it comes in. When the response is complete, I send the next message. I repeat this 100 times as fast as possible, which takes 2.895 seconds.
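For reference, the timing loop boils down to something like this (a stripped-down sketch using pyserial, with the real message building and decoding stubbed out):

```python
import time
import serial

# Placeholder for the real 10-byte message (check byte included).
request = bytes(10)

port = serial.Serial("/dev/ttyUSB0", baudrate=9600, bytesize=8,
                     parity=serial.PARITY_NONE, stopbits=1)

start = time.perf_counter()
for _ in range(100):
    port.write(request)
    reply = port.read(13)   # blocks until the full 13-byte reply arrives
    # ... decode 'reply' here ...
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f} s for 100 round trips")
```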
From this I calculate that I am transmitting/receiving 23 bytes * 100 iterations / 2.895 seconds = 794 bytes/s.
If I understand it correctly, serial port communication at 9600 N-8-1 has 1 start bit, 8 data bits, and 1 stop bit, so each byte carries 2 bits of framing overhead. The theoretical maximum transmission rate is therefore (9600 bits/s) * (8 data bits / 10 transmission bits) * (1 byte / 8 bits) = 960 bytes/s.
My program is transmitting/receiving at a combined rate of 794 bytes/s out of a possible 960 bytes/s, or 82.7%.
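Checking the arithmetic:

```python
bits_per_byte = 1 + 8 + 1            # start + 8 data + stop
theoretical = 9600 / bits_per_byte   # 960.0 bytes/s on the wire
measured = 23 * 100 / 2.895          # ~794.5 bytes/s
print(measured / theoretical)        # ~0.827
```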
Should I be able to achieve near 100% of the 960 bytes/s, or is it typical to have this much bandwidth go unused?
You're going to give up some time when the direction of communication is reversed. So there's some "dead time" between when one side receives the last stop bit and when it loads the first response byte into the UART transmitter and starts driving the first start bit.
I calculate that this dead time is about 5 ms per two-way run (almost 5 byte times, i.e. roughly 48 bit times counting framing overhead), or about 0.5 seconds of your 2.895 total seconds. This isn't bad, but it could be a little better. However, I'm not sure you'll get much improvement without writing your own UART driver.
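To show where that number comes from, subtract the pure wire time from your measured total:

```python
wire_bits = 23 * 10                  # 23 bytes per run, 10 bits each with framing
wire_time = 100 * wire_bits / 9600   # ~2.396 s of pure transmission
dead_time = 2.895 - wire_time        # ~0.499 s total
print(dead_time / 100 * 1000)        # ~5 ms per two-way run (~48 bit times)
```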
(This all assumes, of course, that the clocks both computers are using are crystal accurate. They don't have to be exact: UARTs at 8N1 can tolerate up to about 2% clock difference between the two ends, so the link keeps working even when the actual bit rate is slightly off nominal.)
In embedded land, if we wanted to do this with the absolute minimum bandwidth loss, we'd write the driver as a standard two-way full-duplex driver, with some way to know when to switch directions (e.g. on packet boundaries). This driver would then only push bytes in the correct direction, leaving the other queue unused.
At the user (application) level, we'd have to make sure those queues were never starved. That means that, in your example, the 13-byte response packet would need to be ready to go before the 10-byte incoming packet is fully received. The other end would need to do the same.
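As a rough sketch of that idea on the responding side (in Python for consistency with your application; `prepare_reply` and `checksum_ok` are hypothetical stand-ins for the real protocol code):

```python
import serial

def checksum_ok(frame: bytes) -> bool:
    return True               # hypothetical: verify the check byte here

def prepare_reply(frame: bytes) -> bytes:
    return bytes(13)          # hypothetical: build the 13-byte response here

port = serial.Serial("/dev/ttyUSB0", 9600)
buf = bytearray()
reply = b""
while True:
    buf += port.read(1)       # decode incrementally, byte by byte
    if len(buf) == 9:         # everything but the check byte has arrived...
        reply = prepare_reply(bytes(buf))   # ...so stage the reply early
    elif len(buf) == 10:      # full 10-byte request is in
        if checksum_ok(bytes(buf)):
            port.write(reply) # minimal turnaround: reply was built already
        buf.clear()
```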
With larger machines, the usual practice is to "coalesce" several packets in each direction and transmit them consecutively, so as to minimize the number of times that you have to change direction. This increases latency, though, and requires more memory, which could be a problem on a small microcontroller with only a couple of kB of RAM.
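Coalescing might look something like this from your Python side, assuming the device can accept several queued requests before you turn the line around (`build_request` is a hypothetical stand-in):

```python
import serial

def build_request(i: int) -> bytes:
    return bytes(10)             # hypothetical: build one 10-byte request

port = serial.Serial("/dev/ttyUSB0", 9600)
BATCH = 4
batch = b"".join(build_request(i) for i in range(BATCH))
port.write(batch)                # one transmit burst, one direction change...
replies = port.read(13 * BATCH)  # ...then all the replies in one receive burst
```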
IMHO, given your small packets, frequent direction reversals, and lack of driver-level optimizations, that bandwidth looks about right.