I'm currently developing a server using connection-oriented SCTP to serve a small number of clients. After finishing the first prototype with a naive implementation, I'm now profiling the application to optimize. As it turns out, one of the two main consumers of CPU time is the networking part.
There are two questions about the efficiency of the application-level protocol I have implemented:
1) Packet size
Currently, I use a maximum packet size of 64 bytes. You can find many posts discussing packet sizes that are too big, but can they be too small? SCTP allows me to read one packet at a time (similar to UDP) while guaranteeing in-order delivery (similar to TCP), which simplified the implementation significantly. However, if I understand correctly, this costs one syscall every time I send a packet. Does the number of syscalls have a significant impact on performance? Would I be able to shave off a lot of CPU cycles by sending the messages in batches in bigger packets, i.e. 1024 - 8192 bytes?
2) Reading and writing the buffers
I'm currently using memcpy to move data into and out of the application-level network buffers. I found many conflicting posts about what is more efficient, memcpy or normal assignment. I'm wondering if one approach will be significantly faster than the other in this scenario:
Option 1
void Network::ReceivePacket(char* packet)
{
    uint8_t param1;
    uint16_t param2;
    uint32_t param3;
    memcpy(&param1, packet, 1);
    memcpy(&param2, packet+1, 2);
    memcpy(&param3, packet+3, 4);
    // Handle the packet here
}
void Network::SendPacket(uint8_t param1, uint16_t param2, uint32_t param3)
{
    char packet[7];
    memcpy(packet, &param1, 1);
    memcpy(packet+1, &param2, 2);
    memcpy(packet+3, &param3, 4);
    // Send the packet here
}
Option 2
void Network::ReceivePacket(char* packet)
{
    uint8_t param1;
    uint16_t param2;
    uint32_t param3;
    param1 = *((uint8_t*)packet);
    param2 = *((uint16_t*)(packet+1));
    param3 = *((uint32_t*)(packet+3));
    // Handle the packet here
}
void Network::SendPacket(uint8_t param1, uint16_t param2, uint32_t param3)
{
    char packet[7];
    *((uint8_t*)packet) = param1;
    *((uint16_t*)(packet+1)) = param2;
    *((uint32_t*)(packet+3)) = param3;
    // Send the packet here
}
The first one seems a lot cleaner to me, but I've found many posts indicating that maybe the second one is quite a bit faster.
Any kind of feedback is of course welcome.
As far as I know, compilers optimize memcpy calls in particular (small fixed-size copies are typically inlined), so you should probably use it.
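To make the memcpy suggestion concrete, here is a minimal sketch of the 7-byte layout from the question written with fixed-size copies. The names `PackPacket`, `UnpackPacket`, `Fields` and `kPacketSize` are made up for illustration, and the sketch assumes both ends use the same byte order; it does no endianness conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed wire layout (from the question): u8 | u16 | u32, no padding.
constexpr std::size_t kPacketSize = 7;

// Hypothetical holder for the three decoded fields.
struct Fields {
    std::uint8_t  param1;
    std::uint16_t param2;
    std::uint32_t param3;
};

// Write the fields into `out`, which must hold at least kPacketSize bytes.
// memcpy has no alignment requirement, so the odd offsets 1 and 3 are safe.
inline void PackPacket(unsigned char* out, const Fields& f)
{
    std::memcpy(out,     &f.param1, sizeof f.param1);
    std::memcpy(out + 1, &f.param2, sizeof f.param2);
    std::memcpy(out + 3, &f.param3, sizeof f.param3);
}

// Read the fields back out of a received buffer of `len` bytes.
inline Fields UnpackPacket(const unsigned char* in, std::size_t len)
{
    assert(len >= kPacketSize);  // caller must pass a complete packet
    Fields f{};
    std::memcpy(&f.param1, in,     sizeof f.param1);
    std::memcpy(&f.param2, in + 1, sizeof f.param2);
    std::memcpy(&f.param3, in + 3, sizeof f.param3);
    return f;
}
```

Because every size here is a compile-time constant, an optimizing compiler will usually turn these calls into plain loads and stores, so the cast-based version is unlikely to gain anything. Profiling your own build is the only way to be sure.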
About your first question:
A syscall (system call) is a request that your OS processes on your behalf. Every such request is executed in the kernel, which is a moderate amount of work each time. To be honest, I am not familiar with SCTP; as a matter of fact, I haven't dealt with socket programming since I last built a server over TCP. I remember that the MTU of the relevant physical layer element was 1500 bytes, and I recall setting my packet size to 1450-1460 bytes, as I was trying to get the maximum packet size underneath the 1500-byte cap (1500 minus 20 bytes of IPv4 header and 20 bytes of TCP header leaves 1460).
So what I'm saying is: if I were you, I would want the OS to be as inactive as possible, i.e. fewer syscalls with bigger packets, so that CPU performance does not become a problem.
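As a rough sketch of what "fewer syscalls" could look like in practice, the class below collects small messages into one larger buffer and hands it to a send function only when it is full or explicitly flushed. `MessageBatcher` and its `Sink` callback are hypothetical names; in a real server the sink would wrap whatever send call you already use. The one-byte length prefix is also an assumption: once several messages travel in one packet, the receiver needs some way to split them again.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: batches small messages so that one send call
// (and therefore one syscall) carries many of them.
class MessageBatcher {
public:
    // Called with the batched bytes; stands in for the real send call.
    using Sink = std::function<void(const unsigned char*, std::size_t)>;

    MessageBatcher(std::size_t capacity, Sink sink)
        : capacity_(capacity), sink_(std::move(sink))
    {
        buffer_.reserve(capacity_);
    }

    // Queue one message of at most 255 bytes, prefixed by its length.
    // Returns false if the message can never fit into a batch.
    bool Add(const unsigned char* msg, std::size_t len)
    {
        if (len > 255 || len + 1 > capacity_) return false;
        if (buffer_.size() + len + 1 > capacity_) Flush();  // make room first
        buffer_.push_back(static_cast<unsigned char>(len));
        buffer_.insert(buffer_.end(), msg, msg + len);
        return true;
    }

    // Send whatever is queued; call this at the end of each server tick
    // so messages do not wait indefinitely for the buffer to fill.
    void Flush()
    {
        if (buffer_.empty()) return;
        sink_(buffer_.data(), buffer_.size());
        buffer_.clear();
    }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<unsigned char> buffer_;
};
```

Choosing a capacity a little below the path MTU keeps each batch in a single packet. The trade-off is latency: a message may sit in the buffer until the next flush, so whether this helps depends on how quickly clients need each update.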