I've added instrumentation to my code:
state.worker_stats.counters[TUN_TX]++;
state_switch_activity(KERN_WRITE_TUN);
ssize_t wrote = write(peer->tun_fd, plaintext, plaintext_len);
state_switch_activity(USER_UDP_TO_TUN);
where:
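static uint64_t nanonow(void) {
    struct timespec t;
    int err = clock_gettime(CLOCK_MONOTONIC, &t);
    if (err) {
        perror(__FILE__);
        exit(errno);
    }
    // Widen before multiplying so the product cannot overflow where time_t is 32 bits.
    return (uint64_t)t.tv_sec * 1000000000ULL + (uint64_t)t.tv_nsec;
}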
enum state_activity state_switch_activity(enum state_activity new_activity) {
    uint64_t now = nanonow();
    struct state_worker_stats *ws = &state.worker_stats; // abbreviate "state.worker_stats."
    enum state_activity old_activity = ws->current_activity;
    ws->current_activity = new_activity;
    uint64_t duration = now - ws->last_change;
    ws->last_change = now;
    ws->times[old_activity] += duration;
    return old_activity;
}
This, and other instrumentation, results in per-packet stats:
outbound:
KERN_READ_TUN per TUN_RX: 3.199167 µs
USER_TUN_TO_UDP per TUN_RX: 4.038177 µs
USER_ENCRYPT per UDP_TX_UNIQ: 7.955924 µs
KERN_SEND_UDP per UDP_TX: 15.789421 µs
USER_TX_HISTORY per UDP_TX_UNIQ: 0.603579 µs
incoming:
KERN_RECV_UDP per UDP_RX: 6.712680 µs
USER_UDP_TO_TUN per UDP_RX: 5.095780 µs
USER_RX_HISTORY per UDP_RX: 1.087274 µs
USER_DECRYPT per UDP_RX_UNIQ: 11.485002 µs
KERN_WRITE_TUN per TUN_TX: 20.241663 µs
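For reference, each figure above is just an accumulated time divided by a packet counter. A minimal sketch of that calculation follows; `per_packet_us` and `print_stat` are hypothetical helper names for illustration, not the actual reporting code, and they assume the times are accumulated in nanoseconds as in `state_switch_activity()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Average time per event in microseconds, given the accumulated time in
 * nanoseconds and the number of events counted.
 * Returns 0.0 when nothing was counted, to avoid dividing by zero. */
double per_packet_us(uint64_t total_ns, uint64_t count) {
    if (count == 0)
        return 0.0;
    return (double)total_ns / (double)count / 1000.0;
}

/* Print one line in the same shape as the listing above,
 * e.g. "KERN_WRITE_TUN per TUN_TX: ... µs". */
void print_stat(const char *activity, const char *counter,
                uint64_t total_ns, uint64_t count) {
    assert(activity != NULL && counter != NULL);
    printf("%s per %s: %f µs\n", activity, counter,
           per_packet_us(total_ns, count));
}
```

It would be called with something like `print_stat("KERN_WRITE_TUN", "TUN_TX", ws->times[KERN_WRITE_TUN], ws->counters[TUN_TX])`.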
(I've sanity checked my userspace and kernel/system totals against the results from getrusage()
and they very nearly agree, so I'm confident in my instrumentation.)
The biggest use of time, for incoming packets, is writing them to tun0 (20.241663 µs per packet).
Sending UDP packets is also slow (KERN_SEND_UDP, 15.789421 µs per packet).
What can I do to increase the speed of write() and sendmsg()? (Roughly 25 µs of system time per 1500-byte packet limits me to 480 Mbps even if my userspace code uses no CPU time: 1500 bytes × 8 bits / 25 µs = 480 Mbit/s.)
Are 1Gbps+ tunnels possible in userspace on Linux?
If so, how?
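To make the arithmetic behind the 480 Mbps figure explicit, here is a small sketch. `max_mbps` and `budget_us` are hypothetical helpers, and the model assumes a single core handling one packet at a time with no batching.

```c
#include <assert.h>

/* Upper bound on throughput in Mbit/s when each packet of packet_bytes
 * costs usec_per_packet microseconds of CPU time on a single core. */
double max_mbps(double packet_bytes, double usec_per_packet) {
    assert(usec_per_packet > 0.0);
    /* bits per microsecond is numerically equal to Mbit/s */
    return packet_bytes * 8.0 / usec_per_packet;
}

/* Per-packet time budget in microseconds needed to sustain target_mbps. */
double budget_us(double packet_bytes, double target_mbps) {
    assert(target_mbps > 0.0);
    return packet_bytes * 8.0 / target_mbps;
}
```

`max_mbps(1500, 25)` gives 480, and `budget_us(1500, 1000)` gives 12. Under this model, 1 Gbps leaves 12 µs per 1500-byte packet for kernel and userspace work combined.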