Online matching of appliction's requests with responses in Linux Kernel

73 views Asked by At

Using Netfilter's hooks I have implemented a Linux kernel module that inspects all the incoming and outgoing packets of a host. Using this module I can monitor the flows of a given application running on that host, i.e., by knowing the src/dst ports/IPs I can filter out (classify) the relevant packets as they being received/sent.

What I want to do next, is to match at run-time, the incoming application's requests with corresponding outgoing responses. Each request/response is identified by the src/dst IP and a unique ID.

The naive implementation is the flowing:

  • Store each incoming request in a list (array/linked list etc.)
  • For each outgoing reply:
    • Search through the list to find matching request. If request is found, remove it from the list.
  • Periodically prune the list of requests to remove requests that are older than X seconds.

The problem is that for every response, this naive algorithm requires a linear search through the list of requests to identify a match. This is costly at high rates.

Could you suggest an algorithm that can reduce the complexity. It is acceptable to sacrifice precision (loose some of the matches) to maintain performance.

Thanks!

1

There are 1 answers

0
e.jahandar On BEST ANSWER

You can use hash tables to reduce lookup operations, in linux kernel 3.7+ there is generic hash table implementation which you can use it in your kernel module. It has very easy to use functions.

You have to build a key with your requirements, say src/dest ip & port + request unique id and using it in hash table, the hash table stores a value for each key, the value could be a pointer to real request payload. based on its implementation it could have worst case of O(n), but hash tables with good fill factor could achieve O(1) or O(logN) easily.

For more optimization, you can use a bloom filter, a bloom filter could be used for filtering invalid or dropped out requests. its very fast and doesn't have memory access penalties like hash tables. but unlike hash table, it doesn't store value for each key, so you need hash table also. there is bloom filter implementation for Linux kernel here