I was thinking about implementing an IP blacklist and found out that both radix tree and red-black tree can be used as data structures to store IP blacklist. I noticed that many existing IP matching implementations use radix tree, while red-black tree is used in Linux kernel for managing virtual address spaces, which are similar to IP matching. So I want to know the differences between them when storing this kind of fixed-length, sortable, and commonly ranged data.
I noticed a difference in terms of "merging". For example, with 4-bit data, a radix tree cannot merge two data with different prefixes, while a red-black tree is not limited in this aspect. For example, "0111" can't merge "1000" in radix tree, but red-black tree can store "<start: 0111, length: 2>" or "<start: 0111, end: 1000>" in a node which represents a range.
I would like to hear about any factors that would influence the choice of data structure (such as memory usage, workload, caching?).
Radix tree is much preferable when you assume higher probability of having same prefix in two randomly taken data samples. If you ban IPs they with notable probability may appear in same subnet, but you do not want (if you) ban entire subnet. So there is compression on common prefix among these IPs. If you get more unifomly distributed data, you will loose performance comparing to red-black tree which is more general and balanced approach. I think your domain tends to use radix tree.
If you are considering to store ranges in rb-tree, so in fact managing Interval Tree(edit), which is applicable to ban subnets, you may notice you can do so in radix-tree based solution as well.