UPDATE: following up on @kisch's great answer, I read about softirq context, and it seems that (for a very good reason) it is impossible to access user-mode resources from within this context. I assume that this is indeed why the call failed.
I'm currently working on a kernel module that deals with user-space files. I know it is considered bad practice, but I still need to do it.
The module places a netfilter hook to catch every outgoing packet in the system, and when the hook is called, it calls filp_open.
The voodoo starts here.
When I send a ping from the loopback, everything works fine and the file (/etc/fstab in this case) is opened successfully.
When I ping the machine from a different IP in my house, filp_open fails with ENOENT.
To figure out where it actually fails, I ran the module under QEMU and successfully reproduced the weird behavior. Apparently, it fails in the kernel-internal function do_last, in the following code (taken from fs/namei.c):
    if (unlikely(d_is_negative(path.dentry))) {
        path_to_nameidata(&path, nd);
        return -ENOENT;
    }
I have absolutely no clue what makes it fail, as the file existed the whole time.
Does anyone have any idea?
This is the part of my code where it fails:
    unsigned int nf_sendfile_hook(void *priv,
                                  struct sk_buff *skb,
                                  const struct nf_hook_state *state)
    {
        struct file *filp;

        if (NULL == g_get_payload_func) {
            // as long as we don't have a way to get our payloads, we don't
            // have much to do.
            return NF_ACCEPT;
        }

        filp = filp_open("/etc/fstab", O_RDONLY, 0);
        if (IS_ERR(filp)) {
            // log the errno value rather than the raw error pointer
            printk(KERN_ERR "filp_open failed: %ld\n", PTR_ERR(filp));
            return NF_ACCEPT;
        }
        ...
    }
Thanks in advance.
I'd need some more details to be sure about it. Where exactly do you hook into?
Very likely, the different behaviour is caused by the different context in which your hook function is called by the kernel.
When you ping from the loopback, you have a userspace process issuing a sendmsg() syscall. The kernel starts a callchain in a user context attached to that process. The netfilter hook is likely called directly in that callchain, before the packet is put into a queue for further, detached processing.
When you ping the machine from a different host, you have a callchain starting in the Soft-IRQ context of NET_RX_SOFTIRQ, beginning in net_rx_action(). That callchain classifies the incoming packet as ICMP and passes it to the internal ICMP receive routine, which directly sends the ping reply packet, which in turn likely calls the netfilter hook.
The Soft-IRQ context has no relation to any userspace process.
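One way to check which of the two cases a given packet hits is to log the execution context from inside the hook. The following is only a minimal sketch: log_hook_context is a hypothetical helper name, and it assumes a reasonably recent kernel where in_interrupt() and in_serving_softirq() are available through <linux/preempt.h>.

```c
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/sched.h>

/* Hypothetical helper: call it at the top of the netfilter hook. */
static void log_hook_context(void)
{
	/*
	 * in_interrupt() is nonzero in hardirq/softirq context (and while
	 * bottom halves are disabled); in_serving_softirq() is nonzero only
	 * while a softirq handler is actually running.
	 */
	pr_info("hook: in_interrupt=%d serving_softirq=%d comm=%s pid=%d\n",
		!!in_interrupt(), !!in_serving_softirq(),
		current->comm, current->pid);
}
```

If the explanation above holds, the loopback ping should report the ping process in comm with serving_softirq=0, while the remote ping should report serving_softirq=1 and whatever task happened to be interrupted, which is unrelated to the packet.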
Now, depending on your kernel setup, it's entirely possible that the filesystem lookup code depends on information from a user context to decide about access restrictions. You might have mount namespaces, so without the process ID of a user context, the /etc filesystem might not even be mounted, which would explain the ENOENT.
It's also quite possible that the filesystem lookup code would need to call some operation that needs to schedule(), i.e., block until a time-consuming operation completes (like paging in blocks of the underlying device of the filesystem for lookup). This wouldn't work from the SoftIRQ context.
This is not a complete answer yet (too many "likely"s), but I'm pretty sure it's the right direction to look in.
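If the context is indeed the problem, a common way around it is to keep the hook free of filesystem calls and defer the file access to a workqueue, which runs in a kernel worker thread where sleeping is allowed. This is a sketch under assumptions, not a drop-in fix: open_file_work and open_file_job are hypothetical names, the hook registration is left out as in the question, and nothing from the packet is carried over to the worker.

```c
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/workqueue.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>

/* Runs later in a kworker thread (process context), so it may block. */
static void open_file_work(struct work_struct *work)
{
	struct file *filp = filp_open("/etc/fstab", O_RDONLY, 0);

	if (IS_ERR(filp)) {
		pr_err("filp_open failed: %ld\n", PTR_ERR(filp));
		return;
	}
	/* ... use filp here ... */
	filp_close(filp, NULL);
}

static DECLARE_WORK(open_file_job, open_file_work);

unsigned int nf_sendfile_hook(void *priv,
			      struct sk_buff *skb,
			      const struct nf_hook_state *state)
{
	/*
	 * Do not touch the filesystem here: this may be softirq context.
	 * schedule_work() is safe to call from atomic context; it returns
	 * false if the job is already queued.
	 */
	schedule_work(&open_file_job);
	return NF_ACCEPT;
}
```

The module's exit path would also need cancel_work_sync(&open_file_job) after unregistering the hook, so that no job is still pending when the module text goes away. The trade-off is that the file is opened asynchronously, so the hook cannot use its contents for the packet that triggered the job.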