Is it possible to fork a process without inherit virtual memory space of parent process?

5.4k views Asked by At

As the parent process is using huge mount of memory, fork may fail with errno of ENOMEM under some configuration of kernel overcommit policy. Even though the child process may only exec low memory-consuming program like ls.

To clarify the problem, when /proc/sys/vm/overcommit_memory is configured to be 2, allocation of (virtual) memory is limited to SWAP + MEMORY * ration(default to 50%). When a process forks, virtual memory is not copied thanks to COW. But the kernel still need to allocate virtual memory space. As an analogy, fork is like malloc(virtual memory space size) which will not allocate physical memory and writing to shared memory will cause copy of virtual memory and physical memory is allocated. When overcommit_memory is configured to be 2, fork may fail due to virtual memory space allocation.

Is it possible to fork a process without inherit virtual memory space of parent process in the following conditions?

  1. if the child process calls exec after fork

  2. if the child process doesn't call exec and will not using any global or static variable from parent process. For example, the child process just do some logging then quit.

5

There are 5 answers

0
Christopher Monsanto On

This is possible on Linux. Use the clone syscall without the flag CLONE_THREAD and with the flag CLONE_VM. The parent and child processes will use the same mappings, much like a thread would; there is no COW or page table copying.

4
Basile Starynkevitch On

No, it is not possible. You might be interested by vfork(2) which I don't recommend. Look also into mmap(2) and its MAP_NORESERVE flag. But copy-on-write techniques are used by the kernel, so you practically won't double the RAM consumption.

My suggestion is to have enough swap space to not being concerned by such an issue. So setup your computer to have more available swap space than the largest running process. You can always create some temporary swap file (e.g. with dd if=/dev/zero of=/var/tmp/swapfile bs=1M count=32768 then mkswap /var/tmp/swapfile) then add it as a temporary swap zone (swapon /var/tmp/swapfile) and remove it (swapoff /var/tmp/swapfile and rm /var/tmp/swapfile) when you don't need it anymore.

You probably don't want to swap on a tmpfs file system like /tmp/ often is, since tmpfs file systems are backed up by swap space!.

I dislike memory overcommitment and I disable it (thru proc(5)). YMMV.

2
Nik Bougalis On

I'm not aware of any way to do (2), but for (1) you could try to use vfork which will fork a new process without copying the page tables of the parent process. But this generally isn't recommended for a number of reasons, including because it causes the parent to block until the child performs an execve or terminates.

0
Nominal Animal On

As Basile Starynkevitch answered, it's not possible.

There is, however, a very simple and common solution used for this, that does not rely on Linux-specific behaviour or memory overcommit control: Use an early-forked slave process do the fork and exec.

Have the large parent process create an unix domain socket and fork a slave process as early as possible, closing all other descriptors in the slave (reopening STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO to /dev/null). I prefer a datagram socket for its simplicity and guarantees, although a stream socket will also work.

In some rare cases it is useful to have the slave process execute a separate dedicated small helper program. In most instances this is not necessary, and makes security design much easier. (In Linux, you can include SCM_CREDENTIALS ancillary messages when passing data using an Unix domain socket, and use the process ID therein to verify the identity/executable the peer is using the /proc/PID/exe pseudo-file.)

In any case, the slave process will block in reading from the socket. When the other end closes the socket, the read/receive will return 0, and the slave process will exit.

Each datagram the slave process receives, describes a command to execute. (Using a datagram allows using C strings, delimited with NUL characters, without any escaping etc.; using an Unix stream socket typically requires you to delimit the "command" somehow, which in turn means escaping the delimiters in the command component strings.)

The slave process creates one or more pipes, and forks a child process. This child process closes the original Unix socket, replaces the standard streams with the respective pipe ends (closing the other ends), and executes the desired command. I personally prefer to use an extra close-on-exec socket in Linux to detect successful execution; in an error case, the errno code is written to the socket, so that the slave-parent can reliably detect the failure and the exact reason, too. If success, the slave-parent closes the unnecessary pipe ends, replies to the original process about the success, with the other pipe ends as SCM_RIGHTS ancillary data. After sending the message, it closes the rest of the pipe ends, and waits for a new message.

On the original process side, the above process is sequential; only one thread may execute start executing an external process at a time. (You simply serialize the access with a mutex.) Several can run at the same time; it is only the request to and response from the slave helper that is serialized.

If that is an issue -- it should not be in typical cases -- you can for example multiplex the connections, by prefixing each message with an ID number (assigned by the parent process, monotonically increasing). In that case, you'll probably use a dedicated thread on the parent end to manage the communications with the slave, as you certainly cannot have multiple threads reading from the same socket at the same time, and expect deterministic results.

Further improvements to the scheme include things like using a dedicated process group for the executed processes, setting limits to them (by setting limits to the slave process), and executing the commands as dedicated users and groups by using a privileged slave.

The privileged slave case is where it is most useful to have the parent execute a separate helper process for it. In Linux, both sides can use SCM_CREDENTIALS ancillary messages via Unix domain sockets to verify the identity (PID, and with ID, the executable) of the peer, making it rather straightforward to implement robust security. (But note that /proc/PID/exe has to be checked more than once, to catch the attacks where a message is sent by a nefarious program, quickly executing the appropriate program but with command-line arguments that cause it to exit soon, making it occasionally look like the correct executable made the request, while a copy of the descriptor -- and thus the entire communications channel -- was in control of a nefariuous user.)

In summary, the original problem can be solved, although the answer to the posed question is No. If the executions are security-sensitive, for example change privileges (user accounts) or capabilities (in Linux), then the design has to be carefully considered, but in normal cases the implementation is quite straight-forward.

I'd be happy to elaborate if necessary.

0
Justine Tunney On
madvise(addr, size, MADV_DONTFORK)

Alternatively, you can call munmap() after fork() to remove the virtual addresses inherited from the parent process.