I would like to download a file in parallel. For example, if the file is 54 KB, I would like its contents to be downloaded in blocks of 10 KB.
In addition, I want no more than 5 requests running at once. But how? I thought of using fork(), but I don't really understand how.
1-10 first request
11-20 second request
21-30 third request
31-40 fourth request
41-50 fifth request
51-54 waits until one of the previous requests ends, then it is executed.
I don't care about the method used to get the data (recv, etc.). I just want to know how to implement this concurrently (preferably with fork()).
There are some readily available software libraries which will provide this functionality. The main one I can think of is curl. You can find an easy introduction to the curl multi library here.
It's usually best to avoid reinventing the wheel unless you have a very good reason (such as improving the world of technology, or for academic research).
For the sake of academic research, and since link-only answers won't suffice, I'll elaborate on a few of the many possible ways one could go about multiplexing sockets.
Non-blocking sockets
The first, and currently most portable, method is to use non-blocking sockets and/or non-blocking socket calls. However, it's important to realise (especially when using non-blocking socket calls as opposed to setting O_NONBLOCK on the file descriptor) that some things will still block. For example, you can't get connect to return immediately unless you set the file descriptor to non-blocking mode, and of course getaddrinfo (and similar standard name resolution functions) will block, too.
When you use non-blocking files or calls to functions, the functions will return immediately. If there's no data ready, they'll indicate this through their return values. If there's data ready to be processed, again, it'll show through the return value.
There are two ways (that I know of) to ensure non-blocking socket calls (including connect).
On POSIX systems, call fcntl(socket_fd, F_SETFL, fcntl(socket_fd, F_GETFL, 0) | O_NONBLOCK). Following that, all calls to read, write, accept and connect will return immediately, without delay. connect has a few error codes (in errno) specifically for this, such as EALREADY, EINPROGRESS, EISCONN and EWOULDBLOCK, which you'll want to check for, because when you enable non-blocking mode, some error return values are actually success return values in disguise; you need to check errno.
For Windows systems, call ioctlsocket(socket_fd, FIONBIO, (u_long[]){1}). The same semantics apply as described above, except that the error codes won't be errno codes (they'll be GetLastError() codes instead), and their values differ. Many of them have similar names, however (WSAEWOULDBLOCK, WSAEINPROGRESS, WSAEALREADY and WSAEISCONN, for example), so in my projects I usually map one set of names onto the other.
Not worthy of mention
Using non-blocking sockets alone, I wouldn't be surprised at all if you manage to maintain several thousand connections with a single thread, on a variety of systems, with little tweaking necessary. However, this model is not ideal, as you need a busy loop that cycles through each socket, instantly testing it for events on every iteration, rather than having your code woken up by the OS when an event arrives.
We know that in order to send events to the application, the kernel needs to process them, so as a quick fix we can give it some time by calling sleep(0); in the loop, for example. This can see CPU use drop from near 100% to under 10%. However, another method exists for multiplexing numerous (blocking or not) sockets with a single non-blocking (or timeout-interrupted) function, such that the function returns immediately when some data is available, or waits until the timeout expires to receive data: select.
select has obvious strong benefits, but there are drawbacks, too. Namely, the sets are typically restricted to small numbers of sockets (FD_SETSIZE is 64 by default on Windows and 1024 with glibc); to support large numbers of sockets, you'll need a loop within a loop, as you'll find that limit runs out quickly. Additionally, it doesn't solve the connect blocking problem (whereas the O_NONBLOCK and FIONBIO method does).
Thus, I'm not going to talk any more about select; I'll describe the other options available to you. Another example with similar limitations is poll; I won't talk about that either. If you want to know about it, there's plenty on the internet...
Note that everything from this point on is quite non-portable (though you might find ways to wrap them all into a common interface, like curl multi does).
Asynchronous socket calls will begin a connection, then return immediately like the non-blocking socket calls, except they'll also raise a signal or call a function which you specify when the connection is complete. This puts the OS in control of notifying your code when events arrive, rather than your code repeatedly asking the OS. It should be clear that asynchronous sockets are ideal as far as optimisation goes, but they're not portable. There are various options per OS:
epoll for Linux
kqueue for FreeBSD (and OS X)
WSAAsyncSelect for Windows
All of these have something in common, which is that they notify your code of success or failure: epoll and kqueue hand you a list of ready events from a wait call (epoll_wait and kevent respectively), and WSAAsyncSelect posts a window message. Either way, you can translate the notification into a call to a function. However, their interfaces are not exactly familiar.
Typically, I haven't bothered writing any kind of wrapper for them, as I find the non-blocking sockets I mentioned at the beginning of this answer are more than adequate nowadays. What's important is that I don't need to port it to every system, because... I'm too lazy for that! I'll only optimise for a system when someone shows me it's slow on that system. Otherwise we end up burying ourselves in a torrent of systems people might never even use our software on...