Can't fork more than 200 processes, sometimes fewer, depending on memory and CPU usage


Here's the guts of the program, which uses Parallel::ForkManager. It seems to stop at 200 processes, and sometimes at around 30, depending on the size of the pgsql query that collects the URLs to send to Mojo::UserAgent. Is there a hard limit somewhere? Is there a better way to write this so that I don't run into those limits? The machine it's running on has 16 CPUs and 128 GB of memory, so it can certainly run more than 200 processes, each of which will die after the Mojo::UserAgent timeout, which is generally 2 seconds.

use Parallel::ForkManager;
use Mojo::Base -strict;
use Mojo::UserAgent;
use Mojo::Pg;
use Math::Random::Secure qw(rand irand);
use POSIX qw(strftime);
use Socket;
use GeoIP2::Database::Reader;
use File::Spec::Functions qw(:ALL);
use File::Basename qw(dirname);

use feature 'say';


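my $max_kids = 500;    # upper bound on simultaneous child processes
my @url;               # filled by do_auth() with the URLs to fetch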
sub do_auth {
...
        push( @url, $authurl );
}


do_auth();

my $pm = Parallel::ForkManager->new($max_kids);

LINKS:
foreach my $linkarray (@url) {
    $pm->start and next LINKS;    # do the fork; the parent moves on to the next URL
    my $ua = Mojo::UserAgent->new( max_redirects => 5, timeout => $timeout );
    $ua->get($linkarray);         # the child fetches its own URL
    $pm->finish;                  # the child exits here
}

$pm->wait_all_children;

There are 2 answers

0
Warren Dew On

Most likely you are running into an operating system limit on threads or processes. The quick and dirty fix is to raise that limit, which is usually configurable. That said, rewriting the code so that it does not use so many short-lived threads is the more scalable solution.
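
One way to check whether an operating system limit is really the cause is to look at why fork() fails. The following is a minimal sketch using only core Perl, not the asker's program: probe_forks and the probe size of 20 are hypothetical names and values chosen for illustration. On Linux, fork() typically fails with EAGAIN when a per-user process limit (the one reported by ulimit -u) is reached, and with ENOMEM when memory is the problem.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN ENOMEM);

# Fork up to $want short-lived children and report why fork() stops succeeding.
# $want is a hypothetical probe size; keep it modest on a shared machine.
sub probe_forks {
    my ($want) = @_;
    my (@pids, $error);
    for my $i (1 .. $want) {
        my $pid = fork();
        if (!defined $pid) {
            # fork() failed: classify the errno before anything else resets it
            $error = $! == EAGAIN ? 'EAGAIN (process limit?)'
                   : $! == ENOMEM ? 'ENOMEM (out of memory)'
                   :                "$!";
            last;
        }
        if ($pid == 0) {    # child: idle briefly, then exit
            sleep 1;
            exit 0;
        }
        push @pids, $pid;   # parent: remember the child so it can be reaped
    }
    waitpid($_, 0) for @pids;    # reap every child that was started
    return (scalar @pids, $error);
}

my ($started, $error) = probe_forks(20);
print "started $started children", ($error ? ", stopped by $error" : ''), "\n";
```

If the reported error is EAGAIN, raising the process limit for the user running the script is the "quick and dirty" fix described above; if it is ENOMEM, the children are probably too heavy to run in those numbers.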

4
user3606329 On

For your example code (fetching a URL) I would never use Parallel::ForkManager. I would use Mojo::IOLoop::Delay or the non-blocking calling style of Mojo::UserAgent.

use Mojo::UserAgent;
use feature 'say';

my $ua = Mojo::UserAgent->new;

$ua->inactivity_timeout(15);
$ua->connect_timeout(15);
$ua->request_timeout(15);
$ua->max_connections(0);

my @url = ("http://stackoverflow.com/questions/41253272/joining-a-view-and-a-table-in-mysql",
           "http://stackoverflow.com/questions/41252594/develop-my-own-website-builder",
           "http://stackoverflow.com/questions/41251919/chef-mysql-server-configuration",
           "http://stackoverflow.com/questions/41251689/sql-trigger-update-error",
           "http://stackoverflow.com/questions/41251369/entity-framework-how-to-add-complex-objects-to-db",
           "http://stackoverflow.com/questions/41250730/multi-dimensional-array-from-matching-mysql-columns",
           "http://stackoverflow.com/questions/41250528/search-against-property-in-json-object-using-mysql-5-6",
           "http://stackoverflow.com/questions/41249593/laravel-time-difference",
           "http://stackoverflow.com/questions/41249364/variable-not-work-in-where-clause-php-joomla");

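foreach my $linkarray (@url) {
    # Queue every request up front; they all run concurrently on the event loop
    $ua->get($linkarray => sub {
        my ($ua, $tx) = @_;
        # at('title') returns undef when the response has no <title>,
        # for example after a failed request, so guard before calling text
        my $title = $tx->res->dom->at('title');
        say $title ? $title->text : "no title for $linkarray";
    });
}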
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
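
With thousands of URLs from a database query, starting every request at once may not be desirable. The sketch below is one possible way to cap the number of requests in flight while staying non-blocking. It is an illustration, not part of the original answer: fetch_all, the cap of 50, and reading URLs from @ARGV are all assumptions, and the result hash is keyed by URL, so duplicate URLs overwrite each other.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::UserAgent;
use Mojo::IOLoop;

# Fetch @queue with at most $max_active requests in flight at any time.
# Returns a hash reference mapping each URL to its HTTP status or an error string.
sub fetch_all {
    my ($ua, $max_active, @queue) = @_;
    my $active = 0;
    my %status;
    my $fetch_next;
    $fetch_next = sub {
        # Start requests until the cap is reached or the queue is empty
        while ($active < $max_active and @queue) {
            my $url = shift @queue;
            $active++;
            $ua->get($url => sub {
                my ($ua, $tx) = @_;
                $active--;
                # No status code means the request itself failed (connect error, timeout)
                $status{$url} = $tx->res->code // 'error: ' . $tx->error->{message};
                $fetch_next->();                      # refill the freed slot
                Mojo::IOLoop->stop unless $active;    # queue drained, nothing in flight
            });
        }
    };
    $fetch_next->();
    Mojo::IOLoop->start if $active;    # block here until the last callback stops the loop
    undef $fetch_next;                 # break the closure's reference to itself
    return \%status;
}

# Hypothetical usage: URLs come from the command line instead of the database
my $ua = Mojo::UserAgent->new(max_redirects => 5)->request_timeout(2);
my $status = fetch_all($ua, 50, @ARGV);
say "$_ => $status->{$_}" for sort keys %$status;
```

Everything runs in a single process, so the limit on forked processes no longer applies; the cap only bounds how many sockets are open at once.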