I am using this code to scrape an API:

from concurrent import futures

submissions = get_submissions(1)
# Alternative: with futures.ThreadPoolExecutor(max_workers=4) as executor:
with futures.ProcessPoolExecutor(max_workers=4) as executor:
    for s in executor.map(map_func, submissions):
        collection_front.update({"time_recorded": time_recorded},
                                {"$push": {"thread_list": s}}, upsert=True)

It works well and quickly with threads, but when I switch to processes the queue fills up and I get this error:

  File "/usr/local/lib/python3.4/dist-packages/praw/objects.py", line 82, in __getattr__
    if not self.has_fetched:
RuntimeError: maximum recursion depth exceeded
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 251, in _queue_management_worker
    shutdown_worker()
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 209, in shutdown_worker
    call_queue.put_nowait(None)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 131, in put_nowait
    return self.put(obj, False)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 82, in put
    raise Full
queue.Full

Traceback (most recent call last):
  File "reddit_proceses.py", line 64, in <module>
    for s in executor.map(map_func, submissions):
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 549, in result_iterator
    yield future.result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 402, in result
    return self.__get_result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Note that the process version originally worked well and very quickly for small data retrievals, but now it doesn't work at all. Is this a bug, or is there a reason the PRAW object would cause a recursion error with processes but not with threads?

1 answer

dyeray On

I had a similar problem when moving from threads to processes, except that I was using executor.submit. I think it might be the same problem you have, but I can't be sure because I don't know in what context your code is running.

In my case, I was running my code as a script and had not used the always-recommended if __name__ == "__main__": guard. It looks like when the executor starts a new process, Python loads the .py file and then runs the function passed to submit. Because the file is loaded, any code at the top level of the main file (outside functions and outside that if statement) runs again, so each process started further processes, resulting in infinite recursion.

It looks like this doesn't happen with threads.
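
For illustration, here is a minimal sketch of the guard described above. The body of map_func and the list of submissions are placeholders I made up, not the real PRAW or MongoDB calls from the question. Note also that the re-import of the main file happens when worker processes are started with the "spawn" method (the default on Windows); with "fork" on Linux the child inherits the parent's memory instead, so this may or may not be the cause in your setup.

```python
from concurrent import futures


def map_func(item):
    # Hypothetical stand-in for the real per-submission work.
    return item * item


def main():
    submissions = [1, 2, 3, 4]  # placeholder data, not real API results
    with futures.ProcessPoolExecutor(max_workers=4) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(map_func, submissions))


# Only the process that runs this file as a script enters this block.
# A worker that re-imports the file skips it, so no new pool is created.
if __name__ == "__main__":
    print(main())
```

With the pool creation inside main() and the call behind the guard, importing the file has no side effects, which is what the worker processes need.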