Multiprocess sub-function does not return any results

I am trying to use the concurrency decorators provided by the deco module. The code works without concurrency, as shown in the answer here:

Extract specific columns from a given webpage

But with the following code, finallist stays empty. Inside the "slow" function the print statement shows results, so why is the outer list empty?

import urllib.request
from bs4 import BeautifulSoup
from deco import concurrent, synchronized

finallist=list()
urllist=list()
    
@concurrent
def slow(url):
    #print (url)
    try:
        page = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(page)
        mylist=list()
        for anchor in soup.find_all('div', {'class':'col-xs-8'})[:9]: 
            mylist.append(anchor.text)
            urllist.append(url)
        finallist.append(mylist)
        #print (mylist)
        print (finallist)
    except:
        pass

@synchronized
def run():
    finallist=list()
    urllist=list()
    for i in range(10):
        url='https://pythonexpress.in/workshop/'+str(i).zfill(3)
        print (url)
        slow(url)
    slow.wait()

Accepted answer by Alex Sherman:

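I refactored your code to work with the module. The list stays empty because each @concurrent call runs in a separate worker process, so appending to a global list only changes that process's copy of it; the parent process never sees those appends. I fixed two of the common pitfalls outlined on the deco wiki: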

  1. Don't use global variables
  2. Do everything with square bracket operations: obj[key] = value
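The first pitfall can be reproduced without deco or network access. The sketch below assumes deco runs @concurrent functions in a multiprocessing pool and uses the standard-library multiprocessing module directly in its place; the function names are hypothetical. It uses the "fork" start method so it runs as a plain script, which means it is POSIX-only.

```python
import multiprocessing

# Module-level list, playing the role of `finallist` in the question.
results = []


def append_to_global(x):
    # Runs in a worker process: this appends to the worker's own copy
    # of `results`, which the parent process never sees.
    results.append(x * x)


def return_value(x):
    # Returning the value lets the pool send it back to the parent.
    return x * x


def collect_via_global(items):
    ctx = multiprocessing.get_context("fork")  # POSIX-only start method
    with ctx.Pool(2) as pool:
        pool.map(append_to_global, items)
    # The parent's list is still untouched.
    return list(results)


def collect_via_return(items):
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(2) as pool:
        # map() collects the return values in input order.
        return pool.map(return_value, items)


if __name__ == "__main__":
    print(collect_via_global(range(5)))
    print(collect_via_return(range(5)))
```

The global version prints an empty list while the returning version prints the squares, which matches the behaviour in the question.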

Here's the result:

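import urllib.request  # Python 3: urlopen lives in urllib.request
from bs4 import BeautifulSoup
from deco import concurrent, synchronized

N = 10

@concurrent
def slow(url):
    # Runs in a worker process: return the result instead of
    # appending to a global list.
    try:
        page = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(page, "html.parser")
        mylist = list()
        for anchor in soup.find_all('div', {'class': 'col-xs-8'})[:9]:
            mylist.append(anchor.text)
        return mylist
    except Exception:
        # A failed download or parse yields None for this URL.
        return None

@synchronized
def run():
    finallist = [None] * N
    urllist = ['https://pythonexpress.in/workshop/' + str(i).zfill(3) for i in range(N)]
    for i, url in enumerate(urllist):
        print(url)
        # Square-bracket assignment: deco fills in finallist[i]
        # once this concurrent call finishes.
        finallist[i] = slow(url)
    return finallist

if __name__ == "__main__":
    finallist = run()
    print(finallist)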
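For comparison, the same "preallocate, fill by index, return" pattern can be sketched with the standard library's concurrent.futures instead of deco. This swaps in a thread pool, which is a reasonable fit for I/O-bound downloads; fetch_columns is a hypothetical stand-in for the scraping in slow so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

N = 10


def fetch_columns(url):
    # Hypothetical stand-in for `slow`: a real version would download
    # `url` and extract the 'col-xs-8' divs. Here it just returns the
    # three-digit workshop number from the URL.
    return [url[-3:]]


def run(urls):
    finallist = [None] * len(urls)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Remember which slot each future belongs to.
        futures = {pool.submit(fetch_columns, url): i for i, url in enumerate(urls)}
        for future, i in futures.items():
            # result() blocks until that call has finished.
            finallist[i] = future.result()
    return finallist


if __name__ == "__main__":
    urllist = ['https://pythonexpress.in/workshop/' + str(i).zfill(3) for i in range(N)]
    print(run(urllist))
```

Threads share memory, so even a global list would be visible here, but filling fixed slots keeps the results in URL order regardless of which download finishes first.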