Async pycurl request processing for a Python beginner


I'm trying to combine the async functionality of program A with the super simple string-based logic enabled by program B:

#pseudocode 
    label beginning
    sleep(10)
    if substring in someString:
        print "It's not happening!!!"
        goto beginning 

Snippet 2:

 #unique verification variable automatically gets generated every request 
 c.setopt(pycurl.HTTPHEADER, ['verification: ' + verification ])

Basically, if the response HTML from the first request doesn't contain a specific string, a request with the same verification code has to be sent again after 10 seconds. This all has to happen asynchronously, preferably in a way that doesn't touch the hard disk (memory only), so it can run at more than 1,000 requests per second.

Python's lack of goto, in the name of some kind of purity fetish, has made my head hurt while solving this problem.
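One way to get the effect of "goto beginning" without blocking on sleep(10) is to put the request back on a queue with a due time and re-send it once that time has passed. The following is only a sketch under assumptions: Python 3, and the names RetryQueue, handle_response and the "not yet" marker are hypothetical, not part of pycurl. The test follows the pseudocode (retry when the substring is present); flip it to `not in` if a retry should happen when the string is missing.

```python
import heapq
import itertools
import time


class RetryQueue:
    """Holds verification codes that should be re-sent after a delay."""

    def __init__(self, delay=10.0, clock=time.monotonic):
        self.delay = delay
        self.clock = clock                 # injectable, so tests need not wait
        self._heap = []                    # (due_time, tie_breaker, verification)
        self._counter = itertools.count()  # keeps ordering stable for equal times

    def schedule(self, verification):
        """Queue a code to be retried `delay` seconds from now."""
        due = self.clock() + self.delay
        heapq.heappush(self._heap, (due, next(self._counter), verification))

    def pop_due(self):
        """Return every code whose delay has expired, earliest first."""
        now = self.clock()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

    def __len__(self):
        return len(self._heap)


def handle_response(body, verification, queue, substring="not yet"):
    """The 'if substring in someString: goto beginning' step, without goto."""
    if substring in body:
        queue.schedule(verification)  # same code, tried again later
        return False
    return True
```

The main loop would then call pop_due() on each pass and create a new request for every code it returns, so nothing sleeps and everything stays in memory.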

The center of gravity seems to be around these functions: c.setopt(pycurl.WRITEDATA,) vs. c.setopt(pycurl.WRITEFUNCTION,), m = pycurl.CurlMulti(), and m.handles.append(c).

Any suggestions on how best to solve this puzzle are welcome. What I'm mostly looking for is a general outline of the pseudocode/logic plus some suggestions for functions I should look into; once I have the general blueprint I should be able to cobble it together myself.


There is 1 answer

alex.garcia On
from StringIO import StringIO

import pycurl

class CurlStream(object):
    """"""
    curl_count = 0
    curl_storage = []

    def __init__(self):
        self.curl_multi = pycurl.CurlMulti()

    def add_request(self, request, post_fields=None):
        self.curl_count += 1
        curl = self._create_curl(request, post_fields)
        self.curl_multi.add_handle(curl)

    def perform(self):
        while self.curl_count:
            while True:
                response, self.curl_count = self.curl_multi.perform()
                if response != pycurl.E_CALL_MULTI_PERFORM:
                    break
            self.curl_multi.select(1.0)

    def read_all(self):
        for response in self.curl_storage:
            print response.getvalue() # this does nothing --prints blank lines

    def close(self):
        self.curl_multi.close()

    def _create_curl(self, request, post_fields):
        curl = pycurl.Curl()
        curl.setopt(curl.URL, request)
        curl.setopt(curl.WRITEFUNCTION, self.write_out) # now passing own method
        curl.setopt(curl.TIMEOUT, 20)
        # Below is the important bit, I am now adding each curl object to a list
        self.curl_storage.append(curl)
        return curl

    def write_out(self, data):
        print 'Data len', len(data)
        print data
        return len(data)


def main():
    curl_stream = CurlStream()
    curl_stream.add_request('http://www.google.com')
    curl_stream.add_request('http://www.tomdickin.com')
    curl_stream.perform()
    curl_stream.read_all()
    curl_stream.close()

if __name__ == '__main__':
    main()

How can I get the response body from pycurl multi curl requests

The code from that answer seems decent and works, except that when I run it, after doing what it should, it ends with:

Traceback (most recent call last):
  File "Untitled 2.py", line 55, in <module>
    main()
  File "Untitled 2.py", line 53, in main
    curl_stream.read_all()
  File "Untitled 2.py", line 28, in read_all
    print response.getvalue() # this does nothing --prints blank lines
AttributeError: getvalue
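
The traceback is consistent with what the code stores: curl_storage is filled with pycurl.Curl handles, and a Curl handle has no getvalue() method. That method belongs to in-memory buffers such as StringIO or BytesIO. Note also that curl_storage is a class attribute, so it is shared by every CurlStream instance. A possible fix, sketched here assuming Python 3 and a reasonably recent pycurl, is to give each request its own BytesIO and pass its write method as the WRITEFUNCTION callback. The names BodyStore and fetch_all are hypothetical helpers, and the polling loop simply mirrors perform() from the code above.

```python
from io import BytesIO


class BodyStore:
    """Keeps one in-memory buffer per request, keyed by a caller-chosen key."""

    def __init__(self):
        self._buffers = {}   # instance attribute, not shared between stores

    def writer_for(self, key):
        """Create a buffer for `key` and return its write method as the callback."""
        buf = BytesIO()
        self._buffers[key] = buf
        return buf.write     # returns the number of bytes written

    def body(self, key):
        """Return everything written for `key` so far, as bytes."""
        return self._buffers[key].getvalue()


def fetch_all(requests, timeout=20):
    """Fetch {verification: url} concurrently; return {verification: body}."""
    import pycurl            # imported lazily so BodyStore works without it

    store = BodyStore()
    multi = pycurl.CurlMulti()
    handles = []
    for verification, url in requests.items():
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.HTTPHEADER, ['verification: ' + verification])
        c.setopt(pycurl.WRITEFUNCTION, store.writer_for(verification))
        c.setopt(pycurl.TIMEOUT, timeout)
        multi.add_handle(c)
        handles.append(c)

    active = len(handles)
    while active:
        while True:
            ret, active = multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
        if active:
            multi.select(1.0)   # wait for socket activity, at most 1 second

    for c in handles:
        multi.remove_handle(c)
        c.close()
    multi.close()
    return {v: store.body(v) for v in requests}
```

The bodies come back as bytes, so they need decoding (or a bytes substring) before an `in` check. Nothing is written to disk, and keying the buffers by verification code makes it straightforward to re-send a request with the same code.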