Python code not seeing an instance variable initialized in the __init__() function


I am running into a weird error with spynner, though the question is a generic one. Spynner is a stateful web-browser module for Python. It works fine when it works, but on almost every run I get a failure like this:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/spynner-2.16.dev0-py2.7.egg/spynner/browser.py", line 1651, in createRequest
    self.cookies,
AttributeError: 'Browser' object has no attribute 'cookies'
Segmentation fault (core dumped)

The problem here is that it's segfaulting and not letting me continue.

Looking at the code for spynner I see that the cookies variable is in fact initialized in the __init__() function for the Browser class like this:

self.cookies = []

On failure, the error suggests that __init__() was never run, since the cookies attribute is not found. I do not understand how that is possible. Without restricting the answer to the spynner module, can someone venture a guess as to how a Python object could fail with an error like this?

EDIT: I would have pasted my code here, except that it is not all in one place for me to show compactly. I should have done this earlier, but here is the overall structure and how I instantiate and use spynner.

# helper class to get url data
class C:
    def __init__(self):
        self.browser = spynner.Browser()

    def get_data(self, url):
        try:
            self.browser.load(url)
            return self.browser.html
        except:
            raise

# class that does other stuff among saving url data to disk
class B:
    def save_url_to_disk(self, url):
        urlObj = C()
        html = urlObj.get_data(url)
        # do stuff with html


# class that drives everything
class A:
    def do_stuff_and_save_url_data(self, url):
        fileObj = B()
        fileObj.save_url_to_disk(url)

driver = A()
# call this function for multiple URLs.
driver.do_stuff_and_save_url_data(url)

I run it like this:

# xvfb-run python myfile.py

The segfault is probably caused by something else I am doing. Maybe it's because of xvfb, which I am using and perhaps not handling properly? I don't know yet. I should mention that I am relatively new to Python.

I noticed that when I run the code above with say 'http://www.google.com' I get the segfault every other time.


There is 1 answer

eyquem On

The code block of do_stuff_and_save_url_data() doesn't use the reference self,
so the execution of this function doesn't depend on driver.

The code block of save_url_to_disk() doesn't use the reference self either,
so the execution of this second function doesn't depend on the object fileObj.

Only the code block of get_data() uses the reference self, and more precisely the reference self.browser,
so its execution and result depend on the attribute browser of the instance urlObj of class C. This attribute is an instance of the spynner.Browser class.

In the end, you "do stuff with html" using only the data returned by the html attribute of a spynner.Browser() instance. Creating driver and fileObj isn't necessary at all.


Another point is that
when the instruction driver.do_stuff_and_save_url_data(url) is executed,
the bound method driver.do_stuff_and_save_url_data is first created, then executed, and finally "destroyed" (or, more precisely, left unreferenced somewhere in RAM) because it hasn't been assigned to any identifier.
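The point about the method being created on access can be checked with a minimal sketch. Driver and do_stuff are hypothetical stand-ins, not the classes from the question. Each attribute lookup on an instance builds a new bound method object, and that object is dropped after the call if nothing keeps a reference to it.

```python
class Driver(object):
    """Hypothetical stand-in for class A in the question."""

    def do_stuff(self):
        return "done"


driver = Driver()

# Each attribute access builds a fresh bound method object.
first = driver.do_stuff
second = driver.do_stuff

same_object = first is second      # are they one and the same object?
equal_methods = first == second    # do they wrap the same function and instance?

# Here a temporary bound method is created, called, and then discarded.
result = driver.do_stuff()

print("%s %s %s" % (same_object, equal_methods, result))
```

On CPython the two lookups give distinct objects that still compare equal, so nothing about the instance itself is lost when the temporary method object goes away.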

Then the identifier fileObj, which belongs to the local namespace of the function driver.do_stuff_and_save_url_data(), is lost too, which means the instance of class B that it named is also lost for later use, since no identifier refers to it any more.

It's the same for save_url_to_disk():
after the creation and execution of the method fileObj.save_url_to_disk(url), the object urlObj of class C, which holds a browser instance (an object created by spynner.Browser()), is lost: the created browser and all its data are lost.
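Here is a rough sketch of that loss, assuming CPython and using a hypothetical FakeBrowser class in place of spynner.Browser (the real class is not involved here). A weak reference observes the browser without keeping it alive, so it shows whether the object survives the function call.

```python
import gc
import weakref


class FakeBrowser(object):
    """Hypothetical stand-in for spynner.Browser; it only holds some state."""

    def __init__(self):
        self.cookies = []


class C(object):
    def __init__(self):
        self.browser = FakeBrowser()


# Weak references: they observe objects without keeping them alive.
tracked = []


def save_url_to_disk(url):
    url_obj = C()  # referenced only by this local name
    tracked.append(weakref.ref(url_obj.browser))
    url_obj.browser.cookies.append("session for " + url)
    # On return, url_obj and its browser become unreachable.


save_url_to_disk("http://www.google.com")
gc.collect()  # not required on CPython for this case; added for robustness

browser_still_alive = tracked[0]() is not None
print(browser_still_alive)
```

On CPython this prints False: the browser and the cookies it held are gone as soon as the function returns.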

I wonder whether this destruction of the browser instance after each execution of do_stuff_and_save_url_data() and save_url_to_disk() is what causes the cookies information to be gone by the time a later call needs it.


So, in my opinion, your code only wraps two functions in the definitions of classes A and B, and they are used as plain functions, not as methods.

1/ I don't think this is a good coding pattern. When one wants only plain functions, they should be written outside of any class.

2/ The problem is that, because the operations are triggered by what are effectively functions, a new browser is created each time they are called, even though they are dressed up as methods.

You will tell me that you want these functions to act on data provided by the browser created by spynner.Browser().
That's why I think they should not be functions embedded in classes as they are now, but real methods attached to a stable instance of a browser. That is the purpose of an object: to keep the data and the tools that process the data in the same namespace.
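To illustrate the difference between creating a browser per call and keeping one stable instance, here is a small sketch. FakeBrowser, fetch_with_new_browser and Fetcher are hypothetical names introduced for the comparison; spynner itself is not used.

```python
class FakeBrowser(object):
    """Hypothetical stand-in for spynner.Browser that counts its instances."""

    instances_created = 0

    def __init__(self):
        FakeBrowser.instances_created += 1
        self.visited = []

    def load(self, url):
        self.visited.append(url)


def fetch_with_new_browser(url):
    # Pattern from the question: a new browser for every call.
    browser = FakeBrowser()
    browser.load(url)
    return browser


class Fetcher(object):
    # Pattern suggested above: one browser kept for the object's lifetime.
    def __init__(self):
        self.browser = FakeBrowser()

    def fetch(self, url):
        self.browser.load(url)
        return self.browser


urls = ["http://a.example", "http://b.example", "http://c.example"]

for url in urls:
    fetch_with_new_browser(url)
created_per_call = FakeBrowser.instances_created

FakeBrowser.instances_created = 0
fetcher = Fetcher()
for url in urls:
    fetcher.fetch(url)
created_when_reused = FakeBrowser.instances_created
history = fetcher.browser.visited

print("%d browsers per call, %d when reused" % (created_per_call, created_when_reused))
```

With the stable instance, the state accumulated by the browser (here the visited list) remains available between calls.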


All that said, I would personally write:

import spynner

class C(spynner.Browser):
    def __init__(self):
        spynner.Browser.__init__(self)

    def get_data(self, url):
        # load the page, then read the html attribute, as in the question's code
        self.load(url)
        return self.html

    # method that does other stuff among saving url data to disk
    def save_url_to_disk(self, url):
        html = self.get_data(url)
        # do stuff with html

    # method that drives everything
    def do_stuff_and_save_url_data(self, url):
        self.save_url_to_disk(url)


driver = C()
# the same browser instance is reused for every URL
driver.do_stuff_and_save_url_data(url)

But I'm not sure I have understood all your considerations well, and I warn you that I didn't know spynner before reading your post. Everything I've written could be beside the point with respect to your real problem. Please keep a critical eye on my post.