Can only use wrapper function a single time after definition then getting NameError

Background

I'm using pdfquery to scrape data from PDFs, like this one. This question builds on my earlier question here.

I have been able to use custom wrapper functions that take arguments, as shown in this answer. The exception is the following code, which gives me trouble when I run it more than once in a Jupyter notebook:

Cell 1

import pdfquery

def load_file(PDF_FILE):
    pdf = pdfquery.PDFQuery(PDF_FILE)
    pdf.load()
    return pdf

file_with_table = 'path_to_the_file_mentioned_above.pdf'
pdf = load_file(file_with_table)

Cell 2

def in_range(prop, bounds):
    def wrapped(*args, **kwargs):
        n = float(this.get(prop, 0))
        return bounds[0] <= n <= bounds[1]
    return wrapped

def is_element(element_type):
    def wrapped(*args, **kwargs):
        return this.tag in element_type
    return wrapped

def str_len(condition):
    def wrapped(*args, **kwargs):
        cond = ''.join([str(len(this.text)),condition])
        return eval(cond)
    return wrapped

Cell 3

x_check = in_range('x0', (97, 160))
y_check = in_range('y0', (250, 450))
el_check = is_element(['LTTextLineHorizontal', 'LTTextBoxHorizontal'])
str_len = str_len('>0')

els = pdf.pq('LTPage[page_index="0"] *').filter(el_check)
els = els.filter(str_len)
els = els.filter(x_check)
els = els.filter(y_check)

[(i.text) for i in els]

The function str_len works fine if Cell 3 is run a single time after the definition (no error on the first run of the third cell), but it throws a NameError when I run the cell a second time.

Here is the text of the NameError:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-54cd329bb1e1> in <module>()
      2 y_check = in_range('y0', (250, 450))
      3 el_check = is_element(['LTTextLineHorizontal', 'LTTextBoxHorizontal'])
----> 4 str_len = str_len('>0')
      5 
      6 els = pdf.pq('LTPage[page_index="0"] *').filter(el_check)

<ipython-input-25-654bff7d0eed> in wrapped(*args, **kwargs)
     12 def str_len(condition):
     13     def wrapped(*args, **kwargs):
---> 14         return eval(''.join([str(len(this.text)),condition]))
     15     return wrapped

NameError: name 'this' is not defined 

Questions

Why can I only use this function once after its definition?

Is there any way I can circumvent this problem?

There is 1 answer

chepner (best answer)

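Function names are variables like any other; there isn't a separate namespace for functions. str_len = str_len('>0') rebinds the name str_len to the return value of the call, which is the inner wrapped function, so after this line the name no longer refers to the original function. Running the cell again therefore calls wrapped('>0') directly, outside of filter, where this is not defined, which is the NameError in the traceback. Use a different name for the returned function: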

new_name = str_len('>0')
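
The rebinding can be reproduced without pdfquery. The following is a minimal sketch, not the asker's actual notebook: the names original, has_text and has_text_again are made up for illustration, and the comment about where this comes from reflects my understanding that pyquery injects it only while filter is running.

```python
def str_len(condition):
    def wrapped(*args, **kwargs):
        # `this` is assumed to be supplied by pyquery during .filter();
        # when wrapped is called directly, the name does not exist.
        return eval(''.join([str(len(this.text)), condition]))
    return wrapped

original = str_len           # keep a handle on the factory for comparison

# First run of the cell: str_len is still the factory, so this succeeds,
# but the assignment rebinds the name str_len to the inner `wrapped`.
str_len = str_len('>0')
rebound = str_len is not original

# Second run of the cell: this now calls wrapped('>0'), not the factory.
error_message = None
try:
    str_len = str_len('>0')
except NameError as exc:
    error_message = str(exc)
print(error_message)

# Fix: bind the result to a different name so the factory stays reachable.
has_text = original('>0')
has_text_again = original('>0')   # repeating the call is now harmless
```

In the notebook, rerunning Cell 2 restores the original str_len, which is why the error only appears when Cell 3 is run twice in a row.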