pdfplumber gives fp.seek(pos) AttributeError: 'dict' object has no attribute 'seek'


So this is my code:

def main():    
    import combinedparser as cp
    from tkinter.filedialog import askopenfilenames

    files = askopenfilenames()
    print(files) #this gives the right files as a list of strings composed of path+filename


    def file_discriminator(func):
        def wrapper():
            results = []
            for item in files:
                if item.endswith('.pdf'):
                    print(item + 'is pdf')
                    func = f1(file = item)
                    results.append(item, Specimen_Output)
                else:
                    print(item + 'is text')
                    func = f2(file = item)
                    results.append(item, Specimen_Output)

        return wrapper


    @file_discriminator
    def parse_me(**functions):
        print(results)


    parse_me(f1 = cp.advparser(), f2 = cp.vikparser())

main()

where combinedparser.py has two functions:

def advparser(**file):
    import pdfplumber
    with pdfplumber.open(file) as pdf:  # opened fname and assigned it to the variable pdf
        page = pdf.pages[0]  # assigned index 0 of pages to the variable page
        text = page.extract_words()
    #followed by a series of python operations generating a dict named Specimen_Output
def vikparser(**file):
    with open(file, mode = 'r') as filename:
        Specimen_Output = {}
    #followed by a series of python operations generating a dict named Specimen_Output 

I have a directory with PDF and text files interspersed at random. I'm trying to use the decorator @file_discriminator to run advparser on the PDF files (it uses pdfplumber and subsequent processing to extract usable info) and vikparser on the text files (regular text-file processing). Each should generate a dictionary called Specimen_Output.

I got the right results when advparser was a separate .py file in which I imported askopenfilename (singular) and called advparser(file = askopenfilename()); likewise with vikparser, which reads text files with readlines. But when I call them from the main module through a parent function, I can't get it to work. I've tried almost every possible permutation of where I call them, as well as positional vs. keyword arguments for 'file'.

When I fix whatever bugs I create from changing things around, this is the most common error I get:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/main.py", line 29, in <module>
    parse_me(f1 = cp.advparser(), f2 = cp.vikparser())
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/combinedparser.py", line 12, in advparser
    with pdfplumber.open(file) as pdf:  # opened fname and assigned it to the variable pdf
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfplumber/pdf.py", line 48, in open
    return cls(path_or_fp, **kwargs)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfplumber/pdf.py", line 25, in __init__
    self.doc = PDFDocument(PDFParser(stream), password=password)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/pdfparser.py", line 39, in __init__
    PSStackParser.__init__(self, fp)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 502, in __init__
    PSBaseParser.__init__(self, fp)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 172, in __init__
    self.seek(0)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 514, in seek
    PSBaseParser.seek(self, pos)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 202, in seek
    self.fp.seek(pos)
AttributeError: 'dict' object has no attribute 'seek'

What am I doing wrong? What dict object is it talking about, and why doesn't pdfplumber have this problem when I try to do each type individually calling from askopenfilename()? I'm a newbie coder and have torn my hair out at this all day. Thanks!

1 Answer

Sergey Shubin

The problem is that the file parameter in your advparser and vikparser functions is declared with two asterisks (**file), so it collects all keyword arguments into a dictionary instead of receiving the filename itself. So when you call these functions this way

func = f1(file = item)

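your file argument in the advparser or vikparser functions is actually equal to {"file": "some_filename.pdf"}. In your traceback the failing call is cp.advparser() with no arguments at all, in which case file is the empty dictionary {}. Either way pdfplumber.open receives a dict rather than a path, which is the 'dict' object the error message refers to.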
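This behaviour can be checked without pdfplumber at all. The following is a minimal sketch; the function name show is hypothetical and only returns whatever the ** parameter received:

```python
def show(**file):
    # A parameter declared with ** collects every keyword argument into a dict.
    return file


with_kw = show(file="some_filename.pdf")  # keyword argument ends up as a dict entry
no_args = show()                          # no arguments at all gives an empty dict

print(type(with_kw).__name__, with_kw)
print(type(no_args).__name__, no_args)
```

In both cases the value is a dict, never a string, so anything that expects a path or a file object will fail on it.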

You need to either look the filename up in that dictionary:

def vikparser(**file):
    with open(file["file"], mode='r') as filename:
        pass

or just use a single file parameter in the function definitions:

def vikparser(file):
    with open(file, mode='r') as filename:
        pass
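With a plain file parameter, one possible way to dispatch on the extension is to pass the parser functions themselves (no parentheses) and call them once per file. This is only a sketch under assumptions: parse_pdf and parse_text are hypothetical stand-ins for advparser and vikparser, parse_all is a hypothetical helper, and the stand-in PDF parser does not open the file, so the example runs without pdfplumber.

```python
import os
import tempfile


def parse_pdf(file):
    # Stand-in for advparser; a real version would call pdfplumber.open(file).
    return {"kind": "pdf", "name": os.path.basename(file)}


def parse_text(file):
    # Stand-in for vikparser; reads the text file and counts its lines.
    with open(file, mode="r") as fh:
        return {"kind": "text", "lines": len(fh.readlines())}


def parse_all(files, pdf_parser, text_parser):
    # The parsers arrive as function objects and are only called here, per file.
    results = []
    for item in files:
        parser = pdf_parser if item.lower().endswith(".pdf") else text_parser
        results.append((item, parser(item)))
    return results


# Small demonstration using a temporary directory instead of a file dialog.
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "notes.txt")
    with open(txt_path, "w") as fh:
        fh.write("a\nb\n")
    pdf_path = os.path.join(tmp, "scan.pdf")  # never opened by the stand-in parser
    results = parse_all([pdf_path, txt_path], parse_pdf, parse_text)
    kinds = [output["kind"] for _, output in results]

print(kinds)
```

Writing parse_all(files, cp.advparser, cp.vikparser) passes the functions; writing cp.advparser() calls one immediately with no filename, which is what the traceback in the question shows.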