Using pypdfium in Python Flask via the built-in Flask server and via a waitress WSGI server


The following function

import io

import pypdfium2 as pdfium
import pytesseract
from PIL import Image

def convert_pdf_to_img_extract_text(pdf_path: str, language='deu') -> str:
    text = ''
    images = []
    pdf = pdfium.PdfDocument(pdf_path)
    n_pages = len(pdf)
    page_indices = list(range(n_pages))
    pytesseract.pytesseract.tesseract_cmd = r"C:\\Users\\myname\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"
    config = r"--oem 3 --psm 6 --tessdata-dir 'C:\\Users\\myname\\AppData\\Local\\Programs\\Tesseract-OCR\\tessdata\\'"

    renderer = pdf.render_to(
        pdfium.BitmapConv.pil_image,
        page_indices = page_indices,
        scale = 300/72,
    )

    for ele in renderer:
        images.append(ele)

    for image in images:
        img_bytes = io.BytesIO()
        image.save(img_bytes, format="PNG")
        img_bytes.seek(0)
        img = Image.open(img_bytes)
        text += pytesseract.image_to_string(img, lang=language, config=config)

    return text

runs successfully as part of a Flask app when the app is served with the built-in server via flask run. As soon as I serve the same app with waitress

from waitress import serve 
import app 
serve(app.app, host="0.0.0.0", port="5000")

the server goes silent after entering the function and printing the following output, and stays that way until I interrupt the execution with a KeyboardInterrupt.

2023-10-18 11:48:51 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:48:58 INFO Webseite wurde aufgerufen von ipAdress 
2023-10-18 11:49:08 WARNING Cannot perform concurrent rendering with buffer input - reading the whole buffer into memory implicitly.
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000
2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000

The WARNING is displayed regardless of the way the app is served.

Update: I added some logging to pypdfium2/_helpers/document.py to get more information. The render_to() function from pypdfium2 runs until it spawns worker processes with ProcessPoolExecutor. At that point the server outputs 2023-10-18 11:49:12 INFO Serving on http://0.0.0.0:5000 8 times, as shown above.

After interrupting the execution I get this output:

KeyboardInterrupt
2023-10-18 11:31:21 ERROR Exception when servicing <waitress.channel.HTTPChannel connected thisIsAnIp at 0x1e4d81cf460>
concurrent.futures.process._RemoteTraceback:

Traceback (most recent call last):
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 239, in _process_worker
  r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 198, in _process_chunk
  return [fn(*args) for args in chunk]
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 198, in <listcomp>
  return [fn(*args) for args in chunk]
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\pypdfium2\_helpers\document.py", line 525, in _process_page
  result = page.render_to(converter, **kwargs)
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\pypdfium2\_helpers\page.py", line 370, in render_to
  args = (self.render_base(**renderer_kws), renderer_kws)
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\pypdfium2\_helpers\page.py", line 567, in render_base
  pdfium.FPDF_RenderPageBitmap(*render_args)
KeyboardInterrupt

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\waitress\task.py", line 84, in handler_thread
  task.service()
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\waitress\channel.py", line 428, in service
  task.service()
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\waitress\task.py", line 168, in service
  self.execute()
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\waitress\task.py", line 434, in execute
  app_iter = self.channel.server.application(environ, start_response)
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\flask\app.py", line 2548, in __call__
  return self.wsgi_app(environ, start_response)
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\flask\app.py", line 2525, in wsgi_app
  response = self.full_dispatch_request()
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
  rv = self.dispatch_request()
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\flask\app.py", line 1796, in dispatch_request
  return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "C:\Users\myName\Documents\Git\mbk\app\views.py", line 318, in auswertung
  pdf = ocr.convert_pdf_to_img_extract_text(massnahmebogen_raw)
File "C:\Users\myName\Documents\Git\mbk\app\packages\helpers\pdf_ocr.py", line 54, in convert_pdf_to_img_extract_text
  for ele in renderer:
File "C:\Users\myName\Documents\Git\mbk\venv\lib\site-packages\pypdfium2\_helpers\document.py", line 594, in render_to
  for result, index in pool.map(invoke_renderer, page_indices):
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 484, in _chain_from_iterable_of_lists
  for element in iterable:
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 611, in result_iterator
  yield fs.pop().result()
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 439, in result
  return self.__get_result()
File "C:\Users\myName\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
  raise self._exception

What could cause this when serving with waitress?
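
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: on Windows, ProcessPoolExecutor starts its workers with the "spawn" method, and each spawned worker re-imports the main script. If the waitress launcher calls serve() at module level, every worker would call serve() again, which would fit the eight repeated "Serving on" lines. With flask run the main script is Flask's own command-line entry point, so the same re-import would not start a second server. The sketch below uses only the standard library to show the usual guard; square, run_in_workers and main are hypothetical names chosen for illustration and stand in for the page rendering and the server start-up.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def square(n: int) -> int:
    """Stand-in for the per-page render call that runs in a worker process."""
    return n * n


def run_in_workers(values):
    """Run square() in spawned worker processes and collect the results."""
    # "spawn" is the default start method on Windows; requesting it explicitly
    # gives the same re-import behaviour on other platforms.
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(square, values))


def main():
    # Hypothetical stand-in for the launcher's start-up work. In the waitress
    # launcher this is where the serve(app.app, ...) call would go.
    print("starting up (parent process only)")
    print(run_in_workers([1, 2, 3]))


if __name__ == "__main__":
    # Spawned workers import this file under a different module name,
    # so main() is skipped there and only the parent performs the start-up.
    main()
```

If this assumption holds, moving the serve(...) call in the launcher script under the same if __name__ == "__main__": guard should keep the worker processes from starting their own servers.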
