Convert PDF to HTML (one paragraph in each <span> tag)

I'm a junior data scientist with three months of experience. I'm trying to convert a PDF file to an HTML file. I managed the conversion with several conversion APIs, but the resulting HTML puts each word in its own <span>. I want the HTML to contain at least one whole paragraph per <span>.

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToHtmlClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # run the conversion and write the result to a file
    client.convertFileToFile('./111.pdf', 'logo.html')
    
except pdfcrowd.Error as why:
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))
    raise
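One possible workaround is to rebuild paragraphs yourself from word positions instead of relying on the converter's word-level layout. The sketch below assumes you already have each word's text and vertical coordinates in reading order, for example from a PDF text-extraction library such as PyMuPDF's `page.get_text("words")`. The `Word` class, the `line_tol` and `gap_factor` thresholds, and the sample coordinates are hypothetical and will likely need tuning for a real document.

```python
import html
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word box: text plus its vertical extent on the page."""
    text: str
    top: float
    bottom: float


def group_paragraphs(words, line_tol=2.0, gap_factor=0.8):
    """Group words (assumed to be in reading order) into paragraph strings.

    Words whose `top` is within `line_tol` of the line's first word share a line.
    A new paragraph starts when the vertical gap before a line exceeds
    `gap_factor` times the height of the previous line (a heuristic).
    """
    # First pass: collect words into lines by similar top coordinate.
    lines = []
    for w in words:
        if lines and abs(w.top - lines[-1][0].top) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])

    # Second pass: merge lines into paragraphs, splitting on large gaps.
    paragraphs = []
    prev_top = prev_bottom = None
    for line in lines:
        top = min(w.top for w in line)
        bottom = max(w.bottom for w in line)
        if prev_bottom is None or top - prev_bottom > gap_factor * (prev_bottom - prev_top):
            paragraphs.append([])
        paragraphs[-1].extend(w.text for w in line)
        prev_top, prev_bottom = top, bottom
    return [" ".join(p) for p in paragraphs]


def paragraphs_to_html(paragraphs):
    """Emit one <span> per paragraph, escaping HTML special characters."""
    spans = "\n".join(f"<span>{html.escape(p)}</span>" for p in paragraphs)
    return f"<html><body>\n{spans}\n</body></html>"


if __name__ == "__main__":
    sample = [
        Word("Hello", 100, 112), Word("world,", 100, 112),
        Word("second", 114, 126), Word("line.", 114, 126),
        Word("New", 140, 152), Word("paragraph.", 140, 152),
    ]
    print(paragraphs_to_html(group_paragraphs(sample)))
```

With the sample coordinates, the small gap between the first two lines keeps them in one paragraph, while the larger gap before "New paragraph." starts a second `<span>`. A gap-based heuristic like this is simple but can misfire on multi-column layouts or headings, so treat it as a starting point rather than a general solution.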