I'm a junior data scientist with three months of experience.
I'm trying to convert a PDF file to an HTML file.
After trying several conversion APIs, I got the conversion to work.
However, the resulting HTML puts each word in its own <span>.
I want the HTML to put at least a whole paragraph in each <span>.
import pdfcrowd
import sys
try:
    # create the API client instance
    client = pdfcrowd.PdfToHtmlClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')
    # run the conversion and write the result to a file
    client.convertFileToFile('./111.pdf', 'logo.html')
except pdfcrowd.Error as why:
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))
    raise
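
One possible direction is to post-process the generated HTML and merge the word-level spans myself. Below is a minimal sketch using only the standard library. It assumes that the word spans of one paragraph are adjacent, non-nested siblings inside the same block element; that layout is a guess, and the real Pdfcrowd output may be structured differently. The function name `merge_word_spans` is hypothetical. Note that the merged span drops the attributes (such as `class` or `style`) of the original spans, so any per-word positioning is lost.

```python
import re

# One simple <span>...</span> element that contains no nested <span>.
SPAN = r'<span\b[^>]*>(?:(?!</?span\b).)*?</span>'
# A run of two or more such spans separated only by whitespace.
RUN = re.compile(r'(?:' + SPAN + r'\s*){2,}', re.DOTALL)
# Captures the inner content of each span inside a run.
INNER = re.compile(r'<span\b[^>]*>(.*?)</span>', re.DOTALL)


def merge_word_spans(html_text):
    """Collapse each run of adjacent spans into a single span.

    Assumption: spans that belong together are adjacent siblings,
    so a run ends at the first non-span tag (for example </p>).
    """
    def join_run(match):
        words = [w.strip() for w in INNER.findall(match.group(0))]
        return '<span>' + ' '.join(w for w in words if w) + '</span>'

    return RUN.sub(join_run, html_text)


if __name__ == '__main__':
    sample = ('<p><span class="w">Hello</span> '
              '<span class="w">world</span></p>')
    print(merge_word_spans(sample))
```

To apply it to the converted file, the contents of 'logo.html' could be read as text, passed through `merge_word_spans`, and written back. A regular expression is only a rough tool for HTML; if the output turns out to have nested or more irregular markup, a real HTML parser would be the safer choice.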