We have a project where we use pdf.js to render a PDF in a web page. It creates HTML container elements for the PDF pages, and the text content of each page is split into HTML `span` elements in the view.
The attached image shows how the PDF text is rendered in the view. It also shows that each `span` has a `data-key` and that a `span` does not correspond to a line in the PDF.
Now I need a PDF reader for Java that reads the content and breaks it into the same chunks as these `span` elements (with their `data-key`, or just the `span` text), in the same order.
There are a lot of Java libraries that can read PDF content, but they extract it line by line, which does not solve my issue. I need a Java library that can break the content into chunks equivalent to the `span` elements in the view.
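To make the requirement concrete, here is a rough sketch of the kind of chunking I mean. It is a hypothetical, stdlib-only illustration (the class `TextRunSplitter` and its `split` method are my own names, not a library API) that assumes an uncompressed content stream and literal strings only: each text-showing operator (`Tj`, `TJ`, `'` or `"`) becomes one chunk, kept in stream order, instead of being merged into lines. As far as I understand, pdf.js also builds its text items from these operators, but it may merge or split runs based on position, so this is only an approximation of the `span` boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a library API): splits an UNCOMPRESSED PDF content
 * stream into text chunks, one chunk per text-showing operator (Tj, TJ, ' or "),
 * kept in stream order. Only literal strings "(...)" are handled; hex strings,
 * font encodings and compressed streams are out of scope.
 */
class TextRunSplitter {

    static List<String> split(String stream) {
        List<String> chunks = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // string operands seen since the last operator
        int n = stream.length();
        int i = 0;
        while (i < n) {
            char c = stream.charAt(i);
            if (c == '(') {
                // Literal string: track nested parentheses and backslash escapes.
                StringBuilder sb = new StringBuilder();
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char d = stream.charAt(i);
                    if (d == '\\' && i + 1 < n) {
                        // Keep the escaped character as-is; this covers \( \) and \\,
                        // but \n, \t and octal escapes are not decoded in this sketch.
                        sb.append(stream.charAt(i + 1));
                        i += 2;
                        continue;
                    }
                    if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    if (depth > 0) {
                        sb.append(d);
                    }
                    i++;
                }
                pending.add(sb.toString());
            } else if (c == '\'' || c == '"') {
                // Single-character text-showing operators.
                flush(pending, chunks);
                i++;
            } else if (Character.isLetter(c)) {
                // Operator token such as BT, Tf, Td, T*, Tj, TJ.
                int start = i;
                while (i < n && (Character.isLetter(stream.charAt(i)) || stream.charAt(i) == '*')) {
                    i++;
                }
                String op = stream.substring(start, i);
                if (op.equals("Tj") || op.equals("TJ")) {
                    flush(pending, chunks);
                } else {
                    pending.clear(); // strings belonging to other operators are ignored
                }
            } else {
                i++; // numbers, the '/' of names, array brackets, whitespace
            }
        }
        return chunks;
    }

    // Joins the pending strings (a TJ array may hold several) into one chunk.
    private static void flush(List<String> pending, List<String> chunks) {
        if (!pending.isEmpty()) {
            chunks.add(String.join("", pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td [(Wor) -20 (ld)] TJ ET";
        System.out.println(split(stream)); // prints [Hello, World]
    }
}
```

A real solution would still need a PDF library to decompress the page content streams (often Flate-compressed) and to map font encodings to Unicode.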