PDF Reader Cucumber Ruby


I've been asked to write some tests to confirm that certain text is contained in a PDF file. I've come across the pdf-reader gem, which is good at extracting text from the file, except that the output isn't well formatted. For example, one piece of text should read "Date of first registration of the product", but PDF::Reader returns it as "Date offirstregistrationoftheproduct". So when I run my assertion, it fails because of the spacing.

My code:

expected_text = 'Date of first registration of the product'

file = File.open(my_pdf, "rb")
PDF::Reader.open(file) do |reader|
  reader.pages.each do |page|
    expect(page).to have_text expected_text
  end
end

The result is an RSpec::Expectations::ExpectationNotMetError.

Is there a way I can get this text properly formatted so that my assertion can read it?
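One possible workaround, sketched under the assumption that the test only needs to confirm the words appear in order: remove all whitespace from both the extracted text and the expected phrase before comparing. The helper name pdf_contains_text? is hypothetical. It takes an array of page strings (for example reader.pages.map(&:text) from pdf-reader), so the logic can be tried without a PDF.

```ruby
# Hypothetical helper (not part of pdf-reader): checks whether +expected+
# appears in the extracted PDF text while ignoring all whitespace
# differences. +page_texts+ is an array of strings, e.g.
# reader.pages.map(&:text).
def pdf_contains_text?(page_texts, expected)
  # Remove every whitespace character so "Date offirstregistration..."
  # and "Date of first registration ..." reduce to the same string.
  squash = ->(s) { s.gsub(/\s+/, "") }
  # Join the pages so the check passes if the phrase appears anywhere in
  # the document, instead of requiring every page to contain it.
  squash.call(page_texts.join).include?(squash.call(expected))
end

pages = ["Cover page", "Date offirstregistrationoftheproduct: 2019-01-01"]
puts pdf_contains_text?(pages, "Date of first registration of the product") # prints "true"
puts pdf_contains_text?(pages, "Date of last registration")                 # prints "false"
```

In an RSpec example this could be used as expect(pdf_contains_text?(reader.pages.map(&:text), expected_text)).to be true. The trade-off is that ignoring whitespace entirely can give false positives, since "car pet" and "carpet" compare equal.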

Answer by Marcello Mello

The page object that PDF::Reader yields is not text (it is a PDF::Reader::Page). If you want the text of a page, use page.text. Matching that text against a regex may then solve your problem.

Try something like the following:

expected_text = 'Date of first registration of the product'

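PDF::Reader.open(my_pdf) do |reader|
  reader.pages.each do |page|
    # page.text returns the page's extracted text as a String;
    # Regexp.escape stops characters such as "." acting as regex syntax
    expect(page.text).to match(/#{Regexp.escape(expected_text)}/)
  end
end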
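The regex above still contains the exact spaces from expected_text, so on its own it would not match "Date offirstregistrationoftheproduct". A minimal sketch of a spacing-tolerant variant, assuming a hypothetical helper named spacing_tolerant_pattern: build the regex from the escaped words of the phrase and allow optional whitespace (\s*) between them.

```ruby
# Hypothetical helper (not part of pdf-reader or RSpec): builds a regex
# that matches +phrase+ even when the PDF text extractor drops or adds
# whitespace between its words.
def spacing_tolerant_pattern(phrase)
  # Escape each word so characters like "." or "(" are matched literally,
  # then allow zero or more whitespace characters between the words.
  words = phrase.split
  Regexp.new(words.map { |w| Regexp.escape(w) }.join('\s*'))
end

pattern = spacing_tolerant_pattern("Date of first registration of the product")
puts pattern.match?("Date offirstregistrationoftheproduct") # prints "true"
puts pattern.match?("Date of registration")                 # prints "false"
```

With pdf-reader this could be used as expect(page.text).to match(spacing_tolerant_pattern(expected_text)). The words must still appear in order, so only the spacing between them is relaxed.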