Reading text from a PDF works in Rails console but not in Rails application


I have a simple one-page searchable PDF that is uploaded to a Rails 6 application model (Car) using Active Storage. I can extract the text from the PDF using the 'tempfile' and 'pdf-reader' gems in the Rails console:

> @car.creport.attached?
=> true
> f = Tempfile.new(['file', '.pdf'])
> f.binmode
> f.write(@car.creport.blob.download)
> r = PDF::Reader.new(f.path.to_s)
> r.pages[1].text
=> "Welcome to the ABC Car Report for January 16, 20...

But if I try the same thing in the create action of my cars_controller.rb, it doesn't work:

# cars_controller.rb
...
  def create
    @car = Car.new(car_params)
    @car.filetext = ""
    f = Tempfile.new(['file', '.pdf'])
    f.binmode
    f.write(@car.creport.blob.download)
    r = PDF::Reader.new(f.path.to_s)
    @car.filetext = r.pages[1].text
    ...
  end
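A side note on the Tempfile pattern in both snippets, separate from the FileNotFoundError itself: the bytes are written with f.write and then PDF::Reader reopens the file by path (f.path). Ruby buffers file writes, so a small write may not have reached the disk when another handle opens the file. Calling f.flush first, or rewinding and passing the IO object itself, avoids that. This is a stdlib-only sketch; the fake bytes are an assumption standing in for the downloaded PDF.

```ruby
require "tempfile"

# Hypothetical stand-in for the PDF bytes (the question uses @car.creport.blob.download).
pdf_bytes = "%PDF-1.4 toy bytes".b

f = Tempfile.new(["file", ".pdf"])
f.binmode
f.write(pdf_bytes)
# Push Ruby's write buffer to the OS so a reader that reopens the file
# by path (as PDF::Reader.new(f.path) does) sees every byte.
f.flush

from_path = File.binread(f.path) # what a path-based reader would see

f.rewind                          # alternatively, rewind and hand over the IO itself
from_io = f.read

f.close!                          # close and delete the temp file
```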

When I run the Rails application, I can create a new Car and select a PDF file to attach. But when I click 'Submit', I get a FileNotFoundError in cars_controller.rb at the f.write() line.

My gut instinct is that the controller is trying to read the blob in order to write it to the temp file too soon (i.e., before the blob has even been written). I tried inserting a sleep(2) to give it time, but I get the same FileNotFoundError.
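For what it's worth, the Rails 6.0 release notes say that files assigned to a record are persisted to storage when the record is saved, rather than immediately. If that is what's happening here, the blob has not been uploaded at all inside create (the record is unsaved), so waiting with sleep cannot help: the problem is ordering, not timing. The toy model below only illustrates that ordering in plain Ruby; ToyAttachment and its methods are hypothetical and are not Active Storage's code.

```ruby
require "tmpdir"

# Hypothetical toy: keeps the uploaded bytes in memory until save,
# and only then writes them to a "storage" directory.
class ToyAttachment
  class FileNotFoundError < StandardError; end

  def initialize(storage_dir, key, uploaded_bytes)
    @path = File.join(storage_dir, key)
    @pending = uploaded_bytes
  end

  # Deferred upload: nothing reaches storage before save is called.
  def save
    File.binwrite(@path, @pending)
    @pending = nil
    true
  end

  # Reads from storage, like a download from the storage service.
  def download
    raise FileNotFoundError, @path unless File.exist?(@path)
    File.binread(@path)
  end
end

# Dir.mktmpdir removes the directory afterwards and returns the block's value.
results = Dir.mktmpdir do |dir|
  attachment = ToyAttachment.new(dir, "creport.pdf", "%PDF-1.4 toy bytes".b)

  before_save =
    begin
      attachment.download
    rescue ToyAttachment::FileNotFoundError
      :not_found # sleeping would not change this; the bytes are not in storage yet
    end

  attachment.save
  [before_save, attachment.download]
end
```

If that ordering is the cause, one option is to extract the text after @car.save succeeds; another is to read the uploaded file from the params before saving. Which fits better depends on the app.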

Any ideas?

Thank you!


There are 2 answers

Answer by max (accepted)

I don't get why you're jumping through so many hoops. Also, calling .download without a block loads the entire file into memory (yikes). If @car.creport is an Active Storage attachment, you can just use the open method instead:

if @car.creport.attached? # @car.creport is always a (truthy) proxy object, so check attached? instead
  @car.creport.blob.open do |file|
    file.binmode
    r = PDF::Reader.new(file) # just pass the IO object
    @car.filetext = r.pages[1].text
  end
end

This streams the file to disk instead (as a tempfile).
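The memory point can be sketched in plain Ruby. IO.copy_stream copies from one IO to another in chunks, so the file never has to exist as one big String, which is what download without a block returns. As I understand it, Active Storage's open similarly writes the download to a tempfile chunk by chunk, but the code below is only a stdlib analogy, not Active Storage's implementation; the 1 MB payload and file names are made up.

```ruby
require "tempfile"

# Stand-in for the stored file: a hypothetical 1 MB payload on disk.
source = Tempfile.new(["source", ".bin"])
source.binmode
source.write("x" * 1_048_576)
source.flush
source.rewind

# Stream into a second tempfile in chunks instead of reading it all into one String.
copy = Tempfile.new(["copy", ".pdf"])
copy.binmode
bytes_copied = IO.copy_stream(source, copy) # returns the number of bytes copied
copied_size = copy.size                     # Tempfile#size flushes before measuring

source.close!
copy.close!
```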

If you're just taking the file through a plain old file input, you will get an ActionDispatch::Http::UploadedFile in the parameters, which is also extremely easy to open:

if params[:file].respond_to?(:tempfile)
  file = params[:file].tempfile # UploadedFile#open takes no block, so use its Tempfile directly
  file.binmode
  file.rewind
  r = PDF::Reader.new(file) # just pass the IO object
  @car.filetext = r.pages[1].text
end
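A caveat about calling open with a block on an uploaded file: as far as I can tell, UploadedFile#open simply reopens the underlying Tempfile, and Tempfile#open takes no block, so a block passed to it is silently ignored. Working with the tempfile object directly avoids that. This stdlib-only check uses a local Tempfile as an assumed stand-in for params[:file].tempfile.

```ruby
require "tempfile"

upload = Tempfile.new(["upload", ".pdf"]) # stand-in for params[:file].tempfile
upload.binmode
upload.write("%PDF-1.4 toy bytes")
upload.flush

# Tempfile#open reopens the file and returns; it never yields to the block.
block_ran = false
upload.open { block_ran = true }

# Reading through the tempfile itself works as expected.
upload.binmode # the reopened handle is not in binary mode by default
upload.rewind
contents = upload.read
upload.close!
```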
Answer by Mark

The difference looks like it's with your @car variable.

In the console you have a blob attached (@car.creport.attached? => true). In your controller, you're initializing a new instance of the Car class, so unless some initialization attaches something in the background, the attachment will be nil.

Why that would return a 'file not found' error, I'm not sure, but from what I can see that's the only difference between the two code samples. You're trying to write @car.creport.blob.download, which is present on @car in the console but nil in your controller.