I'm building an application in Ruby 1.9.3-p327 that fetches and parses some pages (scraping) and then, depending on certain values, inserts or updates some columns in the database. To fetch and parse, the app uses the Mechanize gem, and access to the database (MySQL) goes through the activerecord gem.
The weird problem I have is that a Timeout::Error exception is sometimes raised at random: it may not happen for a while and then show up two days later, and with different types of records or pages. The log of the exception is:
/root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:146:in `rescue in rbuf_fill': too many connection resets (due to Timeout::Error - Timeout::Error) after 0 requests on 21716860, last used 1378984537.2796552 seconds ago (Net::HTTP::Persistent::Error)
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:140:in `rbuf_fill'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1293:in `request'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
from /root/notificador-corte/lib/downloader.rb:10:in `fetch'
from /root/notificador-corte/worker.rb:63:in `fetch_page'
from /root/notificador-corte/worker.rb:49:in `process_causa'
from /root/notificador-corte/worker.rb:41:in `block in worker_main_cycle'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/activerecord-4.0.0/lib/active_record/relation/delegation.rb:13:in `each'
from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/activerecord-4.0.0/lib/active_record/relation/delegation.rb:13:in `each'
from /root/notificador-corte/worker.rb:39:in `worker_main_cycle'
from /root/notificador-corte/worker.rb:26:in `run'
from /root/notificador-corte/app.rb:12:in `<main>'
The downloader.rb line 10 contains the definition of the method fetch:
def fetch(url)
  begin
    @agent.get(url)
  rescue Errno::ETIMEDOUT, Timeout::Error => exception
    # the exception is swallowed here, so fetch returns nil on a timeout
  end
end
The worker.rb in line 63 contains the call to the fetch method.
Reading the documentation, I found that I should try setting the read_timeout and open_timeout properties on the Mechanize agent, and also try idle_timeout and keep_alive, but the error still occurs at random.
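For reference, those settings can be applied along these lines. This is only a sketch: configure_timeouts is a hypothetical helper name, and the values are illustrative, not the ones actually used in the app.

```ruby
# Hypothetical helper: applies explicit timeouts to a Mechanize agent.
# All values below are illustrative assumptions.
def configure_timeouts(agent)
  agent.open_timeout = 10     # seconds to wait for the connection to open
  agent.read_timeout = 30     # seconds to wait for each read from the socket
  agent.idle_timeout = 5      # reset connections unused for this many seconds
  agent.keep_alive   = false  # do not reuse persistent connections
  agent
end

# Usage (assumes the mechanize gem is loaded):
#   @agent = configure_timeouts(Mechanize.new)
```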
The content of the Gemfile is:
gem 'activerecord', "~> 4.0.0"
gem 'mechanize', "~> 2.7.1"
gem 'mysql', '~> 2.9.1'
gem 'actionmailer', "~> 4.0.0"
gem 'rspec', "~> 2.14.1"
I don't think it is necessarily a bug in either your code or Mechanize itself. Most likely it's a network issue.
I would rather implement a policy in that rescue statement, so that whenever this error occurs, you make sure to "retry" at a later point.
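A minimal sketch of such a retry policy could look like the following. The helper name with_retries, the attempt count, and the delay are all assumptions chosen for illustration. Note that the exception in your trace is Net::HTTP::Persistent::Error, which the original rescue clause does not list, so it would have to be added to the rescued classes.

```ruby
require 'timeout'

# Hypothetical helper (not part of Mechanize): runs the block and retries it
# when one of the given error classes is raised, pausing between attempts.
# Positional defaults are used so it also runs on Ruby 1.9.3.
def with_retries(errors, attempts = 3, delay = 5)
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts  # out of attempts: re-raise to the caller
    sleep(delay * tries)        # wait a little longer after each failure
    retry                       # re-run the begin block
  end
end

# Usage with the downloader from the question (assumes mechanize is loaded):
#   def fetch(url)
#     errors = [Timeout::Error, Errno::ETIMEDOUT, Net::HTTP::Persistent::Error]
#     with_retries(errors) { @agent.get(url) }
#   end
```

If every attempt fails, the last exception propagates, so the worker can log it and move on to the next record instead of crashing or silently getting nil.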