I get a "Too Many Connection Resets" error when trying to scrape a website with Mechanize.
This is my code:
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.keep_alive = false
page = agent.get('https://web.archive.org/web/20170417084732/https://www.cs.auckland.ac.nz/~andwhay/postlist.html')
page.links_with(:text => 'post').each do |link|
  post = link.click
  Article.create(
    user_id: 1,
    title: post.css('title'),
    text: post.at("//div[@itemprop = 'description']")
  )
end
I also used the code from the linked blog post to avoid the "Too Many Connection Resets" error.
The code from the linked blog post seems to be incompatible with v3.0.0 of the net-http-persistent gem. Note that Mechanize v2.7.6 (the current version as of this writing) declares compatibility with net-http-persistent >= v2.5.2, a constraint that v3.0.0 satisfies.
The short answer is to do one of the following:

- pin the net-http-persistent gem to a version below v3.0.0, or
- remove the call to self.http.shutdown on line 44 of the linked blog post.
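If you go the pinning route, a minimal Gemfile sketch might look like this (the exact version constraints are illustrative, not prescriptive):

# Gemfile
# Keep net-http-persistent below 3.0 so the shutdown call keeps its old behavior.
gem 'mechanize', '~> 2.7'
gem 'net-http-persistent', '< 3.0'

After editing the Gemfile, run bundle update net-http-persistent to apply the constraint.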
The long answer is that the net-http-persistent gem started using the connection_pool gem in v3.0.0, which changed the behavior of Net::HTTP::Persistent#shutdown (a.k.a. self.http.shutdown in Mechanize::HTTP::Agent). The new behavior raises a ConnectionPool::Error ("no connections are checked out") if a request is made after shutdown has been invoked.

However, looking through the code of both net-http-persistent v2.9.4 and v3.0.0, it seems that self.http.shutdown may not have been necessary in the first place. The main purpose of shutdown seems to be invoking finish on each of the connections. In both v2.9.4 and v3.0.0, when Net::HTTP::Persistent#request rescues an Errno::ECONNRESET exception (the original cause of all this), it retries only once and then calls Net::HTTP::Persistent#request_failed. request_failed in turn calls Net::HTTP::Persistent#finish with the connection. Thus, it seems the only necessary monkey patching is to retry more than once.
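For illustration, here is a minimal sketch of that idea. Rather than monkey patching Net::HTTP::Persistent#request internally, it wraps the fetch in its own retry loop at the call site; fetch_with_retries and max_retries are names invented here, and the rescued exception classes are the ones these gems are known to raise:

require 'mechanize'

# Hypothetical helper: retry a fetch several times instead of relying on
# net-http-persistent's single built-in retry.
def fetch_with_retries(agent, url, max_retries = 3)
  attempts = 0
  begin
    agent.get(url)
  rescue Net::HTTP::Persistent::Error, Errno::ECONNRESET => e
    attempts += 1
    raise e if attempts > max_retries
    sleep 1 # give the server a moment before retrying
    retry
  end
end

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = fetch_with_retries(agent, 'https://web.archive.org/web/20170417084732/https://www.cs.auckland.ac.nz/~andwhay/postlist.html')

Retrying at the call site avoids depending on the gem's private internals, so it keeps working whether net-http-persistent v2.x or v3.x is installed.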