I get a `ConnectionPool::Error` ("no connections are checked out") when trying to scrape a website with mechanize.
This is my code:
```ruby
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.keep_alive = false

page = agent.get('https://web.archive.org/web/20170417084732/https://www.cs.auckland.ac.nz/~andwhay/postlist.html')

page.links_with(:text => 'post').each do |link|
  post = link.click
  Article.create(
    user_id: 1,
    title: post.css('title'),
    text: post.at("//div[@itemprop = 'description']")
  )
end
```
I also used the code from the linked blog post to avoid the "Too Many Connection Resets" error.
The code from the linked blog post seems to be incompatible with v3.0.0 of the net-http-persistent gem. Note that Mechanize v2.7.6 (the current version as of this writing) is compatible with net-http-persistent >= v2.5.2, which includes v3.0.0.
The short answer is to do one of the following:

- pin net-http-persistent to a version below v3.0.0 (e.g. v2.9.4) in your Gemfile, or
- remove `self.http.shutdown` on line 44 of the linked blog post.
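For the first option, a minimal Gemfile sketch (the exact version constraint shown here is a suggested example, not something prescribed by the blog post):

```ruby
# Gemfile: keep net-http-persistent on the 2.x line so the blog
# post's patch, which calls self.http.shutdown, keeps working.
gem 'mechanize'
gem 'net-http-persistent', '< 3.0'
```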
The long answer is that the net-http-persistent gem started using the connection_pool gem in v3.0.0, which changed the behavior of `Net::HTTP::Persistent#shutdown` (a.k.a. `self.http.shutdown` in `Mechanize::HTTP::Agent`). The new behavior raises a `ConnectionPool::Error` ("no connections are checked out") if a request is made after `shutdown` has been invoked.
However, looking through the code of both net-http-persistent v2.9.4 and v3.0.0, it seems like `self.http.shutdown` may not have been necessary in the first place. The main purpose of `shutdown` seems to be invoking `finish` on each of the connections. In both v2.9.4 and v3.0.0, when `Net::HTTP::Persistent#request` rescues from an `Errno::ECONNRESET` exception (the original cause of all this), it retries only once and then calls `Net::HTTP::Persistent#request_failed`. `request_failed` in turn calls `Net::HTTP::Persistent#finish` with the connection. Thus, it seems the only necessary monkey patching is to retry more than once.
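A minimal sketch of such a patch, adapted from the shape of the code in the linked blog post but with the `self.http.shutdown` call dropped (`MAX_RESET_RETRIES` and its value of 10 are assumptions, not part of Mechanize):

```ruby
require 'mechanize'

class Mechanize::HTTP::Agent
  MAX_RESET_RETRIES = 10 # assumed limit; tune to taste

  # Keep a handle on the original implementation so we can wrap it.
  alias_method :fetch_without_retry, :fetch

  # Retry "too many connection resets" more times than the single
  # retry net-http-persistent performs internally, without calling
  # self.http.shutdown (which breaks under net-http-persistent v3.0.0).
  def fetch(uri, method = :get, headers = {}, params = [], referer = current_page, redirects = 0)
    retry_count = 0
    begin
      fetch_without_retry(uri, method, headers, params, referer, redirects)
    rescue Net::HTTP::Persistent::Error => e
      raise unless e.message =~ /too many connection resets/
      raise if retry_count >= MAX_RESET_RETRIES

      retry_count += 1
      retry
    end
  end
end
```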