I have 3 node cluster with 2 clickhouse instance running on 2 hosts, with replicated merge tree engine. I am frequently getting connect timeout error on port 9009. I am assuming this is interserver communication port related timeout?
I did updated 'connect_timeout_with_failover_ms' to almost 5000, nothing happened. What can be the reason for it. This is comming every few minutes?? Any timeout I can update?
PS: I am almost writing 100000 rows per batch usually per 2-3 seconds.
<Error> DB_1.school: DB::StorageReplic
atedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&
)>: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Timeout: connec
t timed out: 172.*.*.*:9009, Stack trace (when copying this message, always inc
lude the lines below):
0. Poco::TimeoutException::TimeoutException(std::__1::basic_string<char, std::__
1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string
<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0
x12409d8f in /usr/bin/clickhouse
1. ? @ 0x1231e545 in /usr/bin/clickhouse
2. Poco::Net::HTTPSession::connect(Poco::Net::SocketAddress const&) @ 0x122e8385
in /usr/bin/clickhouse
3. Poco::Net::HTTPClientSession::reconnect() @ 0x122d5278 in /usr/bin/clickhouse
4. Poco::Net::HTTPClientSession::sendRequest(Poco::Net::HTTPRequest&) @ 0x122d65
d8 in /usr/bin/clickhouse
5. DB::detail::ReadWriteBufferFromHTTPBase<std::__1::shared_ptr<DB::UpdatablePoo
ledSession> >::call(Poco::URI, Poco::Net::HTTPResponse&) @ 0xf8611db in /usr/bin
/clickhouse
6. DB::detail::ReadWriteBufferFromHTTPBase<std::__1: ared_pt264,1:Updatable7%o
ledSession> >::ReadWriteBufferFromHTTPBase(std::__1::shared_ptr<DB::UpdatablePoo
9009 is controlled by another timeout parameter
http_connection_timeout
it's related to a bad network.
And it's not a big deal. Basically it just an annoying message.
Replica tried to connect to another replica and connect timed out in 1 sec. Then replica made another connection. That's all.