Getting connect timeouts on ClickHouse port 9009


I have a 3-node cluster with 2 ClickHouse instances running on 2 hosts, using the ReplicatedMergeTree engine. I frequently get connect timeout errors on port 9009. I assume this is a timeout related to the interserver communication port?

I updated 'connect_timeout_with_failover_ms' to almost 5000, but nothing changed. What could be the reason? The error appears every few minutes. Is there a timeout I can increase?

PS: I usually write almost 100,000 rows per batch, one batch every 2-3 seconds.
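A rough estimate of what this ingest rate means for port 9009 may help put "every few minutes" in perspective. In a ReplicatedMergeTree setup, the replica that receives an INSERT writes a new part and the other replica downloads it over the interserver HTTP port (9009 here). The sketch below is a back-of-the-envelope calculation under assumptions that are not in the question: each batch creates exactly one part (a batch spanning several partitions creates several), every fetch opens a new connection (connection pooling can make the real count lower), and "every few minutes" is read, hypothetically, as one error every 3 minutes.

```python
# Rough load estimate for the interserver port (9009).
# ASSUMPTIONS (not from the question): each batch creates exactly one new
# part, the other replica fetches it over port 9009, and every fetch opens a
# new connection (connection pooling may make the real number lower).

SECONDS_PER_HOUR = 3600


def rows_per_second(rows_per_batch: int, seconds_per_batch: float) -> float:
    """Average ingest rate in rows per second."""
    return rows_per_batch / seconds_per_batch


def fetches_per_hour(seconds_per_batch: float, parts_per_batch: int = 1) -> float:
    """Upper bound on part fetches (connection attempts) to port 9009 per hour."""
    return SECONDS_PER_HOUR / seconds_per_batch * parts_per_batch


def timeout_share(fetches: float, errors_per_hour: float) -> float:
    """Fraction of connection attempts that must time out to explain the error rate."""
    return errors_per_hour / fetches


if __name__ == "__main__":
    # Hypothetical reading of "every few minutes": one error every 3 minutes.
    errors_per_hour = SECONDS_PER_HOUR / 180
    for interval in (2.0, 3.0):
        fetches = fetches_per_hour(interval)
        print(
            f"batch every {interval:.0f} s: "
            f"{rows_per_second(100_000, interval):,.0f} rows/s, "
            f"{fetches:,.0f} fetches/h, "
            f"{timeout_share(fetches, errors_per_hour):.1%} of connects timing out"
        )
```

Under these assumptions, 1,200 to 1,800 fetches per hour means that only about 1-2% of connection attempts need to time out to produce an error every few minutes.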

<Error> DB_1.school: DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Timeout: connect timed out: 172.*.*.*:9009, Stack trace (when copying this message, always include the lines below):

0. Poco::TimeoutException::TimeoutException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x12409d8f in /usr/bin/clickhouse
1. ? @ 0x1231e545 in /usr/bin/clickhouse
2. Poco::Net::HTTPSession::connect(Poco::Net::SocketAddress const&) @ 0x122e8385 in /usr/bin/clickhouse
3. Poco::Net::HTTPClientSession::reconnect() @ 0x122d5278 in /usr/bin/clickhouse
4. Poco::Net::HTTPClientSession::sendRequest(Poco::Net::HTTPRequest&) @ 0x122d65d8 in /usr/bin/clickhouse
5. DB::detail::ReadWriteBufferFromHTTPBase<std::__1::shared_ptr<DB::UpdatablePooledSession> >::call(Poco::URI, Poco::Net::HTTPResponse&) @ 0xf8611db in /usr/bin/clickhouse
6. DB::detail::ReadWriteBufferFromHTTPBase<std::__1::shared_ptr<DB::UpdatablePooledSession> >::ReadWriteBufferFromHTTPBase(std::__1::shared_ptr<DB::UpdatablePoo


Answer by Denny Crane:

Port 9009 is governed by a different timeout parameter, http_connection_timeout (in seconds), not by connect_timeout_with_failover_ms:

cat /etc/clickhouse-server/conf.d/user_substitutes.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <connect_timeout_with_failover_ms>1000</connect_timeout_with_failover_ms>
            <http_connection_timeout>15</http_connection_timeout>
        </default>
....

The timeouts are caused by a bad network between the hosts.

And it's not a big deal; basically it's just an annoying message.

The replica tried to connect to another replica, and the connection attempt timed out after 1 second. The replica then simply made another connection attempt. That's all.
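The connect-timeout-then-retry behaviour described above can be illustrated with a minimal Python sketch: a TCP connect with a 1-second timeout that is simply retried on failure. This is not ClickHouse's code (ClickHouse does this internally through Poco's HTTP client); the helper name connect_with_retry and the default of 3 attempts are hypothetical. Pointed at the other replica's port 9009, it can also serve as a rough check of how often connects to that port time out on a given network.

```python
import socket
import time


def connect_with_retry(host: str, port: int, timeout_s: float = 1.0, attempts: int = 3):
    """Open a TCP connection, retrying after each failed attempt.

    Hypothetical helper illustrating the pattern described in the answer;
    it is not ClickHouse's implementation. Returns (sock, attempt, timeouts)
    on success and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    timeouts = 0
    last_error = None
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            sock = socket.create_connection((host, port), timeout=timeout_s)
            return sock, attempt, timeouts
        except socket.timeout as exc:
            # Same situation as ClickHouse's "Timeout: connect timed out":
            # count it, report it, and try again.
            timeouts += 1
            last_error = exc
            elapsed = time.monotonic() - start
            print(f"attempt {attempt}: connect to {host}:{port} timed out after {elapsed:.2f} s")
        except OSError as exc:
            # Refused/unreachable is a different failure; this sketch retries it too.
            last_error = exc
    raise last_error


if __name__ == "__main__":
    # A local listener stands in for the other replica's port 9009.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        sock, attempt, timeouts = connect_with_retry("127.0.0.1", port)
        print(f"connected on attempt {attempt} after {timeouts} timeout(s)")
        sock.close()
```

Against a real cluster you would call it with the other host's address and port 9009 (for example a hypothetical host "replica-2"); occasional timeouts followed by a successful retry match the harmless pattern described in the answer, while frequent ones point at the network rather than at ClickHouse settings.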