ClickHouse Server Exception: Code: 210. DB::Exception: Fail to read from HDFS


I'm trying to migrate data from HDFS into ClickHouse. Sometimes the data mart builds without problems, but more often the job fails with the error below. I tried wrapping the query in try/except so that it would retry the file and keep building the mart, but that doesn't help. The network is fine: some days it works, some days it doesn't, and the files themselves are not corrupted.

    [2024-03-13, 01:00:49 UTC] {logging_mixin.py:137} INFO - ClickHouse Server Exception: Code: 210. DB::Exception: Fail to read from HDFS: hdfs://eevteev:@hadoop-amber, file path: /user/hive/warehouse/processing.db/outbox_partitioned/dt=2024-03-10/pid=119/type=all/outbox=-1/part-00000-221487c8-dac4-4cb5-bcc9-581d496593b4.c000.txt.gz. Error: HdfsIOException: InputStreamImpl: cannot read file: /user/hive/warehouse/processing.db/outbox_partitioned/dt=2024-03-10/pid=119/type=all/outbox=-1/part-00000-221487c8-dac4-4cb5-bcc9-581d496593b4.c000.txt.gz, from position 0, size: 1048576. Caused by: HdfsTimeoutException: Read 8 bytes timeout: While executing ParallelParsingBlockInputFormat: While executing HDFSSource. Stack trace:

    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c7498f7 in /opt/bitnami/clickhouse/bin/clickhouse
    2. DB::Exception::Exception<String&, String&, String>(int, FormatStringHelperImplstd::type_identity<String&::type, std::type_identity<String&>::type, std::type_identity::type>, String&, String&, String&&) @ 0x0000000010d7112c in /opt/bitnami/clickhouse/bin/clickhouse
    3. DB::ReadBufferFromHDFS::ReadBufferFromHDFSImpl::nextImpl() @ 0x0000000010f1c774 in /opt/bitnami/clickhouse/bin/clickhouse
    4. DB::ReadBufferFromHDFS::nextImpl() @ 0x0000000010f1b6dd in /opt/bitnami/clickhouse/bin/clickhouse
    5. DB::ZlibInflatingReadBuffer::nextImpl() @ 0x000000000f519939 in /opt/bitnami/clickhouse/bin/clickhouse
    6. DB::segmentationEngine(DB::ReadBuffer&, DB::Memory<Allocator<false, false>>&, unsigned long, unsigned long) (.llvm.14850076424094712064) @ 0x00000000134c6b9b in /opt/bitnami/clickhouse/bin/clickhouse
    7. DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::shared_ptr<DB::ThreadGroup>) @ 0x00000000134fe4e0 in /opt/bitnami/clickhouse/bin/clickhouse
    8. void std::__function::__policy_invoker<void ()>::__call_implstd::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true::ThreadFromGlobalPoolImpl<void (DB::ParallelParsingInputFormat::)(std::shared_ptr<DB::ThreadGroup>), DB::ParallelParsingInputFormat, std::shared_ptr<DB::ThreadGroup>>(void (DB::ParallelParsingInputFormat::&&)(std::shared_ptr<DB::ThreadGroup>), DB::ParallelParsingInputFormat&&, std::shared_ptr<DB::ThreadGroup>&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000013502ad6 in /opt/bitnami/clickhouse/bin/clickhouse
    9. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_deletestd::__thread_struct>, void ThreadPoolImplstd::thread::scheduleImpl(std::function<void ()>, Priority, std::optional, bool)::'lambda0'()>>(void*) @ 0x000000000c832d27 in /opt/bitnami/clickhouse/bin/clickhouse
    10. start_thread @ 0x0000000000007ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
    11. ? @ 0x00000000000fba2f in /lib/x86_64-linux-gnu/libc-2.31.so
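
To separate "the gz file cannot be read over HDFS" from "something in the INSERT/transform is the problem", a lightweight probe can be run first. This is only a sketch: `probe_sql` is a name I made up, and it assumes the same `client` (clickhouse-driver) and `file_path` variables used in the loading code further down.

    # Hypothetical probe: read only a row count from the same file through the
    # hdfs() table function. If this also times out, the HDFS read itself is
    # failing, independent of the INSERT logic.
    probe_sql = (
        "SELECT count() "
        f"FROM hdfs('hdfs://eevteev:@hadoop-amber{file_path}', 'LineAsString')"
    )
    try:
        rows = client.execute(probe_sql)
        print(f"{file_path}: {rows[0][0]} lines readable")
    except Exception as e:
        print(f"Probe failed for {file_path}: {e}")

The loading step that actually fails is the query below.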

    import time

    # client (clickhouse-driver), file_paths, result_dt and pid are set up earlier in the task
    max_retries = 3

    for file_path in file_paths:
        retry_count = 0
        while retry_count < max_retries:
            try:
                result_final = client.execute(f"INSERT INTO main_gpb.yandex_gpb SELECT new_puid, sid, dt, pid FROM (SELECT CASE WHEN substring_index(line, '\\t', 1) IN (SELECT puid FROM test_gpb.gpb_cated_stream) THEN substring_index(line, '\\t', 1) ELSE TO_BASE64(substring_index(line, '\\t', 1)) END AS new_puid, arrayJoin(splitByChar(',', REGEXP_REPLACE(substring_index(line, '\\t', -1), '.*?(492708|492707|492706|492705|492704|492703|492702|492701|492700|492699|492698|492697).*', '\\\\0'))) as sid, toDate('{result_dt}') AS dt, '{pid}' AS pid FROM hdfs('hdfs://eevteev:@hadoop-amber{file_path}', 'LineAsString')) WHERE sid == '492708' or sid == '492707' or sid == '492706' or sid == '492705' or sid == '492704' or sid == '492703' or sid == '492702' or sid == '492701' or sid == '492700' or sid == '492699' or sid == '492698' or sid == '492697'")
                break  # this file loaded successfully, move on to the next one
            except Exception as e:
                print(f"ClickHouse Server Exception: {e}")
                retry_count += 1
                if retry_count == max_retries:
                    print("Maximum retries reached. Exiting...")
                    break
                print("Retry in 5 minutes...")
                time.sleep(300)
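
Since the stack trace goes through ParallelParsingBlockInputFormat, one experiment (a sketch, not a confirmed fix) is to re-run the same INSERT with parallel parsing of the input disabled for that query; clickhouse-driver accepts per-query settings through the `settings` argument of `execute`. Here `insert_sql` is just a placeholder for the full INSERT query shown above.

    # Hypothetical experiment: run the same INSERT with input parallel parsing
    # turned off for this query only, so the .gz file is parsed by one stream.
    result_final = client.execute(
        insert_sql,
        settings={"input_format_parallel_parsing": 0},
    )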


There are 0 answers