Datastax: Block not found error from DSEFS


A Spark Streaming job runs in DSE and uses DSEFS for the checkpointing directory. I see the following error in the debug log file. How can I resolve it?

ERROR [dsefs-netty-worker-5] 2017-12-01 05:23:02,679 DSE-FS RestServerHandler.scala:126 - [id: 0x9964e082, /<>:58874 :> 0.0.0.0/0.0.0.0:5598] Streaming data to remote end failed.
java.io.IOException: Block not found a3859f30-aa23-11e7-80b9-4b8bdaf197cd
    at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:706) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
    at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:703) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
    at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.executeInSameThread(SameThreadExecutionContext.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
    at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.execute(SameThreadExecutionContext.scala:33) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
    at com.datastax.bdp.fs.exec.SerialExecutionContextProvider$$anon$5$$anon$2.execute(SerialExecutionContextProvider.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) [scala-library-2.10.6.jar:na]
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) ~[scala-library-2.10.6.jar:na]
    at scala.concurrent.Promise$class.complete(Promise.scala:55) ~[scala-library-2.10.6.jar:na]
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) ~[scala-library-2.10.6.jar:na]
    at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
    at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.34.Final.jar:4.0.34.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.34.Final.jar:4.0.34.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.34.Final.jar:4.0.34.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]

1 Answer

Answered by Piotr Kołaczkowski

This error means the DSEFS server failed to find the metadata of the data block in the dsefs.blocks Cassandra table. The ids of a file's blocks are stored in the dsefs.block_offsets table and reference blocks stored in dsefs.blocks. If a row in dsefs.block_offsets points to a block id that is absent from dsefs.blocks, you get this error when reading the file.
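A quick way to confirm this is to look up the block id from the error message in both tables with cqlsh. This is only a diagnostic sketch: the column names used below are assumptions, so inspect the real schema with DESCRIBE first and adjust the queries accordingly.

    -- inspect the actual table definitions first (column names below are assumed)
    DESCRIBE TABLE dsefs.blocks;
    DESCRIBE TABLE dsefs.block_offsets;

    -- is there a row in dsefs.blocks for the block id from the error?
    -- (assumes the key column is called "id")
    SELECT * FROM dsefs.blocks WHERE id = a3859f30-aa23-11e7-80b9-4b8bdaf197cd;

    -- which file entry references that block?
    -- (assumes a "block_id" column; ALLOW FILTERING is acceptable only for a one-off check)
    SELECT * FROM dsefs.block_offsets WHERE block_id = a3859f30-aa23-11e7-80b9-4b8bdaf197cd ALLOW FILTERING;

If the second query returns a row but the first returns nothing, you are hitting exactly the inconsistency described above.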

This error should not happen under normal circumstances; it means the filesystem metadata somehow got into an inconsistent state. This may be a bug in the DSEFS implementation, the result of data loss caused by setting up the dsefs keyspace with an insufficient replication factor, or the result of a write operation that did not finish successfully and was applied only partially.
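To rule out the replication-factor cause, you can first check how the dsefs keyspace is currently replicated. DSE 5.x is based on Cassandra 3.x, so the system_schema tables should be available; 'dsefs' is the default keyspace name and may differ if you customized it.

    -- in cqlsh: show the current replication settings of the dsefs keyspace
    SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'dsefs';

    -- or simply
    DESCRIBE KEYSPACE dsefs;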

Please make sure the dsefs keyspace RF is set to at least 3 and run nodetool repair to avoid accidental data loss or unavailability of some DSEFS metadata.
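For example, a sketch of those two steps ("DC1" is a placeholder for your datacenter name, and the factor should match your topology):

    -- in cqlsh: raise the replication factor of the dsefs keyspace
    ALTER KEYSPACE dsefs
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    # then, from the shell on each node, repair the keyspace so every replica receives the data
    nodetool repair dsefs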

If this doesn't help, please contact me directly or through DataStax technical support and provide more details, including logs from the time before the error and more context on what the job was doing when the failure occurred.