For a big project we use DynamoDbLocal in our unit tests. Most of the times, these tests pass. The code also works as expected on our production environments, where we use the "real" dynamodb that's part of the VPC.
However, sometimes the unit tests fail. Particularly when calling putItem() we sometimes get the following exception:
The request processing has failed because of an unknown error, exception or failure. (Service: DynamoDb, Status Code: 500, Request ID: db23be5e-ae96-417b-b268-5a1433c8c125, Extended Request ID: null)
software.amazon.awssdk.services.dynamodb.model.DynamoDbException: The request processing has failed because of an unknown error, exception or failure. (Service: DynamoDb, Status Code: 500, Request ID: db23be5e-ae96-417b-b268-5a1433c8c125, Extended Request ID: null)
at software.amazon.awssdk.services.dynamodb.model.DynamoDbException$BuilderImpl.build(DynamoDbException.java:95)
at software.amazon.awssdk.services.dynamodb.model.DynamoDbException$BuilderImpl.build(DynamoDbException.java:55)
at software.amazon.awssdk.protocols.json.internal.unmarshall.AwsJsonProtocolErrorUnmarshaller.unmarshall(AwsJsonProtocolErrorUnmarshaller.java:89)
at software.amazon.awssdk.protocols.json.internal.unmarshall.AwsJsonProtocolErrorUnmarshaller.handle(AwsJsonProtocolErrorUnmarshaller.java:63)
at software.amazon.awssdk.protocols.json.internal.unmarshall.AwsJsonProtocolErrorUnmarshaller.handle(AwsJsonProtocolErrorUnmarshaller.java:42)
at software.amazon.awssdk.core.http.MetricCollectingHttpResponseHandler.lambda$handle$0(MetricCollectingHttpResponseHandler.java:52)
at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:64)
at software.amazon.awssdk.core.http.MetricCollectingHttpResponseHandler.handle(MetricCollectingHttpResponseHandler.java:52)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:89)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:132)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at software.amazon.awssdk.http.crt.internal.AwsCrtResponseBodyPublisher.completeSubscriptionExactlyOnce(AwsCrtResponseBodyPublisher.java:216)
at software.amazon.awssdk.http.crt.internal.AwsCrtResponseBodyPublisher.publishToSubscribers(AwsCrtResponseBodyPublisher.java:281)
at software.amazon.awssdk.http.crt.internal.AwsCrtAsyncHttpStreamAdapter.onResponseComplete(AwsCrtAsyncHttpStreamAdapter.java:114)
at software.amazon.awssdk.crt.http.HttpStreamResponseHandlerNativeAdapter.onResponseComplete(HttpStreamResponseHandlerNativeAdapter.java:33)
Relevant versions of our tools and artifacts:
- Maven 3
- Kotlin version 1.5.21
- DynamoDbLocal version 1.16.0
- Amazon SDK 2.16.67
Our DynamoLocalDb is spun up inside our unit tests as follows:
val url: String by lazy {
System.setProperty("sqlite4java.library.path", "target/dynamo-native-libs")
System.setProperty("aws.accessKeyId", "test-access-key")
System.setProperty("aws.secretAccessKey", "test-secret-key")
System.setProperty("log4j2.configurationFile", "classpath:log4j2-config-for-dynamodb.xml")
val port = randomFreePort()
logger.info { "Creating local in-memory Dynamo server on port $port" }
val instance = ServerRunner.createServerFromCommandLineArgs(arrayOf("-inMemory", "-port", port.toString()))
try {
instance.safeStart()
} catch (e: Exception) {
instance.stop()
fail("Could not start Local Dynamo Server on port $port.", e)
}
Runtime.getRuntime().addShutdownHook(object : Thread() {
override fun run() {
logger.debug("Stopping Local Dynamo Server on port $port")
instance.stop()
}
})
"http://localhost:$port"
}
Our dynamo log4j configuration is:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
</Console>
</Appenders>
<Loggers>
<Logger name="com.amazonaws.services.dynamodbv2.local" level="DEBUG">
<AppenderRef ref="Console"/>
</Logger>
<Logger name="com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess" level="WARN">
<AppenderRef ref="Console"/>
</Logger>
<Root level="DEBUG">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
Our client is created with:
val client: DynamoDbAsyncClientWrapper by lazy {
DynamoDbAsyncClientWrapper(
DynamoDbAsyncClient.builder()
.region(Region.EU_WEST_1)
.credentialsProvider(DefaultCredentialsProvider.builder().build())
.endpointOverride(URI.create(url))
.httpClientBuilder(AwsCrtAsyncHttpClient.builder())
.build()
)
}
The code for the Kotlin Dynamo Wrapper DSL we use in the code above is open sourced and worth a look.
After enabeling the debug logging in com.amazonaws.services.dynamodbv2.local as described above, we noticed the following in the logs:
09:14:22.917 [SQLiteQueue[]] DEBUG com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccessJob - SELECT ObjectJSON FROM "events-table" WHERE hashKey = ? AND rangeKey = ?;
09:14:22.919 [qtp1058328657-20] ERROR com.amazonaws.services.dynamodbv2.local.server.LocalDynamoDBServerHandler - Unexpected exception occured
com.amazonaws.services.dynamodbv2.local.shared.exceptions.LocalDBAccessException: [1] DB[1] prepare() INSERT OR REPLACE INTO "events-table" (rangeKey, hashKey, ObjectJSON, indexKey_1, indexKey_2, indexKey_6, rangeValue, hashRangeValue, hashValue,itemSize) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?,?); [table events-table has no column named indexKey_6]
at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.AmazonDynamoDBOfflineSQLiteJob.get(AmazonDynamoDBOfflineSQLiteJob.java:84) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess.putRecord(SQLiteDBAccess.java:1718) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.PutItemFunction.putItemNoCondition(PutItemFunction.java:183) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.PutItemFunction$1.criticalSection(PutItemFunction.java:83) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.LocalDBAccess$WriteLockWithTimeout.execute(LocalDBAccess.java:361) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.PutItemFunction.apply(PutItemFunction.java:85) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.TransactWriteItemsFunction.doWrite(TransactWriteItemsFunction.java:353) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.TransactWriteItemsFunction.access$000(TransactWriteItemsFunction.java:60) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.TransactWriteItemsFunction$1.run(TransactWriteItemsFunction.java:109) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.helpers.MultiTableLock$SingleTableLock$2.criticalSection(MultiTableLock.java:66) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.LocalDBAccess$WriteLockWithTimeout.execute(LocalDBAccess.java:361) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.helpers.MultiTableLock$SingleTableLock.run(MultiTableLock.java:68) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.api.dp.TransactWriteItemsFunction.apply(TransactWriteItemsFunction.java:113) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.shared.access.awssdkv1.client.LocalAmazonDynamoDB.transactWriteItems(LocalAmazonDynamoDB.java:401) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.server.LocalDynamoDBRequestHandler.transactWriteItems(LocalDynamoDBRequestHandler.java:240) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.dispatchers.TransactWriteItemsDispatcher.enact(TransactWriteItemsDispatcher.java:16) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.dispatchers.TransactWriteItemsDispatcher.enact(TransactWriteItemsDispatcher.java:8) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.server.LocalDynamoDBServerHandler.packageDynamoDBResponse(LocalDynamoDBServerHandler.java:395) ~[DynamoDBLocal-1.16.0.jar:?]
at com.amazonaws.services.dynamodbv2.local.server.LocalDynamoDBServerHandler.handle(LocalDynamoDBServerHandler.java:482) ~[DynamoDBLocal-1.16.0.jar:?]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1369) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1284) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.Server.handle(Server.java:501) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) ~[jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556) [jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) [jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:272) [jetty-server-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [jetty-io-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) [jetty-io-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [jetty-io-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) [jetty-util-9.4.30.v20200611.jar:9.4.30.v20200611]
at java.lang.Thread.run(Thread.java:829) [?:?]
This stacktrace hints at a problem in dynamically creating the underlying SQLLite table or querie(s), and the fact that it not always occurs feels like a bug in the form of a race condition or a failure to clean up memory or old objects between statements.
In this case, the generated SQL was:
INSERT OR REPLACE INTO "events-table"
(rangeKey, hashKey, ObjectJSON, indexKey_1, indexKey_2, indexKey_6, rangeValue, hashRangeValue, hashValue,itemSize)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?,?);
[table events-table has no column named indexKey_6]
We have tried several changes to our code, but we are running out of options. We are looking for a possible cause of this intermittent problem, or a way to reliably reproduce it.
On the Amazon forums we found this post which seems to hint at a similar problem, but is also unanswered/unsolved since August 2020. It would be great if we could also solve Andrew's problem. I also posted this question on re:Post but somehow AWS does not let me log back in with the userid/password I created there, and to create a ticket I have to log in. I guess I should have posted this on StackOverflow to begin with.
Edit: Looking at the decompiled com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess code, it seems that there is an internal queue for firing queries at the database. Is it possible that meta information is fetched from the SQLite table to build an item in the queue, but meanwhile the data in the table is changed by a statement that is fired earlier than the statement that was put on the queue? I haven't been able to create this situation yet, but it almost feels like this is what is happening.
It turns out that this problem indeed has to do with the queue and the way the DynamoLocalDb works, but not in the way we thought. We use DynamoLocalDb in a Kotlin project where we use coroutines. In case of I/O work, we dispatch routines, like so:
By using a dispatcher, the code is executed on one of the threads in the I/O Thread Pool. If we happen to delete or change a table on a different IO thread, or the main thread, the SQLite statements sometimes are executed before the statements in the IO threads are done, and that in turn causes the errors we get.
We "solved" the issue by never throwing tables away in our unittests, but rather delete all items from a table like so:
In our code, the
DynamoTableclass is a simple data class holding the name, pk and sk of a table. Please note thatclient.scan()returns paginated results. Because we are deleting, the pagination is expected to break and since we don't really care about the pagination here, we just fire the request again until we get an empty first page back.I hope this helps other people struggling with similar problems.
Cheers!