We have two Raven servers running on Azure VMs which replicate to one another. Occasionally, we get replication conflicts. We have a conflict listener which automatically resolves conflicts in the situation where the contents of both versions of the document are identical. We also have scripts running on both machines which check for conflicts and report them.
We had an error occur in one of the following servers today with the following stack trace:
Raven.Client.Connection.ServerClient.<DirectQuery>b__77(String conflictedResultId):0
Raven.Client.Connection.ServerClient.AssertNonConflictedDocumentAndCheckIfNeedToReload(RavenJObject docResult, Func`2 onConflictedQueryResult):110
Raven.Client.Connection.ServerClient+<>c__DisplayClass94`1.<RetryOperationBecauseOfConflict>b__93(Boolean current, RavenJObject docResult):0
System.Linq.Enumerable.Aggregate[TSource,TAccumulate](IEnumerable`1 source, TAccumulate seed, Func`3 func):46
Raven.Client.Connection.ServerClient.RetryOperationBecauseOfConflict[T](IEnumerable`1 docResults, T currentResult, Func`1 nextTry, Func`2 onConflictedQueryResult):35
Raven.Client.Connection.ServerClient.DirectQuery(String index, IndexQuery query, OperationMetadata operationMetadata, String[] includes, Boolean metadataOnly, Boolean includeEntries):579
Raven.Client.Connection.ServerClient+<>c__DisplayClass62.<Query>b__61(OperationMetadata u):0
Raven.Client.Connection.ReplicationInformer.TryOperation[T](Func`2 operation, OperationMetadata operationMetadata, OperationMetadata primaryOperationMetadata, Boolean avoidThrowing, T& result, Boolean& wasTimeout):200
Raven.Client.Connection.ReplicationInformer.ExecuteWithReplication[T](String method, String primaryUrl, OperationCredentials primaryCredentials, Int32 currentRequest, Int32 currentReadStripingBase, Func`2 operation):193
Raven.Client.Connection.ServerClient.ExecuteWithReplication[T](String method, Func`2 operation):44
Raven.Client.Document.AbstractDocumentQuery`2.ExecuteActualQuery():65
Raven.Client.Document.AbstractDocumentQuery`2.get_QueryResult():6
Raven.Client.Linq.RavenQueryProviderProcessor`1.ExecuteQuery[TProjection]():123
Raven.Client.Linq.RavenQueryInspector`1.GetEnumerator():0
The exception occurred during a Lucene query against one of the collections. The stack trace suggests that a conflict was encountered. The call to AssertNonConflictedAndCheckIfNeedToReload
suggests that the conflict was resolved. There are no conflicts on the machine at present and the team have not manually resolved any, so I assume it was taken care of by the conflict listener.
Does anyone have any idea what may have caused the null reference exception?
Edit: here is the code for the conflict listener. It checks if the two versions are identical, except for a field called UtcTimestamp
. If they are, it resolves using the latest.
if (conflictedDocs.Length < 2)
{
// This really, really shouldn't happen
resolvedDocument = null;
return false;
}
string previousDocument = string.Empty;
int counter = 0;
foreach (var conflictedDoc in conflictedDocs)
{
// We don't want the comparison of document versions to look at a property called UtcTimestamp
conflictedDoc.DataAsJson.Remove("UtcTimestamp");
var alteredConflictedDocString = conflictedDoc.DataAsJson.ToString();
if (counter > 0)
{
if (previousDocument != alteredConflictedDocString)
{
resolvedDocument = null;
return false;
}
}
previousDocument = alteredConflictedDocString;
counter++;
}
resolvedDocument = conflictedDocs.OrderByDescending(x => x.LastModified).First();
ConflictHelper.RemoveBadMetadata(resolvedDocument);
return true;
ConflictHelper.RemoveBadMetadata() does this:
if (documentToResolveWith.Metadata.ContainsKey("@id"))
{
documentToResolveWith.Metadata.Remove("@id");
}
if (documentToResolveWith.Metadata.ContainsKey("@etag"))
{
documentToResolveWith.Metadata.Remove("@etag");
}
if (documentToResolveWith.Metadata.ContainsKey("Raven-Replication-Conflict-Document"))
{
documentToResolveWith.Metadata.Remove("Raven-Replication-Conflict-Document");
}