How data is stored/accessed and preventing race conditions in maps, java


We have a case like this.

import java.util.HashMap;

class foo {
    // Map with a lot of entries
    private HashMap<String, String> dataMap;

    public void updater() {
        // updates dataMap
        // takes several milliseconds
    }

    public void someAction() {
        // needs to perform read on dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

The issue is that someAction and updater can be called simultaneously, and someAction is called more frequently. When updater is called, it can replace a lot of values in dataMap, and we need consistency in someAction: if the method starts with the old dataMap, then all of its reads should happen against the old dataMap.

class foo {
    // Map with a lot of entries
    private HashMap<String, String> dataMap;

    public void updater() {
        var updateDataMap = clone(dataMap); // some way to clone data from map
        // updates updateDataMap instead of dataMap
        // takes several milliseconds
        this.dataMap = updateDataMap;
    }

    public void someAction() {
        var readDataMap = dataMap;
        // reads from readDataMap instead of dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

Will this ensure consistency? I believe that the clone method will allocate a different area in memory, and that new reads will then go through the new reference. Will there be any performance impact? And will the memory of the old dataMap be released once nothing is using it any more?

If this is the correct way, is there any other, more efficient way to achieve the same result?


There are 2 answers

tucuxi (best answer)

I believe your approach will work, because all changes made by updater() are applied to a (deep) copy and will not be visible to someAction() until the reference is updated in a single operation.

I understand that you do not care about whether someAction() sees the latest version of the map's contents, as long as the map is consistent, that is, it is not observed while it is in the middle of being updated. In this case, there is no way for your someAction() to look at an incomplete map.

Beware that at most one thread at a time should be able to call updater(): if two threads ran it at the same time, each would start from the same old map, and only the one that assigns last would get its updated map published, so the other update would be lost. I recommend the following changes:

// no synchronization needed at this level, but volatile is important
private volatile HashMap<String,String> dataMap = new HashMap<>();

// if two threads attempt to call this at once, one blocks until the other finishes
public synchronized void updater() {
    var writeDataMap = clone(dataMap);  // a deep copy

    // update writeDataMap - guaranteed no other thread updating
    // ... long operation

    dataMap = writeDataMap;             // switch visible map with the updated one 
}

public void someAction() {
    var readDataMap = dataMap;

    // process readDataMap - guaranteed not to change while being read
    // ... long operation
}

The important keyword here is volatile: it ensures that once updater() assigns the new reference, other threads reading dataMap see the fully updated map. The use of synchronized simply prevents multiple updater() threads from interfering with each other, and is mostly defensive.
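For reference, here is a minimal compilable sketch of the same copy-and-swap idea. The class name SnapshotStore and the parameters (changes, firstKey, secondKey) are hypothetical and only stand in for the long update and the long read; new HashMap<>(dataMap) is used in place of clone(dataMap) on the assumption that keys and values are Strings, which are immutable, so copying the entries is enough.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name; the question calls this class `foo`.
class SnapshotStore {
    // volatile: a reader that sees the new reference also sees the fully built map
    private volatile Map<String, String> dataMap = new HashMap<>();

    // synchronized so that two concurrent updaters cannot lose each other's changes
    public synchronized void updater(Map<String, String> changes) {
        // Copying the entries is sufficient here because Strings are immutable
        Map<String, String> writeDataMap = new HashMap<>(dataMap);
        writeDataMap.putAll(changes);   // stands in for the long update
        dataMap = writeDataMap;         // publish the finished map in one step
    }

    // All reads go through one local reference, so they see one version of the map
    public String someAction(String firstKey, String secondKey) {
        Map<String, String> readDataMap = dataMap;   // read the field exactly once
        return readDataMap.get(firstKey) + "," + readDataMap.get(secondKey);
    }
}
```

The map that has been published is never modified again; every update builds a new one. That is what makes it safe for readers to use a plain HashMap without locking.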

daniel

If you want to avoid copies you could use a java.util.concurrent.locks.ReentrantReadWriteLock to protect access to your map. Use the write lock in the updater and the read lock in someAction.
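A rough sketch of what that could look like, assuming the same String-to-String map as in the question. The class name LockedStore and the method parameters are hypothetical and only stand in for the real update and read logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical name; the question calls this class `foo`.
class LockedStore {
    private final Map<String, String> dataMap = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void updater(Map<String, String> changes) {
        lock.writeLock().lock();          // exclusive: waits for all readers to finish
        try {
            dataMap.putAll(changes);      // stands in for the long in-place update
        } finally {
            lock.writeLock().unlock();
        }
    }

    public String someAction(String firstKey, String secondKey) {
        lock.readLock().lock();           // shared: many readers may hold it at once
        try {
            // Every read inside this block sees the same state of the map
            return dataMap.get(firstKey) + "," + dataMap.get(secondKey);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The trade-off is that no copy is made, but readers are blocked while updater() holds the write lock, and updater() has to wait until all running someAction() calls have released the read lock.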