j.u.c.ConcurrentHashMap - What is baseCount and sumCount?

53 views Asked by At

I am trying to understand the implementation of CHM in Java.

Two fields are used when resizing operations are taking place:

baseCount is described as :

/

**
     * Base counter value, used mainly when there is no contention,
     * but also as a fallback during table initialization
     * races. Updated via CAS.
     */

This seems like a cached value which can be read quickly for size operations, but then I see that sumCount() seems to be adding baseCount before computing it, which does not make much sense:

 final long sumCount() {
        CounterCell[] cs = counterCells;
        long sum = baseCount; // it takes baseCount and then adds to it ?
        if (cs != null) {
            for (CounterCell c : cs)
                if (c != null)
                    sum += c.value;
        }
        return sum;
    }

Are the number of CounterCells the same as the number of bin ?

Can anyone help understand the basics I need to understand this ?

I really could not find any docs regarding any of this, even in books by Doug Lea or Brian Goetz.

1

There are 1 answers

3
Pritam Banerjee On

BaseCount and CounterCells

The CHM uses a scheme to approximate the count of elements (size) in the map without locking the whole structure. The primary mechanism is through CounterCells and baseCount.

baseCount is the foundational count, used when there's no contention (multiple threads trying to update the count simultaneously).

CounterCells are a way to handle contention. Rather than continually contending over a single count value, CHM spreads contention across an array of counters. When a thread finds contention updating baseCount, it can move to update one of the counters in CounterCells array.

sumCount()

The sumCount() method's responsibility is to aggregate the count of all elements. It starts with the baseCount and then adds every counter in the CounterCells array to it.

The reason it uses both is that entries might have been added directly to the baseCount before contention was detected, and other entries might have been added to the CounterCells during contention.

Number of CounterCells vs Number of Bins

The number of CounterCells is not the same as the number of bins. They serve different purposes:

Bins in CHM are the actual data structures holding the key-value pairs. The number of bins tends to expand (resize) when the map gets more entries to keep the access time acceptable.

CounterCells are there just for counting purposes. Their number is more related to the number of concurrently updating threads than the number of entries in the map.

Documentation

The primary source of information on these internals is the source code itself. The authors, especially Doug Lea, provide extensive comments in the source code to explain the design decisions and mechanisms.

Books by Doug Lea or Brian Goetz, like "Java Concurrency in Practice", discuss high-level concurrent programming concepts and best practices but might not delve deep into the specifics of the ConcurrentHashMap internals.

If you really want to understand CHM and similar complex data structures, often the best approach is to combine reading the source code, referring to academic papers (if available), and running debug sessions or logging to see the behavior in real scenarios.