I wrote the following test class in Java to reproduce the performance penalty introduced by "false sharing".
Basically, you can tweak the size of the array from 4 to a much larger value (e.g. 10000) to turn the false-sharing phenomenon on or off. Specifically, when size = 4 the four threads update values that almost certainly sit within the same cache line, causing far more frequent cache-coherency misses. In theory, the program should run much faster with size = 10000 than with size = 4.
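To make the layout concrete, here is a small sketch that computes which cache line each thread's array slot falls in, assuming 64-byte lines and 4-byte ints and ignoring the array object header (both assumptions, since the real line size and header layout are platform- and JVM-dependent):

```java
public class LineLayout {
    public static void main(String[] args) {
        int cacheLine = 64; // assumed cache line size in bytes
        for (int size : new int[]{4, 10000}) {
            int interval = size / 4;
            System.out.print("size=" + size + " lines:");
            for (int t = 0; t < 4; t++) {
                int index = 5000 + interval * t;
                // which 64-byte line this element's byte offset falls in
                // (ignoring the array object header)
                System.out.print(" " + (index * 4 / cacheLine));
            }
            System.out.println();
        }
        // prints "size=4 lines: 312 312 312 312"      -> one shared line
        // and    "size=10000 lines: 312 468 625 781"  -> four distinct lines
    }
}
```

With size = 4 all four slots land on the same line, so every write by one core invalidates the line in the other cores' caches; with size = 10000 the slots are 10000 bytes apart and never interfere.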
I ran the same test on two different machines multiple times:
Machine A: Lenovo X230 laptop with an Intel® Core™ i5-3210M processor (2 cores, 4 threads), Windows 7 64-bit
size = 4 => 5.5 seconds
size = 10000 => 5.4 seconds
Machine B: Dell OptiPlex 780 with an Intel® Core™2 Duo E8400 processor (2 cores), Windows XP 32-bit
size = 4 => 14.5 seconds
size = 10000 => 7.2 seconds
I later ran the tests on a few other machines, and quite clearly false sharing only becomes noticeable on certain machines; I couldn't figure out the decisive factor that makes the difference.
Can anyone kindly take a look at this problem and explain why the false sharing introduced in this test class is only noticeable on certain machines?
public class FalseSharing {

    interface Oper {
        int eval(int value);
    }

    // Try tweaking the size: 4 packs the four hot slots into one cache line,
    // 10000 spreads them thousands of bytes apart.
    static int size = 4;

    // Try tweaking the operation applied on each iteration.
    static Oper op = new Oper() {
        @Override
        public int eval(int value) {
            return value + 2;
        }
    };

    static int[] array = new int[10000 + size];
    static final int interval = size / 4;

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Each thread hammers its own slot; the slots are `interval` ints apart.
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Array index:" + 5000);
                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000] = op.eval(array[5000]);
                    }
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Array index:" + (5000 + interval));
                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval] = op.eval(array[5000 + interval]);
                    }
                }
            }
        });
        Thread t3 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Array index:" + (5000 + interval * 2));
                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 2] = op.eval(array[5000 + interval * 2]);
                    }
                }
            }
        });
        Thread t4 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Array index:" + (5000 + interval * 3));
                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 3] = op.eval(array[5000 + interval * 3]);
                    }
                }
            }
        });
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t1.join();
        t2.join();
        t3.join();
        t4.join();
        System.out.println("Finished! " + (System.currentTimeMillis() - start));
    }
}
Your code is probably fine. Here is a simpler version, with results:
Results:
So, to answer: it confirms the 64-byte cache line theory, at least on my Core i5 laptop.
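For readers who want to reproduce this, here is a minimal sketch of what such a simpler benchmark can look like (my own illustration, not the answer's original code; the thread count, iteration count, and the choice of `offset = 16` longs, i.e. 128 bytes, as the "far apart" case are assumptions based on 64-byte lines):

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class FalseSharingDemo {
    static final int ITERS = 20_000_000;

    // Two threads each increment their own counter, `offset` longs apart.
    // AtomicLongArray is used so the JIT cannot hoist the writes out of the loop.
    // Returns the total number of increments (deterministic); prints the elapsed time.
    static long run(int offset) throws InterruptedException {
        final AtomicLongArray slots = new AtomicLongArray(64);
        Runnable w0 = () -> { for (int i = 0; i < ITERS; i++) slots.incrementAndGet(0); };
        Runnable w1 = () -> { for (int i = 0; i < ITERS; i++) slots.incrementAndGet(offset); };
        long start = System.nanoTime();
        Thread a = new Thread(w0), b = new Thread(w1);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("offset=" + offset + " took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
        return slots.get(0) + slots.get(offset);
    }

    public static void main(String[] args) throws InterruptedException {
        run(1);  // adjacent longs: almost certainly the same 64-byte cache line
        run(16); // 128 bytes apart: guaranteed to be on different 64-byte lines
    }
}
```

On machines where false sharing bites, the `offset=1` run should be noticeably slower than the `offset=16` run; the absolute numbers will of course vary by CPU.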