UI blocking loop behaves differently on Oreo vs Marshmallow


I have a small Android application which posts some user data to a server. Following is the code:

private boolean completed = false;
private String response;

public String postData(final Data data) {

    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                response = callApi(data);
            } catch (Exception e) {
                Log.e("API Error", e.getMessage());
            } finally {
                completed = true;
            }
        }
    }).start();

    while (!completed) {
  //    Log.i("Inside loop", "yes");
    }

    return response;
}

The method above starts a thread that calls the API to post the data, then returns the response received, which works fine. The while loop at the bottom is a UI-blocking loop: it blocks the UI thread until a response or an error arrives.
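As an aside, the same "block until the background thread finishes" behavior can be expressed with `java.util.concurrent.CountDownLatch` instead of a spin loop; `await()` also gives the visibility guarantee the flag lacks. A minimal sketch, with `callApi` replaced by a stand-in:

```java
import java.util.concurrent.CountDownLatch;

public class LatchWait {
    public static String postData(String data) throws InterruptedException {
        final String[] response = new String[1];   // holder for the worker's result
        CountDownLatch done = new CountDownLatch(1);
        new Thread(() -> {
            try {
                response[0] = "echo:" + data;      // stand-in for callApi(data)
            } finally {
                done.countDown();                  // publishes the write to the waiter
            }
        }).start();
        done.await();                              // blocks with a happens-before edge
        return response[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(postData("user"));      // prints echo:user
    }
}
```

(Blocking the UI thread at all is still a bad idea on Android; this only shows a correct way to wait.)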

The problem:

I ran the same code on a Marshmallow device and an Oreo device, and the results were different.

For Marshmallow: things worked as expected. :)

For Oreo (8.1.0):

The very first API call after opening the app works fine. Subsequent calls, however, block the UI forever, even though an error or a response is received from the server (verified by logging and debugging).

However, when breakpoints are set (running in debug mode), the app proceeds with much less trouble.

It seems the system is unable to exit the UI-blocking loop even though the condition is met.

The second behavior I noticed: when I log a message inside the blocking loop, the system is able to exit the loop and return from the method, though the API response itself is not logged.

Could someone help me understand this inconsistency between these two versions of Android, and what change introduced in Oreo could cause such behavior when Marshmallow shows none? Any insight would be extremely helpful.

1 Answer

greeble31 (accepted answer):

It's more likely to be differences in the processor cache implementations on the two different hardware devices you're using. Probably not the JVM at all.

Memory consistency is a pretty complicated topic; I recommend checking out a tutorial like this for a more in-depth treatment. Also see this Java memory model explainer for details on the guarantees that the JVM will provide, irrespective of your hardware.

I'll explain a hypothetical scenario in which the behavior you've observed could happen, without knowing the specific details of your chipset:

HYPOTHETICAL SCENARIO

Two threads: Your "UI thread" (let's say it's running on core 1), and the "background thread" (core 2). Your variable, completed, is assigned a single, fixed memory location at compile time (assume that we have dereferenced this, etc., and we've established what that location is). completed is represented by a single byte, initial value of "0".

The UI thread, on core 1, quickly reaches the busy-wait loop. The first time it tries to read completed, there is a "cache miss". Thus the request goes through the cache, and reads completed (along with the other 31 bytes in the cache line) out of main memory. Now that the cache line is in core 1's L1 cache, it reads the value, and it finds that it is "0". (Cores are not connected directly to main memory; they can only access it via their cache.) So the busy-wait continues; core 1 requests the same memory location, completed, again and again, but instead of a cache miss, L1 is now able to satisfy each request, and need no longer communicate with main memory.

Meanwhile, on core 2, the background thread is working to complete the API call. Eventually it finishes, and attempts to write a "1" to that same memory location, completed. Again, there is a cache miss, and the same sort of thing happens. Core 2 writes a "1" into the appropriate location in its own L1 cache. But that cache line doesn't necessarily get written back to main memory yet. Even if it did, core 1 isn't referencing main memory anyway, so it wouldn't see the change. Core 2 then completes the thread, returns, and goes off to do work someplace else.

(By the time core 2 is assigned to a different process, its cache has probably been synchronized to main memory, and flushed. So, the "1" does make it back to main memory. Not that that makes any difference to core 1, which continues to run exclusively from its L1 cache.)

And things continue in this way, until something happens to suggest to core 1's cache that it is dirty, and it needs to refresh. As I mentioned in the comments, this could be a fence occurring as part of a System.out.println() call, debugger entry, etc. Naturally, if you had used a synchronized block, the compiler would've placed a fence in your own code.
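To make the synchronized-block point concrete, here is a minimal sketch (names are illustrative, not from the question): entering and exiting the monitor establishes the happens-before edge the busy-wait loop lacks, and `wait`/`notifyAll` avoid spinning entirely.

```java
public class SyncFlag {
    private final Object lock = new Object();
    private boolean completed = false;

    public void markCompleted() {
        synchronized (lock) {       // monitor entry/exit act as memory fences
            completed = true;
            lock.notifyAll();       // wake any waiting thread
        }
    }

    public void awaitCompletion() throws InterruptedException {
        synchronized (lock) {
            while (!completed) {    // loop guards against spurious wakeups
                lock.wait();        // releases the lock and blocks, no spinning
            }
        }
    }

    public boolean isCompleted() {
        synchronized (lock) { return completed; }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncFlag flag = new SyncFlag();
        new Thread(flag::markCompleted).start();
        flag.awaitCompletion();
        System.out.println("done"); // prints done
    }
}
```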

TAKEAWAYS

...and that's why you always protect accesses to shared variables with a synchronized block! (So you don't have to spend days reading processor manuals, trying to understand the details of the memory model on the particular hardware you are using, just to share a byte of information between two threads.) The volatile keyword will also solve the problem, but see some of the links in the Jenkov article for scenarios in which it is insufficient.
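For reference, the smallest change that would fix the question's code is declaring the flag volatile, which forces a happens-before relationship between the background thread's write and the UI thread's read. A sketch with illustrative names (`Thread.onSpinWait()` is a Java 9+ hint and optional):

```java
public class VolatileFlag {
    private volatile boolean completed = false; // volatile restores visibility

    public void finish()    { completed = true; }
    public boolean isDone() { return completed; }

    public static void main(String[] args) {
        VolatileFlag flag = new VolatileFlag();
        new Thread(flag::finish).start();
        while (!flag.isDone()) {
            Thread.onSpinWait();    // hint to the runtime that this is a spin loop
        }
        System.out.println("loop exited"); // prints loop exited
    }
}
```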