Sunday, May 31, 2009

Advanced Java "Stump the Chump" Interview Questions Part 3

A while back I wrote an entry listing some Java-based “stump the chump” questions. These are questions that I have encountered or used in an interview to separate the competent developers from the extremely competent ones. The idea is that the questions represent fairly obscure areas of development where the details can cost days or months of productivity but aren’t really deal breakers in terms of employability.

Recently I encountered a few more such questions, this time in the realm of concurrency and multi-threaded development, so I thought I’d document them for future reference.

Question 1: Why will the following code not always terminate?
public class foo {
    private boolean shouldStopFlag = false;

    public void methodCalledByThreadA() {
        while (!shouldStopFlag) {
            // Do some work here.
        }
    }

    public void methodCalledByThreadB() {
        shouldStopFlag = true;
    }
}
The answer to this problem is a fairly subtle one that deals with data visibility. As written, the JRE is allowed to keep shouldStopFlag anywhere it likes, including registers and caches that aren’t necessarily visible to all threads, so thread A may never see the value written by thread B. I first read about this issue several years ago in the excellent book Java Concurrency in Practice by Brian Goetz, Joshua Bloch, et al. I even gave a speech at the Rocky Mountain Oracle Users’ Group (RMOUG) Training Days where I warned others about the issue. However, I still had to make the mistake myself and lose a couple of hours wondering why before I truly appreciated that this isn’t an isolated, “one in a million” kind of problem. (I was running a dual-core machine and saw the problem about one time in a hundred calls. Fortunately, I was using test-driven development and was able to reproduce the symptoms every couple of runs.)

There are several solutions that can address this problem. The first is to use a synchronized block to guard the flag. (That may be overkill in this particular code sample but I believe that, in general, synchronized blocks have an unfair stigma due to performance problems that were addressed a long time ago.)
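As a minimal sketch of the synchronized approach, here is one way the class from Question 1 might be rewritten. The class name SynchronizedFoo and the helper method shouldStop() are my own hypothetical names, not part of the original code. The point is that both the read and the write of the flag go through the same lock.

```java
// Hypothetical variant of the Question 1 class: every access to the
// flag happens while holding the lock on "this".
class SynchronizedFoo {
    private boolean shouldStopFlag = false;

    // Hypothetical helper: reads the flag under the lock.
    private synchronized boolean shouldStop() {
        return shouldStopFlag;
    }

    public void methodCalledByThreadA() {
        while (!shouldStop()) {
            // Do some work here.
        }
    }

    // Writes the flag under the same lock that the reader uses, so a
    // later read by thread A is guaranteed to see the new value.
    public synchronized void methodCalledByThreadB() {
        shouldStopFlag = true;
    }
}
```

Note that synchronizing only the writer would not be enough. The visibility guarantee applies only when the reader and the writer use the same lock.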

The second option is to use the volatile keyword. A write to a volatile field is guaranteed to be visible to any thread that subsequently reads that field, so the JRE can’t keep the value somewhere that isn’t visible across threads. (The meaning of this keyword has changed slightly starting in Java 1.5 so be careful of older documents covering the topic.)
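For the code in Question 1, the volatile fix is a one-word change. The sketch below uses a hypothetical class name, VolatileFoo, to keep it separate from the original. Everything else is the same as the original class.

```java
// Hypothetical variant of the Question 1 class: the only change is the
// volatile modifier on the flag.
class VolatileFoo {
    private volatile boolean shouldStopFlag = false;

    public void methodCalledByThreadA() {
        // Each iteration performs a volatile read, so the loop will
        // observe the write made by thread B.
        while (!shouldStopFlag) {
            // Do some work here.
        }
    }

    public void methodCalledByThreadB() {
        shouldStopFlag = true; // volatile write
    }
}
```

This works here because the flag is only ever set, never updated based on its old value. A compound operation such as "read, then modify, then write" would still need a lock or an atomic class.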

The third option is to make the variable into a java.util.concurrent.atomic.AtomicBoolean. This class provides a lock-free, thread-safe version of a boolean variable. AtomicBoolean variables also have other standard, atomic methods such as compareAndSet and getAndSet. Finally, according to this source, the atomic package also takes advantage of underlying hardware to implement the atomic behavior.
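Here is a rough sketch of the AtomicBoolean version. The class name AtomicFoo is hypothetical, and having methodCalledByThreadB return a boolean is my own addition to show what compareAndSet offers beyond plain visibility.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical variant of the Question 1 class using AtomicBoolean.
class AtomicFoo {
    private final AtomicBoolean shouldStopFlag = new AtomicBoolean(false);

    public void methodCalledByThreadA() {
        // get() has the same visibility guarantee as a volatile read.
        while (!shouldStopFlag.get()) {
            // Do some work here.
        }
    }

    // Atomically flips the flag from false to true. Returns true only
    // for the single caller that actually made the change.
    public boolean methodCalledByThreadB() {
        return shouldStopFlag.compareAndSet(false, true);
    }
}
```

If several threads might request the stop at the same time, the return value tells each caller whether it was the one that actually triggered the stop, which is handy for running cleanup exactly once.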

In general, I’ve noticed that certain concurrency issues and tools such as race conditions, semaphores, and deadlocks are well understood by developers but visibility issues unique to Java are not as widely known or understood. (How many people know what the Java Memory Model is and why it changed as of Java 1.5?)

Question 2: Why are happens-before relationships important in the Java Memory Model?

First a note: Notice that I didn’t ask, “What is a happens-before relationship as it pertains to the Java Memory Model?” The reason for this is that the best descriptions that I’ve read of happens-before relationships take several hundred carefully-chosen words and are full of subtleties that I think even the experts would have trouble getting right without a cheat-sheet. The clearest explanation that I have seen on-line so far is here.

In any case, the reason that happens-before relationships matter in multi-threaded applications is that under certain circumstances, the compiler and the JRE are allowed to execute instructions in a different order from the one in which they were written. The reordering is invisible most of the time. (In single-threaded applications, it is invisible all of the time.) However, in multi-threaded applications, the reordering may be visible if the affected sections of code aren’t properly synchronized.
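To make this a bit more concrete, here is a small sketch of my own (the class Publisher, its fields, and the -1 sentinel are all hypothetical) showing how a happens-before relationship rules out a harmful reordering. Under the Java 1.5 memory model, a write to a volatile field happens-before every subsequent read of that field, and that guarantee extends to ordinary writes made earlier by the writing thread.

```java
// Hypothetical example: one thread publishes a value, another reads it.
class Publisher {
    private int payload = 0;                // ordinary, non-volatile field
    private volatile boolean ready = false; // volatile guard

    public void publish(int value) {
        payload = value; // (1) ordinary write
        ready = true;    // (2) volatile write; (1) may not be moved after it
    }

    // Returns the published payload, or -1 if nothing has been published
    // yet. (Assumes, for this sketch only, that -1 is never published.)
    public int tryRead() {
        if (ready) {        // volatile read
            return payload; // guaranteed to see the write from step (1)
        }
        return -1;
    }
}
```

If ready were not volatile, the two writes in publish() could legally become visible to the reader in the opposite order, and a reader could see ready == true while payload is still 0.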

I feel lucky not to have personally run afoul of this issue (that I know of). Some of the just-in-time compiler’s optimizing capabilities are really cool but I’d hate to have to try to identify this issue in a live system.