Saturday, October 20, 2007

Separating Advanced Java Programmers from Competent Ones: “Stump the Chump” Interview Questions Part 2

Q: What is Jar sealing and when would you need it?

A: This question relates to how class loaders are implemented. (It’s my preferred stump the chump question and I’ve actually run into this situation before.) The background necessary for this question is as follows:

Class loaders are hierarchical. Most J2SE applications have at least three class loaders and J2EE apps generally have more. (The class loaders in a J2EE app form a tree, which allows one Java VM to protect deployments from namespace collisions, etc.) Each class loader, except for the root loader, has a parent, and class loaders are usually supposed to ask the parent to supply a class before trying to load it themselves. Thus, when a class loader is asked for java.lang.String, it should get it from the parent rather than define its own copy. Only if the parent cannot supply the class should the loader try. The root-level (bootstrap) class loader loads the core classes of the VM, such as java.lang.String and the classes the security manager needs to perform its job. The second class loader, the extension loader, loads the installed extensions, and the rest of the class loaders in the hierarchy, starting with the system class loader that reads the classpath, are used for code that isn’t part of the VM.
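To make the hierarchy concrete, here is a minimal sketch that walks from a class's loader up through its parents. The class name LoaderChain and the method chainOf are made up for illustration, and the loader names printed depend on the VM version and on how the program is launched.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    /** Lists the loader of the given class, then each parent, ending at the bootstrap loader. */
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // The bootstrap loader is represented by null, so it gets a label here.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        // Application code: typically several loaders deep.
        System.out.println(chainOf(LoaderChain.class));
        // Core classes come straight from the bootstrap loader.
        System.out.println(chainOf(String.class));
    }
}
```

Running the same method from inside a servlet or an EJB should show the extra loaders that the application server puts between the deployment and the VM.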

A class loader instance loads a given class at most once. In other words, a class is read from a jar file no more than once by a class loader instance. This becomes an issue when developers want the ability to reload a class, for example when an application is hot-deployed to a server. (I believe JUnit also likes to reload any user-created classes between tests.) In that case, the class loader instance is discarded and a new one is created in its place. However, for this technique to work, the new class loader must have the opposite behavior from the one described above: it must first attempt to load the class itself, and only ask the parent if it cannot.
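The “opposite behavior” can be sketched as a child-first loader. This is only an illustration under my own assumptions, not the code of any particular application server: the names ChildFirstClassLoader and tryLoad are hypothetical, and a real server would be more careful about which packages it refuses to load locally.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first loader: searches its own URLs before asking the parent. */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // A loader instance defines each class at most once.
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                // Core classes always come from the parent chain.
                loaded = name.startsWith("java.")
                        ? super.loadClass(name, false)
                        : loadLocallyFirst(name);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    /** Tries this loader's own URLs, then falls back to normal parent-first delegation. */
    private Class<?> loadLocallyFirst(String name) throws ClassNotFoundException {
        try {
            return findClass(name);
        } catch (ClassNotFoundException notLocal) {
            return super.loadClass(name, false);
        }
    }

    /** Demo helper: returns null instead of throwing when the class cannot be found. */
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // With no URLs of its own, every request ends up falling back to the parent.
        ClassLoader parent = ChildFirstClassLoader.class.getClassLoader();
        ChildFirstClassLoader loader = new ChildFirstClassLoader(new URL[0], parent);
        System.out.println(tryLoad(loader, "java.lang.String") == String.class);
        System.out.println(tryLoad(loader, "no.such.Type"));
    }
}
```

To “reload” an application, the server would throw away the old instance and build a new one over the new jars. Anything the old instance defined stays alive for as long as something still references it, which is where the trouble starts.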

So far, so good. The real trouble starts when the same class is loaded by two separate class loaders in a hierarchy. At this point version differences can cause all kinds of unpredictable behavior, and often the problem doesn’t even manifest itself near the actual offending code. Symptoms generally include things like null pointer exceptions in lines of code that make no sense. In many ways, it reminds me of incrementing a pointer too far in C++, except that it generally doesn’t crash the VM.
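The underlying reason is that the VM identifies a class by its name together with the loader that defined it. The sketch below defines one trivial class in two independent loaders and shows that the results are different types, even though the bytes are identical. LoaderIdentityDemo, Payload and IsolatingLoader are names I made up for the demonstration, and it assumes the compiled class file can be read as a resource from the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoaderIdentityDemo {

    /** Trivial class that two independent loaders will each define. */
    public static class Payload { }

    /** Hypothetical loader that defines Payload itself and delegates everything else. */
    static class IsolatingLoader extends ClassLoader {
        private final byte[] payloadBytes;

        IsolatingLoader(byte[] payloadBytes) {
            super(LoaderIdentityDemo.class.getClassLoader());
            this.payloadBytes = payloadBytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    loaded = defineClass(name, payloadBytes, 0, payloadBytes.length);
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }
    }

    /** Reads Payload's compiled bytes from wherever this demo itself was loaded. */
    static byte[] payloadBytes() throws IOException {
        String resource = Payload.class.getName().replace('.', '/') + ".class";
        try (InputStream in = LoaderIdentityDemo.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("cannot find " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** True if the two loaders produce distinct classes whose instances are not our Payload. */
    static boolean foreignInstanceIsIncompatible() {
        try {
            byte[] bytes = payloadBytes();
            String name = Payload.class.getName();
            Class<?> first = new IsolatingLoader(bytes).loadClass(name);
            Class<?> second = new IsolatingLoader(bytes).loadClass(name);
            Object foreign = first.getDeclaredConstructor().newInstance();
            boolean distinct = first != second && first != Payload.class;
            return distinct && !(foreign instanceof Payload);
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same name, incompatible types: " + foreignInstanceIsIncompatible());
    }
}
```

Here the bytes are identical, so the only symptom is a failed cast. When the two loaders hold different versions of the class, the mismatches get much stranger.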

In general, there are two ways that I’ve seen the problems manifest themselves:

  1. An application server’s class loader loads one version of a jar file (like log4j) and the developer deploys a different version with an application. If a method signature changes from one version to the next, the developer can get errors about not having the correct number of parameters in a method call even though they have done nothing wrong. The VM may also generate errors about calling methods that don’t exist, typically a NoSuchMethodError, even when they do exist in the version the developer compiled against. Worse, the method call may work fine until the class called by the developer calls another class that doesn’t exist in that version of the package.

  2. A developer is writing code and testing it periodically along the way. At this point, it is almost certain that method signatures are changing and that methods are being added, renamed, or removed. When deploying this code to an application server, something goes wrong with the deployment and more than one version of the code lives on the server without the developer realizing it. For me this happened when I was using WebLogic 7 and switched between hot deployment (where I dropped a new version of the EAR file into a directory) and deploying via the web interface. The class loaders were apparently not peers in the class loader tree and both versions of my application were “partially” deployed.

The worst part of this problem is that it is not obvious and not intuitive what’s going on. The problem may not manifest any symptoms right away and when symptoms appear, they almost never seem related to the class loader in any way. Generally, I’ve realized the mistake after spending three days debugging code that hasn’t changed and always used to work. Somewhere around the point when I start to question my sanity and my abilities as a developer, I realize that this is the dreaded “hot deployment” issue. The problem is easily fixed by reinstalling the server instance. (Sometimes undeploying and redeploying an application isn’t sufficient and it’s not obvious where the server keeps all of its cached files.)
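One thing that can shorten those three days is asking the VM where a suspicious class actually came from. The helper below is a minimal sketch (the names ClassOrigin and describe are mine) that reports the code source and the defining loader of a class. It assumes no security manager is blocking access to the protection domain.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Describes which location and which loader supplied the given class. */
    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        CodeSource source = type.getProtectionDomain().getCodeSource();

        // Classes from the bootstrap loader have a null loader and no code source.
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        String location = (source == null || source.getLocation() == null)
                ? "(no code source)"
                : source.getLocation().toString();

        return type.getName() + " <- " + location + " via " + loaderName;
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ClassOrigin.class));
    }
}
```

Logging this for a class such as the log4j Logger from inside the deployed application shows whether it came from the server’s copy of the jar or from the one packaged with the application.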

Enter the ounce of prevention. Sealing a jar involves placing a line (Sealed: true) inside the manifest file that marks the jar, or some subset of it such as a particular package, as sealed. (The Ant <jar> task can also write this attribute through its nested <manifest> element.) That line instructs the class loader hierarchy to only retrieve classes in that package from exactly one jar file. A java.lang.SecurityException with the message “sealing violation” is thrown if the class loader attempts to load classes of that package from more than one jar. This prevents all of the headaches listed above and, generally, the class loader “does the right thing” on the edge cases (for example, if one version of a jar is sealed but another is not).
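For reference, here is a small sketch that builds such a manifest with the standard java.util.jar API and renders it as text, so the resulting lines can be inspected. The class name SealedManifestSketch and the package path com/example/util/ are placeholders of mine. Per-package entries use the package path with a trailing slash as the entry name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class SealedManifestSketch {

    /**
     * Builds a manifest that seals the whole jar when no package paths are given,
     * or only the listed packages (for example "com/example/util/") otherwise.
     */
    static Manifest sealedManifest(String... packagePaths) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without a version attribute the main section is not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");

        if (packagePaths.length == 0) {
            main.put(Attributes.Name.SEALED, "true");
        } else {
            for (String path : packagePaths) {
                Attributes perPackage = new Attributes();
                perPackage.put(Attributes.Name.SEALED, "true");
                manifest.getEntries().put(path, perPackage);
            }
        }
        return manifest;
    }

    /** Renders the manifest as the text that would end up in META-INF/MANIFEST.MF. */
    static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(sealedManifest()));
        System.out.print(render(sealedManifest("com/example/util/")));
    }
}
```

The same Manifest object can be handed to a java.util.jar.JarOutputStream when the jar is assembled in code rather than by a build tool.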

The only place that I’ve run into problems with jar sealing is when developers want to build test classes in the same package and directory as the code itself. Usually, they use Ant tasks to create a deliverable jar and a separate test jar that contains only the test classes, which splits one sealed package across two jars. Considering the pain that can be caused when class loaders go wrong, I would recommend either placing test code in its own package (sub-packages can live in a separate jar with no problems) or building a “test” jar that contains both the base code and the test classes.
