Saturday, March 30, 2013

Use ThreadLocal for shared objects that are not thread-safe

In my last post, Just date, no time: comparing java.util.Date with org.joda.time.DateMidnight, I noted that one reason for using Joda Time over the java.util date/time classes (like java.util.Date and java.util.GregorianCalendar) is thread safety.

To be honest, thread safety with java.util.Date isn't likely to be a problem, because most of the time a date will be an instance field in a mutable data object rather than a shared utility object that multiple threads use at the same time. What you should look out for are objects that are likely to be shared (like GregorianCalendar, DecimalFormat and SimpleDateFormat). Any time you think "I should really make this a static constant so that multiple methods can share it", check the Javadocs for that object to see if it is mutable or documented as not thread-safe. Mutable objects that are shared may cause race conditions if they are accessed by multiple threads at the same time. For example, the objects below are mutable and commonly used as shared utility objects in a single class or an entire application.

  • GregorianCalendar: it has setter/mutator methods that change its state.
  • DecimalFormat: Javadocs say if multiple threads access a format concurrently, it must be synchronized externally.
  • SimpleDateFormat: Javadocs say if multiple threads access a format concurrently, it must be synchronized externally.

The easiest way to use the above objects safely from multiple threads is java.lang.ThreadLocal. It gives every thread its own copy of the object, so no instance is ever shared between threads. Below is an example of this.

/** Each thread may have one copy of DATE_FORMAT. */
public static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
         @Override
         protected SimpleDateFormat initialValue() {
            return new SimpleDateFormat("yyyy/MM/dd HH:mm:s:S");
         }
      };

/** This method uses a thread local version of the formatter. */
public void outputDate() {
   Date date1 = new Date();
   System.out.println(DATE_FORMAT.get().format(date1));
}

Each thread that executes outputDate() will create its own copy of SimpleDateFormat. It solves synchronization issues because no two threads will ever access the same copy of the object at the same time. If your application has 20 threads, then you will have up to 20 copies of SimpleDateFormat: a thread that never executes outputDate() - or any method that uses DATE_FORMAT.get() - will never create its own SimpleDateFormat.
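To see the one-copy-per-thread behaviour for yourself, here is a minimal sketch. The class name ThreadLocalIdentityDemo and the helper formatterSeenByNewThread() are made up for this illustration; only the DATE_FORMAT declaration comes from the example above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical demo class; its name and helper method are illustrative only. */
final class ThreadLocalIdentityDemo {

    /** Same declaration as in the post: one formatter per thread, created lazily. */
    static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("yyyy/MM/dd HH:mm:s:S");
                }
            };

    /** Starts a new thread and returns the formatter instance that thread sees. */
    static SimpleDateFormat formatterSeenByNewThread() {
        final SimpleDateFormat[] seen = new SimpleDateFormat[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                seen[0] = DATE_FORMAT.get();
            }
        });
        worker.start();
        try {
            worker.join(); // after join(), the worker's write to seen[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        SimpleDateFormat mine = DATE_FORMAT.get();
        SimpleDateFormat again = DATE_FORMAT.get();
        SimpleDateFormat theirs = formatterSeenByNewThread();

        System.out.println("Same thread, same instance: " + (mine == again));    // true
        System.out.println("Other thread, same instance: " + (mine == theirs));  // false
        System.out.println(mine.format(new Date()));
    }
}
```

Calling get() twice on the same thread returns the same formatter, while a second thread gets a formatter of its own.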

Use this technique for GregorianCalendar, DecimalFormat and SimpleDateFormat. It is safe for most applications because the cost of one of these objects per thread is not high. It is safe for web applications too. In web apps, you are not meant to create your own threads - and you are not doing that here: the web container is the one creating and managing threads, not you.
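As a rough sketch of the same idea applied to DecimalFormat and GregorianCalendar: the class name PerThreadUtils, the "#,##0.00" pattern and both helper methods are assumptions made for this example. It also assumes Java 8 or later, where ThreadLocal.withInitial() can replace the anonymous subclass used earlier.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Hypothetical utility class; names and the number pattern are illustrative. */
final class PerThreadUtils {

    /** One DecimalFormat per thread; Locale.US pins the grouping and decimal separators. */
    static final ThreadLocal<DecimalFormat> AMOUNT_FORMAT =
            ThreadLocal.withInitial(() ->
                    new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(Locale.US)));

    /** One GregorianCalendar per thread. */
    static final ThreadLocal<GregorianCalendar> CALENDAR =
            ThreadLocal.withInitial(GregorianCalendar::new);

    /** Formats the amount with this thread's own DecimalFormat. */
    static String formatAmount(double amount) {
        return AMOUNT_FORMAT.get().format(amount);
    }

    /** Returns the year of the given instant, reusing this thread's calendar. */
    static int yearOf(long epochMillis) {
        GregorianCalendar calendar = CALENDAR.get();
        calendar.setTimeInMillis(epochMillis); // overwrite whatever the previous call left behind
        return calendar.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println(formatAmount(1234.5)); // prints "1,234.50"
        System.out.println(yearOf(System.currentTimeMillis()));
    }
}
```

Note that a thread-local calendar keeps its state between calls on the same thread, so each use should set the fields it depends on first, as yearOf() does here.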

When not to use this technique:

  • Don't use this technique for objects that must be singletons within an application.
  • Don't use this technique where multiple threads must work on the same instance, just not at the same time (this should be a singleton or have synchronization).
  • Don't use this technique for objects that are heavy enough to break an application if you have multiple copies in memory at the same time.

Other Ways To Solve Thread Safety Issues

There are many ways to solve issues that arise from sharing the same instance of an object. Below is a brief discussion of the most common techniques I know about.

  1. Avoid sharing altogether by keeping the object local. This means every time you need a Calendar, DecimalFormat or SimpleDateFormat etc, declare and initialise the object within each method that uses it.
    private void outputDate(Date date) {
       SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd HH:mm:s:S");
       System.out.println("Date is [" + formatter.format(date) + "]");
    }
    No other method or thread will ever share the same instance so you will never have a synchronization problem. But you will have other problems.
    • DRY: Don't Repeat Yourself. If you have more than one method that needs to input or output dates in the same format, they should use the same formatter. If each method has its own formatter (using the same format string), subtle errors may arise when you need to change the format but forget to make the change to each formatter.
    • Object creation overhead. Let's say your application has to call outputDate() a few million times. Your application will be slower because it is creating a SimpleDateFormat X million times when it should be creating it just once.
  2. Use a single shared object, but synchronize each use of it. For example, see below.
    synchronized (FORMATTER) {
       System.out.println("Date is [" + FORMATTER.format(date) + "]");
    }
    
    For a singleton that has state, you will most likely need to use synchronization. No other method or thread will ever get to use the same object at the same time. But you may have other problems.
    • Choke points. Synchronization can make applications slower in two ways: the overhead of maintaining locks and the cost of having multiple threads wait for the thread that currently holds the lock. The first isn't much of a problem, but the second may be. If the actions performed in the synchronized block take a long time, then all the other threads that need the synchronized resource must wait. Sometimes this is necessary, but it is still worth considering if there are other ways of solving the problem (can you have multiple copies of the synchronized resource?).
    • Deadlocks. This type of error can be very hard to debug. A deadlock arises where two or more threads are each waiting for the other to finish, and therefore neither thread can ever finish.
  3. Have multiple copies of the shared resource. If the problem allows you to have multiple copies of the same resource, this can help you avoid synchronization issues. There are a couple of ways to do this.
    • Object pooling. This technique involves having a group (or pool) of shared objects. Whoever needs one of the objects takes it from the pool and releases it when finished. Multi-threaded applications use thread pools. Databases use connection pools. Web servers use HTTP connection pools. Apache Commons has a Pool Component. This is a good way to control the number of shared objects you wish to have in use, but will often need tuning to make sure that your application has enough objects in the pool to satisfy demand but not so many objects that they take up too much of your finite resources at once.
    • Thread local storage (the subject of this post) is similar to object pooling in that you have multiple objects. However, instead of having a common pool of objects that a limited number of threads can share at once, under thread local storage, every thread can potentially have its own copy of the object. A pool may have 4 objects to be shared among 10 threads; with thread local storage, you will have up to 10 objects - one for each thread.
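To make the object pooling option more concrete, here is a minimal sketch of a fixed-size pool built on java.util.concurrent.ArrayBlockingQueue. FormatterPool and its methods are hypothetical names invented for this example. It is not how Apache Commons Pool works internally, and a real application would probably use a pooling library instead.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical fixed-size pool of formatters, for illustration only. */
final class FormatterPool {

    private final BlockingQueue<SimpleDateFormat> idle;

    FormatterPool(int size, String pattern) {
        idle = new ArrayBlockingQueue<SimpleDateFormat>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new SimpleDateFormat(pattern));
        }
    }

    /** Borrows a formatter, formats the date, and always puts the formatter back. */
    String format(Date date) {
        SimpleDateFormat formatter;
        try {
            formatter = idle.take(); // blocks while every formatter is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a formatter", e);
        }
        try {
            return formatter.format(date);
        } finally {
            idle.add(formatter); // cannot overflow: we only return what we took
        }
    }

    /** Number of formatters currently available in the pool. */
    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        FormatterPool pool = new FormatterPool(4, "yyyy/MM/dd HH:mm:s:S");
        System.out.println("Date is [" + pool.format(new Date()) + "]");
    }
}
```

With a pool of 4 and 10 threads, at most 4 threads format at once and the rest wait in take(). That waiting is the tuning trade-off described above.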

ThreadLocal vs synchronization

One more issue that needs discussion here is when to use ThreadLocal vs synchronization. Either technique will work for GregorianCalendar, DecimalFormat and SimpleDateFormat, but ThreadLocal is the more appropriate of the two. Below is one question you should ask yourself when determining which technique is most appropriate for your situation.

Is it necessary for multiple threads to access the same instance of the shared object?

If you must have only one copy of an object, use synchronization. For example, consider a bank account in a banking application that has two transactions occurring at the same time. The account has $20 in it and two people are attempting to withdraw $15 at the same time: only one of these transactions can succeed. Logically speaking, there is only one instance of the $20 balance, so you can only have one copy of the account in your application at once. Use synchronization so that only one transaction occurs at a time; the second will fail because there are insufficient funds.
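The bank account scenario could be sketched roughly as follows. Account, WithdrawalDemo and their methods are hypothetical names for this example, amounts are whole dollars to keep it short, and the lambda assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical account with a single shared balance. */
final class Account {

    private int balance;

    Account(int openingBalance) {
        this.balance = openingBalance;
    }

    /** Checks and updates the balance as one atomic step. */
    synchronized boolean withdraw(int amount) {
        if (amount > balance) {
            return false; // insufficient funds
        }
        balance -= amount;
        return true;
    }

    synchronized int getBalance() {
        return balance;
    }
}

/** Hypothetical driver that fires several withdrawals at the same account. */
final class WithdrawalDemo {

    /** Runs one withdrawal per thread and returns how many of them succeeded. */
    static int successfulWithdrawals(final Account account, final int amount, int threadCount) {
        final AtomicInteger successes = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                if (account.withdraw(amount)) {
                    successes.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for withdrawals", e);
            }
        }
        return successes.get();
    }

    public static void main(String[] args) {
        Account account = new Account(20);
        int succeeded = successfulWithdrawals(account, 15, 2);
        // prints "1 withdrawal(s) succeeded, balance is $5"
        System.out.println(succeeded + " withdrawal(s) succeeded, balance is $" + account.getBalance());
    }
}
```

Because withdraw() is synchronized, the balance check and the update cannot interleave between the two threads. A ThreadLocal copy of the account would be wrong here: each thread would see its own $20.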