Threads are back in style these days. If you were like me, a Java engineer getting into Ruby a couple of years back, you probably saw your fair share of JVM bashing by a few people who relished beating on all things Java, while peddling their half-assed gems as software masterpieces. This is not to say that Java, and especially its community, doesn’t love to over-engineer the crap out of everything it can lay its hands on, but (and this is a pretty big BUT) the JVM is a seminal piece of software. Okay, if you are still hanging around after this rant, I thank you for affording me this indulgence.
Having gotten that out of the way, for the next couple of posts I’ll work on exploring some of the concurrency packages Java ships with, along with discussing the Java concurrency model, all the while using JRuby to drive my code.
Why JRuby?
As much as I hate the irrational bashing of another language, I’ve really grown to love working with Ruby. It’s just great for “putting your ideas down on paper” with minimal ceremony. While for many this ceremony may not feel like a big deal, it actually is. This talk on cognitive psychology goes into further detail as to how our brain processes information, and how boilerplate code gets in the way of other engineers understanding your code. Using JRuby also has an unintended benefit of really forcing me to concentrate just on the concurrency and parallelism concepts, without wasting a lot of time on all those Java'ish things such as final variables or private static methods, amongst others. It’s also going to be a lot more fun focussing only on the awesome things in the Java concurrency packages, while skipping all of the code bloat that is part and parcel of working with Java.
Concurrency and Parallelism
The terms concurrency and parallelism get thrown around a lot and I for one didn’t really understand the difference until I bothered looking it up recently. Here’s a diagram that should help clear it up:
As you can see, concurrency by itself doesn’t mean squat as far as running things simultaneously goes. In the concurrent graph, Thread A runs from time 0-10, then the CPU scheduler schedules Thread B to run from time 10-20, after which it switches back to Thread A, and so on and so forth. So the machine is never actually executing instructions from both threads at the same time. In the parallel graph, the machine does execute instructions on both threads at the same time between time 10 and time 30. All this is assuming you have a multi-core CPU of course (if you don’t, you probably don’t really want to be reading this or any other blog post on parallelizing work).
Thread Safety
Now that we have understood that parallelism is really what we are after, we can look at the various problems that arise when your code runs in a multi-threaded, parallel environment. These problems are all grouped under the broad umbrella called thread safety. So before we look at each of these problems, let’s first try and get a sense of what it means to be thread safe.
The simplest way to define thread safety is this: the invariant of your code is preserved even while running in a multi-threaded environment. What this means is that whatever behavior your code said it would exhibit does not change whether it’s running in a single-threaded or multi-threaded environment. An example will help make this clear:
    class Incrementer
      def increment
        @val ||= 0
        @val += 1
      end
    end

    incrementer = Incrementer.new
    incrementer.increment
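Called from a single thread, the class keeps its promise. The snippet below is just a quick sanity check in plain Ruby; the helper name `first_values` is made up for illustration and is not part of the original code.

```ruby
# Same class as above, repeated so the snippet runs on its own.
class Incrementer
  def increment
    @val ||= 0
    @val += 1
  end
end

# Hypothetical helper: collect the first `count` values from a fresh Incrementer.
def first_values(count)
  incrementer = Incrementer.new
  Array.new(count) { incrementer.increment }
end

p first_values(5)  # prints [1, 2, 3, 4, 5]
```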
Now as you can see, the behavior this class promises to uphold is an arithmetic sequence where every value the increment function returns is 1 greater than the last value it returned. This is the invariant of this code. With a little unlucky timing in a multi-threaded environment, this invariant can be broken: the function can skip numbers or return the same number it returned on its previous call. The following line:
@val += 1
is not actually an atomic operation, i.e. one that other threads can only ever see as not yet started or fully finished. Rather, it can be thought of as two discrete operations:
    temp = @val + 1   # (1)
    @val = temp       # (2)
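Replaying one possible interleaving of these two steps by hand, with no real threads involved, shows how an increment can get lost. This is only an illustrative sketch: the helper `replay_lost_update` and the variables `temp_a` and `temp_b` are invented names standing in for what two threads would each hold privately.

```ruby
# Hypothetical helper that replays one unlucky interleaving by hand.
# Each "thread" keeps its own temp, but both share val.
def replay_lost_update
  val = 0
  temp_a = val + 1  # Thread A, step (1): reads 0, computes 1
  temp_b = val + 1  # Thread B, step (1): also reads 0, computes 1
  val = temp_a      # Thread A, step (2): stores 1
  val = temp_b      # Thread B, step (2): stores 1 again
  val
end

puts "after two increments: #{replay_lost_update}"  # prints "after two increments: 1"
```

Two increments ran, yet the shared value only moved from 0 to 1.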
With a little unlucky timing, Thread A could complete line (1), and then Thread B could run and execute line (1) as well before Thread A gets to line (2), in which case Thread B reads a stale value of @val and one of the two increments is lost. There is also the initialization of the variable, which is not thread safe, but I’ll leave that for a later discussion. So how do we make this code thread safe?
Synchronization and locks
One of the easiest ways to make this code thread safe is to require that a thread hold a lock before it is allowed to run the sensitive section of code. If it cannot get the lock, it does nothing and waits until the lock is free. Java has a keyword called synchronized that lets a thread acquire a lock on an Object, and the JRuby code below calls it as a method on self. With this in place, the thread safe version of the incrementer looks like:
    require 'java'

    class Incrementer < java.lang.Object
      def increment
        self.synchronized do
          @val ||= 0
          @val += 1
        end
      end
    end

    incrementer = Incrementer.new
    incrementer.increment
The call to self.synchronized tries to acquire a lock on self, which in this case would be the instance incrementer of the Incrementer class. A couple of things to note here:
- In order to be able to synchronize on an Object, that Object should have *java.lang.Object* somewhere in its ancestry tree
- Ensure that all your threads synchronize on the same Object, or they'll be getting/releasing different locks
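The second point, that every thread has to go through the same lock, can be tried out without a JVM. What follows is a rough plain-Ruby analogue and not the JRuby code from this post: it assumes Ruby's built-in Mutex as a stand-in for the lock that synchronized takes on self, and the names `LockedIncrementer` and `hammer` are invented for the example.

```ruby
# Plain-Ruby analogue (an assumption, not the post's JRuby code):
# Mutex plays the role of the lock that synchronized takes on self.
class LockedIncrementer
  def initialize
    @lock = Mutex.new  # one lock per instance, shared by every caller
    @val = 0
  end

  def increment
    @lock.synchronize { @val += 1 }
  end

  def value
    @lock.synchronize { @val }
  end
end

# Hypothetical driver: thread_count threads each increment per_thread times.
def hammer(thread_count, per_thread)
  incrementer = LockedIncrementer.new
  threads = Array.new(thread_count) do
    Thread.new { per_thread.times { incrementer.increment } }
  end
  threads.each(&:join)
  incrementer.value
end

puts hammer(8, 1_000)  # prints 8000
```

Because all eight threads share one instance, and therefore one lock, no increment is lost. Had each thread created its own lock, they would be acquiring and releasing different locks and the guarantee would disappear.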
The synchronized incrementer above is actually a case of over-synchronization, and over-synchronization is bad because it is slow. To give you a taste of some of the cool Java packages we can use, here’s an altered version:
    def increment
      self.synchronized do
        @val ||= java.util.concurrent.atomic.AtomicInteger.new
      end
      @val.incrementAndGet
    end
Here we still need to guard the section of our code where we initialize our instance variable. But as soon as we have verified we have a valid reference, we release the lock and call the incrementAndGet method made available to us by Java’s AtomicInteger objects.
Conclusion
Thread safety is when the behavior of your code does not change whether it’s run in a single-threaded or multi-threaded environment. Java’s synchronized keyword is the simplest route to thread safety: it uses a concept called locking, where only the thread that has acquired a lock is permitted to run the guarded code. JRuby gets a lot of the Java concurrency packages for free, and this is great news for engineers who want to build parallelizable code but are not yet ready to give up on Ruby.
Up next
Sharing data, thread visibility, volatile variables, thread safe initialization and immutable objects and how they rock!