Pat Shaughnessy is a Ruby developer working at a global management consulting firm during the day, but who also enjoys writing in his spare time. Pat lives in Boston, but is fluent in Spanish and spends a lot of time in Asturias, Spain with his wife and two kids. Pat also writes in-depth articles at patshaughnessy.net, some of which have been featured on the Ruby Weekly newsletter, the Ruby5 podcast and the Ruby Show. His twitter is @pat_shaughnessy Pat is a DZone MVB and is not an employee of DZone and has posted 11 posts at DZone. You can read more from them at their website. View Full User Profile

Journey to the Center of JRuby

02.07.2012
| 5512 views |
  • submit to reddit

This week I decided to take a look at JRuby, which is a Java implementation of the Ruby interpreter. I’ve heard great things about JRuby, but had never taken the time before to look at how it works internally. What I found was an amazingly complex system involving many different technologies. Following the path of execution through the JRuby stack was like peeling away different layers of an onion – there was always another layer inside.

Read on to join me on a journey through all of these layers; we’ll see how a simple Ruby String method call is handled by Java, how it’s translated into Java byte code, and, ultimately, how that is translated into native machine language by the JVM’s Just In Time (“JIT”) compiler. While you may not be convinced JRuby is the right platform for you – certainly just the hearing word “Java” is enough to send chills down my spine – I hope you’ll come away with a new appreciation of the elegance and sophistication of what the JRuby team has built.

Ruby

One of the nicest things about JRuby is its apparent simplicity: you use it more or less as a drop in replacement for MRI Ruby. That is, once you install JRuby, running a JRuby program works the same way as running a Ruby program. For example, here’s the trivial, one line Ruby program that I’ll look at today:

 
puts "Journey to the Center of JRuby".upcase

Since I installed JRuby using RVM on my Mac, switching to JRuby and running a Ruby program is as simple as:

$ rvm jruby-head
$ ruby upcase.rb 
JOURNEY TO THE CENTER OF JRUBY

I could also have used the “jruby” executable in the same way:

$ jruby upcase.rb
JOURNEY TO THE CENTER OF JRUBY

But don’t be deceived! Things are not as simple as they seem.

Java

Let’s get started by reviewing the basics of how JRuby works: JRuby is a Java program that interprets Ruby code. Here’s a simple diagram comparing the standard MRI setup with JRuby:

On the left, you can see MRI executing a Ruby program, while on the right JRuby is doing the same thing. The difference is that JRuby is actually a Java program, which means it requires a Java Virtual Machine in order to run. MRI, on the other hand, is written in C and is compiled into native machine language and can run directly as a standalone program on your computer.

Now let’s dive in a take a look at the Java code that implements the Ruby upcase method. Since I installed JRuby using RVM, the JRuby source code is located at this location on my machine:

$ cd ~/.rvm/src/jruby-head/src


And since JRuby uses the “org.jruby” Java package you can find most of the Java source code for JRuby under the “org/jruby” subfolder. Think of Java packages the same way you do Ruby namespaces.

$ cd org/jruby

Now let’s take a look at the JRuby implementation of the Ruby String class, located in the “RubyString.java” file in this folder:

package org.jruby;

... etc ...

/**
 * Implementation of Ruby String class
 * 
 * Concurrency: no synchronization is required among readers, but
 * all users must synchronize externally with writers.
 *
 */
@JRubyClass(name="String", include={"Enumerable", "Comparable"})
public class RubyString extends RubyObject implements EncodingCapable {

... etc ...

    private ByteList value;

... etc...

You can see here the RubyString.java file contains a Java class called RubyString which corresponds to the Ruby String class. In fact it looks like the JRuby team used the “@JRubyClass” annotation to indicate this. The one important detail I’ve shown here is that the RubyString class stores the actual string value in an instance variable called value, which is a ByteList Java object.

Farther down in the file we can see the definition of the upcase and upcase! methods:

@JRubyMethod(name = "upcase", compat = RUBY1_8)
public RubyString upcase(ThreadContext context) {
    RubyString str = strDup(context.getRuntime());
    str.upcase_bang(context);
    return str;
}

@JRubyMethod(name = "upcase!", compat = RUBY1_8)
public IRubyObject upcase_bang(ThreadContext context) {
    Ruby runtime = context.getRuntime();
    if (value.getRealSize() == 0) {
        modifyCheck();
        return runtime.getNil();
    }
    modify();
    return singleByteUpcase(runtime, value.getUnsafeBytes(),
                            value.getBegin(), value.getBegin() + value.getRealSize());
}

This Java code is executed when you use the String.upcase method in your Ruby script. This is the version used for Ruby 1.8; the Ruby 1.9 version is slightly more complicated since it handles multibyte characters. JRuby supports a Ruby 1.8 mode and a Ruby 1.9 mode, allowing you to pick the version of Ruby you need for your code. Reading this code and ignoring some of the details, we can see that upcase simply calls strDup and then upcase! on the new copy. Upcase! then returns nil if the size of the string is zero, and calls singleByteUpcase otherwise. On the last line you can see it passes an actual byte array (“byte[]”) into singleByteUpcase.

Just below in RubyString.java, you’ll find the Java singleByteUpcase method that actually performs the upcase operation:

private IRubyObject singleByteUpcase(Ruby runtime, byte[]bytes, int s, int end) {
    boolean modify = false;
    while (s < end) {
        int c = bytes[s] & 0xff;
        if (ASCII.isLower(c)) {
            bytes[s] = AsciiTables.ToUpperCaseTable[c];
            modify = true;
        }
        s++;
    }
    return modify ? this : runtime.getNil();
}

Again glossing over some of the details, you can see that JRuby is iterating over the bytes in the string and checking whether each one is lower case or not. If a byte is lower case, it converts it to upper case and sets the modify flag to true. Finally, it returns the same string (this) or else nil if the string was not modified.

That’s enough Java for one day! Let’s move on to see how this Java code is actually executed by your computer.

Java byte code

 

In the diagram above, I showed how the JRuby interpreter is actually a Java program executed by the Java Virtual Machine (“JVM”). However, as you may know, Java is not interpreted the same way that Ruby is. Instead Java is a compiled, statically typed language. This means that the JRuby Java code, like the RubyString.java file we saw above, is compiled into something called “Java byte code” ahead of time. The byte code is what the JVM interprets and executes. Here’s a diagram showing how this process works:

Java compiler

To get a sense of what Java byte code looks like, let’s actually open up the RubyString.class file and see what’s contained inside. Since RubyString.class is a binary file, we need to use a Java utility called “javap” to be able to read and inspect its contents:

$ cd .rvm/src/jruby-head/build/classes/jruby/org/jruby
$ javap -c -private RubyString.class > RubyString.txt

Here the “-c” option means I want to see the actual byte code, and the “-private” option means include the private methods as well, such as singleByteUpcase. Since the output is very large, I’ve redirected it into a file called RubyString.txt. Now let’s take a look inside RubyString.txt at how the singleByteUpcase method looks compiled into byte code:

Compiled from "RubyString.java"
public class org.jruby.RubyString extends org.jruby.RubyObject
             implements org.jruby.runtime.encoding.EncodingCapable {

... etc ...

  private org.jruby.util.ByteList value;

... etc ...

  private org.jruby.runtime.builtin.IRubyObject singleByteUpcase(org.jruby.Ruby, byte[], int, int);
    Code:
       0: iconst_0      
       1: istore        5
       3: iload_3       
       4: iload         4
       6: if_icmpge     47
       9: aload_2       
      10: iload_3       
      11: baload        
      12: sipush        255
      15: iand          
      16: istore        6
      18: getstatic     #287 // Field ASCII:Lorg/jcodings/specific/ASCIIEncoding;
      21: iload         6
      23: invokevirtual #288 // Method org/jcodings/specific/ASCIIEncoding.isLower:(I)Z
      26: ifeq          41
      29: aload_2       
      30: iload_3       
      31: getstatic     #269 // Field org/jcodings/ascii/AsciiTables.ToUpperCaseTable:[B
      34: iload         6
      36: baload        
      37: bastore       
      38: iconst_1      
      39: istore        5
      41: iinc          3, 1
      44: goto          3
      47: iload         5
      49: ifeq          56
      52: aload_0       
      53: goto          60
      56: aload_1       
      57: invokevirtual #200 // Method org/jruby/Ruby.getNil:()Lorg/jruby/runtime/builtin/IRubyObject;
      60: areturn       

Wow – that’s crazy! What in the world does it mean? Java byte code is actually a platform independent version of machine language. These cryptic commands, e.g. “istore” and ”baload,” correspond to native, machine language commands that your computer can execute directly. However, the basic idea behind Java is that the same Java byte code will run on any hardware architecture: your i386 Linux or Mac laptop, your Android phone, etc. etc. The JVM, built for whatever platform you are using, interprets these byte codes and executes your program.

I don’t pretend to understand any of the byte code commands above – although it is tempting to look up their meanings somewhere and try to trace this code back to the Java version of singleByteUpcase above. I don’t have time or space for that today, but it is easy to see how some of the commands work, for example:

23: invokevirtual #288 // Method org/jcodings/specific/ASCIIEncoding.isLower:(I)Z

… must be a virtual method call to the “isLower” method of the ASCIIEncoding class. And:

47: iload         5
49: ifeq          56
52: aload_0       
53: goto          60
56: aload_1       
57: invokevirtual #200 // Method org/jruby/Ruby.getNil:()Lorg/jruby/runtime/builtin/IRubyObject;
60: areturn       

… seems to correspond to the last line of Java code in singleByteUpcase:

return modify ? this : runtime.getNil();

However, I’m not sure. I feel like a 19th Century egyptologist trying to understand a newly discovered hieroglyphic passage for the first time.

Assembly Language

Now we’re reaching the deepest, darkest depths of the JRuby system. It turns out that the modern JVMs are sophisticated enough to perform another compilation step internally and convert portions of your Java byte code – the interpreter running your Ruby program – directly into native machine language. This means that running the JRuby interpreter uses native machine language just like using the standard MRI interpreter does, at least for some portions of the code. This is one of the reasons why running a Ruby program with JRuby can often be faster than running it with MRI Ruby!

This last compilation step, which is performed dynamically while your program is running, is called “Just In Time” (JIT) compilation. As the JVM runs your program, whether it’s Java, JRuby or any other program that uses the JVM platform, it monitors which code functions are executed most often. Identifying these as “hotspots,” the JIT compiler inside the JVM will dynamically compile these sections of Java byte code, like what we saw above, into native machine language.

Let’s see how the JIT compiler works for a Ruby program. But first we need to create a hotspot – a bit of code that is executed many times – in order to trigger the JIT compiler. To do that, I’ll just put the call to upcase into a loop, and remove the puts statement so I don’t get drowned in output:

10000.times do
  str = "Journey to the Center of JRuby".upcase
end

Now if you set the -J-XX:+PrintCompilation option while running JRuby, you’ll get lots of output showing which Java methods were compiled by the JIT compiler. I’ll redirect the output into a text file so I can search through it more easily:

$ ruby -J-XX:+PrintCompilation upcase.rb > compilation.log

 

 

 

Reading through the compilation.log file:

51    1       3       java.lang.String::equals (88 bytes)
53    2       3       java.lang.Object::<init> (1 bytes)
54    3       3       java.lang.Number::<init> (5 bytes)
54    4       3       java.lang.Integer::<init> (10 bytes)
54    5       1       java.lang.Object::<init> (1 bytes)

... etc ...

1026  821       4       java.util.ArrayList::get (11 bytes)
1026  903       1       org.jruby.util.ByteList::invalidate (11 bytes)
1026  407       3       org.jruby.util.ByteList::invalidate (11 bytes)   made not entrant
1026  899       3       org.jruby.RubyString::singleByteUpcase (61 bytes)
1026  204       3       java.util.ArrayList::get (11 bytes)   made not entrant
1027  756       4       org.jruby.util.ByteList::append (36 bytes)
1028  904       1       org.jruby.RubyString::getCodeRange (8 bytes)

At the top of the file we see that many of the basic, intrinsic Java methods like java.lang.String.equals, are compiled into machine language first. This is because the JRuby interpreter is actually quite complex and uses all of these methods often. Later in the file the org.jruby.RubyString::singleByteUpcase method we saw above is compiled since it is executed so many times in my loop.

To finish things off for today, let’s take a look at part of the assembly language that our call to upcase was converted to… it turns out that there’s another JVM option that will actually display the assembly language generated by the JIT compiler. But to use it I first needed to download and build a debug version of the OpenJDK JVM source code (following these instructions, and specifying “make debug_build” instead of “make”).

Here’s what I get running “ruby —version” and “java -version”

$ ruby --version
jruby 1.7.0.dev (ruby-1.9.3-p28) (2012-02-04 9aac4bd) (OpenJDK 64-Bit Server VM 1.7.0-internal-debug)
[darwin-amd64-java]

$ java -version 
openjdk version "1.7.0-internal-debug"
OpenJDK Runtime Environment (build 1.7.0-internal-debug-pat_2012_02_04_07_35-b00)
OpenJDK 64-Bit Server VM (build 23.0-b12-jvmg, mixed mode)

And here’s the command to view the assembly output from the JIT – I got this from Charles Nutter’s list of his favorite JVM options:

$ ruby -J-XX:+PrintOptoAssembly upcase.rb > assembly.log

Finally, here’s a small portion of the assembly.log file:

{method} 
 - klass: {other class}
 - this oop:          0x0000000132798980
 - method holder:     'org/jruby/RubyString'
 - constants:         0x0000000132780428 constant pool [3032] for 'org/jruby/RubyString' cache=...
 - access:            0xc1000002  private 
 - name:              'singleByteUpcase'
 - signature:         '(Lorg/jruby/Ruby;[BII)Lorg/jruby/runtime/builtin/IRubyObject;'
 - max stack:         4
 - max locals:        7

... etc ...

04c     testq   RBP, RBP    # ptr
04f     je     B44  P=0.001000 C=-1.000000
04f
055   B2: # B50 B3 <- B1  Freq: 0.999
055     movq    R8, RBP # spill
058     movq    R10, [RBP + #8 (8-bit)] # class
05c     movq    R11, precise klass org/jruby/RubyString: 0x00007fd0bc06e698:Constant:exact *    # ptr
066     cmpq    R10, R11    # ptr
069     jne,u  B50  P=0.000001 C=-1.000000
069
06f   B3: # B4 <- B2  Freq: 0.998999
06f     movq    RCX, RBP    # spill
072     # checkcastPP of RCX
072
072   B4: # B45 B5 <- B3 B44  Freq: 0.999999
072     movq    R9, [rsp + #0]  # spill
076     testq   R9, R9  # ptr
079     je     B45  P=0.001000 C=-1.000000
079
07f   B5: # B51 B6 <- B4  Freq: 0.998999
07f     movq    R10, [R9 + #8 (8-bit)]  # class
083     movq    R11, precise klass org/jruby/Ruby: 0x00007fd0bc06e898:Constant:exact *  # ptr
08d     cmpq    R10, R11    # ptr
090     jne,u  B51  P=0.000001 C=-1.000000

... etc...

And there you have it: the actual i386 assembly language code that’s executing our “Journey to the Center of JRuby”.upcase Ruby code! We’ve finally reached the “center of JRuby;” we can’t go any farther from here.

 Source: http://patshaughnessy.net/2012/2/7/journey-to-the-center-of-jruby

 

 

 

Published at DZone with permission of Pat Shaughnessy, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Ivan Bosnic replied on Wed, 2012/02/08 - 5:06am

Very good article. Thanx.

Shaun Abram replied on Wed, 2012/02/08 - 10:23am

"Here’s a simple diagram comparing the standard MRI setup with JRuby"

The diagram seems to be missing?

Pat Shaughnessy replied on Thu, 2012/02/09 - 7:00am in response to: Shaun Abram

Hi Shaun - looks like some of the diagrams and artwork got lost in the translation to DZone... :(

Check out the original post on my site: http://patshaughnessy.net/2012/2/7/journey-to-the-center-of-jruby

Thanks for reading! 

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.