Peter is a DZone MVB, is not an employee of DZone, and has posted 161 posts at DZone. You can read more from them at their website.

C++ or Java: Which is Faster for High Frequency Trading?

10.22.2012

Overview

There are conflicting views on the best solution for high frequency trading. Part of the problem is that what counts as high frequency trading varies more than you might expect; another part is what is meant by "faster".

My View

If you have a typical Java programmer and a typical C++ programmer, each with a few years' experience writing a typical object-oriented program, and you give them the same amount of time, the Java programmer is likely to have a working program earlier and will have more time to tweak the application. In this situation it is likely the Java application will be faster, IMHO.

In my experience, Java is better than C++ at detecting code which doesn't need to be executed, especially in micro-benchmarks which don't do anything useful. ;) If you tune Java and C++ as far as they can go, given any amount of expertise and time, the C++ program will be faster. However, given limited resources and a changing environment, i.e. in real world applications, a dynamic language will outperform.

In the equities space, you need latencies below 10 us to be seriously high frequency. Java, and even standard OOP C++ on commodity hardware, is not an option. You need C or a cut-down version of C++ and specialist hardware such as FPGAs or GPUs.

In FX, high frequency means latencies below 100 us. In this space, C++ or a cut-down Java (low GC) with a kernel bypass network adapter is an option, and using one language or the other will have pluses and minuses. Personally, I think Java gives more flexibility as the exchanges are constantly changing, assuming you believe you can use IT for competitive advantage.
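To make "low GC" concrete, here is a minimal sketch of the usual idea: allocate all storage once, up front, and touch only primitives on the hot path so the collector has nothing to do. The class PriceRing and its methods are hypothetical names chosen for illustration, not part of any library or of the test discussed later.

```java
/** Hypothetical fixed-capacity tick buffer: all storage is allocated once, in the constructor. */
class PriceRing {
    private final long[] timestamps;
    private final double[] prices;
    private int next;   // index of the slot that the next tick overwrites
    private int size;   // number of slots currently holding data

    PriceRing(int capacity) {
        timestamps = new long[capacity];
        prices = new double[capacity];
    }

    /** Records a tick without allocating: primitives are written into the preallocated arrays. */
    void add(long timestampNs, double price) {
        timestamps[next] = timestampNs;
        prices[next] = price;
        next = (next + 1) % prices.length;
        if (size < prices.length) size++;
    }

    /** Mean of the retained prices; a plain loop over primitives, so no boxing or iterators. */
    double mean() {
        if (size == 0) return Double.NaN;
        double sum = 0;
        for (int i = 0; i < size; i++) sum += prices[i];
        return sum / size;
    }

    int size() {
        return size;
    }
}
```

Once the buffer is full, old ticks are overwritten in place, so a steady stream of add and mean calls creates no garbage at all.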

In many cases, when people talk about high frequency, especially banks, they are talking about sub-1 ms or single-digit ms latencies. In this space, I would say the flexibility/dynamic programming of Java, Scala, C# etc. would give you time-to-market, maintainability and reliability advantages over C/C++ or FPGAs.

The problem Java faces

The problem is not the language as such, but a lack of control over caches, context switches and interrupts. If you repeatedly copy a block of memory, an operation carried out in native code, and vary the delay between copies, the copy gets slower depending on what has happened in between.

The problem is not GC or Java, as neither plays much of a part. The problem is that part of the cache has been evicted, so the copy itself takes longer. The same is true of any operation which accesses memory, e.g. accessing plain objects will also be slower.

// Requires: import java.util.Arrays; Pauser is part of the full source (see "The code").
private void doTest(Pauser delay) throws InterruptedException {
    int[] times = new int[1000 * 1000];          // one latency sample (ns) per copy
    byte[] bytes = new byte[32 * 1024];
    byte[] bytes2 = new byte[32 * 1024];
    long end = System.nanoTime() + (long) 5e9;   // stop after about 5 seconds
    int i;
    for (i = 0; i < times.length; i++) {
        long start = System.nanoTime();
        System.arraycopy(bytes, 0, bytes2, 0, bytes.length);
        long time = System.nanoTime() - start;
        times[i] = (int) time;
        delay.pause();                           // the only thing that differs between runs
        if (start > end) break;
    }
    // Sort only the samples recorded, then report the 1st, 50th and 99th percentiles.
    Arrays.sort(times, 0, i);
    System.out.printf("%s: Copy memory latency 1/50/99%%tile %.1f/%.1f/%.1f us%n",
            delay,
            times[i / 100] / 1e3,
            times[i / 2] / 1e3,
            times[i - i / 100 - 1] / 1e3
    );
}

The test does the same thing many times, with different delays between repetitions. It spends most of its time in native methods, and no objects are created or discarded during the test.
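The listing relies on a Pauser type that is not shown. The following is a hypothetical stand-in, not the author's implementation: the constant names are taken from the results below, and the numeric suffix is assumed to be a delay in milliseconds, in line with the "1 to 10 ms" mentioned later.

```java
/** Hypothetical stand-in for the Pauser used by doTest; the suffix is assumed to be milliseconds. */
enum Pauser {
    NO_WAIT(Mode.NONE, 0),
    YIELD(Mode.YIELD, 0),
    BUSY_WAIT_1(Mode.BUSY, 1),
    BUSY_WAIT_3(Mode.BUSY, 3),
    BUSY_WAIT_10(Mode.BUSY, 10),
    SLEEP_1(Mode.SLEEP, 1),
    SLEEP_3(Mode.SLEEP, 3),
    SLEEP_10(Mode.SLEEP, 10);

    private enum Mode { NONE, YIELD, BUSY, SLEEP }

    private final Mode mode;
    private final long millis;

    Pauser(Mode mode, long millis) {
        this.mode = mode;
        this.millis = millis;
    }

    /** Waits between two copies in the way this constant describes. */
    public void pause() throws InterruptedException {
        switch (mode) {
            case YIELD:
                Thread.yield();              // give up the CPU, but stay runnable
                break;
            case BUSY: {
                long start = System.nanoTime();
                long nanos = millis * 1000000L;
                while (System.nanoTime() - start < nanos) {
                    // spin: the thread keeps its core for the whole delay
                }
                break;
            }
            case SLEEP:
                Thread.sleep(millis);        // the thread is descheduled for the delay
                break;
            default:
                break;                       // NONE: go straight to the next copy
        }
    }
}
```

With this in place, calling doTest once per constant, for example in a loop over Pauser.values(), would print one line per pausing strategy.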

YIELD: Copy memory latency 1/50/99%tile 1.6/1.6/2.3 us
NO_WAIT: Copy memory latency 1/50/99%tile 1.6/1.6/1.6 us
BUSY_WAIT_10: Copy memory latency 1/50/99%tile 2.8/3.5/4.4 us
BUSY_WAIT_3: Copy memory latency 1/50/99%tile 2.7/3.0/4.0 us
BUSY_WAIT_1: Copy memory latency 1/50/99%tile 1.6/1.6/2.5 us
SLEEP_10: Copy memory latency 1/50/99%tile 2.2/3.4/5.1 us
SLEEP_3: Copy memory latency 1/50/99%tile 2.2/3.4/4.4 us
SLEEP_1: Copy memory latency 1/50/99%tile 1.8/3.4/4.2 us

With -XX:+UseLargePages on Java 7

YIELD: Copy memory latency 1/50/99%tile 1.6/1.6/2.7 us
NO_WAIT: Copy memory latency 1/50/99%tile 1.6/1.6/1.8 us
BUSY_WAIT_10: Copy memory latency 1/50/99%tile 2.7/3.6/6.6 us
BUSY_WAIT_3: Copy memory latency 1/50/99%tile 2.7/2.8/5.0 us
BUSY_WAIT_1: Copy memory latency 1/50/99%tile 1.7/1.8/2.6 us
SLEEP_10: Copy memory latency 1/50/99%tile 2.4/4.0/5.2 us
SLEEP_3: Copy memory latency 1/50/99%tile 2.3/3.9/4.8 us
SLEEP_1: Copy memory latency 1/50/99%tile 2.1/3.3/3.7 us

The best of three runs was used.
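Each line of output reports three positions in the sorted samples. As a sketch of that arithmetic only, the helper below uses the same indices as doTest; LatencySummary and summarize are hypothetical names introduced here.

```java
import java.util.Arrays;

/** Hypothetical helper reproducing the percentile indexing used in doTest. */
class LatencySummary {
    /**
     * Returns {1st percentile, median, 99th percentile} in microseconds
     * for the first count samples, which are given in nanoseconds.
     */
    static double[] summarize(int[] timesNs, int count) {
        if (count <= 0 || count > timesNs.length) {
            throw new IllegalArgumentException("count must be between 1 and timesNs.length");
        }
        int[] sorted = Arrays.copyOf(timesNs, count);   // leave the caller's array untouched
        Arrays.sort(sorted);
        return new double[] {
                sorted[count / 100] / 1e3,              // 1st percentile
                sorted[count / 2] / 1e3,                // 50th percentile, the "typical" time
                sorted[count - count / 100 - 1] / 1e3   // 99th percentile
        };
    }
}
```

The middle value is the median rather than the mean, so a handful of very slow copies does not shift the "typical" figure.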

In the results listed above, the typical time (the median) to perform the memory copy varies between 1.6 and 4.0 us, depending on whether there was a busy wait or sleep of 1 to 10 ms between copies. This is a ratio of about 2.5x which has nothing to do with Java, but with something it has no real control over. Even the best times vary by about 2x.
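The same kind of effect can be probed without any copy at all. The sketch below times a pass over a small array while it should still be cached, and again after a larger buffer has been walked. The class name, the buffer sizes and the 64-byte line size are all assumptions made for illustration, and the figures it prints will differ from machine to machine.

```java
import java.util.Arrays;

/** Hypothetical demo: reading a small array before and after other memory has been touched. */
class CacheDisturbance {
    static final int LINE = 64;   // assumed cache line size in bytes

    /** Sums the small array and returns the elapsed time in nanoseconds. */
    static long timeSum(int[] small, long[] sink) {
        long start = System.nanoTime();
        long sum = 0;
        for (int v : small) sum += v;
        long time = System.nanoTime() - start;
        sink[0] += sum;           // keep the result live so the JIT cannot remove the loop
        return time;
    }

    /** Writes one byte per assumed cache line of a large buffer, displacing other data. */
    static void disturb(byte[] big) {
        for (int i = 0; i < big.length; i += LINE) big[i]++;
    }

    /** Returns {median ns straight after a previous pass, median ns after the disturbance}. */
    static long[] measure(int runs) {
        int[] small = new int[8 * 1024];           // 32 KB, the size copied in doTest
        byte[] big = new byte[16 * 1024 * 1024];   // assumed to be large relative to the caches
        long[] sink = new long[1];
        long[] warm = new long[runs];
        long[] cold = new long[runs];
        for (int i = 0; i < runs; i++) {
            timeSum(small, sink);                  // bring the small array into cache
            warm[i] = timeSum(small, sink);        // timed while it should still be cached
            disturb(big);                          // something else uses the cache
            cold[i] = timeSum(small, sink);        // timed after the disturbance
        }
        Arrays.sort(warm);
        Arrays.sort(cold);
        return new long[] { warm[runs / 2], cold[runs / 2] };
    }

    public static void main(String[] args) {
        long[] r = measure(500);
        System.out.printf("median sum time: cached %d ns, after disturbance %d ns%n", r[0], r[1]);
    }
}
```

No objects are created inside the measured loop, so any difference between the two medians comes from memory access rather than from GC.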

The code

ThreadlatencyTest.java

Conclusion

In ultra-high frequency trading, the core engine will be more C, assembly and custom hardware than OOP C++ or Java. In markets where the latency requirements of the engine are less tight, C++ and low-GC Java become an option. As latency requirements become less tight still, Java and other dynamic languages can be more productive. In this situation, Java is faster to market, so you can take advantage of changes in the market or requirements.

Published at DZone with permission of Peter Lawrey, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)