Introduction
Last year I wrote a series of posts (Part 1, Part 2, Part 3) on the use of the new Java fork/join framework for a Monte Carlo simulation.
First, an update. Going back through the code, I discovered a bug in the way the triangle distribution was implemented. It made the results inaccurate; fortunately this is only a toy example. My fault for not unit testing. I would still not suggest using this code for anything other than learning about fork/join.
Thread Local Random Numbers
To move on to more interesting things: I was reading through Oracle’s release notes on Java SE 7 and noticed that they include a new facility for concurrent random numbers. Since a Monte Carlo simulation generates millions of random numbers, I was very interested to see how its performance would be impacted.
The Javadoc for ThreadLocalRandom mentions contention when multiple threads use regular Math.random(). Regular Math.random() uses atomic variables to hold the current seed, so that two threads calling it simultaneously get pseudo-random numbers in sequence rather than the same number twice. In my case, I am using a separate instance of Random for each random variable in the simulation, but these instances are shared across all runs of the simulation. As a result, there is a strong possibility of contention, so we should expect an improvement.
ThreadLocalRandom is not instantiated directly; instead, there is a static method current() that creates a new instance the first time it is called from a given thread. So by changing from Math.random() to ThreadLocalRandom, we are changing two things about the program:
- We no longer have the contention of accessing a single random number generator instance from multiple threads.
- Instead of instantiating a random number generator instance per random variable, we instantiate one per thread.
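To make the second point concrete, here is a minimal stand-alone sketch (not the simulation code from the earlier posts) showing that current() hands each calling thread its own generator; note that ThreadLocalRandom lives in java.util.concurrent:

import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalRandomDemo {
    public static void main(String[] args) throws InterruptedException {
        // Repeated calls from the same thread return that thread's generator.
        ThreadLocalRandom first = ThreadLocalRandom.current();
        ThreadLocalRandom second = ThreadLocalRandom.current();
        System.out.println(first == second); // true

        // A different thread draws from its own, independently seeded generator,
        // so no synchronization or atomic seed update is needed.
        Thread other = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println(ThreadLocalRandom.current().nextDouble());
            }
        });
        other.start();
        other.join();
    }
}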
Random versus ThreadLocalRandom
To set a baseline, here’s a fresh run using regular Math.random():
StopWatch 'Monte Carlo NPV': running time (millis) = 44637
-----------------------------------------
ms % Task name
-----------------------------------------
12202 027% Sequential
02576 006% DivideByTwo (children=2, min fork size=100)
02465 006% DivideByTwo (children=2, min fork size=500)
02615 006% DivideByTwo (children=2, min fork size=1000)
02515 006% DivideByTwo (children=2, min fork size=2000)
02502 006% DivideByP (children=8, min fork size=100)
02490 006% DivideByP (children=8, min fork size=500)
02445 005% DivideByP (children=8, min fork size=1000)
02450 005% DivideByP (children=8, min fork size=2000)
02477 006% Sqrt(n) (children=-1, min fork size=100)
02458 006% Sqrt(n) (children=-1, min fork size=500)
02466 006% Sqrt(n) (children=-1, min fork size=1000)
02468 006% Sqrt(n) (children=-1, min fork size=2000)
02508 006% Parfor (children=20000, min fork size=500)
As discussed in the previous posts, the move from sequential to parallel is much more important than the way that the task is divided up.
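For readers who have not seen the earlier posts, the labels in the table refer to different strategies for splitting the simulation runs into fork/join subtasks. A hypothetical sketch of the “divide by two” strategy (illustrative names only, not the actual code from Parts 1–3) might look like this:

import java.util.concurrent.RecursiveAction;

// Illustrative sketch: split the range of simulation runs in half until it is
// smaller than the minimum fork size, then execute the remaining runs in a loop.
public class DivideByTwoTask extends RecursiveAction {
    private final int start, end, minForkSize;
    private final Runnable singleRun; // stands in for one simulation iteration

    public DivideByTwoTask(int start, int end, int minForkSize, Runnable singleRun) {
        this.start = start;
        this.end = end;
        this.minForkSize = minForkSize;
        this.singleRun = singleRun;
    }

    @Override
    protected void compute() {
        if (end - start <= minForkSize) {
            for (int i = start; i < end; i++) {
                singleRun.run(); // leaf work, executed on the current worker thread
            }
        } else {
            int mid = start + (end - start) / 2;
            invokeAll(new DivideByTwoTask(start, mid, minForkSize, singleRun),
                      new DivideByTwoTask(mid, end, minForkSize, singleRun));
        }
    }
}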
The change is very minor:
// double u = r.nextDouble();
double u = ThreadLocalRandom.current().nextDouble();
The resulting change is substantial:
StopWatch 'Monte Carlo NPV': running time (millis) = 34942
-----------------------------------------
ms % Task name
-----------------------------------------
11347 032% Sequential
02004 006% DivideByTwo (children=2, min fork size=100)
01831 005% DivideByTwo (children=2, min fork size=500)
01838 005% DivideByTwo (children=2, min fork size=1000)
01784 005% DivideByTwo (children=2, min fork size=2000)
01781 005% DivideByP (children=8, min fork size=100)
01782 005% DivideByP (children=8, min fork size=500)
01772 005% DivideByP (children=8, min fork size=1000)
01776 005% DivideByP (children=8, min fork size=2000)
01781 005% Sqrt(n) (children=-1, min fork size=100)
01788 005% Sqrt(n) (children=-1, min fork size=500)
01805 005% Sqrt(n) (children=-1, min fork size=1000)
01799 005% Sqrt(n) (children=-1, min fork size=2000)
01854 005% Parfor (children=20000, min fork size=500)
The overall running time drops from about 44.6 to 34.9 seconds, an improvement of roughly 22% (closer to 25–30% for the individual parallel tasks), which is well worth having.
Observations
It is interesting that the sequential case shows less of an improvement than the parallel case. This tends to show that there was genuine contention between different threads accessing the same random number generator, and not just overhead from the use of atomic variables.
It is also interesting that there is an improvement in the sequential case. This shows that not all of the performance gain came from eliminating contention. It suggests that even in regular, single-threaded Java code there is an advantage to using ThreadLocalRandom if many random numbers will be needed. This is similar to the difference between StringBuffer and StringBuilder: by handling thread safety at instantiation (one instance per thread) rather than during use (synchronization on every call), it is possible to improve performance. There may well be other cases in Java programming where synchronized code blocks would be more performant with separate ThreadLocal instances.
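As a rough sketch of that general pattern, hand-rolled with ThreadLocal (essentially what ThreadLocalRandom packages up for random numbers):

import java.util.Random;

// Instead of synchronizing access to one shared Random, give each thread its own
// lazily created instance. No locking or atomic seed update is needed during use.
public class PerThreadRandom {
    private static final ThreadLocal<Random> LOCAL = new ThreadLocal<Random>() {
        @Override
        protected Random initialValue() {
            return new Random();
        }
    };

    public static double nextDouble() {
        return LOCAL.get().nextDouble();
    }
}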
In the numbers above, the “divide by two” case with 2 children and the smallest chunk size appears to be significantly worse than other options. Note that this method spawns the most tasks, but because of the way the fork/join framework operates, it does not mean that it spawns more threads than the others. Subsequent runs still showed some difference, but not as marked, so this may not be significant.
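One way to see this: the number of worker threads is bounded by the pool’s parallelism, not by the number of tasks forked. For example, assuming the default constructor:

import java.util.concurrent.ForkJoinPool;

public class PoolSize {
    public static void main(String[] args) {
        // The default ForkJoinPool sizes itself to the number of available processors;
        // forking thousands of small tasks queues them on these workers rather than
        // creating more threads.
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println("parallelism = " + pool.getParallelism());
    }
}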
Conclusion
The Java fork/join framework is, in my opinion, a valuable contribution to Java SE 7.
The ThreadLocalRandom class may be a considerably smaller and less complex addition, but there appears to be a strong case for using it, possibly even in “regular” Java code.