Class Experiment11_PrimeThreads
Experiment 3 - Prime Numbers
Calculating Prime Numbers is always fun, and it's most fun when we try to compute prime numbers as fast as possible, or as many at once as we can. As we will see, Project Loom offers us no benefits in this use case, but it's interesting to experiment and see why.
As I have mentioned before, with Concurrent Programming, higher levels of abstraction are generally better
and safer because they have been designed and implemented by experts, and it's better to stand on the
shoulders of giants. Generally, for CPU Bound use cases, the Stream
Interface
is a good place to start as it has been specifically design for such use cases.
Streams
The Streams Architecture has three important phases
- Source
- Something that generates a stream of objects, such as a range of numbers, in our case, numbers that could be prime
- Pipeline
- A sequence of Functions on each object in the stream, in our case, testing if a number is prime,
which leverages the
Stream.filter(Predicate)
function. - Sink
- Ultimately, we need somewhere to Collect the Stream of objects. This is also known as *Terminating* the Stream.
Flow
is more akin to
Akka Streams, and in Akka Streams, they actually use the term Flow. Hopefully later we
can see the effects of running Java Flows with Project Loom...
There are only two hard things in Computer Science: cache invalidation and naming things.— Phil Karlton
Threads
For the rest of the experiments, we can conclude that that attempts to use Threads to do better than basic
BaseStream.parallel()
generally fail. We can also see that the code is substantially more
complicated too...
However, where threads really shine is when we are dealing with concurrency where there are a lot of blocking operations, such as transaction processing, network access, etc. Every blocking operation is an opportunity for some other task to run and progress.
For these experiments we warp our primes experiments into a pseudo networking application, where we simulate
farming isPrime() out to HTTP Endpoints. This simulation basically wraps the
Primes.isPrime(long, long, long)
calculation with network latency via
Thread.sleep(long)
, one to simulate Request Latency, another to simulate Response
Latency.
The spirit of the experiments here is that we might have some application that is computationally expensive and
takes a long time. While BaseStream.parallel()
works just fine on limited scale, we might imagine a magic
cloud that has some /isPrime endpoints where the work can be done faster, more efficiently, etc. Indeed,
this is a classic
MapReduce
model. To be sure, if we really wanted to generate prime numbers in a
High-Performance Computing
style, we might use something like Apache Spark,
but here we are trying to keep things simple, and make a point, not a perfect application.
Benchmarking
When trying to compare the performance of different Concurrent and Parallel approaches, it's important to conduct benchmarks, and one tool to use is the Java Microbenchmark Harness (JMH). However, it can take quite a long time for JMH to run benchmarks. By default, JMH runs 25 warmups, and 25 benchmarks, then averages the results. For quicker results, the experiments in this code take a rather casual approach to get a feel for things, leaving the heavy lifting to JMH.
Streams
See benchmarks/PrimeNumbers for the code that runs these the JMH Benchmarks.
Benchmark Mode Cnt Score Error PrimeNumbers.parallelPrimesTo_1000 avgt 25 46.620 ± 1.549 PrimeNumbers.parallelPrimesTo_10_000 avgt 25 379.036 ± 7.604 PrimeNumbers.parallelPrimesTo_10_000_000 avgt 25 1360362.823 ± 14723.084 PrimeNumbers.serialPrimesTo_1000 avgt 25 32.998 ± 0.315 PrimeNumbers.serialPrimesTo_10_000 avgt 25 644.467 ± 16.160 PrimeNumbers.serialPrimesTo_10_000_000 avgt 25 8199848.272 ± 84514.067
where the Score is the Average Microseconds (μS) to complete the run. Given we only test odd numbers we can conclude from testing numbers if they are prime, where throughput = tests per μS
Benchmark tested throughput ratio PrimeNumbers.parallelPrimesTo_1000 500 10.725 0.708 PrimeNumbers.parallelPrimesTo_10_000 5,000 13.191 1.700 PrimeNumbers.parallelPrimesTo_10_000_000 5,000,000 3.675 6.025 PrimeNumbers.serialPrimesTo_1000 500 15.152 1.413 PrimeNumbers.serialPrimesTo_10_000 5,000 7.758 0.588 PrimeNumbers.serialPrimesTo_10_000_000 5,000,000 0.610 0.166
From this we can conclude
As an aside, computing primes up to
- 1,000, there are 167 primes
- 10,000, there are 1228 primes
- 10,000,000 there are 664578 primes
Project Loom
Pure Computation
When I first started playing with these experiments I naïvely thought that using Project Loom could improve upon the previous benchmarks. Even though I had already watched Ron Pressler - Loom: Bringing Lightweight Threads and Delimited Continuations to the JVM, I had not really appreciated or internalized that knowledge. But, by naïvely concocting my own experiments, I had empirical evidence that supported what Ron Pressler had already stated.
Without using JMH, playing around here I discovered
BaseStream.parallel()
within a VirtualThread ExecutorService context did not improve
throughput. In most cases it was a little worse.
ExecutorService.invokeAll(Collection)
has about 1/3 the throughput of Stream.parallel()
for testing up to 10,000,000 primes.
ExecutorService.submit(Callable)
has about 1/2 the throughput of Stream.parallel(),
which is better than ExecutorService#.invokeAll(Collection)
Simulated Networking
The bottom line is, that for raw computation, stick with the Java Streams API, and Parallel Streams. However, how does Project Loom compare when simulating a Network Application, where we might be farming prime computations out to some HTTP Endpoint?
Using the same Primes.isPrime(long, long, long)
code as the previous benchmarks,
where we use isPrime(candiate, minimumLag, maximumLag) with minimumLag = 10 ms and
maximumLag = 30 ms, times 2, or a total of 20 ms minimum and 60 ms maximum, where the actual
lag is random; we simulate some network blocking overhead by using Thread.sleep(long)
before the prime
test and after the prime test to simulate the HTTP Request overhead and the HTTP Response overhead. More
importantly, these two sleep() requests give the Thread schedulers a chance to let other tasks run.
Implicitly, there is the actual lag of the isPrime(candidate) computation itself, which leads me to believe
that this is actually a pretty good simulation.
Benchmark Mode Cnt Score Error PrimeThreads.platformPrimesTo_1000 avgt 25 122.004 ± 4.888 PrimeThreads.platformPrimesTo_10_000 avgt 25 986.577 ± 55.197 PrimeThreads.platformPrimesTo_10_000_000 avgt 25 1043651.917 ± 139003.151 PrimeThreads.virtualPrimesTo_1000 avgt 25 33.420 ± 1.142 PrimeThreads.virtualPrimesTo_10_000 avgt 25 53.476 ± 0.610 PrimeThreads.virtualPrimesTo_10_000_000 avgt 25 33058.254 ± 1171.181 Benchmark tested throughput ratio PrimeThreads.platformPrimesTo_1000 500 4.098 0.274 PrimeThreads.platformPrimesTo_10_000 5,000 5.068 0.054 PrimeThreads.platformPrimesTo_10_000_000 5,000,000 4.791 0.032 PrimeThreads.virtualPrimesTo_1000 500 14.961 3.651 PrimeThreads.virtualPrimesTo_10_000 5,000 93.500 18.449 PrimeThreads.virtualPrimesTo_10_000_000 5,000,000 151.248 31.569
Based on the JMH results (after a time of 14:56:54), where we are using units of Milliseconds,
Many thanks to Ron Pressler who responded to my email regarding benchmarking Project Loom, and patiently clarified my thinking on how to benchmark, and more importantly how to interprest and explain benchmarks.
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionstatic void
primeThreads
(long limit, ExecutorService executorService) primeThreadsOld
(long limit, ExecutorService executorService) static void
suite3
(long limit) static void
suite3Old
(long limit, ThreadFactory threadFactory) Hello Prime Threads PID = 31560 CPU Cores = 12 Heap Size = 17179869184 ______________________________________________________________________________ primes to 1,000 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 3 virtualCachedThreadPool = 72, 67, 5 virtualFixedThreadPool = 6, 5, 1 virtualSingleThreadTaskExecutor = 4, 4, 0 virtualThreadPerTaskExecutor = 18, 18, 0 primes to 10,000 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 3 virtualCachedThreadPool = 78, 76, 2 virtualFixedThreadPool = 15, 14, 1 virtualSingleThreadTaskExecutor = 9, 8, 1 virtualThreadPerTaskExecutor = 36, 35, 1 primes to 10,000,000 primeThreads: threadMaximum = 11 primeThreads: threadMaximum = 11 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 12 virtualCachedThreadPool = 14338, 14167, 171 virtualFixedThreadPool = 3162, 3071, 91 virtualSingleThreadTaskExecutor = 9349, 9297, 52 virtualThreadPerTaskExecutor = 4124, 3939, 185 primes to 50,000,000 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 12 virtualCachedThreadPool = 96528, 95790, 738 virtualFixedThreadPool = 18029, 17619, 410 virtualSingleThreadTaskExecutor = 81046, 80751, 295 virtualThreadPerTaskExecutor = 40608, 39787, 821
-
Constructor Details
-
Experiment11_PrimeThreads
public Experiment11_PrimeThreads()
-
-
Method Details
-
main
-
suite3
public static void suite3(long limit) -
suite3Old
Hello Prime Threads PID = 31560 CPU Cores = 12 Heap Size = 17179869184 ______________________________________________________________________________ primes to 1,000 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 3 virtualCachedThreadPool = 72, 67, 5 virtualFixedThreadPool = 6, 5, 1 virtualSingleThreadTaskExecutor = 4, 4, 0 virtualThreadPerTaskExecutor = 18, 18, 0 primes to 10,000 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 3 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 3 virtualCachedThreadPool = 78, 76, 2 virtualFixedThreadPool = 15, 14, 1 virtualSingleThreadTaskExecutor = 9, 8, 1 virtualThreadPerTaskExecutor = 36, 35, 1 primes to 10,000,000 primeThreads: threadMaximum = 11 primeThreads: threadMaximum = 11 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 12 virtualCachedThreadPool = 14338, 14167, 171 virtualFixedThreadPool = 3162, 3071, 91 virtualSingleThreadTaskExecutor = 9349, 9297, 52 virtualThreadPerTaskExecutor = 4124, 3939, 185 primes to 50,000,000 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 12 primeThreads: threadMaximum = 1 primeThreads: threadMaximum = 12 virtualCachedThreadPool = 96528, 95790, 738 virtualFixedThreadPool = 18029, 17619, 410 virtualSingleThreadTaskExecutor = 81046, 80751, 295 virtualThreadPerTaskExecutor = 40608, 39787, 821
- Parameters:
limit
-threadFactory
-
-
primeThreads
-
primeThreadsOld
-