Jun 15, 2016

Locking alternatives in Java 8


To provide synchronized access to a data cache, I discuss three alternatives in Java 8: synchronized() blocks, ReadWriteLock and StampedLock (new in Java 8). I show code snippets and compare the performance impact on a real-world application.

The use case

Consider the following use case: A data cache that holds key-value pairs and needs to be accessed by several threads concurrently.

One option is to use a synchronized container like ConcurrentHashMap or Collections.synchronizedMap(map). Those have their own considerations, but will not be handled in this article.

In our use case, we want to store arbitrary objects in the cache and retrieve them by Integer keys in the range of 0..n. As memory usage and performance are critical in our application, we decided to use a good, old array instead of a more sophisticated container like a Map.
A naive implementation allowing multi-threaded access without any synchronization can cause subtle, hard-to-find data inconsistencies:
  • Memory visibility: Threads may see the array in different states (see explanation).
  • Race conditions: Writing at the same time may cause one thread's change to be lost (see explanation).
Thus, we need to provide some form of synchronization.
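The lost-update race mentioned above can be sketched in a few lines. This demo class is illustrative only and not part of the cache code; it lets two threads increment a plain field without synchronization, so some increments can be lost:

```java
public class LostUpdateDemo {
    static int counter;

    // Two threads each increment the shared counter n times without any
    // synchronization; increments interleave and some updates may be lost.
    static int race(int n) {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                counter++; // read-modify-write: not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;
    }

    public static void main(String[] args) {
        // On a multi-core machine this frequently prints less than 200000.
        System.out.println(race(100_000));
    }
}
```

The result is nondeterministic, which is exactly what makes such bugs hard to find.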

To fix the problem of memory visibility, Java's volatile keyword seems to be the perfect fit. However, making an array volatile does not have the desired effect: it only guarantees visibility of the array reference itself, not of the array's contents.

If the array's payload consists of Integer or Long values, you might consider AtomicIntegerArray or AtomicLongArray. But in our case, we want to support arbitrary values, i.e. Objects.
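For primitive payloads, the JDK's atomic arrays provide per-element atomicity and visibility without explicit locks. A minimal sketch of AtomicIntegerArray (illustrative, not our cache):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

public class AtomicArrayDemo {
    static int[] demo() {
        AtomicIntegerArray data = new AtomicIntegerArray(16);

        data.set(3, 42);                     // write one element, with visibility guarantees
        int previous = data.getAndAdd(3, 1); // atomic read-modify-write of that element

        return new int[] { previous, data.get(3) };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 42 43
    }
}
```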

Traditionally, there are two ways to do synchronization in Java: synchronized() blocks and ReadWriteLock. Java 8 provides another alternative called StampedLock. There are probably more exotic ways, but I will focus on these three well-understood approaches that are relatively easy to implement.

For each approach, I will provide a short explanation and a code snippet for the cache's read and write methods.


synchronized is a Java keyword that can be used to restrict the execution of code blocks or methods to one thread at a time. Using synchronized is straightforward - just make sure not to miss any code that needs to be synchronized. The downside is that you can't differentiate between read and write access (the other two alternatives can). If one thread enters the synchronized block, all other threads are blocked. On the upside, as a core language feature, it is well optimized in the JVM.

public class Cache {
  private Object[] data;
  private final Object lock = new Object();

  public Object read(int key) {
    synchronized (lock) {
      if (data.length <= key) {
        return null;
      }
      return data[key];
    }
  }

  public void write(int key, Object value) {
    synchronized (lock) {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    }
  }
}


ReadWriteLock is an interface. When I say ReadWriteLock, I mean its only standard library implementation, ReentrantReadWriteLock. The basic idea is to have two locks: one for write access and one for read access. While writing locks out everyone else (like synchronized), multiple threads may read concurrently. If there are more readers than writers, this leads to fewer threads being blocked and therefore better performance.

public class Cache {
  private Object[] data;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public Object read(int key) {
    lock.readLock().lock();
    try {
      if (data.length <= key) {
        return null;
      }
      return data[key];
    } finally {
      lock.readLock().unlock();
    }
  }

  public void write(int key, Object value) {
    lock.writeLock().lock();
    try {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}

StampedLock is a new addition in Java 8. It is similar to ReadWriteLock in that it also has separate read and write locks. The methods used to acquire a lock return a "stamp" (a long value) that represents the lock state. I like to think of the stamp as the "version" of the data in terms of data visibility. This makes a new locking strategy possible: the "optimistic read". An optimistic read means acquiring a stamp (but no actual lock), reading without locking and afterwards validating the stamp, i.e. checking whether it was OK to read without a lock. If we were too optimistic and it turns out someone else wrote in the meantime, the stamp will be invalid. In this case, we have no choice but to acquire a real read lock and read the value again.

Like ReadWriteLock, StampedLock is efficient if there is more read than write access. It can save a lot of overhead not to acquire and release a lock for every read access. On the other hand, if reading is expensive, occasionally reading twice may also hurt.

public class Cache {
  private Object[] data;
  private final StampedLock lock = new StampedLock();

  public Object read(int key) {
    long stamp = lock.tryOptimisticRead();

    // Read the value optimistically (may be outdated).
    Object value = null;
    if (data.length > key) {
      value = data[key];
    }

    // Validate the stamp - if it is outdated,
    // acquire a read lock and read the value again.
    if (lock.validate(stamp)) {
      return value;
    } else {
      stamp = lock.readLock();
      try {
        if (data.length <= key) {
          return null;
        }
        return data[key];
      } finally {
        lock.unlockRead(stamp);
      }
    }
  }

  public void write(int key, Object value) {
    long stamp = lock.writeLock();
    try {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    } finally {
      lock.unlockWrite(stamp);
    }
  }
}

All three alternatives are valid choices for our cache use case, because we expect more reads than writes. To find out which is best, I ran a benchmark with our application. The test machine has an Intel Core i7-5820K CPU with 6 physical cores (12 logical cores with hyper-threading). Our application spawns 12 threads that access the cache concurrently. The application is a "loader" that imports data from a database, makes calculations and stores the results into a database. The cache is not under stress 100% of the time. However, it is vital enough to show a significant impact on the application's overall runtime.

As a benchmark, I executed our application with a reduced data set. To get a good average, I ran each locking strategy three times. Here are the results:

In our use case, StampedLock provides the best performance. While a 15% difference to synchronized and a 24% difference to ReadWriteLock may not seem much, it is relevant enough to make the difference between meeting the nightly batch time frame or not (using full data). I want to stress that this by no means implies that StampedLock is *the* best option in all cases. Here is a good article with more detailed benchmarks for different reader/writer and thread combinations. Nevertheless, I believe measuring the actual application is the best approach.


In Java 8, there are at least three good alternatives for locking in a concurrent read-write scenario: synchronized, ReadWriteLock and StampedLock. Depending on the use case, the choice can make a substantial performance difference. As all three variants are quite simple to implement, it is good practice to measure and compare the performance.

Apr 23, 2016

How to use Docker within IntelliJ

A short tutorial on how to use Docker within IntelliJ, with a little help from Gradle. You can find the sample code and the full description here: https://github.com/adersberger/intellij-docker-tutorial


  • install the IntelliJ Docker plugin (https://plugins.jetbrains.com/plugin/7724)
  • check that there is a default Docker Machine: docker-machine ls. If there is no default machine create one: docker-machine create --driver virtualbox default.
  • start the default Docker Machine: docker-machine start default.
  • bind the env vars to the shell: eval "$(docker-machine env default)"
  • check if everything is correct: docker ps

Using Docker within IntelliJ

1) Set up the Docker cloud provider in the IntelliJ global preferences as shown below.

Tip: You can get the API URL by executing docker-machine ls and using the IP and port shown for the default machine.

2) Check the connection to the Docker daemon in the IntelliJ "Docker" tab

3) Create a new project from version control using GitHub (https://github.com/adersberger/intellij-docker-tutorial.git)

4) Create a new run configuration to deploy the application to Docker as shown on the following screenshots:

  • Don't forget to add the Gradle tasks as a "before launch" action as shown at the very bottom of the screenshot.
  • The same is also possible for Docker Compose files. Just point the "Deployment" dropdown to the Docker Compose file.

5) Run the configuration and inspect the Docker container. A browser will open automagically and point to the REST endpoint. Within IntelliJ you can access the container's console output, environment variables, port bindings etc.


Mar 18, 2016

Building a Solr-, Spark-, Zookeeper-Cloud with Intel NUC PCs

Part 1 - Hardware

If you work with cluster, grid or cloud technologies like Mesos, Spark, Hadoop, Solr Cloud or Kubernetes as a developer, architect or technical expert, you need your own private datacenter for testing and development. To test real-world scenarios like a failsafe and resilient Zookeeper cluster or a clustered Spark/Hadoop installation, you should have at least three independent machines. For the installation of Mesos/DCOS, it is recommended to have five machines in a minimal setup.
There are several ways to build such an environment, each with its own drawbacks:

1) A virtualized environment running on a workstation laptop or PC

You can easily create a bunch of virtual machines and run them on a desktop or workstation. This approach works fine, is fast and cheap, but has some problems:
  1. Your laptop may have only 16 gigabytes of RAM, so each VM gets only 2-3 gigabytes. For frameworks like Apache Spark, which uses caching heavily, this does not work well.
  2. The performance of a virtualized environment is not predictable. The problem is that some resources like disk, network or memory access are shared between all VMs. So even if you have a workstation with an octa-core Intel Xeon processor, I/O will behave differently.

2) A cloud environment like the AWS EC2

This is the way most people work with these technologies, but it also has some specific disadvantages. If you experience any performance problem, you will likely be unable to analyze the details. Cluster software is normally very sensitive to network latency and network performance. Since AWS can't guarantee that all your machines are in the same rack, the performance between some nodes can differ.

3) A datacenter with real hardware

You can build your own cluster with real hardware, but it is normally far too expensive. Even if you can afford real server hardware, this solution is not portable. In most enterprises, you will not be allowed to run such a cluster. For testing and development, it is much better to have a private cluster that is as portable as your laptop.

So what is a feasible solution?

I decided to build my own 4 node cluster on Intel NUC mini PCs. Here are the technical facts:
  • NUC 6th Generation - Skylake
  • Intel Core i5 - dual core with hyper-threading
  • 32 GB DDR4 RAM
  • 256 GB Samsung M.2 SSD
  • Gigabit Ethernet
The Intel NUC has to be equipped with RAM and an M.2 SSD. All these parts have to be ordered separately.

This gives you a cluster with amazing capabilities:
  • 16 Hyper Threading Units (8 Cores)
  • 128 GB DDR4 RAM
  • 1 TB Solid State Disk
Since I needed a portable solution, everything had to be packed into a normal business case. I found a very slim aluminium attaché case on Amazon with the right dimensions to hold the NUC PCs and the network switch.

I decided to include a monitor and a keyboard to get direct access to the first node in the cluster. The monitor is used for visualization and monitoring when the software runs. I ordered a Gechic HDMI monitor which has the right dimensions to include the monitor in front of the case.

The NUC package includes screws for mounting. This also works in such a case if you drill a small hole for each screw. For the internal wiring, you have to use flexible network cables; otherwise you will get problems with the wiring. You also need a little talent to mount connectors for power and network in the case, but with a little patience it works.

You can see the final result here:

This case will be my companion for the next year on all conferences, fairs and even in my office. The perfect presenter for any cluster / cloud technology. 

In the next part I will describe how to install a DCOS, Solr/Spark/Zeppelin cloud and what you can do on top of such hardware.

Have fun. 

Johannes Weigend

Mar 11, 2016

KubeCon 2016: Recap

All things Kubernetes: KubeCon 2016 in London (https://kubecon.io) revealed how attractive Kubernetes is to the community and how fast Kubernetes and its ecosystem are evolving. Evidence: 500 participants, most of them using Kubernetes in dev & production; impressive stats of the open source project and community; and profound talks reflecting real-life experiences. My takeaways:

Kubernetes Roadmap 

By the end of March, Kubernetes version 1.2 will be released with the following highlights:
  • New abstraction "Deployment": A deployment groups pod/rc/service definitions with additional deployment metadata. A deployment describes the desired target state of an application on a k8s cluster. When a deployment is applied, k8s drives the current cluster state towards the desired state. This is performed on the server side, not on the client side (unlike in k8s < 1.2).
  • ConfigMaps & Secrets: Kubernetes can now handle configuration files & parameters as well as secrets and certificates cluster-wide. It stores the configs inside etcd and makes them accessible through the k8s API. The configs are exposed by mounting the files into the pods (as tmpfs) and via env vars. They can also be referenced in the YAML files. They are updated live and atomically.
  • Brand new web UI: The Kubernetes Dashboard.
  • Improved scalability and support for multiple regions.
  • Better support for third-party extensions.
  • DaemonSet to better support the Sidekick pattern.
In about 16 weeks there'll be Kubernetes 1.3 with:
  • Better support for legacy applications with mechanisms like IP persistence.
  • Cluster federation (project Ubernetes) to join multiple k8s clusters together.
  • Further improved scalability.
  • Cluster autoscaling (automatically acquiring & releasing resources from the cloud provider).
  • In-Cluster IAM (LDAP / AM integration).
  • Scheduled jobs to better support batch processing on k8s.
  • Public cloud dashboard for Kubernetes-as-a-Service scenarios.
  • ... and more to come / to be discussed in the community.

Hot topics

The hot topics in my opinion were:
  • Higher-level abstractions & tools: Although Kubernetes is a great advance in bridging the gap between devs and ops, there is a need for higher-level abstractions & tools - especially for the devs (quote: "Kubernetes should be an implementation detail for devs"). This is addressed by k8s itself (deployment abstraction) as well as by different approaches like kploy (https://github.com/kubernauts/kploy), Puppet Kubernetes (https://forge.puppetlabs.com/garethr/kubernetes), dgr (https://github.com/blablacar/dgr) or DEIS (http://deis.io). From a high-level point of view, the community is putting the bricks on Kubernetes towards PaaS.
  • Continuous Delivery: Kubernetes is an enabler of continuous delivery (CD), and developing cloud native applications on k8s demands CD. There were some industry experience reports on using Kubernetes as an execution environment for CD workflows. Kubernetes handles scaling the CI/CD server as well as the application itself. A best practice here is to separate different applications and stages using k8s namespaces, and to use ChatOps tools like Hubot (https://hubot.github.com) to provide fast feedback to the devs & ops.
  • Stateful services: Kubernetes is great at running stateless microservices. But a lot of applications have to deal with (persistent) state. How do you run stateful services and even databases on Kubernetes without losing its benefits, or even losing data in case of a re-scheduling? K8s to the rescue: the answer is persistent volumes, providing cluster-wide non-ephemeral storage. A couple of different cluster storage providers are available for persistent volumes in k8s: more classic ones like NFS and iSCSI; cloud native ones like GlusterFS and Ceph; cloud provider specific ones for GCE and AWS; and storage abstraction layers like Flocker. The competition is open!
  • Diagnosability: As applications and infrastructure are getting more and more fine-grained and distributed with platforms like k8s, the problem of diagnosing failures and finding optimization potential arises. Time for cluster-aware diagnosis tools like sysdig (http://www.sysdig.org), Scope (https://github.com/weaveworks/scope) and the Kubernetes Dashboard (https://github.com/kubernetes/dashboard)!
Learn more about Kubernetes and other cloud native technologies on April 21, 2016 at our Cloud Native Night meetup (RSVP), taking place in Mainz alongside the JAX conference (http://www.meetup.com/cloud-native-night).

Dec 30, 2015

WireSpock - Testing REST service client components with Spock and WireMock

In a previous post I wrote about using the Spock framework for the exploratory testing of open source software. In this post I want to showcase a neat technology integration between Spock and the WireMock framework for testing your REST service client components. This is especially useful when testing microservice-based architectures, since you want to test the individual service integrations without firing up all the collaborators.

Introducing WireMock

As stated on its website, WireMock is "a web service test double for all occasions". It supports stubbing and mocking of HTTP calls, as well as request verification, record and playback of stubs, fault injection and much more. It actually fires up a small embedded HTTP server, so your code and test interact with it on the protocol level.

The most convenient way to use WireMock in your test cases is via a JUnit 4.x rule that handles the lifecycle of starting and stopping the mock server before and after each test. There is also a class rule available in case it is sufficient to use the same WireMock instance for the lifetime of the whole test case. The official documentation for the rule can be found here.

The good thing is that you can use the WireMock rule in your Spock specification just like in an ordinary JUnit based test. No magic here. Let's have a look at the following example.
class BookServiceClientSpec extends Specification {

    @Rule
    WireMockRule wireMockRule = new WireMockRule(18080)

    def client = new BookServiceClient("http://localhost:18080")

    def "Find all books using a WireMock stub server"() {
        given: "a stubbed GET request for all books"
        // TODO

        when: "we invoke the REST client to find all books"
        def books = client.findAll()

        then: "we expect two books to be found"
        books.size() == 2

        and: "the mock to be invoked exactly once"
        // TODO
    }
}
First, the JUnit WireMock rule is created and initialized to listen on port 18080. Next, the REST client component under test is created and configured to access the local WireMock server. The test method itself does not do much yet. For it to work, we need to stub the response for the findAll() query, and we want to check that the mock has been invoked once. Before we continue, let's have a look at the test dependencies required to compile and run the example.
dependencies {
    testCompile 'junit:junit:4.12'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'

    testCompile 'com.github.tomakehurst:wiremock:1.57'
    testCompile 'com.github.tomjankes:wiremock-groovy:0.2.0'
}

Making WireMock Groovy

The last dependency is a small Groovy binding library for WireMock that plays together nicely with Spock. It allows for a more concise stubbing and verification syntax instead of using WireMock's default static imports API. Have a look at the following example to get the idea.
def wireMock = new WireMockGroovy(18080)

def "Find all books using a WireMock stub server"() {
    given: "a stubbed GET request for all books"
    wireMock.stub {
        request {
            method "GET"
            url "/book"
        }
        response {
            status 200
            body """[
                      {"title": "Book 1", "isbn": "4711"},
                      {"title": "Book 2", "isbn": "4712"}
                    ]"""
            headers { "Content-Type" "application/json" }
        }
    }

    when: "we invoke the REST client to find all books"
    def books = client.findAll()

    then: "we expect two books to be found"
    books.size() == 2

    and: "the mock to be invoked exactly once"
    1 == wireMock.count {
        method "GET"
        url "/book"
    }
}
First, we create the WireMock Groovy binding to create stubbed requests and responses. The stub closure takes the definitions of the REST request and response using the WireMock JSON API. As you can see, we can even specify the response body as an inline JSON multiline GString. Finally, we check that the invocation count for the expected request is correct.

Clearly, specifying responses inline is not very maintenance-friendly, especially for large response structures. A better alternative is to externalize the response body in a separate file. The file needs to be located in a directory named __files within src/test/resources.
The bodyFileName value is relative to the __files directory, and the file may contain any content. You could even return binary files like JPEGs using this mechanism.
response {
    status 200
    bodyFileName "books.json"
    headers { "Content-Type" "application/json" }
}
A further way of specifying the response body is by using plain Java or Groovy objects that get serialized to JSON automatically.
response {
    status 200
    jsonBody new Book(title: "WireSpock Showcase", isbn: "4713")
    headers { "Content-Type" "application/json" }
}
The stubbing capabilities of WireMock are quite powerful. You can perform different matchings on the URL, request headers, query parameters or the request body to determine the correct response. Have a look at the WireMock stubbing documentation for a complete description of all features.

So there is only one thing left to say: Test long and prosper with Spock!


Dec 17, 2015

Open Source Project Chronix: An efficient and fast time series database based on Apache Solr.

We are pleased to announce the open source project Chronix. Chronix is a fast and efficient time series storage. It is based on Apache Solr, a distributed NoSQL database with impressive search capabilities. Chronix uses Solr's features and enriches them with specialized concepts for storing time series data. Chronix thereby stores about 15 GB of raw time series data (CSV files) in about 238 MB. An average query for a bunch of time series data needs 21 ms using a single Solr server and one core. In benchmarks, Chronix outperforms related time series databases like OpenTSDB, InfluxDB and Graphite in both storage demand and query times.

Why is Chronix Open Source?
We use Chronix in several applications like the Software EKG or as central time series storage in a research project called "Design for Diagnosability". We believe that the community can benefit from using Chronix in other projects and hope that their experiences flow back into Chronix. So download and use Chronix, fork it, improve it, and raise a pull request. :-)

How can I start?
The homepage of Chronix contains a 5 minute quick start guide. The guide uses an example JavaFX application for time series exploration. You can easily perform range queries, do some analyses and examine the results in real time.
Chronix Quick Start Guide - Time Series Exploration

The listing below shows an example integration using the Chronix API, the Chronix-Kassiopeia time series package, and the Chronix Solr storage. All libraries are available on Bintray. A Gradle build script:
repositories {
    maven {
        url "http://dl.bintray.com/chronix/maven"
    }
}

dependencies {
   compile 'de.qaware.chronix:chronix-api:0.1'
   compile 'de.qaware.chronix:chronix-server-client:0.1'
   compile 'de.qaware.chronix:chronix-kassiopeia-simple:0.1'
   compile 'de.qaware.chronix:chronix-kassiopeia-simple-converter:0.1'
}
Full Source build.gradle

The following snippet first constructs a Chronix client with a connection to Apache Solr, and then streams the maximum of all time series whose metric matches *Load*.
//Connection to Solr
SolrClient solr = new HttpSolrClient("http://host:8983/solr/chronix/");

//Define a group by function for the time series records
Function<MetricTimeSeries, String> groupBy =
     ts -> ts.getMetric() + "-" + ts.attribute("host");

//Define a reduce function for the grouped time series records
BinaryOperator<MetricTimeSeries> reduce = (ts1, ts2) -> {
      MetricTimeSeries.Builder reduced = new MetricTimeSeries
         .Builder(ts1.getMetric())
         .data(concat(ts1.getTimestamps(), ts2.getTimestamps()),
               concat(ts1.getValues(), ts2.getValues()));
      return reduced.build();
};

//Instantiate a Chronix client
ChronixClient<MetricTimeSeries, SolrClient, SolrQuery> chronix =
  new ChronixClient<>(new KassiopeiaSimpleConverter(),
            new ChronixSolrStorage<>(200, groupBy, reduce));

//We want the maximum of all time series whose metric matches *Load*.
SolrQuery query = new SolrQuery("metric:*Load*");

//The result is a Java Stream. We simply collect the result into a list.
List<MetricTimeSeries> maxTS = chronix.stream(solr, query)
    .collect(Collectors.toList());

//Just print it out.
LOGGER.info("Result for query {} is: {}", query, prettyPrint(maxTS));

But I want to use my own fancy time series implementation! No worries!
In the example above we use the default Chronix time series class but you can use Chronix to store your own time series. You only have to implement two simple methods of the TimeSeriesConverter interface shown below:
//Binary time series (data as blob) into your custom time series
T from(BinaryTimeSeries binaryTimeSeries, long queryStart, long queryEnd);

//Your custom time series into a binary time series
BinaryTimeSeries to(T document);

Afterwards you can use Chronix to store and stream your custom time series. For more details check the Chronix GitHub repository and website.
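To illustrate the shape of such a converter, here is a self-contained sketch. Note that BinaryTimeSeries and MySeries below are simplified stand-ins defined for this example, not the real Chronix classes, whose API may differ:

```java
// Simplified stand-in for Chronix' binary record (data as blob).
class BinaryTimeSeries {
    final byte[] blob;
    BinaryTimeSeries(byte[] blob) { this.blob = blob; }
}

// A hypothetical custom time series type.
class MySeries {
    final byte[] payload;
    MySeries(byte[] payload) { this.payload = payload; }
}

// The converter maps between the binary record and the custom type.
class MySeriesConverter {
    // Binary time series (data as blob) into your custom time series
    MySeries from(BinaryTimeSeries binary, long queryStart, long queryEnd) {
        // A real converter would deserialize the blob and trim the
        // series to the query range [queryStart, queryEnd].
        return new MySeries(binary.blob);
    }

    // Your custom time series into a binary time series
    BinaryTimeSeries to(MySeries document) {
        return new BinaryTimeSeries(document.payload);
    }
}
```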

Can I contribute to Chronix?
You are highly welcome to contribute your improvements to the Chronix project. All you have to do is to fork the public GitHub repository, improve the code and issue a pull request.

What do I have to do now?
Fork Chronix on GitHub and follow us on Twitter. :-)

Oct 28, 2015

Java 9 Jigsaw - The long-awaited Java Module System



This year at JavaOne San Francisco, Mark Reinhold and his team at Oracle presented the Java 9 module system with the codename Jigsaw. The work on Jigsaw has a long history and finally arrives with Java 9.

Since 1999 we have been developing acyclic software components in Java by convention and have learned how to do this with build tools like Ant, Maven or Gradle. In 2005 at QAware, we developed a tool for static bytecode analysis to detect wrong dependencies between packages, since the Java language had no means of controlling package dependencies or avoiding cyclic dependencies between packages and JARs. With Maven, things got better: Maven enforces acyclic dependencies between projects, but it also has no concept for hiding certain packages within a JAR. If one public class is accessible, all other public classes are accessible as well. Then came OSGi. It seemed to be the answer to this problem, but the poor tool support in IDEs and the over-engineered dynamic model caused more problems than the architectural enforcement benefited most applications. Even so, there are applications where OSGi is a perfect choice.

So finally Jigsaw has arrived: we get the long-awaited module system in Java 9. The Java language team not only focused on the architecture of applications; they also divided the fat legacy runtime JAR (rt.jar) into modules. Try to load a Java 8 rt.jar file into a tool like Structure101 and you will see a chaotic mess of cyclic and acyclic dependencies. That problem is solved in Java 9: the runtime itself now has a cleanly separated module dependency architecture. Another big focus of the Java language team was compatibility and interoperability. Some odd-looking design decisions are useful for compatibility between modular and non-modular code.

In this blog post I will show a test drive of the Java 9 Jigsaw module system. The Java 9 preview build can be downloaded from here: https://jdk9.java.net/jigsaw

Email Sample Application

We use a simple modular application for sending emails. The application consists of two modules:
  • A mail module with a public interface and a private implementation class. The public part of that module is a package containing a Java interface for sending messages and a factory which returns the implementation to the caller. 
  • A mail client module which uses the interface of the mail module to send emails. It should be guaranteed that the client cannot access the private implementation of the mail module itself. This principle of encapsulation is long established: David Parnas described the principle of information hiding in modules in 1985.
Our sample application looks like this:
In that situation, the Java 9 Module System can enforce the following:
  • The module MailClient can only use types of exported packages in the module Mail.
  • The module Mail cannot have cyclic dependencies to a type located in the module MailClient.
  • Transitive dependencies from Mail to other components can be visible or not for the Module MailClient. This is controlled by declaring a dependency as public in the Mail module.
The Java 9 Module System can NOT enforce the following: types in the exported package of module Mail can directly access implementation classes. There is no contract between the public and hidden parts of a single module. If you want to write plugin modules which can be replaced, you must not call any implementation class from the public part of a module. If you want to do that, you have to write a separate API module, which looks like this:
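The API-module split described above can be sketched in module descriptors like this; the module and package names are hypothetical, chosen only to show the dependency direction:

```java
// Hypothetical API module: exports only the contract.
module MailApi {
    exports de.qaware.mail.api;
}

// Hypothetical implementation module: implements the API, exports nothing.
module MailImpl {
    requires MailApi;
}

// The client depends only on the API, never on the implementation.
module MailClient {
    requires MailApi;
}
```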

Looking at the Code

There is currently no IDE support for Jigsaw available. Even the latest builds of NetBeans and IntelliJ do not recognize the module-info.java module descriptor file. Since the Java team decided to support multi-module builds with the javac compiler, they had to invent a completely new lookup mechanism: the module path, which is separate from the normal classpath. This concept allows normal JARs to be used as modules and vice versa. JARs which are located on the module path are called automatic modules. These are modules which can access all public parts of other modules without explicit declaration. This is done mostly to support a smooth, gradual migration to the new module system without too much trouble for existing applications. At the time of writing, there was no support in Maven, Gradle or Ant, so we have to use shell scripts in this demonstration.

To use the Java 9 multi-module build feature of the compiler, we have to create two subdirectories in the src directory: Mail and MailClient. The Mail directory contains the complete source code of the Mail module, including the private implementation which should be hidden from callers. The MailClient directory contains the main class MailClient, which uses methods from the public interface of the Mail module. The image shows the directory layout of our sample:

   ├── Mail 
   │    ├── de 
   │    │   └── qaware
   │    │       └── mail
   │    │            ├── MailSender.java
   │    │            ├── MailSenderFactory.java
   │    │            └── impl
   │    │                 └── MailSenderImpl.java
   │    └── module-info.java
   └── MailClient    
       ├── de    
       │   ├── qaware    
       │   └── mail    
       │         └── client                 
       │                └── MailClient.java    
       └── module-info.java

Both modules have the new module descriptor, which by convention sits in the root of the module's source directory and has the name "module-info.java". The module descriptor is not a class, enum or interface. It uses the new keywords "module", "exports" and "requires". The module descriptors for the mail module and the client are quite simple:

module MailClient {
 requires Mail;
}

module Mail {
 exports de.qaware.mail;
}

In our example, the code of the client and of the mail implementation looks like this:

package de.qaware.mail.client;

import de.qaware.mail.MailSender;
import de.qaware.mail.MailSenderFactory;

public class MailClient {

 public static void main(String [] args) {
  MailSender mail = new MailSenderFactory().create();
  mail.sendMail("x@x.de",
                "A message from JavaModule System");
 }
}


package de.qaware.mail.impl;

import de.qaware.mail.MailSender;

public class MailSenderImpl implements MailSender {
 public void sendMail(String address, String message) {
  System.out.println("Sending mail to: " + address +
                     " message: " + message);
 }
}

The type MailSender and the MailSenderFactory come from the exported package de.qaware.mail of the module Mail. As you might expect, the MailSender Java interface has only one method: sendMail(). The real implementation is in the impl package, which is not exported by the Mail module and therefore invisible. If you try to access the MailSenderImpl class directly in the client code, you get a compiler error:

../MailClient.java:9: error: MailSenderImpl is not visible because package de.qaware.mail.impl is not visible
1 error
That's exactly what we want. Nobody can violate the access rules of our Mail module. Non-exported packages are hidden.
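The source of the MailSenderFactory itself is not shown above. The following self-contained sketch illustrates the pattern it implements (nested types stand in for the real packages here; the key point is that only the factory references the implementation class, so clients never need access to it):

```java
public class FactorySketch {
    // Public interface, as exported from de.qaware.mail in the sample.
    interface MailSender {
        void sendMail(String address, String message);
    }

    // Hidden implementation, as in the non-exported de.qaware.mail.impl package.
    static class MailSenderImpl implements MailSender {
        public void sendMail(String address, String message) {
            System.out.println("Sending mail to: " + address +
                               " message: " + message);
        }
    }

    // The factory is the only public type that touches the implementation,
    // so module boundaries can hide MailSenderImpl completely.
    static class MailSenderFactory {
        MailSender create() {
            return new MailSenderImpl();
        }
    }

    public static void main(String[] args) {
        MailSender mail = new MailSenderFactory().create();
        mail.sendMail("x@x.de", "A message from JavaModule System");
    }
}
```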

Multi module building, packaging and running

The multi-module build is a cool feature of the Jigsaw javac compiler. The compiler command for both modules in one step looks like this:

# compile
javac -d build -modulesourcepath src $(find src -name "*.java")

This command compiles all modules under ./src and outputs the resulting class files in an identical structure in the folder ./build. The content of the ./build folder can then simply be packed into separate jar files: the jar command creates the module jar file from the compiled output. For the MailClient module we also specify the Java main class, which we need for the next step.

jar --create --file mlib/Mail@1.0.jar --module-version 1.0 -C build/Mail .
jar --create --file mlib/MailClient@1.0.jar --module-version 1.0 --main-class de.qaware.mail.client.MailClient -C build/MailClient .

We can now run our modular application by using the module path where the generated jar files are stored. In our example this is the path ./mlib, which contains both generated jar files. We can test our application with the following command:

# run
java -mp mlib -m MailClient


Linking the application

What does linking in Java mean? It means that only the required modules are packed together into a mini application, together with a platform-dependent starter script and a minified Java runtime:

# link
jlink --modulepath $JAVA_HOME/jmods:mlib --addmods MailClient --output mailclient

So the application can be run directly, without any knowledge of Java or modules. The generated output directory looks like this:

├── bin
│   ├── MailClient
│   ├── java
│   └── keytool
├── conf
│   ├── net.properties
│   └── security
│       ├── java.policy
│       └── java.security
├── lib
│   ├── classlist
│   ├── jli
│   │   └── libjli.dylib
│   ├── jspawnhelper
│   ├── jvm.cfg
│   ├── libjava.dylib
│   ├── libjimage.dylib
│   ├── libjsig.diz
│   ├── libjsig.dylib
│   ├── libnet.dylib
│   ├── libnio.dylib
│   ├── libosxsecurity.dylib
│   ├── libverify.dylib
│   ├── libzip.dylib
│   ├── modules
│   │   └── bootmodules.jimage
│   ├── security
│   │   ├── US_export_policy.jar
│   │   ├── blacklist
│   │   ├── blacklisted.certs
│   │   ├── cacerts
│   │   ├── local_policy.jar
│   │   ├── trusted.libraries
│   │   └── unlimited_policy
│   │       ├── README.txt
│   │       ├── US_export_policy.jar
│   │       └── local_policy.jar
│   ├── server
│   │   ├── Xusage.txt
│   │   ├── libjsig.diz
│   │   ├── libjsig.dylib
│   │   ├── libjvm.diz
│   │   └── libjvm.dylib
│   └── tzdb.dat
└── release

The directory tree shows the complete minified runtime for the modular application. In the bin directory there is a generated MailClient starter which can be used to run the application directly, without an explicit java command. The lib directory contains only the required boot modules and native libraries. After linking you can start the application with the following commands:

cd mailclient/bin
./MailClient
Sending mail to: x@x.de message: A message from JavaModule System


Congratulations to Mark and his team. You have done an awesome job. Since this feature is very invasive for tools and existing applications, it will take a long time until we can forget the classpath hell. For me personally it is proof that the design principles we have used in the last 20 years of modular programming have finally arrived in the Java world as well. Thanks for that.