Apr 23, 2016

How to use Docker within IntelliJ

A short tutorial on how to use Docker within IntelliJ, with a little help from Gradle. You can find the sample code and the full description here: https://github.com/adersberger/intellij-docker-tutorial


  • Install the IntelliJ Docker plugin (https://plugins.jetbrains.com/plugin/7724).
  • Check that there is a default Docker Machine: docker-machine ls. If there is no default machine, create one: docker-machine create --driver virtualbox default.
  • Start the default Docker Machine: docker-machine start default.
  • Bind the environment variables to the shell: eval "$(docker-machine env default)"
  • Check that everything is correct: docker ps

Using Docker within IntelliJ

1) Set up the Docker cloud provider in the IntelliJ global preferences as shown below.

Tip: You can get the API URL by executing docker-machine ls and using the IP and port shown for the default machine.

2) Check the connection to the Docker daemon in the IntelliJ "Docker" tab

3) Create a new project from version control using GitHub (https://github.com/adersberger/intellij-docker-tutorial.git)

4) Create a new run configuration to deploy application to Docker as shown on the following screenshots:

  • Don't forget to add the Gradle tasks as a "before launch" action, as shown at the very bottom of the screenshot.
  • The same also works for Docker Compose files: just point the "Deployment" dropdown to the Docker Compose file.

5) Run the configuration and inspect the Docker container. A browser will open automagically and point to the REST endpoint. Within IntelliJ you can access the container's console output, environment variables, port bindings, etc.


Mar 18, 2016

Building a Solr-, Spark-, Zookeeper-Cloud with Intel NUC PCs

Part 1 - Hardware

If you work as a developer, architect or technical expert with cluster, grid or cloud technologies like Mesos, Spark, Hadoop, Solr Cloud or Kubernetes, you need your own private datacenter for testing and development. To test real-world scenarios like a failsafe and resilient ZooKeeper cluster or a clustered Spark/Hadoop installation, you should have at least three independent machines; for a minimal Mesos/DCOS installation, five machines are recommended.
There are several ways to build such an environment, each with its own drawbacks:

1) A virtualized environment running on a workstation laptop or PC

You can easily create a bunch of virtual machines and run them on a desktop or workstation. This approach works fine, is fast and cheap, but has some problems:
  1. Your laptop may have only 16 GB of RAM, so each VM gets only 2-3 GB. For frameworks like Apache Spark, which rely heavily on caching, this does not work well.
  2. The performance of a virtualized environment is not predictable, because some resources like disk, network or memory access are shared between all VMs. So even if you have a workstation with an octa-core Intel Xeon processor, I/O will behave differently.

2) A cloud environment like AWS EC2

This is the way most people work with these technologies, but it also has some specific disadvantages. If you experience a performance problem, you are likely unable to analyze the details. Cluster software is normally very sensitive to network latency and network performance, and since AWS can't guarantee that all your machines are in the same rack, the performance between some nodes can differ.

3) A datacenter with real hardware

You can build your own cluster, but that is normally far too expensive. Even if you can afford real server hardware, the solution is not portable, and in most enterprises you will not be allowed to run such a cluster. For testing and development it is much better to have a private cluster that is as portable as your laptop.

So what is a feasible solution?

I decided to build my own 4 node cluster on Intel NUC mini PCs. Here are the technical facts:
  • NUC 6th Generation - Skylake
  • Intel Core i5 - dual core with hyper-threading
  • 32 GB DDR4 RAM
  • 256 GB Samsung M.2 SSD
  • Gigabit Ethernet
The Intel NUC has to be equipped with RAM and an M.2 SSD; all these parts have to be ordered separately.

This gives you a cluster with amazing capabilities:
  • 16 Hyper Threading Units (8 Cores)
  • 128 GB DDR4 RAM
  • 1 TB Solid State Disk
Since I needed a portable solution, everything had to be packed into a normal business case. I found a very slim aluminium attaché case at Amazon with the right dimensions to hold the NUC PCs and the network switch.

I decided to include a monitor and a keyboard to get direct access to the first node in the cluster. The monitor is used for visualization and monitoring when the software runs. I ordered a GeChic HDMI monitor which has the right dimensions to fit in the front of the case.

The NUC package includes screws for mounting, which also work in such a case if you drill a small hole for each screw. For the internal wiring you have to use flexible network cables, otherwise you will get problems with the wiring. You also need a little skill to mount the connectors for power and network in the case, but with a little patience it works.

You can see the final result here:

This case will be my companion for the next year on all conferences, fairs and even in my office. The perfect presenter for any cluster / cloud technology. 

In the next part I will describe how to get DCOS and a Solr/Spark/Zeppelin cloud installed, and what you can do on top of such hardware.

Have fun. 

Johannes Weigend

Mar 11, 2016

KubeCon 2016: Recap

All things Kubernetes: KubeCon 2016 in London (https://kubecon.io) revealed how attractive Kubernetes is to the community and how fast Kubernetes and its ecosystem are emerging. Evidence: 500 participants, most of them using Kubernetes in dev & production; impressive stats of the open source project and community; and profound talks reflecting real life experiences. My takeaways:

Kubernetes Roadmap 

By the end of March, Kubernetes version 1.2 will be released with the following highlights:
  • New abstraction "Deployment": A deployment groups pod/rc/service definitions with additional deployment metadata. A deployment describes the desired target state of an application on a k8s cluster. When a deployment is applied, k8s drives the current cluster state towards the desired state. This is performed on the server side, not on the client side (unlike in k8s < 1.2).
  • ConfigMaps & Secrets: Kubernetes can now handle configuration files & parameters as well as secrets and certificates cluster-wide. It stores the configs inside etcd and makes them accessible through the k8s API. The configs are exposed by mounting files into the pods (as tmpfs) and via env vars. They can also be referenced in the YAML files. They are updated live and atomically.
  • Brand new web UI: The Kubernetes Dashboard.
  • Improved scalability and support for multiple regions.
  • Better support for third-party extensions.
  • DaemonSet to better support the Sidekick pattern.
In about 16 weeks there'll be Kubernetes 1.3 with:
  • Better support for legacy applications with mechanisms like IP persistence.
  • Cluster federation (project Ubernetes) to join multiple k8s clusters together.
  • Further improved scalability.
  • Cluster autoscaling (automatically acquiring & releasing resources from the cloud provider).
  • In-Cluster IAM (LDAP / AM integration).
  • Scheduled jobs to better support batch processing on k8s.
  • Public cloud dashboard for Kubernetes-as-a-Service scenarios.
  • ... and more to come / to be discussed in the community.

Hot topics

The hot topics in my opinion were:
  • Higher-level abstractions & tools: Although Kubernetes is a great advance in bridging the gap between devs and ops, there is a need for higher-level abstractions & tools - especially for the devs (quote: "Kubernetes should be an implementation detail for devs"). This is addressed by k8s itself (deployment abstraction) as well as by different approaches like kploy (https://github.com/kubernauts/kploy), Puppet Kubernetes (https://forge.puppetlabs.com/garethr/kubernetes), dgr (https://github.com/blablacar/dgr) or DEIS (http://deis.io). From a high-level point of view, the community is putting the bricks on Kubernetes towards PaaS.
  • Continuous Delivery: Kubernetes is an enabler of continuous delivery (CD), and developing cloud-native applications on k8s demands CD. There were several industrial experience reports on using Kubernetes as the execution environment for CD workflows. Kubernetes handles scaling the CI/CD server as well as the application itself. Best practices here are to separate different applications and stages using k8s namespaces, and to use ChatOps tools like Hubot (https://hubot.github.com) to provide fast feedback to the devs & ops.
  • Stateful services: Kubernetes is great at running stateless microservices, but a lot of applications have to deal with (persistent) state. So how do you run stateful services and even databases on Kubernetes without losing its benefits, or even losing data in case of a re-scheduling? The answer is persistent volumes, which provide cluster-wide non-ephemeral storage. A couple of different cluster storage providers are available for persistent volumes in k8s: classic ones like NFS and iSCSI; cloud-native ones like GlusterFS and Ceph; cloud-provider-specific ones for GCE and AWS; and storage abstraction layers like Flocker. The competition is open!
  • Diagnosability: As applications and infrastructure get more and more fine-grained and distributed with platforms like k8s, the problem of diagnosing failures and finding optimization potential arises. Time for cluster-aware diagnosis tools like sysdig (http://www.sysdig.org), Scope (https://github.com/weaveworks/scope) and the Kubernetes Dashboard (https://github.com/kubernetes/dashboard)!
Learn more about Kubernetes and other cloud-native technologies on April 21, 2016 at our Cloud Native Night meetup (RSVP), taking place in Mainz alongside the JAX conference (http://www.meetup.com/cloud-native-night).

Dec 30, 2015

WireSpock - Testing REST service client components with Spock and WireMock

In a previous post I wrote about using the Spock framework for the exploratory testing of open source software. In this post I want to showcase a neat technology integration between Spock and the WireMock framework for testing your REST service client components. This is especially useful when testing microservice-based architectures, since you want to test the individual service integrations without firing up all the collaborators.

Introducing WireMock

As stated on its website, WireMock is "a web service test double for all occasions". It supports stubbing and mocking of HTTP calls, as well as request verification, record and playback of stubs, fault injection and much more. It actually fires up a small embedded HTTP server, so your code and test interact with it on the protocol level.

The most convenient way to use WireMock in your test cases is via a JUnit 4.x rule that handles the lifecycle of starting and stopping the mock server before and after each test. There is also a class rule available in case it is sufficient to use the same WireMock instance for the lifetime of the whole test case. The official documentation for the rule can be found here.

The good thing is that you can use the WireMock rule in your Spock specification just like in an ordinary JUnit based test. No magic here. Let's have a look at the following example.
class BookServiceClientSpec extends Specification {

    @Rule
    WireMockRule wireMockRule = new WireMockRule(18080)

    def client = new BookServiceClient("http://localhost:18080")

    def "Find all books using a WireMock stub server"() {
        given: "a stubbed GET request for all books"
        // TODO

        when: "we invoke the REST client to find all books"
        def books = client.findAll()

        then: "we expect two books to be found"
        books.size() == 2

        and: "the mock to be invoked exactly once"
        // TODO
    }
}
First, the JUnit WireMock rule is created and initialized to listen on port 18080. Next, the REST client component under test is created and configured to access the local WireMock server. The test method itself does not do much yet. For it to work we need to stub the response for the findAll() query, and we want to check that the mock has been invoked once. Before we continue, let's have a look at the test dependencies required to compile and run the example.
dependencies {
    testCompile 'junit:junit:4.12'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'

    testCompile 'com.github.tomakehurst:wiremock:1.57'
    testCompile 'com.github.tomjankes:wiremock-groovy:0.2.0'
}

Making WireMock Groovy

The last dependency is a small Groovy binding library for WireMock that plays together nicely with Spock. It allows for a more concise stubbing and verification syntax instead of using WireMock's default static imports API. Have a look at the following example to get the idea.
def wireMock = new WireMockGroovy(18080)

def "Find all books using a WireMock stub server"() {
    given: "a stubbed GET request for all books"
    wireMock.stub {
        request {
            method "GET"
            url "/book"
        }
        response {
            status 200
            body """[
                      {"title": "Book 1", "isbn": "4711"},
                      {"title": "Book 2", "isbn": "4712"}
                    ]"""
            headers { "Content-Type" "application/json" }
        }
    }

    when: "we invoke the REST client to find all books"
    def books = client.findAll()

    then: "we expect two books to be found"
    books.size() == 2

    and: "the mock to be invoked exactly once"
    1 == wireMock.count {
        method "GET"
        url "/book"
    }
}
First, we create the WireMock Groovy binding to create stubbed requests and responses. The stub closure takes the definitions of the REST request and response using the WireMock JSON API. As you can see, we can even specify the response body as an inline JSON multiline GString. Finally, we check that the invocation count for the expected request is correct.

Clearly, specifying the responses inline is not very maintenance-friendly, especially for large response structures. A better alternative is to externalize the response body in a separate file. The file needs to be located in a directory named __files within src/test/resources.
The bodyFileName value is relative to the __files directory, and the file can contain any content. You could even return binary files like JPEGs using this mechanism.
response {
    status 200
    bodyFileName "books.json"
    headers { "Content-Type" "application/json" }
}
A further way of specifying the response body is by using plain Java or Groovy objects that get serialized to JSON automatically.
response {
    status 200
    jsonBody new Book(title: "WireSpock Showcase", isbn: "4713")
    headers { "Content-Type" "application/json" }
}
The stubbing capabilities of WireMock are quite powerful. You can perform different matchings on the URL, request headers, query parameters or the request body to determine the correct response. Have a look at the WireMock stubbing documentation for a complete description of all features.

So there is only one thing left to say: Test long and prosper with Spock!


Dec 17, 2015

Open Source Project Chronix: An efficient and fast time series database based on Apache Solr.

We are pleased to announce the open source project Chronix. Chronix is a fast and efficient time series storage. It is based on Apache Solr, a distributed NoSQL database with impressive search capabilities. Chronix uses the features of Solr and enriches them with specialized concepts for storing time series data. Chronix can store about 15 GB of raw time series data (CSV files) in about 238 MB. An average query for a bunch of time series data needs 21 ms using a single Solr server and one core. In benchmarks, Chronix outperforms related time series databases like OpenTSDB, InfluxDB and Graphite in both storage demand and query times.

Why is Chronix Open Source?
We use Chronix in several applications like the Software EKG and as the central time series storage in a research project called "Design for Diagnosability". We believe that the community can benefit from using Chronix in other projects and hope that these experiences flow back into Chronix. So download and use Chronix, fork it, improve it, and raise a pull request. :-)

How can I start?
The Chronix homepage contains a five-minute quick start guide. The guide uses an example JavaFX application for time series exploration. You can easily perform range queries, run some analyses and examine the results in real time.
Chronix Quick Start Guide - Time Series Exploration

The listing below shows an example integration using the Chronix API, the Chronix-Kassiopeia time series package, and the Chronix Solr storage. All libraries are available on Bintray. The Gradle build script is:
repositories {
    maven {
        url "http://dl.bintray.com/chronix/maven"
    }
}

dependencies {
   compile 'de.qaware.chronix:chronix-api:0.1'
   compile 'de.qaware.chronix:chronix-server-client:0.1'
   compile 'de.qaware.chronix:chronix-kassiopeia-simple:0.1'
   compile 'de.qaware.chronix:chronix-kassiopeia-simple-converter:0.1'
}
Full Source build.gradle

The following snippet first constructs a Chronix client with a connection to Apache Solr, and then streams the maximum of all time series whose metric matches *Load*.
//Connection to Solr
SolrClient solr = new HttpSolrClient("http://host:8983/solr/chronix/");
//Define a group by function for the time series records
Function<MetricTimeSeries, String> groupBy = 
     ts -> ts.getMetric() + "-" + ts.attribute("host");

//Define a reduce function for the grouped time series records
BinaryOperator<MetricTimeSeries> reduce = (ts1, ts2) -> {
      MetricTimeSeries.Builder reduced = new MetricTimeSeries.Builder(ts1.getMetric())
         .data(concat(ts1.getTimestamps(), ts2.getTimestamps()),
               concat(ts1.getValues(), ts2.getValues()));
      return reduced.build();
};

//Instantiate a Chronix Client
ChronixClient<MetricTimeSeries, SolrClient, SolrQuery> chronix =
  new ChronixClient<>(new KassiopeiaSimpleConverter(),
            new ChronixSolrStorage<>(200, groupBy, reduce));

//We want the maximum of all time series whose metric matches *Load*.
SolrQuery query = new SolrQuery("metric:*Load*");

//The result is a Java Stream. We simply collect the result into a list.
List<MetricTimeSeries> maxTS = chronix.stream(solr, query)
    .collect(Collectors.toList());

//Just print it out.
LOGGER.info("Result for query {} is: {}", query, prettyPrint(maxTS));

But I want to use my own fancy time series implementation! No worries!
In the example above we use the default Chronix time series class, but you can use Chronix to store your own time series. You only have to implement the two simple methods of the TimeSeriesConverter interface shown below:
//Binary time series (data as blob) into your custom time series
T from(BinaryTimeSeries binaryTimeSeries, long queryStart, long queryEnd);

//Your custom time series into a binary time series
BinaryTimeSeries to(T document);
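To illustrate the shape of such a converter, here is a minimal, self-contained sketch. The MySeries class and the binary layout are invented for this example (the real converter works against Chronix's BinaryTimeSeries type), so treat it purely as an illustration of the from/to roundtrip, including the trimming to the query range that a converter typically performs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical custom time series: parallel lists of timestamps and values.
class MySeries {
    final List<Long> timestamps = new ArrayList<>();
    final List<Double> values = new ArrayList<>();

    void add(long timestamp, double value) {
        timestamps.add(timestamp);
        values.add(value);
    }
}

// Sketch of the two converter methods: custom series <-> binary blob.
class MySeriesCodec {

    // "to": serialize the custom series into a binary representation
    static byte[] to(MySeries series) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream()) {
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(series.timestamps.size());
            for (int i = 0; i < series.timestamps.size(); i++) {
                out.writeLong(series.timestamps.get(i));
                out.writeDouble(series.values.get(i));
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "from": rebuild the custom series, keeping only points inside the query range
    static MySeries from(byte[] blob, long queryStart, long queryEnd) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            MySeries series = new MySeries();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                long timestamp = in.readLong();
                double value = in.readDouble();
                if (timestamp >= queryStart && timestamp <= queryEnd) {
                    series.add(timestamp, value);
                }
            }
            return series;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The query range parameters allow the converter to drop points outside the requested window while deserializing, so callers never see more data than they asked for.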

Afterwards you can use Chronix to store and stream your custom time series. For more details check the Chronix GitHub repository and website.

Can I contribute to Chronix?
You are highly welcome to contribute your improvements to the Chronix project. All you have to do is to fork the public GitHub repository, improve the code and issue a pull request.

What do I have to do now?
Fork Chronix on GitHub and follow us on Twitter. :-)

Oct 28, 2015

Java 9 Jigsaw - The long-awaited Java Module System

Java 9 Jigsaw

The long-awaited Java Module System


This year at JavaOne San Francisco, Mark Reinhold and his team at Oracle presented the Java 9 module system, codenamed Jigsaw. The work on Jigsaw has a long history and finally arrives with Java 9.

Since 1999 we have developed acyclic software components with Java by convention, and learned how to do this with build tools like Ant, Maven or Gradle. In 2005 at QAware, we developed a tool for static bytecode analysis to detect wrong dependencies between packages, since the Java language had no means of controlling package dependencies or avoiding cyclic dependencies between packages and JARs. With Maven, things got better: Maven enforces acyclic dependencies between projects, but it has no concept for hiding certain packages within a JAR. If one public class is accessible, all other public classes are accessible as well. Then came OSGi. It seemed to be the answer to this problem, but the poor tool support in IDEs and the over-engineered dynamic module model caused more problems than the architectural enforcement benefited most applications. Even so, there are applications for which OSGi is a perfect choice.

So Jigsaw has finally arrived: with Java 9 we get the long-awaited module system. The Java language team not only had the architecture of applications in focus, they also divided the fat legacy runtime JAR (rt.jar) into modules. Load a Java 8 rt.jar into a tool like Structure101 and you will see a chaotic mess of cyclic and acyclic dependencies. That problem is solved in Java 9: the runtime itself now has a cleanly separated module dependency architecture. Another big focus of the team was compatibility and interoperability; some odd-looking design decisions exist for compatibility between modular and non-modular code.

In this blog post I will take the Java 9 Jigsaw module system for a test drive. The Java 9 preview build can be downloaded here: https://jdk9.java.net/jigsaw

Email Sample Application

We use a simple modular application for sending emails. The application consists of two modules:
  • A mail module with a public interface and a private implementation class. The public part of this module is a package containing a Java interface for sending messages and a factory which returns the implementation to the caller.
  • A mail client module which uses the interface of the mail module to send emails. It should be guaranteed that the client cannot access the private implementation of the mail module. This principle of encapsulation has long been known; David Parnas described the principle of information hiding in modules as early as 1972.
Our sample application looks like this:
In this situation, the Java 9 module system can enforce the following:
  • The module MailClient can only use types from exported packages of the module Mail.
  • The module Mail cannot have cyclic dependencies on types located in the module MailClient.
  • Transitive dependencies from Mail to other components can be made visible or invisible to the module MailClient. This is controlled by declaring a dependency as public in the Mail module.
The Java 9 module system can NOT enforce this: types in the exported package of module Mail can directly access implementation classes. There is no contract between the public and hidden parts of a single module. If you want to write plugin modules which can be replaced, you are not allowed to call any implementation class from the public part of a module. If you need that, you have to write a separate API module, which looks like this:
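Such an API-module split could be sketched with module descriptors like these. The module names are made up for illustration, and the runtime binding of the implementation to the API (e.g. via services) is omitted; the point is that both the implementation and the client depend only on the API module:

```java
// Hypothetical API module: contains only the public interface and factory.
module MailAPI {
    exports de.qaware.mail;
}

// Implementation module: sees the API, exports nothing itself.
module Mail {
    requires MailAPI;
}

// Client module: depends only on the API, never on the implementation.
module MailClient {
    requires MailAPI;
}
```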

Looking at the Code

There is currently no IDE support for Jigsaw available. Even the latest builds of NetBeans and IntelliJ do not recognize the module-info.java module descriptor file. Since the Java team decided to support multi-module builds with the javac compiler, they had to invent a completely new lookup mechanism: the module path, which is separate from the normal classpath. This concept allows using normal JARs as modules and vice versa. JARs located on the module path are called automatic modules. These are modules which can access all public parts of other modules without explicit declaration. This is done mostly to support a smooth, gradual migration to the new module system without too much trouble for existing applications. At the time of writing there was no support in Maven, Gradle or Ant, so we have to use shell scripts in this demonstration.

To use the multi-module build feature of the Java 9 compiler, we have to create two subdirectories in the src directory: Mail and MailClient. The Mail directory contains the complete source code of the Mail module, including the private implementation which should be hidden from a caller. The MailClient directory contains the main class MailClient, which uses methods from the public interface of the Mail module. The tree below shows the directory layout of our sample:

   ├── Mail 
   │    ├── de 
   │    │   └── qaware
   │    │       └── mail
   │    │            ├── MailSender.java
   │    │            ├── MailSenderFactory.java
   │    │            └── impl
   │    │                 └── MailSenderImpl.java
   │    └── module-info.java
   └── MailClient    
       ├── de    
       │   ├── qaware    
       │   └── mail    
       │         └── client                 
       │                └── MailClient.java    
       └── module-info.java

Both modules have the new module descriptor, which by convention resides in the root package and has the name "module-info.java". The module descriptor is not a class, enum or interface. It uses the new keywords "module", "exports" and "requires". The module descriptors for the mail module and the client are quite simple:

module MailClient {
 requires Mail;
}

module Mail {
 exports de.qaware.mail;
}

In our example the code of client and mail implementation looks like this:

package de.qaware.mail.client;

import de.qaware.mail.MailSender;
import de.qaware.mail.MailSenderFactory;

public class MailClient {

 public static void main(String [] args) {
  MailSender mail = new MailSenderFactory().create();
  mail.sendMail("x@x.de",
                              "A message from JavaModule System");
 }
}


package de.qaware.mail.impl;

import de.qaware.mail.MailSender;

public class MailSenderImpl implements MailSender {
 public void sendMail(String address, String message) {
  System.out.println("Sending mail to: " + address +
                                          " message: " + message);
 }
}

The types MailSender and MailSenderFactory come from the exported package de.qaware.mail of the module Mail. As you might expect, the MailSender interface has only one method: sendMail(). The real implementation is in the impl package, which is not exported by the Mail module and is therefore invisible. If you try to access the MailSenderImpl class directly in the client code, you get a compiler error:

../MailClient.java:9: error: MailSenderImpl is not visible because package de.qaware.mail.impl is not visible
1 error
That's exactly what we want. Nobody can violate the access rules of our Mail module. Non-exported packages are hidden.

Multi module building, packaging and running

The multi-module build is a cool feature of the Jigsaw javac compiler. The compiler command for both modules in one step looks like this:

# compile
javac -d build -modulesourcepath src $(find src -name "*.java")

This command compiles all modules under ./src and outputs the resulting class files in an identical structure in the folder ./build. The content of the ./build folder can then simply be packed into separate JAR files. The jar command creates the module JAR file from the compiled output. For the MailClient module we also specify the Java main class for the next step.

jar --create --file mlib/Mail@1.0.jar --module-version 1.0 -C build/Mail .
jar --create --file mlib/MailClient@1.0.jar --module-version 1.0 --main-class de.qaware.mail.client.MailClient -C build/MailClient .

We can now run our modular application by using the module path where the generated JAR files are stored. In our example this is the path ./mlib, where both generated JAR files are located. So we can now test our application with the following command:

# run
java -mp mlib -m MailClient


What does linking mean in Java? It means that only the required modules are packed together into a mini application, together with a platform-dependent starter script and a minified Java runtime.

# link
jlink --modulepath $JAVA_HOME/jmods:mlib --addmods MailClient --output mailclient

So the application can be run directly, without any knowledge of Java or modules. The generated output directory looks like this:

├── bin
│   ├── MailClient
│   ├── java
│   └── keytool
├── conf
│   ├── net.properties
│   └── security
│       ├── java.policy
│       └── java.security
├── lib
│   ├── classlist
│   ├── jli
│   │   └── libjli.dylib
│   ├── jspawnhelper
│   ├── jvm.cfg
│   ├── libjava.dylib
│   ├── libjimage.dylib
│   ├── libjsig.diz
│   ├── libjsig.dylib
│   ├── libnet.dylib
│   ├── libnio.dylib
│   ├── libosxsecurity.dylib
│   ├── libverify.dylib
│   ├── libzip.dylib
│   ├── modules
│   │   └── bootmodules.jimage
│   ├── security
│   │   ├── US_export_policy.jar
│   │   ├── blacklist
│   │   ├── blacklisted.certs
│   │   ├── cacerts
│   │   ├── local_policy.jar
│   │   ├── trusted.libraries
│   │   └── unlimited_policy
│   │       ├── README.txt
│   │       ├── US_export_policy.jar
│   │       └── local_policy.jar
│   ├── server
│   │   ├── Xusage.txt
│   │   ├── libjsig.diz
│   │   ├── libjsig.dylib
│   │   ├── libjvm.diz
│   │   └── libjvm.dylib
│   └── tzdb.dat
└── release

The directory tree shows the complete minified runtime for the modular application. In the bin directory there is a generated MailClient starter which can be used to run the application directly, without an explicit java command. The lib directory contains only the required boot modules and native libraries. After linking you can start the application with the following commands:

cd mailclient/bin
./MailClient
Sending mail to: x@x.de message: A message from JavaModule System


Congratulations to Mark and his team. You have done an awesome job. Since this feature is very invasive for tools and existing applications, it will take a long time until we can forget the classpath hell. For me personally it is proof that the design principles we have used over the last 20 years of modular programming have finally arrived in the Java world as well. Thanks for that.

Aug 31, 2015

Exploratory Open Source Software Testing with Spock

Exploratory software testing is a technique every agile developer should know about. It is about test-driving your application without a predetermined course of action. Although this seems like random free-style testing, in the hands of an experienced developer it proves to be a powerful technique for finding bugs and undesired behaviour.

But in this article I will talk about exploratory software testing of open source software components, libraries or whole frameworks. So why would you want to do this? Think about this: the amount of hand-written code in any modern application is somewhere between 3 and 10 percent of the overall bytecode instructions. The rest is usually third-party open source libraries and frameworks used by the application, such as Apache Commons or the Spring Framework.

But how do you, as a software developer or architect, decide which open source component to use for a certain required functionality? How do you know that this fancy framework you read about in a programming magazine suits your requirements? How do you evaluate how a library is best integrated into your application?

This is when exploratory testing of open source software comes into play. In summary, the goals of exploratory testing of open source components are:
  • To gain an understanding of how the library or framework works, what its interface looks like, and what functionality it implements: The goal here is to explore the functionality of the open source component in-depth and to find new unexplored functionality.
  • To force the open source software to exhibit its capabilities: This will provide evidence that the software performs the function for which it was designed and that it satisfies your requirements.
  • To find bugs or analyse the performance: Exploring the edges of the open source component and hitting potential soft spots and weaknesses.
  • To act as a safeguard when upgrading the library to a new version: This allows for easier maintenance of your application and its dependencies. The exploratory tests detect regressions a new version might introduce.
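To make the safeguard idea concrete, here is a minimal, self-contained sketch in plain Groovy. It pins down behaviour we rely on, so a later upgrade that changes it fails loudly. The JDK's Base64 codec stands in for a third-party library so the snippet runs without extra dependencies; a real exploration would of course target the actual component.

```groovy
// Hypothetical safeguard checks, using java.util.Base64 as a stand-in
// for the open source component under exploration.
def encoder = Base64.encoder

// Behaviour our application relies on: the basic encoder emits no line breaks.
def encoded = encoder.encodeToString(new byte[120])
assert !encoded.contains('\n')

// Probing an edge: empty input must round-trip to an empty result.
assert encoder.encodeToString(new byte[0]) == ''
```

Run against every new library version, such checks turn silent behaviour changes into explicit test failures.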

Scenario based software exploration

If you want to use a new open source component in your project and application, you usually already have a rough vision of what you expect and want to gain from its usage. So the idea of scenario based software exploration is: describe your visions and expectations in the form of usage scenarios. These scenarios are your map of the uncharted terrain of the library's functionality, and they will guide you through the test process. In general, a useful scenario will do one or more of the following:
  • Tell a user story and describe the requirements
  • Demonstrate how a certain functionality works
  • Demonstrate an integration scenario
  • Describe cautions and things that could go wrong
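As a quick illustration of such a usage scenario, here is a sketch in plain Groovy rather than a full Spock specification, using Groovy's bundled JsonSlurper as the example library so it runs standalone; the JSON document and the "customer" domain are made up for the example.

```groovy
import groovy.json.JsonSlurper

// given: a JSON document like the ones our application receives
def json = '{"name": "Mr. Spock", "rank": "Commander"}'

// when: we parse it with the library under exploration
def customer = new JsonSlurper().parseText(json)

// then: the scenario demonstrates how JSON maps onto Groovy types
assert customer instanceof Map
assert customer.name == 'Mr. Spock'
```

The scenario both tells a small user story (parsing incoming customer data) and demonstrates how the library's result types integrate with our code.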

Exploratory Testing with Spock

Of course you can write exploratory tests with more traditional xUnit frameworks like JUnit or TestNG. So why use Spock instead? I think the Spock Framework is far better suited to writing exploratory tests because it supports scenario based software exploration by its very nature:
  • Specification as documentation
  • Reduced, beautiful and highly expressive specification language syntax
  • Support for Stubbing and Mocking
  • Good integration into IDEs and build tools
The following sections will showcase these points in more detail by using Spock to write exploratory tests for the Kryo serialization library.

Specification as documentation

The good thing about Spock is that it allows you to use natural language in your specification classes. Have a look at the following example. Currently it does not test anything; it is pure documentation. Even if you do not know Spock at all, I think you can understand what the test is supposed to do just by reading the specification. Awesome.
import com.esotericsoftware.kryo.Kryo
import spock.lang.*

@Title('Exploratory test for the shallow/deep copy functionality of Kryo')
@Narrative('''
   Making object copies in Java is an often required functionality.
   Writing explicit copy constructors is fast but also laborious.
   Instead the Java Serialization is often misused to make copies.
   Kryo performs fast automatic deep and shallow copying by copying
   from object to object.
''')
class KryoShallowAndDeepCopySpec extends Specification {

    def kryo = new Kryo()

    def "Making a shallow copy of a semi complex POJO"() {
        given: "a single semi complex POJO instance"

        when: "we make a shallow copy using Kryo"

        then: "the object is a copy, all nested instances are references"
    }

    def "Making a deep copy of a semi complex POJO construct"() {
        given: "a semi complex POJO instance construct"

        when: "we make a deep copy using Kryo"

        then: "the object and all nested instances are copies"
    }
}

Reduced, beautiful and highly expressive language syntax

The reduced syntax offered by Spock mainly comes from its Groovy nature. Exploratory tests really benefit from this because it helps you focus on the important bits: the open source component you want to explore. In addition, Spock brings along its own DSL to make your specifications even more expressive. Every feature method in a specification is structured into so-called blocks (see the Spock Primer for more details).

These blocks not only allow you to express the different phases of your test; they also let you demonstrate how an open source component works and how it can be integrated into your codebase. The setup: or given: block sets up the required input using classes from your application domain. The when: and then: blocks exhibit how a certain functionality works by interacting with the test subject and asserting the desired behaviour. Again, due to the Groovy nature of Spock, your assertions only need to evaluate to true or false. And for the last bit of expressiveness you can use your good old Hamcrest matchers.
// needs: import static spock.util.matcher.HamcrestSupport.expect
// and:   import static org.hamcrest.Matchers.*
def "Deserialize a GCustomer object from a temporary data file"() {
    given: "a Kryo input for a temporary data file"
    def input = new Input(new FileInputStream(temporaryFile))

    when: "we deserialize the object"
    def customer = kryo.readObject input, GCustomer

    then: "the customer POJO is initialized correctly"
    expect customer, notNullValue()

    // plain Groovy condition and Hamcrest matcher, side by side
    customer.name == 'Mr. Spock'
    expect customer.name, equalTo('Mr. Spock')
}

Support for Stubbing and Mocking

The Spock Framework also brings its own support for mocks and stubs to provide the means for interaction-based testing. This testing technique focuses on the behaviour of the object under test and helps us explore how a component interacts with its collaborators, by calling methods on them, and how this influences the component's behaviour. Being able to define mocks or stubs for every interface and almost any class also saves you from having to implement fake objects manually. Stubs only provide the ability to return predefined responses for defined interactions, whereas mocks additionally allow you to verify the interactions.
def "Explore writing an object using a Serializer mock"() {
    given: "a Kryo Serializer mock and dummy output"
    def serializer = Mock(Serializer)
    def output = new Output(new byte[1])

    when: "serializing the string Mr. Spock"
    kryo.writeObject(output, 'Mr. Spock', serializer)

    then: "we expect 1 interaction with the serializer"
    1 * serializer.write(kryo, output, 'Mr. Spock')
}

def "Explore reading an object using a Serializer stub"() {
    given: "a dummy input and a Kryo Serializer stub"
    def input = new Input([1] as byte[])
    def serializer = Stub(Serializer)

    and: "a stubbed Customer response"
    serializer.read(kryo, _, GCustomer) >> new GCustomer(name: 'Mr. Spock')

    when: "deserializing the input"
    def customer = kryo.readObject(input, GCustomer, serializer)

    then: "we get Mr. Spock again"
    customer.name == 'Mr. Spock'
}

Good Integration into IDEs and Build Tools

A good IDE and build tool integration is an important feature, since we want our exploratory tests to be an integral part of our application's codebase. Fortunately, the Spock support is already quite good, mainly because Spock tests are essentially translated into JUnit tests. To get proper syntax highlighting and code completion you can install dedicated plugins for your favourite IDE. For IntelliJ there is the Spock Framework Enhancements plugin, and for Eclipse there is the Spock Plugin.

The build tool integration is also pretty straightforward. If you are using Gradle for your build, the integration is only a matter of specifying the correct dependencies and applying the groovy plugin, as shown in the following snippet.
apply plugin: 'groovy'

dependencies {
    // mandatory dependencies for using Spock
    compile 'org.codehaus.groovy:groovy-all:2.4.1'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'
    testCompile 'junit:junit:4.12'
    testCompile 'org.mockito:mockito-all:1.10.19'

    // optional dependencies for using Spock
    // only necessary if Hamcrest matchers are used
    testCompile 'org.hamcrest:hamcrest-core:1.3' 
    // allows mocking of classes (in addition to interfaces)
    testRuntime 'cglib:cglib-nodep:3.1'
    // allows mocking of classes without default constructor
    testRuntime 'org.objenesis:objenesis:2.1'
}
For Maven you have to do a little more than just specifying the required Spock dependencies in your POM file. Because Spock tests are written in Groovy, you will also have to include the GMavenPlus Plugin into your build so that your Spock tests get compiled. You may also have to tweak the Surefire Plugin configuration to include **/*Spec.java as valid tests.
<build>
    <plugins>
        <!-- Mandatory plugins for using Spock -->
        <plugin>
            <groupId>org.codehaus.gmavenplus</groupId>
            <artifactId>gmavenplus-plugin</artifactId>
            <executions>
                <execution>
                    <goals><goal>compile</goal><goal>testCompile</goal></goals>
                </execution>
            </executions>
        </plugin>
        <!-- Optional plugins for using Spock -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <includes>
                    <include>**/*Spec.java</include>
                </includes>
            </configuration>
        </plugin>
    </plugins>
</build>

<dependencies>
    <!-- Mandatory dependencies for using Spock -->
    <dependency><groupId>org.codehaus.groovy</groupId><artifactId>groovy-all</artifactId><version>2.4.1</version></dependency>
    <dependency><groupId>org.spockframework</groupId><artifactId>spock-core</artifactId><version>1.0-groovy-2.4</version><scope>test</scope></dependency>

    <!-- Optional dependencies for using Spock -->
    <!-- only required if Hamcrest matchers are used -->
    <dependency><groupId>org.hamcrest</groupId><artifactId>hamcrest-core</artifactId><version>1.3</version><scope>test</scope></dependency>
    <!-- enables mocking of classes (in addition to interfaces) -->
    <dependency><groupId>cglib</groupId><artifactId>cglib-nodep</artifactId><version>3.1</version><scope>test</scope></dependency>
    <!-- enables mocking of classes without default constructor -->
    <dependency><groupId>org.objenesis</groupId><artifactId>objenesis</artifactId><version>2.1</version><scope>test</scope></dependency>
</dependencies>


That's all folks!